In [1]:
text = "The person's phone number is 408-555-1234. Call soon!"

In [2]:
'phone' in text

True

But let's show the format for regular expressions, because later on we will be searching for patterns that won't have such a simple solution.

In [3]:
# import the regular expression library
import re

In [7]:
pattern = 'phone' # the word we are looking for

re.search(pattern, text) # look for pattern inside text; returns a Match object if it is found

<re.Match object; span=(13, 18), match='phone'>

In [5]:
pattern = 'not found' # this phrase does not occur in text, so re.search returns None and nothing is displayed

re.search(pattern, text)
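Because a failed search returns `None` rather than raising an error, code that goes on to use the result usually checks it first. A minimal sketch of that idiom (the variable names `missing` and `found` are illustrative):

```python
import re

text = "The person's phone number is 408-555-1234. Call soon!"

# re.search returns None when the pattern occurs nowhere in the string
missing = re.search('not found', text)
print(missing is None)  # True

# a common idiom: only use the result after checking that something matched
found = re.search('phone', text)
if found:
    print(found.span())  # (13, 18)
```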

In [8]:
pattern = 'phone'
match = re.search(pattern, text)  # store the Match object in the variable match

match.span() # returns the (start, end) indices of the match; end is exclusive

(13, 18)
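The span can be taken apart with `match.start()` and `match.end()`, and because the end index is exclusive it can be used directly as a slice. A short sketch:

```python
import re

text = "The person's phone number is 408-555-1234. Call soon!"
match = re.search('phone', text)

start, end = match.span()
print(match.start(), match.end())  # 13 18

# end is exclusive, so slicing with the span recovers exactly the matched text
print(text[start:end])  # phone
print(match.group())    # phone
```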

In [12]:
text2 = 'my phones one, my phone two'

In [13]:
match = re.search('phone', text2) # re.search returns only the first match

In [14]:
match

<re.Match object; span=(3, 8), match='phone'>

In [15]:
# re.findall returns a list of all non-overlapping matches
matches = re.findall('phone', text2)

In [16]:
matches

['phone', 'phone']

In [17]:
# count the matches
len(matches)

2

In [18]:
# re.finditer returns an iterator of Match objects, which works well in a for loop
for match in re.finditer('phone', text2):
    print(match.span())

(3, 8)
(18, 23)
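The practical difference between the two functions is what they hand back: `findall` gives plain strings, while `finditer` gives Match objects that still carry positions. A small comparison sketch:

```python
import re

text2 = 'my phones one, my phone two'

# findall: just the matched strings
strings = re.findall('phone', text2)
print(strings)  # ['phone', 'phone']

# finditer: Match objects, so text and positions are both available
positions = [(m.group(), m.start(), m.end()) for m in re.finditer('phone', text2)]
print(positions)  # [('phone', 3, 8), ('phone', 18, 23)]
```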


Characters such as a digit or a whitespace character have special codes (identifiers) that represent them. You can use these to build up a pattern string. Notice how these make heavy use of the backslash \ . Because of this, when defining a pattern string for a regular expression we use the format:

r'mypattern'

Placing the r in front of the string makes it a raw string, so Python does not treat the \ characters in the pattern as escape sequences and passes them to the regular expression engine unchanged.
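A sketch of why the r prefix matters. It uses `\b` (a word boundary in regex syntax), which is not in the table below but is a classic case: in a normal string Python turns `'\b'` into a backspace character before `re` ever sees it.

```python
import re

# a raw string keeps the backslash: r'\d' is the two characters '\' and 'd'
print(r'\d' == '\\d')  # True
print(len(r'\d'))      # 2

# without r, '\b' becomes a backspace, so this looks for backspace + 'cat' + backspace
plain = re.search('\bcat\b', 'a cat here')
print(plain)  # None

# with r, re receives \b and treats it as a word boundary
raw = re.search(r'\bcat\b', 'a cat here')
print(raw)    # <re.Match object; span=(2, 5), match='cat'>
```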

Below you can find a table of the most common identifiers:

<style>
  table {
    border-collapse: collapse;
    width: 100%;
  }

  th, td {
    border: 0px solid #dddddd;
    text-align: left;
    padding: 8px;
  }

  tr:nth-child(even) {
    background-color: #666666; /* Dark gray */
  }
</style>

| Character      | Description                                     | Example Pattern Code | Example Match |
| -------------- | ----------------------------------------------- | -------------------- | ------------- |
| `\d`           | A digit                                         | `file_\d\d`          | file_25       |
| `\w`           | A word character (letter, digit, or underscore) | `\w-\w\w\w`          | A-b_1         |
| `\s`           | Whitespace                                      | `a\sb\sc`            | a b c         |
| `\D`           | A non-digit                                     | `\D\D\D`             | ABC           |
| `\W`           | A non-word character                            | `\W\W\W\W\W`         | *-+=)         |
| `\S`           | Non-whitespace                                  | `\S\S\S\S`           | Yoyo          |
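As a quick sanity check, each row of the table can be tested with `re.fullmatch`, which (unlike `re.search`) succeeds only when the pattern consumes the entire sample. The pairs below are copied from the table; this is a sketch, not a cell from the original notebook.

```python
import re

# (pattern, example) pairs from the identifier table
examples = [
    (r'file_\d\d', 'file_25'),
    (r'\w-\w\w\w', 'A-b_1'),       # underscore counts as a word character
    (r'a\sb\sc', 'a b c'),
    (r'\D\D\D', 'ABC'),
    (r'\W\W\W\W\W', '*-+=)'),
    (r'\S\S\S\S', 'Yoyo'),
]

# True means the whole example string matched the pattern
results = {pattern: bool(re.fullmatch(pattern, sample)) for pattern, sample in examples}
for pattern, ok in results.items():
    print(pattern, ok)
```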


For example:

In [None]:
text = "My telephone number is 408-555-1234"

In [19]:
phone = re.search(r'\d\d\d-\d\d\d-\d\d\d\d',text) # each \d matches one digit, so this matches the NNN-NNN-NNNN format

In [25]:
phone.group() # group() with no argument returns the entire matched text

'408-555-1234'

Notice the repetition of \d. That is a bit of an annoyance, especially if we are looking for very long strings of numbers. Let's explore the possible quantifiers.

### Quantifiers
Now that we know the special character designations, we can use them along with quantifiers to define how many we expect.


| Character | Description               | Example Pattern Code | Example Match    |
|-----------|---------------------------|----------------------|------------------|
| `+`       | Occurs one or more times  | `Version \w-\w+`     | Version A-b1_1   |
| `{3}`     | Occurs exactly 3 times    | `\D{3}`              | abc              |
| `{2,4}`   | Occurs 2 to 4 times       | `\d{2,4}`            | 123              |
| `{3,}`    | Occurs 3 or more times    | `\w{3,}`             | anycharacters    |
| `*`       | Occurs zero or more times | `A*B*C*`             | AAACC            |
| `?`       | Occurs once or not at all | `plurals?`           | plural           |
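The quantifier examples can be checked the same way with `re.fullmatch`, along with a few boundary cases for `{2,4}` and `?`. A small sketch; the extra test strings ('1', '12345', 'plurals') are illustrative.

```python
import re

examples = [
    (r'Version \w-\w+', 'Version A-b1_1'),
    (r'\D{3}', 'abc'),
    (r'\d{2,4}', '123'),
    (r'\w{3,}', 'anycharacters'),
    (r'A*B*C*', 'AAACC'),     # * applies to the letter before it; zero B's is allowed
    (r'plurals?', 'plural'),  # ? makes the final s optional
]
results = {pattern: bool(re.fullmatch(pattern, sample)) for pattern, sample in examples}
print(results)

# boundary cases
too_short = re.fullmatch(r'\d{2,4}', '1')      # fewer than 2 digits
too_long = re.fullmatch(r'\d{2,4}', '12345')   # more than 4 digits
with_s = re.fullmatch(r'plurals?', 'plurals')  # the optional s is present
print(too_short, too_long, bool(with_s))       # None None True
```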

Let's rewrite our pattern using these quantifiers:

In [22]:
re.search(r'\d{3}-\d{3}-\d{4}',text)

<re.Match object; span=(23, 35), match='408-555-1234'>
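Quantifiers also make it easy to collect several numbers at once. A minimal sketch with `re.findall`, assuming a hypothetical `sample` string that holds two phone numbers:

```python
import re

# hypothetical sample string with two phone numbers and a short extension
sample = "Office: 408-555-1234, mobile: 650-555-9876, ext. 12"

# \d{3} means "exactly three digits", equivalent to \d\d\d
numbers = re.findall(r'\d{3}-\d{3}-\d{4}', sample)
print(numbers)  # ['408-555-1234', '650-555-9876']

# the long-hand pattern finds exactly the same matches
same = numbers == re.findall(r'\d\d\d-\d\d\d-\d\d\d\d', sample)
print(same)  # True
```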

### Groups

What if we wanted to do two tasks: find phone numbers, and also quickly extract their area code (the first three digits)? We can use groups for any general task that involves grouping together regular expressions (so that we can later break them down).

Using the phone number example, we can separate groups of regular expressions using parentheses:

In [26]:
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

In [27]:
results = re.search(phone_pattern,text)

In [28]:
# The entire result
results.group()

'408-555-1234'

In [29]:
# Can then also call by group position.
# Remember that groups are delimited by parentheses ().
# Group numbering starts at 1; passing 0 (or nothing) returns the entire match.
results.group(1)

'408'

In [30]:
results.group(2)

'555'
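Groups are what make the area-code extraction from the motivating question straightforward. A sketch, assuming a hypothetical `multi` string with two numbers; it uses `groups()`, the `findall` method of a compiled pattern, and `finditer`, all standard `re` features not shown above.

```python
import re

phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')
multi = "Call 408-555-1234 or 650-555-9876"  # hypothetical sample text

first = phone_pattern.search(multi)
print(first.groups())                  # ('408', '555', '1234')
print(first.group(0) == first.group())  # True: group 0 is the whole match

# when the pattern contains groups, findall returns one tuple per match
all_parts = phone_pattern.findall(multi)
print(all_parts)  # [('408', '555', '1234'), ('650', '555', '9876')]

# area codes only
area_codes = [m.group(1) for m in phone_pattern.finditer(multi)]
print(area_codes)  # ['408', '650']
```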

## Additional Regex Syntax

### Or Operator `|`

Use the pipe operator | to match either the expression on its left or the one on its right. For example:

In [31]:
re.search(r"man|woman","This man was here.")

<re.Match object; span=(5, 8), match='man'>

In [32]:
re.search(r'dog|cat', 'there was a cat')

<re.Match object; span=(12, 15), match='cat'>
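Alternation also works with `re.findall`, and the engine reports the leftmost position where any alternative matches. A short sketch with illustrative sentences:

```python
import re

# every occurrence of either alternative, in order of appearance
animals = re.findall(r'dog|cat', 'a cat chased a dog and another cat')
print(animals)  # ['cat', 'dog', 'cat']

# at position 5 'man' fails (the character is 'w') but 'woman' succeeds,
# so the whole word is matched rather than the 'man' inside it
m = re.search(r'man|woman', 'This woman was here.')
print(m)  # <re.Match object; span=(5, 10), match='woman'>
```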

In [33]:
re.findall(r'.at', 'the cat with a hat sat there.') # . is a wildcard that matches any single character (except a newline)

['cat', 'hat', 'sat']

In [34]:
re.findall(r'...at', 'the cat with the hat went splat') # each . matches exactly one character, so this finds any three characters followed by 'at'

['e cat', 'e hat', 'splat']
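Two details of the wildcard are worth a quick sketch: by default `.` does not match a newline (the `re.DOTALL` flag changes that), and a literal period has to be escaped as `\.`. The test strings are illustrative.

```python
import re

# . matches any character except a newline by default
default = re.findall(r'a.c', 'abc a-c a\nc')
print(default)  # ['abc', 'a-c']

# with re.DOTALL, . also matches the newline
dotall = re.findall(r'a.c', 'abc a-c a\nc', flags=re.DOTALL)
print(dotall)   # ['abc', 'a-c', 'a\nc']

# to match an actual period, escape it
dots = re.findall(r'\.', 'end. of. line.')
print(dots)     # ['.', '.', '.']
```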

We can use ^ to signal "starts with" and $ to signal "ends with":

In [35]:
# Ends with a number
re.findall(r'\d$','This ends with a number 2') # matches only when the digit is the last character of the string

['2']

In [36]:
# Starts with a number
re.findall(r'^\d','1 is the loneliest number.')

['1']

Note that ^ and $ refer to the start and end of the entire string, not of individual words!
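For text that spans several lines, the `re.MULTILINE` flag makes ^ and $ also match at the start and end of each line. A sketch with an illustrative multi-line string:

```python
import re

lines = "1 apple\n2 pears\n3 plums"

# by default ^ anchors only at the very start of the string
first_only = re.findall(r'^\d', lines)
print(first_only)  # ['1']

# with re.MULTILINE, ^ also matches right after every newline
per_line = re.findall(r'^\d', lines, flags=re.MULTILINE)
print(per_line)    # ['1', '2', '3']

# a digit in the middle of the string does not satisfy \d$
middle = re.findall(r'\d$', 'Room 2 is free')
print(middle)      # []
```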

### Exclusion
To exclude characters, we can use the ^ symbol inside a set of square brackets []. When ^ is the first character inside the brackets, the set matches any single character that is not listed. For example:

In [37]:
phrase = "there are 3 numbers 34 inside 5 this sentence."

In [38]:
re.findall(r'[^\d]',phrase)

['t',
 'h',
 'e',
 'r',
 'e',
 ' ',
 'a',
 'r',
 'e',
 ' ',
 ' ',
 'n',
 'u',
 'm',
 'b',
 'e',
 'r',
 's',
 ' ',
 ' ',
 'i',
 'n',
 's',
 'i',
 'd',
 'e',
 ' ',
 ' ',
 't',
 'h',
 'i',
 's',
 ' ',
 's',
 'e',
 'n',
 't',
 'e',
 'n',
 'c',
 'e',
 '.']

In [39]:
re.findall(r'[^\d]+',phrase) # + means one or more, so each run of consecutive non-digits is returned as one string

['there are ', ' numbers ', ' inside ', ' this sentence.']

In [40]:
test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

In [42]:
re.findall(r'[^!.?]+', test_phrase)

['This is a string', ' But it has punctuation', ' How can we remove it']

In [43]:
clean = re.findall(r'[^!.?]+', test_phrase)

In [44]:
' '.join(clean)

'This is a string  But it has punctuation  How can we remove it'
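Joining with `' '` leaves double spaces because each piece after the first already starts with a space. Two ways around that are sketched below: joining with `''`, or using `re.sub` (a standard `re` function not shown above) to delete the punctuation directly. Inside square brackets `.` is a literal period, not a wildcard.

```python
import re

test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

# replace every !, . or ? with the empty string
no_punct = re.sub(r'[!.?]', '', test_phrase)
print(no_punct)  # This is a string But it has punctuation How can we remove it

# joining the findall pieces with '' gives the same result
pieces = re.findall(r'[^!.?]+', test_phrase)
print(''.join(pieces) == no_punct)  # True
```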

### Brackets for Grouping
As we showed above, we can use square brackets to group characters together into a set. For example, to find hyphenated words:

In [45]:
text = 'Only find the hyphen-words in this sentence. But you do not know how long-ish they are'

In [46]:
re.findall(r'[\w]+-[\w]+',text)

['hyphen-words', 'long-ish']

### Parentheses for Multiple Options
If we have multiple options for matching, we can list them inside parentheses, separated by |. For example:


In [47]:
# Find words that start with cat and end with one of these options: 'fish','nap', or 'claw'
text = 'Hello, would you like some catfish?'
texttwo = "Hello, would you like to take a catnap?"
textthree = "Hello, have you seen this caterpillar?"

In [49]:
re.search(r'cat(fish|nap|claw)', text)

<re.Match object; span=(27, 34), match='catfish'>

In [50]:
re.search(r'cat(fish|nap|claw)', texttwo)

<re.Match object; span=(32, 38), match='catnap'>

In [51]:
re.search(r'cat(fish|nap|claw)', textthree) # returns None: 'caterpillar' does not continue with fish, nap, or claw
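The parentheses here also create a capturing group, so the matched option can be read back on its own. Note that `findall` on a pattern with one group returns only the group's text. A sketch using the same pattern; the string 'catfish and catnap' is illustrative.

```python
import re

pattern = re.compile(r'cat(fish|nap|claw)')

m = pattern.search('Hello, would you like some catfish?')
whole = m.group()   # the entire match
inner = m.group(1)  # only the option captured by the parentheses
print(whole, inner)  # catfish fish

none_found = pattern.search('Hello, have you seen this caterpillar?')
print(none_found)    # None

# with one group in the pattern, findall returns just the captured text
found_all = pattern.findall('catfish and catnap')
print(found_all)     # ['fish', 'nap']
```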

For more details, see the Python Regular Expression HOWTO: [https://docs.python.org/3/howto/regex.html](https://docs.python.org/3/howto/regex.html)