
## Finding Patterns of Text Without Regular Expressions
Create a function that checks whether a string is a phone number in the following format: 415-555-1234

In [None]:
def isPhoneNumber(text):

  # The text must be exactly 12 characters long (ddd-ddd-dddd)
  if len(text) != 12:
    return False

  # The first three characters (the area code) must be digits
  for i in range(0, 3):
    if not text[i].isdecimal():
      return False
  # The fourth character must be '-'
  if text[3] != '-':
    return False

  # The next three characters must be digits
  for i in range(4, 7):
    if not text[i].isdecimal():
      return False
  # The eighth character must be '-'
  if text[7] != '-':
    return False

  # The last four characters must be digits
  for i in range(8, 12):
    if not text[i].isdecimal():
      return False
  return True

In [None]:
print(isPhoneNumber('415-555-4242'))
print(isPhoneNumber('Hello world!'))

True
False


In [None]:
# Find the pattern of text in a larger string
message = 'Call me at 415-555-1011 tomorrow. 411-555-9999 is my office.'
for i in range(len(message)):
  chunk = message[i:i+12]
  if isPhoneNumber(chunk):
    print('Phone number found: ' + chunk)
print('Done')

Phone number found: 415-555-1011
Phone number found: 411-555-9999
Done
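
The scan above prints each match as it goes. As a rough sketch, the same sliding-window idea can be wrapped in a helper that returns the matches as a list. The names `looksLikePhoneNumber` and `findPhoneNumbers` are hypothetical and not part of the original notebook; the first is a compact restatement of `isPhoneNumber` so that the block runs on its own.

```python
def looksLikePhoneNumber(text):
    # Hypothetical compact version of isPhoneNumber: ddd-ddd-dddd
    if len(text) != 12:
        return False
    for i, ch in enumerate(text):
        if i in (3, 7):
            # Positions 3 and 7 must be hyphens
            if ch != '-':
                return False
        elif not ch.isdecimal():
            # Every other position must be a digit
            return False
    return True


def findPhoneNumbers(message):
    # Slide a 12-character window over the message and keep the chunks that match
    found = []
    for i in range(len(message) - 11):
        chunk = message[i:i + 12]
        if looksLikePhoneNumber(chunk):
            found.append(chunk)
    return found


message = 'Call me at 415-555-1011 tomorrow. 411-555-9999 is my office.'
print(findPhoneNumbers(message))  # ['415-555-1011', '411-555-9999']
```

Returning a list keeps the search separate from the printing, which makes the result easier to reuse.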


## Finding Patterns of Text With Regular Expressions

### *Matching Regex Objects*

In [None]:
import re
# In this pattern, \d matches a single digit such as 0-9
# The r prefix marks a raw string, so the backslashes reach the regex engine unchanged
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

In [None]:
mo = phoneNumRegex.search('My number is 415-555-4242')

In [None]:
print('Phone number found: ' + mo.group())

Phone number found: 415-555-4242
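
One detail the cells above do not show: `search()` returns `None` when the pattern is not found, so calling `group()` on the result would raise an `AttributeError`. A minimal sketch of guarding against that, where the helper name `describe` is hypothetical:

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')


def describe(text):
    # search() returns a Match object, or None if nothing matched
    mo = phoneNumRegex.search(text)
    if mo is None:
        return 'No phone number found'
    return 'Phone number found: ' + mo.group()


print(describe('My number is 415-555-4242'))  # Phone number found: 415-555-4242
print(describe('I have no phone'))            # No phone number found
```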


### *Grouping with Parentheses*

In [None]:
import re

# Wrapping part of the pattern in parentheses, as in (\d\d\d), creates a group.
# The group() match object method returns the text matched by a single group.

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
mo.group(1)

'415'

In [None]:
mo.group(2)

'555-4242'

In [None]:
mo.group(0)

'415-555-4242'

In [None]:
mo.group()

'415-555-4242'

In [None]:
'''
If you would like to retrieve all the groups at once, use the groups() method.
'''
mo.groups()

('415', '555-4242')

In [None]:
areaCode, mainNumber = mo.groups()
print(areaCode)
print(mainNumber)

415
555-4242


In [None]:
# To match literal ( and ) characters in the text, escape them with a backslash: \( and \)

phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')

In [None]:
mo = phoneNumRegex.search('My phone number is (415) 555-4242.')

In [None]:
mo.group(1)

'(415)'

In [None]:
mo.group(2)

'555-4242'
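
To make the difference between grouping parentheses and escaped parentheses concrete, here is a small sketch that runs two patterns against the same text. The variable names `groupingOnly` and `literalParens` are illustrative assumptions, and the second pattern is a variation on the one above that keeps the literal parentheses outside the group.

```python
import re

text = 'My phone number is (415) 555-4242.'

# Unescaped parentheses only create groups; they do not match '(' or ')' in the text
groupingOnly = re.compile(r'(\d\d\d) (\d\d\d-\d\d\d\d)')

# Escaped parentheses match the literal characters; the inner pair still groups
literalParens = re.compile(r'\((\d\d\d)\) (\d\d\d-\d\d\d\d)')

print(groupingOnly.search(text))  # None, because ')' sits between '415' and the space

mo = literalParens.search(text)
print(mo.group())   # (415) 555-4242
print(mo.group(1))  # 415
```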

### *Matching Multiple Groups with the Pipe*
The | character is called a pipe. You can use it anywhere you want to match one of many expressions. For example, the regular expression r'Batman|Tina Fey' will match either 'Batman' or 'Tina Fey'.

In [None]:
'''
When both Batman and Tina Fey occur in the searched string, the first occurrence of matching text will be returned as the Match object.
'''

In [1]:
import re
heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey')
mo1.group()

'Batman'

In [2]:
mo2 = heroRegex.search('Tina Fey and Batman.')
mo2.group()

'Tina Fey'
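
Since `search()` stops at the first occurrence, a short sketch using `findall()` shows how to collect every alternative that appears in the string. Because this pattern has no groups, `findall()` returns the matched strings themselves, in the order they occur.

```python
import re

heroRegex = re.compile(r'Batman|Tina Fey')

# findall() scans the whole string and returns all non-overlapping matches
print(heroRegex.findall('Tina Fey and Batman.'))  # ['Tina Fey', 'Batman']
print(heroRegex.findall('Robin'))                 # []
```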

In [6]:
'''
When the alternatives share a prefix, write the prefix once and put the alternatives in a group.
'''
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')

# Return the full matched word
mo.group() 

'Batmobile'

In [7]:
# Return just the part of the matched text inside the first parentheses
mo.group(1)

'mobile'
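
As a quick check of how the shared prefix behaves, the sketch below runs the same pattern over a few strings and returns only the suffix captured by the group. The helper name `batSuffix` is hypothetical.

```python
import re

batRegex = re.compile(r'Bat(man|mobile|copter|bat)')


def batSuffix(text):
    # Return the alternative that followed 'Bat', or None when nothing matched
    mo = batRegex.search(text)
    return mo.group(1) if mo else None


print(batSuffix('Batman'))            # man
print(batSuffix('Batcopter landed'))  # copter
print(batSuffix('Batcave'))           # None
```

'Batcave' does not match because 'cave' is not one of the listed alternatives.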

### *Optional Matching with the Question Mark*
Sometimes there is a pattern that you want to match only optionally. That is, the regex should find a match whether or not that bit of text is there. The ? character flags the group that precedes it as an optional part of the pattern.

In [13]:
batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The adventure of Batman')
mo1.group()

'Batman'

In [14]:
mo2 = batRegex.search('The adventures of Batwoman')
mo2.group()

'Batwoman'

In [15]:
'''
The (wo)? part of the regular expression means that the pattern wo is an optional group. The regex will match text that has zero instances or one instance of wo in it. This is why the regex matches both 'Batwoman' and 'Batman'
'''

In [18]:
# Using it in the previous phone number examples
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo1 = phoneRegex.search('My number is 415-555-4242')
mo1.group()

'415-555-4242'

In [20]:
mo2 = phoneRegex.search('My number is 649-5490')
mo2.group()

'649-5490'
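
One consequence of an optional group is worth seeing directly: when the optional part is absent, the overall match still succeeds, but `group(1)` is `None`. A minimal sketch using the same pattern:

```python
import re

phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

mo1 = phoneRegex.search('My number is 415-555-4242')
mo2 = phoneRegex.search('My number is 649-5490')

print(mo1.group(1))  # 415-
print(mo2.group(1))  # None, because the optional area code did not take part in the match
```

Code that reads the area code should therefore check for `None` before using it.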

### *Matching Zero or More with the Star*
The * (called the star or asterisk) means “match zero or more”: the group that precedes the star can occur any number of times in the text. It can be completely absent or repeated over and over again.

In [21]:
batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('My name is Batman.')
mo1.group()

'Batman'

In [26]:
mo2 = batRegex.search('She is Batwoman.')
mo2.group()

'Batwoman'

In [31]:
mo3 = batRegex.search('The Adventures of Batwowowowoman')
mo3.group()

'Batwowowowoman'
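
When a starred group repeats, `group(1)` holds only the last repetition, not all of them, and it is `None` when the group matched zero times. The sketch below illustrates this; counting repetitions from the match length is just one simple approach, and the name `repeats` is illustrative.

```python
import re

batRegex = re.compile(r'Bat(wo)*man')

mo1 = batRegex.search('My name is Batman.')
mo3 = batRegex.search('The Adventures of Batwowowowoman')

print(mo1.group(1))  # None, the group matched zero times
print(mo3.group(1))  # wo, only the last repetition is kept

# Each repetition adds len('wo') characters on top of 'Batman'
repeats = (len(mo3.group()) - len('Batman')) // len('wo')
print(repeats)       # 4
```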