# Patterns

**Finding Patterns of Text Without Regular Expressions**

---
The function below checks, without regular expressions, whether a 12-character string has the form of a phone number such as 415-555-4242, and returns True or False.


In [None]:
def isPhoneNumber(text):
  if len(text) != 12:
    return False
  for i in range(0, 3):
    if not text[i].isdecimal():
      return False
  if text[3] != '-':
    return False
  for i in range(4, 7):
    if not text[i].isdecimal():
      return False
  if text[7] != '-':
    return False
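  # Check the last four digits (this loop must sit at function level,
  # not inside the preceding if block, or it is never reached)
  for i in range(8, 12):
    if not text[i].isdecimal():
      return False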
  return True

message = 'Call me at 415-555-1011 tomorrow. 415-555-999 is my office'
for i in range(len(message)):
  chunk = message[i:i+12]
  if isPhoneNumber(chunk):
    print('Phone number found: ' + chunk)

print('Done')

**Regex Objects**

In [None]:
import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d{4}')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())

By putting an *r* before the first quote of the string value, you mark it as a raw string, in which backslashes are kept as literal characters instead of starting escape sequences.
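
As a small sketch of what the r prefix changes, the cell below compares a normal string with a raw one. The variable names (`plain`, `raw`, `doubled`, `raw_pattern`) are illustrative and not part of the original notebook.

```python
# In a normal string literal, '\n' is a single newline character.
plain = '\n'
# In a raw string literal, r'\n' is two characters: a backslash and 'n'.
raw = r'\n'

print(len(plain))  # 1
print(len(raw))    # 2

# Without the r prefix, every regex backslash has to be doubled.
doubled = '\\d\\d\\d-\\d\\d\\d-\\d{4}'
raw_pattern = r'\d\d\d-\d\d\d-\d{4}'
print(doubled == raw_pattern)  # True
```

Both spellings produce the same pattern text; the raw form is simply easier to read and harder to get wrong.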

---
If .search() finds no match, it returns None, so mo is None and calling mo.group() raises an AttributeError.
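
One way to guard against a missing match is sketched below. The helper `find_phone` and the name `phone_regex` are hypothetical and only serve this illustration.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d{4}')

def find_phone(text):
    """Return the first phone number in text, or None if there is none."""
    mo = phone_regex.search(text)
    if mo is None:
        # No match: calling mo.group() here would raise AttributeError.
        return None
    return mo.group()

print(find_phone('My number is 415-555-4242.'))  # 415-555-4242
print(find_phone('No number here.'))             # None
```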


**Grouping with parentheses**

In [None]:
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
mo.group(1)

In [None]:
mo.group(2)

In [None]:
mo.group(0)

In [None]:
mo.group()

In [None]:
mo.groups()

In [None]:
areaCode, mainNumber = mo.groups()
print(areaCode)
print(mainNumber)

**Matching Multiple Patterns with Pipe**

In [None]:
heroRegex = re.compile(r'Batman|Tina Fey')
mo = heroRegex.search('Batman and Tina Fey')
mo.group()

In [None]:
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')
mo.group()

In [None]:
mo.group(1)

**Optional Matching with the Question Mark**

In [None]:
batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The Adventures of Batman')
mo1.group()

In [None]:
mo2 = batRegex.search('The Adventures of Batwoman')
mo2.group()

Using the earlier phone number example, you can make the regex match phone numbers with or without an area code.

In [None]:
# The optional group must include the hyphen that follows the area code
phoneNumRegex = re.compile(r'(\d\d\d-)?(\d\d\d-\d\d\d\d)')
mo1 = phoneNumRegex.search('My number is 415-555-4242.')
mo1.group()

In [None]:
mo2 = phoneNumRegex.search('My number is 555-4242.')
mo2.group()
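
To see what the optional group captures in each case, here is a minimal sketch. It assumes the area-code group includes its trailing hyphen; the names `optional_area_regex`, `with_area`, and `without_area` are illustrative.

```python
import re

# Assumption: the optional first group covers the area code and its hyphen.
optional_area_regex = re.compile(r'(\d\d\d-)?(\d\d\d-\d\d\d\d)')

with_area = optional_area_regex.search('My number is 415-555-4242.')
without_area = optional_area_regex.search('My number is 555-4242.')

# A group that did not take part in the match is reported as None.
print(with_area.groups())     # ('415-', '555-4242')
print(without_area.groups())  # (None, '555-4242')
```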

**Matching Zero or More with the Star**

In [None]:
batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
mo1.group()

In [None]:
mo3 = batRegex.search('The Adventures of Batwowowowoman')
mo3.group()

**Matching One or More with the Plus**

In [None]:
batRegex = re.compile(r'Bat(wo)+man')
mo4 = batRegex.search('The Adventures of Batman')
print(mo4 is None)

**Greedy and Nongreedy Matching**

In [None]:
greedyHaRegex = re.compile(r'(Ha){3,5}')
mo1 = greedyHaRegex.search('HaHaHaHaHa')
mo1.group()

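Notice that the match is 'HaHaHaHaHa', the longest possible one, even though 'HaHaHa' and 'HaHaHaHa' also satisfy the pattern. Quantifiers are greedy by default and match as much text as they can.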

In [None]:
nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
mo2.group()
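
The same ? suffix works on the other quantifiers as well. The sketch below uses a made-up sample string (`digits`) to compare greedy and nongreedy forms of + and {m,n}.

```python
import re

digits = 'Order 12345 shipped'

# Greedy forms take as many digits as the quantifier allows.
print(re.search(r'\d+', digits).group())       # 12345
print(re.search(r'\d{2,4}', digits).group())   # 1234

# Nongreedy forms stop at the minimum the quantifier requires.
print(re.search(r'\d+?', digits).group())      # 1
print(re.search(r'\d{2,4}?', digits).group())  # 12
```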

**findall() Method**

In [None]:
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')   # has no groups
phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')

In [None]:
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')   # has groups
phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')

In [None]:
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')   # has groups
print(phoneNumRegex.findall('No Phone'))

**Character Class Example**

In [None]:
xmasRegex = re.compile(r'\d+\s\w+')
xmasRegex.findall('12 drummers, 11 pipers, 10 lords, 9 ladies, '
                  '8 maids, 7 swans, 6 geese, 5 rings, 4 birds, '
                  '3 hens, 2 doves, 1 partridge')

**Making your own character classes**

In [None]:
consonantRegex=re.compile(r'[^aeiouAEIOU]')
consonantRegex.findall('RoboCop eats baby food. BABY FOOD')

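The caret at the start of the brackets makes this a negative character class: instead of matching every vowel, it matches every character that isn't a vowel, including spaces and punctuation.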
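
If only consonant letters are wanted, one option is to list the consonant ranges explicitly. This is a sketch for ASCII letters only; the names `not_vowel` and `consonant_only` are illustrative.

```python
import re

text = 'RoboCop eats baby food.'

# [^aeiouAEIOU] matches anything that is not a vowel, including ' ' and '.'.
not_vowel = re.compile(r'[^aeiouAEIOU]')
print(not_vowel.findall(text))

# Listing the consonant ranges explicitly keeps only letters.
consonant_only = re.compile(r'[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z]')
print(''.join(consonant_only.findall(text)))  # RbCptsbbyfd
```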

**The Wildcard Character**

In [None]:
atRegex=re.compile(r'.at')
atRegex.findall('The cat in the hat sat on the flat mat. Somewhat!')

The dot character matches exactly one character (any character except a newline), which is why the match for the text flat was only lat.
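
A short sketch of the one-character rule: each extra dot consumes one more character, and re.DOTALL lets the dot match a newline too. The sample strings are made up for this illustration.

```python
import re

text = 'The cat in the hat sat on the flat mat.'

# One dot: exactly one character before 'at'.
print(re.findall(r'.at', text))   # ['cat', 'hat', 'sat', 'lat', 'mat']

# Two dots: exactly two characters before 'at', so 'flat' now fits.
print(re.findall(r'..at', text))  # [' cat', ' hat', ' sat', 'flat', ' mat']

# By default the dot does not match a newline; re.DOTALL changes that.
print(re.findall(r'.at', 'c\nat'))             # []
print(re.findall(r'.at', 'c\nat', re.DOTALL))  # ['\nat']
```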

**Matching Everything with Dot-Star**

In [None]:
nameRegex=re.compile(r'First Name: (.*) Last Name: (.*)')
mo=nameRegex.search('First Name: Pam Last Name: Russell_75+')
mo.group(1)

In [None]:
mo.group(2)

**Case-Insensitive Matching**

In [None]:
robocop=re.compile(r'robocop',re.I)        # Or re.IGNORECASE
robocop.search('RoboCop is part man and part machine').group()

In [None]:
robocop.search('ROBOCOP protects the innocent').group()

In [None]:
robocop.search('It is robocop?').group()

**Substituting Strings**

In [None]:
namesRegex=re.compile(r'Agent \w+')
namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')

In [None]:
agentNamesRegex=re.compile(r'Agent (\w)\w*')
agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve '
                    'knew Agent Bob was a double agent.')