**Regular Expressions in Python**

In [0]:
import re
text = 'Awesome, I am doing coding challenges in Python'

In [2]:
text.startswith('Awesome')

True

In [3]:
text.endswith('challenge')

False

In [4]:
'coding' in text.lower()

True

In [5]:
text.replace('Awesome', 'Great')

'Great, I am doing coding challenges in Python'

**search vs match**

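The main methods you want to know about are search and match: search finds the pattern anywhere in the string, while match only matches at the beginning of the string (it does not have to reach the end; re.fullmatch is the one that requires the whole string to match). I always write my regexes as raw strings (r'') so I don't have to double the backslash in sequences like \d (digit), \w (word character), \s (whitespace), \S (non-whitespace), etc. (I think \\d and \\s clutter up the regex.)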
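To make the raw-string point concrete, here is a small sketch comparing raw and regular string literals. The sample string 'I am here' is only an illustration and does not come from the cells in this notebook.

```python
import re

# A raw string keeps backslashes as-is, so these two patterns are identical
print(r'\d+\s\w+' == '\\d+\\s\\w+')   # True

# Without r'', Python turns '\b' into a single backspace character,
# so the regex engine never sees the word-boundary escape
print(len('\b'), len(r'\b'))          # 1 2

# Hypothetical sample text, used only to show the difference
sample = 'I am here'
print(re.search(r'\bam\b', sample).span())   # (2, 4)
print(re.search('\bam\b', sample))           # None
```

Sequences like \d happen to survive in a regular string because Python does not recognise them as escapes, but \b shows why relying on that is fragile.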

In [0]:
text = 'Awesome, I am doing coding challenges in Python'

In [7]:
re.search(r'I am', text)

<_sre.SRE_Match object; span=(9, 13), match='I am'>

In [0]:
re.match(r'I am', text)

In [10]:
re.match(r'Awesome.*challenges', text)

<_sre.SRE_Match object; span=(0, 37), match='Awesome, I am doing coding challenges'>
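As a side-by-side recap of the cells above, the following sketch runs search, match and fullmatch against the same text. The variable names (found, at_start, whole, partial) are just illustrative.

```python
import re

text = 'Awesome, I am doing coding challenges in Python'

# search scans the whole string for the first place the pattern matches
found = re.search(r'I am', text)

# match only tries at position 0, so 'I am' is not found here
at_start = re.match(r'I am', text)

# fullmatch requires the pattern to cover the entire string
whole = re.fullmatch(r'Awesome.*Python', text)
partial = re.fullmatch(r'Awesome.*challenges', text)

print(found.span())        # (9, 13)
print(at_start)            # None
print(whole is not None)   # True
print(partial)             # None
```

Note that re.match(r'Awesome.*challenges', text) succeeds even though the text continues with ' in Python', because match anchors only the start.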

**Capturing strings**

A common task is to retrieve the matched text. You can use capturing parentheses () for that:

In [3]:
hundred = 'Awesome, I am doing the 100 code challenges'
two_hundred = 'Awesome, I am doing the 200 code challenges'

m = re.match(r'.*( \d+ code).*', hundred)
m.groups()[0]

' 100 code'

In [4]:
m = re.search(r'( \d+ code)', two_hundred)
m.groups()[0]

' 200 code'
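If you only want the number, you can narrow the group to the digits. Below is a minimal sketch using a named group; the group name count is my own choice, and the None check is there because search and match return None when nothing matches.

```python
import re

hundred = 'Awesome, I am doing the 100 code challenges'

# Capture just the digits; a named group documents what is being captured
m = re.search(r'(?P<count>\d+) code', hundred)

# Guard against None before calling group(), otherwise a failed
# match raises AttributeError
if m:
    count = int(m.group('count'))   # same text as m.group(1)
    print(count)                    # 100

# No digits in this string, so there is nothing to capture
no_match = re.search(r'(\d+) code', 'no numbers here')
print(no_match)                     # None
```

m.group(1) and m.groups()[0] return the same string; group(1) just avoids building the whole tuple.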

In [1]:
movies = '''1. Citizen Kane (1941)
2. The Godfather (1972)
3. Casablanca (1942)
4. Raging Bull (1980)
5. Singin' in the Rain (1952)
6. Gone with the Wind (1939)
7. Lawrence of Arabia (1962)
8. Schindler's List (1993)
9. Vertigo (1958)
10. The Wizard of Oz (1939)'''.split('\n')
movies

['1. Citizen Kane (1941)',
 '2. The Godfather (1972)',
 '3. Casablanca (1942)',
 '4. Raging Bull (1980)',
 "5. Singin' in the Rain (1952)",
 '6. Gone with the Wind (1939)',
 '7. Lawrence of Arabia (1962)',
 "8. Schindler's List (1993)",
 '9. Vertigo (1958)',
 '10. The Wizard of Oz (1939)']

Let's find movie titles that have exactly 2 words:

In [0]:
pat = re.compile(r'''
                  ^             # start of string
                  \d+           # one or more digits
                  \.            # a literal dot
                  \s+           # one or more spaces
                  (?:           # non-capturing group, so this match is not stored in groups()
                  [A-Za-z']+\s  # character class (note inclusion of ' for "Schindler's"), followed by a space
                  )             # closing of non-capturing parenthesis
                  {2}           # exactly 2 of the previously grouped subpattern
                  \(            # literal opening parenthesis
                  \d{4}         # exactly 4 digits (year)
                  \)            # literal closing parenthesis
                  $             # end of string
                  ''', re.VERBOSE)

In [5]:
for movie in movies:
    print(movie, pat.match(movie))

1. Citizen Kane (1941) <_sre.SRE_Match object; span=(0, 22), match='1. Citizen Kane (1941)'>
2. The Godfather (1972) <_sre.SRE_Match object; span=(0, 23), match='2. The Godfather (1972)'>
3. Casablanca (1942) None
4. Raging Bull (1980) <_sre.SRE_Match object; span=(0, 21), match='4. Raging Bull (1980)'>
5. Singin' in the Rain (1952) None
6. Gone with the Wind (1939) None
7. Lawrence of Arabia (1962) None
8. Schindler's List (1993) <_sre.SRE_Match object; span=(0, 26), match="8. Schindler's List (1993)">
9. Vertigo (1958) None
10. The Wizard of Oz (1939) None
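To actually collect the two-word titles rather than just print match objects, one option is to add capturing groups around the title and the year. This is a sketch on a shortened movie list; the pattern name two_words and the (title, year) tuple layout are my own assumptions, not part of the cells above.

```python
import re

# Shortened sample of the movie list, enough to show hits and misses
movies = [
    '1. Citizen Kane (1941)',
    '3. Casablanca (1942)',
    "8. Schindler's List (1993)",
    '10. The Wizard of Oz (1939)',
]

# Same structure as the verbose pattern, written compactly, with capturing
# groups around the two-word title and the four-digit year
two_words = re.compile(r"^\d+\.\s+((?:[A-Za-z']+\s){2})\((\d{4})\)$")

titles = []
for movie in movies:
    m = two_words.match(movie)
    if m:                       # skip movies that do not match
        title, year = m.groups()
        # the title group ends with the space before '(' so strip it
        titles.append((title.strip(), int(year)))

print(titles)
# [('Citizen Kane', 1941), ("Schindler's List", 1993)]
```

The inner (?:...) stays non-capturing so that groups() returns only the full title and the year, not the last repeated word.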
