### Basics of Regular Expressions
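**Special Characters:** `.` `^` `$` `*` `+` `?` `{` `}` `[` `]` `\` `|` `(` `)` (precede one with a backslash to match it literally, e.g. `\.` matches a period)  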
**Rules:** 
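- `\d` : matches any digit, equivalent to `[0-9]` for ASCII text
- `\D` : matches any non-digit, equivalent to `[^0-9]` for ASCII text
- `\s` : matches any whitespace character, equivalent to `[ \t\n\r\f\v]` for ASCII text
- `\S` : matches any non-whitespace character, equivalent to `[^ \t\n\r\f\v]` for ASCII text
- `\w` : matches any word character (letters, digits and underscore), equivalent to `[a-zA-Z0-9_]` for ASCII text
- `\W` : matches any non-word character, equivalent to `[^a-zA-Z0-9_]` for ASCII text
- `.` : matches any character except the newline character `\n` (unless the `re.DOTALL` flag is set)
- `ca*t` : matches `ct`, `cat`, `caat`, ... (zero or more a's), equivalent to `ca{0,}t`
- `ca+t` : matches `cat`, `caat`, ... (one or more a's), equivalent to `ca{1,}t`
- `ca?t` : matches `ct` or `cat` (zero or one a), equivalent to `ca{0,1}t`
- `ca{2}t` : matches `caat` only (exactly two a's)
- `ca{2,4}t` : matches `caat`, `caaat`, `caaaat` (two to four a's)

For `str` patterns, Python's `\d`, `\s` and `\w` are Unicode-aware and also match non-ASCII digits, whitespace and letters; compile with the `re.ASCII` flag to restrict them to the ASCII sets listed above.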
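The rules above can be checked quickly with `re.findall()` and `re.fullmatch()`. The sample strings below are made up for illustration; `fullmatch()` is used so that each quantifier pattern has to match a whole candidate rather than just part of it.

```python
import re

text = "Order 66 shipped\tto ca_t at 9am"

# \d: every digit is a separate match
digits = re.findall(r'\d', text)
print(digits)   # ['6', '6', '9']

# \s: five spaces and one tab
spaces = re.findall(r'\s', text)
print(spaces)   # [' ', ' ', '\t', ' ', ' ', ' ']

# \w+: runs of letters, digits and underscores
words = re.findall(r'\w+', text)
print(words)    # ['Order', '66', 'shipped', 'to', 'ca_t', 'at', '9am']

# .: any character except the newline
dots = re.findall(r'.', 'a\nb')
print(dots)     # ['a', 'b']

# Quantifiers applied to the 'a' between 'c' and 't'
candidates = ['ct', 'cat', 'caat', 'caaat', 'caaaat', 'caaaaat']
results = {}
for pattern in [r'ca*t', r'ca+t', r'ca?t', r'ca{2}t', r'ca{2,4}t']:
    # fullmatch() succeeds only if the entire candidate matches the pattern
    results[pattern] = [s for s in candidates if re.fullmatch(pattern, s)]
    print(f'{pattern:9} {results[pattern]}')
```

Here `ca?t` keeps only `ct` and `cat`, `ca{2}t` keeps only `caat`, and `ca{2,4}t` keeps `caat`, `caaat` and `caaaat`.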

### Using Regular Expressions
Python's built-in `re` module provides regular expression support. `re.compile()` turns a pattern string into a reusable pattern object.

```python
import re
p = re.compile('[a-z]+')
p
```

```
re.compile(r'[a-z]+', re.UNICODE)
```

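Python raw strings (`r'...'`) are recommended for patterns. The backslash is special both in Python string literals and in regular expressions, so without a raw string every backslash in a pattern has to be doubled: the regex `\\`, which matches one literal backslash, must be written `'\\\\'` as a normal literal but only `r'\\'` as a raw string.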
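A small sketch of why raw strings help. The path and sentence are made-up inputs; the last two lines use `\b` (the regex word-boundary assertion), which is not in the rules list above but is the classic case where a normal string literal silently changes the pattern.

```python
import re

# A normal literal needs '\\d' to hand the two characters \d to re;
# the raw string r'\d' spells the same pattern more readably.
assert '\\d' == r'\d'

path = 'C:\\Users\\guest'          # contains two single backslashes

# Matching one literal backslash: the regex is \\ in both cases
plain = re.findall('\\\\', path)   # normal literal: four backslashes
raw = re.findall(r'\\', path)      # raw string: two backslashes
print(plain, raw)                  # ['\\', '\\'] ['\\', '\\']

# In a normal literal '\b' is a backspace character (\x08), not a word boundary
with_backspace = re.findall('\bcat\b', 'a cat here')
with_boundary = re.findall(r'\bcat\b', 'a cat here')
print(with_backspace, with_boundary)   # [] ['cat']
```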

### Performing Matches
Pattern objects provide four main methods for performing matches:
- `match()` : Determine if the RE matches at the beginning of the string.
- `search()` : Scan through a string, looking for any location where this RE matches.
- `findall()` : Find all substrings where the RE matches, returning them as a list.
- `finditer()` : Find all substrings where the RE matches, returning an iterator of match objects.
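A side-by-side sketch of the four methods on one made-up string, using a digit pattern so that the string does not start with a match:

```python
import re

number = re.compile(r'\d+')
text = 'room 101, floor 3'

# match() only tries position 0, and the text starts with 'r'
print(number.match(text))             # None
# search() scans forward and returns the first match
print(number.search(text).group())    # 101
# findall() returns the matched substrings in a list
print(number.findall(text))           # ['101', '3']
# finditer() yields match objects, which also carry positions
spans = [m.span() for m in number.finditer(text)]
print(spans)                          # [(5, 8), (16, 17)]
```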

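```python
# match() and search() return a match object, or None if there is no match
m = p.match('every word will be matched')
m
```

```
<_sre.SRE_Match object; span=(0, 5), match='every'>
```

Python 3.7 and later display this object as `<re.Match object; span=(0, 5), match='every'>`.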
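Because `match()` and `search()` return `None` when nothing matches, calling `.group()` directly on the result raises `AttributeError` for a non-matching string. A common idiom is to test the result first; this sketch reuses the `[a-z]+` pattern on a few made-up inputs.

```python
import re

p = re.compile('[a-z]+')

found = {}
for s in ['every word', 'Every word', '42 words']:
    m = p.match(s)
    if m:   # a match object is truthy, None is falsy
        found[s] = m.group()
        print(repr(s), '->', m.group(), m.span())
    else:
        found[s] = None
        print(repr(s), '-> no match at position 0')
```

Only `'every word'` matches: `[a-z]` does not cover the uppercase `E`, and `match()` never looks past position 0.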

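```python
m2 = p.search('Determine if the RE matches at the beginning of the string')
print(m2.group())
print(m2.start(), m2.end())  # with match(), start() is always 0
print(m2.span())
```

```
etermine
1 9
(1, 9)
```

`search()` skips the leading `D` because `[a-z]` matches only lowercase letters, so the first match starts at index 1.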

```python
p2 = re.compile(r'\d+')
p2.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
```

```
['12', '11', '10']
```