# Python RegEx

---

__Step 1__ - `re.compile()` returns a regex pattern object. Use a raw string so that backslashes reach the regex engine unchanged, e.g. `phoneRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')`

__Step 2__ - Search your string with the pattern compiled above, e.g. `mo = phoneRegex.search("My phone number is 647-123-1234")`. This returns `mo`, a match object.

__Step 3__ - Display the matched string with `mo.group()`. If nothing matched, `mo` is `None`.

To try out regular expressions interactively, see https://www.regexpal.com/.

In [15]:
import re

phoneRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')  # raw string keeps \d intact
mo = phoneRegex.search("My phone number is 647-123-1234")
print(mo.group())

647-123-1234
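
Because `search()` returns `None` when nothing matches, calling `mo.group()` on a miss raises `AttributeError`. Here is a minimal sketch of a guard. The helper name `find_phone` is hypothetical, made up for this illustration, and is not part of the `re` module.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def find_phone(text):
    """Hypothetical helper: return the first phone number in text, or None."""
    mo = phone_regex.search(text)
    # search() returns None on a miss, so check before calling group()
    return mo.group() if mo is not None else None

found = find_phone("My phone number is 647-123-1234")
missing = find_phone("No number here")
print(found)    # 647-123-1234
print(missing)  # None
```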


## Grouping
You can create groups within the match you are trying to find, as shown below. Here the area code and the rest of the phone number are captured as separate groups.

In [23]:
phoneNumRegex = re.compile(r"(\d\d\d)-(\d\d\d-\d\d\d\d)")
mo = phoneNumRegex.search('My number is 415-555-4242.')
print(mo.group(1), mo.group(2))
print(mo.groups())

area_code, number = mo.groups()
print("Area code is : {} and number is {}".format(area_code, number))

415 555-4242
('415', '555-4242')
Area code is : 415 and number is 555-4242


- What if your phone number has parentheses? Escape them with a backslash, as shown below.
- In the raw string passed to `re.compile()`, the escape sequences `\(` and `\)` match literal parenthesis characters.

In [24]:
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
print(mo.group(1))
print(mo.group(2))

(415)
555-4242
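
To see why the escaping matters, the sketch below runs an unescaped and an escaped pattern against the same text. The variable names are illustrative only, and the escaped pattern here places the group inside the literal parentheses, so `group(1)` holds just the digits.

```python
import re

text = 'My phone number is (415) 555-4242.'

# Unescaped: the parentheses only create a group, so the pattern
# expects three digits followed directly by a space.
unescaped = re.compile(r'(\d\d\d) (\d\d\d-\d\d\d\d)')

# Escaped: \( and \) match literal parenthesis characters.
escaped = re.compile(r'\((\d\d\d)\) (\d\d\d-\d\d\d\d)')

no_match = unescaped.search(text)
mo = escaped.search(text)

print(no_match)     # None
print(mo.group(0))  # (415) 555-4242
print(mo.group(1))  # 415
```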


## Multiple Groups

`|` matches one of several alternative expressions. When more than one alternative occurs in the string, the occurrence that appears first in the string is returned.

In [31]:
heroRegex = re.compile("Batman |WonderWomen")
mo1 = heroRegex.search('Batman and WonderWomen')
mo2 = heroRegex.search('WonderWomen and Batman')
print(mo1.group())
print(mo2.group())

Batman 
WonderWomen
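
There are two separate ordering rules at play with `|`, and a small sketch may make them clearer. The names `Superman`, `hero_regex`, `short_first` and `long_first` are used only for this illustration.

```python
import re

# Rule 1: the match that starts earliest in the string wins,
# regardless of the order of the alternatives in the pattern.
hero_regex = re.compile(r'Batman|Superman')
first = hero_regex.search('Superman met Batman').group()
print(first)  # Superman

# Rule 2: when two alternatives can match at the same position,
# the one listed first in the pattern is tried first.
short_first = re.compile(r'Bat|Batman').search('Batman').group()
long_first = re.compile(r'Batman|Bat').search('Batman').group()
print(short_first)  # Bat
print(long_first)   # Batman
```

If one alternative is a prefix of another, listing the longer one first avoids the shorter one taking the match.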


In [40]:
batRegex = re.compile('Bat(man|mobile|copter|bat)')  # this is matching Batman or Batmobile or Batcopter etc...
mo = batRegex.search('Batmobile lost a wheel')
print(mo.group())  # returns full matched word
print(mo.group(0)) # same as above
print(mo.group(1)) # returns the match inside 1st parenthesis

Batmobile
Batmobile
mobile


## Optional Matching

- `?` - matches zero or one occurrence. Here it denotes 0 or 1 occurrence of `wo`.
- `*` - matches zero or more occurrences.

In [48]:
regex = re.compile("Bat(wo)?man")

mo1 = regex.search("Batman is cool")
print(mo1.group())

mo2 = regex.search("Batwoman is cool")
print(mo2.group())

mo3 = regex.search("Batwowoman is cool")
print(mo3)

Batman
Batwoman
None


In [51]:
# Using * - finds 0 or more occurrences

regex = re.compile("Bat(wo)*man")

mo1 = regex.search("Batman is cool")
print(mo1.group())

mo2 = regex.search("Batwoman is cool")
print(mo2.group())

mo3 = regex.search("Batwowowowoman is cool")
print(mo3.group())

Batman
Batwoman
Batwowowowoman
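
The cells above print only the full match. As a rough sketch of what the `(wo)` group itself holds under `?` and `*`, the snippet below inspects `group(1)`. The variable names are illustrative.

```python
import re

optional = re.compile(r'Bat(wo)?man')
repeated = re.compile(r'Bat(wo)*man')

# With ?, a group that did not take part in the match is reported as None.
absent = optional.search('Batman').group(1)
print(absent)  # None

# With *, a repeated group keeps only its last repetition,
# while group() still returns the whole match.
mo = repeated.search('Batwowowoman')
print(mo.group())   # Batwowowoman
print(mo.group(1))  # wo
```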


## Specific occurrences or at least 1 occurrence

- `+` - One or more.
- `(Ha){3}` - Exactly 3 occurrences
- `(Ha){3,5}` - Min 3 and a max of 5
- `(Ha){3,}` - Min 3 with no upper limit
- `(Ha){,5}` - Min 0 and a max of 5

In [54]:
batRegex = re.compile(r'Bat(wo)+man')  # at least 1 wo
batRegex = re.compile(r'Bat(wo){3}man')  # specifically 3 wo.
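
The two patterns above are compiled but never searched. A short sketch of how each behaves, using separate (illustrative) variable names so that neither pattern overwrites the other:

```python
import re

at_least_one = re.compile(r'Bat(wo)+man')
exactly_three = re.compile(r'Bat(wo){3}man')

plus_none = at_least_one.search('Batman')                   # + needs at least one "wo"
plus_two = at_least_one.search('Batwowoman').group()
three_short = exactly_three.search('Batwowoman')            # only two "wo"
three_exact = exactly_three.search('Batwowowoman').group()
three_long = exactly_three.search('Batwowowowoman')         # four "wo" is too many

print(plus_none)    # None
print(plus_two)     # Batwowoman
print(three_short)  # None
print(three_exact)  # Batwowowoman
print(three_long)   # None
```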

In [60]:
regex = re.compile('(Ha){3}')  # matches exactly three consecutive "Ha", i.e. HaHaHa

mo = regex.search("HaHaHaHa")
print(mo.group())

mo = regex.search("HaHa")
print(mo is None)  # identity check is the idiomatic test for None

HaHaHa
True
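
The range forms listed at the top of this section can be sketched the same way. These quantifiers are greedy, so each one takes as many repetitions as its upper bound and the input allow. The sample strings and variable names are illustrative.

```python
import re

text = 'HaHaHaHaHaHa'  # six "Ha"

three_to_five = re.compile(r'(Ha){3,5}').search(text).group()
three_or_more = re.compile(r'(Ha){3,}').search(text).group()
up_to_five = re.compile(r'(Ha){,5}').search(text).group()

print(three_to_five)  # HaHaHaHaHa   (stops at the maximum of 5)
print(three_or_more)  # HaHaHaHaHaHa (takes all 6)
print(up_to_five)     # HaHaHaHaHa   (stops at the maximum of 5)

# With a minimum of 0, the pattern also matches the empty string.
empty = re.compile(r'(Ha){,5}').search('xyz').group()
print(repr(empty))    # ''
```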
