# Regular Expressions (regex)

## Cheatsheet

Work through the examples in the cheatsheet (https://github.com/ziishaned/learn-regex) and try them out on https://www.regexr.com or https://regex101.com/.

### Main functions

There are three main regex functions: `match`, `search` and `findall` (which returns all matches).

The difference between `match` and `search` is that `match` only matches at the beginning of the string, while `search` finds a match anywhere in the string.

Documentation: https://docs.python.org/3/library/re.html#module-contents

In [3]:
import re
print(re.match("c", "abcdef"))   # None: "c" is not at the start
print(re.search("c", "abcdef"))  # matches "c"

None
<re.Match object; span=(2, 3), match='c'>
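
Both functions return `None` when nothing matches and a `Match` object otherwise, so the result is usually checked before it is used. A minimal sketch of reading the matched text and its position; the variable names `m` and `no_match` are only illustrative:

```python
import re

m = re.search("c", "abcdef")
if m is not None:          # check before using the result
    print(m.group())       # the matched text
    print(m.span())        # (start, end) positions of the match

no_match = re.match("c", "abcdef")  # "c" is not at the start of the string
print(no_match is None)
```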


In [4]:
text = "He was carefully disguised but captured quickly by police."
re.findall(r"\w+ly", text)

['carefully', 'quickly']

### Compile

A regex can be compiled into a pattern object. This saves time when the same regex is used many times, for example inside a loop.


```python
re.compile(pattern, flags = 0)
```
`re.compile` turns a regular expression pattern into a regular expression object, which can then be used for matching through its `match()`, `search()` and other methods.

The sequence:
```python
prog = re.compile(pattern)
result = prog.match(string)
```

is equivalent to

```python
result = re.match(pattern, string)
```

The `findall()` function returns the non-overlapping matches of pattern in string as a list of strings (or list of tuples if there are multiple matching groups).
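
As a rough illustration of both points, the sketch below reuses one compiled pattern on several strings and then shows how the shape of the `findall()` result depends on the number of groups. The names `adverb` and `sentences`, and the second sentence, are made up for this example:

```python
import re

# Compile once, then reuse the pattern object on several strings.
adverb = re.compile(r"\w+ly")
sentences = [
    "He was carefully disguised but captured quickly by police.",
    "She slowly opened the door.",
]
adverbs = [adverb.findall(s) for s in sentences]
print(adverbs)

# No group: list of full matches.
no_group = re.findall(r"\w+ \d{3}", "Don 337, Mike 335")
# One group: list of strings holding only that group's text.
one_group = re.findall(r"(\w+) \d{3}", "Don 337, Mike 335")
# Two or more groups: list of tuples, one tuple per match.
two_groups = re.findall(r"(\w+) (\d{3})", "Don 337, Mike 335")
print(no_group, one_group, two_groups)
```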


### Flags

Python regex flags:

    re.I (re.IGNORECASE): ignore case.
    re.M (re.MULTILINE): make ^ and $ match at the start and end of each line.
    re.S (re.DOTALL): make '.' match a newline too.

See [http://xahlee.info/python/python_regex_flags.html](http://xahlee.info/python/python_regex_flags.html) for the different flags you can use.
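
A small sketch of `re.IGNORECASE`, and of combining two flags with `|` (bitwise OR). The sample text and variable names are invented for this example:

```python
import re

sample = "Regex\nregex is fun\nREGEX"

# Without flags the match is case-sensitive.
exact = re.findall(r"regex", sample)
# re.IGNORECASE matches any mix of upper and lower case.
any_case = re.findall(r"regex", sample, flags=re.IGNORECASE)
# Combined flags: ignore case AND let ^ and $ work per line.
whole_lines = re.findall(r"^regex$", sample, flags=re.I | re.M)

print(exact)
print(any_case)
print(whole_lines)
```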

In [5]:
text = """
I
am
having
a
blast
today"""
re.findall(r"^\w", text, flags = re.MULTILINE)

['I', 'a', 'h', 'a', 'b', 't']

In [6]:
# '.' doesn't match a newline by default, so there is no result
re.findall(r"I.*?blast", text)

[]

In [7]:
# with re.DOTALL, '.' matches any character, including a newline
re.findall(r"I.*?blast", text, flags = re.DOTALL)

['I\nam\nhaving\na\nblast']

### Use raw string ('r')

In [8]:
print(r"With r: \b stays a backslash and b, and \t stays a backslash and t")
print("Without r: \b becomes a backspace, and \t becomes a tab")  # not what a regex needs

With r: \b stays a backslash and b, and \t stays a backslash and t
Without r:  becomes a backspace, and 	 becomes a tab
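
This matters for regex because `\b` means "word boundary" to the regex engine, but a normal Python string converts `"\b"` to a backspace character before the engine ever sees it. A short sketch of the difference, with sample text and variable names made up for this example:

```python
import re

sample = "a cat sat; concatenate"

# Raw string: the regex engine receives \b and treats it as a word boundary.
raw_hits = re.findall(r"\bcat\b", sample)
# Normal string: "\b" is already a backspace character, so the pattern
# looks for backspace + "cat" + backspace, which is not in the text.
plain_hits = re.findall("\bcat\b", sample)

print(raw_hits)
print(plain_hits)
print(len(r"\b"), len("\b"))  # two characters vs. one character
```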


## Capturing groups (retrieving something)

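With `findall`, a pattern that contains two or more capturing groups returns a list of tuples, one tuple per match. A pattern with a single group returns a list of strings.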

In [9]:
# without capturing groups
result = re.findall(r"\w+\s\d{3}", "The offices are as follows. Don 337, and Mike 335. Taylor 226.")
print(type(result))
print(result)

<class 'list'>
['Don 337', 'Mike 335', 'Taylor 226']


In [10]:
# anything within parentheses () is 'captured'; this pattern has 2 capturing groups
result = re.findall(r"(\w+)\s(\d{3})", "The offices are as follows. Don 337, and Mike 335. Taylor 226.")
print(result)

[('Don', '337'), ('Mike', '335'), ('Taylor', '226')]


In [11]:
name, office = result[0]
print("name:", name, ", office:", office)

name: Don , office: 337


In [12]:
# capturing groups within a capturing group
result = re.findall(r"((\w+)\s(\d{3}))", "The offices are as follows. Don 337, and Mike 335. Taylor 226.")
print(result)

[('Don 337', 'Don', '337'), ('Mike 335', 'Mike', '335'), ('Taylor 226', 'Taylor', '226')]


In [13]:
for r in result:
    print("Match: '{}', with person: {} and room number: {}".format(r[0], r[1], r[2]))

Match: 'Don 337', with person: Don and room number: 337
Match: 'Mike 335', with person: Mike and room number: 335
Match: 'Taylor 226', with person: Taylor and room number: 226
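
An alternative to wrapping everything in an extra outer group is `re.finditer`, which yields `Match` objects: `group(0)` is the whole match and `group(1)`, `group(2)` are the captured groups. This is only a sketch of that approach; the variable names `rows` and `first` are illustrative:

```python
import re

text = "The offices are as follows. Don 337, and Mike 335. Taylor 226."

rows = []
for m in re.finditer(r"(\w+)\s(\d{3})", text):
    # group(0): whole match, group(1): name, group(2): room number
    rows.append((m.group(0), m.group(1), m.group(2)))
    print("Match: '{}', with person: {} and room number: {}".format(*rows[-1]))

# For the first match only, search() plus groups() gives the captured parts.
first = re.search(r"(\w+)\s(\d{3})", text)
print(first.groups())
```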
