# Lesson 17: Regular Expressions
Regular expressions (regex) let us find general patterns in text data, such as:
* email: `user@site.com`
* phone number: `+20 1012345678`
* URL: `www.site.com`

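Example: find phone numbers written as `(555) 555-5555`.

Pattern (the parentheses are escaped because an unescaped `(` or `)` creates a group instead of matching the literal character):
* `\(\d\d\d\) \d\d\d-\d\d\d\d`
* Using quantifiers: `\(\d{3}\) \d{3}-\d{4}`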
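
As a quick check of the phone-number pattern above, here is a minimal sketch. The `sample` string is a made-up example, not part of the lesson data, and the parentheses are escaped so that they match literal characters.

```python
import re

# Hypothetical sample text that uses the "(555) 555-5555" format.
sample = "Call (555) 555-5555 before noon"

# \( and \) match literal parentheses; {3} and {4} repeat \d a fixed number of times.
phone_pattern = r"\(\d{3}\) \d{3}-\d{4}"

found = re.search(phone_pattern, sample)
print(found.group())
print(found.span())
```

Without the backslashes, `(\d{3})` would be read as a group and would match `555` with no surrounding parentheses.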

In [1]:
text = "The agent's phone number is 123 555-4567"

### Imports
Regular expression support comes from the `re` module in the standard library.

In [5]:
import re

Simple text search with the `in` operator (no regex needed):

In [3]:
"phone" in text

True

## Using regular expressions for searching

In [6]:
pattern = "phone"
re.search(pattern,text)

<re.Match object; span=(12, 17), match='phone'>

The span holds the start and end indices of the match, so slicing the text with it returns the matched substring:

In [7]:
text[12:17]

'phone'

When the pattern does not occur in the text, `re.search` returns `None`, so the cell shows no output:

In [8]:
pattern = "not existing"
re.search(pattern, text)
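
Because a failed search returns `None`, calling `.group()` on the result directly would raise an `AttributeError`. One possible way to guard against that is sketched below; `describe_match` is a hypothetical helper name used only for illustration.

```python
import re

text = "The agent's phone number is 123 555-4567"

def describe_match(pattern, text):
    """Return a short description of the first match, or a fallback message."""
    match = re.search(pattern, text)
    if match is None:
        # re.search found nothing, so there is no match object to inspect.
        return "no match"
    return f"{match.group()!r} at {match.span()}"

found_message = describe_match("phone", text)
missing_message = describe_match("not existing", text)
print(found_message)
print(missing_message)
```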

Searching for a pattern that occurs more than once (`re.search` returns only the first match):

In [10]:
text = "My phone once, my phone again"
pattern = "phone"
re.search(pattern,text)

<re.Match object; span=(3, 8), match='phone'>

Get all matches with `re.findall` and print how many there are:

In [15]:
result = re.findall(pattern,text)
print(f"{len(result)} matches: {result}")

2 matches: ['phone', 'phone']


Iterate over all matches with `re.finditer` to get each match's text and span:

In [14]:
for match in re.finditer(pattern, text):
    print(match.group(), match.span())

phone (3, 8)
phone (18, 23)
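
Each object yielded by `re.finditer` is a full match object, so its `start()` and `end()` methods can be used to look at the surrounding text. A small sketch, reusing the same sentence:

```python
import re

text = "My phone once, my phone again"

# Collect the start index of every occurrence.
starts = [m.start() for m in re.finditer("phone", text)]
print(starts)

# For each occurrence, take the first word that follows the match.
following_words = [text[m.end():].split()[0] for m in re.finditer("phone", text)]
print(following_words)
```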


## Regex Special Sequences

| Sequence | Description | Example Pattern | Example Match |
| - | - | - | - |
| `\d` | Digit | `file_\d\d` | file_25 |
| `\D` | Non-digit | `\D\D\D` | Abc |
| `\s` | Whitespace | `a\sb\sc` | a b c |
| `\S` | Non-whitespace | `\S\S\S\S` | Yoyo |
| `\w` | Word character (letter, digit, or underscore) | `\w-\w\w\w` | A-b_1 |
| `\W` | Non-word character | `\W\W\W\W\W` | *-+=) |
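
A few rows of the table can be tried out directly. The snippet below is only an illustrative sketch; the `sample` string is made up to contain the example matches from the table.

```python
import re

# Hypothetical sample text containing two of the table's example matches.
sample = "file_25 and a b c"

digit_match = re.search(r"file_\d\d", sample).group()   # \d matches one digit
space_match = re.search(r"a\sb\sc", sample).group()     # \s matches one whitespace character

# \D keeps everything that is not a digit.
non_digits = re.findall(r"\D", "file_25")

# The underscore counts as a word character, so \W only finds the hyphen here.
non_word = re.findall(r"\W", "A-b_1")

print(digit_match, space_match, non_digits, non_word)
```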

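* Raw string: prefix the pattern with `r` (for example `r"\d\d"`) so that Python leaves the backslashes untouched and passes them to the regex engine as written.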
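
To see why the `r` prefix matters, the sketch below compares a normal string with a raw string. It uses `\b`, the regex word-boundary sequence, which this lesson has not introduced; it is chosen here only because Python's own string escapes also define `\b` (as a backspace character).

```python
import re

plain = "\b"   # normal string: a single backspace character
raw = r"\b"    # raw string: backslash + "b", which the regex engine reads as a word boundary
print(len(plain), len(raw))

text = "My phone number is 01234567891"

with_raw = re.search(r"\bphone\b", text)      # finds the word "phone"
without_raw = re.search("\bphone\b", text)    # looks for backspace characters around "phone"
print(with_raw)
print(without_raw)
```

The second search returns `None` because the text contains no backspace characters.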

References:
* [Regex Special Sequences](https://docs.python.org/3/library/re.html#re-special-sequences)
* [Python RegEx Special Sequences](https://www.w3schools.com/python/gloss_python_regex_sequences.asp)
* [Python Regex Special Sequences and Character classes](https://pynative.com/python-regex-special-sequences-and-character-classes)

Search for any phone number (11 digits):

In [17]:
text = "My phone number is 01234567891"
result = re.search(r"\d\d\d\d\d\d\d\d\d\d\d",text)
print(result)

<re.Match object; span=(19, 30), match='01234567891'>
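
Writing `\d` eleven times is easy to miscount. Using the `{n}` quantifier from the start of the lesson, the same search could be sketched as:

```python
import re

text = "My phone number is 01234567891"

long_form = re.search(r"\d\d\d\d\d\d\d\d\d\d\d", text)
short_form = re.search(r"\d{11}", text)   # exactly 11 digits in a row

# Both patterns find the same substring at the same position.
print(short_form.group(), short_form.span())
print(long_form.span() == short_form.span())
```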


## Quantifiers

| Quantifier | Description | Example Pattern | Example Match |
| - | - | - | - |
| `+` | One or more times | `\d+ years` | 125 years |
| `*` | Zero or more times | `X*L` | L |
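
The two quantifiers in the table can be tried as follows. This is a minimal sketch, and both sample strings are made up for illustration.

```python
import re

# "+" needs at least one digit before " years".
years = re.search(r"\d+ years", "She is 125 years old").group()

# "*" also accepts zero X characters, so a bare "L" matches too.
sizes = re.findall(r"X*L", "L XL XXXL")

print(years)
print(sizes)
```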
