# Lesson 17: Regular Expressions
![Regular Expressions](https://files.realpython.com/media/Regular-Expressions-Regexes-in-Python-Part-1_Watermarked.0423050c5371.jpg)

Regular expressions (regex) let us find general patterns in text data, such as:
* email: `user@site.com`
* phone number: `+20 1012345678`
* URL: `www.site.com`

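Example: find phone numbers written like `(555) 555-5555`.

Pattern:
* `\(\d\d\d\) \d\d\d-\d\d\d\d` (the parentheses are escaped with `\`, because unescaped `(` and `)` create a group instead of matching literal parentheses)
* Using quantifiers: `\(\d{3}\) \d{3}-\d{4}`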
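To match the literal parentheses in `(555) 555-5555`, they have to be escaped with `\`, because unescaped `(` and `)` form a group. A quick sketch (the sample text is made up for illustration) showing the escaped pattern finding both numbers, and the unescaped version finding nothing:

```python
import re

# Escaped parentheses match the literal "(" and ")" characters.
pattern = r"\(\d{3}\) \d{3}-\d{4}"

text = "Call (555) 555-5555 or (555) 123-4567"  # illustrative sample text
phones = re.findall(pattern, text)
print(phones)  # ['(555) 555-5555', '(555) 123-4567']

# Unescaped parentheses create a group, so this pattern expects three digits
# followed directly by a space, which never occurs in "(555) 555-5555".
unescaped = re.search(r"(\d{3}) \d{3}-\d{4}", "(555) 555-5555")
print(unescaped)  # None
```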

In [1]:
text = "The agent's phone number is 123 555-4567"

### Imports
Python's built-in `re` module provides regular expression support:

In [20]:
import re

A simple substring check with the `in` operator (no regular expression needed):

In [3]:
"phone" in text

True

## Using regular expressions for searching

In [6]:
pattern = "phone"
re.search(pattern, text)

<re.Match object; span=(12, 17), match='phone'>

The `span` holds the start and end indices of the match; slicing the text with them returns the matched substring:

In [7]:
text[12:17]

'phone'
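Besides `span()`, a match object offers `start()`, `end()` and `group()`. A minimal sketch on the same sentence:

```python
import re

text = "The agent's phone number is 123 555-4567"
match = re.search("phone", text)

# start() and end() are the two halves of span(); group() is the matched text.
print(match.start(), match.end())  # 12 17
print(match.group())               # phone
print(text[match.start():match.end()] == match.group())  # True
```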

Searching for a pattern that does not exist returns `None`, so the cell shows no output:

In [8]:
pattern = "not existing"
re.search(pattern, text)
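Because a failed search returns `None`, a common idiom is to test the result before using it. A short sketch (the `results` dictionary is only for illustration):

```python
import re

text = "The agent's phone number is 123 555-4567"

results = {}
for pattern in ["phone", "not existing"]:
    match = re.search(pattern, text)
    # A Match object counts as true in an if statement; None counts as false.
    if match:
        results[pattern] = match.span()
        print(f"{pattern!r} found at {match.span()}")
    else:
        results[pattern] = None
        print(f"{pattern!r} not found")
```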

Searching for a pattern that occurs more than once (`re.search` returns only the first match):

In [10]:
text = "My phone once, my phone again"
pattern = "phone"
re.search(pattern, text)

<re.Match object; span=(3, 8), match='phone'>

To get all matches, use `re.findall`, which returns a list of the matched strings; then print how many there are:

In [15]:
result = re.findall(pattern, text)
print(f"{len(result)} matches: {result}")

2 matches: ['phone', 'phone']


To get more information about each match, such as its position, iterate over the match objects returned by `re.finditer`:

In [14]:
for match in re.finditer(pattern, text):
    print(match.group(), match.span())

phone (3, 8)
phone (18, 23)
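The positions from `finditer` are what `findall` cannot give. As one illustration (a sketch, not a standard recipe), the spans can be used to mark every match in the text, working from the last match backwards so earlier indices stay valid:

```python
import re

text = "My phone once, my phone again"

# Each Match object carries its position; collect them as (start, end) pairs.
spans = [m.span() for m in re.finditer("phone", text)]
print(spans)  # [(3, 8), (18, 23)]

# Wrap each match in brackets, starting from the end of the string.
marked = text
for start, end in reversed(spans):
    marked = marked[:start] + "[" + marked[start:end] + "]" + marked[end:]
print(marked)  # My [phone] once, my [phone] again
```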


## Regex Special Sequences

| Sequence | Description | Example Pattern | Example Match |
| - | - | - | - |
| `\d` | Digit | `file_\d\d` | file_25 |
| `\D` | Non-digit | `\D\D\D` | Abc |
| `\s` | Whitespace | `a\sb\sc` | a b c |
| `\S` | Non-whitespace | `\S\S\S\S` | Yoyo |
| `\w` | Word character (letter, digit, or underscore) | `\w-\w\w\w` | A-b_1 |
| `\W` | Non-word character | `\W\W\W\W\W` | *-+=) |
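One way to check the table's examples is `re.fullmatch`, which succeeds only if the pattern matches the entire string. This is a sketch; `fullmatch` is not otherwise used in this lesson:

```python
import re

# (pattern, example match) pairs copied from the table
examples = [
    (r"file_\d\d", "file_25"),
    (r"\D\D\D", "Abc"),
    (r"a\sb\sc", "a b c"),
    (r"\S\S\S\S", "Yoyo"),
    (r"\w-\w\w\w", "A-b_1"),
    (r"\W\W\W\W\W", "*-+=)"),
]

checks = {}
for pattern, sample in examples:
    # fullmatch returns a Match object only if the whole sample matches
    checks[pattern] = bool(re.fullmatch(pattern, sample))
    print(pattern, checks[pattern])
```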

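* Write patterns as raw strings (prefix `r`, e.g. `r"\d"`) so that Python does not interpret backslashes as string escape sequences before the regex engine sees them.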
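Why the `r` prefix matters, sketched with `\b` (a word boundary, not listed in the table): in a normal string literal, Python turns `\b` into a backspace character before the regex engine ever sees it.

```python
import re

# In a normal string "\b" is one backspace character; in a raw string it is
# the two characters "\" and "b", which the regex engine reads as a word boundary.
print(len("\b"), len(r"\b"))  # 1 2

text = "cat concatenate"
plain = re.findall("\bcat\b", text)   # pattern contains backspace characters
raw = re.findall(r"\bcat\b", text)    # pattern contains word boundaries
print(plain)  # []
print(raw)    # ['cat']
```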

References:
* [Regex Special Sequences](https://docs.python.org/3/library/re.html#re-special-sequences)
* [Python RegEx Special Sequences](https://www.w3schools.com/python/gloss_python_regex_sequences.asp)
* [Python Regex Special Sequences and Character classes](https://pynative.com/python-regex-special-sequences-and-character-classes)

Search for an 11-digit phone number:

In [17]:
text = "My phone number is 01234567891"
result = re.search(r"\d\d\d\d\d\d\d\d\d\d\d", text)
print(result)

<re.Match object; span=(19, 30), match='01234567891'>


## Quantifiers
Quantifiers specify how many times a pattern is repeated.

| Quantifier | Description | Example Pattern | Example Match |
| - | - | - | - |
| `+` | One or more times | `\d+ years` | 3 years<br/>12 years<br/>151 years |
| `*` | Zero or more times | `X*L` | L<br/>XL<br/>XXL |
| `?` | Zero or one time | `plurals?` | plural<br/>plurals |
| `{n}` | Exactly `n` times | `\w{3}` | ABC |
| `{min,max}` | `min` to `max` times | `\w{2,3}` | USA |
| `{min,}` | `min` or more times | `\w{3,}` | Ali |
| `{,max}` | Zero to `max` times | `\w{,8}` | Password |
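A sketch that checks every example in the quantifier table with `re.fullmatch`, and then shows that quantifiers are greedy: they take as many repetitions as they are allowed to.

```python
import re

# (pattern, strings from the table that it should match completely)
examples = [
    (r"\d+ years", ["3 years", "12 years", "151 years"]),
    (r"X*L", ["L", "XL", "XXL"]),
    (r"plurals?", ["plural", "plurals"]),
    (r"\w{3}", ["ABC"]),
    (r"\w{2,3}", ["USA"]),
    (r"\w{3,}", ["Ali"]),
    (r"\w{,8}", ["Password"]),
]
all_match = all(re.fullmatch(p, s) for p, samples in examples for s in samples)
print(all_match)  # True

# Greedy: \w{2,3} could match "Eg", but it takes the maximum of 3 characters.
greedy = re.search(r"\w{2,3}", "Egypt").group()
print(greedy)  # Egy
```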

Exercise: repeat the phone number search using a quantifier:

In [20]:
text = "My phone number is 01234567891"
result = re.search(r"\d{11}", text)
print(result)

<re.Match object; span=(19, 30), match='01234567891'>


Exercise: Egyptian car plates

![Egyptian car plates](https://images.akhbarelyom.com/images/images/large/20240720034900406.jpg)

Extract the whole plate number:

In [5]:
text = "349 BDH"
# 1 to 4 digits, a space, then 2 to 4 word characters (the plate letters)
result = re.search(r"(\d{1,4}) (\w{2,4})", text)
result.group()

'349 BDH'

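Each pair of parentheses in the pattern defines a capturing group. `group(n)` returns the text of group `n`, where groups are numbered from 1 by position (`group()` or `group(0)` returns the whole match).

Example: extract the plate numbers and letters separately: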

In [8]:
print(result.group(1))
print(result.group(2))

349
BDH
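`groups()` returns all groups at once as a tuple, and when the pattern contains groups, `re.findall` returns one tuple per match. A sketch using the plate pattern written with `\d{1,4}` (so at least one digit is required); the second plate in the sample text is made up:

```python
import re

# Group 1: plate numbers, group 2: plate letters
plate = r"(\d{1,4}) (\w{2,4})"

result = re.search(plate, "349 BDH")
print(result.group(0))  # 349 BDH  (group 0 is the whole match)
print(result.groups())  # ('349', 'BDH')
numbers, letters = result.groups()

# With groups in the pattern, findall returns a tuple of groups per match.
plates = re.findall(plate, "349 BDH and 1234 AB")
print(plates)  # [('349', 'BDH'), ('1234', 'AB')]
```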


## Or operator (`|`)

In [12]:
text = "I go to school by car"
result = re.search(r"car|taxi", text)
result

<re.Match object; span=(18, 21), match='car'>
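`|` has the lowest precedence of all regex operators, so it splits the whole pattern unless parentheses limit it. A sketch on a made-up sentence:

```python
import re

text = "I go to school by car and come back by taxi"  # illustrative sample

alternatives = re.findall(r"car|taxi", text)
print(alternatives)  # ['car', 'taxi']

# Without parentheses this means "by car" OR "taxi".
no_group = re.findall(r"by car|taxi", text)
print(no_group)  # ['by car', 'taxi']

# Parentheses limit the alternation; findall then returns only the group.
captured = re.findall(r"by (car|taxi)", text)
print(captured)  # ['car', 'taxi']

# finditer still gives access to the whole match.
whole = [m.group() for m in re.finditer(r"by (car|taxi)", text)]
print(whole)  # ['by car', 'by taxi']
```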

## [findall](https://docs.python.org/3/library/re.html#re.findall)
`re.findall` returns all non-overlapping matches as a list of strings:

In [14]:
re.findall(r"at", "The cat in the hat sat there")

['at', 'at', 'at']
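"Non-overlapping" means the scan resumes where the previous match ended. A small sketch:

```python
import re

# After "12" matches, scanning continues at "3", so "23" is never reported.
pairs = re.findall(r"\d\d", "12345")
print(pairs)  # ['12', '34']

repeats = re.findall(r"aa", "aaaa")
print(repeats)  # ['aa', 'aa']
```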

## [Special Characters](https://docs.python.org/3/library/re.html#regular-expression-syntax)
| Special Character | Meaning |
| - | - |
| `.` | Wildcard character, matches any character except a newline |
| `^` (caret) | Matches the start of the string, and in MULTILINE mode also matches immediately after each newline. |
| `$` | Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches immediately before each newline. |
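A sketch of the two anchors on short made-up strings; each comment gives the expected result of `re.search`:

```python
import re

# Each entry: (pattern, text, flags)
cases = [
    (r"^cat", "cat sat", 0),              # True: the string starts with "cat"
    (r"^sat", "cat sat", 0),              # False: "sat" is not at the start
    (r"sat$", "cat sat", 0),              # True: the string ends with "sat"
    (r"sat$", "cat sat\n", 0),            # True: $ also matches before a final newline
    (r"cat$", "cat\nsat", 0),             # False: "cat" ends a line, not the string
    (r"cat$", "cat\nsat", re.MULTILINE),  # True: MULTILINE makes $ match before each newline
]
results = [bool(re.search(p, t, f)) for p, t, f in cases]
print(results)  # [True, False, True, True, False, True]
```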

Find every `at` together with the single character before it, in this text:
> The cat in the hat sat there.

In [16]:
re.findall(r".at","The cat in the hat sat there.")

['cat', 'hat', 'sat']
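Two details of `.` are worth a quick sketch: to match a literal dot it must be escaped as `\.`, and by default it does not match a newline (the `re.DOTALL` flag changes that). The sample strings are made up:

```python
import re

# Unescaped "." matches any character, so "385" matches too.
wildcard = re.findall(r"3.5", "3.5 and 385")
print(wildcard)  # ['3.5', '385']

# "\." matches only a literal dot.
literal = re.findall(r"3\.5", "3.5 and 385")
print(literal)  # ['3.5']

# "." skips the newline, so the second "at" has no valid character before it.
no_newline = re.findall(r".at", "at\nat")
print(no_newline)  # []

# With re.DOTALL, "." matches the newline as well.
with_dotall = re.findall(r".at", "at\nat", re.DOTALL)
print(with_dotall)  # ['\nat']
```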

In [23]:
text = """Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document.
To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other. For example, you can add a matching cover page, header, and sidebar. Click Insert and then choose the elements you want from the different galleries.
Themes and styles also help keep your document coordinated. When you click Design and choose a new Theme, the pictures, charts, and SmartArt graphics change to match your new theme. When you apply styles, your headings change to match the new theme.
Save time in Word with new buttons that show up where you need them. To change the way a picture fits in your document, click it and a button for layout options appears next to it. When you work on a table, click where you want to add a row or a column, and then click the plus sign.
Reading is easier, too, in the new Reading view. You can collapse parts of the document and focus on the text you want. If you need to stop reading before you reach the end, Word remembers where you left off - even on another device."""

In [24]:
re.findall(r".at", text)

['hat', 'hat', 'mat', 'nat', 'mat', 'mat', 'hat']

### Match multiple characters
Match `ing` and the 3 characters before it:

In [25]:
pattern = "...ing"
re.findall(pattern, text)

['tching', 'eading', 'eading', 'eading', 'eading']

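Find all words that contain `ing`, from the start of the word up to the last `ing` (`\w+` matches one or more word characters, so `headings` gives `heading`):

In [27]:
re.findall(r"\w+ing", text)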

['matching', 'heading', 'Reading', 'Reading', 'reading']

## [Flags](https://docs.python.org/3/library/re.html#flags)
| Flag | Meaning |
| - | - |
| `re.I`<br/>`re.IGNORECASE` | Case-insensitive matching |
| `re.M`<br/>`re.MULTILINE` | Makes `^` match at the beginning of the string and at the beginning of each line, and makes `$` match at the end of the string and at the end of each line. |
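A short sketch of `re.IGNORECASE`, and of combining flags with the `|` operator (the sample texts are made up):

```python
import re

text = "Reading is easier. Keep reading."

case_sensitive = re.findall(r"reading", text)
print(case_sensitive)  # ['reading']

ignore_case = re.findall(r"reading", text, re.IGNORECASE)
print(ignore_case)  # ['Reading', 'reading']

# Flags are combined with |: here case-insensitive AND multiline.
lines = "Reading first\nreading second"
line_starts = re.findall(r"^reading", lines, re.I | re.M)
print(line_starts)  # ['Reading', 'reading']
```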

Find the first word of every line that starts with `T` (`re.MULTILINE` makes `^` match at the start of each line):

In [31]:
re.findall(r"^T\w+", text, re.MULTILINE)

['To', 'Themes']

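Find the last word of every line that ends with a full stop (`\.` matches a literal `.`, and `re.MULTILINE` makes `$` match at the end of each line):

In [35]:
re.findall(r"\w+\.$", text, re.MULTILINE)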

['document.', 'galleries.', 'theme.', 'sign.', 'device.']

Find the first word after each full stop (`.`):

In [37]:
text = """Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document.
To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other. For example, you can add a matching cover page, header, and sidebar. Click Insert and then choose the elements you want from the different galleries.
Themes and styles also help keep your document coordinated. When you click Design and choose a new Theme, the pictures, charts, and SmartArt graphics change to match your new theme. When you apply styles, your headings change to match the new theme.
Save time in Word with new buttons that show up where you need them. To change the way a picture fits in your document, click it and a button for layout options appears next to it. When you work on a table, click where you want to add a row or a column, and then click the plus sign.
Reading is easier, too, in the new Reading view.You can collapse parts of the document and focus on the text you want. If you need to stop reading before you reach the end, Word remembers where you left off - even on another device."""

for match in re.finditer(r"\.\s*(\w+)", text):
    print(match.group(1))  # group 1: the word after the full stop

When
You
To
For
Click
Themes
When
When
Save
To
When
Reading
You
If
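The text above deliberately contains `view.You` with no space after the full stop. A small sketch of why the pattern uses `\s*` (zero or more) rather than `\s+` (one or more):

```python
import re

text = "Reading view.You can collapse parts."

# \s* allows zero whitespace characters after the full stop.
star = re.findall(r"\.\s*(\w+)", text)
print(star)  # ['You']

# \s+ requires at least one, so "view.You" is missed.
plus = re.findall(r"\.\s+(\w+)", text)
print(plus)  # []
```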


Find all numbers in a paragraph (note that `\d+` matches runs of digits only, so the decimal `3.5` is returned as `'3'` and `'5'`):

In [36]:
text = """Video provides a powerful way to help you prove your point 3.5 When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document.
To make your document look professionally produced 33, Word provides header 666 , footer, cover page, and text box designs that complement each other. For example, you can add a matching cover page, header, and sidebar. Click Insert and then choose the elements you want from the different galleries.
Themes and styles also help keep your document coordinated 666. When you click Design and choose a new Theme, the pictures, charts, and SmartArt graphics change to match your new theme. When you apply styles, your headings change to match the new theme 9900.
Save time in Word with new buttons that show up where you need them. To change the way a picture fits in your document, click it and a button for layout options appears next to it. When you work on a table, click where you want to add a row or a column, and then click the plus sign.
Reading is easier, too, in the new Reading view.You can collapse parts of the document and focus on the text you want. If you need to stop reading before you reach the end, Word remembers where you left off - even on another device."""
re.findall(r"\d+",text)

['3', '5', '33', '666', '666', '9900']
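If decimals should stay whole, one option is sketched below. It assumes a number is digits optionally followed by a dot and more digits, and it uses the non-capturing group `(?:...)`, which groups the decimal part without changing what `findall` returns. The sample text is a shortened version of the paragraph:

```python
import re

text = ("prove your point 3.5 When you click, produced 33, Word provides "
        "header 666 , coordinated 666. the new theme 9900.")

# \d+ then an optional ".digits" part; a trailing full stop is not included
# because \. must be followed by at least one digit.
numbers = re.findall(r"\d+(?:\.\d+)?", text)
print(numbers)  # ['3.5', '33', '666', '666', '9900']

values = [float(n) for n in numbers]
print(values)  # [3.5, 33.0, 666.0, 666.0, 9900.0]
```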

Find all emails in the following text:

In [None]:
text = """Contact us at contactus@gmail.com.
And my email is: ashhd2000@gmail.com.
Another email is: me@website.com.
False email is: Nuhhqkw(op)@yahoo.com.
Hind@moe.eg
menna@support.moe.gov.eg
Hosam@gmail.org"""

pattern = r"\w+@(\w+\.)+\w+"
for match in re.finditer(pattern, text):
    print(match.group())
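The cell uses `finditer` and `group()` for a reason: because the email pattern contains a capturing group, `re.findall` would return only the group's text (a domain piece such as `gmail.`), not the whole address. A sketch on part of the sample text, with the non-capturing group `(?:...)` as one possible fix:

```python
import re

text = """Contact us at contactus@gmail.com.
False email is: Nuhhqkw(op)@yahoo.com.
menna@support.moe.gov.eg"""

# With a capturing group, findall returns only the group's last repetition.
captured = re.findall(r"\w+@(\w+\.)+\w+", text)
print(captured)

# (?:...) groups without capturing, so findall returns the whole matches.
emails = re.findall(r"\w+@(?:\w+\.)+\w+", text)
print(emails)  # ['contactus@gmail.com', 'menna@support.moe.gov.eg']
```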