# Lesson 18: Regular Expressions - Part 2
![Regular Expressions](https://files.realpython.com/media/Regular-Expressions-Regexes-in-Python-Part-1_Watermarked.0423050c5371.jpg)

Regular expressions (regex) let us find general patterns in text data, such as:
* email: `user@site.com`
* phone number: `+20 1012345678`
* URL: `www.site.com`

Regular expressions are used for:
* **Data Mining**: Quickly extract emails, phone numbers, URLs, etc. from large text blocks.
* **Validation**: Validate user inputs like email addresses, passwords, dates, etc.
* **Text Processing**: Replace or reformat strings to match required formats (e.g., date reformatting).

### Imports
`re` module

In [2]:
import re

## Regex Special Sequences

| Sequence | Description | Example Pattern | Example Match |
| - | - | - | - |
| `\b` | Word boundary | `r'\bat\b'` |  Matches: 'at', 'at.', '(at)', and 'as at ay'<br/>Does not match: 'attempt' or 'atlas' |
| `\d` | Digit | `file_\d\d` | file_25 |
| `\D` | Non-digit | `\D\D\D` | Abc |
| `\s` | Whitespace | `a\sb\sc` | a b c |
| `\S` | Non-whitespace | `\S\S\S\S` | Yoyo |
| `\w` | Alphanumeric (letter, number or underscore) | `\w-\w\w\w` | A-b_1 |
| `\W` | Non-alphanumeric | `\W\W\W\W\W` | *-+=) |

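* Raw string: prefix patterns with `r` (for example `r"\bat\b"`) so that Python passes backslashes to the regex engine unchanged. Without the prefix, `"\b"` is a backspace character, not a word boundary.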
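
To see why the `r` prefix matters, here is a minimal sketch. The sample sentence and the variable names are made up for illustration.

```python
import re

sample = "look at that cat"

# In a normal string, "\b" is one backspace character (ASCII 8),
# so this pattern looks for backspace + "at" + backspace.
plain = re.findall("\bat\b", sample)

# In a raw string, r"\b" stays as backslash + b,
# which the regex engine reads as a word boundary.
raw = re.findall(r"\bat\b", sample)

print(len("\b"), len(r"\b"))  # 1 2
print(plain)                  # []
print(raw)                    # ['at']
```

The words `that` and `cat` also contain `at`, but there is no word boundary before the `a`, so only the standalone word is returned.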

References:
* [Regex Special Sequences](https://docs.python.org/3/library/re.html#re-special-sequences)
* [Python RegEx Special Sequences](https://www.w3schools.com/python/gloss_python_regex_sequences.asp)
* [Python Regex Special Sequences and Character classes](https://pynative.com/python-regex-special-sequences-and-character-classes)

## Quantifiers (Metacharacters)
Quantifiers specify how many times a pattern is repeated.

| Quantifier | Description | Example Pattern | Example Match |
| - | - | - | - |
| `+` | One or more times | `\d+ years` | 3 years<br/>12 years<br/>151 years<br/> |
| `*` | Zero or more times | `X*L` | L<br/>XL<br/>XXL |
| `?` | Zero or one time | `plurals?` | plural<br/>plurals |
| `{n}` | Exactly `n` times | `\w{3}` | ABC |
| `{min,max}` | `min` to `max` times | `\w{2,3}` | USA |
| `{min,}` | `min` or more times | `\w{3,}` | Ali |
| `{,max}` | Up to `max` times | `\w{,8}` | Password |
| `[]` | Set of characters (a metacharacter, not a quantifier; see Sets below) | `[aeiou]\w+` | umbrella |
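
The quantifiers in the table can be tried out with a short sketch like the one below. The sample strings are invented for illustration, and `\b` is added to some patterns so that only whole words are returned.

```python
import re

# "+" : one or more digits before " years"
years = re.findall(r"\d+ years", "He is 3 years old, she is 12 years old")

# "*" : zero or more "X" before "L"
sizes = re.findall(r"\bX*L\b", "Sizes: L XL XXL M")

# "?" : the trailing "s" is optional
plurals = re.findall(r"\bplurals?\b", "plural and plurals")

# "{min,max}" : whole words of 2 to 3 characters
short_words = re.findall(r"\b\w{2,3}\b", "I am in the USA now")

print(years)        # ['3 years', '12 years']
print(sizes)        # ['L', 'XL', 'XXL']
print(plurals)      # ['plural', 'plurals']
print(short_words)  # ['am', 'in', 'the', 'USA', 'now']
```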

References:
* [Python RegEx](https://www.w3schools.com/python/python_regex.asp)

## [Special Characters](https://docs.python.org/3/library/re.html#regular-expression-syntax)
| Special Character | Meaning |
| - | - |
| `.` | Wildcard character, matches any character except a newline |
| `^` (caret) | Matches the start of the string, and in MULTILINE mode also matches immediately after each newline. |
| `$` | Matches the end of the string or just before a newline at the end of the string, and in MULTILINE mode also matches before each newline. |


## [Flags](https://docs.python.org/3/library/re.html#flags)
| Flag | Meaning |
| - | - |
| `re.I`<br/>`re.IGNORECASE` | Case-insensitive matching |
| `re.M`<br/>`re.MULTILINE` | Makes `^` match at the beginning of the string and at the beginning of each line, and makes `$` match at the end of the string and at the end of each line. |
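
A small sketch of how these flags change matching. The three-line `sample` string is an assumption made up for this example; flags can be combined with `|`.

```python
import re

sample = "first line\nSecond line\nTHIRD line"

# Without flags, ^ and $ only anchor to the whole string.
start_default = re.findall(r"^\w+", sample)
end_default = re.findall(r"line$", sample)

# With re.MULTILINE, ^ and $ also anchor to each line.
start_multi = re.findall(r"^\w+", sample, flags=re.MULTILINE)
end_multi = re.findall(r"line$", sample, flags=re.MULTILINE)

# re.IGNORECASE lets a lowercase pattern match uppercase text.
ignore_case = re.findall(r"third", sample, flags=re.IGNORECASE)

# Flags combined with | : case-insensitive match at the start of any line.
combined = re.findall(r"^third", sample, flags=re.IGNORECASE | re.MULTILINE)

print(start_default)  # ['first']
print(start_multi)    # ['first', 'Second', 'THIRD']
print(end_default)    # ['line']
print(end_multi)      # ['line', 'line', 'line']
print(ignore_case)    # ['THIRD']
print(combined)       # ['THIRD']
```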

In [2]:
text = "11 is a number, 22 is another number. Total is 33"

## Examples
Find any number in the text:

In [3]:
re.findall(r"\d+",text)

['11', '22', '33']

Find any number at the beginning of text:

In [8]:
re.findall(r"^\d+",text)

['11']

Find any number at the end of text:

In [9]:
re.findall(r"\d+$",text)

['33']

## Sets
| Set | Description |
| - | - |
| `[aeiou]` | Any vowel character |
| `[0-5][0-9]` | Any two-digit number from `00` to `59` |

Example: Extract hour, minutes and seconds

In [3]:
text = "2 days ago, it was 08:12:00"
re.findall(r"[0-5][0-9]:[0-5][0-9]:[0-5][0-9]",text)

['08:12:00']
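
Note that `[0-5][0-9]` also accepts hour values such as `45`. If only 24-hour clock times should match, one possible tightening (a sketch, not part of the original lesson; the `sample` string is invented) is to restrict the hour part to `00` through `23`:

```python
import re

sample = "Valid: 08:12:00 and 23:59:59. Invalid: 45:10:00"

# Original pattern: every field is 00-59, so "45:10:00" is accepted.
loose = re.findall(r"[0-5][0-9]:[0-5][0-9]:[0-5][0-9]", sample)

# Stricter sketch: hours are 00-19 or 20-23, minutes and seconds are 00-59.
# (?:...) groups the alternatives without capturing them.
strict = re.findall(r"\b(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\b", sample)

print(loose)   # ['08:12:00', '23:59:59', '45:10:00']
print(strict)  # ['08:12:00', '23:59:59']
```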

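Find all words that start with a vowel character (`aeiou`). The set is lowercase only and `\w+` requires at least one more character, so capitalized words such as `Online` and the one-letter word `a` are not matched: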

In [4]:
text = """Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document.
To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other. For example, you can add a matching cover page, header, and sidebar. Click Insert and then choose the elements you want from the different galleries.
Themes and styles also help keep your document coordinated. When you click Design and choose a new Theme, the pictures, charts, and SmartArt graphics change to match your new theme. When you apply styles, your headings change to match the new theme.
Save time in Word with new buttons that show up where you need them. To change the way a picture fits in your document, click it and a button for layout options appears next to it. When you work on a table, click where you want to add a row or a column, and then click the plus sign.
Reading is easier, too, in the new Reading view. You can collapse parts of the document and focus on the text you want. If you need to stop reading before you reach the end, Word remembers where you left off - even on another device."""

In [14]:
re.findall(r"\b[aeiou]\w+",text)

['in',
 'embed',
 'add',
 'also',
 'online',
 'and',
 'each',
 'other',
 'example',
 'add',
 'and',
 'and',
 'elements',
 'and',
 'also',
 'and',
 'and',
 'apply',
 'in',
 'up',
 'in',
 'it',
 'and',
 'options',
 'appears',
 'it',
 'on',
 'add',
 'or',
 'and',
 'is',
 'easier',
 'in',
 'of',
 'and',
 'on',
 'end',
 'off',
 'even',
 'on',
 'another']
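
To also catch capitalized words and the one-letter word `a`, one option is to switch `\w+` to `\w*` and add `re.IGNORECASE`. The sketch below uses a short invented sentence rather than the full paragraph above.

```python
import re

sample = "An apple a day; Online orders are up."

# Lowercase set and \w+ : skips "An", "a", and "Online".
lowercase_only = re.findall(r"\b[aeiou]\w+", sample)

# \w* allows one-letter words; IGNORECASE allows uppercase vowels.
any_case = re.findall(r"\b[aeiou]\w*", sample, flags=re.IGNORECASE)

print(lowercase_only)  # ['apple', 'orders', 'are', 'up']
print(any_case)        # ['An', 'apple', 'a', 'Online', 'orders', 'are', 'up']
```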

Find all hexadecimal numbers:

In [4]:
text = """While debugging the firmware, Ahmed noticed that the memory dump contained
suspicious values like DEADBEEF and CAFEBABE, which are often used as placeholders
or markers in low-level programming.
The configuration file also referenced color codes such as #FF5733 for alert states
and #00FF00 for success. During the checksum validation, he compared the expected
hash a3f9c1d2e4b6 with the actual output b7e2a9c4d1f0, revealing a subtle mismatch
caused by a corrupted byte at address 7F3A. These hexadecimal clues helped him trace
the issue back to a faulty module in the bootloader sequence."""

In [5]:
re.findall(r"\b[0-9a-fA-F]+\b",text)

['DEADBEEF',
 'CAFEBABE',
 'FF5733',
 '00FF00',
 'a3f9c1d2e4b6',
 'b7e2a9c4d1f0',
 'a',
 'a',
 '7F3A',
 'a']
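
The result includes the English word `a`, because `a` is also a valid hexadecimal digit. One way to reduce such false positives is to require a minimum length. The threshold of 4 characters below is an assumption chosen for this example, not a general rule, and the `sample` sentence is invented.

```python
import re

sample = "Markers like DEADBEEF, a color #FF5733, and a byte at address 7F3A"

# Original pattern: any run of hex digits, including the word "a".
loose = re.findall(r"\b[0-9a-fA-F]+\b", sample)

# Sketch: require at least 4 hex digits to skip short English words.
at_least_four = re.findall(r"\b[0-9a-fA-F]{4,}\b", sample)

print(loose)          # ['DEADBEEF', 'a', 'FF5733', 'a', '7F3A']
print(at_least_four)  # ['DEADBEEF', 'FF5733', '7F3A']
```

A length threshold still accepts ordinary words made only of the letters a to f (for example `cafe`), so the right filter depends on the data.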

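Find all words that do NOT start with a vowel character (`aeiou`). Because the set is lowercase only, words that start with an uppercase vowel (`Online`, `Insert`) are still matched: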

In [6]:
text = """Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document.
To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other. For example, you can add a matching cover page, header, and sidebar. Click Insert and then choose the elements you want from the different galleries.
Themes and styles also help keep your document coordinated. When you click Design and choose a new Theme, the pictures, charts, and SmartArt graphics change to match your new theme. When you apply styles, your headings change to match the new theme.
Save time in Word with new buttons that show up where you need them. To change the way a picture fits in your document, click it and a button for layout options appears next to it. When you work on a table, click where you want to add a row or a column, and then click the plus sign.
Reading is easier, too, in the new Reading view. You can collapse parts of the document and focus on the text you want. If you need to stop reading before you reach the end, Word remembers where you left off - even on another device."""

In [7]:
re.findall(r"\b[^aeiou\s]\w+",text)

['Video',
 'provides',
 'powerful',
 'way',
 'to',
 'help',
 'you',
 'prove',
 'your',
 'point',
 'When',
 'you',
 'click',
 'Online',
 'Video',
 'you',
 'can',
 'paste',
 'the',
 'code',
 'for',
 'the',
 'video',
 'you',
 'want',
 'to',
 'You',
 'can',
 'type',
 'keyword',
 'to',
 'search',
 'for',
 'the',
 'video',
 'that',
 'best',
 'fits',
 'your',
 'document',
 'To',
 'make',
 'your',
 'document',
 'look',
 'professionally',
 'produced',
 'Word',
 'provides',
 'header',
 'footer',
 'cover',
 'page',
 'text',
 'box',
 'designs',
 'that',
 'complement',
 'For',
 'you',
 'can',
 'matching',
 'cover',
 'page',
 'header',
 'sidebar',
 'Click',
 'Insert',
 'then',
 'choose',
 'the',
 'you',
 'want',
 'from',
 'the',
 'different',
 'galleries',
 'Themes',
 'styles',
 'help',
 'keep',
 'your',
 'document',
 'coordinated',
 'When',
 'you',
 'click',
 'Design',
 'choose',
 'new',
 'Theme',
 'the',
 'pictures',
 'charts',
 'SmartArt',
 'graphics',
 'change',
 'to',
 'match',
 'your',
 'new',

## `re.split()`
Returns a list where the string has been split at each match.

Example: Split by all punctuation marks.

In [5]:
text = """Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document.
To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other. For example, you can add a matching cover page, header, and sidebar. Click Insert and then choose the elements you want from the different galleries.
Themes and styles also help keep your document coordinated. When you click Design and choose a new Theme, the pictures, charts, and SmartArt graphics change to match your new theme. When you apply styles, your headings change to match the new theme.
Save time in Word with new buttons that show up where you need them. To change the way a picture fits in your document, click it and a button for layout options appears next to it. When you work on a table, click where you want to add a row or a column, and then click the plus sign.
Reading is easier, too, in the new Reading view. You can collapse parts of the document and focus on the text you want. If you need to stop reading before you reach the end, Word remembers where you left off - even on another device."""
re.split(r"[.,;:?!]",text)

['Video provides a powerful way to help you prove your point',
 ' When you click Online Video',
 ' you can paste in the embed code for the video you want to add',
 ' You can also type a keyword to search online for the video that best fits your document',
 '\nTo make your document look professionally produced',
 ' Word provides header',
 ' footer',
 ' cover page',
 ' and text box designs that complement each other',
 ' For example',
 ' you can add a matching cover page',
 ' header',
 ' and sidebar',
 ' Click Insert and then choose the elements you want from the different galleries',
 '\nThemes and styles also help keep your document coordinated',
 ' When you click Design and choose a new Theme',
 ' the pictures',
 ' charts',
 ' and SmartArt graphics change to match your new theme',
 ' When you apply styles',
 ' your headings change to match the new theme',
 '\nSave time in Word with new buttons that show up where you need them',
 ' To change the way a picture fits in your document',
 '

Normal string splitting:

In [8]:
text.split(".")

['Video provides a powerful way to help you prove your point',
 ' When you click Online Video, you can paste in the embed code for the video you want to add',
 ' You can also type a keyword to search online for the video that best fits your document',
 '\nTo make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other',
 ' For example, you can add a matching cover page, header, and sidebar',
 ' Click Insert and then choose the elements you want from the different galleries',
 '\nThemes and styles also help keep your document coordinated',
 ' When you click Design and choose a new Theme, the pictures, charts, and SmartArt graphics change to match your new theme',
 ' When you apply styles, your headings change to match the new theme',
 '\nSave time in Word with new buttons that show up where you need them',
 ' To change the way a picture fits in your document, click it and a button for layout options appears ne

Regular expressions splitting:

Example: Split on all punctuation marks

In [9]:
re.split(r"[.,;:?!]",text)

['Video provides a powerful way to help you prove your point',
 ' When you click Online Video',
 ' you can paste in the embed code for the video you want to add',
 ' You can also type a keyword to search online for the video that best fits your document',
 '\nTo make your document look professionally produced',
 ' Word provides header',
 ' footer',
 ' cover page',
 ' and text box designs that complement each other',
 ' For example',
 ' you can add a matching cover page',
 ' header',
 ' and sidebar',
 ' Click Insert and then choose the elements you want from the different galleries',
 '\nThemes and styles also help keep your document coordinated',
 ' When you click Design and choose a new Theme',
 ' the pictures',
 ' charts',
 ' and SmartArt graphics change to match your new theme',
 ' When you apply styles',
 ' your headings change to match the new theme',
 '\nSave time in Word with new buttons that show up where you need them',
 ' To change the way a picture fits in your document',
 '
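
The pieces returned by `re.split()` keep their leading spaces, and a trailing delimiter produces an empty string at the end. A minimal sketch of two ways to tidy this up, using a short invented `sample` string:

```python
import re

sample = "Hello, world! How are you? Fine."

# Plain split on punctuation: note the leading spaces and the final ''.
parts = re.split(r"[.,;:?!]", sample)

# Option 1: strip each piece and drop the empty ones afterwards.
cleaned = [p.strip() for p in parts if p.strip()]

# Option 2: let the pattern consume the whitespace after each delimiter.
parts_no_space = re.split(r"[.,;:?!]\s*", sample)

# maxsplit limits the number of splits; the rest of the string stays intact.
first_two = re.split(r"[.,;:?!]\s*", sample, maxsplit=2)

print(parts)           # ['Hello', ' world', ' How are you', ' Fine', '']
print(cleaned)         # ['Hello', 'world', 'How are you', 'Fine']
print(parts_no_space)  # ['Hello', 'world', 'How are you', 'Fine', '']
print(first_two)       # ['Hello', 'world', 'How are you? Fine.']
```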

## Exclusion
Exclude digits in the following text:
> There are 3 numbers 34 inside 5 this sentence

In [None]:
text = "There are 3 numbers 34 inside 5 this sentence"
re.findall(r"[^\d]+",text)

['There are ', ' numbers ', ' inside ', ' this sentence']
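
Inside a set, `^` means "not" only when it is the first character. Anywhere else it is a literal caret. A short sketch with invented sample strings:

```python
import re

# ^ first: match runs of characters that are neither digits nor whitespace.
words = re.findall(r"[^\d\s]+", "Order 66 shipped on day 3")

# ^ not first: the set contains digits and a literal caret.
powers = re.findall(r"[\d^]+", "2^10 = 1024")

print(words)   # ['Order', 'shipped', 'on', 'day']
print(powers)  # ['2^10', '1024']
```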

Exclude punctuation from the following text:
> This is a string! But it has punctuation. How can we remove it?

In [18]:
text = "This is a string! But it has punctuation. How can we remove it?"
" ".join( re.findall(r"[^.!?,; ]+",text) )

'This is a string But it has punctuation How can we remove it'

Find the hyphen-words in this text. But you do not know how long-ish they are:
> Find the hyphen-words in this text. But you do not know how long-ish they are:

In [20]:
text = "Find the hyphen-words in this text. But you do not know how long-ish they are:"
re.findall(r"\w+-\w+",text)

['hyphen-words', 'long-ish']
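
The pattern `\w+-\w+` handles exactly one hyphen. For words with several hyphens, a possible extension (a sketch with an invented `sample` sentence) is to repeat the `-\w+` part inside a non-capturing group:

```python
import re

sample = "A state-of-the-art, well-known tool"

# One hyphen only: a longer compound is cut into separate pieces.
single = re.findall(r"\w+-\w+", sample)

# (?:-\w+)+ repeats "hyphen + word" one or more times.
multi = re.findall(r"\w+(?:-\w+)+", sample)

print(single)  # ['state-of', 'the-art', 'well-known']
print(multi)   # ['state-of-the-art', 'well-known']
```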

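Example: Find all hashtags in a tweet. `\w` matches both uppercase and lowercase letters, so no flag is needed. It does not match `-`, so `#python-rocks` is captured as `#python`: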

In [11]:
text = """Just wrapped up an epic coding session! # this is not a hashtag.
Learned so much about regex, file handling, and Git workflows.
Feeling unstoppable! #CodingLife #python-rocks #GitGud #100_days_of_coding
#regexMaster #dev_diaries #TechTalk #100DaysOfCode #AIandMe
#debuggingChronicles #CodeNewbie #camelCaseIsCool #letsBuild #OpenSourceVibes"""

In [12]:
re.findall(r"#\w+",text)

['#CodingLife',
 '#python',
 '#GitGud',
 '#100_days_of_coding',
 '#regexMaster',
 '#dev_diaries',
 '#TechTalk',
 '#100DaysOfCode',
 '#AIandMe',
 '#debuggingChronicles',
 '#CodeNewbie',
 '#camelCaseIsCool',
 '#letsBuild',
 '#OpenSourceVibes']
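
If hyphenated tags such as `#python-rocks` should be captured whole, the pattern can be extended as sketched below. Whether a hyphen belongs to a hashtag depends on the platform, so treat this as an assumption for illustration; the `sample` string is invented.

```python
import re

sample = "Loving #python-rocks and #100DaysOfCode # not a tag"

# \w does not include "-", so the tag stops at the hyphen.
basic = re.findall(r"#\w+", sample)

# Allow optional "-word" parts after the first word.
with_hyphens = re.findall(r"#\w+(?:-\w+)*", sample)

print(basic)         # ['#python', '#100DaysOfCode']
print(with_hyphens)  # ['#python-rocks', '#100DaysOfCode']
```

In both cases the lone `#` followed by a space is skipped, because at least one word character is required right after it.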

Find all CAPS words in text:

In [13]:
text = """The sun had barely risen when the team gathered at the base camp,
their breath visible in the crisp morning air.
Everyone was READY, determined to conquer the summit before noon.
The leader shouted, "GEAR UP!" and within minutes, backpacks were strapped,
boots tightened, and spirits HIGH. As they ascended, the wind grew stronger,
whispering secrets of the mountain's ancient past.
PLEASE DON'T do this.
"KEEP MOVING!" echoed through the valley, a mantra that pushed them forward.
By midday, they reached the peak, faces flushed with triumph,
shouting "WE MADE IT!" into the vast, echoing silence."""

In [14]:
re.findall(r"\b[A-Z]+\b",text)

['READY',
 'GEAR',
 'UP',
 'HIGH',
 'PLEASE',
 'DON',
 'T',
 'KEEP',
 'MOVING',
 'WE',
 'MADE',
 'IT']
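
The apostrophe in `DON'T` counts as a word boundary, so the word is split into `DON` and `T`. A possible refinement, sketched here on a short invented sentence, allows an optional apostrophe part and requires at least two capital letters so that `I` and `A` are skipped:

```python
import re

sample = "PLEASE DON'T shout, I said. A cat is OK."

# Original pattern: splits at the apostrophe and keeps one-letter words.
basic = re.findall(r"\b[A-Z]+\b", sample)

# Sketch: two or more capitals, optionally followed by 'CAPS.
refined = re.findall(r"\b[A-Z]{2,}(?:'[A-Z]+)?\b", sample)

print(basic)    # ['PLEASE', 'DON', 'T', 'I', 'A', 'OK']
print(refined)  # ['PLEASE', "DON'T", 'OK']
```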

## Using `()` for grouping
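Find words that share the same prefix. When a pattern contains a capturing group, `re.findall()` returns only the text matched by the group, not the whole match: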

In [15]:
text = """Would you like some catfish?
Do you want to take a catnap?
Did you see this caterpillar?"""
re.findall(r"cat(fish|nap|erpillar)",text)

['fish', 'nap', 'erpillar']
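
To get the full words instead of only the group contents, the group can be made non-capturing with `(?:...)`, or the whole match can be read from `re.finditer()`. A minimal sketch using the same sentences:

```python
import re

sample = """Would you like some catfish?
Do you want to take a catnap?
Did you see this caterpillar?"""

# Capturing group: findall returns only what the group matched.
suffixes = re.findall(r"cat(fish|nap|erpillar)", sample)

# Non-capturing group: findall returns the whole match.
full_words = re.findall(r"cat(?:fish|nap|erpillar)", sample)

# finditer gives match objects; group(0) is the whole match, group(1) the group.
pairs = [(m.group(0), m.group(1))
         for m in re.finditer(r"cat(fish|nap|erpillar)", sample)]

print(suffixes)    # ['fish', 'nap', 'erpillar']
print(full_words)  # ['catfish', 'catnap', 'caterpillar']
print(pairs)       # [('catfish', 'fish'), ('catnap', 'nap'), ('caterpillar', 'erpillar')]
```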

Find all lines that start with `The`, `That`, or `This` and end with `Spain` or `Spanish`:
> The rain in Spain.<br/>
> This man is Spanish.<br/>
> Because is was born in Spain.<br/>
> But this is not any Spanish thing.

In [10]:
text = """The rain in Spain.
This man is Spanish.
Because is was born in Spain
But this is not any Spanish thing."""

for match in re.finditer(r"^(The|That|This).*(Spain|Spanish)\.?$", text,re.MULTILINE):
	print(match.group())

The rain in Spain.
This man is Spanish.


Find full phrases in ALL CAPS:

In [16]:
text = """The sun had barely risen when the team gathered at the base camp,
their breath visible in the crisp morning air.
Everyone was READY, determined to conquer the summit before noon.
The leader shouted, "GEAR UP!" and within minutes, backpacks were strapped,
boots tightened, and spirits HIGH. As they ascended, the wind grew stronger,
whispering secrets of the mountain's ancient past.
PLEASE DON'T do this.
"KEEP MOVING!" echoed through the valley, a mantra that pushed them forward.
By midday, they reached the peak, faces flushed with triumph,
shouting "WE MADE IT!" into the vast, echoing silence."""

In [17]:
re.findall(r"\b[A-Z]+[\sA-Z']*[A-Z]+\b",text)

['READY', 'GEAR UP', 'HIGH', "PLEASE DON'T", 'KEEP MOVING', 'WE MADE IT']

## Using named groups
`(?P<name>...)`

Example: Extract year, month, and day

In [18]:
text = """Birth date is 2005-09-05. Today's date is 2025-09-25."""

In [19]:
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

for match in re.finditer(pattern, text):
	print(match.group("year"), match.group("month"), match.group("day"))

2005 09 05
2025 09 25
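
Named groups can also be read all at once with `groupdict()`, and referenced in a `re.sub()` replacement with `\g<name>`. The sketch below reuses the same pattern; the day/month/year output format is just an example choice.

```python
import re

sample = "Birth date is 2005-09-05. Today's date is 2025-09-25."
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# groupdict() returns all named groups of one match as a dictionary.
first = re.search(pattern, sample)
first_parts = first.groupdict()

# \g<name> inserts a named group into the replacement text.
reformatted = re.sub(pattern, r"\g<day>/\g<month>/\g<year>", sample)

print(first_parts)  # {'year': '2005', 'month': '09', 'day': '05'}
print(reformatted)  # Birth date is 05/09/2005. Today's date is 25/09/2025.
```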


## Replacing text
Use `re.sub(pattern, repl, string)`

Remove all words that start with a vowel from the text:

In [20]:
text = """Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document.
To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other. For example, you can add a matching cover page, header, and sidebar. Click Insert and then choose the elements you want from the different galleries.
Themes and styles also help keep your document coordinated. When you click Design and choose a new Theme, the pictures, charts, and SmartArt graphics change to match your new theme. When you apply styles, your headings change to match the new theme.
Save time in Word with new buttons that show up where you need them. To change the way a picture fits in your document, click it and a button for layout options appears next to it. When you work on a table, click where you want to add a row or a column, and then click the plus sign.
Reading is easier, too, in the new Reading view. You can collapse parts of the document and focus on the text you want. If you need to stop reading before you reach the end, Word remembers where you left off - even on another device."""

In [21]:
re.sub(r"\b[aeiou]\w*\b","",text,flags=re.IGNORECASE)

'Video provides  powerful way to help you prove your point. When you click  Video, you can paste  the  code for the video you want to . You can  type  keyword to search  for the video that best fits your document.\nTo make your document look professionally produced, Word provides header, footer, cover page,  text box designs that complement  . For , you can   matching cover page, header,  sidebar. Click   then choose the  you want from the different galleries.\nThemes  styles  help keep your document coordinated. When you click Design  choose  new Theme, the pictures, charts,  SmartArt graphics change to match your new theme. When you  styles, your headings change to match the new theme.\nSave time  Word with new buttons that show  where you need them. To change the way  picture fits  your document, click    button for layout   next to . When you work   table, click where you want to   row   column,  then click the plus sign.\nReading  , too,  the new Reading view. You can collapse par
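
Removing whole words leaves double spaces behind, as the output above shows. The sketch below, on short invented strings, contrasts removing vowel letters with removing vowel-initial words, then collapses the leftover whitespace. It also shows the optional `count` argument of `re.sub()`.

```python
import re

# Remove every vowel letter (not whole words).
no_vowels = re.sub(r"[aeiou]", "", "Regular Expressions", flags=re.IGNORECASE)

# Remove whole words that start with a vowel, then collapse repeated spaces.
sample = "Click Insert and then add a row"
without_words = re.sub(r"\b[aeiou]\w*\b", "", sample, flags=re.IGNORECASE)
tidy = re.sub(r"\s{2,}", " ", without_words).strip()

# count limits how many matches are replaced.
first_two = re.sub(r"[aeiou]", "*", "banana", count=2)

print(no_vowels)  # Rglr xprssns
print(tidy)       # Click then row
print(first_two)  # b*n*na
```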

## Extracting URLs

References:
* [15 Examples for Text Processing using Regex - Example 1: Extracting URLs](https://medium.com/@monicanogueras/15-examples-for-advanced-text-processing-using-regex-48223adc720d)
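
As a starting point, here is a deliberately simple sketch. The pattern is an assumption for illustration, not the one from the referenced article: it treats a URL as `http://` or `https://` followed by non-whitespace characters, then strips trailing punctuation. Real-world URL matching usually needs more care, and the `sample` string is invented.

```python
import re

sample = ("Docs: https://docs.python.org/3/library/re.html, "
          "tester: https://regex101.com (try it).")

# Naive pattern: scheme, then everything up to the next whitespace.
raw_urls = re.findall(r"https?://\S+", sample)

# Trailing punctuation from the sentence is not part of the URL.
urls = [u.rstrip(".,;:!?)") for u in raw_urls]

print(raw_urls)  # ['https://docs.python.org/3/library/re.html,', 'https://regex101.com']
print(urls)      # ['https://docs.python.org/3/library/re.html', 'https://regex101.com']
```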

## Regular Expressions Testing Tools
* [Regular Expressions 101](https://regex101.com)
* [pythex](https://pythex.org)

## Regular Expressions Examples
* [Community Pattern Library](https://regex101.com/library) - Regular Expressions 101