# Definition

**A regular expression** is a special sequence of characters that forms a search pattern. It can be used to check whether a string contains a given pattern, to extract every occurrence of that pattern, and much more.

# Applications

Regular expressions are everywhere, from validating email addresses, passwords, and date formats to powering search engines, so they are an essential skill for any developer. Most programming languages provide regex support. In this tutorial, we will use the **re** module in Python.

# Tutorial

In [3]:
import re  # Regular expression support from the Python standard library

## (1) Experiment with patterns

In [6]:
string = "Let's write RegEx!"


In [6]:
PATTERN1 = r"\s+"  # Pattern to find runs of one or more whitespace characters

In [8]:
re.findall(PATTERN1,string)

[' ', ' ']

In [9]:
PATTERN2 = r"\w+"  # Pattern to find words (runs of letters, digits, or underscores)

In [10]:
re.findall(PATTERN2,string)

['Let', 's', 'write', 'RegEx']

In [11]:
PATTERN3 = r"[a-z]"  # Pattern to find all lowercase ASCII letters

In [12]:
re.findall(PATTERN3, string)

['e', 't', 's', 'w', 'r', 'i', 't', 'e', 'e', 'g', 'x']

In [1]:
PATTERN4 = r"\w"  # Pattern to find every word character (letter, digit, or underscore)

In [7]:
re.findall(PATTERN4, string)

['L', 'e', 't', 's', 'w', 'r', 'i', 't', 'e', 'R', 'e', 'g', 'E', 'x']
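
The example string above contains no digits or underscores, so it does not show that `\w` matches more than letters. The following minimal sketch, using a made-up `sample` string that is not part of the original tutorial, compares `\w` with the letters-only class `[a-zA-Z]`:

```python
import re

# Hypothetical sample string, chosen to include an underscore and digits
sample = "user_42 logged in"

# \w matches letters, digits, and the underscore
word_chars = re.findall(r"\w", sample)

# [a-zA-Z] matches ASCII letters only
letters = re.findall(r"[a-zA-Z]", sample)

print(word_chars)
print(letters)
```

If only letters are wanted, an explicit character class such as `[a-zA-Z]` is usually the safer choice.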

## (2) Splitting and findall

In [8]:
my_string1="Let's write RegEx!  Won't that be fun?  I sure think so.  Can you find 4 sentences?  Or perhaps, all 19 words?"


In [9]:
# Write a pattern to match sentence endings: sentence_endings
sentence_endings = r"[.?!]"

# Split my_string1 on sentence endings and print the result
print(re.split(sentence_endings, my_string1))


["Let's write RegEx", "  Won't that be fun", '  I sure think so', '  Can you find 4 sentences', '  Or perhaps, all 19 words', '']


In [10]:
# Find all capitalized words of two or more characters in my_string1 (the single-letter "I" is not matched) and print the result
capitalized_words = r"[A-Z]\w+"
print(re.findall(capitalized_words, my_string1))

['Let', 'RegEx', 'Won', 'Can', 'Or']


In [11]:
# Split my_string1 on runs of whitespace and print the result
spaces = r"\s+"
print(re.split(spaces, my_string1))

["Let's", 'write', 'RegEx!', "Won't", 'that', 'be', 'fun?', 'I', 'sure', 'think', 'so.', 'Can', 'you', 'find', '4', 'sentences?', 'Or', 'perhaps,', 'all', '19', 'words?']


In [13]:
# Find all runs of digits in my_string1 and print the result
digits = r"\d+"
print(re.findall(digits, my_string1))


['4', '19']
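
The sentence split earlier in this section leaves leading spaces on each piece and an empty string at the end. One possible way to tidy that up, shown here as a sketch on a shortened copy of the example text, is to let the pattern consume the whitespace after the punctuation and then filter out empty pieces:

```python
import re

# Shortened copy of the example text from this section
text = "Let's write RegEx!  Won't that be fun?  I sure think so."

# Consume the punctuation mark and any whitespace that follows it
parts = re.split(r"[.?!]\s*", text)

# Drop the empty string left after the final sentence ending
sentences = [p for p in parts if p]
print(sentences)
```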


## (3) Matching strings

To demonstrate the **re.match()** function, suppose you want to validate user passwords: each password must be at least 8 characters long and contain at least one digit.

In [14]:
# a regular expression for validating a password
match_regex = r"^(?=.*[0-9]).{8,}$"
# a list of example passwords
passwords = ["pwd", "password", "password1"]
for pwd in passwords:
    m = re.match(match_regex, pwd)
    print(f"Password: {pwd}, validate password strength: {bool(m)}")

Password: pwd, validate password strength: False
Password: password, validate password strength: False
Password: password1, validate password strength: True


**match_regex** is the regular expression responsible for validating the password criteria we mentioned earlier:

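* ^: Matches the start of the string.
* (?=.*[0-9]): A lookahead that ensures the string contains at least one digit, without consuming any characters.
* .{8,}: Matches at least 8 characters (any character except a newline).
* $: Matches the end of the string (or the position just before a trailing newline).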
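
As an alternative sketch of the same check, the pattern can be compiled once and applied with `fullmatch()`, which requires the whole string to match, so the explicit anchors can be dropped. The names `PASSWORD_RE` and `is_strong` are hypothetical and introduced only for this illustration:

```python
import re

# Same criteria as above: at least 8 characters and at least one digit.
# PASSWORD_RE and is_strong are illustrative names, not from the tutorial.
PASSWORD_RE = re.compile(r"(?=.*[0-9]).{8,}")

def is_strong(password):
    """Return True if the entire password satisfies PASSWORD_RE."""
    # fullmatch() must cover the whole string, so ^ and $ are not needed
    return PASSWORD_RE.fullmatch(password) is not None

for pwd in ["pwd", "password", "password1", "1234567"]:
    print(f"Password: {pwd}, strong: {is_strong(pwd)}")
```

One difference worth knowing: with `re.match()` and `$`, a password followed by a single trailing newline is still accepted, whereas `fullmatch()` rejects it because `.` does not match a newline by default.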

## (4) Search method 

In [15]:
import re

# part of ipconfig output
example_text = """
Wireless LAN adapter Wi-Fi:
   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::380e:9710:5172:caee%2
   IPv4 Address. . . . . . . . . . . : 192.168.1.100
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.1.1
"""
# regex for IPv4 address
ip_address_regex = r"((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])(\.(?!$)|$)){4}"
# use re.search() method to get the match object
match = re.search(ip_address_regex, example_text)
print(match)

<re.Match object; span=(280, 291), match='192.168.1.1'>


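**re.search()** scans the string and returns a match object for the first location where the pattern matches, or None if there is no match. The match object holds the start and end indices of the match as well as the matched text. Note that the match here is the Default Gateway address 192.168.1.1 rather than 192.168.1.100: without the re.MULTILINE flag, the $ in the pattern only matches at the end of the whole string (or just before its trailing newline), so only the last address in the text qualifies.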
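
The pieces of a match object can be read with its `group()`, `start()`, `end()`, and `span()` methods. The sketch below uses a made-up one-line sample and a deliberately simplified pattern (it does not check that each number is in the range 0-255), so it is an illustration rather than a replacement for the pattern above:

```python
import re

# Hypothetical one-line sample, loosely modeled on the ipconfig text above
line = "Default Gateway . . . : 192.168.1.1"

# Simplified pattern: four dot-separated groups of 1 to 3 digits
m = re.search(r"\d{1,3}(?:\.\d{1,3}){3}", line)

# re.search() returns None when nothing matches, so check before using m
if m:
    print(m.group())  # the matched text
    print(m.start())  # index where the match begins
    print(m.end())    # index just past the end of the match
    print(m.span())   # (start, end) as a tuple
```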

## (5) Replacing matches

If you have experience with web scraping, you may have encountered a website that uses a service like Cloudflare to hide email addresses from email harvesting tools. In this section, we will do something similar: given a string that contains email addresses, we will replace each address with a **'[email protected]'** token:

In [16]:
import re

# a basic regular expression for email matching
email_regex = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
# example text to test with
example_text = """
Subject: This is a text email!
From: John Doe <john@doe.com>
Some text here!
===============================
Subject: This is another email!
From: Abdou Rockikz <example@domain.com>
Some other text!
"""
# substitute any email found with [email protected]
print(re.sub(email_regex, "[email protected]", example_text))


Subject: This is a text email!
From: John Doe <[email protected]>
Some text here!
===============================
Subject: This is another email!
From: Abdou Rockikz <[email protected]>
Some other text!



We used the **re.sub()** function, which takes three arguments: the regular expression (the pattern), the replacement for each match, and the target string.
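
`re.sub()` also accepts a function as the replacement and an optional `count` argument. The sketch below assumes capture groups added around the local part and the domain of the email pattern, plus a hypothetical helper named `mask_local_part`; neither is part of the original tutorial:

```python
import re

# The email pattern from above, with capture groups added around the
# local part and the domain (the grouping is an addition for this sketch)
email_regex = r"([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)"

def mask_local_part(match):
    """Hypothetical helper: hide the local part but keep the domain."""
    return "***@" + match.group(2)

text = "From: John Doe <john@doe.com>, Cc: <example@domain.com>"

# A callable replacement is invoked once for every match
masked = re.sub(email_regex, mask_local_part, text)
print(masked)

# count=1 limits the substitution to the first match only
first_only = re.sub(email_regex, "[hidden]", text, count=1)
print(first_only)
```

A callable is useful when the replacement depends on what was matched, while `count` helps when only the first few occurrences should change.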

# References

1) Official documentation: [Doc](https://docs.python.org/3/library/re.html)

2) This tutorial: [Tutorial](https://www.thepythoncode.com/article/work-with-regular-expressions-in-python?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter)

