To use regex in Python, we must first import the re module. Let's look at how we can use regex to search for patterns.

In [1]:
import re

text = "Hello, my email is example@email.com"
match = re.search(r'\S+@\S+', text)  # raw string so \S reaches the regex engine unchanged
print(match.group())  # Output: example@email.com


example@email.com


Using re.search() vs. re.findall()
•	re.search(pattern, text) returns the first match as a match object, or None if nothing matches.
•	re.findall(pattern, text) returns all non-overlapping matches as a list of strings (an empty list if there are none).
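
Because re.search() can return None, calling .group() on the result fails when the text has no match. A minimal sketch of one way to guard against that is shown here; the helper name first_email is made up for illustration and is not part of the re module.

```python
import re

def first_email(text):
    """Return the first email-like token in text, or None if there is none.

    first_email is a hypothetical helper used only for this illustration.
    """
    match = re.search(r'\S+@\S+', text)
    # re.search() returns None when nothing matches, so check before .group()
    return match.group() if match else None

print(first_email("Hello, my email is example@email.com"))  # example@email.com
print(first_email("No address here"))                       # None
```

re.findall() needs no such check: with no matches it simply returns an empty list.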


In [2]:
text = "My lucky numbers are 19, 42, and 88."
numbers = re.findall('[0-9]+', text)
print(numbers)  # Output: ['19', '42', '88']


['19', '42', '88']


Greedy vs. Non-Greedy Matching (10 minutes)
One tricky part of regex is greedy vs. non-greedy matching.
Consider this example:


In [3]:
text = "From: Using the : character"
match = re.findall('^F.+:', text)
print(match)  # Output: ['From: Using the :']


['From: Using the :']


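Why did it match everything instead of stopping at the first :? This is because + is greedy: it matches as much text as possible, so .+ runs all the way to the last : in the string.
To make it non-greedy, add a ? right after the + so that it matches as little as possible: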


In [4]:
match = re.findall('^F.+?:', text)
print(match)  # Output: ['From:']


['From:']
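
Another way to stop at the first colon, sketched here as an alternative rather than a replacement for the non-greedy form, is a negated character class: [^:]+ matches any run of characters that are not a colon, so it cannot run past one. The variable names are chosen only for this illustration.

```python
import re

text = "From: Using the : character"

greedy = re.findall(r'^F.+:', text)      # .+ runs to the last colon
lazy = re.findall(r'^F.+?:', text)       # .+? stops at the first colon
negated = re.findall(r'^F[^:]+:', text)  # [^:]+ can never cross a colon

print(greedy)   # ['From: Using the :']
print(lazy)     # ['From:']
print(negated)  # ['From:']
```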


Extracting Emails from Text
Let’s practice extracting email addresses from a block of text!


In [5]:
text = "Contact us at support@website.com or sales@shop.com."
emails = re.findall(r'\S+@\S+', text)
print(emails)  # Output: ['support@website.com', 'sales@shop.com.']


['support@website.com', 'sales@shop.com.']


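Notice that the second result picked up the trailing period, because \S matches any non-whitespace character, including punctuation. We can refine the pattern so it only accepts characters that belong in an address: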

In [6]:
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print(emails)  # Output: ['support@website.com', 'sales@shop.com']


['support@website.com', 'sales@shop.com']
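
To see what each part of the refined pattern does, it can help to try it on a few more inputs. The sample strings below are made up for illustration, and the pattern is a practical approximation rather than a full email validator.

```python
import re

# Same refined pattern as above: local part, @, domain, a dot, then a 2+ letter suffix
EMAIL_PATTERN = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# Hypothetical sample inputs
with_tag = re.findall(EMAIL_PATTERN, "Write to user+tag@mail.example.org, please")
no_suffix = re.findall(EMAIL_PATTERN, "admin@localhost")
trailing_dot = re.findall(EMAIL_PATTERN, "See sales@shop.com.")

print(with_tag)      # ['user+tag@mail.example.org']
print(no_suffix)     # [] because there is no ".xx" suffix after the domain
print(trailing_dot)  # ['sales@shop.com'] with the final period left out
```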


Real-World Example: Extracting Domain Names (10 minutes)
If we only want the domain name (e.g., website.com), we can use parentheses to define the part we want to extract.


In [7]:
text = "From john.doe@gmail.com Sat Jan 5 09:14:16 2023"
match = re.findall('@([^ ]+)', text)
print(match)  # Output: ['gmail.com']


['gmail.com']


Now, let’s go one step further and extract the domain only when the line starts with 'From ':

In [8]:
match = re.findall('^From .*@([^ ]+)', text)
print(match)  # Output: ['gmail.com']


['gmail.com']
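
By default, ^ only matches at the very start of the string. When the text contains several lines, the re.MULTILINE flag lets ^ match at the start of each line. The sketch below assumes a small multi-line input; only its first line comes from the example above, and the other two are invented for illustration.

```python
import re

# Hypothetical multi-line input; lines 2 and 3 are invented for this example
log = (
    "From john.doe@gmail.com Sat Jan 5 09:14:16 2023\n"
    "Reply-To: jane@example.org\n"
    "From alice@school.edu Sun Jan 6 10:02:41 2023\n"
)

# re.MULTILINE makes ^ match at the beginning of every line, not just the string
domains = re.findall(r'^From .*@([^ ]+)', log, flags=re.MULTILINE)
print(domains)  # ['gmail.com', 'school.edu']
```

The Reply-To line is skipped because it does not begin with 'From '.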


Using Regex to Extract Numbers and Currency Values (10 minutes)
Let’s say we need to extract a dollar amount from a sentence:


In [9]:
text = "We just received $50.75 for cookies."
match = re.findall(r'\$[0-9.]+', text)  # \$ matches a literal dollar sign
print(match)  # Output: ['$50.75']


['$50.75']
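
The character class [0-9.] accepts a period anywhere, so an amount at the end of a sentence would pick up the final period. A slightly stricter pattern, sketched here under the assumption that amounts look like digits with an optional decimal part, avoids that and makes conversion to float straightforward. The sentence is an extended version of the one above, invented for illustration.

```python
import re

# Hypothetical sentence with a second amount right before the final period
text = "We just received $50.75 for cookies and $5."

loose = re.findall(r'\$[0-9.]+', text)
# Digits, then optionally a period followed by more digits
strict = re.findall(r'\$[0-9]+(?:\.[0-9]+)?', text)
# Drop the dollar sign and convert each amount to a number
amounts = [float(s.lstrip('$')) for s in strict]

print(loose)    # ['$50.75', '$5.']
print(strict)   # ['$50.75', '$5']
print(amounts)  # [50.75, 5.0]
```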
