
# LAB | Regular Expressions (Regex) in Python

## Overview
This exercise notebook will help you practice using regular expressions in Python. Regular expressions are powerful tools for matching patterns in strings, which can be useful for validation, searching, and data manipulation.

## Instructions
- Complete each exercise by writing the appropriate regex pattern and Python code in the provided space.
- Test your code to ensure it works as expected.
<!-- - Use the hints provided if you get stuck. -->

### Exercise 1: Match Email Addresses
Write a regex pattern to match valid email addresses. An email address should contain an '@' symbol and a domain.

In [1]:
import re

# Example input
email = "example@example.com"
# Regex pattern:
# - Enforces total length (<= 254 characters) and local part length (<= 64 characters)
# - Local part: letters, digits, and the characters . _ % + -
# - Domain part: one or more dot-separated labels of letters, digits, and hyphens
#   (a label may not start or end with a hyphen)
# - TLD: at least 2 alphabetic characters
# Your regex pattern here
pattern = r"""
^
(?=.{1,254}$)                 # whole email length 1..254
(?=.{1,64}@)                  # local part (before @) length 1..64

[A-Za-z0-9._%+-]+             # local part: common characters
@                             # at symbol

(?:                           # domain labels
  [A-Za-z0-9]                 # label must start alnum
  (?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?  # middle/end (no hyphen at ends)
  \.
)+
[A-Za-z]{2,}                  # TLD (>= 2 letters)
$
"""
# Compile with re.VERBOSE so comments/whitespace in the pattern are ignored
email_re = re.compile(pattern, re.VERBOSE | re.IGNORECASE)

# Test the regex
if email_re.fullmatch(email):
    print("Valid email")
else:
    print("Invalid email")

tests = [
    "name+tag@example.co.uk",
    "USER_1@example.com",
    "a@b.co",
    "bad@-host.com",
    "bad@host-.com",
    "bad@@example.com",
    "no-at-and-domain",
    "space in@domain.com"
]
for t in tests:
    print(t, "=>", bool(email_re.fullmatch(t)))


Valid email
name+tag@example.co.uk => True
USER_1@example.com => True
a@b.co => True
bad@-host.com => False
bad@host-.com => False
bad@@example.com => False
no-at-and-domain => False
space in@domain.com => False


### Exercise 2: Validate Phone Numbers
Create a regex pattern to validate phone numbers in the format (123) 456-7890 or 123-456-7890.

In [2]:
import re

# Example input
phone_number = "(123) 456-7890"

# Your regex pattern here
pattern = r"^(\(\d{3}\)\s\d{3}-\d{4}|\d{3}-\d{3}-\d{4})$"

# Test the regex
if re.match(pattern, phone_number):
    print("Valid phone number")
else:
    print("Invalid phone number")

# test cases
phone_number1 = "123-456-7890"
if re.match(pattern, phone_number1):
    print("Valid phone number")
else:
    print("Invalid phone number")

# test cases
phone_number2 = "1234567890"
if re.match(pattern, phone_number2):
    print("Valid phone number")
else:
    print("Invalid phone number")


Valid phone number
Valid phone number
Invalid phone number
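
One subtlety with the anchored pattern above: when used with `re.match`, `$` also matches just before a trailing newline, so a value such as `"123-456-7890\n"` is accepted. A minimal sketch of a stricter check using `re.fullmatch` follows; the names `PHONE_RE` and `is_valid_phone` are hypothetical and not part of the lab.

```python
import re

# Same two formats as above; explicit anchors are unnecessary with fullmatch.
PHONE_RE = re.compile(r"\(\d{3}\)\s\d{3}-\d{4}|\d{3}-\d{3}-\d{4}")

def is_valid_phone(value: str) -> bool:
    """Return True only if the whole string is a phone number."""
    return PHONE_RE.fullmatch(value) is not None

anchored = r"^(\(\d{3}\)\s\d{3}-\d{4}|\d{3}-\d{3}-\d{4})$"
print(bool(re.match(anchored, "123-456-7890\n")))  # True: `$` tolerates a trailing newline
print(is_valid_phone("123-456-7890\n"))            # False
print(is_valid_phone("(123) 456-7890"))            # True
```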


### Exercise 3: Extract Dates
Write a regex pattern to extract dates in the format YYYY-MM-DD from a string.

In [3]:


import re

# Example input
text = "The event is scheduled for 2024-12-25."

# Your regex pattern here
pattern = r"\d{4}-\d{2}-\d{2}"

# Find all matches
dates = re.findall(pattern, text)
print(dates)

# Test with multiple dates
text1 = "The event is scheduled for 2024-12-25 and 2025-01-01."
text2 = "Dates: 1999-12-31, 2000-01-01, 2023-06-15"
text3 = "Invalid formats: 2024/12/25, 24-12-25, 2024-1-1"

pattern = r"\d{4}-\d{2}-\d{2}"

print("Text 1:", re.findall(pattern, text1))
print("Text 2:", re.findall(pattern, text2))
print("Text 3:", re.findall(pattern, text3))



['2024-12-25']
Text 1: ['2024-12-25', '2025-01-01']
Text 2: ['1999-12-31', '2000-01-01', '2023-06-15']
Text 3: []
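
The pattern above can also match a date-shaped slice inside a longer run of digits. A possible refinement, sketched here with a made-up sample string and a hypothetical name `DATE_RE`, is to add lookarounds so that a match may not be preceded or followed by another digit.

```python
import re

# Lookarounds reject matches that are glued to extra digits.
DATE_RE = re.compile(r"(?<!\d)\d{4}-\d{2}-\d{2}(?!\d)")

sample = "Order 12024-12-2512 shipped on 2024-12-25."

# The plain pattern also picks up a slice of the order number.
plain_matches = re.findall(r"\d{4}-\d{2}-\d{2}", sample)
strict_matches = DATE_RE.findall(sample)

print(plain_matches)   # ['2024-12-25', '2024-12-25']
print(strict_matches)  # ['2024-12-25']
```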


### Exercise 4: Match URLs
Create a regex pattern to match URLs that start with http:// or https://.

In [4]:


import re

# Example input
url = "https://www.example.com"

# Your regex pattern here
pattern = r"^https?://"

# Test the regex
if re.match(pattern, url):
    print("Valid URL")
else:
    print("Invalid URL")


print("---------------")
# Additional test cases
test_urls = [
    "https://www.example.com",      # Valid
    "http://example.com",           # Valid
    "https://sub.domain.co.uk/path", # Valid
    "ftp://example.com",            # Invalid
    "www.example.com",              # Invalid
    "example.com",                  # Invalid
    "https://",                     # Incomplete, but it passes: the pattern only checks the scheme
]

pattern = r"^https?://"

for url in test_urls:
    if re.match(pattern, url):
        print("Valid URL")
    else:
        print("Invalid URL")



Valid URL
---------------
Valid URL
Valid URL
Valid URL
Invalid URL
Invalid URL
Invalid URL
Valid URL
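
As the last line of the output shows, `^https?://` accepts a bare `https://` because it only checks the scheme. A minimal sketch of a slightly stricter check follows, assuming that a usable URL needs at least one host character after the scheme. It is not a full URL grammar, and `URL_RE` and `looks_like_url` are hypothetical names.

```python
import re

# Assumption: after the scheme there must be at least one character that is
# neither whitespace nor a slash (the host), optionally followed by more text.
URL_RE = re.compile(r"https?://[^\s/]+\S*")

def looks_like_url(value: str) -> bool:
    """Return True if the whole string is scheme + host (+ optional rest)."""
    return URL_RE.fullmatch(value) is not None

for candidate in ["https://www.example.com", "https://sub.domain.co.uk/path",
                  "https://", "ftp://example.com"]:
    print(candidate, "=>", looks_like_url(candidate))
```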


### Exercise 5: Find Words Starting with a Specific Letter
Write a regex pattern to find all words starting with the letter 'a' in a given string.

In [5]:


import re

# Example input
text = "A quick brown fox jumps over a lazy dog."

# Your regex pattern here
# \b anchors the match at the start of a word; re.IGNORECASE (passed below) covers both 'a' and 'A'
pattern = r"\ba\w*"

# Find all matches
words = re.findall(pattern, text, re.IGNORECASE)
print(words)



['A', 'a']
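
The example sentence only contains one-letter matches, so the role of `\b` is easy to miss. The sketch below uses a made-up sentence to compare the pattern with and without the word boundary.

```python
import re

sentence = "An apple a day keeps anxiety away."

# With \b, a match may only start at the beginning of a word.
with_boundary = re.findall(r"\ba\w*", sentence, re.IGNORECASE)
# Without \b, a match can start in the middle of a word ("ay" from "day").
without_boundary = re.findall(r"a\w*", sentence, re.IGNORECASE)

print(with_boundary)     # ['An', 'apple', 'a', 'anxiety', 'away']
print(without_boundary)  # ['An', 'apple', 'a', 'ay', 'anxiety', 'away']
```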


### Exercise 6: Match Hexadecimal Colors
Create a regex pattern to match hexadecimal color codes (e.g., #FFFFFF).

In [6]:


import re

# Example input
color_code = "#FFFFFF"

# Your regex pattern here
pattern = r"^#[0-9A-Fa-f]{6}$"

# Test the regex
if re.match(pattern, color_code, re.IGNORECASE):
    print("Valid hex color code")
else:
    print("Invalid hex color code")



print("---------------")
# test

test_colors = [
    "#FFFFFF",    # Valid
    "#ff0000",    # Valid
    "#abc",       # Valid (3-digit shorthand)
    "#123456",    # Valid
    "#GGGGGG",    # Invalid (G is not hex)
    "#12345",     # Invalid (5 digits)
    "#1234567",   # Invalid (7 digits)
    "FFFFFF",     # Invalid (no #)
    "#FF",        # Invalid (2 digits)
    "#ffg",       # Invalid (contains g)
]

pattern = r"^#([0-9A-Fa-f]{3}){1,2}$"

for color in test_colors:
    if re.match(pattern, color, re.IGNORECASE):
        print("Valid hex color code")
    else:
        print("Invalid hex color code")




Valid hex color code
---------------
Valid hex color code
Valid hex color code
Valid hex color code
Valid hex color code
Invalid hex color code
Invalid hex color code
Invalid hex color code
Invalid hex color code
Invalid hex color code
Invalid hex color code
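
Once a color code has been validated, a capture group makes it easy to use the digits. The following is a minimal sketch that converts 3-digit and 6-digit codes to an RGB tuple; the helper name `hex_to_rgb` is hypothetical and not part of the lab.

```python
import re

# Capture either 3 or 6 hex digits after the '#'.
HEX_RE = re.compile(r"#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})")

def hex_to_rgb(code: str) -> tuple:
    """Convert '#abc' or '#aabbcc' to an (r, g, b) tuple of ints."""
    match = HEX_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a hex color: {code!r}")
    digits = match.group(1)
    if len(digits) == 3:
        # Expand shorthand: 'abc' -> 'aabbcc'
        digits = "".join(ch * 2 for ch in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#abc"))     # (170, 187, 204)
print(hex_to_rgb("#ff0000"))  # (255, 0, 0)
```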


### Exercise 7: Validate Passwords
Write a regex pattern to validate passwords that must be at least 8 characters long and contain at least one uppercase letter, one lowercase letter, one digit, and one special character.

In [7]:


import re

# Example input
password = "Password123!"

# Your regex pattern here
pattern = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=\[\]{};':\"\\|,.<>\/?]).{8,}$"

# Test the regex
if re.match(pattern, password):
    print("Valid password")
else:
    print("Invalid password")

print("---------------")
# tests
test_passwords = [
    "Password123!",    # Valid
    "Abc123!@#",       # Valid
    "Short1!",         # Invalid (too short)
    "password123!",    # Invalid (no uppercase)
    "PASSWORD123!",    # Invalid (no lowercase)
    "Password!!",      # Invalid (no digit)
    "Password123",     # Invalid (no special character)
    "Pass1!",          # Invalid (too short)
    "AbCdEfGh1@",      # Valid
]

pattern = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=\[\]{};':\"\\|,.<>\/?]).{8,}$"

for pwd in test_passwords:
    if re.match(pattern, pwd):
        print("Valid password")
    else:
        print("Invalid password")




Valid password
---------------
Valid password
Valid password
Invalid password
Invalid password
Invalid password
Invalid password
Invalid password
Invalid password
Valid password
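
A single lookahead pattern only says whether a password passes. As an alternative sketch, the rules can be checked one at a time so that the failing ones can be reported. The `RULES` table and `missing_rules` helper are hypothetical, and "special character" is assumed here to mean any character that is not an ASCII letter, digit, or whitespace, which is broader than the explicit list used above.

```python
import re

# Each rule is a description mapped to a pattern that must be found.
RULES = {
    "at least 8 characters": r".{8,}",
    "a lowercase letter": r"[a-z]",
    "an uppercase letter": r"[A-Z]",
    "a digit": r"\d",
    "a special character": r"[^A-Za-z0-9\s]",  # assumption, see lead-in
}

def missing_rules(password: str) -> list:
    """Return the descriptions of all rules the password does not satisfy."""
    return [name for name, pat in RULES.items()
            if re.search(pat, password) is None]

print(missing_rules("Password123!"))  # []
print(missing_rules("password123"))   # ['an uppercase letter', 'a special character']
```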


### Exercise 8: Remove Extra Spaces
Create a regex pattern that removes extra spaces from a string while keeping single spaces between words.

In [8]:


import re

# Example input
text = "This   is   an   example."

# Your regex pattern here
pattern = r" +"

# Replace extra spaces
cleaned_text = re.sub(pattern, " ", text)
print(cleaned_text)

print("---------------")
# tests
test_texts = [
    "This   is   an   example.",
    "Hello     world!    How are   you?",
    "Multiple     spaces     between     words.",
    "NoExtraSpacesHere",
    "  Leading  and  trailing  spaces  ",
    "Line\nbreaks\tand    spaces"
]

pattern = r" +"

for text in test_texts:
    cleaned = re.sub(pattern, " ", text)
    print(cleaned)



This is an example.
---------------
This is an example.
Hello world! How are you?
Multiple spaces between words.
NoExtraSpacesHere
 Leading and trailing spaces 
Line
breaks	and spaces
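
The pattern `" +"` only collapses space characters, so the leading and trailing spaces, the newline, and the tab survive in the last two test cases. If the goal is to normalize all whitespace, one possible variant is sketched below; `normalize_whitespace` is a hypothetical helper name.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse any run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

print(normalize_whitespace("  Leading  and  trailing  spaces  "))  # Leading and trailing spaces
print(normalize_whitespace("Line\nbreaks\tand    spaces"))         # Line breaks and spaces
```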


### Exercise 9: Match IP Addresses
Write a regex pattern to match valid IPv4 addresses.

In [9]:


import re

# Example input
ip_address = "192.168.1.1"

# Your regex pattern here
pattern = r"^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"

# Test the regex
if re.match(pattern, ip_address):
    print("Valid IP address")
else:
    print("Invalid IP address")

print("---------------")
# tests
test_ips = [
    "192.168.1.1",      # Valid
    "10.0.0.1",         # Valid
    "255.255.255.255",  # Valid
    "0.0.0.0",          # Valid
    "256.1.1.1",        # Invalid (256 > 255)
    "192.168.1",        # Invalid (only 3 octets)
    "192.168.1.1.1",    # Invalid (5 octets)
    "192.168.1.256",    # Invalid (256 > 255)
    "192.168.1.01",     # Valid (leading zero allowed)
    "192.168.1.-1",     # Invalid (negative)
    "abc.def.ghi.jkl",  # Invalid (non-numeric)
]

pattern = r"^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"

for ip in test_ips:
    if re.match(pattern, ip):
        print("Valid IP address")
    else:
        print("Invalid IP address")



Valid IP address
---------------
Valid IP address
Valid IP address
Valid IP address
Valid IP address
Invalid IP address
Invalid IP address
Invalid IP address
Invalid IP address
Valid IP address
Invalid IP address
Invalid IP address
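
The pattern above accepts octets with leading zeros, such as `01`. Whether that is desirable depends on the use case. A minimal sketch of a stricter variant that rejects leading zeros follows; the names `OCTET`, `IPV4_STRICT_RE`, and `is_strict_ipv4` are hypothetical.

```python
import re

# One octet, 0-255, with no leading zeros (so "01" is rejected).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
# Four octets separated by dots; re.ASCII limits \d to 0-9.
IPV4_STRICT_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}", re.ASCII)

def is_strict_ipv4(value: str) -> bool:
    """Return True if the whole string is a dotted quad without leading zeros."""
    return IPV4_STRICT_RE.fullmatch(value) is not None

for ip in ["192.168.1.1", "0.0.0.0", "192.168.1.01", "256.1.1.1"]:
    print(ip, "=>", is_strict_ipv4(ip))
```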


### Exercise 10: Extract Hashtags
Create a regex pattern to extract hashtags from a string.

In [10]:


import re

# Example input
text = "Here are some hashtags: #Python #Regex #Coding."

# Your regex pattern here
pattern = r"#\w+"

# Find all matches
hashtags = re.findall(pattern, text)
print(hashtags)

print("---------------")
# tests
test_texts = [
    "Here are some hashtags: #Python #Regex #Coding.",
    "Learning #MachineLearning and #AI in 2024! #Tech",
    "No hashtags in this text.",
    "Multiple ###hashtags #test123 #camelCase #under_scores",
    "Invalid#hashtag (no space) and #valid one",
    "Edge cases: #123 #a #",
    "Special chars: #python-org #c++ #c# (these only match partially)"
]

pattern = r"#\w+"

for text in test_texts:
    hashtags = re.findall(pattern, text)
    print(hashtags)



['#Python', '#Regex', '#Coding']
---------------
['#Python', '#Regex', '#Coding']
['#MachineLearning', '#AI', '#Tech']
[]
['#hashtags', '#test123', '#camelCase', '#under_scores']
['#hashtag', '#valid']
['#123', '#a']
['#python', '#c', '#c']
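
The output shows that `#\w+` also matches `#hashtag` inside `Invalid#hashtag`. If a hashtag should only count when it does not directly follow a word character, a negative lookbehind is one option. A minimal sketch, with `HASHTAG_RE` as a hypothetical name:

```python
import re

# (?<!\w) means "not immediately preceded by a letter, digit, or underscore".
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")

print(HASHTAG_RE.findall("Invalid#hashtag (no space) and #valid one"))  # ['#valid']
print(HASHTAG_RE.findall("Here are some hashtags: #Python #Regex #Coding."))
```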


## Bonus Exercises



### Bonus Exercise 1: Match All Digits
Write a regex pattern to match all digits in a given string.

In [11]:
import re

# Example input
text = "There are 2 apples and 3 oranges."

# Your regex pattern here
pattern = r"\d"

# Find all matches
digits = re.findall(pattern, text)
print(digits)

print("---------------")
# tests
test_texts = [
    "There are 2 apples and 3 oranges.",
    "Phone: 123-456-7890",
    "Year 2024 has 366 days",
    "Pi is approximately 3.14159",
    "No digits here!",
    "Multiple digits: 1, 22, 333, 4444",
    "Address: 123 Main Street, Apt 4B",
    "Temperature: -5°C and +25°C"
]

pattern = r"\d"

for text in test_texts:
    digits = re.findall(pattern, text)
    print(digits)

['2', '3']
---------------
['2', '3']
['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
['2', '0', '2', '4', '3', '6', '6']
['3', '1', '4', '1', '5', '9']
[]
['1', '2', '2', '3', '3', '3', '4', '4', '4', '4']
['1', '2', '3', '4']
['5', '2', '5']
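
The pattern `\d` returns every digit separately. When whole numbers are wanted instead, the `+` quantifier groups consecutive digits into one match, as in this short sketch:

```python
import re

text = "Multiple digits: 1, 22, 333, 4444"

# \d+ matches each maximal run of digits as one string.
number_strings = re.findall(r"\d+", text)
numbers = [int(s) for s in number_strings]

print(number_strings)  # ['1', '22', '333', '4444']
print(numbers)         # [1, 22, 333, 4444]
```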


### Bonus Exercise 2: Validate Credit Card Numbers  
Create a regex pattern to validate credit card numbers (16 digits).

In [12]:
import re

# Example input
credit_card_number = "1234-5678-9876-5432"

# Your regex pattern here
pattern = r"^(\d{4}[-\s]?){3}\d{4}$"

# Test the regex
if re.match(pattern, credit_card_number):
    print("Valid credit card number")
else:
    print("Invalid credit card number")

print("---------------")
# tests
test_cards = [
    "1234-5678-9876-5432",      # Valid
    "1234567898765432",         # Valid (no hyphens)
    "1234-5678-9876-543",       # Invalid (15 digits)
    "1234-5678-9876-54321",     # Invalid (17 digits)
    "1234-5678-9876-543a",      # Invalid (contains letter)
    "1234-5678-9876-543",       # Invalid (15 digits)
    "1234-5678-9876-5432-0000", # Invalid (20 digits)
    "1234 5678 9876 5432",      # Invalid (spaces instead of hyphens)
    "1234-5678-987-65432",      # Invalid (wrong grouping)
]

# Stricter pattern for the tests: hyphens are optional, but spaces are rejected
pattern = r"^(\d{4}-?){3}\d{4}$"

for card in test_cards:
    if re.match(pattern, card):
        print("Valid credit card number")
    else:
        print("Invalid credit card number")

Valid credit card number
---------------
Valid credit card number
Valid credit card number
Invalid credit card number
Invalid credit card number
Invalid credit card number
Invalid credit card number
Invalid credit card number
Invalid credit card number
Invalid credit card number
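
A regex can only check the shape of a card number. Real card numbers also carry a check digit that is verified with the Luhn algorithm. The sketch below combines the hyphen-optional pattern with a Luhn check; `luhn_ok` and `is_plausible_card` are hypothetical helper names, and `4111-1111-1111-1111` is a commonly used test number.

```python
import re

CARD_RE = re.compile(r"(?:\d{4}-?){3}\d{4}")

def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        value = int(ch)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as summing the two digits
        total += value
    return total % 10 == 0

def is_plausible_card(number: str) -> bool:
    """Shape check with the regex first, then the Luhn checksum."""
    if CARD_RE.fullmatch(number) is None:
        return False
    return luhn_ok(number.replace("-", ""))

print(is_plausible_card("4111-1111-1111-1111"))  # True
print(is_plausible_card("1234-5678-9876-5432"))  # False: right shape, wrong checksum
```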


### Bonus Exercise 3: Match Non-Alphanumeric Characters  
Write a regex pattern to match non-alphanumeric characters in a string.

In [13]:
import re

# Example input
text = "Hello! How are you? @Python3."

# Your regex pattern here
pattern = r"[^a-zA-Z0-9\s]"

# Find all matches
non_alphanumeric_chars = re.findall(pattern, text)
print(non_alphanumeric_chars)

print("---------------")
# tests
test_texts = [
    "Hello! How are you? @Python3.",
    "Email: user@example.com",
    "Price: $19.99 + tax = $21.45",
    "Special chars: #hashtag & symbol *asterisk",
    "Only alphanumeric and spaces",
    "Punctuation: , . ; : ! ? - _ ( ) [ ] { }",
    "Math symbols: + - * / = < >",
    "Quotes: 'single' and \"double\" quotes"
]

pattern = r"[^a-zA-Z0-9\s]"

for text in test_texts:
    chars = re.findall(pattern, text)
    print(chars)

['!', '?', '@', '.']
---------------
['!', '?', '@', '.']
[':', '@', '.']
[':', '$', '.', '+', '=', '$', '.']
[':', '#', '&', '*']
[]
[':', ',', '.', ';', ':', '!', '?', '-', '_', '(', ')', '[', ']', '{', '}']
[':', '+', '-', '*', '/', '=', '<', '>']
[':', "'", "'", '"', '"']


### Bonus Exercise 4: Validate Date Format  
Create a regex pattern to validate dates in the format DD/MM/YYYY.

In [14]:


import re

# Example input
date_string = "25/12/2024"

# Your regex pattern here
pattern = r"^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/\d{4}$"

# Test the regex
if re.match(pattern, date_string):
    print("Valid date format")
else:
    print("Invalid date format")

print("---------------")
# tests

test_dates = [
    "25/12/2024",      # Valid
    "01/01/2024",      # Valid
    "31/12/2024",      # Valid
    "15/06/2024",      # Valid
    "32/12/2024",      # Invalid (day > 31)
    "00/12/2024",      # Invalid (day = 0)
    "25/13/2024",      # Invalid (month > 12)
    "25/00/2024",      # Invalid (month = 0)
    "25/12/24",        # Invalid (year not 4 digits)
    "25/12/20245",     # Invalid (year 5 digits)
    "25-12-2024",      # Invalid (wrong separator)
    "25/12/2024/",     # Invalid (extra character)
    "5/12/2024",       # Invalid (day not 2 digits)
    "25/1/2024",       # Invalid (month not 2 digits)
]
pattern = r"^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/\d{4}$"

for date in test_dates:
    if re.match(pattern, date):
        print("Valid date format")
    else:
        print("Invalid date format")



Valid date format
---------------
Valid date format
Valid date format
Valid date format
Valid date format
Invalid date format
Invalid date format
Invalid date format
Invalid date format
Invalid date format
Invalid date format
Invalid date format
Invalid date format
Invalid date format
Invalid date format
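
The pattern checks the format and the numeric ranges, but not the calendar: `31/02/2024` passes even though February has no 31st day. One way to close that gap, sketched below, is to let `datetime.strptime` validate the date after the regex has confirmed the format. `is_real_date` is a hypothetical helper name.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/\d{4}")

def is_real_date(value: str) -> bool:
    """Return True if value is DD/MM/YYYY and names an actual calendar date."""
    if DATE_RE.fullmatch(value) is None:
        return False  # wrong format (strptime alone would accept "5/12/2024")
    try:
        datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        return False  # right format, but no such day in that month
    return True

print(is_real_date("31/02/2024"))  # False
print(is_real_date("29/02/2024"))  # True (2024 is a leap year)
```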


### Bonus Exercise 5: Extract Email Domains  
Write a regex pattern to extract domains from email addresses.

In [15]:


import re

# Example input
email_list = ["user@example.com", "admin@domain.org"]

# Your regex pattern here
pattern = r"@(.+)"

for email in email_list:
    domain_match = re.search(pattern, email)
    if domain_match:
        print(domain_match.group(1))  # Print the captured domain (text after '@').
    else:
        print("No domain found")

print("---------------")
# tests
test_emails = [
    "user@example.com",
    "admin@domain.org",
    "name.lastname@company.co.uk",
    "user123@sub.domain.com",
    "test.email+tag@example.org",
    "invalid-email",
    "missing@domain",
    "@domain.com",
    "user@",
    "user@192.168.1.1"
]

pattern = r"@(.+)"

for email in test_emails:
    domain_match = re.search(pattern, email)
    if domain_match:
        print(domain_match.group(1))  # Print the captured domain (text after '@').
    else:
        print("No domain found")



example.com
domain.org
---------------
example.com
domain.org
company.co.uk
sub.domain.com
example.org
No domain found
domain
domain.com
No domain found
192.168.1.1


### Exercise Completion  
Once you have completed all exercises:
- Review your solutions.
- Ensure your regular expressions and Python code are well-documented with comments explaining your logic.
- Save your notebook for submission or further review.

Happy coding! Enjoy practicing Regular Expressions in Python!