## Python Regular Expressions (Regex) - A Teaching Guide
Regular expressions (regex) are patterns used to match, search, and manipulate text. Python provides the built-in `re` module to work with them.

In [2]:
# 1. Importing the re Module
import re

<img src="tut_py_8_regex.jpg" width="600" height="300">

In [6]:

# 1. \d - Any digit (0-9)
print(re.findall(r"\d", "My age is 25"))  # Output: ['2', '5']

# 2. \D - Non-digit characters
print(re.findall(r"\D", "Room 101!"))  # Output: ['R', 'o', 'o', 'm', ' ', '!']

# 3. \w - Word characters (letters, digits, underscore)
print(re.findall(r"\w", "Hello_World! 123"))  # Output: ['H', 'e', 'l', 'l', 'o', '_', 'W', 'o', 'r', 'l', 'd', '1', '2', '3']

# 4. \W - Non-word characters (symbols, spaces)
print(re.findall(r"\W", "Hello_World! 123"))  # Output: ['!', ' ']

# 5. \s - Whitespace characters (space, tab, newline)
print(re.findall(r"\s", "Hello World\tNew Line\n"))  # Output: [' ', '\t', ' ', '\n']

# 6. \S - Non-whitespace characters
print(re.findall(r"\S", "Hello World"))  # Output: ['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']

# 7. ^ - Matches the start of a string
print(re.search(r"^Hello", "Hello World"))  # Output: <re.Match object; span=(0, 5), match='Hello'> (None if there is no match)

# 8. $ - Matches the end of a string
print(re.search(r"World$", "Hello World"))  # Output: <re.Match object; span=(6, 11), match='World'> (None if there is no match)

# 9. . - Matches any character except newline
print(re.findall(r"a.c", "abc adc aac axc"))  # Output: ['abc', 'adc', 'aac', 'axc']

# 10. * - 0 or more occurrences
print(re.findall(r"ba*", "b ba baa baaa"))  # Output: ['b', 'ba', 'baa', 'baaa']

# 11. + - 1 or more occurrences
print(re.findall(r"ba+", "b ba baa baaa"))  # Output: ['ba', 'baa', 'baaa']

# 12. ? - 0 or 1 occurrence
print(re.findall(r"colou?r", "color colour colouur"))  # Output: ['color', 'colour']

# 13. {n} - Exactly n occurrences
print(re.findall(r"\d{3}", "123 45 6789 101"))  # Output: ['123', '678', '101']

# 14. {n,} - At least n occurrences
print(re.findall(r"\d{2,}", "1 12 123 1234"))  # Output: ['12', '123', '1234']

# 15. {n,m} - Between n and m occurrences
print(re.findall(r"\d{2,4}", "1 12 123 1234 12345"))  # Output: ['12', '123', '1234', '1234'] ("12345" yields '1234'; the leftover '5' is too short)

# 16. [] - Character set (matches any character inside the brackets)
print(re.findall(r"[aeiou]", "hello world"))  # Output: ['e', 'o', 'o']

# 17. [^] - Negated set (matches any character NOT inside the brackets)
print(re.findall(r"[^aeiou]", "hello world"))  # Output: ['h', 'l', 'l', ' ', 'w', 'r', 'l', 'd']



['2', '5']
['R', 'o', 'o', 'm', ' ', '!']
['H', 'e', 'l', 'l', 'o', '_', 'W', 'o', 'r', 'l', 'd', '1', '2', '3']
['!', ' ']
[' ', '\t', ' ', '\n']
['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']
<re.Match object; span=(0, 5), match='Hello'>
<re.Match object; span=(6, 11), match='World'>
['abc', 'adc', 'aac', 'axc']
['b', 'ba', 'baa', 'baaa']
['ba', 'baa', 'baaa']
['color', 'colour']
['123', '678', '101']
['12', '123', '1234']
['12', '123', '1234', '1234']
['e', 'o', 'o']
['h', 'l', 'l', ' ', 'w', 'r', 'l', 'd']
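
The `re.search` calls above print whole Match objects. As a minimal sketch (the variable names `m` and `missing` are only illustrative), the matched text and its position can be read from the object, and a failed search can be detected because `re.search` returns `None`:

```python
import re

# re.search returns a Match object on success and None on failure.
m = re.search(r"World$", "Hello World")
if m is not None:
    print(m.group())           # the matched text
    print(m.span())            # (start, end) indices of the match
    print(m.start(), m.end())  # the same indices, individually

# No match: the pattern is anchored to the start, but the string starts with "Hello".
missing = re.search(r"^World", "Hello World")
print(missing)
```

Checking for `None` before calling `.group()` avoids an `AttributeError` when the pattern is not found.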


###  Basic Regex Functions in Python
| Function | Description |
| --- | --- |
| `re.search()` | Scans the string and returns the first match found anywhere, or `None`. |
| `re.match()` | Matches only at the start of the string, or returns `None`. |
| `re.findall()` | Returns all non-overlapping matches as a list. |
| `re.finditer()` | Returns an iterator of match objects. |
| `re.sub()` | Replaces every match with another string. |
| `re.split()` | Splits a string wherever the pattern matches. |
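
To compare the six functions side by side, here is a small sketch that runs each of them against the same string. The sample text and the pattern `[cbr]at` are assumptions chosen for illustration, not part of the original examples.

```python
import re

text = "cat bat rat"
pattern = r"[cbr]at"

first = re.search(pattern, text)          # first match anywhere in the string
at_start = re.match(r"bat", text)         # None: "bat" is not at position 0
all_matches = re.findall(pattern, text)   # list of matched strings
positions = [(m.group(), m.start()) for m in re.finditer(pattern, text)]
replaced = re.sub(pattern, "dog", text)   # every match is replaced
parts = re.split(r"\s", text)             # split on each whitespace character

print(first.group())
print(at_start)
print(all_matches)
print(positions)
print(replaced)
print(parts)
```

The contrast between `first` and `at_start` is the one that most often surprises: `re.match` never looks past the beginning of the string.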


In [3]:
# 3. Example - Finding Email Addresses

text = "Contact us at support@example.com or sales@company.org"

# Inside [...], "|" is a literal pipe (not "or"), so the TLD class is written [A-Za-z].
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"

emails = re.findall(pattern, text)

print(emails)  
# Output: ['support@example.com', 'sales@company.org']


['support@example.com', 'sales@company.org']


## Explanation of above code

\b → Word boundary (ensures we match full emails)

[A-Za-z0-9._%+-]+ → Matches email username (letters, numbers, dots, etc.)

@ → Matches the @ symbol

[A-Za-z0-9.-]+ → Matches the domain name

\. → Matches the dot . before the domain extension

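[A-Za-z]{2,} → Matches the top-level domain (com, org, etc.). Inside a character class, | is a literal pipe rather than alternation, so writing [A-Z|a-z] would also accept a pipe in the domain extension.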
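
The behavior of `|` inside a character class is easy to check directly. The sketch below compares a class that contains a pipe with one that does not; the sample string and the names `loose` and `strict` are made up for this demonstration.

```python
import re

# Inside [...], '|' is just another character to match.
with_pipe = re.findall(r"[A-Z|a-z]", "a|B")
letters_only = re.findall(r"[A-Za-z]", "a|B")
print(with_pipe)
print(letters_only)

# The same difference inside the full email pattern.
loose = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"
strict = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
sample = "bad@example.c|m good@example.com"

loose_matches = re.findall(loose, sample)
strict_matches = re.findall(strict, sample)
print(loose_matches)
print(strict_matches)
```

Only the `strict` pattern rejects the address whose extension contains a pipe.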

In [4]:
# 4. Example - Extracting Phone Numbers
text = "Call me at (123) 456-7890 or 987-654-3210"

pattern = r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"

matches = re.findall(pattern, text)

print(matches)
# Output: ['(123) 456-7890', '987-654-3210']


['(123) 456-7890', '987-654-3210']


## Explanation of above code

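\(?\d{3}\)? → Matches a three-digit area code with optional parentheses, e.g. (123) or 987. Each parenthesis is optional on its own, so unbalanced input such as (123 456-7890 also matches.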

[-.\s]? → Matches an optional separator (-, ., or space)

\d{3}[-.\s]?\d{4} → Matches the phone number
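
Once the numbers are extracted, a common follow-up is to bring them into one format. As a rough sketch (the variable name `normalized` is an assumption), `re.sub` with `\D` removes every non-digit from each match:

```python
import re

text = "Call me at (123) 456-7890 or 987-654-3210"
pattern = r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"

# Strip every non-digit (\D) from each match to get a uniform 10-digit string.
normalized = [re.sub(r"\D", "", m) for m in re.findall(pattern, text)]
print(normalized)
```

This keeps the matching pattern unchanged and handles formatting as a separate step, which is usually easier to read than one pattern that does both.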

In [None]:
# 5. Example - Checking If a String Contains a Date in DD/MM/YYYY Format
import re

pattern = r"\b\d{2}/\d{2}/\d{4}\b"

text1 = "Today's date is 25/02/2025"
text2 = "Invalid date: 5/2/2025"

print(bool(re.search(pattern, text1)))  # True (matches the DD/MM/YYYY format)
print(bool(re.search(pattern, text2)))  # False (day and month are not two digits)


## Explanation of above code

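\b → Word boundary; keeps the date from matching inside a longer run of digits or letters

\d{2}/\d{2}/\d{4} → Matches the DD/MM/YYYY format. It checks the shape only, so an impossible date such as 99/99/9999 also matches.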
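
Because the pattern only checks the shape of the text, a real validity check needs a second step. One possible sketch combines `re.fullmatch` with `datetime.strptime` from the standard library; the helper name `is_valid_date` is hypothetical.

```python
import re
from datetime import datetime

def is_valid_date(text):
    """Return True if text is exactly DD/MM/YYYY and a real calendar date.

    Hypothetical helper, written for illustration only.
    """
    # Step 1: the whole string must have the DD/MM/YYYY shape.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", text):
        return False
    # Step 2: strptime raises ValueError for impossible dates.
    try:
        datetime.strptime(text, "%d/%m/%Y")
    except ValueError:
        return False
    return True

print(is_valid_date("25/02/2025"))
print(is_valid_date("31/02/2025"))
print(is_valid_date("5/2/2025"))
```

The regex rejects badly shaped input early, and `strptime` rejects dates such as 31 February that no regex of this kind can catch.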

In [None]:
# 6. Example - Validating Password Strength
password = "Strong@123"

pattern = r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"

if re.match(pattern, password):
    print("Valid Password")
else:
    print("Weak Password")


## Explanation of above code

(?=.*[A-Z]) → At least one uppercase letter

(?=.*[a-z]) → At least one lowercase letter

(?=.*\d) → At least one digit

(?=.*[@$!%*?&]) → At least one special character

[A-Za-z\d@$!%*?&]{8,} → At least 8 characters, all drawn from the allowed set (letters, digits, and @$!%*?&)
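
A single pattern with lookaheads can only say "valid" or "weak". If the goal is to tell the user which rule failed, one alternative sketch is to test each rule separately. The `RULES` table and the `password_problems` helper are hypothetical names introduced here.

```python
import re

# Hypothetical rule table: description -> pattern that must be found.
RULES = {
    "an uppercase letter": r"[A-Z]",
    "a lowercase letter": r"[a-z]",
    "a digit": r"\d",
    "a special character": r"[@$!%*?&]",
    "8+ allowed characters only": r"^[A-Za-z\d@$!%*?&]{8,}$",
}

def password_problems(password):
    """Return the description of every rule the password fails."""
    return [name for name, pat in RULES.items() if not re.search(pat, password)]

print(password_problems("Strong@123"))
print(password_problems("weakpass"))
```

An empty list means the password passes every rule; otherwise the list names what is missing.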

In [5]:
# 7. Example - Replacing Text (re.sub)
text = "I love Python and JavaScript"

# re.sub(pattern, replacement, text) replaces all occurrences of a pattern.
# Replace 'Python' with 'Java'
new_text = re.sub(r"Python", "Java", text)

print(new_text)
# Output: "I love Java and JavaScript"


I love Java and JavaScript
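
One thing to watch with `re.sub` is that a plain pattern also matches inside longer words. The sketch below, which uses "Kotlin" as an arbitrary replacement, shows the effect of `\b` word boundaries and of the `count` argument:

```python
import re

text = "I love Java and JavaScript"

# Without boundaries, "Java" is also replaced inside "JavaScript".
everywhere = re.sub(r"Java", "Kotlin", text)
# With \b on both sides, only the standalone word is replaced.
whole_word = re.sub(r"\bJava\b", "Kotlin", text)
# count limits how many matches are replaced (here: only the first).
first_only = re.sub(r"Java", "Kotlin", text, count=1)

print(everywhere)
print(whole_word)
print(first_only)
```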


In [None]:
# 8. Example - Splitting a String (re.split)
text = "apple, banana; cherry|grape"

# The character set [,;|] splits the text at every comma, semicolon, or pipe.
# Spaces after the delimiters are not part of the pattern, so they stay in the results.
words = re.split(r"[,;|]", text)

print(words)
# Output: ['apple', ' banana', ' cherry', 'grape']
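
The leading spaces in the result come from the text after each delimiter. A small variation, sketched here under the assumption that surrounding whitespace is unwanted, absorbs it into the separator with `\s*`:

```python
import re

text = "apple, banana; cherry|grape"

# \s* on both sides swallows any whitespace around each delimiter.
words = re.split(r"\s*[,;|]\s*", text)
print(words)
```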
