# Regular Expressions

Regular expressions (regex) describe text patterns for searching, matching, and transforming strings. Python supports them through the standard-library `re` module.

## 1. Introduction

A regular expression is a sequence of characters that defines a search pattern. Typical uses are validating input, extracting fields from text, and search-and-replace.

## 2. Basic Syntax

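- `.`: Matches any character except a newline (with `re.DOTALL`, it matches newlines too).
- `^`: Matches the start of the string (with `re.MULTILINE`, also the start of each line).
- `$`: Matches the end of the string or just before a trailing newline (with `re.MULTILINE`, also the end of each line).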
- `*`: Matches 0 or more repetitions of the preceding element.
- `+`: Matches 1 or more repetitions of the preceding element.
- `?`: Matches 0 or 1 repetition of the preceding element.
- `{n}`: Matches exactly n repetitions of the preceding element.
- `{n,}`: Matches n or more repetitions of the preceding element.
- `{n,m}`: Matches between n and m repetitions of the preceding element.
- `[]`: Matches any one of the characters inside the brackets.
- `|`: Matches either the expression before or the expression after the `|`.
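
As a quick check of the list above, the following sketch runs one small pattern per metacharacter through `re.search`. The sample strings and the `checks` and `results` names are illustrative and not part of the original notebook.

```python
import re

# Each tuple: (pattern, text, expected re.search result or None)
checks = [
    (r'c.t', 'cat', 'cat'),               # . matches any single character
    (r'^Hello', 'Hello World', 'Hello'),  # ^ anchors at the start
    (r'World$', 'Hello World', 'World'),  # $ anchors at the end
    (r'ab*', 'a', 'a'),                   # * allows zero b's
    (r'ab+', 'a', None),                  # + needs at least one b
    (r'colou?r', 'color', 'color'),       # ? makes the u optional
    (r'\d{3}', '12345', '123'),           # {n} matches exactly n digits
    (r'[abc]', 'xbz', 'b'),               # [] matches one character from the set
    (r'cat|dog', 'hot dog', 'dog'),       # | matches either alternative
]

results = []
for pattern, text, expected in checks:
    m = re.search(pattern, text)
    found = m.group() if m else None
    results.append(found)
    print(f'{pattern!r:12} on {text!r:15} -> {found!r}')
```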

## 3. Using the `re` Module

In [8]:
import re

# Example: Matching a pattern in a string
pattern = r'\d+'  # Matches one or more digits
string = 'There are 123 apples'

match = re.search(pattern, string)
if match:
    print(match.group())

123


## 4. Special Characters

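- `\d`: Matches any decimal digit. For `str` patterns this includes Unicode digits; with `re.ASCII` it is equivalent to `[0-9]`.
- `\D`: Matches any non-digit.
- `\w`: Matches any word character: letters, digits, and the underscore. For `str` patterns this includes Unicode letters; with `re.ASCII` it is equivalent to `[a-zA-Z0-9_]`.
- `\W`: Matches any non-word character.
- `\s`: Matches any whitespace character (space, tab, newline, and similar).
- `\S`: Matches any non-whitespace character.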
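
A minimal sketch of these sequences in use, including the effect of `re.ASCII` on `\w`. The sample text is made up for illustration.

```python
import re

# \u00e9 is "e" with an acute accent, a non-ASCII letter
text = 'Order 42 shipped to caf\u00e9_7 on 2023-05-21'

digits = re.findall(r'\d+', text)                  # runs of digits
words = re.findall(r'\w+', text)                   # runs of word characters
ascii_words = re.findall(r'\w+', text, re.ASCII)   # \w limited to [a-zA-Z0-9_]
fields = re.split(r'\s+', text)                    # split on runs of whitespace

print(digits)       # ['42', '7', '2023', '05', '21']
print(words[4])     # the accented letter counts as a word character
print(ascii_words)  # here the accented letter splits the word in two
print(fields)
```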

## 5. Character Classes

Character classes allow you to define a set of characters to match.

In [9]:
pattern = r'[aeiou]'  # Matches any vowel
string = 'Hello World'
matches = re.findall(pattern, string)
print(matches)

['e', 'o', 'o']


## 6. Quantifiers

Quantifiers specify how many instances of a character, group, or character class must be present in the input for a match to be found.

In [10]:
pattern = r'\d{2,4}'  # Matches between 2 and 4 digits
string = '123 4567 89 34323 8'
matches = re.findall(pattern, string)
print(matches)

['123', '4567', '89', '3432']
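
Note that `'3432'` in the output is only the first four digits of `34323`. The sketch below reuses the same string to show two variations: adding `\b` word boundaries so that only whole runs of digits match, and appending `?` to make the quantifier lazy so that it takes as few repetitions as possible.

```python
import re

string = '123 4567 89 34323 8'

# Greedy (default): take up to 4 digits, even from a longer run.
greedy = re.findall(r'\d{2,4}', string)
print(greedy)   # ['123', '4567', '89', '3432']

# Word boundaries on both sides: the whole run must be 2 to 4 digits long.
whole = re.findall(r'\b\d{2,4}\b', string)
print(whole)    # ['123', '4567', '89']

# Lazy: stop as soon as the minimum of 2 digits is reached.
lazy = re.findall(r'\d{2,4}?', string)
print(lazy)     # ['12', '45', '67', '89', '34', '32']
```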


## 7. Groups and Capturing

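Parentheses create groups. Each group captures the part of the match it encloses, and you retrieve it by number with `match.group(n)`; `match.group(0)` is the whole match.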

In [11]:
pattern = r'(\d{3})-(\d{2})-(\d{4})'
string = 'My number is 123-45-6789'
match = re.search(pattern, string)
if match:
    print(match.group(1))
    print(match.group(2))
    print(match.group(3))

123
45
6789
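
Groups can also be given names with the `(?P<name>...)` syntax, which often reads better than numeric indexes. This is a sketch on the same string; the group names `area`, `group`, and `serial` are chosen here for illustration only.

```python
import re

pattern = r'(?P<area>\d{3})-(?P<group>\d{2})-(?P<serial>\d{4})'
match = re.search(pattern, 'My number is 123-45-6789')

whole = match.group(0)         # the entire match
by_index = match.groups()      # all captured groups as a tuple
by_name = match.group('area')  # one group, looked up by name
as_dict = match.groupdict()    # all named groups as a dict

print(whole)     # 123-45-6789
print(by_index)  # ('123', '45', '6789')
print(by_name)   # 123
print(as_dict)   # {'area': '123', 'group': '45', 'serial': '6789'}
```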


## 8. Lookahead and Lookbehind Assertions

Lookahead and lookbehind assertions match a pattern only if it is followed or preceded by another pattern. The asserted text is checked but is not included in the match.

In [12]:
# Lookahead
pattern = r'\d+(?= apples)'
string = 'There are 123 apples and 45 oranges'
matches = re.findall(pattern, string)
print(matches)

# Lookbehind
pattern = r'(?<=\$)\d+'
string = 'The price is $100'
matches = re.findall(pattern, string)
print(matches)

['123']
['100']
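
The negative forms, `(?!...)` and `(?<!...)`, require that the surrounding text does not match. The sketch below also shows a common pitfall: without `\b`, backtracking lets a shortened number slip past a negative lookahead. The second sample string is made up for illustration.

```python
import re

string = 'There are 123 apples and 45 oranges'

# Pitfall: '123' fails the lookahead, so the engine backtracks to '12',
# which is followed by '3' rather than ' apples', and so it matches.
naive = re.findall(r'\d+(?! apples)', string)
print(naive)       # ['12', '45']

# Word boundaries force whole numbers, so only '45' qualifies.
not_apples = re.findall(r'\b\d+\b(?! apples)', string)
print(not_apples)  # ['45']

# Negative lookbehind: whole numbers not preceded by a dollar sign.
not_prices = re.findall(r'(?<!\$)\b\d+', 'Pay $100 for 3 items')
print(not_prices)  # ['3']
```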


## 9. Flags

Flags modify the behavior of the pattern matching.

- `re.IGNORECASE` or `re.I`: Matches letters regardless of case.
- `re.MULTILINE` or `re.M`: Makes `^` and `$` match at the start and end of each line, not only of the whole string.
- `re.DOTALL` or `re.S`: Allows `.` to match newline characters.
- `re.VERBOSE` or `re.X`: Ignores most whitespace in the pattern and allows `#` comments, so a regex can be split across multiple lines.

In [13]:
pattern = r'hello'
string = 'Hello world'
match = re.search(pattern, string, re.IGNORECASE)
if match:
    print(match.group())

Hello
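
The other flags can be sketched the same way. The sample text and the `date_pattern` name below are illustrative. Flags can be combined with `|`, for example `re.IGNORECASE | re.MULTILINE`.

```python
import re

text = 'first line\nsecond line'

# MULTILINE: ^ matches at the start of every line.
default_starts = re.findall(r'^\w+', text)
multiline_starts = re.findall(r'^\w+', text, re.MULTILINE)
print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'second']

# DOTALL: . also matches the newline between the two lines.
without_dotall = re.search(r'line.second', text)
with_dotall = re.search(r'line.second', text, re.DOTALL)
print(without_dotall)    # None
print(repr(with_dotall.group()))

# VERBOSE: whitespace and comments inside the pattern are ignored.
date_pattern = re.compile(r'''
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
''', re.VERBOSE)
date_parts = date_pattern.search('Released 2023-05-21').groups()
print(date_parts)        # ('2023', '05', '21')
```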


## 10. Useful Functions

- `re.match()`: Determine if the regex matches at the start of the string.
- `re.search()`: Scan through a string, looking for any location where the regex matches.
- `re.findall()`: Return all non-overlapping matches as a list of strings (or of tuples, if the pattern contains more than one group).
- `re.finditer()`: Return an iterator yielding match objects.
- `re.sub()`: Substitute occurrences of the pattern with a replacement string.
- `re.split()`: Split the string by occurrences of the pattern.

In [14]:
# Example: Using re.sub to replace text
pattern = r'apples'
replacement = 'oranges'
string = 'I like apples'
new_string = re.sub(pattern, replacement, string)
print(new_string)

I like oranges
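
For comparison, here is a short sketch that runs the remaining functions from the list on one made-up string. It also shows `re.sub` with backreferences (`\1`, `\2`) in the replacement.

```python
import re

string = 'apples 12, pears 7, plums 30'

# match() only looks at the start of the string; search() scans all of it.
at_start = re.match(r'\d+', string)
anywhere = re.search(r'\d+', string)
print(at_start)          # None, because the string starts with a letter
print(anywhere.group())  # 12

# finditer() yields match objects, which carry positions as well as text.
positions = [(m.group(), m.start()) for m in re.finditer(r'\d+', string)]
print(positions)         # [('12', 7), ('7', 17), ('30', 26)]

# split() cuts the string at every match of the pattern.
items = re.split(r',\s*', string)
print(items)             # ['apples 12', 'pears 7', 'plums 30']

# sub() can reuse captured groups in the replacement string.
swapped = re.sub(r'(\w+) (\d+)', r'\2 \1', string)
print(swapped)           # 12 apples, 7 pears, 30 plums
```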


## 11. Examples

In [15]:
### Validating Email Addresses
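# Simplified pattern: real-world address rules are broader than this.
# The hyphen is placed last in each character class so it is read as a literal.
pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$'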
email = 'test.email+regex@gmail.com'
if re.match(pattern, email):
    print('Valid email')
else:
    print('Invalid email')

Valid email
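
One subtlety: `$` also matches just before a trailing newline, so `re.match` with `^...$` accepts `'user@example.com\n'`. A possible alternative, sketched below, is `re.fullmatch`, which requires the pattern to cover the entire string. The helper name `is_valid_email` and the sample addresses are hypothetical.

```python
import re

# Same simplified pattern, without anchors: fullmatch() anchors both ends.
EMAIL = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+')

def is_valid_email(address):
    """Return True if the whole string matches the simplified pattern."""
    return EMAIL.fullmatch(address) is not None

print(is_valid_email('test.email+regex@gmail.com'))  # True
print(is_valid_email('user@example.com\n'))          # False
print(is_valid_email('no-at-sign.example.com'))      # False

# The anchored version with re.match accepts the trailing newline.
anchored = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$'
accepts_newline = re.match(anchored, 'user@example.com\n') is not None
print(accepts_newline)                               # True
```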


In [16]:
### Extracting Dates from Text
pattern = r'\b\d{2}[/-]\d{2}[/-]\d{4}\b'
text = 'The event is on 12/05/2023 or 13-06-2023.'
dates = re.findall(pattern, text)
print(dates)

['12/05/2023', '13-06-2023']
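
The pattern above would also accept mixed separators such as `14/07-2023`. If that is unwanted, one option is to capture the first separator and require the same one again with the backreference `\1`. This sketch uses `re.finditer`, because `re.findall` would return only the captured separator; the third date in the sample text is added for illustration.

```python
import re

# ([/-]) captures the first separator; \1 requires the same one again.
pattern = r'\b\d{2}([/-])\d{2}\1\d{4}\b'
text = 'Dates: 12/05/2023, 13-06-2023, 14/07-2023.'

consistent_dates = [m.group() for m in re.finditer(pattern, text)]
print(consistent_dates)  # ['12/05/2023', '13-06-2023']
```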


In [17]:
### Password Strength Validation
pattern = r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[\W_]).{8,}$'
password = 'StrongPassw0rd!'
if re.match(pattern, password):
    print('Strong password')
else:
    print('Weak password')

Strong password
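
Each `(?=...)` lookahead in the pattern checks one requirement from the start of the string without consuming any characters. To report which requirement failed, the same rules can be checked one at a time, as in this sketch. The `RULES` table and the `missing_requirements` helper are hypothetical names introduced here.

```python
import re

# One entry per requirement encoded in the combined pattern.
RULES = {
    'an uppercase letter': r'[A-Z]',
    'a lowercase letter': r'[a-z]',
    'a digit': r'\d',
    'a symbol': r'[\W_]',
    'at least 8 characters': r'.{8,}',
}

def missing_requirements(password):
    """Return the descriptions of all rules the password does not satisfy."""
    return [name for name, pattern in RULES.items()
            if not re.search(pattern, password)]

print(missing_requirements('StrongPassw0rd!'))  # []
print(missing_requirements('password'))
# ['an uppercase letter', 'a digit', 'a symbol']
```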


In [18]:
### Parsing Log Files
pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\w+),(\w+),(.+)'
log = '2023-05-21 10:15:00,INFO,UserLogin,User admin logged in'
match = re.search(pattern, log)
if match:
    timestamp, level, event, message = match.groups()
    print(f'Timestamp: {timestamp}, Level: {level}, Event: {event}, Message: {message}')

Timestamp: 2023-05-21 10:15:00, Level: INFO, Event: UserLogin, Message: User admin logged in
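
When the same pattern is applied to many lines, it can be compiled once with `re.compile` and reused. The sketch below assumes the same comma-separated layout; the extra sample lines and the `LOG_LINE` and `records` names are made up for illustration. Lines that do not fit the layout are skipped.

```python
import re

LOG_LINE = re.compile(
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\w+),(\w+),(.+)'
)

lines = [
    '2023-05-21 10:15:00,INFO,UserLogin,User admin logged in',
    'not a log line',
    '2023-05-21 10:16:30,ERROR,DiskCheck,Disk usage at 95%, cleanup started',
]

records = []
for line in lines:
    match = LOG_LINE.fullmatch(line)
    if match is None:
        continue  # skip lines that do not follow the expected layout
    timestamp, level, event, message = match.groups()
    records.append({'timestamp': timestamp, 'level': level,
                    'event': event, 'message': message})

for record in records:
    print(record['level'], record['event'], '-', record['message'])
```

Because the last group is `.+`, commas inside the message stay part of the message instead of being split into further fields.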


## Conclusion

Regular expressions are a compact, expressive tool for text processing in Python, and mastering them makes string manipulation and analysis considerably more efficient. Practice on realistic inputs such as email addresses, dates, passwords, and log lines, and test each pattern against both matching and non-matching text before relying on it in production.