# Module 1: Introduction to Regular Expressions

## What are Regular Expressions?

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. Think of regular expressions as a specialized mini-language for pattern matching and text manipulation.

**Key Points:**
- Regular expressions are supported in most programming languages, though the exact syntax and features vary between implementations
- They provide a way to match patterns in text
- They can be used for validation, searching, and text manipulation

## Why Use Regular Expressions?

Regular expressions solve many common text processing problems:

1. **Data Validation**
   - Email addresses
   - Phone numbers
   - Passwords
   
2. **Text Searching**
   - Finding specific patterns
   - Extracting information
   
3. **Text Manipulation**
   - Replacing text
   - Cleaning data
   - Formatting strings

## The `re` Module in Python

Python's `re` module provides support for regular expressions. Let's import it and explore its basic functions:

In [3]:
import re

# The main functions we'll use:
# re.search()  - Find first match
# re.match()   - Match at beginning of string
# re.findall() - Find all matches
# re.sub()     - Replace matches

### Basic Functions Explained

1. **re.search(pattern, string)**:
   - Scans through the string looking for a match
   - Returns a match object for the first match found
   - Returns None if there is no match

In [2]:
# Example of re.search()
text = "Python is awesome"
match = re.search(r"is", text)
print(f"Found 'is' at position: {match.start() if match else 'Not found'}")

# Try with a pattern that doesn't exist
match = re.search(r"javascript", text)
print(f"Found 'javascript': {'Yes' if match else 'No'}")

Found 'is' at position: 7
Found 'javascript': No
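
The value returned by `re.search()` is a match object, which carries more than a position. The following is a minimal sketch of the most commonly used match object methods, reusing the same text as above; the variable names `matched_text` and `span` are only illustrative.

```python
import re

text = "Python is awesome"
match = re.search(r"is", text)

# re.search() returns a match object, or None when nothing matches,
# so check the result before calling any methods on it.
if match:
    matched_text = match.group()   # the text that matched
    span = match.span()            # (start, end) indexes; end is exclusive
    print(f"Matched {matched_text!r} at {span}")   # Matched 'is' at (7, 9)
    print(text[match.start():match.end()])          # is
```

Slicing the original string with `start()` and `end()` gives back the same text as `group()`.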


2. **re.match(pattern, string)**:
   - Attempts to match at the beginning of the string
   - Returns None if the pattern doesn't match at the start

In [4]:
# Example of re.match()
text = "Python is awesome"

# This will match because 'Python' is at the start
match = re.match(r"Python", text)
print(f"Starts with 'Python': {'Yes' if match else 'No'}")

# This won't match because 'is' is not at the start
match = re.match(r"is", text)
print(f"Starts with 'is': {'Yes' if match else 'No'}")

Starts with 'Python': Yes
Starts with 'is': No
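
A common point of confusion is how `re.match()` differs from `re.search()`. The sketch below compares them on the same text. It also uses `re.fullmatch()`, which is part of the `re` module but is not covered in this module, to show that `re.match()` does not require the whole string to match.

```python
import re

text = "Python is awesome"

# re.match() only tests the start of the string...
at_start = re.match(r"awesome", text)      # None: 'awesome' is not at the start
# ...while re.search() scans the whole string.
anywhere = re.search(r"awesome", text)     # match object starting at index 10

# re.match() does not require the pattern to consume the whole string.
prefix_only = re.match(r"Python", text)    # matches, even though more text follows
# re.fullmatch() succeeds only if the entire string matches the pattern.
whole = re.fullmatch(r"Python", text)      # None

print(at_start, anywhere.start(), prefix_only.group(), whole)
```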


3. **re.findall(pattern, string)**:
   - Finds all non-overlapping matches in the string
   - Returns a list of the matched strings (an empty list if nothing matches)

In [7]:
# Example of re.findall()
text = "The quick brown fox jumps over the lazy fox"

# Find all occurrences of 'fox'
matches = re.findall(r"fox", text)
print(f"Found {len(matches)} occurrences of 'fox': {matches}")

# Find all words
words = re.findall(r"\w+", text)  # \w+ means one or more word characters
print(f"Words in text: {words}")

Found 2 occurrences of 'fox': ['fox', 'fox']
Words in text: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'fox']
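
To make "non-overlapping" concrete, here is a small sketch using made-up input strings. Each search resumes where the previous match ended, so characters already consumed by one match cannot be part of the next.

```python
import re

# Matching resumes after the end of the previous match, so
# "aaaa" contains two non-overlapping "aa" matches, not three.
pairs = re.findall(r"aa", "aaaa")
print(pairs)        # ['aa', 'aa']

# With no match at all, findall() returns an empty list rather than None.
none_found = re.findall(r"cat", "The quick brown fox")
print(none_found)   # []
```

Because the result is always a list, it is safe to call `len()` on it or loop over it without checking for `None` first.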


4. **re.sub(pattern, repl, string)**:
   - Replaces all occurrences of the pattern with repl
   - Returns modified string

In [None]:
# Example of re.sub()
text = "I love javascript and javascript is great"

# Replace 'javascript' with 'python'
new_text = re.sub(r"javascript", "python", text)
print(f"Original: {text}")
print(f"Modified: {new_text}")
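
`re.sub()` also accepts an optional `count` argument that limits how many replacements are made. The sketch below reuses the text from the example above; the variable names `replaced_all` and `replaced_first` are only illustrative.

```python
import re

text = "I love javascript and javascript is great"

# By default, every occurrence is replaced.
replaced_all = re.sub(r"javascript", "python", text)
print(replaced_all)    # I love python and python is great

# The optional count argument limits the number of replacements.
replaced_first = re.sub(r"javascript", "python", text, count=1)
print(replaced_first)  # I love python and javascript is great

# re.sub() returns a new string; the original is left unchanged.
print(text)            # I love javascript and javascript is great
```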

## Writing Your First Regex Pattern

Let's learn about basic pattern elements:

1. **Literal Characters**
   - Match exactly as they appear
   - Case-sensitive by default

In [None]:
# Literal character matching
text = "Hello, World!"

# Match 'Hello'
match = re.search(r"Hello", text)
print(f"Found 'Hello': {'Yes' if match else 'No'}")

# Case matters
match = re.search(r"hello", text)
print(f"Found 'hello': {'Yes' if match else 'No'}")
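
Case sensitivity is only the default. As a brief sketch, passing the `re.IGNORECASE` flag as the third argument makes the same lowercase pattern match the capitalized text.

```python
import re

text = "Hello, World!"

# Without a flag, the lowercase pattern does not match 'Hello'.
case_sensitive = re.search(r"hello", text)

# With re.IGNORECASE, letter case is ignored during matching.
case_insensitive = re.search(r"hello", text, re.IGNORECASE)

print(case_sensitive)             # None
print(case_insensitive.group())   # Hello
```

Note that `group()` returns the text as it appears in the string, not as it was written in the pattern.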

2. **Special Characters**
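   - `.` - Matches any character except a newline
   - `\w` - Matches word characters: `[a-zA-Z0-9_]`, plus Unicode letters and digits by default for `str` patterns
   - `\d` - Matches decimal digits: `[0-9]`, plus other Unicode decimal digits by default for `str` patterns
   - `\s` - Matches whitespace (spaces, tabs, newlines)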

In [12]:
# Special character examples
text = "Hello123 World!"

# Find word characters
word_chars = re.findall(r"\w", text)
print(f"Word characters: {word_chars}")

# Find digits
digits = re.findall(r"\d", text)
print(f"Digits: {digits}")

# Find whitespace
spaces = re.findall(r"\s", text)
print(f"Whitespace count: {len(spaces)}")

Word characters: ['H', 'e', 'l', 'l', 'o', '1', '2', '3', 'W', 'o', 'r', 'l', 'd']
Digits: ['1', '2', '3']
Whitespace count: 1
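
A few edge cases of these special characters are worth seeing in action. The sketch below uses made-up input strings to show that `.` skips newlines, that a literal dot must be escaped with a backslash, and that `\w` goes beyond ASCII unless the `re.ASCII` flag is passed.

```python
import re

# '.' matches any single character except a newline.
dots = re.findall(r".", "a\nb")
print(dots)                 # ['a', 'b']

# An unescaped '.' matches any character; '\.' matches only a literal dot.
loose = re.findall(r"3.14", "3.14 or 3x14")
strict = re.findall(r"3\.14", "3.14 or 3x14")
print(loose)                # ['3.14', '3x14']
print(strict)               # ['3.14']

# For str patterns, \w is Unicode-aware by default;
# re.ASCII restricts it to [a-zA-Z0-9_].
word = "caf\u00e9"          # 'café' with an accented e
unicode_words = re.findall(r"\w", word)
ascii_words = re.findall(r"\w", word, re.ASCII)
print(unicode_words)        # ['c', 'a', 'f', 'é']
print(ascii_words)          # ['c', 'a', 'f']
```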


## Practice Exercises

Try these exercises to reinforce your understanding:

In [14]:
# Exercise 1: Count how many words start with 'p' (case-insensitive)
text = "Python programming is powerful and practical and I am happy"

# Your code here
# Hint: Use re.findall() with r"\bp\w+" and the re.IGNORECASE flag
matches = re.findall(r"\bp\w+", text, re.IGNORECASE)
print(f"Found {len(matches)} words starting with 'p': {matches}")

Found 4 words starting with 'p': ['Python', 'programming', 'powerful', 'practical']


In [None]:
# Exercise 2: Extract all numbers from a string
text = "There are 123 apples and 456 oranges"

# Your code here
# Hint: Use re.findall() with r"\d+"

In [None]:
# Exercise 3: Replace all numbers with 'X'
text = "My phone number is 123-456-7890"

# Your code here
# Hint: Use re.sub() with r"\d", "X"

## Solutions to Practice Exercises

In [None]:
# Solution 1: Count words starting with 'p'
text = "Python programming is powerful and practical and I am happy"
p_words = re.findall(r"\bp\w+", text, re.IGNORECASE)
print(f"Words starting with 'p': {p_words}")
print(f"Count: {len(p_words)}")

# Solution 2: Extract numbers
text = "There are 123 apples and 456 oranges"
numbers = re.findall(r"\d+", text)
print(f"Numbers found: {numbers}")

# Solution 3: Replace numbers
text = "My phone number is 123-456-7890"
masked = re.sub(r"\d", "X", text)
print(f"Masked text: {masked}")

## Summary

In this module, we learned:
1. What regular expressions are and why they're useful
2. Basic functions in Python's `re` module
3. How to write simple patterns
4. How to use special characters in patterns

In the next module, we'll explore more advanced pattern matching techniques including:
- Quantifiers
- Character classes
- Groups and capturing
- Assertions