
# Python Regular Expressions (`re`) – Complete Tutorial Notebook

This notebook contains **detailed explanations + runnable examples** for the **core `re` functions and regex concepts**,
from **basic to advanced**.

You can **directly upload this notebook to GitHub** and use it as:
- Learning material
- Interview preparation
- Teaching reference
- Production reference

---


In [2]:
import re

## 1. re.findall()


**Purpose**
- Finds ALL non-overlapping matches
- Returns a list
- With one capturing group → returns a list of that group's strings
- With two or more groups → returns a list of tuples

**When to use**
- Data extraction
- Parsing text


In [3]:

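re.findall(r"\d+", "a1b22c333")            # ['1', '22', '333'] (not displayed: a cell shows only its last expression)
re.findall(r"(\d+)-(\d+)", "12-34 56-78")  # two groups, so each item is a tuple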


[('12', '34'), ('56', '78')]
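
The effect of groups on the return type is easiest to see side by side. A minimal sketch using the same input as above (the variable names are only illustrative):

```python
import re

text = "12-34 56-78"

# No groups: each item is the whole match.
no_groups = re.findall(r"\d+-\d+", text)

# One group: each item is that group's text (a plain string, not a tuple).
one_group = re.findall(r"(\d+)-\d+", text)

# Two or more groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\d+)-(\d+)", text)

print(no_groups)   # ['12-34', '56-78']
print(one_group)   # ['12', '56']
print(two_groups)  # [('12', '34'), ('56', '78')]
```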

## 2. re.search()


**Purpose**
- Finds FIRST match anywhere in string
- Returns a Match object or None

**Key difference**
- Unlike findall, stops at first match


In [4]:

m = re.search(r"\d+", "abc123xyz")
m.group(), m.start(), m.end(), m.span()


('123', 3, 6, (3, 6))
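
Because `re.search()` returns `None` when nothing matches, calling `.group()` directly on the result can raise `AttributeError`. A small sketch of the usual guard; `first_number` is a hypothetical helper, not part of `re`:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    m = re.search(r"\d+", text)
    return m.group() if m else None

print(first_number("abc123xyz"))  # 123
print(first_number("no digits"))  # None
```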

## 3. re.match()


**Purpose**
- Matches ONLY at beginning of string
- Often confused with search()

**Interview favorite**


In [5]:

re.match(r"\d+", "123abc")  # Match object for '123'
re.match(r"\d+", "abc123")  # None: the digits are not at the start, so the cell shows no output
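
To make the difference from `search()` concrete, here is a short sketch that runs the same pattern through both functions (variable names are illustrative):

```python
import re

pattern = r"\d+"

at_start = re.match(pattern, "123abc")      # digits begin at index 0
not_at_start = re.match(pattern, "abc123")  # digits begin at index 3
anywhere = re.search(pattern, "abc123")     # search scans the whole string

print(at_start.group())  # 123
print(not_at_start)      # None
print(anywhere.group())  # 123
```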


## 4. re.fullmatch()


**Purpose**
- Entire string must match the pattern
- Best for validations (email, phone, id)

**Preferred method for validation** (no `^` / `$` anchors needed)


In [6]:

re.fullmatch(r"\d+", "123")     # Match object: the whole string is digits
re.fullmatch(r"\d+", "123abc")  # None: "abc" is left over, so the cell shows no output
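
One reason to prefer `fullmatch()` over `match()` with a trailing `$`: by default `$` also matches just before a final newline. A minimal sketch, assuming the value came from a line of input that still carries its `\n`:

```python
import re

value = "123\n"  # trailing newline, e.g. a line read from a file

anchored = re.match(r"\d+$", value)   # "$" also matches just before a final newline
strict = re.fullmatch(r"\d+", value)  # the newline is not a digit, so this fails

print(anchored is not None)  # True
print(strict)                # None
```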


## 5. re.sub()


**Purpose**
- Replace pattern with new text
- Returns new string

**Important**
- Original string unchanged


In [7]:

re.sub(r"\d", "#", "a1b2c3")           # 'a#b#c#'
re.sub(r"\d", "#", "a1b2c3", count=1)  # replace only the first match


'a#b2c3'
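
The replacement does not have to be fixed text. The sketch below shows two other forms `re.sub()` accepts: a backreference string and a callable. The function name `double` is only illustrative:

```python
import re

# Backreference: swap the two captured numbers.
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "12-34")

# Callable: receives each Match object and returns the replacement text.
def double(m):
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, "a1b22c333")

print(swapped)  # 34-12
print(doubled)  # a2b44c666
```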

## 6. re.subn()


**Purpose**
- Performs the same replacement as re.sub()
- Returns a tuple: (new_string, number_of_replacements)

**Used for auditing / logging**


In [8]:

re.subn(r"\d", "#", "a1b2c3")


('a#b#c#', 3)
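
For the auditing use case, the tuple can be unpacked directly. A small sketch; `mask_digits` is a hypothetical helper:

```python
import re

def mask_digits(text):
    """Replace every digit with '#' and report how many were replaced."""
    masked, count = re.subn(r"\d", "#", text)
    return masked, count

masked, count = mask_digits("a1b2c3")
print(masked, count)  # a#b#c# 3

_, none_replaced = mask_digits("abc")
print(none_replaced)  # 0
```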

## 7. re.split()


**Purpose**
- Splits string wherever regex matches
- More powerful than str.split()

**Common use**
- Split on multiple delimiters


In [9]:

re.split(r"[,\s]+", "a, b  c")              # ['a', 'b', 'c']
re.split(r"[,\s]+", "a, b  c", maxsplit=1)  # split only once; the rest is left intact


['a', 'b  c']
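
Two behaviors of `re.split()` often surprise people: a capturing group in the pattern keeps the delimiters in the result, and a delimiter at the start of the string produces an empty first item. A short sketch:

```python
import re

plain = re.split(r"[,;]", "a,b;c")
with_delims = re.split(r"([,;])", "a,b;c")  # the group keeps each delimiter
leading = re.split(r",", ",a")              # delimiter at index 0

print(plain)        # ['a', 'b', 'c']
print(with_delims)  # ['a', ',', 'b', ';', 'c']
print(leading)      # ['', 'a']
```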

## 8. re.finditer()


**Purpose**
- Iterator version of findall()
- Memory efficient
- Returns Match objects

**Best for large inputs with many matches** (matches are produced one at a time instead of being collected into a list)


In [10]:

for m in re.finditer(r"\d+", "a1b22c333"):
    print(m.group(), m.span())


1 (1, 2)
22 (3, 5)
333 (6, 9)
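
Since `finditer()` yields full Match objects, it pairs well with named groups. A minimal sketch on a made-up `key=value` string:

```python
import re

text = "x=1, y=22, z=333"

pairs = {}
for m in re.finditer(r"(?P<key>\w+)=(?P<value>\d+)", text):
    # Named groups are read back by name from each Match object.
    pairs[m.group("key")] = int(m.group("value"))

print(pairs)  # {'x': 1, 'y': 22, 'z': 333}
```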


## 9. re.compile()


**Purpose**
- Pre-compiles regex
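- Saves repeated pattern lookups when reused many times (`re` also caches recently used patterns internally, so the speedup is usually modest)
- Gives the pattern a name, which helps readability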

**Best practice for loops**


In [11]:

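pattern = re.compile(r"\d+")  # compile once
pattern.findall("a1b22")      # ['1', '22']
pattern.search("abc123")      # compiled patterns offer the same methods as the module-level functions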


<re.Match object; span=(3, 6), match='123'>
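
A typical loop usage might look like the sketch below. The constant name `DIGITS` and the sample lines are illustrative only:

```python
import re

DIGITS = re.compile(r"\d+")  # compiled once, outside the loop

lines = ["a1b22", "no digits", "abc123"]
found = [DIGITS.findall(line) for line in lines]

print(found)           # [['1', '22'], [], ['123']]
print(DIGITS.pattern)  # \d+
```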

## 10. Character Classes (\d \D \w \W \s \S)


**Core building blocks of regex**

- \d  → digit
- \D  → non-digit
- \w  → word character (letters, digits, underscore)
- \W  → non-word character
- \s  → whitespace (space, tab, newline, ...)
- \S  → non-whitespace


In [12]:

re.findall(r"\d+", "a12b")       # ['12']
re.findall(r"\D+", "a12b")       # ['a', 'b']
re.findall(r"\w+", "hello_123")  # ['hello_123']
re.findall(r"\W+", "hi!!!")      # ['!!!']
re.findall(r"\s+", "a  b   c")   # ['  ', '   ']
re.findall(r"\S+", "a b   c")    # displayed below


['a', 'b', 'c']
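
With `str` patterns these classes are Unicode-aware by default, so `\d` matches more than `0-9`. A short sketch of the difference, using an Arabic-Indic digit (written as an escape to keep the source ASCII-only):

```python
import re

text = "5 \u0663"  # ASCII digit 5 and ARABIC-INDIC DIGIT THREE (U+0663)

unicode_digits = re.findall(r"\d", text)          # Unicode-aware by default
ascii_digits = re.findall(r"\d", text, re.ASCII)  # restrict \d to 0-9
explicit_range = re.findall(r"[0-9]", text)       # same restriction, spelled out

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['5']
print(explicit_range)       # ['5']
```

If only `0-9` should be accepted (for example in ID or phone validation), `re.ASCII` or an explicit `[0-9]` is the safer choice.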

## 11. Groups, Alternation & Quantifiers


**Concepts**
- ()  → capturing group
- |   → OR
- + * ? {n} {m,n} → quantifiers


In [13]:

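re.findall(r"(ab)+", "abab ab")        # ['ab', 'ab']: a repeated group keeps only its last repetition
re.findall(r"cat|dog", "cat dog cow")  # ['cat', 'dog']
re.findall(r"\d{2,4}", "123456")       # greedy: takes 4 digits first, then the remaining 2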


['1234', '56']
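
When a group is needed only for repetition or alternation, a non-capturing group `(?:...)` avoids changing what `findall()` returns. A minimal comparison:

```python
import re

text = "abab ab"

capturing = re.findall(r"(ab)+", text)        # returns the group's last repetition
non_capturing = re.findall(r"(?:ab)+", text)  # returns the whole match

print(capturing)      # ['ab', 'ab']
print(non_capturing)  # ['abab', 'ab']
```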

## 12. Greedy vs Lazy Matching


**Greedy**
- Matches as much as possible

**Lazy**
- Matches as little as possible


In [14]:

re.findall(r"<.*>", "<a><b>")   # greedy: ['<a><b>'] (one match spanning both tags)
re.findall(r"<.*?>", "<a><b>")  # lazy: stops at the first ">"


['<a>', '<b>']
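
A common alternative to a lazy quantifier is a negated character class, which states directly what the match may not contain. A sketch comparing all three forms on the same input:

```python
import re

html = "<a><b>"

greedy = re.findall(r"<.*>", html)       # as much as possible
lazy = re.findall(r"<.*?>", html)        # as little as possible
negated = re.findall(r"<[^>]*>", html)   # anything except ">" between the brackets

print(greedy)   # ['<a><b>']
print(lazy)     # ['<a>', '<b>']
print(negated)  # ['<a>', '<b>']
```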

## 13. Lookarounds (Advanced)


**Lookahead / Lookbehind**
- Match based on context
- Do NOT consume characters
- Lookbehind patterns must be fixed-width in Python's `re`


In [15]:

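re.findall(r"\d+(?=px)", "10px 20px")  # ['10', '20']: lookahead, "px" is checked but not returned
re.findall(r"(?<=₹)\d+", "₹100 ₹200")  # lookbehind: "₹" is checked but not returned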


['100', '200']
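
Negative lookarounds, `(?!...)` and `(?<!...)`, work the same way but need more care: the engine can backtrack to a shorter match that satisfies the condition. A sketch of the pitfall and one possible fix, on made-up sample strings:

```python
import re

text = "10px 20em 30px"

# Naive: "\d+" backtracks from "10" to "1", which is followed by "0p", not "px".
naive = re.findall(r"\d+(?!px)", text)

# Fix: also forbid a following digit, so a number cannot be cut short.
not_px = re.findall(r"\d+(?!\d|px)", text)

# Same idea for negative lookbehind: numbers not preceded by "₹" or by a digit.
no_rupee = re.findall(r"(?<![₹\d])\d+", "₹100 200")

print(naive)     # ['1', '20', '3']
print(not_px)    # ['20']
print(no_rupee)  # ['200']
```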

## 14. Regex Flags


**Flags modify regex behavior**
- IGNORECASE
- MULTILINE
- DOTALL


In [16]:

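re.findall(r"cat", "Cat CAT", re.IGNORECASE)   # ['Cat', 'CAT']
re.findall(r"^\d+", "123\n456", re.MULTILINE)  # ['123', '456']: "^" matches at the start of every line
re.findall(r"a.*c", "a\nb\nc", re.DOTALL)      # "." also matches newlines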


['a\nb\nc']
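
Flags can be combined with `|`, or written inline at the start of the pattern. A minimal sketch showing that both spellings give the same result:

```python
import re

text = "Cat\ncat\nDOG"

# Combine flags with the bitwise OR operator.
combined = re.findall(r"^cat$", text, re.IGNORECASE | re.MULTILINE)

# Equivalent inline form: (?i) = IGNORECASE, (?m) = MULTILINE.
inline = re.findall(r"(?im)^cat$", text)

print(combined)  # ['Cat', 'cat']
print(inline)    # ['Cat', 'cat']
```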

## 15. Real-world Validations


**Typical validation patterns** (simplified: the email pattern is a quick sanity check, not a complete email validator)


In [17]:

re.fullmatch(r"[\w.-]+@[\w.-]+\.\w+", "test@example.com")  # Match object covering the whole string
re.fullmatch(r"\d{10}", "9876543210")                      # exactly 10 digits


<re.Match object; span=(0, 10), match='9876543210'>
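
In practice these patterns are usually wrapped in small predicate functions. The sketch below reuses the two patterns above; the names `EMAIL`, `PHONE`, `is_valid_email` and `is_valid_phone` are hypothetical. It also shows one input the simplified email pattern lets through:

```python
import re

EMAIL = re.compile(r"[\w.-]+@[\w.-]+\.\w+")
PHONE = re.compile(r"\d{10}")

def is_valid_email(value):
    """True if the whole string matches the simplified email pattern."""
    return EMAIL.fullmatch(value) is not None

def is_valid_phone(value):
    """True if the string is exactly 10 digits."""
    return PHONE.fullmatch(value) is not None

print(is_valid_phone("9876543210"))        # True
print(is_valid_phone("98765"))             # False
print(is_valid_email("test@example.com"))  # True
print(is_valid_email("a@b..c"))            # True (consecutive dots slip through)
```

For anything beyond a quick format check, a stricter pattern or a dedicated validation library is likely a better fit.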