### 1. Introduction to Regular Expressions (`re`)

A **Regular Expression** (or Regex) is a sequence of characters that defines a search pattern, used to find a string or set of strings within text.

* **The Module:** Python has a built-in module called `re`.
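* **The Raw String Rule:** Always use **Raw Strings** (`r"pattern"`) for regex patterns. A raw string keeps backslashes as-is, so Python's own string escapes cannot clash with regex escapes. For example, in a normal string `"\b"` is a backspace character, while in a raw string `r"\b"` reaches the regex engine as a word boundary.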

```python
import re

text = "My phone number is 123-456-7890."

# Simple pattern to find the phone number
# \d means "digit", {3} means "3 times"
pattern = r"\d{3}-\d{3}-\d{4}"

match = re.search(pattern, text)
print(match.group())  # Output: 123-456-7890

```
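
To see why the raw-string rule matters, here is a minimal sketch (the sample text is made up) that uses the word-boundary pattern `\b` from Section 5 with and without the `r` prefix:

```python
import re

text = "a cat here"

# In a normal string, "\b" is a single backspace character (ASCII 8)
normal = "\bcat\b"
# In a raw string, the backslash is kept, so the regex engine sees \b (word boundary)
raw = r"\bcat\b"

print(len("\b"), len(r"\b"))     # 1 2
print(re.findall(normal, text))  # [] (it looks for literal backspace characters)
print(re.findall(raw, text))     # ['cat']
```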

---

### 2. Basic Metacharacters

Metacharacters are characters with special meanings. They are the building blocks of regex.

| Character | Meaning | Example | Matches |
| --- | --- | --- | --- |
| `.` | **Wildcard** (Any character except newline) | `h.t` | hat, hot, hit, h@t |
| `^` | **Starts With** | `^Hello` | Only if string starts with "Hello" |
| `$` | **Ends With** | `World$` | Only if string ends with "World" |
| `\` | **Escape** | `\.` | Matches a literal dot `.` |
| `\|` | **OR** (Either/or) | `cat\|dog` | cat or dog |

```python
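import re

s = "cat mat bat rat"

# '|' means OR: match "cat" or "mat"
results = re.findall(r"cat|mat", s)
print(results)  # ['cat', 'mat']

# Same result with a character class (Section 3): 'c' or 'm', then "at"
print(re.findall(r"[cm]at", s))  # ['cat', 'mat']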

```
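
The remaining rows of the table can be tried the same way. This is a small sketch with invented sample strings; `bool()` just turns the Match object (or `None`) returned by `re.search()` into `True`/`False`:

```python
import re

# '.' matches any single character except a newline
dots = re.findall(r"h.t", "hat hot h@t ht")
print(dots)  # ['hat', 'hot', 'h@t']

# '^' and '$' anchor the match to the start and end of the string
starts = bool(re.search(r"^Hello", "Hello World"))   # True
not_start = bool(re.search(r"^Hello", "Say Hello"))  # False
ends = bool(re.search(r"World$", "Hello World"))     # True

# '\.' matches a literal dot instead of "any character"
versions = re.findall(r"\d\.\d", "v1.2 and v3x4")
print(versions)  # ['1.2']
# Without the backslash, the dot would also accept 'x'
print(re.findall(r"\d.\d", "v1.2 and v3x4"))  # ['1.2', '3x4']
```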

---

### 3. Character Classes (Sets) `[]`

Square brackets `[]` allow you to define a **set** of characters to match.

* `[abc]`: Match 'a', 'b', or 'c'.
* `[a-z]`: Match any lowercase letter.
* `[A-Z]`: Match any uppercase letter.
* `[0-9]`: Match any digit.
* `[^abc]`: **Negation**. Match anything EXCEPT 'a', 'b', or 'c'.

```python
import re

text = "The price is $100"

# Match the '$' followed by digits
# We escape \$ because $ is a metacharacter
x = re.findall(r"\$\d+", text)
print(x) # ['$100']

```
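
A short sketch of each set form from the list above, run against a made-up sentence. It also shows that most metacharacters (such as `$` and `.`) are treated literally inside `[]`:

```python
import re

text = "Agent 007 met Bond at 9pm!"

vowels = re.findall(r"[aeiou]", "regex")       # ['e', 'e']
lowercase_runs = re.findall(r"[a-z]+", text)   # runs of lowercase letters
capitals = re.findall(r"[A-Z]", text)          # ['A', 'B']
digits = re.findall(r"[0-9]+", text)           # ['007', '9']

# Negation: any character that is NOT a letter, digit, or space
symbols = re.findall(r"[^a-zA-Z0-9 ]", text)   # ['!']

# Inside [], '$' and '.' lose their special meaning and match literally
prices = re.findall(r"[$.0-9]+", "Total: $9.99")  # ['$9.99']

print(lowercase_runs)  # ['gent', 'met', 'ond', 'at', 'pm']
```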

---

### 4. Quantifiers (How many?)

Quantifiers specify how many times the preceding character, set, or group must occur.

| Symbol | Name | Description |
| --- | --- | --- |
| `*` | **Star** | 0 or more times. |
| `+` | **Plus** | 1 or more times. |
| `?` | **Question** | 0 or 1 time (Optional). |
| `{n}` | **Exact** | Exactly `n` times. |
| `{n,m}` | **Range** | Between `n` and `m` times (inclusive). |

All of these quantifiers are **greedy** by default: they match as many characters as possible.

```python
import re

# 'a' followed by zero or more 'b's
print(re.findall(r"ab*", "a ab abb abbb"))
# Output: ['a', 'ab', 'abb', 'abbb']

# 'a' followed by ONE or more 'b's
print(re.findall(r"ab+", "a ab abb abbb"))
# Output: ['ab', 'abb', 'abbb'] (the lone 'a' has no 'b', so it is skipped)

```
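
The other quantifiers in the table work the same way. A minimal sketch with invented inputs, including one line that shows greedy matching:

```python
import re

# '?': the 'u' is optional
spellings = re.findall(r"colou?r", "color colour colouur")  # ['color', 'colour']

# {n}: exactly n repetitions
years = re.findall(r"\d{4}", "in 1999 and 2024, not 99")    # ['1999', '2024']

# {n,m}: between n and m repetitions (inclusive)
short_nums = re.findall(r"\d{2,3}", "7 42 123")             # ['42', '123']

# Greedy: '+' grabs as many 'a's as it can in a single match
greedy = re.findall(r"a+", "caaat")                         # ['aaa'], not ['a', 'a', 'a']

print(spellings, years, short_nums, greedy)
```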

---

### 5. Special Sequences (Shortcuts)

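These shortcuts are among the most frequently used tools in regex.

* `\d`: Any Digit (for ASCII text, the same as `[0-9]`).
* `\D`: Any **Non-digit**.
* `\w`: Any Alphanumeric (word) character (for ASCII text, the same as `[a-zA-Z0-9_]`).
* `\W`: Any **Non-word** character (symbols, spaces).
* `\s`: Any Whitespace (space, tab, newline, and similar).
* `\S`: Any **Non-whitespace**.
* `\b`: **Word Boundary**. Matches the empty string at the beginning or end of a word.

With `str` patterns, Python 3 makes these shortcuts Unicode-aware, so `\d` and `\w` also match non-ASCII digits and letters (such as `é`). Pass the `re.ASCII` flag to restrict them to the ASCII sets above.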

```python
import re

text = "I have 2 cats and 1 dog."

# Extract all numbers
nums = re.findall(r"\d+", text)
print(nums)  # ['2', '1']

# Extract all "words" (runs of \w characters, so digits count too)
words = re.findall(r"\w+", text)
print(words) # ['I', 'have', '2', 'cats', 'and', '1', 'dog']

```
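
The `\b` shortcut is easiest to see with a replacement. This sketch uses `re.sub()` (covered in Section 6) on a made-up sentence to compare a pattern with and without word boundaries:

```python
import re

text = "cat concatenate bobcat cat."

# Without \b, 'cat' is also replaced inside longer words
naive = re.sub(r"cat", "dog", text)
# With \b on both sides, only the standalone word 'cat' is replaced
bounded = re.sub(r"\bcat\b", "dog", text)

print(naive)    # dog condogenate bobdog dog.
print(bounded)  # dog concatenate bobcat dog.
```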

---

### 6. The Big 4 Functions

These four functions cover most everyday regex tasks.

#### A. `re.findall(pattern, string)` - The Data Miner

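Returns **all** non-overlapping matches as a **list of strings**. If the pattern contains capturing groups `()`, it returns the captured groups instead (a list of tuples when there is more than one group; see Section 7).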

* *Use case:* Extracting all emails, hashtags, or phone numbers from a text.

#### B. `re.search(pattern, string)` - The Scout

Scans the string for the **first** location where the pattern produces a match.

* Returns a **Match Object** if found, or `None` if not.
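* Call `.group()` on the Match Object to get the matched text. Check for `None` first, because calling `.group()` on `None` raises an `AttributeError`.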
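
A minimal sketch of working with the Match Object that `re.search()` returns, including the `None` case (sample strings are made up). `.span()` gives the start and end indexes of the match:

```python
import re

m = re.search(r"\d+", "Order 66 shipped")
if m:  # check for None before calling .group()
    print(m.group())  # 66
    print(m.span())   # (6, 8): start and end index of the match

missing = re.search(r"\d+", "no digits here")
print(missing)  # None
```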

#### C. `re.match(pattern, string)` - The Gatekeeper

Checks for a match **only at the beginning** of the string.

* *Difference:* `search()` scans the whole string; `match()` only tries a match starting at index 0.
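
The difference between `search()` and `match()` in a quick sketch:

```python
import re

text = "Hello World"

at_start = re.match(r"Hello", text)      # found: 'Hello' begins at index 0
not_at_start = re.match(r"World", text)  # None: 'World' does not begin at index 0
anywhere = re.search(r"World", text)     # search() scans the whole string

print(at_start.group())  # Hello
print(not_at_start)      # None
print(anywhere.group())  # World
```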

#### D. `re.sub(pattern, replacement, string)` - The Cleaner

Returns a new string in which every occurrence of the pattern is replaced with the replacement string.

```python
import re

text = "Contact: test@email.com or admin@site.org"

# 1. findall
emails = re.findall(r"[\w\.-]+@[\w\.-]+", text)
print(emails)
# ['test@email.com', 'admin@site.org']

# 2. sub (Redact emails)
redacted = re.sub(r"[\w\.-]+@[\w\.-]+", "[REDACTED]", text)
print(redacted)
# "Contact: [REDACTED] or [REDACTED]"

```
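
`re.sub()` also accepts a `count` keyword argument to limit how many replacements are made, and the replacement can be a function that receives each Match Object and returns the replacement text. A short sketch with made-up strings:

```python
import re

# count=1: replace only the first occurrence
first_only = re.sub(r"\d", "#", "a1b2c3", count=1)
print(first_only)  # a#b2c3

# A function as the replacement: it receives each Match Object
# and returns the text to put in its place (here: double every number)
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```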

---

### 7. Groups `()` and Capturing

Parentheses `()` allow you to group parts of a pattern together. This is crucial for extracting specific segments (like the domain of an email).

```python
import re

text = "John Doe (Manager)"

# Group 1: Name, Group 2: Role
pattern = r"(\w+\s\w+)\s\((.+)\)"

match = re.search(pattern, text)
if match:
    print(match.group(0)) # The entire match: "John Doe (Manager)"
    print(match.group(1)) # Group 1: "John Doe"
    print(match.group(2)) # Group 2: "Manager"

```
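
Groups also change what `re.findall()` returns: with one group it returns that group's text, and with several it returns tuples. This sketch reuses the email text from Section 6 to pull out just the domains, as mentioned above:

```python
import re

text = "Contact: test@email.com or admin@site.org"

# One capturing group around the part after '@':
# findall returns only that group's text
domains = re.findall(r"[\w.-]+@([\w.-]+)", text)
print(domains)  # ['email.com', 'site.org']

# Two groups: findall returns a list of tuples
pairs = re.findall(r"([\w.-]+)@([\w.-]+)", text)
print(pairs)  # [('test', 'email.com'), ('admin', 'site.org')]

# On a single Match Object, .groups() returns all groups as a tuple
first = re.search(r"([\w.-]+)@([\w.-]+)", text)
print(first.groups())  # ('test', 'email.com')
```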

---

### 8. Compilation for Performance

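If you use the same regex pattern many times (e.g., inside a loop over 1 million lines), compile it once with `re.compile()` and reuse the resulting Pattern object. The module-level functions such as `re.search()` already cache recently used patterns, so the speedup is often modest; compiling mainly skips the repeated cache lookup and gives the pattern a clear, reusable name.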

```python
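import re

# Works, but each call fetches the compiled pattern from re's internal cache
# (recompiling it if it was evicted)
# re.search(r"\d+", line)

# Compile once, then reuse the Pattern object
pattern = re.compile(r"\d+")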

lines = ["Line 1", "Line 2", "No numbers here"]
for line in lines:
    if pattern.search(line):
        print("Match found")

```
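
A compiled Pattern object offers the same methods as the module (`search()`, `findall()`, `sub()`, and so on), without the pattern argument. A small sketch with invented log lines:

```python
import re

# Compile once; reuse the Pattern object for every line
phone = re.compile(r"\d{3}-\d{3}-\d{4}")

lines = ["call 123-456-7890", "no phone here", "fax 555-000-1111"]

# Pattern.search() returns a Match Object or None, just like re.search()
found = []
for line in lines:
    m = phone.search(line)
    if m:
        found.append(m.group())
print(found)  # ['123-456-7890', '555-000-1111']

# Pattern.sub() works like re.sub() without repeating the pattern
masked = [phone.sub("[PHONE]", line) for line in lines]
print(masked)  # ['call [PHONE]', 'no phone here', 'fax [PHONE]']
```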