# Regular Expressions Workshop (45 min)

**Instructor:**  
**Date:**  

---

**In this session, you will learn:**
- What Regular Expressions (regex) are and why they’re useful  
- Basic regex syntax and common metacharacters  
- Python’s `re` module: `search`, `match`, `findall`, `finditer`, `sub`, and flags  
- Grouping and capturing  
- Simple demos and hands‐on exercises

**Agenda (45 min):**
1. Introduction to Regex (5 min)  
2. Basic Syntax & Metacharacters (10 min)  
3. Python `re` Functions & Demos (10 min)  
4. Grouping & Capturing (5 min)  
5. Exercises (15 min)  
6. Wrap‐up (if time remains)


## Table of Contents

1. [Introduction to Regex](#intro)  
2. [Basic Syntax & Metacharacters](#syntax)  
3. [Python `re` Functions & Demos](#functions)  
4. [Grouping & Capturing](#groups)  
5. [Exercises](#exercises)  
6. [Next Steps & Wrap-up](#next)


<a id="intro"></a>  
## 1. Introduction to Regex (5 min)

- **What is a Regular Expression?**  
  A concise way to describe patterns in text (strings).  
  Used for searching, validating, and manipulating text.  

- **Why learn regex?**  
  - Quickly find phone numbers, email addresses, or dates in large text.  
  - Validate user input (e.g. “is this a valid email?”).  
  - Perform search‐and‐replace based on patterns rather than fixed substrings.  

- **Regex in Python** lives in the built‐in `re` module.  
  Common workflow:  
  1. Import `re`.  
  2. Write a pattern (as a raw string: `r"..."`).  
  3. Use functions like `re.search`, `re.match`, `re.findall`, `re.sub`.
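
The three-step workflow can be sketched as follows. The date pattern and the sample sentence are invented for illustration; `\d` and the `{4}` quantifier are explained in Section 2.

```python
import re  # Step 1: import the built-in module

# Step 2: write the pattern as a raw string so "\d" reaches the regex engine unchanged.
pattern = r"\d{4}-\d{2}-\d{2}"  # an ISO-style date such as 2024-05-17

text = "The workshop was rescheduled from 2024-05-17 to 2024-06-03."

# Step 3: apply a function from the module.
first = re.search(pattern, text)   # first match as a Match object, or None
print(first.group())               # 2024-05-17
print(re.findall(pattern, text))   # ['2024-05-17', '2024-06-03']
```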

<a id="syntax"></a>  
## 2. Basic Syntax & Metacharacters (10 min)

Below are the most common building blocks:

1. **Literal Characters**  
   - Letters, digits, and most punctuation match themselves.  
     e.g. the pattern `cat` matches the substring `"cat"` anywhere in the text, including inside `"concatenate"`.

2. **Metacharacters** (special symbols):
   - `.`   : Matches any single character except newline.  
   - `^`   : Start of string (or start of a line in multiline mode).  
   - `$`   : End of string, or just before a final newline (or end of any line in multiline mode).  
   - `*`   : 0 or more of the preceding element.  
   - `+`   : 1 or more of the preceding element.  
   - `?`   : 0 or 1 of the preceding element (makes it optional).  
   - `{m,n}` : Between m and n of the preceding element.  
   - `[]`   : Character class (match any one inside).  
   - `|`   : Alternation (either/or).  
   - `\`   : Escapes a metacharacter so it matches literally (e.g. `\.` matches a dot) or introduces a shorthand (see below).

3. **Character Classes & Shorthands**  
   - `[abc]`   : matches `a` or `b` or `c`.  
   - `[0-9]`   : matches any digit.  
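   - `\d`   : any digit; on ASCII text, the same as `[0-9]` (in Python 3 it also matches other Unicode digits unless `re.ASCII` is set).  
   - `\D`   : non-digit (anything `\d` does not match).  
   - `\w`   : word character (letter, digit, or underscore).  
   - `\W`   : non-word character.  
   - `\s`   : whitespace (e.g. space, tab, newline).  
   - `\S`   : non-whitespace.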

4. **Quantifiers**  
   - `a*`  : zero or more `a`.  
   - `a+`  : one or more `a`.  
   - `a?`  : zero or one `a`.  
   - `a{3}`: exactly three `a`’s.  
   - `a{2,5}`: between 2 and 5 `a`’s.

5. **Anchors**  
   - `^abc` : matches `"abc"` at the very start of the string.  
   - `xyz$` : matches `"xyz"` at the very end of the string.
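
The building blocks above can be tried out with `re.findall`, which is covered in Section 3. The sample strings are made up purely to make each rule visible; this is a quick sketch, not an exhaustive tour.

```python
import re

text = "cat concatenate cab cot c4t"

# Literal characters: the pattern "cat" also matches inside "concatenate".
print(re.findall(r"cat", text))        # ['cat', 'cat']

# "." matches any single character except a newline.
print(re.findall(r"c.t", text))        # ['cat', 'cat', 'cot', 'c4t']

# Character class: only "a" or "o" may appear between "c" and "t".
print(re.findall(r"c[ao]t", text))     # ['cat', 'cat', 'cot']

# Shorthand class: \d matches a digit.
print(re.findall(r"c\dt", text))       # ['c4t']

# Alternation: either "cab" or "cot".
print(re.findall(r"cab|cot", text))    # ['cab', 'cot']

# Quantifiers apply to the element right before them (here, "a").
words = "b ba baa baaa"
print(re.findall(r"ba*", words))       # ['b', 'ba', 'baa', 'baaa']
print(re.findall(r"ba+", words))       # ['ba', 'baa', 'baaa']
print(re.findall(r"ba{2,3}", words))   # ['baa', 'baaa']

# Anchors: ^ ties the match to the start of the string, $ to the end.
print(re.search(r"^abc", "abcdef") is not None)  # True
print(re.search(r"^abc", "xabc") is not None)    # False
print(re.search(r"xyz$", "wxyz") is not None)    # True
print(re.search(r"xyz$", "xyzw") is not None)    # False
```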

<a id="functions"></a>  
## 3. Python `re` Functions & Demos (10 min)

- `re.search(pattern, string)`  
  → Scans the whole string and returns the first `Match` object, or `None` if nothing matches.

- `re.match(pattern, string)`  
  → Tries to match only at the beginning of `string`; returns a `Match` or `None`. To require the *entire* string to match, use `re.fullmatch`.

- `re.findall(pattern, string)`  
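  → Returns a list of **all** non-overlapping matches as strings. If the pattern contains capturing groups, it returns the groups' text instead (see Section 4).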

- `re.finditer(pattern, string)`  
  → Returns an iterator of `Match` objects (useful for positions or groups).

- `re.sub(pattern, repl, string)`  
  → Replaces all occurrences of `pattern` in `string` with `repl`.

- **Flags** (passed as an extra argument, e.g. `re.search(pattern, string, flags=re.IGNORECASE)`; combine several with `|`):  
  - `re.IGNORECASE` (or `re.I`): case-insensitive.  
  - `re.MULTILINE` (or `re.M`): `^`/`$` match start/end of **each line**.  
  - `re.DOTALL` (or `re.S`): `.` also matches newline.
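
A short demo of each function and flag above, run against a made-up sample sentence. It is a sketch for the live session; the variable names are arbitrary.

```python
import re

s = "Order 66 shipped; order 12 pending; ORDER 7 lost."

# search: scans the whole string and returns the first Match (or None).
m = re.search(r"\d+", s)
print(m.group(), m.start())        # 66 6

# match: only tries position 0; s starts with "Order", so there is no match.
print(re.match(r"\d+", s))         # None

# findall: all non-overlapping matches, as a list of strings.
print(re.findall(r"\d+", s))       # ['66', '12', '7']

# finditer: one Match object per hit, handy when you need positions.
for hit in re.finditer(r"\d+", s):
    print(hit.group(), hit.span())

# sub: replace every match with the replacement string.
print(re.sub(r"\d+", "#", s))      # Order # shipped; order # pending; ORDER # lost.

# IGNORECASE: "order" matches every capitalization.
print(re.findall(r"order", s))                       # ['order']
print(re.findall(r"order", s, flags=re.IGNORECASE))  # ['Order', 'order', 'ORDER']

two_lines = "first line\nsecond line"
# MULTILINE: ^ matches at the start of every line, not just the string.
print(re.findall(r"^\w+", two_lines))                      # ['first']
print(re.findall(r"^\w+", two_lines, flags=re.MULTILINE))  # ['first', 'second']
# DOTALL: "." may also match the newline.
print(re.search(r"line.second", two_lines) is not None)                   # False
print(re.search(r"line.second", two_lines, flags=re.DOTALL) is not None)  # True
```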

<a id="groups"></a>  
## 4. Grouping & Capturing (5 min)

- **Parentheses** `( … )` create a **capturing group**.  
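  - The text matched by each group is available from the match object via `.group(i)`: groups are numbered from 1, left to right, and `.group(0)` is the whole match. When the pattern has groups, `findall()` returns the group text instead of the full match (as tuples if there are two or more groups).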

- **Non‐capturing group** `(?: … )` matches without capturing.  
  - Useful when you need grouping for a quantifier or alternation but don’t need to keep the contents.

- **Examples**:  
  - `r"(foo|bar)"` → matches “foo” or “bar” and captures which one.  
  - `r"(?:foo|bar)"` → matches “foo” or “bar” without capturing for later.

- **Accessing Group Data**:  
  ```python
  import re

  m = re.search(r"(\d{3})-(\d{2})-(\d{4})", "SSN: 123-45-6789")
  m.group(1)  # "123"
  m.group(2)  # "45"
  m.group(3)  # "6789"
  m.group(0)  # full match: "123-45-6789"
  m.groups()  # all groups as a tuple: ("123", "45", "6789")
  ```
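
The difference between capturing and non-capturing groups shows up most clearly in `findall()`. A small sketch on an invented string:

```python
import re

text = "foo1 bar2 baz3 foo4"

# One capturing group: findall returns only the group's text.
print(re.findall(r"(foo|bar)\d", text))    # ['foo', 'bar', 'foo']

# Non-capturing group: findall returns the full match.
print(re.findall(r"(?:foo|bar)\d", text))  # ['foo1', 'bar2', 'foo4']

# Two capturing groups: findall returns one tuple per match.
print(re.findall(r"(foo|bar)(\d)", text))  # [('foo', '1'), ('bar', '2'), ('foo', '4')]

# Non-capturing group for a quantifier: repeat "ab" as a unit.
print(re.fullmatch(r"(?:ab)+", "ababab") is not None)  # True
```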

<a id="exercises"></a>  
## 5. Exercises (15 min)

Try these short exercises on your own before looking at any solutions.

---

### Exercise 1: Find All Phone Numbers

- **Prompt:**  
  Given the list of strings below, write a regex to extract all US‐style phone numbers of the form `XXX-XXX-XXXX`.

```python
lines = [
    "Call me at 555-123-4567 tomorrow.",
    "Emergency: 911 is for police, but 800-555-1212 for toll-free.",
    "No number here.",
    "Alternate: (555) 765-4321 or 555.987.6543"
]
```

- **Task:**  
  1. Extract only the dash‐separated numbers (`555-123-4567`, `800-555-1212`).  
  2. Ignore formats with parentheses or dots.
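
**Possible solution (attempt the exercise first).** One minimal approach uses `re.findall` on each line; it is a sketch, and other patterns work too.

```python
import re

lines = [
    "Call me at 555-123-4567 tomorrow.",
    "Emergency: 911 is for police, but 800-555-1212 for toll-free.",
    "No number here.",
    "Alternate: (555) 765-4321 or 555.987.6543",
]

# Three digits, dash, three digits, dash, four digits. The dashes are literal,
# so "(555) 765-4321" and "555.987.6543" cannot match.
pattern = r"\d{3}-\d{3}-\d{4}"

phones = []
for line in lines:
    phones.extend(re.findall(pattern, line))

print(phones)  # ['555-123-4567', '800-555-1212']
```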

---

### Exercise 2: Validate Simple Email Addresses

- **Prompt:**  
  Write a regex that matches an email address if it has:
  - One or more word characters (`\w+`)  
  - The `@` symbol  
  - One or more word characters (`\w+`)  
  - A dot `.`  
  - A two‐ or three‐letter TLD (`[a-zA-Z]{2,3}`)

```python
candidates = [
    "alice@example.com",
    "bob@site.org",
    "invalid@no-tld",
    "john.smith@company.co",
    "@missinguser.com",
    "jane@domain.c"
]
```

- **Task:**  
  1. Use `re.fullmatch` (or `re.match` with `$` at the end of the pattern) so that the entire string must fit the pattern; `re.match` alone only anchors the start.
  2. Print which candidates are “valid” and which are not.
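
**Possible solution (attempt the exercise first).** A sketch that builds the pattern straight from the prompt and checks each candidate with `re.fullmatch`:

```python
import re

candidates = [
    "alice@example.com",
    "bob@site.org",
    "invalid@no-tld",
    "john.smith@company.co",
    "@missinguser.com",
    "jane@domain.c",
]

# Word chars, "@", word chars, a literal dot, then a 2-3 letter TLD.
email = r"\w+@\w+\.[a-zA-Z]{2,3}"

for c in candidates:
    # fullmatch succeeds only if the ENTIRE string fits the pattern.
    status = "valid" if re.fullmatch(email, c) else "invalid"
    print(f"{c}: {status}")

# Plain re.match only anchors the start, so a longer TLD slips through:
print(re.match(email, "alice@example.info") is not None)      # True (matches "alice@example.inf")
print(re.fullmatch(email, "alice@example.info") is not None)  # False
```

Only `alice@example.com` and `bob@site.org` pass. Note that `john.smith@company.co` is rejected because `\w` does not match `.`; this simple pattern is a teaching exercise, not a real email validator.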

---

### Exercise 3: Replace Whitespace Sequences

- **Prompt:**  
  Given a messy string that has multiple spaces, tabs, and newlines, replace **any sequence** of whitespace characters with a single space.

```python
messy = "This   is\t\tan example.\nNew     lines and    spaces.\n\tEnd."
```

- **Task:**  
  1. Write a regex to match one or more whitespace (`\s+`).  
  2. Use `re.sub` to turn every whitespace sequence into a single `" "`.  
  3. Print the cleaned‐up string.
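
**Possible solution (attempt the exercise first).** A one-line `re.sub` sketch:

```python
import re

messy = "This   is\t\tan example.\nNew     lines and    spaces.\n\tEnd."

# \s+ treats each run of spaces, tabs, and newlines as a single match.
clean = re.sub(r"\s+", " ", messy)
print(clean)  # This is an example. New lines and spaces. End.
```

If the input started or ended with whitespace, the result would keep one leading or trailing space; calling `.strip()` on the result removes it.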

---

<a id="next"></a>  
## 6. Next Steps & Wrap‐up

- **Key Takeaways:**  
  - Regex is a powerful way to describe text patterns.  
  - Learn the common metacharacters (`. ^ $ * + ? {m,n} [ ] | ( )`) and shorthand classes (`\d \w \s`).  
  - Use Python’s `re` module:  
    - `search`, `match`, `findall`, `finditer`, `sub`  
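    - Use raw strings (`r"..."`) for patterns so Python doesn’t treat backslash sequences as string escapes (in a normal string, `"\b"` becomes a backspace character instead of a word boundary).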

- **Practice More:**  
  - Validate phone numbers in different formats (e.g., with parentheses, dots).  
  - Extract URLs (e.g., `https?://\S+`).  
  - Work with log files to parse timestamps.  
  - Explore lookahead and lookbehind, `(?=...)` and `(?<=...)`, an advanced topic for next time.
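
The raw-string takeaway is easiest to see with `\b`, the word-boundary assertion (not covered in Section 2). A small sketch:

```python
import re

# In a normal string literal, Python turns "\b" into one backspace character (\x08).
print(len("\b"), len(r"\b"))  # 1 2

# Without r"...", the regex engine receives backspace characters, so nothing matches.
print(re.search("\bcat\b", "the cat sat"))  # None

# With r"...", re sees \b (a word boundary), so "cat" is found as a whole word.
print(re.search(r"\bcat\b", "the cat sat").group())  # cat
```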

**Congratulations!** You’ve completed a 45-minute introduction to Python Regular Expressions.  
Feel free to revisit the exercises or try out your own patterns on real datasets.