# RegEx (Regular Expressions) in Python

- What is RegEx?
- Metacharacters used in RegEx
- The `re` module
- Usage of the `re` module and available functions

## What is RegEx?

- A sequence of characters that defines a search pattern.
- The pattern mixes letters, numbers and symbols to describe the search criteria.
- **RegEx is case-sensitive by default**

    **Example**

    ```regex
    ^a...s$
    ```

    - The code above defines a pattern: **any five-character string that starts with `a` and ends with `s`.**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `^a...s$` | abs | No match |
    | `^a...s$` | alias | Match | 
    | `^a...s$` | abyss | Match |
    | `^a...s$` | Alias | No match |
    | `^a...s$` | An abacus | No match |
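
    The same check can be run in Python with the standard `re` module. This is a minimal sketch: `re.match` only looks for a match at the start of the string, and the `strings` and `results` names are illustrative.

    ```python
    import re

    pattern = r"^a...s$"  # 'a', then any three characters, then 's' at the end
    strings = ["abs", "alias", "abyss", "Alias", "An abacus"]

    # re.match returns a Match object on success and None otherwise
    results = {s: re.match(pattern, s) is not None for s in strings}

    for s, matched in results.items():
        print(f"{s!r}: {'Match' if matched else 'No match'}")
    ```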

## Specifying patterns using RegEx

**Metacharacters:**

- Are interpreted in a special way by the RegEx engine instead of being matched literally

`[] . ^ $ * + ? {} () \ |`

1. `[]` - Square brackets

    - Specify a set of characters that you wish to match
    - The characters can appear in the string in any order
    - Each occurrence of any character in the set counts as one match

    **Usage:**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `[abc]` | a | 1 match |
    | `[abc]` | ac | 2 matches |
    | `[abc]` | Hey Jude | No match |
    | `[abc]` | abc de ca | 5 matches |

    - You can also specify a range of characters using `-` inside the square brackets:

        - `[a-e]` is the same as `[abcde]`
        - `[1-4]` is the same as `[1234]`
        - `[0-39]` is the same as `[01239]`: the range `0-3` plus the single character `9`, not the numbers 0 to 39
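
    A minimal sketch of these rules using `re.findall`, which returns every non-overlapping match as a list, so the list length is the match count shown in the table. The strings `"hello bed"` and `"0 4 9 39"` are made-up samples.

    ```python
    import re

    # Every a, b or c is a separate match
    abc_matches = re.findall(r"[abc]", "abc de ca")
    no_matches = re.findall(r"[abc]", "Hey Jude")

    # [a-e] and [abcde] describe the same set of characters
    same_set = re.findall(r"[a-e]", "hello bed") == re.findall(r"[abcde]", "hello bed")

    # A range covers single characters only: [0-39] means 0, 1, 2, 3 or 9
    digits = re.findall(r"[0-39]", "0 4 9 39")

    print(abc_matches)  # ['a', 'b', 'c', 'c', 'a']
    print(no_matches)   # []
    print(same_set)     # True
    print(digits)       # ['0', '9', '3', '9']
    ```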

2. `.` - Period

    - Matches any single character (except newline `\n`)
    - Each match consumes exactly as many characters as there are periods, so the string must be at least that long

    **Usage:**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `..` | a | No match |
    | `..` | ac | 1 match |
    | `..` | acd | 1 match |
    | `..` | acde | 2 matches |
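
    A sketch that counts `..` matches with `re.findall`. The last two lines use the `re.DOTALL` flag from the standard `re` module to show the newline exception; the string `"a\nb"` is a made-up sample.

    ```python
    import re

    # Each '..' match consumes two characters, and matches never overlap
    counts = {s: len(re.findall(r"..", s)) for s in ["a", "ac", "acd", "acde"]}
    print(counts)  # {'a': 0, 'ac': 1, 'acd': 1, 'acde': 2}

    # '.' skips the newline character unless re.DOTALL is set
    newline_default = re.findall(r"..", "a\nb")
    newline_dotall = re.findall(r"..", "a\nb", re.DOTALL)
    print(newline_default)  # []
    print(newline_dotall)   # ['a\n']
    ```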

3. `^` -  Caret

    - Checks whether a string starts with a certain character or sequence of characters
    - The characters after `^` must appear in exactly that order at the start of the string

    **Usage:**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `^a` | a | 1 match |
    | `^a` | abc | 1 match |
    | `^ab` | abc | 1 match |
    | `^a` | bac | No match |
    | `^ab` | abab | 1 match |
    | `^ab` | acb | No match |
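
    A minimal sketch reproducing the table with `re.search`, which scans the whole string; the `^` anchor is what restricts the match to position 0. The `cases` list simply mirrors the rows above.

    ```python
    import re

    # (pattern, string) pairs taken from the table
    cases = [("^a", "a"), ("^a", "abc"), ("^ab", "abc"),
             ("^a", "bac"), ("^ab", "abab"), ("^ab", "acb")]
    results = [re.search(p, s) is not None for p, s in cases]
    print(results)  # [True, True, True, False, True, False]

    # "ab" occurs twice in "abab", but only the occurrence at the start matches
    abab_matches = re.findall(r"^ab", "abab")
    print(abab_matches)  # ['ab']
    ```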

4. `$` - Dollar

    - Checks whether a string ends with a certain character or sequence of characters
    - The characters before `$` must appear in exactly that order at the end of the string

    **Usage:**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `a$` | a | 1 match |
    | `a$` | formula | 1 match |
    | `a$` | formula one | No match | 
    | `re$` | fire | 1 match |
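
    The same rows checked in Python, as a small sketch. The last two lines combine `^` and `$` so that the pattern must cover the whole string; `"fire truck"` is a made-up sample.

    ```python
    import re

    # (pattern, string) pairs taken from the table
    cases = [("a$", "a"), ("a$", "formula"), ("a$", "formula one"), ("re$", "fire")]
    results = [re.search(p, s) is not None for p, s in cases]
    print(results)  # [True, True, False, True]

    # With both anchors, nothing may come before or after "fire"
    whole = re.search(r"^fire$", "fire") is not None        # True
    partial = re.search(r"^fire$", "fire truck") is not None  # False
    ```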

5. `*` - Asterisk (Star)

    - It matches zero or more occurrences of the pattern to the left of it.

    **Usage:**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `ma*n` | mn | 1 match |
    | `ma*n` | man | 1 match |
    | `ma*n` | maaaan | 1 match |
    | `ma*n` | main | No match (a is not followed by n) | 
    | `ma*n` | woman | 1 match |
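
    A sketch listing the actual matches with `re.findall`; an empty list means no match. The strings are the ones from the table.

    ```python
    import re

    strings = ["mn", "man", "maaaan", "main", "woman"]
    # Every non-overlapping match of ma*n in each string
    found = {s: re.findall(r"ma*n", s) for s in strings}
    print(found)
    # {'mn': ['mn'], 'man': ['man'], 'maaaan': ['maaaan'], 'main': [], 'woman': ['man']}
    ```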


6. `+` - Plus

    - Matches one or more occurrences of the pattern to the left of it

    **Usage:**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `ma+n` | mn | No match |
    | `ma+n` | man | 1 match |
    | `ma+n` | mailed | No match |
    | `ma+n` | many | 1 match |
    | `ma+n` | mason | No match |
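
    A sketch of the same idea for `+`, again with `re.findall` on the table's strings. The last line contrasts it with `*` on `"mn"`, where `+` needs at least one `a` and `*` does not.

    ```python
    import re

    strings = ["mn", "man", "mailed", "many", "mason"]
    found = {s: re.findall(r"ma+n", s) for s in strings}
    print(found)
    # {'mn': [], 'man': ['man'], 'mailed': [], 'many': ['man'], 'mason': []}

    # * accepts zero a's, + requires at least one
    star_vs_plus = (re.findall(r"ma*n", "mn"), re.findall(r"ma+n", "mn"))
    print(star_vs_plus)  # (['mn'], [])
    ```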

7. `?` - Question Mark

    - Matches zero or one occurrence of the pattern to the left of it

    **Usage:**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `ma?n` | mn | 1 match |
    | `ma?n` | man | 1 match |
    | `ma?n` | maaaaan | No match (a appears more than once) |
    | `ma?n` | main | No match | 
    | `ma?n` | woman | 1 match |
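
    A sketch for `?` on the table's strings. The final line contrasts it with `*` on `"maaaaan"`: `?` allows at most one `a`, so only `*` matches there.

    ```python
    import re

    strings = ["mn", "man", "maaaaan", "main", "woman"]
    found = {s: re.findall(r"ma?n", s) for s in strings}
    print(found)
    # {'mn': ['mn'], 'man': ['man'], 'maaaaan': [], 'main': [], 'woman': ['man']}

    # ? allows at most one 'a'; * allows any number
    question_vs_star = (re.findall(r"ma?n", "maaaaan"), re.findall(r"ma*n", "maaaaan"))
    print(question_vs_star)  # ([], ['maaaaan'])
    ```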

8. `{}` - Braces 

    - The syntax is `{n,m}`: at least `n` and at most `m` repetitions of the pattern to the left of it.
    - Write it without spaces: Python's `re` reads `{2, 4}` as literal text, not as a repetition.
    
    **Usage:**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `a{2,4}` | abc dat | No match |
    | `a{2,4}` | abc daat | 1 match (at d**aa**t) |
    | `a{2,4}` | aabc daaat | 2 matches (at **aa**bc and d**aaa**t) |
    | `a{2,4}` | aabc daaaat | 2 matches (at **aa**bc and d**aaaa**t) |
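
    A sketch of the table using `re.findall`. The last line is meant to show the spacing pitfall: with a space inside the braces, the pattern looks for the literal text `a{2, 4}`, so it finds nothing in `"daat"` (a made-up sample).

    ```python
    import re

    strings = ["abc dat", "abc daat", "aabc daaat", "aabc daaaat"]
    # Runs of 2 to 4 consecutive a's
    found = {s: re.findall(r"a{2,4}", s) for s in strings}
    print(found)
    # {'abc dat': [], 'abc daat': ['aa'], 'aabc daaat': ['aa', 'aaa'], 'aabc daaaat': ['aa', 'aaaa']}

    # The space turns the braces into literal characters
    spaced = re.findall(r"a{2, 4}", "daat")
    print(spaced)  # []
    ```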

9. `|` - Alternation / Vertical bar / Pipe symbol

    - Acts as an `or` operator: matches the pattern on either side of it

    **Usage:**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `a\|b` | cde | No match |
    | `a\|b` | ade | 1 match |
    | `a\|b` | acdbea | 3 matches |
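
    A sketch of the table with `re.findall`. The last line, using the made-up string `"abcd"`, illustrates that `|` splits the whole pattern, so `ab|cd` means "ab" or "cd" rather than `a`, then `b` or `c`, then `d`.

    ```python
    import re

    found = {s: re.findall(r"a|b", s) for s in ["cde", "ade", "acdbea"]}
    print(found)  # {'cde': [], 'ade': ['a'], 'acdbea': ['a', 'b', 'a']}

    # Alternation applies to everything on each side of the bar
    whole_sides = re.findall(r"ab|cd", "abcd")
    print(whole_sides)  # ['ab', 'cd']
    ```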

10. `()` - Group

    - Used to group sub-patterns so they act as a single unit; for example, `(a|b|c)xz` matches `a`, `b` or `c` followed by `xz`.

    **Usage:**

    | Expression | String | Matched? |
    |------------|--------|----------|
    | `(a\|b\|c)xz` | ab xz | No match | 
    | `(a\|b\|c)xz` | abxz | 1 match |
    | `(a\|b\|c)xz` | axz cabxz | 2 matches |
    | `(a\|b\|c)xz` | adxz | No match |
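
    A sketch using `re.finditer`, which yields Match objects; `group(0)` is the whole match. The last line shows a Python detail worth knowing: when a pattern contains a group, `re.findall` returns the group's text rather than the whole match.

    ```python
    import re

    pattern = r"(a|b|c)xz"
    strings = ["ab xz", "abxz", "axz cabxz", "adxz"]

    # Whole matches for each string
    found = {s: [m.group(0) for m in re.finditer(pattern, s)] for s in strings}
    print(found)
    # {'ab xz': [], 'abxz': ['bxz'], 'axz cabxz': ['axz', 'bxz'], 'adxz': []}

    # findall returns only the captured group here
    groups_only = re.findall(pattern, "axz cabxz")
    print(groups_only)  # ['a', 'b']
    ```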


11. `\` - Backslash 

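    - Escapes various characters, including the metacharacters, so that they are matched literally.
    - If you are unsure whether a punctuation character has a special meaning, you can put a backslash in front of it. Avoid this with letters: Python raises an error for unknown escapes such as `\q`.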

    **Usage:**

    | Expression | Meaning |
    |------------|---------|
    | `\$` | Matches a literal `$` character |
    | `\[\]` | Matches the literal characters `[]` |
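
    A minimal sketch in Python, with a made-up `text` string. Patterns are written as raw strings (`r"..."`) so that Python's own string escapes leave the backslashes alone; `re.escape` from the standard `re` module adds the backslashes automatically.

    ```python
    import re

    text = "Price: $5 or $20 [] []"

    # Escaped, $ is an ordinary character instead of an end-of-string anchor
    dollars = re.findall(r"\$", text)

    # Escaped brackets are matched literally instead of opening a character set
    brackets = re.findall(r"\[\]", text)

    # re.escape inserts the needed backslashes for you
    escaped = re.escape("[]")

    print(dollars)   # ['$', '$']
    print(brackets)  # ['[]', '[]']
    print(escaped)   # \[\]
    ```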

### Special Sequences

- Make commonly used patterns easier to write

`\A` - Matches if the specified characters are at the start of the string.

**Usage:**

| Expression | String | Matched? |
|------------|--------|----------|
| `\Athe` | the sun | Match |
| `\Athe` | The sun | No match |
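
A short sketch in Python. The second half uses the `re.MULTILINE` flag and a made-up two-line string to show where `\A` differs from `^`: with that flag, `^` also matches after every newline, while `\A` still only matches at the very start of the string.

```python
import re

# \A anchors to the start of the string; matching stays case-sensitive
lower = re.search(r"\Athe", "the sun") is not None  # True
upper = re.search(r"\Athe", "The sun") is not None  # False

# With re.MULTILINE, ^ matches at the start of each line, \A does not
text = "the sun\nthe moon"
caret = re.findall(r"^the", text, re.MULTILINE)   # ['the', 'the']
start = re.findall(r"\Athe", text, re.MULTILINE)  # ['the']
```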

**Tools/Resources:**

- RegEx checker - [https://regex101.com/](https://regex101.com/)
