# Regular Expressions

### The gist
Regular expressions (regex) are sequences of characters that define search patterns, used for string manipulation and text search. Most programming languages support them; Python does so through its built-in `re` module.

### Main features to read on your own:
1. Validation - Check if a string matches a certain pattern (e.g. email, phone number, etc.).
2. Search and Replace - Find and replace specific patterns within a string.
3. Extraction - Extract specific information from a larger string.
4. Tokenization - Split text into meaningful units (e.g. words, sentences).
5. URL Routing - Match and extract information from URLs.
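As a rough illustration of these use cases, the sketch below runs each one through Python's standard `re` module. The sample strings, the date pattern, and the `/users/<id>` route are made up for the example and are deliberately simplistic.

```python
import re

text = "Order 66 shipped on 2024-05-17; order 67 ships on 2024-06-01."
date_pattern = r"\d{4}-\d{2}-\d{2}"  # hypothetical, format-only date pattern

# 1. Validation: does the whole string look like a date?
is_date = re.fullmatch(date_pattern, "2024-05-17") is not None

# 2. Search and replace: mask every date in the text
masked = re.sub(date_pattern, "<date>", text)

# 3. Extraction: collect all dates found in the text
dates = re.findall(date_pattern, text)

# 4. Tokenization: split a sentence into word-like tokens
tokens = re.findall(r"\w+", "Regex is fun, isn't it?")

# 5. URL routing: capture the numeric id from a path
route = re.fullmatch(r"/users/(\d+)", "/users/42")
user_id = route.group(1)

print(is_date)   # True
print(masked)    # Order 66 shipped on <date>; order 67 ships on <date>.
print(dates)     # ['2024-05-17', '2024-06-01']
print(tokens)    # ['Regex', 'is', 'fun', 'isn', 't', 'it']
print(user_id)   # 42
```

Note that `\w+` splits "isn't" at the apostrophe; real tokenizers usually need a more careful pattern.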

#### Basic components
1. Literals - Characters that match themselves (e.g. "a" matches the letter "a").
2. Metacharacters - Special characters with special meanings (common metacharacters listed below).
3. Character classes - Represent sets of characters (e.g. "[a-z]" matches any lowercase letter).
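The three building blocks can be tried out directly with `re.findall()`, which returns every non-overlapping match. This is a minimal sketch with made-up input strings:

```python
import re

# Literal: "cat" matches exactly those three characters
literal = re.findall(r"cat", "cat concat dog")

# Metacharacter: "." stands for any single character (except a newline)
wildcard = re.findall(r"c.t", "cat cot cut ct")

# Character class: "[a-z]" matches one lowercase letter
lowercase = re.findall(r"[a-z]", "A1b2C3d")

print(literal)    # ['cat', 'cat']
print(wildcard)   # ['cat', 'cot', 'cut']
print(lowercase)  # ['b', 'd']
```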

#### Common metacharacters
- `.` (dot) - Matches any single character except a newline.
- `\` (backslash) - Escapes a metacharacter so it is treated as a literal (e.g. `\.` matches a period).
- `^` (caret) - Matches the start of the string (or of each line in multiline mode).
- `$` (dollar) - Matches the end of the string (or of each line in multiline mode).
- `*` (asterisk) - Matches the preceding element zero or more times.
- `+` (plus) - Matches the preceding element one or more times.
- `?` (question mark) - Matches the preceding element zero or one time.
- `[]` - Defines a character class (e.g. `[aeiou]` matches any vowel).
- `[^ ]` - Defines a negated character class (e.g. `[^0-9]` matches anything except digits).
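Each metacharacter in the list can be checked with a one-line experiment. The snippet below is a small sketch using Python's `re` module; the test strings are arbitrary.

```python
import re

# "." matches any character except a newline, so "h\nt" is skipped
dot = re.findall(r"h.t", "hat hit h\nt")

# "\." matches a literal period
escaped = re.findall(r"\.", "a.b.c")

# "^" and "$" anchor the pattern to the start and end of the string
starts = re.search(r"^Hello", "Hello world") is not None
ends = re.search(r"world$", "Hello world") is not None

# "*" allows zero or more b's, "+" requires at least one
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

# "?" makes the preceding "u" optional
optional = re.findall(r"colou?r", "color colour")

# Character class and negated character class
vowels = re.findall(r"[aeiou]", "regex")
non_digits = re.findall(r"[^0-9]", "a1b2")

print(dot)         # ['hat', 'hit']
print(escaped)     # ['.', '.']
print(starts)      # True
print(ends)        # True
print(star)        # ['a', 'ab', 'abbb']
print(plus)        # ['ab', 'abbb']
print(optional)    # ['color', 'colour']
print(vowels)      # ['e', 'e']
print(non_digits)  # ['a', 'b']
```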

#### Modifiers
- i - Case-insensitive matching.
- g - Global matching (find all matches, not just the first).
- m - Multiline mode (`^` and `$` match at the start and end of each line, not only of the whole input).
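The single-letter names above follow the notation used in languages such as JavaScript and Perl. In Python's `re` module the closest equivalents are the flags `re.IGNORECASE` and `re.MULTILINE`. There is no global flag: `re.search()` returns only the first match, while `re.findall()` returns all of them. A short sketch with a made-up three-line string:

```python
import re

text = "Cat\ncat\nCAT"

# i -> re.IGNORECASE: letter case is ignored
insensitive = re.findall(r"cat", text, flags=re.IGNORECASE)

# g -> no flag in Python; choose the function instead
first = re.search(r"[Cc]at", text).group()   # first match only
all_matches = re.findall(r"[Cc]at", text)    # every match

# m -> re.MULTILINE: "^" and "$" also match at line boundaries
without_m = re.findall(r"^cat$", text)
with_m = re.findall(r"^cat$", text, flags=re.MULTILINE)

print(insensitive)  # ['Cat', 'cat', 'CAT']
print(first)        # Cat
print(all_matches)  # ['Cat', 'cat']
print(without_m)    # []
print(with_m)       # ['cat']
```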

### Example - Validate an email address
In this example, our `validate_email` function uses the `re.fullmatch()` function to check whether the entire string matches the specified regex pattern. The pattern checks basic formatting only.

In [1]:
import re

def validate_email(email):

    # This pattern checks the basic formatting of an email address:
    # a local part, the @ symbol, a domain, and a 2-7 letter top-level domain
    pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}'

    # re.fullmatch() requires the whole string to match the pattern,
    # so trailing characters after a valid-looking address are rejected.
    # Return True if the email passes validation and False otherwise
    return re.fullmatch(pattern, email) is not None

email1 = "user@example.com"
email2 = "invalid.email"

print(validate_email(email1))
print(validate_email(email2))

True
False
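A detail worth keeping in mind for validation: `re.match()` anchors the pattern only at the start of the string, whereas `re.fullmatch()` requires the whole string to match. The sketch below contrasts the two using a simplified, hypothetical pattern and a made-up input with trailing junk.

```python
import re

# Simplified pattern for illustration only (lowercase letters, one dot)
pattern = r"[a-z]+@[a-z]+\.[a-z]{2,}"
candidate = "user@example.com<script>"

# re.match() succeeds because the string *starts* with a valid-looking address
prefix_only = re.match(pattern, candidate) is not None

# re.fullmatch() fails because of the trailing "<script>"
whole_string = re.fullmatch(pattern, candidate) is not None

print(prefix_only)   # True
print(whole_string)  # False
```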
