# Introduction to Regular Expressions in Python


A **regular expression (RegEx)** is a pattern that describes a set of strings. In Python, regular expressions are a powerful tool for **matching patterns in strings and manipulating the text that matches**.

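Python's built-in **`re` module** provides the core functions for working with RegEx, including:
* `search()`: Finds the first place in the string where the pattern matches and returns a match object, or `None` if there is no match.
* `split()`: Splits a string wherever the pattern matches and returns the pieces as a list.
* `findall()`: Returns a list of all non-overlapping matches of the pattern.
* `sub()`: Substitutes (replaces) every match of the pattern with a replacement string.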

To use these functionalities, you just need to `import re` at the beginning of your script.
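As a quick illustration of the four functions listed above, here is a minimal sketch. The sample sentence and the variable names are made up for this example and are not part of the original notebook.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "Order 66 shipped on 2024-05-17; order 67 ships on 2024-06-01."

# search(): first match only; returns a match object, or None if nothing matches
first = re.search(r"\d+", text)
first_number = first.group() if first else None

# findall(): every non-overlapping match, returned as a list of strings
all_numbers = re.findall(r"\d+", text)

# split(): cut the string wherever the pattern matches
parts = re.split(r";\s*", text)

# sub(): replace every match with the replacement string
masked = re.sub(r"\d", "#", text)

print(first_number)  # 66
print(all_numbers)   # ['66', '2024', '05', '17', '67', '2024', '06', '01']
print(parts)         # ['Order 66 shipped on 2024-05-17', 'order 67 ships on 2024-06-01.']
print(masked)        # Order ## shipped on ####-##-##; order ## ships on ####-##-##.
```

Note that `search()` returns a match object rather than a string, so the matched text is read with `.group()` after checking that a match was found.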

---

Regular expressions use **special sequences**, each written as a backslash followed by a letter, as shorthand for common character classes and positions in the text.

| Special Sequence | Meaning                                                 | Example                          |
| :--------------- | :------------------------------------------------------ | :------------------------------- |
| **`\d`** | Matches any **digit** character (0-9, plus other Unicode decimal digits by default). | `\d\d\d` matches `"123"` |
| **`\D`** | Matches any **non-digit** character.                    | `\D\D\D\D\D` matches `"hello"` |
| **`\w`** | Matches any **word** character (a-z, A-Z, 0-9, `_`, plus Unicode letters and digits by default). | `\w+` matches `"hello_world"` |
| **`\W`** | Matches any **non-word** character.                     | `\W\W\W\W` matches `"@#$%"` |
| **`\s`** | Matches any **whitespace** character (space, tab, newline). | `\w+\s\w+` matches `"hello world"` |
| **`\S`** | Matches any **non-whitespace** character.               | `\S+` matches `"hello_world"` |
| **`\b`** | Matches a **word boundary** (the position between a word character and a non-word character or the start/end of the string). | `\bcat\b` matches "cat" in "The **cat** sat." |
| **`\B`** | Matches any position that is **not a word boundary**.   | `\Bcat\B` matches "cat" in "edu**cat**ion", but not in "category", where "cat" starts the word |

---

```python
import re

# The text to search for email addresses
sample_text = "Contact us at support@example.com or info@domain.org. Our old email was old.address@sub.example.net."

# RegEx pattern for a simple email address (a teaching example, not a full validator)
# Explanation:
# r"..."        - Raw string, so Python passes the backslashes to the regex engine unchanged
# \b            - Word boundary (start of email)
# [\w\.-]+      - One or more word characters, periods, or hyphens (username part)
# @             - The '@' symbol, matched literally
# [\w\.-]+      - One or more word characters, periods, or hyphens (domain name part)
# \.            - A literal '.' (escaped, because an unescaped '.' matches any character)
# [a-zA-Z]{2,3} - 2 or 3 letters (.com, .org, .net, ...); longer endings such as .info are not matched
# \b            - Word boundary (end of email)
email_pattern = r"\b[\w\.-]+@[\w\.-]+\.[a-zA-Z]{2,3}\b"

# Use re.findall() to collect every match of the pattern in the text
found_emails = re.findall(email_pattern, sample_text)

# Print the matches, or a message if the list is empty
if found_emails:
    print("Found email addresses:")
    for email in found_emails:
        print(f"- {email}")
else:
    print("No email addresses found in the text.")
```

Output:

```text
Found email addresses:
- support@example.com
- info@domain.org
- old.address@sub.example.net
```
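The pattern above is written as a raw string (`r"..."`). A short sketch of why that matters, using a made-up sentence and variable names that are not part of the original notebook: in an ordinary string literal Python itself interprets `\b` as the backspace character, so the regex engine never receives a word-boundary token.

```python
import re

text = "The cat sat."

# Ordinary literal: "\b" becomes a single backspace character (ASCII 8)
plain_pattern = "\bcat\b"
# Raw literal: the backslash and the 'b' are kept as two separate characters
raw_pattern = r"\bcat\b"

# The plain pattern searches for backspace + "cat" + backspace, which is not in the text
plain_result = re.findall(plain_pattern, text)
# The raw pattern searches for "cat" as a whole word
raw_result = re.findall(raw_pattern, text)

print(len(plain_pattern), len(raw_pattern))  # 5 7
print(plain_result)                          # []
print(raw_result)                            # ['cat']
```

Using raw strings for every pattern, as the email example does, avoids having to remember which backslash sequences Python would otherwise rewrite.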
