# Regex Essentials: Overview

- Regular expressions (regex) are a language for defining text search patterns.
- Python’s `re` module provides functions like `search` (find anywhere) and `match` (anchored at start).
- Patterns include literals, metacharacters (`.`, `^`, `$`, `*`, `+`, `?`, `[]`, `\`), character classes (`\d`, `\w`, `\s`), and quantifiers (`*`, `+`, `?`, `{n,m}`).
- Greedy quantifiers (`*`, `+`) match as much as possible; non-greedy quantifiers (`*?`, `+?`) match as little as possible.
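
The difference between greedy and non-greedy quantifiers is easiest to see side by side. The following is a minimal sketch; the sample string `html` and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen only to make the difference visible.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ expands as far as it can, so one match runs
# from the first "<" to the last ">".
greedy = re.findall(r"<.+>", html)

# Non-greedy: .+? stops at the earliest ">" that lets the pattern succeed,
# so each tag is matched separately.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```

The same `?` suffix works after `*`, `+`, `?`, and `{n,m}`.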


# Introduction to `re.search()` vs `re.match()`


- `re.search(pattern, text)` scans the entire string for the first occurrence.
- `re.match(pattern, text)` checks only at the beginning of the string.
- `re.findall()` and `re.finditer()` let you retrieve every occurrence of a pattern.
- Write regex patterns as raw strings (`r"..."`) so that Python's own string escapes (for example `\b`, which is a backspace in a normal string) do not interfere with the regex syntax.


In [1]:
import re

text = "My phone number is 0912345678"
match = re.search(r'\d{10}', text)
if match:
    print(match.group())  # Output: 0912345678


0912345678
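
The cell above uses only `re.search()`. As a rough sketch of how the other functions in the list behave on the same kind of input, the example below runs one pattern through `search`, `match`, `findall`, and `finditer`, and then shows why the raw-string prefix matters. The sample text and variable names are illustrative assumptions, not part of the original notebook.

```python
import re

# Hypothetical sample text for illustration.
text = "Order 66 shipped, order 1024 pending"

# search scans the whole string and returns the first match object (or None).
first = re.search(r"\d+", text)

# match only succeeds at position 0; the text starts with "Order", so this is None.
at_start = re.match(r"\d+", text)

# findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", text)

# finditer yields match objects, which also carry the match positions.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

print(first.group())  # 66
print(at_start)       # None
print(all_numbers)    # ['66', '1024']
print(positions)      # [('66', 6), ('1024', 24)]

# Without the r prefix, "\b" is a backspace character, not a word boundary,
# so the first pattern below does not match.
plain = re.search("\bcat\b", "a cat sat")
raw = re.search(r"\bcat\b", "a cat sat")
print(plain)        # None
print(raw.group())  # cat
```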



# Common Metacharacters

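- `.` matches any character except a newline (unless the `re.DOTALL` flag is set).
- `^` anchors at the start of the string (or of each line with `re.MULTILINE`).
- `$` anchors at the end of the string, or just before a trailing newline (and at the end of each line with `re.MULTILINE`).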
- `[]` defines a set or range of characters, e.g. `[A-Z]`.
- `\` escapes metacharacters or introduces special sequences.



In [2]:
import re

test = "Error code: E1234. cxge"

# Single quotes inside the f-strings keep this runnable on Python versions before 3.12.
print(f"Dot matches any character: {re.findall(r'c..e', test)}")
print(f"Start anchor (finds): {re.findall(r'^Error', test)}")
print(f"Start anchor (does not find): {re.findall(r'^E1234', test)}")
print(f"End anchor: {re.findall(r'cxge$', test)}")
print(f"Character set: {re.findall(r'[E0-9]+', test)}")


Dot matches any character: ['code', 'cxge']
Start anchor (finds): ['Error']
Start anchor (does not find): []
End anchor: ['cxge']
Character set: ['E', 'E1234']
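
The cell above does not exercise the backslash, so here is a small sketch of its two roles: escaping a metacharacter so it matches literally, and introducing special sequences such as `\d`, `\w`, and `\s`. It reuses the same `test` string; the other variable names are illustrative.

```python
import re

test = "Error code: E1234. cxge"

# Escaped dot: matches only a literal ".", not any character.
literal_dots = re.findall(r"\.", test)

# Special sequences introduced by the backslash.
digits = re.findall(r"\d+", test)  # runs of digits
words = re.findall(r"\w+", test)   # runs of word characters
spaces = re.findall(r"\s", test)   # single whitespace characters

# re.escape builds a pattern that matches the given text literally,
# which helps when the text to find contains metacharacters.
escaped = re.escape("E1234.")
found = re.search(escaped, test)

print(literal_dots)   # ['.']
print(digits)         # ['1234']
print(words)          # ['Error', 'code', 'E1234', 'cxge']
print(len(spaces))    # 3
print(found.group())  # E1234.
```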
