# 🔹 What is Regex in Python?
## Regex (short for Regular Expression) is a tool for searching, matching, and extracting patterns in text, such as emails, phone numbers, or words.


## ✅ Real-Life Example
### Suppose you want to:

- Check if a phone number is valid ✅

- Extract email addresses from a paragraph 📧

- Find all dates in a document 📅

Instead of checking each character manually, you can use regex to do it quickly and accurately.
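Here is a minimal sketch of the first use case, checking a phone number. The `dddd-ddddddd` format and the helper name `is_valid_phone` are assumptions made for illustration. They are not a standard.

```python
import re

# Assumed format for illustration only: 4 digits, a hyphen, then 7 digits
PHONE_PATTERN = r"\d{4}-\d{7}"


def is_valid_phone(number: str) -> bool:
    """Hypothetical helper: True if the WHOLE string fits PHONE_PATTERN."""
    return re.fullmatch(PHONE_PATTERN, number) is not None


print(is_valid_phone("0300-1234567"))  # True
print(is_valid_phone("0300-12345"))    # False: only 5 digits after the hyphen
```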

## 🔍 Common Regex Patterns

| Pattern | Meaning                            | Example Match                        |
| ------- | ---------------------------------- | ------------------------------------ |
| `\d`    | Digit (0–9)                        | `5`, `9`, etc.                       |
| `\w`    | Word character (a-z, A-Z, 0-9, \_) | `a`, `9`, `_`                        |
| `\s`    | Whitespace (space, tab, newline)   | Space                                |
| `.`     | Any character (except newline)     | `a`, `1`, `%`                        |
| `+`     | One or more times                  | `\d+` = one or more digits           |
| `*`     | Zero or more times                 | `\w*` = word characters (maybe none) |
| `^`     | Starts with                        | `^Hello` matches "Hello World"       |
| `$`     | Ends with                          | `end$` matches "the end"             |
| `[]`    | Set of characters                  | `[abc]` matches `a`, `b`, or `c`     |
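In practice, the entries in this table are combined into one pattern. The sketch below uses made-up sample strings. It requires a whole string made of a word, one whitespace character, and digits.

```python
import re

# ^ start, \w+ a word, \s one whitespace char, \d+ digits, $ end
pattern = r"^\w+\s\d+$"

samples = ["Room 42", "Room42", "Room 42b"]
results = {s: re.search(pattern, s) is not None for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {ok}")
# 'Room 42': True
# 'Room42': False   (no whitespace between word and digits)
# 'Room 42b': False ('b' comes before the end of the string)
```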


## Now, here are the most commonly used functions in the `re` module

| Function         | Description                                                            |
| ---------------- | ---------------------------------------------------------------------- |
| `re.search()`    | ✅ Searches for the **first match** of a pattern anywhere in the string |
| `re.match()`     | ✅ Checks if the pattern **matches at the beginning** of the string     |
| `re.fullmatch()` | ✅ Checks if **the entire string** matches the pattern                  |
| `re.findall()`   | ✅ Returns a list of **all non-overlapping matches**                    |
| `re.finditer()`  | ✅ Returns an iterator yielding **match objects** for all matches       |
| `re.sub()`       | ✅ Replaces all matches with a new string (**substitute**)              |
| `re.split()`     | ✅ Splits the string at each match of the pattern                       |
| `re.compile()`   | ✅ Compiles a regex pattern into a reusable regex object                |
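The first three functions are easy to mix up. This small comparison runs each of them on the same made-up string.

```python
import re

text = "abc123"

found = re.search(r"\d+", text)     # looks anywhere: finds '123'
at_start = re.match(r"\d+", text)   # only at index 0: 'a' is not a digit
whole = re.fullmatch(r"\w+", text)  # the entire string must match

print(found.group())   # 123
print(at_start)        # None
print(whole.group())   # abc123
```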


### 🔗 How They’re Connected (Both Tables)

- 🔹 1. Patterns = What to look for

These are like rules or formulas (e.g., `\d+` = one or more digits).

- 🔹 2. Functions = Where and how to look

These are the tools that use those patterns to search, match, find, replace, etc.
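To see the split between "what" and "how", the sketch below hands one pattern (`\d+`) to three different functions. The sample sentence is invented.

```python
import re

pattern = r"\d+"                 # WHAT to look for: runs of digits
text = "Order 7 has 3 items"

numbers = re.findall(pattern, text)     # HOW #1: collect every match
masked = re.sub(pattern, "#", text)     # HOW #2: replace every match
pieces = re.split(pattern, text)        # HOW #3: cut the text at every match

print(numbers)  # ['7', '3']
print(masked)   # Order # has # items
print(pieces)   # ['Order ', ' has ', ' items']
```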

## 🔧 In Python, we use the `re` module

```python
import re
```

## ✅ 1. \d — Digit (0–9)
### Used with: findall(), search(), match()

```python
import re

text = "Marks: 75, 82, 99"
print(re.findall(r"\d", text))    # for single digits
print(re.findall(r"\d+", text))   # for one or more digits
```

```
['7', '5', '8', '2', '9', '9']
['75', '82', '99']
```
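The heading also lists `search()` and `match()`. Here is a short sketch of `\d+` with both functions, reusing the same marks string plus one made-up string.

```python
import re

text = "Marks: 75, 82, 99"

first = re.search(r"\d+", text)           # first run of digits anywhere
print(first.group())                      # 75

print(re.match(r"\d+", text))             # None: text starts with 'M'
year = re.match(r"\d+", "2024 report")    # digits right at the start
print(year.group())                       # 2024
```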


## ✅ 2. \w — Word character (letters, digits, underscore)
### Used with: findall(), split()

```python
text = "Ali_99 and Sara@123"

print(re.findall(r"\w", text))    # every single word character (letter, digit, underscore)

print()

print(re.findall(r"\w+", text))   # runs of word characters; spaces and '@' end each run
```

```
['A', 'l', 'i', '_', '9', '9', 'a', 'n', 'd', 'S', 'a', 'r', 'a', '1', '2', '3']

['Ali_99', 'and', 'Sara', '123']
```
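For `split()`, the natural partner of `\w` is the uppercase `\W`, which matches any character that is *not* a word character. Here is a sketch on the same string.

```python
import re

text = "Ali_99 and Sara@123"

# \W+ matches runs of non-word characters (here: spaces and '@'),
# so splitting on it leaves only the word-character chunks
parts = re.split(r"\W+", text)
print(parts)  # ['Ali_99', 'and', 'Sara', '123']
```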


## ✅ 3. \s — Whitespace
### Used with: split() to split by spaces

```python
text = "Hello    world! This is Python"
print(re.split(r"\s+", text))   # split by whitespace (spaces, tabs, etc.)
```

```
['Hello', 'world!', 'This', 'is', 'Python']
```
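`\s` covers tabs and newlines too, not just spaces. The sample string below is made up. For this input, `str.split()` with no arguments gives the same list, so regex becomes more useful once other separators are involved.

```python
import re

text = "name:\tAli\nage:  25"   # a tab, a newline, and two spaces

parts = re.split(r"\s+", text)
print(parts)                    # ['name:', 'Ali', 'age:', '25']
print(parts == text.split())    # True
```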


## ✅ 4. . — Any character (except newline)
### Used with: search()

```python
text = "A$B"
print(re.search(r".", text))    # the first single character
print(re.search(r".+", text))   # one or more characters (greedy: takes as many as possible)

print()

print(re.findall(r".", text))   # every character as a separate match
print(re.findall(r".+", text))  # the whole line as one greedy match
```

```
<re.Match object; span=(0, 1), match='A'>
<re.Match object; span=(0, 3), match='A$B'>

['A', '$', 'B']
['A$B']
```
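Two details about `.` are worth knowing. It skips newlines unless the `re.DOTALL` flag is set. A literal dot must be escaped as `\.`, and the email pattern later relies on this. Here is a sketch with made-up strings.

```python
import re

text = "A\nB"

default = re.findall(r".", text)                  # newline is skipped
dotall = re.findall(r".", text, flags=re.DOTALL)  # newline included
literal = re.findall(r"\.", "v1.2.3")             # escaped: only real dots

print(default)  # ['A', 'B']
print(dotall)   # ['A', '\n', 'B']
print(literal)  # ['.', '.']
```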


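## ✅ 5–6. + (one or more) vs. * (zero or more)
### Used with: search(), findall(), sub()

```python
text = "helloooo"
print(re.search(r"lo+", text).group())  # 'l' followed by ONE or more 'o's
print(re.search(r"lo*", text).group())  # 'l' followed by ZERO or more 'o's
```

```
loooo
l
```

With `*`, the search succeeds at the first `l` (index 2). That `l` is followed by another `l`, and zero `o`s is enough for a match. With `+`, that first `l` fails, so the match starts at the second `l`.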
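The heading also mentions `findall()` and `sub()`. Here is a sketch using `+` with both functions on a made-up string.

```python
import re

text = "helloooo   world"

runs = re.findall(r"o+", text)        # every run of one or more 'o's
squeezed = re.sub(r"\s+", " ", text)  # collapse any run of whitespace into one space

print(runs)      # ['oooo', 'o']
print(squeezed)  # helloooo world
```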


## ✅ 7. ^ — Starts with
### Used with: match() or search()

```python
text = "Python is easy"
print(re.match(r"^Python", text))  # matches "Python" at the start
print(re.match(r"^easy", text))    # None: "easy" is not at the start
```

```
<re.Match object; span=(0, 6), match='Python'>
None
```
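With `match()`, the `^` is redundant because `match()` already anchors at the start. `^` matters more with `search()` and `findall()`. With the `re.MULTILINE` flag, it matches at the start of every line. The sample text below is made up.

```python
import re

text = "Python is easy\nPython is fun"

no_hit = re.search(r"^easy", text)                       # None: not at the start
first_only = re.findall(r"^Python", text)                # only the start of the string
every_line = re.findall(r"^Python", text, re.MULTILINE)  # start of each line

print(no_hit)      # None
print(first_only)  # ['Python']
print(every_line)  # ['Python', 'Python']
```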


## ✅ 8. $ — Ends with
### Used with: search()

```python
text = "This is the end"
print(re.search(r"end$", text))    # matches "end" at the end

print(re.search(r"start$", text))  # None: the text does not end with "start"
```

```
<re.Match object; span=(12, 15), match='end'>
None
```
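Combining `^` and `$` lets `search()` check the whole string, which is close to what `fullmatch()` does. The sketch below (with made-up strings) also shows one subtle difference. `$` matches just before a trailing newline, while `fullmatch()` rejects it.

```python
import re

whole_digits = re.search(r"^\d+$", "12345")    # every character is a digit
mixed = re.search(r"^\d+$", "123a5")           # 'a' breaks the run

with_newline = re.search(r"^\d+$", "12345\n")  # $ matches before the final '\n'
strict = re.fullmatch(r"\d+", "12345\n")       # fullmatch rejects the '\n'

print(whole_digits is not None)  # True
print(mixed)                     # None
print(with_newline is not None)  # True
print(strict)                    # None
```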


## ✅ 9. [] — Set of characters
### Used with: findall()

```python
text = "abc123xyz"

print(re.findall(r"[a-z]", text))     # finds all lowercase letters
print(re.findall(r"[a-z]+", text))    # finds runs of one or more lowercase letters

print(re.findall(r"[a-zA-Z]", text))  # finds all letters (both lowercase and uppercase)

print(re.findall(r"[0-9]", text))     # finds all digits
print(re.findall(r"[0-9]+", text))    # finds runs of one or more digits
```

```
['a', 'b', 'c', 'x', 'y', 'z']
['abc', 'xyz']
['a', 'b', 'c', 'x', 'y', 'z']
['1', '2', '3']
['123']
```
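A `^` placed right after `[` flips the set, so `[^0-9]` means "any character that is not a digit". Here is a sketch on the same text plus a made-up phrase.

```python
import re

text = "abc123xyz"

non_digits = re.findall(r"[^0-9]+", text)           # runs of anything except digits
vowels = re.findall(r"[aeiou]", "regex in python")  # any one of the listed letters

print(non_digits)  # ['abc', 'xyz']
print(vowels)      # ['e', 'e', 'i', 'o']
```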


## 1. re.search() – Search for a pattern anywhere in the string

```python
import re

text = "I am fond of learning AI and Cloud Computing"
pattern = "AI"

# re.search(pattern, string): the pattern comes first, then the text to search
if re.search(pattern, text):
    print(f"'{pattern}' found in the text.")
else:
    print(f"'{pattern}' not found in the text.")
```

```
'AI' found in the text.
```
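`re.search()` returns a Match object (or `None`), which also tells you *where* the match is. The sketch below uses a shortened, made-up sentence. The `re.IGNORECASE` line shows the flag for case-insensitive search.

```python
import re

text = "I am learning AI and Cloud Computing"

m = re.search(r"AI", text)
if m is not None:
    print(m.group(), m.start(), m.end())  # AI 14 16

# Case-insensitive search with the IGNORECASE flag
cloud = re.search(r"cloud", text, re.IGNORECASE)
print(cloud.group())  # Cloud
```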


## 2. re.fullmatch() – Check if the entire string matches the pattern

```python
import re

email = "Alikhan123._@gmail.com"

# A simplified email check (not a full RFC 5322 validator):
#   [a-zA-Z0-9._%+-]+   local part: letters, digits, or the symbols . _ % + -
#   @                   exactly one @ (no other part of the pattern allows @)
#   [a-zA-Z0-9.-]+      domain: letters, digits, dots, or hyphens
#   \.[a-zA-Z]{2,}      a dot and an extension of at least 2 letters (e.g. .com, .org)
pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

if re.fullmatch(pattern, email):
    print("Valid email address")
else:
    print("Invalid email address")
```

```
Valid email address
```
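Why use `fullmatch()` rather than `search()` for validation? Because `search()` accepts a valid address buried inside other text. The sketch below uses made-up samples and the same simplified pattern.

```python
import re

pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
samples = ["ali@gmail.com", "ali@gmail", "contact: ali@gmail.com"]

results = {}
for s in samples:
    in_text = re.search(pattern, s) is not None   # a valid address appears somewhere
    exact = re.fullmatch(pattern, s) is not None  # the whole string is an address
    results[s] = (in_text, exact)
    print(f"{s!r}: search={in_text}, fullmatch={exact}")
# 'ali@gmail.com': search=True, fullmatch=True
# 'ali@gmail': search=False, fullmatch=False
# 'contact: ali@gmail.com': search=True, fullmatch=False
```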


## 3. re.finditer() – Returns an iterator of match objects for all matches

```python
import re

text = "Roll numbers: 01, 19, 32"
matches = re.finditer(r"\d+", text)

# Each item is a Match object, so we get both the text and its position
for match in matches:
    print("Found:", match.group(), "at position:", match.span())
```

```
Found: 01 at position: (14, 16)
Found: 19 at position: (18, 20)
Found: 32 at position: (22, 24)
```
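`finditer()` pairs well with parentheses, which capture parts of each match. Capture groups are not covered in the tables above. Here is a sketch on a made-up score line.

```python
import re

text = "Ali: 85, Sara: 92"

scores = {}
for m in re.finditer(r"(\w+): (\d+)", text):
    name, score = m.group(1), int(m.group(2))  # group(1) = name, group(2) = digits
    scores[name] = score
    print(name, score, "at", m.span())

print(scores)  # {'Ali': 85, 'Sara': 92}
```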


## 4. re.sub() – Substitute (replace) matching text

```python
import re

text = "My number is 123-456-7890"
new_text = re.sub(r"\d", "*", text)  # replaces all digits with '*'

print(new_text)
```

```
My number is ***-***-****
```
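`re.sub()` also takes a `count` argument that limits the number of replacements. The replacement can also be a function that receives each Match object. Here is a sketch with made-up strings.

```python
import re

partial = re.sub(r"\d", "*", "123-456-7890", count=3)  # only the first 3 digits

# A function as the replacement: double every number
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "2 apples and 10 pears")

print(partial)  # ***-456-7890
print(doubled)  # 4 apples and 20 pears
```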


## 5. re.split() – Split a string based on the pattern

```python
import re

text = "C++ Python SQL-Server"
languages = re.split(r"[\s-]", text)  # split on any whitespace character or a hyphen

print(languages)
```

```
['C++', 'Python', 'SQL', 'Server']
```
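Two `re.split()` options are worth knowing. `maxsplit` limits the number of cuts. Wrapping the pattern in parentheses keeps the separators in the result. Here is a sketch on the same text.

```python
import re

text = "C++ Python SQL-Server"

first_cut = re.split(r"[\s-]", text, maxsplit=1)  # stop after one split
with_seps = re.split(r"([\s-])", text)            # capture group keeps separators

print(first_cut)  # ['C++', 'Python SQL-Server']
print(with_seps)  # ['C++', ' ', 'Python', ' ', 'SQL', '-', 'Server']
```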


## 6. re.compile() – Compile a pattern for reusability

```python
import re

text1 = "Age: 25"
text2 = "Age: 30"

pattern = re.compile(r"\d{2}")  # compiled once: exactly two digits

matches1 = pattern.findall(text1)  # reuse the same compiled object
matches2 = pattern.findall(text2)

print(f"Matches in text1: {matches1}")
print(f"Matches in text2: {matches2}")
```

```
Matches in text1: ['25']
Matches in text2: ['30']
```
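Compiling pays off when one pattern is applied in a loop, and flags can be set at compile time. The sketch below uses made-up records. The name `AGE_PATTERN` is only illustrative.

```python
import re

# Compiled once, case-insensitive; (\d+) captures the number
AGE_PATTERN = re.compile(r"age:\s*(\d+)", re.IGNORECASE)

records = ["Age: 25", "AGE:30", "Name: Ali"]

ages = []
for record in records:
    m = AGE_PATTERN.search(record)
    if m:                         # records without an age are skipped
        ages.append(int(m.group(1)))

print(ages)  # [25, 30]
```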


## 7. re.match() – Check if the pattern matches at the start of the string

```python
import re

text = "Saylani Welfare Trust"
match = re.match(r"Saylani", text)  # matches "Saylani" at the start

if match:
    print("Match found:", match.group())
else:
    print("No match found at the start of the string.")
```

```
Match found: Saylani
```
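`re.match()` works well as a quick prefix test, for example to filter lines by a leading keyword. The sample log lines below are made up.

```python
import re

lines = ["ERROR disk full", "INFO started", "ERROR timeout", "no ERROR here"]

# match() only looks at the beginning, so "no ERROR here" is excluded
errors = [line for line in lines if re.match(r"ERROR ", line)]

print(errors)  # ['ERROR disk full', 'ERROR timeout']
```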


## 8. re.findall() – Return all non-overlapping matches in a list

```python
import re

text = "Contact numbers 0300-1234567, 0312-7654321"
matches = re.findall(r"\d{4}-\d{7}", text)  # exactly 4 digits, a hyphen, then exactly 7 digits

print("Matches found:", matches)
```

```
Matches found: ['0300-1234567', '0312-7654321']
```
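One `findall()` behavior often surprises people. If the pattern contains groups (parentheses), `findall()` returns the groups instead of the whole match. With more than one group, it returns them as tuples.

```python
import re

text = "Contact numbers 0300-1234567, 0312-7654321"

whole = re.findall(r"\d{4}-\d{7}", text)       # no groups: full matches
parts = re.findall(r"(\d{4})-(\d{7})", text)   # two groups: list of tuples

print(whole)  # ['0300-1234567', '0312-7654321']
print(parts)  # [('0300', '1234567'), ('0312', '7654321')]
```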
