## Regex 

### Basics of Regular Expressions
To use regular expressions in Python, you need to import the `re` module.



In [None]:
import re

### Basic Functions

- **re.match()**: Checks if the pattern matches from the start of the string.
- **re.search()**: Searches for the first match anywhere in the string.
- **re.findall()**: Returns all non-overlapping matches as a list.
- **re.sub()**: Replaces matched patterns with a replacement string.


In [None]:
# Example usage
pattern = r'\d+'  # matches digits
text = "My number is 12345"

# Match from the start
match = re.match(pattern, text)  # None, no digits at the start

# Search anywhere
search = re.search(pattern, text)  # Finds '12345'

# Find all matches
find_all = re.findall(pattern, text)  # ['12345']

# Replace matched pattern
substitute = re.sub(pattern, '#####', text)  # "My number is #####"
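
The calls above return different kinds of values: `re.match()` and `re.search()` give back a match object (or `None`), while `re.findall()` gives a list of strings. The following is a small sketch of how the match object might be inspected, reusing the same `pattern` and `text`; the variable names are only illustrative.

```python
import re

pattern = r'\d+'
text = "My number is 12345"

# re.match() only tries position 0, so it returns None for this text
at_start = re.match(pattern, text)

# re.search() returns a match object, or None when nothing matches
found = re.search(pattern, text)

if found is not None:
    matched_text = found.group()   # the matched substring
    start, end = found.span()      # start and end indices into text
    print(matched_text, start, end)
```

Checking for `None` before calling `.group()` avoids an `AttributeError` when the pattern does not match.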


### Special Characters

- `.` : Any character except newline.
- `^` : Start of the string.
- `$` : End of the string (or just before a trailing newline).
- `*` : 0 or more repetitions of the preceding element (a character, set, or group).
- `+` : 1 or more repetitions.
- `?` : 0 or 1 repetition (optional).
- `[]` : Set of characters to match.
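- `\d` : Any decimal digit (`[0-9]` for ASCII input; `str` patterns also match other Unicode digits unless `re.ASCII` is set).
- `\D` : Any non-digit.
- `\w` : Any word character: a letter, digit, or underscore (`[a-zA-Z0-9_]` for ASCII input; `str` patterns also match Unicode letters unless `re.ASCII` is set).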
- `\W` : Any non-word character.
- `\s` : Whitespace (spaces, tabs, newlines).
- `\S` : Non-whitespace.


In [None]:
# Example
text = "Hello 123"
re.search(r'\d+', text)  # Matches '123' (one or more digits)
re.search(r'^\w+', text)  # Matches 'Hello' (word at the start)
re.search(r'\D+', text)  # Matches 'Hello ' (non-digit characters)
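
Several of the special characters listed in this section (`.`, `[]`, `*`, `?`, `$`, `\s`) can be tried out on short strings. This is a rough sketch with made-up sample text, not an exhaustive tour.

```python
import re

words = "cat cot cut ct"

# '.' stands for exactly one character, so 'ct' is too short to match
dot_matches = re.findall(r'c.t', words)

# '[ao]' accepts only the listed characters in that position
set_matches = re.findall(r'c[ao]t', words)

# 'u*' allows zero or more 'u' characters between 'c' and 't'
star_matches = re.findall(r'cu*t', words)

# 'u?' makes the 'u' optional, covering both spellings
optional_matches = re.findall(r'colou?r', "color and colour")

# '$' anchors the match at the end of the string
last_word = re.search(r'\w+$', "Hello 123").group()

# '\s+' matches any run of whitespace; here each run becomes one space
collapsed = re.sub(r'\s+', ' ', "a  b\tc\nd")

print(dot_matches, set_matches, star_matches)
print(optional_matches, last_word, repr(collapsed))
```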


### Quantifiers
- `{m}` : Exactly m occurrences.
- `{m,n}` : Between m and n occurrences (inclusive), matching as many as possible.




In [None]:
# Example
text = "abbbbcc"
re.search(r'b{3}', text)  # Matches 'bbb' (the first 3 of the 4 'b's)
re.search(r'b{2,4}', text)  # Matches 'bbbb' (between 2 and 4 'b's)
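
A quantifier such as `{3}` only constrains the piece of text that is matched, not what surrounds it. The sketch below shows this with `re.search()`, then uses `re.fullmatch()` to require that the whole string fits the pattern. The "three digits, dash, four digits" format is a hypothetical one chosen for illustration.

```python
import re

text = "abbbbcc"

# search() is satisfied by the first three 'b's, even though a fourth follows
first_three = re.search(r'b{3}', text)
print(first_three.group(), first_three.span())

# Hypothetical format for illustration: three digits, a dash, four digits
code_pattern = r'\d{3}-\d{4}'
candidates = ["555-1234", "55-1234", "555-12345"]

# fullmatch() requires the pattern to cover the entire string
results = {c: re.fullmatch(code_pattern, c) is not None for c in candidates}
print(results)
```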

### Grouping and Capturing
- Parentheses `()` group part of a pattern and capture the text it matches.
- Captured groups can be accessed using `.group(n)`, where `n` is the group number (starting from 1); `.group(0)` returns the entire match.


In [None]:
# Example
text = "My age is 25"
match = re.search(r'My age is (\d+)', text)
age = match.group(1)  # Captures '25'
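
When a pattern contains several groups, `.groups()` returns all of them at once, and `re.findall()` returns tuples instead of plain strings. This is a minimal sketch using invented sample strings.

```python
import re

text = "Date: 2024-03-15"
match = re.search(r'(\d{4})-(\d{2})-(\d{2})', text)

if match:
    whole = match.group(0)              # the entire match
    year, month, day = match.groups()   # all captured groups as a tuple
    print(whole, year, month, day)

# With more than one group, findall() returns a list of tuples
pairs = re.findall(r'(\w+)=(\d+)', "width=80 height=24")
print(pairs)
```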


### Alternation and Escaping

- `|` : Acts as an OR operator.
- `\` : Escapes special characters.


In [None]:
# Example
re.search(r'cat|dog', "I love cats")  # Matches 'cat' (alternation)
re.search(r'\$', "The price is $10")  # Matches '$' (escaped character)
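
Two details are easy to miss here: `|` splits the whole pattern unless parentheses limit its scope, and a literal string containing special characters can be escaped automatically with `re.escape()`. The sample strings below are invented for this sketch.

```python
import re

# Parentheses limit the alternation to the letters inside them
grouped = re.search(r'gr(a|e)y', "The grey cat")
print(grouped.group())

# Without parentheses the two alternatives are 'gra' and 'ey'
ungrouped = re.search(r'gra|ey', "They")
print(ungrouped.group())

# re.escape() backslash-escapes the special characters in a literal string
literal = "$10.00"
sentence = "The price is $10.00 today"
escaped_hit = re.search(re.escape(literal), sentence)

# Unescaped, '$' means "end of string", so this search finds nothing
unescaped_hit = re.search(literal, sentence)
print(escaped_hit.group(), unescaped_hit)
```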


### Lookahead and Lookbehind

- **Lookahead** `(?=...)`: Matches at a position only if the text that follows matches `...`.
- **Negative Lookahead** `(?!...)`: Matches only if the text that follows does not match `...`.
- **Lookbehind** `(?<=...)`: Matches only if the text immediately before matches `...`.
- **Negative Lookbehind** `(?<!...)`: Matches only if the text immediately before does not match `...`.

These assertions check the surrounding text without including it in the match.


In [None]:
# Example
text = "Python 3.9 is awesome!"
re.search(r'Python(?=\s3)', text)  # Matches 'Python' if followed by ' 3'
re.search(r'(?<!not )awesome', text)  # Matches 'awesome' because it is not preceded by 'not '
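
All four assertions can be compared side by side on one string. This is a minimal sketch on a made-up sentence; note that Python's `re` module requires a lookbehind to match text of a fixed length.

```python
import re

text = "Price: $42, discount: 10%, code 77"

# Lookahead: digits that are followed by '%'
percentages = re.findall(r'\d+(?=%)', text)

# Negative lookahead: whole numbers that are NOT followed by '%'
not_percent = re.findall(r'\b\d+\b(?!%)', text)

# Lookbehind: digits that come right after '$'
dollars = re.findall(r'(?<=\$)\d+', text)

# Negative lookbehind: whole numbers that do NOT come right after '$'
not_dollars = re.findall(r'(?<!\$)\b\d+', text)

print(percentages, not_percent, dollars, not_dollars)
```

In each case the `%` or `$` is only checked, never included in the returned text.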


### Flags

- `re.IGNORECASE` (`re.I`) : Case-insensitive matching.
- `re.DOTALL` (`re.S`) : `.` matches newlines as well.
- `re.MULTILINE` (`re.M`) : `^` and `$` match at the start/end of each line, not only of the whole string.


In [None]:
# Example
text = "HELLO world"
re.search(r'hello', text, re.IGNORECASE)  # Matches 'HELLO'
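
The other two flags are easiest to see on a string that contains a newline. The sketch below compares each pattern with and without its flag, then combines flags with `|`; the sample text is invented.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, '^' matches only at the very start of the string
starts_default = re.findall(r'^\w+', text)
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)

# Without DOTALL, '.' stops at the newline, so this pattern cannot match
no_dotall = re.search(r'first.*second', text)
with_dotall = re.search(r'first.*second', text, re.DOTALL)

# Flags are combined with the bitwise OR operator
combined = re.findall(r'^SECOND', text, re.IGNORECASE | re.MULTILINE)

print(starts_default, starts_multiline)
print(no_dotall, with_dotall.group() if with_dotall else None, combined)
```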



## Summary Table of Key Patterns

| **Pattern**            | **Description**                                   |
|------------------------|---------------------------------------------------|
| `.`                    | Any character except newline                      |
| `^`, `$`               | Start, end of string                              |
| `*`, `+`, `?`          | 0 or more, 1 or more, 0 or 1 repetitions          |
| `{m}`, `{m,n}`         | Exactly m, between m and n occurrences            |
| `\d`, `\D`             | Digit, non-digit                                  |
| `\w`, `\W`             | Word character (letter, digit, `_`), non-word character |
| `\s`, `\S`             | Whitespace, non-whitespace                        |
| `\|`                   | Alternation (OR)                                  |
| `()`                   | Grouping, capturing                               |
| `(?=...)`, `(?!...)`   | Lookahead, negative lookahead                     |
| `(?<=...)`, `(?<!...)` | Lookbehind, negative lookbehind                   |


## Examples

In [1]:
# Validate an Email

import re

email_pattern = r'^[\w\.-]+@[a-zA-Z\d\.-]+\.[a-zA-Z]{2,6}$'
email = "example@domain.com"
if re.match(email_pattern, email):
    print("Valid email")
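
The same pattern can be wrapped in a small helper and tried against a few inputs. This is only a sketch: the function name `is_valid_email` and the test addresses are invented here, and the pattern is a simplified check rather than a complete validator for real-world email addresses.

```python
import re

# Same simplified pattern as above, compiled once for reuse
EMAIL_PATTERN = re.compile(r'^[\w\.-]+@[a-zA-Z\d\.-]+\.[a-zA-Z]{2,6}$')

def is_valid_email(address):
    """Return True if address fits the simplified pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None

samples = [
    "example@domain.com",
    "first.last@sub.domain.org",
    "no-at-sign.com",
    "user@domain",
    "user@domain.c",
]

for sample in samples:
    print(sample, is_valid_email(sample))
```

Compiling the pattern once with `re.compile()` keeps the helper short and avoids repeating the pattern string at each call site.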

In [3]:
# Basic program

import re

# Original string
text = "There are 123 apples and 456 oranges."

# Substitute each digit with '#'
new_text = re.sub(r'\d', '#', text)

# Output the result
print("Text after substitution:", new_text)


Text after substitution: There are ### apples and ### oranges.
