# Symbols:

Regular expressions (regex) use various symbols to define search patterns. Below is a comprehensive list of the common regex symbols and their meanings:

### Common Regex Symbols

1. **Anchors**:
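   - `^` : Matches the start of the string (and, in multiline mode, the start of each line).
   - `$` : Matches the end of the string or just before a trailing newline (and, in multiline mode, the end of each line).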

2. **Character Classes**:
   - `.` : Matches any character except a newline.
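   - `\d` : Matches any decimal digit. For `str` patterns in Python 3 this includes Unicode digits; it is equivalent to `[0-9]` only when the `re.ASCII` flag is set.
   - `\D` : Matches any non-digit character.
   - `\w` : Matches any word character (alphanumeric & underscore). For `str` patterns in Python 3 this includes Unicode letters; it is equivalent to `[a-zA-Z0-9_]` only when the `re.ASCII` flag is set.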
   - `\W` : Matches any non-word character.
   - `\s` : Matches any whitespace character (spaces, tabs, line breaks).
   - `\S` : Matches any non-whitespace character.

3. **Brackets**:
   - `[abc]` : Matches any single character `a`, `b`, or `c`.
   - `[^abc]` : Matches any character except `a`, `b`, or `c`.
   - `[a-z]` : Matches any character from `a` to `z` (inclusive).
   - `[A-Z]` : Matches any character from `A` to `Z`.
   - `[0-9]` : Matches any digit from `0` to `9`.

4. **Quantifiers**:
   - `*` : Matches 0 or more occurrences of the preceding element.
   - `+` : Matches 1 or more occurrences of the preceding element.
   - `?` : Matches 0 or 1 occurrence of the preceding element.
   - `{n}` : Matches exactly `n` occurrences of the preceding element.
   - `{n,}` : Matches `n` or more occurrences.
   - `{n,m}` : Matches between `n` and `m` occurrences.

5. **Grouping and Capture**:
   - `(...)` : Capturing group; captures the matched text for later use.
   - `(?:...)` : Non-capturing group; groups the expression without capturing the match.
   - `(?P<name>...)` : Named capturing group; captures the matched text and gives it a name for later reference.

6. **Alternation**:
   - `|` : Matches either the expression before or after it (logical OR).

7. **Assertions**:
   - `(?=...)` : Positive lookahead; succeeds only if the text that follows matches the specified expression, without consuming it.
   - `(?!...)` : Negative lookahead; succeeds only if the text that follows does not match the specified expression.
   - `(?<=...)` : Positive lookbehind; succeeds only if the text immediately before the current position matches the specified expression.
   - `(?<!...)` : Negative lookbehind; succeeds only if the text immediately before the current position does not match the specified expression.

8. **Escape Character**:
   - `\` : Escapes a special character to match it literally. For example, `\.` matches a literal period.

9. **Flags**:
   - `(?i)` : Enables case-insensitive matching (can also be set with the `re.IGNORECASE` flag).
       - `pattern = r"(?i)hello"`
   - `(?m)` : Enables multiline mode (affects `^` and `$`).
       - `pattern = r"(?m)^hello"`
   - `(?s)` : Enables dot-all mode (makes `.` match newline characters).
       - `pattern = r"(?s)hello."`
   - `(?x)` : Enables verbose mode (allows for comments and whitespace).
       - `pattern = r"""(?x)hello  # enable verbose"""`
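
As a brief illustration of the named groups and lookarounds listed above, here is a minimal sketch. The sample strings and variable names are made up for this example.

```python
import re

# Named groups: pull the year and month out of an ISO-style date.
date_match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-03")
year = date_match.group("year")
month = date_match.group("month")

# Positive lookahead: digits only when followed by "px"; "px" is not consumed.
pixel_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookbehind: digits not immediately preceded by "$" or another digit.
unpriced = re.findall(r"(?<![$\d])\d+", "$15 and 20 items")

print(year, month)     # 2024 03
print(pixel_values)    # ['10', '30']
print(unpriced)        # ['20']
```

Lookarounds only test the surrounding text, so the matched result contains the digits alone.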

### Summary of Common Symbols

- **Anchors**: `^`, `$`
- **Character Classes**: `.`, `\d`, `\D`, `\w`, `\W`, `\s`, `\S`, `[abc]`, `[^abc]`, `[a-z]`
- **Quantifiers**: `*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}`
- **Grouping**: `(...)`, `(?:...)`, `(?P<name>...)`
- **Alternation**: `|`
- **Assertions**: `(?=...)`, `(?!...)`, `(?<=...)`, `(?<!...)`
- **Escape**: `\`
- **Flags**: `(?i)`, `(?m)`, `(?s)`, `(?x)`

These symbols form the foundation of regular expressions and are essential for pattern matching in strings!

# Functions:

In Python, the `re` module (which handles **regular expressions**) contains many functions that allow you to perform operations using regex patterns. Here’s a list of the primary functions in the `re` module:

## Core Functions in the `re` Module

1. **`re.match(pattern, string, flags=0)`**
   - Attempts to match a pattern at the **beginning** of the string.
   - Returns a match object if found; otherwise, it returns `None`.

2. **`re.search(pattern, string, flags=0)`**
   - Searches the string for **any location** where the pattern matches.
   - Returns a match object for the first match found; otherwise, returns `None`.

3. **`re.findall(pattern, string, flags=0)`**
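   - Returns **all non-overlapping matches** of the pattern in the string as a list of strings. If the pattern contains capturing groups, the list holds the group contents instead (tuples when there is more than one group).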
   - If there are no matches, an empty list is returned.

4. **`re.finditer(pattern, string, flags=0)`**
   - Returns an iterator yielding match objects over all matches in the string.

5. **`re.sub(pattern, repl, string, count=0, flags=0)`**
   - Replaces occurrences of the pattern in the string with `repl`.
   - Returns the modified string.
   - If `count` is specified, it replaces up to that many occurrences.


6. **`re.subn(pattern, repl, string, count=0, flags=0)`**
   - Similar to `re.sub()`, but it returns a tuple `(new_string, number_of_subs_made)`.


7. **`re.split(pattern, string, maxsplit=0, flags=0)`**
   - Splits the string by occurrences of the pattern and returns a list of substrings.
   - If `maxsplit` is specified, it will limit the number of splits to that value.


## Functions for Working with Compiled Patterns
These are methods that are available once you've compiled a regular expression pattern using `re.compile()`.

8. **`re.compile(pattern, flags=0)`**
   - Compiles a regex pattern into a **regex object** for reuse.
   - This can be useful when you need to use the same regex multiple times.

Once you've compiled a pattern, you can use the following methods on that pattern object:

**`pattern.match(string)`**

**`pattern.search(string)`**

**`pattern.findall(string)`**

**`pattern.finditer(string)`**

**`pattern.sub(repl, string, count=0)`**

**`pattern.subn(repl, string, count=0)`**

**`pattern.split(string, maxsplit=0)`**

## Utility Functions

16. **`re.escape(string)`**
    - Escapes the characters in the string that have special meaning in a regex, so the string can be matched literally. (Before Python 3.7 it escaped every non-alphanumeric character.)
    - Useful for creating safe regex patterns from strings that might contain special characters.

17. **`re.purge()`**
    - Clears the regular expression cache.

18. **`re.DEBUG`**
    - This is a flag, not a function: pass it to `re.compile()` to print debugging information about the compiled pattern.


19. **`re.fullmatch(pattern, string, flags=0)`**
    - Similar to `re.match()`, but it only matches if the entire string matches the pattern.

---

### To summarize:
- The core functions include `match()`, `search()`, `findall()`, `finditer()`, `sub()`, `subn()`, `split()`, and `compile()`.
- After compiling a pattern, you get additional methods that mirror the core functions but are used on regex objects (`pattern.match()`, `pattern.findall()`, etc.).
- Utility functions like `re.escape()` and `re.purge()` offer additional control over regex behavior.

For more details, you can always check out Python’s official documentation for the `re` module [here](https://docs.python.org/3/library/re.html).

In [71]:
import re

## Re.Match
**`re.match(pattern, string, flags=0)`**

   - Attempts to match a pattern at the **beginning** of the string.
   - Returns a match object if found; otherwise, it returns `None`.




In [33]:
result = re.match(r"hello", "hello world")
print("Raw result:", result)                                   # prints the match object's repr


Raw result: <re.Match object; span=(0, 5), match='hello'>


In [37]:
#When you call group() without any arguments, it returns the entire match as a string.

matched_text = result.group()
print(matched_text)                                           # More refined result

hello


In [31]:
# You can also use group() like this:

string = "hello world"
pattern = r"(hello) (world)"

match = re.match(pattern, string)

if match:
    print("Full match:", match.group(0))       # Output: hello world
    print("First group:", match.group(1))      # Output: hello
    print("Second group:", match.group(2))     # Output: world
    print("All groups:", match.groups())        # Output: ('hello', 'world')


Full match: hello world
First group: hello
Second group: world
All groups: ('hello', 'world')
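
Because `re.match()` only looks at the beginning of the string, it returns `None` when the pattern occurs later on. The sketch below shows that case and one way to guard against calling `.group()` on `None`. The variable names are illustrative.

```python
import re

text = "hello world"

# "world" is in the string, but not at position 0, so re.match() returns None.
at_start = re.match(r"world", text)

# re.search() scans the whole string and finds it.
anywhere = re.search(r"world", text)

# Guard against None before calling .group().
matched = at_start.group() if at_start else None

print(matched)             # None
print(anywhere.group())    # world
```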


## Re.Search
**`re.search(pattern, string, flags=0)`**
   - Searches the string for **any location** where the pattern matches.
   - Returns a match object for the first match found; otherwise, returns `None`.

In [44]:
result = re.search(r"world", "hello world")
print(result)
print(result.group())

<re.Match object; span=(6, 11), match='world'>
world
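
The `span=(6, 11)` shown in the match object's repr is also available programmatically. A small sketch, assuming the same sample string as above:

```python
import re

text = "hello world"
result = re.search(r"world", text)

if result is not None:
    # span() returns the (start, end) indices of the match in the original string.
    start, end = result.span()
    print(start, end)          # 6 11
    print(text[start:end])     # world
```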


## Re.Findall
**`re.findall(pattern, string, flags=0)`**
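   - Returns **all non-overlapping matches** of the pattern in the string as a list of strings. If the pattern contains capturing groups, the list holds the group contents instead (tuples when there is more than one group).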
   - If there are no matches, an empty list is returned.
   

In [49]:
result = re.findall(r"\d+", "123 apples and 456 oranges")
print(result)

['123', '456']
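
The shape of the list returned by `re.findall()` depends on how many capturing groups the pattern has. The following sketch compares zero, one, and two groups on the same sample text. The variable names are illustrative.

```python
import re

text = "123 apples and 456 oranges"

# No groups: each item is the whole match.
no_groups = re.findall(r"\d+ \w+", text)

# One group: each item is that group's text.
one_group = re.findall(r"(\d+) \w+", text)

# Two groups: each item is a tuple of the groups.
two_groups = re.findall(r"(\d+) (\w+)", text)

print(no_groups)     # ['123 apples', '456 oranges']
print(one_group)     # ['123', '456']
print(two_groups)    # [('123', 'apples'), ('456', 'oranges')]
```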


## Re.Finditer

**`re.finditer(pattern, string, flags=0)`**
   - Returns an iterator yielding match objects over all matches in the string.


In [63]:
result = re.finditer(r"\d+", "123 apples and 456 oranges")
for match in result:
    print(match.group())

123
456
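
Since `re.finditer()` yields match objects rather than strings, each match also carries its position. A minimal sketch that collects the text and indices of every match:

```python
import re

text = "123 apples and 456 oranges"

# Each match object exposes the matched text and where it was found.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(positions)    # [('123', 0, 3), ('456', 15, 18)]
```

This is one reason to prefer `finditer()` over `findall()` when the location of each match matters.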


## Re.Sub
**`re.sub(pattern, repl, string, count=0, flags=0)`**
   - Replaces occurrences of the pattern in the string with `repl`.
   - Returns the modified string.
   - If `count` is specified, it replaces up to that many occurrences.


In [83]:
result = re.sub(r"\d+", "NUM", "123 apples and 456 oranges")
print(result)

NUM apples and NUM oranges


In [85]:
result = re.sub(r"\d+", "NUM", "123 apples and 456 oranges", count=1)  # pass count by keyword; positional use is deprecated since Python 3.13
print(result)

NUM apples and 456 oranges
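
The `repl` argument does not have to be a fixed string. It can refer back to capturing groups, or it can be a function that computes each replacement. A short sketch of both, where the helper name `double` is made up for this example:

```python
import re

text = "123 apples and 456 oranges"

# \1 in the replacement refers to the first capturing group.
bracketed = re.sub(r"(\d+)", r"[\1]", text)

# A function receives each match object and returns the replacement string.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, text)

print(bracketed)    # [123] apples and [456] oranges
print(doubled)      # 246 apples and 912 oranges
```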


## Re.Subn

**`re.subn(pattern, repl, string, count=0, flags=0)`**
   - Similar to `re.sub()`, but it returns a tuple `(new_string, number_of_subs_made)`.

In [93]:
result = re.subn(r"\d+", "NUM", "123 apples and 456 oranges and 44 bananas")
print(result)

('NUM apples and NUM oranges and NUM bananas', 3)


## Re.Split
 
 **`re.split(pattern, string, maxsplit=0, flags=0)`**
   - Splits the string by occurrences of the pattern and returns a list of substrings.
   - If `maxsplit` is specified, it will limit the number of splits to that value.
   

In [96]:
result = re.split(r"\s+", "split these words")
print(result)

['split', 'these', 'words']
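
Two details of `re.split()` are worth a quick sketch: `maxsplit` limits the number of splits, and a capturing group in the pattern keeps the separators in the result. The sample strings are illustrative.

```python
import re

# maxsplit=1: split once, leave the rest of the string intact.
limited = re.split(r"\s+", "split these words", maxsplit=1)

# A capturing group around the separator keeps it in the output list.
with_separators = re.split(r"(,)", "a,b,c")

print(limited)            # ['split', 'these words']
print(with_separators)    # ['a', ',', 'b', ',', 'c']
```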


## Re.Compile
**`re.compile(pattern, flags=0)`**
   - Compiles a regex pattern into a **regex object** for reuse.
   - This can be useful when you need to use the same regex multiple times.

In [105]:
pattern = re.compile(r"\d+")
result = pattern.findall("123 apples and 456 oranges")
print(result)

['123', '456']


In [159]:
result = pattern.match("123 apples")
print(result)
result = pattern.search("I have 456 apples")
print(result)
result = pattern.findall("123 apples and 456 oranges")
print(result)
result = pattern.finditer("123 apples and 456 oranges")
print(result)
result = pattern.sub("Num", "123 apples and 456 oranges")
print(result)
result = pattern.subn("Num", "123 apples and 456 oranges")
print(result)
result = pattern.split("123 apples and 456 oranges")
print(result)

<re.Match object; span=(0, 3), match='123'>
<re.Match object; span=(7, 10), match='456'>
['123', '456']
<callable_iterator object at 0x10cddf5e0>
Num apples and Num oranges
('Num apples and Num oranges', 2)
['', ' apples and ', ' oranges']


# Utility Functions

## Re.Escape

**`re.escape(string)`**
   - Escapes the characters in the string that have special meaning in a regex, so the string can be matched literally. (Before Python 3.7 it escaped every non-alphanumeric character.)
   - Useful for creating safe regex patterns from strings that might contain special characters.


In [173]:
result = re.escape(".*?:()+")
print(result)

\.\*\?:\(\)\+
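
A typical use of `re.escape()` is searching for text that comes from outside the program. The sketch below compares an escaped and an unescaped pattern built from the same string. `user_input` is a hypothetical value chosen for illustration.

```python
import re

user_input = "3.14 (approx)"    # hypothetical user-supplied text

# Escaped: every character is matched literally.
literal = re.compile(re.escape(user_input))

# Unescaped: "." matches any character and "(approx)" becomes a group.
unescaped = re.compile(user_input)

print(literal.search("pi is 3.14 (approx)") is not None)      # True
print(unescaped.search("pi is 3.14 (approx)") is not None)    # False
print(unescaped.search("3x14 approx") is not None)            # True
```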


## Re.Purge

**`re.purge()`**
   - Clears the regular expression cache.

In [183]:
re.purge()

## Re.Debug

**`re.DEBUG`**
   - This is a flag, not a function: pass it to `re.compile()` to print debugging information about the compiled pattern.




In [190]:
pattern = re.compile(r"\d+", re.DEBUG)

MAX_REPEAT 1 MAXREPEAT
  IN
    CATEGORY CATEGORY_DIGIT

 0. INFO 4 0b0 1 MAXREPEAT (to 5)
 5: REPEAT_ONE 9 1 MAXREPEAT (to 15)
 9.   IN 4 (to 14)
11.     CATEGORY UNI_DIGIT
13.     FAILURE
14:   SUCCESS
15: SUCCESS


## Re.Fullmatch

**`re.fullmatch(pattern, string, flags=0)`**
   - Similar to `re.match()`, but it only matches if the entire string matches the pattern.

    

In [201]:
result = re.fullmatch(r"\d+", "123456")
print(result.group())                             #Output: 123456

result = re.fullmatch(r"\d+", "123wef")
print(result)                                     #Output: None

123456
None
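
`re.fullmatch()` fits input validation, where a partial match should count as a failure. A minimal sketch, with `is_all_digits` as a made-up helper name:

```python
import re

def is_all_digits(value):
    """Return True only if the whole string consists of digits."""
    return re.fullmatch(r"\d+", value) is not None

# re.match() accepts a matching prefix, so it is not enough for validation.
prefix_only = re.match(r"\d+", "123wef")

print(is_all_digits("123456"))    # True
print(is_all_digits("123wef"))    # False
print(prefix_only.group())        # 123
```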


# List of Regular Expression Flags

The `re` module in Python provides several flags that can be used to modify the behavior of regular expression operations. Here’s a comprehensive list of the flags, along with their descriptions and use cases:

- **`re.IGNORECASE`**: Case-insensitive matching.
- **`re.MULTILINE`**: Anchors match the start/end of each line.
- **`re.DOTALL`**: `.` matches newlines as well.
- **`re.VERBOSE`**: Readable regex with comments.
- **`re.ASCII`**: `\w` matches only ASCII characters.
- **`re.UNICODE`**: `\w` matches Unicode characters.
- **`re.LOCALE`**: Locale-dependent matching.
- **`re.DEBUG`**: Prints debug information about the regex pattern.

### Combining Flags

You can combine multiple flags using the bitwise OR operator (`|`), for example `re.IGNORECASE | re.MULTILINE`.

These flags give you flexibility and power when working with regular expressions in Python!

## Re.IGNORECASE

**`re.IGNORECASE` (or `re.I`)**
   - Makes the matching case-insensitive.

In [212]:
re.search(r"hello", "Hello World", flags=re.IGNORECASE).group()  # Matches "Hello"

'Hello'

## Re.MULTILINE

**`re.MULTILINE` (or `re.M`)**
   - Makes `^` and `$` match at the start and end of each line within the string, in addition to the start and end of the whole string.


In [219]:
text = "hello\nworld"
print(text)
re.findall(r"^h", text, flags=re.MULTILINE)  # Matches 'h' in "hello"

hello
world


['h']
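
Note that `^h` would match here even without the flag, because `h` begins the whole string. To see the effect of `re.MULTILINE`, compare the same pattern with and without it. This is a small sketch using the same sample text:

```python
import re

text = "hello\nworld"

# Without the flag, ^ matches only at the very start of the string.
without_flag = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after each newline.
with_flag = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(without_flag)    # ['hello']
print(with_flag)       # ['hello', 'world']
```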

## Re.DOTALL

**`re.DOTALL` (or `re.S`)**
   - Makes the `.` special character match any character, including newline characters.


In [241]:
text = "hello\nworld"
re.search(r"hello.world", text, flags=re.DOTALL).group()                 # Matches "hello\nworld"

'hello\nworld'

## Re.VERBOSE

**`re.VERBOSE` (or `re.X`)**
   - Allows you to write more readable regular expressions by ignoring whitespace and allowing comments within the pattern.


In [245]:
pattern = r"""
         hello   # Match the word 'hello'
         \s      # Followed by a whitespace
         world    # Then match the word 'world'
     """
re.search(pattern, "hello world", flags=re.VERBOSE).group()               # Matches "hello world"

'hello world'

## Re.ASCII
**`re.ASCII` (or `re.A`)**
   - Makes the `\w`, `\W`, `\b`, `\B`, `\d`, `\D`, `\s`, and `\S` special sequences match only ASCII characters.


In [248]:
re.findall(r"\w+", "café", flags=re.ASCII)  # Matches only 'caf'; 'é' is not an ASCII word character

['caf']
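
The same flag also changes what `\d` matches. The sketch below uses two Arabic-Indic digits, written as escape sequences, as an assumed example of non-ASCII decimal digits:

```python
import re

# "42" followed by the Arabic-Indic digits four and two (U+0664, U+0662).
text = "42 \u0664\u0662"

# Default for str patterns: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d+", text)

# With re.ASCII, \d is limited to [0-9].
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)

print(len(unicode_digits))    # 2
print(ascii_digits)           # ['42']
```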

## Re.UNICODE

**`re.UNICODE` (or `re.U`)**
   - Makes the `\w`, `\W`, `\b`, `\B`, `\d`, `\D`, `\s`, and `\S` special sequences perform Unicode matching. This is already the default for `str` patterns in Python 3, so the flag is redundant there.

In [257]:
re.findall(r"\w+", "café")  # Matches 'café'

['café']

Passing `re.UNICODE` explicitly is harmless. With it (or by default), the `\w` special sequence includes Unicode word characters, which covers accented letters and other characters.

In [279]:
# Sample text containing Spanish characters
text = "café, jalapeño, año, niño"

# Regex pattern to match word characters using re.UNICODE
pattern = re.compile(r'\w+', re.UNICODE)

# Find all matches in the text
matches = pattern.findall(text)

# Print the matches
print(matches)  # Output should include 'café', 'jalapeño', 'año', 'niño' as separate words


['café', 'jalapeño', 'año', 'niño']


## Re.DEBUG

**`re.DEBUG`**
   - When this flag is used, the regex engine will print information about how it processes the pattern. This is useful for debugging complex patterns.
     

In [262]:
re.compile(r"hello", flags=re.DEBUG)  # Prints debug information about the pattern

LITERAL 104
LITERAL 101
LITERAL 108
LITERAL 108
LITERAL 111

 0. INFO 16 0b11 5 5 (to 17)
      prefix_skip 5
      prefix [0x68, 0x65, 0x6c, 0x6c, 0x6f] ('hello')
      overlap [0, 0, 0, 0, 0]
17: LITERAL 0x68 ('h')
19. LITERAL 0x65 ('e')
21. LITERAL 0x6c ('l')
23. LITERAL 0x6c ('l')
25. LITERAL 0x6f ('o')
27. SUCCESS


re.compile(r'hello', re.UNICODE|re.DEBUG)

## Combining Flags

You can combine multiple flags using the bitwise OR operator (`|`):

In [293]:
pattern = r"hello"
text = "Hello World"
result = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
print(result.group())

Hello
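
To make the effect of each combined flag visible, the sketch below uses a sample text where both are needed, and shows the equivalent inline form `(?is)`. The sample text and variable names are made up for this example.

```python
import re

text = "Hello\nWORLD"

# Both flags are required: case differs and "." must cross a newline.
combined = re.search(r"hello.world", text, flags=re.IGNORECASE | re.DOTALL)

# The same flags written inline at the start of the pattern.
inline = re.search(r"(?is)hello.world", text)

# With only one of the two flags, "." cannot match the newline.
only_ignorecase = re.search(r"hello.world", text, flags=re.IGNORECASE)

print(repr(combined.group()))    # 'Hello\nWORLD'
print(repr(inline.group()))      # 'Hello\nWORLD'
print(only_ignorecase)           # None
```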
