## The `re` (regex) library is a powerful library used for string manipulation

## **Methods of the re library**
### `1. re.search()`
 - 	Looks for a match anywhere in the string.
### `2. re.match()`
 - 	Checks if the string starts with the pattern.
### `3. re.findall()`
 - 	Returns all non-overlapping matches as a list.
### `4. re.sub()`
 - 	Replaces parts of the string that match a pattern.
### `5. re.split()`
 - Splits the string by a regex pattern.
### `6. re.compile()`
 - Compiles a pattern into a regex object for reuse.
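
As a quick orientation before the detailed sections, here is a minimal sketch that calls each of the six functions once. The sample string and the variable names are made up for this illustration.

```python
import re

text = "order 66, item 7"   # sample string, assumed for this sketch

first = re.search(r"\d+", text)          # first match anywhere in the string
at_start = re.match(r"\d+", text)        # None: the string does not start with a digit
all_numbers = re.findall(r"\d+", text)   # every non-overlapping match, as a list
masked = re.sub(r"\d+", "#", text)       # replace each match with '#'
parts = re.split(r",\s*", text)          # split on a comma plus optional spaces
number = re.compile(r"\d+")              # reusable pattern object

print(first.group())              # 66
print(at_start)                   # None
print(all_numbers)                # ['66', '7']
print(masked)                     # order #, item #
print(parts)                      # ['order 66', 'item 7']
print(number.findall("a1 b22"))   # ['1', '22']
```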

## **First, the re.search() method**
---
 - `re.search()` scans the whole string.
 - It returns only the first match it finds.
### **Parameters**
- `pattern` : The regex pattern to search for.

- `string` : The string you want to search in.

- `flags` :  Optional modifiers (like re.IGNORECASE, re.MULTILINE, etc.)

### **Return Values**
- Returns a match object if the pattern is found.
- Returns `None` if there is no match.

### **You can extract information from the match object:**

- `.group()`: Returns the matched text as a string.

- `.start()`: Returns the index in the string where the match begins (counting from 0).

- `.end()`: Returns the index just past the last matched character.
 
- `.span()`: Returns the tuple `(start, end)`.

In [None]:
import re
# example of re.search

name : str = "This is a test string 000" # indexing starts from 0

# test_pattern : str = "py"
# match = re.search(test_pattern, name) # returns None because "py" is not in the string

match1 = re.search(r"Th", name) # returns a match object for the first "Th", or None

if match1:
    print("match value: ", match1.group())
    print("match value start position: ", match1.start())
    print("match value end position: ", match1.end())
    print("match value span (start, end): ", match1.span())
else:
    print("No match found")


match value:  Th
match value start position:  0
match value end position:  2
match value span (start, end):  (0, 2)


## Now using **flags** in the re.search() method
- ### Flags are optional settings that modify the behavior of the search() method.
- ### Several flags can be passed to search(); the most common ones follow.
## **First, the re.IGNORECASE (or re.I) flag**

In [None]:
import re
# 1. re.IGNORECASE or re.I -> This flag makes the search case-insensitive.

father_name : str = "my father name is IFRAQ"
# Without the flag, re.search(r'ifraq', father_name) returns None, so calling
# .group() on it raises AttributeError: 'NoneType' object has no attribute 'group'
match2 = re.search('ifraq', father_name, re.I) # matches "IFRAQ" regardless of case
print(match2.group())

IFRAQ
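
Because `re.search()` returns `None` when nothing matches, it is safer to check the result before calling `.group()`. The sketch below compares the search with and without the flag; `find_name` is a hypothetical helper written only for this illustration.

```python
import re

father_name = "my father name is IFRAQ"

def find_name(flags=0):
    # find_name is a hypothetical helper, not part of the re library
    m = re.search(r"ifraq", father_name, flags)
    return m.group() if m else "no match"   # guard against None

print(find_name())                # no match
print(find_name(re.IGNORECASE))   # IFRAQ
```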


## Using **re.MULTILINE** or **re.M**

- ### With this flag, **^** also matches at the start of every line.
- ### With this flag, **$** also matches at the end of every line.

## Without it:
- ### **^** matches only at the start of the string.
- ### **$** matches only at the end of the string (or just before a final newline).


In [5]:
import re
message : str = '''This is a first line
It is second
And it is last line.
'''

# two patterns that both target the second line
first_pattern : str = '^It'      # "It" at the start of a line
last_pattern : str = 'second$'   # "second" at the end of a line

# match the first pattern (without re.M flag) 
match = re.search(first_pattern, message) # without the flag it gives None
print(match)

# match the first pattern (with re.M flag) 
match = re.search(first_pattern, message, re.M) # with the flag it gives the match
print(match.group()) 

# match the last pattern (without re.M flag) 
match = re.search(last_pattern, message) # without the flag it gives None
print(match)

# match the last pattern (with re.M flag) 
match = re.search(last_pattern, message, re.M) # with the flag it gives the match
print(match.group()) 

None
It
None
second
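
The same difference shows up with `re.findall()`. This sketch reuses the three-line message from the cell above (written here with explicit `\n` characters) and collects the first and last word of each line; the variable names are assumptions made for the example.

```python
import re

message = "This is a first line\nIt is second\nAnd it is last line.\n"

starts_default = re.findall(r"^\w+", message)                  # ^ only at string start
starts_multiline = re.findall(r"^\w+", message, re.MULTILINE)  # ^ at every line start
ends_default = re.findall(r"\w+\.?$", message)                 # $ only at string end
ends_multiline = re.findall(r"\w+\.?$", message, re.MULTILINE) # $ at every line end

print(starts_default)     # ['This']
print(starts_multiline)   # ['This', 'It', 'And']
print(ends_default)       # ['line.']
print(ends_multiline)     # ['line', 'second', 'line.']
```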


## **Using re.DOTALL or re.S flag**
- ### This flag lets the **dot (.)** symbol match the newline character as well.
---
## How the **dot (.)** symbol works
- By default, **.** matches any single character except a newline. For example, **a.b** means a + any single character + b, e.g. aab, acb, ahb.
- Used on its own with `re.findall()`, **.** returns every character except newlines as a list.
- With a character before it, like **a.**, it matches a followed by any single character. In the word `alpha`, it picks `al`.
- With a character after it, like **.a**, it matches any single character followed by a. In the word `alpha`, it picks `ha`.
- Remember that `re.search()` returns only the first match, even when the pattern could match in several places.
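
The points above can be checked with a short sketch. The test strings are assumptions chosen for the illustration.

```python
import re

first_a_dot = re.search(r"a.", "alpha").group()   # a + any one character
first_dot_a = re.search(r".a", "alpha").group()   # any one character + a
every_char = re.findall(r".", "ab\ncd")           # the newline is skipped
between = re.findall(r"a.b", "aab acb a\nb")      # "a\nb" does not match

print(first_a_dot)   # al
print(first_dot_a)   # ha
print(every_char)    # ['a', 'b', 'c', 'd']
print(between)       # ['aab', 'acb']
```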

In [6]:
import re
# first using the dot (.) symbol 
name1 : str = 'This is a string'
match = re.search(r's.', name1) # matches 's ' because a space is also a character
# display(match.group())

name2 : str = 'This is a string'
match = re.search(r'.s', name2) # matches 'is': one character before s
# display(match.group())

name3 : str = 'This is a string'
match = re.search(r'...s', name3) # matches 'This': three characters before s
# display(match.group())

# but the dot does not cross newline characters
name4 : str = "This is first line \n This is second line."
match = re.search(r'T.*T', name4) # no match: . does not match the newline between the two T's
display(match) # prints None

# Now using re.DOTALL flag 
name5 : str = "This is first line \n This is second line."
match = re.search(r'T.*T', name5, re.S) # with re.S, . also matches the newline
display(match) # prints the match object

None

<re.Match object; span=(0, 22), match='This is first line \n T'>

### **There are other flags like re.X (VERBOSE), re.A (ASCII) and re.L (LOCALE).**
### **But they are used less often.**
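
For completeness, here is a small sketch of two of these flags. `re.VERBOSE` lets you spread a pattern over several lines with comments, and `re.ASCII` restricts `\w`, `\d` and `\s` to ASCII characters. The phone-like pattern and the sample strings are assumptions made for this example.

```python
import re

# re.VERBOSE (re.X): whitespace and # comments inside the pattern are ignored
phone = re.compile(r"""
    (\d{3})   # first three digits
    -         # literal separator
    (\d{4})   # last four digits
""", re.VERBOSE)
phone_groups = phone.search("call 555-1234").groups()
print(phone_groups)    # ('555', '1234')

# re.ASCII (re.A): \w, \d and \s match ASCII characters only
word = "caf\u00e9"     # "café", written with an escape for the accented letter
unicode_words = re.findall(r"\w+", word)
ascii_words = re.findall(r"\w+", word, re.ASCII)
print(unicode_words)   # ['café']
print(ascii_words)     # ['caf']
```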

---
## **Now exploring special sequences in the `re` library.**
- ### First, we will discuss \w and \W.

### 🔹 **\w**
- \w matches any single word character in a string, which may be:
    - An uppercase letter (A-Z).
    - A lowercase letter (a-z).
    - A digit (0-9).
    - An underscore (_).
- For `str` patterns it also matches letters and digits from other scripts, unless re.ASCII is used.

### 🔹 **\W**
- \W matches any single non-word character in a string, which may be:
    - Spaces.
    - Punctuation (@, !, ., ~, etc.).
    - Symbols.
    - Newlines.
    - etc.


In [None]:
import re
address : str = 'L-972 sector 11-E, M.T, North Karachi!'

match = re.search(r'\w', address) # matches the first word character in the string
print(match.group()) # prints L

match = re.search(r'\W', address) # matches the first non-word character in the string
print(match.group()) # prints -


L
-


## **Now \d and \D**
### **&#x25C6; \d**
  - Matches a single digit from 0-9.
  - Equivalent to: [0-9]

### **&#x25C6; \D**
  - Matches a single non-digit character.
  - Equivalent to: [^0-9]

In [None]:
import re

text = "House no. 129B, Block 4A"
match = re.search(r'\d', text) # matches the first digit in the string
print(match.group()) # prints 1

match = re.search(r'\D', text) # matches the first non-digit character in the string
print(match.group()) # prints H


1
H


## **Now \s and \S**
### **&#x25C6; \s**
- Matches any whitespace character, including:
    - space (' ').
    - tab (\t).
    - newline (\n).
    - carriage return (\r).
    - form feed (\f).
    - vertical tab (\v).
### Equivalent to: [ \t\n\r\f\v]
### **&#x25C6; \S**
- Matches any character except whitespace.
### Equivalent to: [^ \t\n\r\f\v]




In [16]:
import re

text = "Hello 123\tWorld\nNew Line"
match = re.search(r'\s', text) # matches the first whitespace character in the string
display(match.group()) # prints space

match = re.search(r'\S', text) # matches the first non-whitespace character in the string
display(match.group()) # prints H

' '

'H'

## **Now \b and \B**
### **&#x25C6; \b - Word Boundary**
- Matches the position between:
    - A word character (\w), and
    - A non-word character (\W) or the start/end of the string.
- It's useful for finding whole words.
- It doesn't consume any characters; it only checks where a word begins or ends.

### **&#x25C6; \B - Not a Word Boundary**
- Matches a position that is not at a word boundary.
- So it's typically inside a word (between two word characters, like between **e** and **l** in **"hello"**).
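
To see why word boundaries help with whole words, the sketch below searches for `cat` in a sentence where it also appears inside longer words. The sentence is an assumption made for this example.

```python
import re

text = "cat concat category cat."

anywhere = re.findall(r"cat", text)         # every occurrence, even inside words
whole_word = re.findall(r"\bcat\b", text)   # only the standalone word "cat"
word_ending = re.findall(r"\Bcat\b", text)  # "cat" at the end of a longer word

print(len(anywhere))     # 4
print(len(whole_word))   # 2
print(len(word_ending))  # 1 (the one in "concat")
```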



In [31]:
import re

text = "This is a test. Testing word-boundary!"
match = re.search(r'\ba', text) # matches the first "a" that starts a word
print(match.group()) # prints a

match = re.search(r'\Bbad\B', "abaddream") # "bad" with word characters on both sides
print(match.group()) # prints bad

a
bad


# **Now let's talk about re.match() method**
### &#x25C6; **re.match()**.
- Tries to match a pattern only at the beginning of the string.
- If the pattern starts exactly at position 0, it returns a `match` object. Otherwise, it returns `None`.
### &#x25C6; **Parameters**.
- `pattern`: The regex pattern to match.

- `string`: The string to test against.

- `flags`: (Optional) Regex flags like re.IGNORECASE.
### &#x25C6; **Think of it like this**.
### **re.match() says**
 - Hey, does this pattern start the string?

 - if yes, it gives the match.

 - if no, it gives None.



In [33]:
import re

text : str = "Hello world"

result = re.match(r'Hello', text)
print(result.group())  # ✅ Output: Hello

result = re.match(r'world', text)
print(result)  #  Output: None


Hello
None
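
A side-by-side comparison may help here. The sketch below runs the same pattern through `re.match()` and `re.search()`, and also shows `re.fullmatch()`, a standard-library function not covered above that requires the whole string to match.

```python
import re

text = "Hello world"

match_result = re.match(r"world", text)           # pattern must start at position 0
search_result = re.search(r"world", text)         # pattern may start anywhere
anchored_search = re.search(r"^world", text)      # ^ makes search behave like match here
full_partial = re.fullmatch(r"Hello", text)       # whole string must match
full_whole = re.fullmatch(r"Hello world", text)

print(match_result)            # None
print(search_result.group())   # world
print(anchored_search)         # None
print(full_partial)            # None
print(full_whole.group())      # Hello world
```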


# &#x25C6; **re.findall() method**.
- The `re.findall()` function searches through the entire string and returns all non-overlapping matches of a pattern as a list.

- It does not return match objects. It returns the matched substrings. If the pattern contains capture groups, it returns the groups instead: strings for one group, tuples for several.
# &#x25C6; **Parameters**.
- `pattern`: The regular expression you want to search with.

- `string`: The string to search in.

- `flags`: Optional flags like re.IGNORECASE.

In [50]:
import re

text1 = "Phone: 123-456-7890, Code: 007"
matches = re.findall(r'\d+', text1)
print(matches) # Output: ['123', '456', '7890', '007']

text2 = "Hello, how are you?"
matches = re.findall(r'\w+', text2)
print(matches) # Output: ['Hello', 'how', 'are', 'you']

text = "Name: John, Age: 30, city: 3"
matches = re.findall(r'(\w+): (\w+)', text) # two groups, so each match is a tuple
print(matches) # Output: [('Name', 'John'), ('Age', '30'), ('city', '3')]



['123', '456', '7890', '007']
['Hello', 'how', 'are', 'you']
[('Name', 'John'), ('Age', '30'), ('city', '3')]


# **&#x25C6; re.sub()**.
- Here `sub` stands for `substitute`.
- It searches for a pattern in a string and replaces it with something else.
# **&#x25C6; Parameters**.
- `pattern`: The regex pattern to search for.

- `repl`: The replacement for each match (a string, or a function that receives the match object).

- `string`: The original text.

- `count` (optional): The maximum number of replacements (default = 0, meaning replace all).

- `flags` (optional): For flags like re.IGNORECASE.

In [None]:
import re

text = "My phone number is 123-456"
new_text = re.sub(r'\d', '#', text, count=3) # replace only the first 3 digits with '#'
print(new_text) # Output: My phone number is ###-456

# remove all the punctuation marks

text = "Hello! Are you there? Yes, I am."
cleaned = re.sub(r'[^\w\s]', '', text)
print(cleaned) # Output: Hello Are you there Yes I am


My phone number is ###-456
Hello Are you there Yes I am
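
The replacement can also refer back to captured groups, or be a function that builds the new text from each match. The following is a minimal sketch; the date format and the helper name `double` are assumptions made for the illustration.

```python
import re

# Backreferences: \1, \2 and \3 stand for the captured groups
date = "2024-05-17"
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)
print(reordered)   # 17/05/2024

# A function as the replacement: it receives each match object
def double(m):
    # double is a hypothetical helper for this example
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)     # 6 apples and 20 pears
```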


# **&#x25C6; re.split()**.
- This method is used to **split** a string by the occurrences of a pattern.
# **&#x25C6; Parameters**.
- `pattern`: The regular expression pattern used to split the string.

- `string`: The string you want to split.

- `maxsplit` (optional): Maximum number of splits. 0 means no limit.

- `flags` (optional): Special matching behavior like re.IGNORECASE.
# **&#x25C6; Returns:**.
- A list of strings split by the specified pattern.
- If the pattern does not match, the list contains the whole string as its only item.


In [10]:
import re

text1 = "apple,banana;cherry orange"
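result = re.split(r'[;, ]', text1, maxsplit=0) # 0 means no limit on the number of splits
# Output: ['apple', 'banana', 'cherry', 'orange']
print(result)

text2 = "one two,three four five"
result = re.split(r'\s', text2, maxsplit=2) # split at the first 2 whitespace characters only
print(result) # Output: ['one', 'two,three', 'four five']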



['apple', 'banana', 'cherry', 'orange']
['one', 'two,three', 'four five']
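
A few edge cases of `re.split()` are worth checking directly: a pattern that never matches, a pattern wrapped in a capture group, and two separators in a row. The inputs below are assumptions made for this sketch.

```python
import re

no_match = re.split(r";", "no separators here")   # pattern never matches
with_group = re.split(r"(,)", "a,b,c")            # a capture group keeps the separators
empty_piece = re.split(r",", "a,,b")              # adjacent separators give ''

print(no_match)      # ['no separators here']
print(with_group)    # ['a', ',', 'b', ',', 'c']
print(empty_piece)   # ['a', '', 'b']
```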


# **&#x25C6; re.compile()**.
- Saves the pattern so you can reuse it with different regex methods like:
    - `.match()`

    - `.search()`

    - `.findall()`

    - `.sub()`
    
    - `.split()`
# **&#x25C6; Parameters**.
- `pattern`: The regular expression or pattern to save, as a string.

- `flags` (optional): Modify the regex behavior, e.g., re.IGNORECASE, re.MULTILINE.
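
As a rough illustration of the reuse described above, the sketch below compiles one pattern with a flag and then calls several methods on the same object. The pattern and the sample strings are assumptions made for the example.

```python
import re

word = re.compile(r"[a-z]+", re.IGNORECASE)   # compiled once, flag included
line = "Alpha beta, GAMMA"

first_word = word.search(line).group()   # first run of letters
all_words = word.findall(line)           # every run of letters
starred = word.sub("*", line)            # replace each run with '*'
pieces = word.split("1a2b3")             # split on runs of letters

print(first_word)   # Alpha
print(all_words)    # ['Alpha', 'beta', 'GAMMA']
print(starred)      # * *, *
print(pieces)       # ['1', '2', '3']
```

Note that the flag is given once to `re.compile()`; the methods on the compiled object do not take a `flags` argument.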

In [22]:
import re

pattern1 = re.compile(r'\d+')  # Matches one or more digits
result = pattern1.findall("Item 1 costs 30 dollars")
print(result) # Output: ['1', '30']

pattern2 = re.compile(r'hello', re.IGNORECASE)
result = pattern2.match("Hello world")
print(result) # Output: a match object for 'Hello' (the flag makes the case irrelevant)


['1', '30']
<re.Match object; span=(0, 5), match='Hello'>
