### **Important Functions of the `re` Module**

The `re` module in Python provides regular expression matching operations. Here's a list of some important functions:

1.  **`match()`:** Attempts to match the pattern at the *beginning* of the string.

2.  **`fullmatch()`:** Attempts to match the *entire* string against the pattern.

3.  **`search()`:** Searches for the first occurrence of the pattern anywhere in the string.

4.  **`findall()`:** Returns a list of all non-overlapping matches in the string.

5.  **`finditer()`:** Returns an iterator yielding match objects for all non-overlapping matches.

6.  **`sub()`:** Replaces one or more occurrences of the pattern with a replacement string.

7.  **`subn()`:** Similar to `sub()`, but also returns the number of substitutions made.

8.  **`split()`:** Splits the string by occurrences of the pattern.

9.  **`compile()`:** Compiles a regular expression pattern into a regex object, which can be used for matching using its `match()`, `search()` and other methods. This is useful for efficiency if the same pattern is used multiple times.

#### 1. `match()`

The `match()` function from Python's `re` module is used to check if a given pattern is present at the *beginning* of a target string.

*   If a match is found at the beginning of the string, the function returns a Match object.
*   If no match is found at the beginning, the function returns `None`.

**Match Object Methods:**

If a match is found (and a Match object is returned), you can use the following methods on the Match object (let's call it `m`):

*   **`m.start()`:** Returns the starting index of the match within the string (always 0 for `match()` as it matches from the beginning).
*   **`m.end()`:** Returns the index just past the last matched character (the last character's index + 1), so `string[m.start():m.end()]` equals `m.group()`.
*   **`m.group()`:** Returns the matched string.
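Here is a minimal sketch of these three methods on a match at the start of a string. The sample string `"abcdefgh"` and pattern `"abc"` are arbitrary choices for illustration. It also shows why `end()` is one past the last matched character: slicing the original string with `start()` and `end()` returns exactly `group()`.

```python
import re

text = "abcdefgh"
m = re.match("abc", text)

if m is not None:
    print(m.start())  # 0: match() only succeeds at the beginning
    print(m.end())    # 3: index just past 'c', which sits at index 2
    print(m.group())  # abc
    # Slicing with start()/end() reproduces the matched text exactly
    print(text[m.start():m.end()] == m.group())  # True
```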



```python
import re
```

```python
p = input("Enter some pattern to check: ")
m = re.match(p, "abcdefgh")
if m is not None:
    print(f"Target string starts with {m.group()}")
else:
    print(f"Target string does not start with {p}")
```

Output (for the input `asdf`):

```
Target string does not start with asdf
```


```python
import re

string1 = "abcdefg"
string2 = "xyzabc"

pattern = "abc"

match1 = re.match(pattern, string1)
match2 = re.match(pattern, string2)

if match1:
    print(f"Match found in string1: {match1.group()}, start: {match1.start()}, end: {match1.end()}")
else:
    print("No match found in string1")

if match2:
    print(f"Match found in string2: {match2.group()}, start: {match2.start()}, end: {match2.end()}")
else:
    print("No match found in string2")
```

Output:

```
Match found in string1: abc, start: 0, end: 3
No match found in string2
```


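#### 2. `fullmatch()`

The `fullmatch()` function checks whether the *entire* target string matches the pattern. It returns a Match object if it does and `None` otherwise, which makes it well suited to validation. In the examples below, the pattern `[6-9][0-9]{9}` accepts exactly ten digits whose first digit is between 6 and 9.

```python
import re

num = input("Enter mobile number to validate: ")
pattern = r'[6-9][0-9]{9}'  # first digit 6-9, then exactly nine more digits
match_p = re.fullmatch(pattern, num)
print(num)
if match_p is not None:
    print("Valid 10-digit mobile number")
else:
    print("Invalid 10-digit mobile number")
```

Output:

```
9970099092
Valid 10-digit mobile number
```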


```python
num = '99009900'
pattern = r'[6-9][0-9]{9}'
match_p = re.fullmatch(pattern, num)
if match_p is not None:
    print("Valid 10-digit mobile number")
else:
    print("Invalid 10-digit mobile number")
```

Output:

```
Invalid 10-digit mobile number
```


```python
num = '56756756'
pattern = r'[6-9][0-9]{9}'
match_p = re.fullmatch(pattern, num)
if match_p is not None:
    print(f"Valid {len(num)}-digit mobile number")
else:
    print(f"Invalid {len(num)}-digit mobile number")
```

Output:

```
Invalid 8-digit mobile number
```
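The difference between `match()` and `fullmatch()` shows up clearly on a string that only *begins* with a valid number. In this sketch, the string `"9970099092abc"` is a made-up input. `match()` succeeds because the first ten characters fit the pattern, while `fullmatch()` rejects the string because of the trailing letters.

```python
import re

pattern = r"[6-9][0-9]{9}"
candidate = "9970099092abc"  # hypothetical input: valid number followed by extra text

prefix = re.match(pattern, candidate)      # only anchored at the start
whole = re.fullmatch(pattern, candidate)   # must consume the entire string

print(prefix.group() if prefix else None)  # 9970099092
print(whole)                               # None
```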


```python
match = re.search('bbb', 'aaabbaabbaabbabbb')
if match is not None:
    print(f"Match is available at: {match.start()}")
else:
    print("Match is not available")
```

Output:

```
Match is available at: 14
```


```python
import re

# Sample string
text = "Hello, my name is Akshay."

# Pattern to search for
pattern = r"Akshay"

# Using re.search() to search for the pattern
match = re.search(pattern, text)

if match:
    print("Pattern found!")
    print("Start index:", match.start())  # Start index of the match
    print("End index:", match.end())      # End index of the match
    print("Matched string:", match.group())  # Matched string
else:
    print("Pattern not found!")
```

Output:

```
Pattern found!
Start index: 18
End index: 24
Matched string: Akshay
```


#### 3. `search()`

The `search()` function from Python's `re` module is used to search for a given pattern *anywhere* within a target string.

*   If a match is found, the function returns a Match object representing the *first* occurrence of the match.
*   If no match is found, the function returns `None`.

**Key Difference from `match()`:**

Unlike `match()`, which only checks for a match at the beginning of the string, `search()` searches throughout the entire string.
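A small sketch of that difference, using the same `"xyzabc"` string as the `match()` example earlier. It also shows that anchoring the pattern with `^` makes `search()` behave like `match()` for a single-line string (when the `re.MULTILINE` flag is not used).

```python
import re

target = "xyzabc"

print(re.match("abc", target))          # None: "abc" is not at position 0
print(re.search("abc", target).span())  # (3, 6): found later in the string
print(re.search("^abc", target))        # None: ^ anchors search() to the start
```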



```python
string1 = "abcdefg"
string2 = "xyzabcpqr"
string3 = "xyzpqr"

pattern = "abc"

match1 = re.search(pattern, string1)
match2 = re.search(pattern, string2)
match3 = re.search(pattern, string3)

if match1:
    print(f"Match found in string1: {match1.group()}, start: {match1.start()}, end: {match1.end()}")
else:
    print("No match found in string1")

if match2:
    print(f"Match found in string2: {match2.group()}, start: {match2.start()}, end: {match2.end()}")
else:
    print("No match found in string2")

if match3:
    print(f"Match found in string3: {match3.group()}, start: {match3.start()}, end: {match3.end()}")
else:
    print("No match found in string3")
```

Output:

```
Match found in string1: abc, start: 0, end: 3
Match found in string2: abc, start: 3, end: 6
No match found in string3
```


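#### 4. `findall()`

The `findall()` function returns a list of all non-overlapping matches of the pattern in the target string. Here it collects every digit:

```python
l = re.findall('[0-9]', 'a7b9kz@kmn4')
print(l)
```

Output:

```
['7', '9', '4']
```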
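One behavior of `findall()` worth knowing: when the pattern contains capturing groups, it returns the group contents instead of the whole match, and a list of tuples when there is more than one group. The `key=value` string below is an invented example.

```python
import re

data = "a=1, b=22, c=333"

# No groups: each element is the full match
print(re.findall(r"\w=\d+", data))      # ['a=1', 'b=22', 'c=333']

# One group: each element is just that group's text
print(re.findall(r"\w=(\d+)", data))    # ['1', '22', '333']

# Two groups: each element is a tuple of the groups
print(re.findall(r"(\w)=(\d+)", data))  # [('a', '1'), ('b', '22'), ('c', '333')]
```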


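#### 5. `finditer()`

The `finditer()` function returns an iterator that yields a Match object for each non-overlapping match, so the position of every match is available as well:

```python
matches = re.finditer('[0-9]', 'a7b9kz@kmn4')
for match in matches:
    print(f"Match found at: {match.start()}")
```

Output:

```
Match found at: 1
Match found at: 3
Match found at: 10
```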
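Because `finditer()` yields full Match objects, each match can report its text and its position together, which `findall()` cannot do. A minimal sketch on the same target string:

```python
import re

target = "a7b9kz@kmn4"

# Each item is a Match object, so both the text and the span are available
results = [(m.group(), m.span()) for m in re.finditer("[0-9]", target)]
print(results)  # [('7', (1, 2)), ('9', (3, 4)), ('4', (10, 11))]
```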


```python
s = re.sub('[0-9]', '#', 'a7b9kz@kmn4')
print(s)
```

Output:

```
a#b#kz@kmn#
```


```python
import re

# Sample string with a complex format (email with name)
text = "Contact me at john.doe@example.com for further details."

# Pattern to search for email and extract username and domain using named groups
pattern = r"(?P<username>[a-zA-Z0-9._%+-]+)@(?P<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"

# Using re.search() to find the first match
match = re.search(pattern, text)

if match:
    print("Pattern found!")
    print("Username:", match.group("username"))  # Extract username
    print("Domain:", match.group("domain"))      # Extract domain
    print("Full match:", match.group())          # Extract the full matched string
    print("Start index:", match.start())         # Start index of the match
    print("End index:", match.end())             # End index of the match
else:
    print("Pattern not found!")
```

Output:

```
Pattern found!
Username: john.doe
Domain: example.com
Full match: john.doe@example.com
Start index: 14
End index: 34
```
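You can also collect named groups in one step with `groupdict()`, which returns a dictionary keyed by group name. Here is a short sketch that reuses the same pattern and text:

```python
import re

text = "Contact me at john.doe@example.com for further details."
pattern = r"(?P<username>[a-zA-Z0-9._%+-]+)@(?P<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"

match = re.search(pattern, text)
if match:
    parts = match.groupdict()  # {group name: matched text}
    print(parts)  # {'username': 'john.doe', 'domain': 'example.com'}
```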



#### 6. `sub()`

The `sub()` function in Python's `re` module performs substitutions or replacements of matched patterns within a string.

**Syntax:**

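```python
re.sub(pattern, replacement_string, target_string, count=0, flags=0)
```

*   `pattern`: The regular expression pattern to search for.
*   `replacement_string`: The string that will replace the matched occurrences of the pattern. It may contain backreferences such as `\1`, or it may be a function that receives each Match object and returns the replacement text.
*   `target_string`: The string in which the substitutions will be performed.
*   `count` (optional): The maximum number of replacements; the default `0` replaces every occurrence.
*   `flags` (optional): Regex flags such as `re.IGNORECASE`.

**Functionality:**

`sub()` searches for all non-overlapping occurrences of the `pattern` in the `target_string` and replaces them with the `replacement_string`. It returns the modified string; if the pattern does not occur, the target string is returned unchanged.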
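Here are a few sketches of `sub()` beyond plain string replacement. `count` limits how many replacements happen. Backreferences such as `\1` in the replacement insert captured groups. The replacement can also be a function that receives each Match object. The sample strings are chosen for illustration, and `double` is a hypothetical helper.

```python
import re

# count=1: replace only the first occurrence
print(re.sub("abc", "xyz", "abc abc abc", count=1))  # xyz abc abc

# Backreferences: reorder a DD-MM-YYYY date into YYYY-MM-DD
print(re.sub(r"(\d{2})-(\d{2})-(\d{4})", r"\3-\2-\1", "27-11-2020"))  # 2020-11-27

# Function replacement (hypothetical helper): double every number found
def double(m):
    return str(int(m.group()) * 2)

print(re.sub(r"\d+", double, "3 apples cost 40 cents"))  # 6 apples cost 80 cents
```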


```python
text = "The quick brown fox jumps over the lazy dog."
pattern = "fox"
replacement = "cat"

new_text = re.sub(pattern, replacement, text)
print(new_text)

text2 = "123 abc 456 abc 789"
pattern2 = "abc"
replacement2 = "xyz"
new_text2 = re.sub(pattern2, replacement2, text2)
print(new_text2)

text3 = "aaa bbb ccc"
pattern3 = "b"
replacement3 = "d"
new_text3 = re.sub(pattern3, replacement3, text3)
print(new_text3)
```

Output:

```
The quick brown cat jumps over the lazy dog.
123 xyz 456 xyz 789
aaa ddd ccc
```


#### 7. `subn()`

The `subn()` function in Python's `re` module is very similar to `sub()`, but with one key difference: it returns a tuple containing two elements:

1.  The modified string (with the substitutions made).
2.  The number of substitutions that were performed.

**Syntax:**

```python
re.subn(pattern, replacement_string, target_string, count=0, flags=0)
```

```python
text = "The quick brown fox jumps over the lazy fox."
pattern = "fox"
replacement = "cat"

result = re.subn(pattern, replacement, text)
print(result)
```

Output:

```
('The quick brown cat jumps over the lazy cat.', 2)
```


```python
text2 = "123 abc 456 abc 789"
pattern2 = "abc"
replacement2 = "xyz"
result2 = re.subn(pattern2, replacement2, text2)
print(result2)
```

Output:

```
('123 xyz 456 xyz 789', 2)
```


```python
text3 = "aaa bbb ccc"
pattern3 = "z"  # no match
replacement3 = "d"
result3 = re.subn(pattern3, replacement3, text3)
print(result3)
```

Output:

```
('aaa bbb ccc', 0)
```
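Because `subn()` returns a tuple, you can unpack the result directly, for example to check whether anything changed. Here is a minimal sketch with an invented string that also uses `count` to cap the number of replacements:

```python
import re

new_text, n = re.subn("fox", "cat", "fox fox fox", count=2)
print(new_text)  # cat cat fox
print(n)         # 2: count caps the replacements although three occurrences exist

# The count makes it easy to detect "no change" without comparing strings
_, n_none = re.subn("z", "d", "aaa bbb ccc")
print(n_none == 0)  # True
```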


#### 8. `split()`

The `split()` function in Python's `re` module is used to split a target string into a list of substrings, based on occurrences of a specified pattern.

**Syntax:**

```python
re.split(pattern, target_string, maxsplit=0, flags=0)
```

```python
date_string = '27-11-2020'
result = re.split('-', date_string)
print(result)

for s in result:
    print(s)
```

Output:

```
['27', '11', '2020']
27
11
2020
```


```python
text = "apple,banana,orange,grape"
result2 = re.split(',', text)
print(result2)
```

Output:

```
['apple', 'banana', 'orange', 'grape']
```


```python
text2 = "one two three four"
result3 = re.split(r"\s+", text2)  # Split by one or more whitespace characters
print(result3)
```

Output:

```
['one', 'two', 'three', 'four']
```
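There are two more details of `re.split()` worth knowing. A capturing group in the pattern keeps the separators in the result. Separators at either end of the string produce empty strings. `maxsplit` limits the number of splits. The sample strings are arbitrary.

```python
import re

# Split on either ',' or ';'; the capturing group keeps the separators
print(re.split(r"([,;])", "a,b;c"))              # ['a', ',', 'b', ';', 'c']

# maxsplit=1: split only at the first match and keep the rest intact
print(re.split(r"-", "27-11-2020", maxsplit=1))  # ['27', '11-2020']

# Separators at the ends produce empty strings
print(re.split(r",", ",apple,banana,"))          # ['', 'apple', 'banana', '']
```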


```python
url = "[invalid URL removed]"
parts = re.split(r"//", url)  # Split by "//"
print(parts)

url2 = "[invalid URL removed]"
parts2 = re.split(r"//", url2)
print(parts2)
```

Output:

```
['[invalid URL removed]']
['[invalid URL removed]']
```


```python
url = "https://www.example.com/path/to/page.html"
parts = re.split(r"//", url)  # ['https:', 'www.example.com/path/to/page.html']
protocol = parts[0]
rest = parts[1]
domain_and_path = re.split(r"/", rest, maxsplit=1)  # split at the first "/" only
domain = domain_and_path[0]
path = "/" + domain_and_path[1] if len(domain_and_path) > 1 else ""

print(f"Protocol: {protocol}")
print(f"Domain: {domain}")
print(f"Path: {path}")

url2 = "http://www.example.com"  # no path
parts2 = re.split(r"//", url2)
protocol2 = parts2[0]
rest2 = parts2[1]
domain_and_path2 = re.split(r"/", rest2, maxsplit=1)
domain2 = domain_and_path2[0]
path2 = "/" + domain_and_path2[1] if len(domain_and_path2) > 1 else ""

print(f"Protocol: {protocol2}")
print(f"Domain: {domain2}")
print(f"Path: {path2}")
```

Output:

```
Protocol: https:
Domain: www.example.com
Path: /path/to/page.html
Protocol: http:
Domain: www.example.com
Path: 
```
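Splitting URLs with regular expressions works for simple cases like these. Python's standard library also provides `urllib.parse.urlsplit()`, which handles ports, query strings and fragments as well. This swaps in a different tool rather than `re`. The sketch below is for comparison only. Note that `scheme` comes back without the trailing colon.

```python
from urllib.parse import urlsplit

for url in ["https://www.example.com/path/to/page.html", "http://www.example.com"]:
    parts = urlsplit(url)
    print(f"Protocol: {parts.scheme}")  # https, then http (no colon)
    print(f"Domain: {parts.netloc}")    # www.example.com
    print(f"Path: {parts.path}")        # /path/to/page.html, then an empty string
```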


```python
import re

# Sample URL with query string
url_with_query = "https://www.example.com/search?q=python&lang=en"

# Split the URL into base URL and query string using re.split()
parts = re.split(r"\?", url_with_query)
base_url = parts[0]
query_string = parts[1] if len(parts) > 1 else ""

print(f"Base URL: {base_url}")
print(f"Query String: {query_string}")

# If there is a query string, split it into individual parameters
if query_string:
    query_params = re.split(r"&", query_string)
    print("Query Parameters:")
    for param in query_params:
        print(param)
```

Output:

```
Base URL: https://www.example.com/search
Query String: q=python&lang=en
Query Parameters:
q=python
lang=en
```
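For query strings specifically, `urllib.parse.parse_qsl()` splits on `&` and `=` and also decodes percent-escapes. Like `urlsplit()`, it comes from the standard library rather than `re`. Here is a sketch that uses the same URL:

```python
from urllib.parse import urlsplit, parse_qsl

url_with_query = "https://www.example.com/search?q=python&lang=en"

query_string = urlsplit(url_with_query).query  # 'q=python&lang=en'
params = parse_qsl(query_string)               # list of (name, value) pairs
print(params)  # [('q', 'python'), ('lang', 'en')]
```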


`match()` --> To check whether the target string starts with the specified pattern

`fullmatch()` --> To check whether the entire target string matches the pattern

`search()` --> To return the first occurrence of the match anywhere in the target string

`findall()` --> To return a list of all non-overlapping matches

`finditer()` --> To return an iterator that yields a Match object for each match

`sub()` --> To replace every occurrence of the pattern with the provided replacement string

`subn()` --> Same as `sub()`, but also returns the number of substitutions made

`split()` --> To split the target string based on the given pattern

`compile()` --> To compile a pattern into a regex object that can be reused
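The list at the start of this section also mentions `compile()`, which turns a pattern into a reusable regex object. Here is a minimal sketch that reuses the mobile-number pattern. The longer text passed to `findall()` is an invented input. The compiled object exposes the same operations as methods (`fullmatch()`, `findall()`, `sub()`, and so on), so you write the pattern only once.

```python
import re

mobile = re.compile(r"[6-9][0-9]{9}")

for num in ["9970099092", "99009900", "56756756"]:
    status = "Valid" if mobile.fullmatch(num) else "Invalid"
    print(f"{num}: {status}")

# The same object also works for finding numbers inside longer text
print(mobile.findall("call 9970099092 or 8123456789"))  # ['9970099092', '8123456789']
```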