## A regular expression (regex) is a sequence of characters that defines a search pattern. It is used to match and search for text within strings.
<br>

- ### Regular expressions provide a concise and flexible way to match, locate, and manipulate text based on specific patterns.
<br>

- ### In Python, the re module provides functions for working with regular expressions. 
- ### It allows you to perform various operations such as pattern matching, searching, substitution, and splitting of strings based on specific patterns.

In [2]:
import re

## Methods
<br>

- ### re.match()
- ### re.search()
- ### re.findall()
- ### re.finditer()
- ### re.split()
- ### re.sub()
- ### re.subn()
- ### re.compile()
<br>

## Flags and Match Object Methods
<br>

- ### re.IGNORECASE or re.I
- ### re.MULTILINE or re.M
- ### re.DOTALL or re.S
- ### .group()
- ### .span()

<br>

- ### The re module in Python provides several functions for working with regular expressions. 

<br><br>
- ### re.match(pattern, string, flags=0)
<br>

- ### Attempts to match the pattern at the beginning of the string. 
- ### Returns a match object if successful or None otherwise.

In [4]:
string = "0abc8d"

var = re.match(r"[a-z]", string)

print(var)

None


In [6]:
# test each character of the string on its own
for i in string:
    t = re.match(r"[a-z]",i)
    print(t)

None
<re.Match object; span=(0, 1), match='a'>
<re.Match object; span=(0, 1), match='b'>
<re.Match object; span=(0, 1), match='c'>
None
<re.Match object; span=(0, 1), match='d'>


In [7]:
# print only the characters that match
for i in string:
    t = re.match(r"[a-z]",i)
    if t:
        print(i,end="")

abcd

### Used in Strings

In [8]:
# Example 1

pattern = r'^\d{3}-\d{3}-\d{4}$'

phone_number = '123-456-7890'

match = re.match(pattern, phone_number)

if match:
    print('Valid phone number')
else:
    print('Invalid phone number')

Valid phone number


### Used in Iteration

In [10]:
# Example 2 

pattern = r'^Hello'
strings = ["Hello, World!", "Hi there!", "Hello, OpenAI"]

for string in strings:
    match = re.match(pattern, string)
    print(match)

<re.Match object; span=(0, 5), match='Hello'>
None
<re.Match object; span=(0, 5), match='Hello'>


In [17]:
# print only the strings that match

for string in strings:
    match = re.match(pattern, string)
    if match:
        print(string)

Hello, World!
Hello, OpenAI
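re.match() anchors the pattern at the start of the string, but it does not require the pattern to reach the end. A minimal sketch of that behavior (the input `"123abc"` and the variable names are illustrative, not taken from the cells above):

```python
import re

# The pattern matches the first three digits; the trailing text is ignored
prefix_only = re.match(r"\d{3}", "123abc")

# With a trailing $ the same input is rejected, because "abc" follows the digits
whole_string = re.match(r"\d{3}$", "123abc")

print(prefix_only.group())  # 123
print(whole_string)         # None
```

This is why the phone-number pattern above ends with `$`: without it, extra characters after the number would still be accepted.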


<br>

### re.search(pattern, string, flags=0)
<br>

- ### Scans through the string and finds the first location where the pattern matches, which does not have to be the beginning.
- ### Returns a match object if a match is found or None otherwise.
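The difference from re.match() only shows up when the pattern is not anchored with `^`. A small sketch, assuming an illustrative sample string that is not part of this notebook:

```python
import re

sample = "Order number: 4521"  # illustrative string

# re.match() only tries position 0, so it fails: the string starts with "O"
at_start = re.match(r"\d+", sample)

# re.search() scans forward and returns the first match it finds
anywhere = re.search(r"\d+", sample)

print(at_start)          # None
print(anywhere.group())  # 4521
print(anywhere.span())   # (14, 18)
```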

In [11]:
pattern = r'^\d{3}-\d{3}-\d{4}$'
phone_number = '123-456-7890'
re.search(pattern,phone_number)

<re.Match object; span=(0, 12), match='123-456-7890'>

In [14]:
re.search(pattern,phone_number).group(0)

'123-456-7890'

In [48]:
# Example 2 

pattern = r'^Hello'
strings = ["Hello, World!", "Hi there!", "Hello, OpenAI"]

for string in strings:
    match = re.search(pattern, string)
    if match:
        print(string)

Hello, World!
Hello, OpenAI


<br>

- ### re.findall(pattern, string, flags=0)
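- ### Returns all non-overlapping matches of the pattern in the string as a list of strings. If the pattern contains capturing groups, the list holds the group contents instead (tuples when there is more than one group).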

In [15]:
sentence = "The quick brown fox jumps over the lazy dog"
word = "the"

matches = re.findall(word, sentence)
matches

['the']

In [16]:
re.findall(word, sentence,re.IGNORECASE)

# re.IGNORECASE

['The', 'the']

In [17]:
# Example 2

text = "The event dates are 01-05-2022, 15-06-2022, and 30-07-2022."
pattern = r"\d{2}-\d{2}-\d{4}"

dates = re.findall(pattern, text)
print(dates)


['01-05-2022', '15-06-2022', '30-07-2022']
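When the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. The sketch below reuses the date text from the cell above; the grouped patterns are variations added for illustration:

```python
import re

text = "The event dates are 01-05-2022, 15-06-2022, and 30-07-2022."

# Three capturing groups: each match becomes a tuple of three strings
parts = re.findall(r"(\d{2})-(\d{2})-(\d{4})", text)
print(parts)
# [('01', '05', '2022'), ('15', '06', '2022'), ('30', '07', '2022')]

# One capturing group: only the group's text is returned, not the full match
years = re.findall(r"\d{2}-\d{2}-(\d{4})", text)
print(years)  # ['2022', '2022', '2022']
```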


<br>

- ### re.finditer(pattern, string, flags=0)

- ### Returns an iterator yielding match objects for all non-overlapping matches of the pattern in the string.


In [60]:
text = "Contact us at info@example.com or support@example.com for any inquiries."
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"

matches = re.finditer(pattern, text)

matches

<callable_iterator at 0x7f2cec6dd6f0>

In [61]:
# use loop
for match in matches:
    print(match)

<re.Match object; span=(14, 30), match='info@example.com'>
<re.Match object; span=(34, 53), match='support@example.com'>


In [67]:
# get group
for match in matches:
    print(match.group())

# this loop prints nothing: the iterator returned by re.finditer() was already exhausted by the previous loop

In [68]:
# create a fresh iterator and loop again
matches = re.finditer(pattern, text)
for match in matches:
    print(match.group())



info@example.com
support@example.com
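One way to avoid re-running re.finditer() is to store the match objects in a list. This is a sketch, not the only option; the e-mail pattern here is a simplified stand-in for the one used above:

```python
import re

text = "Contact us at info@example.com or support@example.com for any inquiries."
pattern = r"[\w.+-]+@[\w.-]+\.[A-Za-z]{2,}"  # simplified e-mail pattern for illustration

# list() consumes the iterator once and keeps the match objects
matches = list(re.finditer(pattern, text))

for m in matches:
    print(m.start(), m.end(), m.group())

# The list can be traversed again, unlike the raw iterator
addresses = [m.group() for m in matches]
print(addresses)  # ['info@example.com', 'support@example.com']
```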


<br>

- ### re.split(pattern, string, maxsplit=0, flags=0)
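- ### Splits the string by the occurrences of the pattern and returns a list of substrings. If the pattern contains a capturing group, the matched separators are included in the list as well.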

In [72]:
text = "123abc456def789"
parts = re.split(r"(\d+)", text)
parts


['', '123', 'abc', '456', 'def', '789', '']

In [77]:
# maxsplit=1 stops after the first split
re.split(r"(\d+)", text,maxsplit=1) 

['', '123', 'abc456def789']
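The separators appear in the results above only because the pattern `(\d+)` is wrapped in a capturing group. A short sketch of the same split without the group, plus an illustrative delimiter example that is not from the notebook:

```python
import re

text = "123abc456def789"

# No capturing group: the digit runs are dropped from the result
letters = re.split(r"\d+", text)
print(letters)  # ['', 'abc', 'def', '']

# Splitting on one or more commas, semicolons, or whitespace characters
fields = re.split(r"[,;\s]+", "red, green;blue  yellow")
print(fields)  # ['red', 'green', 'blue', 'yellow']
```

The empty strings at both ends appear because the text starts and ends with a separator.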

<br>

- ### re.sub(pattern, repl, string, count=0, flags=0)
- ### Replaces all occurrences of the pattern in the string with the replacement, which can be a string or a function that receives the match object.

In [78]:
text = "The quick brown fox jumps over the lazy dog."

re.sub(r"fox", "cat", text)

# replace fox with cat

'The quick brown cat jumps over the lazy dog.'

In [80]:
# Example 2

text = "Hello123World456"
re.sub(r"\d", "", text)

# replace each digit (\d) with an empty string


'HelloWorld'

In [82]:
# Replace multiple patterns in a string using a dictionary:

text = "I have an apple and a banana."
patterns = {"apple": "orange", "banana": "grape"}


# build one alternation pattern from the dictionary keys (raw strings keep \b as a word boundary)
print(r"\b(" + "|".join(patterns.keys()) + r")\b")

\b(apple|banana)\b

In [84]:
re.sub(r"\b(" + "|".join(patterns.keys()) + r")\b", lambda match: patterns[match.group(0)], text)

'I have an orange and a grape.'
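Two further features of re.sub() fit naturally here: backreferences in the replacement string and the `count` parameter from the signature. The inputs below are illustrative:

```python
import re

# \1, \2, \3 in the replacement refer to the captured groups
date = "30-07-2022"
reordered = re.sub(r"(\d{2})-(\d{2})-(\d{4})", r"\3-\2-\1", date)
print(reordered)  # 2022-07-30

# count=1 limits the substitution to the first match
first_only = re.sub(r"\d", "#", "a1b2c3", count=1)
print(first_only)  # a#b2c3
```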

<br>

- ### re.subn(pattern, repl, string, count=0, flags=0)
- ### Similar to re.sub(), but returns a tuple containing the modified string and the number of substitutions made.

In [86]:
text = "Hello123World456"
new_text, count = re.subn(r"\d", "", text)

In [87]:
count

6

In [88]:
new_text

'HelloWorld'

In [89]:
# Replace multiple patterns in a string using a dictionary and count the number of replacements

text = "I have an apple and a banana."
patterns = {"apple": "orange", "banana": "grape"}

new_text, count = re.subn(r"\b(" + "|".join(patterns.keys()) + r")\b", lambda match: patterns[match.group(0)], text)

In [90]:
count

2

In [91]:
new_text

'I have an orange and a grape.'

<br>

- ### The re.compile() function in Python is used to compile a regular expression pattern into a pattern object. 

- ### There are several reasons why you might want to use re.compile():
<br>

- ### Performance: A compiled pattern object can be reused for many matching operations without the pattern being looked up again. The re module also caches recently used patterns internally, so the gain is usually modest unless a program uses many different patterns.
<br>

- ### Code readability: Assigning the compiled pattern object to a variable with a meaningful name enhances code readability and maintainability.
<br>

- ### Dynamic pattern construction: The pattern string can be built at runtime from conditions or inputs and then compiled. A compiled pattern object itself cannot be modified, so a changed pattern has to be compiled again.
<br>

- ### Error handling: re.compile() raises a re.error exception immediately if there is a syntax error in the pattern, enabling explicit error handling and easier debugging.
<br>

- ### In short, re.compile() gives you reusable pattern objects, clearer code, patterns built at runtime, and early, explicit errors for invalid syntax.

In [92]:
pattern = re.compile(r'hello')
text = 'hello world'

match = pattern.search(text)
match

<re.Match object; span=(0, 5), match='hello'>

In [93]:
if match:
    print('Pattern found')
else:
    print('Pattern not found')

Pattern found


In [96]:
# Example 2: compile a pattern entered by the user

# user input in this run: is
pattern_str = input('Enter a pattern: ')
pattern = re.compile(pattern_str)

text = 'This is some text'

pattern.search(text)

<re.Match object; span=(2, 4), match='is'>

In [97]:
# a compiled pattern offers the same functions as methods (search, findall, sub, ...)

pattern.findall(text)

['is', 'is']
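Because a pattern typed by a user may not be valid regex syntax, the re.error exception mentioned above is worth catching. A minimal sketch; `compile_or_none` is a hypothetical helper name, not part of the re module:

```python
import re

def compile_or_none(pattern_str):
    """Return a compiled pattern, or None if pattern_str is not valid regex syntax."""
    try:
        return re.compile(pattern_str)
    except re.error as exc:
        print(f"Invalid pattern {pattern_str!r}: {exc}")
        return None

good = compile_or_none(r"\d+")
bad = compile_or_none(r"(unclosed")  # missing closing parenthesis

print(good.findall("a1b22"))  # ['1', '22']
print(bad)                    # None

# re.escape() treats user input as literal text instead of regex syntax
literal = re.compile(re.escape("1+1"))
print(literal.search("1+1=2").group())  # 1+1
```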

<br>

- ### re.IGNORECASE or re.I
<br>

- ### This flag enables case-insensitive matching. It allows matching regardless of whether letters are uppercase or lowercase.

In [117]:
pattern = re.compile(r'hello')  # without  re.I
pattern.findall('Hello, World!')

[]

In [118]:
pattern = re.compile(r'hello', re.I)  # Case-insensitive matching
pattern.findall('Hello, World!')

['Hello']

<br>

- ### re.MULTILINE or re.M

<br>

- ### This flag enables multiline matching.
- ### It changes the behavior of ^ and $ anchors to match the start and end of each line instead of the whole string.

In [130]:
text = '''Hello, World!
Welcome to Python
Regex is powerful Python
'''

In [131]:
pattern = re.compile(r'Python$')  # without re.MULTILINE, $ matches only at the end of the string
pattern.findall(text)

['Python']

In [132]:
pattern = re.compile(r'Python$',re.MULTILINE)
pattern.findall(text)

['Python', 'Python']
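The same flag changes `^` as well. A short sketch using the text from the cells above, written here with explicit `\n` characters:

```python
import re

text = "Hello, World!\nWelcome to Python\nRegex is powerful Python\n"

# Without the flag, ^ matches only at the very start of the string
first_word = re.findall(r"^\w+", text)
print(first_word)  # ['Hello']

# With re.MULTILINE, ^ also matches right after each newline
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(line_starts)  # ['Hello', 'Welcome', 'Regex']
```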

<br>

- ### re.DOTALL or re.S

<br>

- ### This flag enables dot-all matching.
- ### It allows the dot (.) metacharacter to match any character, including newlines.

In [144]:
text = '''Hello,
World!
Welcome to
Python
Regex is powerful
'''

pattern = re.compile(r'Hello.*Regex')
pattern.findall(text)


[]

In [149]:
pattern = re.compile(r'Hello.*Regex', re.DOTALL)
pattern.findall(text)


['Hello,\nWorld!\nWelcome to\nPython\nRegex']
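Flags can be combined with the `|` operator. The sketch below applies re.I and re.S together to the same text; the lowercase pattern is an illustrative variation:

```python
import re

text = "Hello,\nWorld!\nWelcome to\nPython\nRegex is powerful\n"

# re.I makes "hello" match "Hello"; re.S lets "." cross the line breaks
pattern = re.compile(r"hello.*python", re.I | re.S)
found = pattern.search(text)

print(repr(found.group()))  # 'Hello,\nWorld!\nWelcome to\nPython'

# With re.I alone the dot stops at the first newline, so nothing matches
case_only = re.search(r"hello.*python", text, re.I)
print(case_only)  # None
```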

### .group()

<br>

- ### The group() method returns the substring matched by the whole pattern or by one of its capturing groups.
- ### It is a method of the Match object returned by re functions such as:
    - ### match()
    - ### search()
    - ### finditer() (which yields one Match object per match)
- ### re.findall() returns plain strings or tuples rather than Match objects, so group() is not available on its results.

In [151]:
text = "Hello, World! Welcome to Python"
pattern = re.compile(r"\b\w+\b")  # Match individual words

match = pattern.search(text)
match

<re.Match object; span=(0, 5), match='Hello'>

In [155]:
match.group(0)

'Hello'

In [156]:
# example 2

text = "John Doe, 30 years old"
pattern = re.compile(r"(\w+) (\w+), (\d+) years old")  # Match name and age
match = pattern.search(text)

match

<re.Match object; span=(0, 22), match='John Doe, 30 years old'>

In [157]:
match.group(0)

'John Doe, 30 years old'

In [158]:
match.group(1)

'John'

In [159]:
match.group(3)

'30'
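Groups can also be given names and retrieved together. This sketch reworks the example above with named groups; the names `first`, `last`, and `age` are chosen here for illustration:

```python
import re

text = "John Doe, 30 years old"
pattern = re.compile(r"(?P<first>\w+) (?P<last>\w+), (?P<age>\d+) years old")
match = pattern.search(text)

print(match.group("first"))  # John
print(match.group(1, 3))     # ('John', '30')
print(match.groups())        # ('John', 'Doe', '30')
print(match.groupdict())     # {'first': 'John', 'last': 'Doe', 'age': '30'}
```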

### .span()

<br>

- ### The span() method returns the starting and ending indices of the matched substring within the input string as a tuple (start, end).
- ### It is a method of the Match object returned by re functions such as:
    - ### match()
    - ### search()
    - ### finditer() (which yields one Match object per match)
- ### re.findall() returns plain strings or tuples rather than Match objects, so span() is not available on its results.

In [169]:
text = "Hello, World! Welcome to Python"
pattern = re.compile(r"\b\w+\b")  # Match individual words
match = pattern.search(text)
match

<re.Match object; span=(0, 5), match='Hello'>

In [165]:
match.span()

(0, 5)

In [168]:
text[match.span()[0]:match.span()[-1]]

'Hello'
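span() also accepts a group number, and start() and end() return the two indices separately. A brief sketch using the name-and-age text from the .group() section:

```python
import re

text = "John Doe, 30 years old"
match = re.search(r"(\w+) (\w+), (\d+) years old", text)

start, end = match.span()  # span of the whole match
print(start, end)          # 0 22

age_span = match.span(3)   # span of the third group only
print(age_span)            # (10, 12)

# start() and end() give the same indices one at a time
age_text = text[match.start(3):match.end(3)]
print(age_text)            # 30
```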