# RegEx
## What is a Regular Expression?

A Regular Expression (RegEx) is a sequence of characters that forms a search pattern. RegEx can be used to check if a string contains the specified search pattern.

## Python’s re Module

Python has a built-in module called re, which can be used to work with regular expressions. Here’s how you can import it:

In [1]:
import re

## Basic RegEx Functions

The re module offers a set of functions that let us search a string for a match:

- findall: Returns a list containing all matches.
- search: Returns a Match object for the first match found anywhere in the string, or None if there is no match.
- split: Returns a list where the string has been split at each match.
- sub: Replaces one or more matches with a string.

In [7]:
pattern = 'apple'  # Pattern to search for
text = 'I have an apple and an orange'
match = re.search(pattern, text)
if match:
    print('Pattern found!')
else:
    print('Pattern not found!')

Pattern found!
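The Match object returned by re.search carries more than a yes/no answer. A minimal sketch using its standard group(), start(), end() and span() methods, with the same sample sentence as above:

```python
import re

text = 'I have an apple and an orange'

# re.search returns a Match object for the first match, or None if nothing matches
match = re.search(r'apple', text)
if match:
    print(match.group())               # the matched text: apple
    print(match.start(), match.end())  # where the match begins and ends: 10 15
    print(match.span())                # the same two indices as a tuple: (10, 15)

# A pattern that does not occur gives None, so check the result before using it
missing = re.search(r'banana', text)
print(missing)  # None
```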


In [31]:
pattern = r'\d+'  # Matches one or more digits
text = 'I have 3 apples and 5 oranges'
matches = re.findall(pattern, text)
print(matches)  # Output: ['3', '5']

['3', '5']


The 'r' before the pattern string marks a raw string literal: Python keeps backslashes as they are instead of interpreting them as escape sequences, so a pattern such as r'\d+' reaches the regex engine unchanged. Using raw strings for regular expressions is common practice in Python.
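To see why the r prefix matters, here is a short comparison. The sample text 'cat concat' is made up for illustration; the key point is that '\b' is a backspace character in a normal Python string but a word-boundary assertion in a raw string.

```python
import re

# In a raw string the backslash is kept, so r'\d+' is the same as '\\d+'
same = r'\d+' == '\\d+'
print(same)  # True: both are the three characters \ d +

# Without the r prefix, '\b' becomes a single backspace character
print(len('\b'), len(r'\b'))  # 1 2

text = 'cat concat'
plain = re.findall('\bcat\b', text)  # searches for literal backspace characters
raw = re.findall(r'\bcat\b', text)   # searches for 'cat' as a whole word
print(plain)  # []
print(raw)    # ['cat']
```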

In [33]:
pattern = r'\W+'  # Matches one or more non-word characters (anything except letters, digits and underscore)
text = 'apple,orange;banana grape12 34qwer'
split_text = re.split(pattern, text)
print(split_text)  # Output: ['apple', 'orange', 'banana', 'grape12', '34qwer']

['apple', 'orange', 'banana', 'grape12', '34qwer']
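Two details of re.split are worth a quick sketch: \W does not treat the underscore as a separator, and the maxsplit keyword limits how many splits are made. The sample strings are invented for this example.

```python
import re

# '_' counts as a word character, so 'snake_case' stays in one piece
parts = re.split(r'\W+', 'snake_case and-kebab')
print(parts)  # ['snake_case', 'and', 'kebab']

# maxsplit=2 stops after two splits; the rest of the string stays in the last item
limited = re.split(r'\W+', 'apple,orange;banana grape', maxsplit=2)
print(limited)  # ['apple', 'orange', 'banana grape']
```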


In [12]:
pattern = r'\d+'  # Matches one or more digits
text = 'I have 3 apples and 5 oranges'
replacement = 'X'  # Replace digits with 'X'
new_text = re.sub(pattern, replacement, text)
print(new_text)  # Output: 'I have X apples and X oranges'

I have X apples and X oranges
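re.sub also accepts a count keyword and, instead of a fixed string, a function that computes each replacement from the Match object. A minimal sketch; double_number is a hypothetical helper defined only for this example.

```python
import re

text = 'I have 3 apples and 5 oranges'

# count=1 replaces only the first match (the default, 0, replaces all)
once = re.sub(r'\d+', 'X', text, count=1)
print(once)  # I have X apples and 5 oranges

# Hypothetical helper: receives each Match object and returns the replacement text
def double_number(m):
    return str(int(m.group()) * 2)

doubled = re.sub(r'\d+', double_number, text)
print(doubled)  # I have 6 apples and 10 oranges
```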


## Metacharacters

. (dot): Matches any single character except newline.

In [34]:
pattern = 'c.t'  # Matches 'cat', 'cot', 'cut', etc.
text = 'The cat sat on the mat cut.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'cut']

['cat', 'cut']
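The "except newline" part can be checked directly. A small sketch with a made-up string containing newlines, showing how the re.DOTALL flag lets '.' match a newline too:

```python
import re

text = 'cat\nc\nt'

# By default '.' does not match '\n', so 'c\nt' is skipped
default = re.findall('c.t', text)
print(default)  # ['cat']

# With re.DOTALL, '.' matches any character, including '\n'
dotall = re.findall('c.t', text, flags=re.DOTALL)
print(dotall)  # ['cat', 'c\nt']
```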


^ (caret): Matches the start of the string.

In [37]:
pattern = '^The'  # Matches 'The' only at the start of the string
text = 'The cat sat on the mat.'
if re.search(pattern, text):
    print('Pattern found at the start!')

Pattern found at the start!
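By default '^' anchors only at the very start of the whole string. A short sketch, using an invented two-line text, of how the re.MULTILINE flag makes it match at the start of every line:

```python
import re

text = 'The cat sat.\nThe dog ran.'

# Only the start of the string counts without flags
first_only = re.findall(r'^The \w+', text)
print(first_only)  # ['The cat']

# With re.MULTILINE, '^' also matches right after each newline
every_line = re.findall(r'^The \w+', text, flags=re.MULTILINE)
print(every_line)  # ['The cat', 'The dog']

# 'cat' is not at the start, so the anchored search fails
not_at_start = re.search('^cat', text)
print(not_at_start)  # None
```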


$ (dollar): Matches the end of the string.

In [39]:
pattern = r'mat\.$'  # Matches 'mat.' only at the end of the string (\. is a literal period)
text = 'The cat sat on the mat.'
if re.search(pattern, text):
    print('Pattern found at the end!')

Pattern found at the end!
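Because an unescaped '.' is itself a metacharacter, the pattern 'mat.$' accepts any character after 'mat', not only a period. A quick comparison with the escaped form r'mat\.$'; the word 'Checkmate' is just a sample input.

```python
import re

texts = ['The cat sat on the mat.', 'Checkmate']
results = {}

for text in texts:
    loose = re.search('mat.$', text) is not None    # '.' is a wildcard: 'mate' also matches
    strict = re.search(r'mat\.$', text) is not None  # '\.' matches a literal period only
    results[text] = (loose, strict)
    print(f'{text!r}: loose={loose}, strict={strict}')
```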


'*' (asterisk): Matches zero or more occurrences of the preceding character.

In [41]:
pattern = 'co*l'  # Matches 'cl', 'col', 'cool', 'coool', etc.
text = 'The cl col cat sat on the coooooooooool mat.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['cl', 'col', 'coooooooooool']


['cl', 'col', 'coooooooooool']


'+' (plus): Matches one or more occurrences of the preceding character.

In [42]:
pattern = 'co+l'  # Matches 'col', 'cool', 'coool', but not 'cl'
text = 'The cl col cat sat on the coooooooooool mat.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['col', 'coooooooooool']


['col', 'coooooooooool']
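Quantifiers such as '+' apply only to the single character right before them. To repeat a longer sequence, it can be wrapped in a group; the sketch below uses a non-capturing group (?:...) and an invented "hahaha" string.

```python
import re

text = 'hahaha ha haa'

# 'ha+' means 'h' followed by one or more 'a'
single = re.findall(r'ha+', text)
print(single)  # ['ha', 'ha', 'ha', 'ha', 'haa']

# '(?:ha)+' repeats the whole sequence 'ha' one or more times
grouped = re.findall(r'(?:ha)+', text)
print(grouped)  # ['hahaha', 'ha', 'ha']
```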


? (question mark): Matches zero or one occurrence of the preceding character.

In [43]:
pattern = 'colou?r'  # Matches 'color' or 'colour'
text = 'The color is blue, but the colour is green.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['color', 'colour']

['color', 'colour']


[ ]: Matches any single character within the brackets.

In [45]:
pattern = 'gr[am]y'  # Matches 'gray' or 'grmy' (not 'grey', since 'e' is not in the brackets)
text = 'The sky is gray, but the cat is grey grmy.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['gray', 'grmy']

['gray', 'grmy']
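Square brackets also support ranges and negation: a hyphen such as [0-9] defines a range, and a leading '^' inside the brackets matches any character not listed. A small sketch on a made-up string:

```python
import re

text = 'Room 42, floor B3'

# [0-9] matches any single digit
digits = re.findall(r'[0-9]', text)
print(digits)  # ['4', '2', '3']

# [^0-9 ,]+ matches runs of characters that are not digits, spaces or commas
words = re.findall(r'[^0-9 ,]+', text)
print(words)  # ['Room', 'floor', 'B']
```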


| (pipe): Acts like a logical OR, matches either the expression before or after it.

In [23]:
pattern = 'cat|dog'  # Matches 'cat' or 'dog'
text = 'I have a cat and a dog.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['cat', 'dog']

['cat', 'dog']
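Alternatives separated by '|' are tried from left to right, and the first one that matches wins, so their order can matter. Wrapping the alternation in a group limits how much of the pattern it covers. A sketch with sample words:

```python
import re

# 'cat' matches first, so 'category' is never tried at that position
left_first = re.findall('cat|category', 'category')
print(left_first)  # ['cat']

# Putting the longer alternative first gives the full word
longer_first = re.findall('category|cat', 'category')
print(longer_first)  # ['category']

# The group restricts the OR to one letter: 'gr', then 'a' or 'e', then 'y'
scoped = re.findall('gr(?:a|e)y', 'gray grey gry')
print(scoped)  # ['gray', 'grey']
```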


## Quantifiers

'*': Zero or more occurrences (see the asterisk example above).

'+': One or more occurrences (see the plus example above).

'?': Zero or one occurrence (see the question mark example above).

{n}: Exactly n occurrences.

In [24]:
pattern = r'co{2}l'  # Matches 'cool'
text = 'The col cat sat on the cool mat.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['cool']

['cool']


{n,}: At least n occurrences.

In [46]:
pattern = r'co{2,}l'  # Matches 'cool', 'coool', and any longer run of 'o's
text = 'The coooooooool cat sat on the cool mat.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['coooooooool', 'cool']

['coooooooool', 'cool']


{n,m}: Between n and m occurrences.

In [47]:
pattern = 'co{1,2}l'  # Matches 'col' and 'cool'
text = 'The col cat sat on the cool mat.'
matches = re.findall(pattern, text)
print(matches)  # Output: ['col', 'cool']

['col', 'cool']
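Bounded quantifiers are handy for validation when combined with re.fullmatch, which succeeds only if the whole string matches. A minimal sketch; the "4 to 6 digit PIN" rule and the sample values are assumptions chosen for illustration.

```python
import re

pins = ['123', '1234', '123456', '1234567', '12a4']

# Hypothetical rule: a PIN is valid if it consists of 4 to 6 digits and nothing else
results = {pin: re.fullmatch(r'\d{4,6}', pin) is not None for pin in pins}

for pin, ok in results.items():
    print(pin, ok)
```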


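## Groups and Capturing

Parentheses create capturing groups. When a pattern contains groups, re.findall returns one tuple per match, holding the text captured by each group.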

In [48]:
pattern = r'(\d{2})-(\d{2})-(\d{4})'  # Matches date in the format DD-MM-YYYY
text = 'Today is 16-05-2024.'
matches = re.findall(pattern, text)
print(matches)  # Output: [('16', '05', '2024')]
for match in matches:
    print(f"Day: {match[0]}, Month: {match[1]}, Year: {match[2]}")
# Output: Day: 16, Month: 05, Year: 2024

[('16', '05', '2024')]
Day: 16, Month: 05, Year: 2024
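Groups can also be named with the (?P<name>...) syntax, which makes the code easier to read than numeric indices. A sketch using re.finditer, which yields Match objects so each group can be read by name or collected with groupdict(). The sample text is invented, and note that \d{2} only checks the digit count, so a value like 99 would still be accepted as a day.

```python
import re

pattern = r'(?P<day>\d{2})-(?P<month>\d{2})-(?P<year>\d{4})'  # DD-MM-YYYY with named groups
text = 'Start 16-05-2024, end 31-12-2024.'

dates = []
for m in re.finditer(pattern, text):
    # Read each captured group by its name
    print(f"Day: {m.group('day')}, Month: {m.group('month')}, Year: {m.group('year')}")
    dates.append(m.groupdict())  # {'day': ..., 'month': ..., 'year': ...}
```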
