# RE


In Python, `re` is the built-in module that provides support for regular expressions. Regular expressions, often abbreviated as regex or RE, are powerful tools for pattern matching within strings.

### The re module allows you to:

    Search for patterns : Find specific sequences of characters within a larger string.
    Match patterns      : Determine if a string or a part of a string conforms to a defined pattern.
    Replace text        : Substitute occurrences of a pattern with a different string.
    Split strings       : Divide a string into a list of substrings based on a pattern.
    Extract information : Pull out specific data like email addresses, phone numbers or dates from text. 

### Commonly used functions within the re module include: 

    re.search()         : Scans the string and returns a Match object for the first occurrence of the pattern (or None).
    re.findall()        : Returns a list of all non-overlapping matches of a pattern.
    re.sub()            : Replaces occurrences of a pattern with a specified replacement string.
    re.split()          : Splits a string by occurrences of a pattern.
    re.match()          : Checks whether the pattern matches at the beginning of the string (returns a Match object or None).
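A quick sketch of the five functions side by side. The sample sentence and patterns are made up for illustration; the point is how `search` and `match` differ (anywhere vs. position 0) and what each function returns.

```python
import re

text = "Order 66 shipped on 2025-09-02; order 7 is pending."   # made-up sample

first = re.search(r"\d+", text)          # first match anywhere -> Match object or None
at_start = re.match(r"\d+", text)        # only tries position 0 -> None here ("O" is not a digit)
every = re.findall(r"\d+", text)         # all non-overlapping matches, as strings
masked = re.sub(r"\d+", "#", text)       # replace every match with "#"
parts = re.split(r"[;,]\s*", "a, b;c")   # split wherever the pattern matches

print(first.group())  # 66
print(at_start)       # None
print(every)          # ['66', '2025', '09', '02', '7']
print(masked)         # Order # shipped on #-#-#; order # is pending.
print(parts)          # ['a', 'b', 'c']
```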

```python
import re
```

### re.findall()

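Special sequences:

    \w = word character (letter, digit or underscore)
    \s = whitespace (space, tab, newline and similar)
    \d = digit
    \b = word boundary (restricts matches to whole words)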
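A small sketch of these sequences on a made-up string. Patterns are written as raw strings (`r"..."`) so that Python does not turn `\b` into a backspace character before `re` sees it. In Python 3, `\w` and `\d` are Unicode-aware for `str` patterns unless `re.ASCII` is passed.

```python
import re

sample = "cat catalog  concat 42_x"   # made-up sample text

print(re.findall(r"\w+", sample))      # ['cat', 'catalog', 'concat', '42_x']
print(re.findall(r"\s+", sample))      # [' ', '  ', ' ']
print(re.findall(r"\d", sample))       # ['4', '2']
print(re.findall(r"cat", sample))      # ['cat', 'cat', 'cat']  (also inside catalog, concat)
print(re.findall(r"\bcat\b", sample))  # ['cat']  (whole word only)
```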

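```python
string = """PAN Numbers
rohit_sharma123@gmail.com
rohit0123_@gmail.com
ASDFG9876W
DFGHJ4567V
FGBVJD4562J
HIOPA1234KO
IJYUA47896P
1234 2932 4313
5422 2454 1343 @#$%^&
5422 2454 13436 @#$%^&
"""

# Email (toy pattern): 1-15 lowercase letters/underscores, one or more digits, "@", then a domain
valid_mail = re.findall(r"[a-z_]{1,15}[0-9]+[@][a-z.]+", string, flags=re.ASCII)
# PAN: 5 uppercase letters, 4 digits, 1 uppercase letter, as a whole word
# ([A-Z], not [A-z]: the latter also matches [ \ ] ^ _ and the backtick)
valid_pan = re.findall(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b", string)
# Aadhaar: three groups of 4 digits separated by single spaces, as a whole word
valid_aadhaar = re.findall(r"\b\d{4} \d{4} \d{4}\b", string)
print(f"Valid Mail    : {valid_mail}\nValid PAN     : {valid_pan}\nValid Aadhaar : {valid_aadhaar}")
```

```
Valid Mail    : ['rohit_sharma123@gmail.com']
Valid PAN     : ['ASDFG9876W', 'DFGHJ4567V']
Valid Aadhaar : ['1234 2932 4313', '5422 2454 1343']
```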
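`findall` pulls candidates out of a larger text. To check that a single value has the PAN layout, `re.fullmatch` is a simpler fit, because it only succeeds when the whole string matches. A minimal sketch; `PAN_RE` and `is_valid_pan` are hypothetical names, and the check covers only the letter/digit layout, not whether a PAN is genuine. The first line shows why a range such as `[A-z]` is risky: six punctuation characters sit between `Z` and `a` in ASCII.

```python
import re

# [A-z] spans ASCII 65-122, which also covers [ \ ] ^ _ and the backtick
print(re.findall(r"[A-z]", "A_[z"))  # ['A', '_', '[', 'z']

PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

def is_valid_pan(value: str) -> bool:
    """Hypothetical helper: True if the whole string has the PAN layout."""
    # fullmatch must consume the entire string, so no word-boundary anchors are needed
    return PAN_RE.fullmatch(value) is not None

for candidate in ["ASDFG9876W", "FGBVJD4562J", "HIOPA1234KO", "ASDFG9876w"]:
    print(candidate, is_valid_pan(candidate))
# ASDFG9876W True
# FGBVJD4562J False
# HIOPA1234KO False
# ASDFG9876w False
```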


```python
data = """
Agentic AI refers to autonomous artificial intelligence systems capable of setting goals,
planning, and executing tasks with minimal human intervention to achieve objectives in dynamic environments.
"""

# Lower-case the text first so [a-z] covers every letter; \b keeps only whole 5-letter words
five_letter_words = re.findall(r"\b[a-z]{5}\b", data.lower())
print(five_letter_words)
```

```
['goals', 'tasks', 'human']
```
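Lower-casing the text loses the original capitalisation. A sketch of the alternative, passing `re.IGNORECASE` so `[a-z]` also matches uppercase letters; the sentence is made up for illustration.

```python
import re

# Made-up sentence with mixed capitalisation
data = "Agentic AI plans Tasks and reaches Goals."

# re.IGNORECASE lets [a-z] match uppercase letters too, so the matches keep their casing
five_letter_words = re.findall(r"\b[a-z]{5}\b", data, flags=re.IGNORECASE)
print(five_letter_words)  # ['plans', 'Tasks', 'Goals']
```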


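```python
dates = """
Dates:
02-09-2025
02/09/2025
15/12/2025
25-10-1992
12.07.1998
32-21-2000
"""

# Two digits, separator, two digits, separator, four digits ("-", "/" or ".").
# This is a rough format check: it rejects 32-21-2000 but would still accept
# impossible or inconsistent values such as 39-19-2025 or 02-09/2025
valid_dates = re.findall(r"[0-3][0-9][-/.][0-1][0-9][-/.][0-9]{4}", dates)
print(f"Valid Dates : {valid_dates}")
```

```
Valid Dates : ['02-09-2025', '02/09/2025', '15/12/2025', '25-10-1992', '12.07.1998']
```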
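One way to tighten this check, sketched under the assumption that dates are day-first: restrict day and month ranges, capture the first separator in a named group and repeat it with the backreference `(?P=sep)`, then let `datetime.strptime` reject calendar-impossible dates such as 31 February. `finditer` is used because, once a pattern has capture groups, `findall` returns only the group contents. The sample dates are made up.

```python
import re
from datetime import datetime

# Made-up sample: a mixed separator and an impossible date sit among valid ones
dates = """
02-09-2025
02/09-2025
31.02.2024
15/12/2025
"""

# Day 01-31, month 01-12, 4-digit year; (?P=sep) must repeat the first separator
pattern = re.compile(
    r"\b(?:0[1-9]|[12][0-9]|3[01])(?P<sep>[-/.])(?:0[1-9]|1[0-2])(?P=sep)\d{4}\b"
)

valid = []
for m in pattern.finditer(dates):
    sep = m.group("sep")
    try:
        # strptime raises ValueError for dates that do not exist, e.g. 31.02.2024
        datetime.strptime(m.group(), f"%d{sep}%m{sep}%Y")
    except ValueError:
        continue
    valid.append(m.group())

print(valid)  # ['02-09-2025', '15/12/2025']
```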


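```python
# dateutil is a third-party package (pip install python-dateutil)
from dateutil import parser

date1 = '18.08.2023'
date2 = "02-09-2025"
date3 = "2014/08/15"
date4 = "2025 Jul 15"
date5 = "Jul-01-2025 15:5:14"

# parser.parse infers the layout; fuzzy=False (the default) rejects strings with unrecognized words
parser.parse(date5, fuzzy=False)
```

```
datetime.datetime(2025, 7, 1, 15, 5, 14)
```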
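Because `parser.parse` guesses the layout, an ambiguous numeric date such as `date2` (`"02-09-2025"`) is read month-first by default; passing `dayfirst=True` changes that. When the layout is known in advance, the standard-library `datetime.strptime` removes the guessing. A sketch using the same sample strings, with format strings chosen by hand and assuming `date2` is meant day-first:

```python
from datetime import datetime

# Each sample string paired with a hand-written format that describes it
samples = [
    ("18.08.2023", "%d.%m.%Y"),
    ("02-09-2025", "%d-%m-%Y"),            # assumed day first: 2 September 2025
    ("2014/08/15", "%Y/%m/%d"),
    ("2025 Jul 15", "%Y %b %d"),           # %b = abbreviated month name (locale-dependent)
    ("Jul-01-2025 15:5:14", "%b-%d-%Y %H:%M:%S"),
]

parsed = [datetime.strptime(text, fmt) for text, fmt in samples]
for value in parsed:
    print(value)
# 2023-08-18 00:00:00
# 2025-09-02 00:00:00
# 2014-08-15 00:00:00
# 2025-07-15 00:00:00
# 2025-07-01 15:05:14
```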

### Extracting Date and Time Separately from a String

```python
mixed_string = """
05.12.2016 15:58:31 jsfkgskdlfsgfdslkcbj 05.13.2016 15:58:31  sdfasjdcakscsdv
01.14.2016fsdkjgfdhkvjxdbvxd
"""

# Dates of the form NN.NN.NNNN (the digit ranges are only a rough filter)
textdate = re.findall(r"[0-3][0-9][.][0-1][0-9][.][0-9]{4}", mixed_string)
# Times of the form hh:mm:ss
texttime = re.findall(r"[0-9]{2}[:][0-9]{2}[:][0-9]{2}", mixed_string)
print(textdate, texttime)
```

```
['05.12.2016', '05.13.2016', '01.14.2016'] ['15:58:31', '15:58:31']
```


### Combining Matched Date and Time From Text

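```python
# zip pairs the i-th date with the i-th time and stops at the shorter list,
# so '01.14.2016' (which has no time) is dropped; the pairing is purely positional
first_res = [f"{d} {t}" for d, t in zip(textdate, texttime)]
first_res
```

```
['05.12.2016 15:58:31', '05.13.2016 15:58:31']
```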
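Positional pairing breaks as soon as a date without a time appears before one that has a time. A sketch of an alternative: match the date and make the following time optional, so every date is kept and a missing time comes back as an empty string (that is what `findall` returns for a group that did not take part in the match). Note that `\s+` also matches newlines, so a time at the start of the next line would be paired too.

```python
import re

mixed_string = """
05.12.2016 15:58:31 jsfkgskdlfsgfdslkcbj 05.13.2016 15:58:31  sdfasjdcakscsdv
01.14.2016fsdkjgfdhkvjxdbvxd
"""

# Date, then an optional "whitespace + time" part; a date with no time gets ''
pattern = r"([0-3][0-9]\.[0-1][0-9]\.[0-9]{4})(?:\s+([0-9]{2}:[0-9]{2}:[0-9]{2}))?"
pairs = re.findall(pattern, mixed_string)
print(pairs)
# [('05.12.2016', '15:58:31'), ('05.13.2016', '15:58:31'), ('01.14.2016', '')]
```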

### Matching Date and Time Together with findall()

```python
# One pattern: date, a single space, then time
text_date_time = re.findall(r"[0-3][0-9][.][0-1][0-9][.][0-9]{4}[ ][0-9]{2}[:][0-9]{2}[:][0-9]{2}", mixed_string)
print(text_date_time)
```

```
['05.12.2016 15:58:31', '05.13.2016 15:58:31']
```
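If the date and time are needed as separate fields, named groups make each result self-describing. A sketch using `finditer` and `Match.groupdict()`; here `\s+` replaces the single `[ ]`, so a tab or a double space between date and time would also be accepted.

```python
import re

mixed_string = """
05.12.2016 15:58:31 jsfkgskdlfsgfdslkcbj 05.13.2016 15:58:31  sdfasjdcakscsdv
01.14.2016fsdkjgfdhkvjxdbvxd
"""

pattern = re.compile(
    r"(?P<date>[0-3][0-9]\.[0-1][0-9]\.[0-9]{4})\s+(?P<time>[0-9]{2}:[0-9]{2}:[0-9]{2})"
)

# finditer yields Match objects; groupdict() maps each group name to its matched text
records = [m.groupdict() for m in pattern.finditer(mixed_string)]
for rec in records:
    print(rec)
# {'date': '05.12.2016', 'time': '15:58:31'}
# {'date': '05.13.2016', 'time': '15:58:31'}
```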
