<h1>Comprehensive Guide to Python Regular Expressions</h1>

<p>Regular Expressions (regex) are powerful tools for pattern matching and string manipulation. Python provides the <code>re</code> module to work with regex efficiently. This guide covers the basics, common constructs, and advanced usage of regex with examples.</p>

<hr>

<h2>What Are Regular Expressions?</h2>
<p>Regular Expressions are patterns used to match character combinations in strings. They can validate input, search for patterns, or manipulate text.</p>

<hr>

<h2>Commonly Used Functions in the <code>re</code> Module</h2>

<h3>1. <code>re.match()</code></h3>
<ul>
  <li>Matches a pattern only at the beginning of the string.</li>
</ul>
<pre><code>import re
result = re.match(r'\d+', '123abc456')
print(result.group())  # Output: 123
</code></pre>
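<p>Because <code>re.match()</code> returns <code>None</code> when the pattern does not match at the start of the string, calling <code>.group()</code> on the result directly can raise an <code>AttributeError</code>. The following is a minimal sketch of a guarded lookup; the helper name <code>leading_digits</code> is hypothetical and used only for illustration.</p>

```python
import re

def leading_digits(text):
    """Return the digits at the start of text, or None if there are none."""
    m = re.match(r'\d+', text)
    # m is None when the string does not begin with a digit
    return m.group() if m else None

print(leading_digits('123abc456'))  # 123
print(leading_digits('abc123'))     # None
```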

<h3>2. <code>re.search()</code></h3>
<ul>
  <li>Scans through the string and returns the first match found at any position.</li>
</ul>
<pre><code>result = re.search(r'\d+', 'abc123xyz')
print(result.group())  # Output: 123
</code></pre>

<h3>3. <code>re.findall()</code></h3>
<ul>
  <li>Returns a list of all non-overlapping matches.</li>
</ul>
<pre><code>result = re.findall(r'\d+', 'abc123xyz456')
print(result)  # Output: ['123', '456']
</code></pre>

<h3>4. <code>re.finditer()</code></h3>
<ul>
  <li>Returns an iterator yielding match objects.</li>
</ul>
<pre><code>for match in re.finditer(r'\d+', 'abc123xyz456'):
    print(match.group())  # Prints 123, then 456, each on its own line
</code></pre>

<h3>5. <code>re.sub()</code></h3>
<ul>
  <li>Replaces matches with a specified string.</li>
</ul>
<pre><code>result = re.sub(r'\d+', '*', 'abc123xyz456')
print(result)  # Output: abc*xyz*
</code></pre>
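<p>The replacement passed to <code>re.sub()</code> can also be a function, and the optional <code>count</code> argument limits how many matches are replaced. The sketch below illustrates both; the helper name <code>double</code> is an assumption made for this example.</p>

```python
import re

def double(match):
    """Replacement callback: return the matched number multiplied by two."""
    return str(int(match.group()) * 2)

# Each match object is passed to double(), and its return value is substituted
doubled = re.sub(r'\d+', double, 'abc123xyz456')
print(doubled)     # abc246xyz912

# count=1 replaces only the first match
first_only = re.sub(r'\d+', '*', 'abc123xyz456', count=1)
print(first_only)  # abc*xyz456
```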

<h3>6. <code>re.split()</code></h3>
<ul>
  <li>Splits a string by the matches.</li>
</ul>
<pre><code>result = re.split(r'\d+', 'abc123xyz456')
print(result)  # Output: ['abc', 'xyz', '']
</code></pre>
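<p>The trailing empty string in the result above appears because the input ends with a match. The sketch below shows three common variations: keeping the separators with a capturing group, limiting the number of splits with <code>maxsplit</code>, and filtering out empty pieces. The variable names are illustrative only.</p>

```python
import re

text = 'abc123xyz456'

# A capturing group keeps the separators in the result
with_separators = re.split(r'(\d+)', text)
print(with_separators)  # ['abc', '123', 'xyz', '456', '']

# maxsplit=1 stops after the first split
split_once = re.split(r'\d+', text, maxsplit=1)
print(split_once)       # ['abc', 'xyz456']

# Drop empty strings produced by a match at the end of the input
non_empty = [part for part in re.split(r'\d+', text) if part]
print(non_empty)        # ['abc', 'xyz']
```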

<hr>


<p>In Python, a raw string is a string literal prefixed with <code>r</code> or <code>R</code>. In a raw string, backslashes (<code>\</code>) are kept as literal characters instead of starting escape sequences. This is especially useful for strings that contain many backslashes, such as Windows file paths or regular expressions.</p>

In [3]:
normal_string = "C:\Users\John\Documents"
print(normal_string)  # Never runs: the line above raises SyntaxError because \U starts a Unicode escape


SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape (982956812.py, line 1)

In [2]:
raw_string = r"C:\Users\John\Documents"
print(raw_string)  # Output: C:\Users\John\Documents


C:\Users\John\Documents


In [4]:
normal = "This is a newline:\nAnd this is a tab:\tEnd."
raw = r"This is a newline:\nAnd this is a tab:\tEnd."

print(normal)
# Output:
# This is a newline:
# And this is a tab:   End.

print(raw)
# Output: This is a newline:\nAnd this is a tab:\tEnd.


This is a newline:
And this is a tab:	End.
This is a newline:\nAnd this is a tab:\tEnd.
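Raw strings matter for regex patterns because some regex tokens collide with Python string escapes. The sketch below uses \b as an example: in a normal string it is the backspace character, while in a raw string it reaches the re module as the word-boundary token. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = 'cat concat cat'

# In a normal string, '\b' is a single backspace character (one character long)
print(len('\b'), len(r'\b'))  # 1 2

# The pattern below looks for backspace + 'cat' + backspace, which never occurs
without_raw = re.findall('\bcat\b', text)
print(without_raw)            # []

# With a raw string, \b is the regex word boundary, so only whole words match
with_raw = re.findall(r'\bcat\b', text)
print(with_raw)               # ['cat', 'cat']
```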


In [1]:
import re

In [6]:
score = 'Sachin scores 76 Dravid scores 60 Rohit scores 89 Dhoni scores 99'
names = re.findall(r'[A-Z][a-z]*', score)  # capitalized words, i.e. the player names
runs = re.findall(r'\d{2}', score)         # two-digit numbers, i.e. the scores

In [7]:
print(names)
print(runs)

['Sachin', 'Dravid', 'Rohit', 'Dhoni']
['76', '60', '89', '99']
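The two separate lists can also be extracted in one pass. When a pattern contains capturing groups, re.findall returns one tuple per match, which keeps each name paired with its score. This is a sketch under the assumption that every entry has the form "Name scores N"; it uses \d+ rather than \d{2} so that a three-digit score would be captured whole. The name runs_by_player is hypothetical.

```python
import re

score = 'Sachin scores 76 Dravid scores 60 Rohit scores 89 Dhoni scores 99'

# Two capturing groups: the player name and the number that follows "scores"
pairs = re.findall(r'([A-Z][a-z]*) scores (\d+)', score)
print(pairs)

# Convert the (name, score) tuples into a dictionary with integer values
runs_by_player = {player: int(runs) for player, runs in pairs}
print(runs_by_player)
```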


In [12]:
text = 'Her name is Adwitiya and Adwitiya is a cute girl'
if re.search(r'Adwitiya', text):
    print("search found")
    match = re.findall(r'Adwitiya', text)
    print(match)


search found
['Adwitiya', 'Adwitiya']


In [15]:
for i in re.finditer(r'Adwitiya', text):
    index=i.span()
    print(index)
    print(i.group())

(12, 20)
Adwitiya
(25, 33)
Adwitiya


In [16]:
str2='Rat Cat Pet Mat Sat'
data=re.findall('[RCM]at',str2)
print(data)

['Rat', 'Cat', 'Mat']


In [25]:
str2='Rat Cat Pet Mat Sat Qat'
data=re.findall('[^RCM]at',str2)
print(data)

['Sat', 'Qat']


In [20]:
data2=re.findall('[^P-R]at',str2)
print(data2)

['Cat', 'Mat', 'Sat']


In [27]:
reg=re.compile('[R]at')
str2=reg.sub('Lion',str2)
print(str2)

Lion Cat Pet Mat Sat Qat
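A compiled pattern object exposes the same operations as the module-level functions, so it can be reused across several calls. The sketch below also uses subn, which returns the new string together with the number of replacements made. The variable names and the sample text are assumptions for illustration.

```python
import re

# Compile once, then reuse the pattern object
rhyme = re.compile(r'[RCM]at')
text = 'Rat Cat Pet Mat Sat Qat'

found = rhyme.findall(text)
print(found)             # ['Rat', 'Cat', 'Mat']

# subn returns a (new_string, number_of_replacements) tuple
replaced, count = rhyme.subn('Lion', text)
print(replaced, count)   # Lion Lion Pet Lion Sat Qat 3
```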


In [30]:
str4='''Learn Python
Python is a popular programming language
Python can be used on a server to create web applications
'''
print(str4)

Learn Python
Python is a popular programming language
Python can be used on a server to create web applications



In [31]:
reg=re.compile('\n')
str5=reg.sub(' ',str4)
print(str5)


Learn Python Python is a popular programming language Python can be used on a server to create web applications 
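Two related techniques for multi-line text are sketched below: the re.MULTILINE flag, which lets ^ and $ match at the start and end of every line, and collapsing all whitespace runs with \s+ so no trailing space is left behind. The variable names other than str4 are assumptions made for this example.

```python
import re

str4 = '''Learn Python
Python is a popular programming language
Python can be used on a server to create web applications
'''

# With re.MULTILINE, ^ and $ match at the start and end of each line
lines_starting_with_python = re.findall(r'^Python.*$', str4, flags=re.MULTILINE)
print(lines_starting_with_python)

# Replace every run of whitespace (newlines included) with one space,
# then strip the space left over from the final newline
single_line = re.sub(r'\s+', ' ', str4).strip()
print(single_line)
```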


In [32]:
import re

pattern = 'AB123XYZ23'

# Find all digits
digits = re.findall(r'\d', pattern)

# Find all letters (ASCII only)
alphabets = re.findall(r'[a-zA-Z]', pattern)

# Count them
num_digits = len(digits)
num_alphabets = len(alphabets)

print(f"Number of digits: {num_digits}")
print(f"Number of alphabets: {num_alphabets}")


Number of digits: 5
Number of alphabets: 5


In [33]:
occurrences_of_2 = len(re.findall(r'2', pattern))

print(f"Number of occurrences of '2': {occurrences_of_2}")

Number of occurrences of '2': 2


In [34]:
import re

def validate_indian_phone_number(phone_number):
    pattern = r'^[789]\d{9}$'
    
    # re.match returns a match object or None, so convert the result to a bool
    return re.match(pattern, phone_number) is not None

# Test cases
phone_number = '9876543210'
if validate_indian_phone_number(phone_number):
    print(f"{phone_number} is a valid Indian phone number.")
else:
    print(f"{phone_number} is not a valid Indian phone number.")


9876543210 is a valid Indian phone number.


In [None]:
Explanation:
^[789]: Ensures the phone number starts with 7, 8, or 9.
\d{9}: Ensures the next 9 characters are digits (\d is a digit, and {9} means exactly 9 occurrences).
$: Anchors the match at the end of the string, so no extra characters may follow the 10 digits (apart from a single trailing newline, which $ tolerates).
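One subtlety of $ is that it also matches just before a newline at the very end of the string. The sketch below shows this and offers re.fullmatch as a stricter alternative; the names PHONE and is_valid_strict are hypothetical and not part of the original function.

```python
import re

PHONE = r'[789]\d{9}'

def is_valid_strict(phone_number):
    """Return True only if the whole string is a 10-digit number starting with 7, 8, or 9."""
    # fullmatch must consume the entire string, so no anchors are needed
    return re.fullmatch(PHONE, phone_number) is not None

# $ tolerates one trailing newline, so this anchored match still succeeds
lenient = re.match(r'^[789]\d{9}$', '9876543210\n') is not None
print(lenient)                          # True

print(is_valid_strict('9876543210\n'))  # False
print(is_valid_strict('9876543210'))    # True
print(is_valid_strict('1234567890'))    # False
```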

In [35]:
import re

def validate_email(email):
    # Basic regex pattern for email validation
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    
    # re.match returns a match object or None, so convert the result to a bool
    return re.match(pattern, email) is not None

# Test cases
email = 'example.email@domain.com'
if validate_email(email):
    print(f"{email} is a valid email address.")
else:
    print(f"{email} is not a valid email address.")


example.email@domain.com is a valid email address.
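Running the same pattern over several inputs makes its behaviour easier to see, including one case it accepts that many mail systems would reject. This is only a sketch: the sample addresses are made up for illustration, and the pattern remains a basic check rather than a full email validator.

```python
import re

EMAIL_PATTERN = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

samples = [
    'example.email@domain.com',  # typical address
    'user+tag@sub.example.org',  # plus tag and subdomain
    'no-at-sign.com',            # missing @
    'user@domain',               # no dot followed by a top-level domain
    'a..b@domain.com',           # consecutive dots: still accepted by this pattern
]

# Map each sample to True (pattern matched) or False
results = {s: re.match(EMAIL_PATTERN, s) is not None for s in samples}

for address, ok in results.items():
    print(f"{address}: {'valid' if ok else 'invalid'}")
```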


In [40]:
import re

def validate_name(name):
    # Regex pattern for name validation (allows letters, spaces, and hyphens)
    pattern = r'^[a-zA-Z\s\-]+$'
    
    # re.match returns a match object or None, so convert the result to a bool
    return re.match(pattern, name) is not None

# Test cases
name = 'Sumit Kumar'
if validate_name(name):
    print(f"{name} is a valid name.")
else:
    print(f"{name} is not a valid name.")


Sumit Kumar is a valid name.


In [None]:
Explanation:
^[a-zA-Z\s\-]+$:
^ asserts the start of the string.
[a-zA-Z\s\-] matches any uppercase or lowercase ASCII letter, any whitespace character (\s covers spaces, tabs, and newlines), or a hyphen.
+ matches one or more characters from that set.
$ asserts the end of the string.
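Because the character class treats letters, whitespace, and hyphens interchangeably, the pattern also accepts inputs such as a tab-separated name or a string made only of hyphens. The sketch below demonstrates this and shows one possible stricter alternative; STRICT_NAME and is_valid_name_strict are hypothetical names, and the stricter rule (letter groups joined by a single space or hyphen) is an assumption about what a name should look like.

```python
import re

ORIGINAL = r'^[a-zA-Z\s\-]+$'

# Hypothetical stricter rule: groups of letters joined by one space or hyphen
STRICT_NAME = r'[A-Za-z]+(?:[ -][A-Za-z]+)*'

def is_valid_name_strict(name):
    """Return True if name is made of letter groups separated by single spaces or hyphens."""
    return re.fullmatch(STRICT_NAME, name) is not None

# The original pattern accepts these edge cases
print(re.match(ORIGINAL, 'Sumit\tKumar') is not None)  # True
print(re.match(ORIGINAL, '---') is not None)           # True

# The stricter sketch rejects them but keeps ordinary names
print(is_valid_name_strict('Sumit Kumar'))        # True
print(is_valid_name_strict('Anne-Marie Smith'))   # True
print(is_valid_name_strict('Sumit\tKumar'))       # False
print(is_valid_name_strict('---'))                # False
```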

In [39]:
import re

def validate_name(name):
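    # Regex pattern for name validation: two words of 2-20 word characters
    # (letters, digits, or underscores) separated by one whitespace character
    pattern = r'\w{2,20}\s\w{2,20}'

    # fullmatch requires the whole string to match; re.match would also
    # accept trailing text because this pattern has no $ anchor
    return re.fullmatch(pattern, name) is not None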

# Test cases
name = 'Sumit Kumar'
if validate_name(name):
    print(f"{name} is a valid name.")
else:
    print(f"{name} is not a valid name.")


Sumit Kumar is a valid name.
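A pattern such as \w{2,20}\s\w{2,20} has no end anchor, so whether trailing text is accepted depends on whether re.match or re.fullmatch is used. The sketch below compares the two and also shows that \w accepts digits and underscores; the sample inputs and variable names are assumptions for illustration.

```python
import re

pattern = r'\w{2,20}\s\w{2,20}'

# re.match anchors only at the start, so anything after the second word is ignored
loose = re.match(pattern, 'Sumit Kumar 123 !!!') is not None
print(loose)      # True

# re.fullmatch requires the entire string to fit the pattern
strict = re.fullmatch(pattern, 'Sumit Kumar 123 !!!') is not None
print(strict)     # False

# \w matches letters, digits, and underscores, so this input is accepted too
digits_ok = re.fullmatch(pattern, 'ab12 34_5') is not None
print(digits_ok)  # True
```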
