# Regular Expressions or RegEx

## Raw Strings

In Python, the r prefix before a string literal creates a raw string. A raw string treats backslashes (\) as literal characters instead of the start of an escape sequence. This is particularly useful for Windows file paths and regular expressions, where backslashes should be kept as plain text.

In [22]:
# Escape sequence \f is "form feed" and \n is "new line"
normal_string = "C:\Documents\new_folder\file.txt"
print(normal_string)

C:\Documents
ew_folderile.txt




In [24]:
# Raw string
raw_string = r"C:\Documents\new_folder\file.txt"
print(raw_string)

C:\Documents\new_folder\file.txt
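To make the difference concrete, the following minimal sketch compares the length of a normal string and a raw string written with the same characters. The variable names are illustrative only.

```python
# Compare how Python stores a normal string and a raw string.
normal = "a\tb"   # \t is interpreted as one tab character
raw = r"a\tb"     # the backslash and the letter t are kept as two characters

print(len(normal))      # 3
print(len(raw))         # 4
print(raw == "a\\tb")   # True: same as doubling the backslash by hand
```

This is why regex patterns such as r"\d" are usually written as raw strings: the backslash reaches the re module unchanged.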


## RegEx
The re library in Python is a built-in module that provides support for working with regular expressions (regex). Regular expressions are powerful tools for pattern matching and text manipulation. The re module allows you to define and apply regular expressions to search for, match, and manipulate strings based on specific patterns.

## Pattern Matching

In [25]:
# importing re module
import re

In [26]:
# Create a pattern to match
pattern = "orange"

In [29]:
# find pattern in given text
text = "Oranges are my favorite fruit. Best Pakistani orange is found in Sargodha."
match = re.search(pattern, text)
print(match)

<re.Match object; span=(46, 52), match='orange'>


If a match is found, you can work with the match object to access information about the match. For example, to get the start and end positions of the match:

In [30]:
if match:
    start = match.start()
    end = match.end()
    print(f"Found '{match.group()}' at position {start}-{end}")
else:
    print("No match found")

Found 'orange' at position 46-52
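re.search() stops at the first match. When the position of every match is needed, re.finditer() yields one match object per occurrence. A small sketch with a made-up sample sentence:

```python
import re

# Sample text chosen for illustration only
text = "orange juice and orange peel"

# finditer() returns an iterator of match objects, one per occurrence
positions = [(m.start(), m.end()) for m in re.finditer("orange", text)]
print(positions)  # [(0, 6), (17, 23)]
```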


### Meta Characters

Regex provides metacharacters to define more complex patterns. Some common metacharacters include:

* . (dot): Matches any character except a newline.
* \* (asterisk): Matches zero or more occurrences of the preceding character or group.
* \+ (plus): Matches one or more occurrences of the preceding character or group.
* ? (question mark): Matches zero or one occurrence of the preceding character or group.
* | (pipe): Acts like a logical OR, allowing you to match one of several options.


In [31]:
# dot character
text = "I like orange. Best orange is found in Sargodha. You know what is good about or3nge, it protects us from germs."
pattern = "or.nge"
matches = re.findall(pattern, text)
if matches:
    for match in matches:
        print("Found: ", match)
else:
    print("No matches found!")

Found:  orange
Found:  orange
Found:  or3nge


In [39]:
# asterisk * meta character
text = "ac, abbc, abbbc, abbbbc, abbbbbc, abbbdc"
pattern = "ab*c"
matches = re.findall(pattern, text)
if matches:
    for match in matches:
        print("Found: ", match)
else:
    print("No matches found!")

Found:  ac
Found:  abbc
Found:  abbbc
Found:  abbbbc
Found:  abbbbbc


In [40]:
# plus + meta character
text = "ac, abbc, abbbc, abbbbc, abbbbbc, abbbdc"
pattern = "ab+c"
matches = re.findall(pattern, text)
if matches:
    for match in matches:
        print("Found: ", match)
else:
    print("No matches found!")

Found:  abbc
Found:  abbbc
Found:  abbbbc
Found:  abbbbbc


Search for the word pakistan in a string and print the match


In [46]:
text="i love pakistan"
word = "pakistan"
match = re.search(word,text)
print(match)

<re.Match object; span=(7, 15), match='pakistan'>


In [49]:
# ? meta character

text = "The color of the car is red. The colour of the house is blue."
pattern = r"colou?r"  # This pattern matches "color" or "colour" where the "u" is optional.

matches = re.findall(pattern, text)

if matches:
    for match in matches:
        print("Found:", match)
else:
    print("No matches found.")


Found: color
Found: colour


Find apple and aple in a string

In [51]:
text="aple is sweet i like apple"
pattern = "ap*le"
matches = re.findall(pattern, text)
if matches:
    for match in matches:
        print("Found:", match)
else:
    print("No matches found.")


Found: aple
Found: apple


In [44]:
# pipe meta character 

text = "cat dog fish bird"
pattern = r"cat|dog"  # This pattern matches either "cat" or "dog."

matches = re.findall(pattern, text)

if matches:
    for match in matches:
        print("Found:", match)
else:
    print("No matches found.")


Found: cat
Found: dog


### Character classes
You can use square brackets [] to specify a character class, which matches any single character from a defined set. For example, r"[aeiou]" matches any lowercase vowel.

In [46]:
text = "apple banana cherry epple ipple "
pattern = r"[aei]pple"  # This pattern matches "apple", "epple", or "ipple"

matches = re.findall(pattern, text)

if matches:
    for match in matches:
        print("Found:", match)
else:
    print("No matches found.")


Found: apple
Found: epple
Found: ipple
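The r"[aeiou]" class mentioned above can be tried on its own. This is a minimal sketch that collects the lowercase vowels in a sample word; the word is chosen only for illustration.

```python
import re

word = "education"

# Each match is a single character taken from the set a, e, i, o, u
vowels = re.findall(r"[aeiou]", word)
print(vowels)       # ['e', 'u', 'a', 'i', 'o']
print(len(vowels))  # 5
```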


### Anchors
Anchors specify a position rather than a character. ^ matches the start of the string and $ matches the end. With the re.MULTILINE flag, they also match at the start and end of each line.

In [59]:
text = "apple banana cherry"
pattern = r"^a"  # This pattern matches "a" only if it appears at the start of the string.

matches = re.findall(pattern, text)

if matches:
    for match in matches:
        print("Found:", match)
else:
    print("No matches found.")


Found: a


In [57]:
text = "apple banana cherry"
pattern = r"y$"  # This pattern matches "y" only if it appears at the end of the string.

matches = re.findall(pattern, text)

if matches:
    for match in matches:
        print("Found:", match)
else:
    print("No matches found.")


Found: y
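By default ^ anchors to the start of the whole string. The sketch below, using a made-up three-line string, shows how the re.MULTILINE flag changes that so ^ also matches at the start of each line.

```python
import re

# Three lines separated by newline characters (sample data)
text = "apple\navocado\nbanana"

# Without a flag, ^ matches only at the very start of the string
default_matches = re.findall(r"^a", text)
print(default_matches)    # ['a']

# With re.MULTILINE, ^ matches at the start of every line
multiline_matches = re.findall(r"^a", text, re.MULTILINE)
print(multiline_matches)  # ['a', 'a']
```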


### Quantifiers
Quantifiers specify the number of repetitions for a character or group. For example, r"\d{2,4}" matches 2 to 4 digits.

In [62]:
text = " 1 123 4567 89 45678 12345"
pattern = r"\d{2,3}"  # This pattern matches sequences of 2 to 3 digits.

matches = re.findall(pattern, text)

if matches:
    for match in matches:
        print("Found:", match)
else:
    print("No matches found.")


Found: 123
Found: 456
Found: 89
Found: 456
Found: 78
Found: 123
Found: 45
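Notice that longer numbers are cut into pieces, because a quantifier alone does not care what surrounds the match. A possible way to accept only whole numbers of 2 to 4 digits is to add the word boundary \b on both sides, as in this sketch with sample data:

```python
import re

text = "1 123 4567 89 45678"

# \b on both sides means the digits must form a complete "word",
# so the 5-digit number 45678 is skipped instead of being split up
whole_numbers = re.findall(r"\b\d{2,4}\b", text)
print(whole_numbers)  # ['123', '4567', '89']
```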


### Escaping Metacharacters
If you need to match metacharacters as literal characters, you can escape them with a backslash. For example, r"\*" matches an asterisk character.

In [72]:
text = "The price is $10.99 (on sale) and (50% off)!"
pattern = r"\$10\.99 \(on sale\) and \(50% off\)!"  # This pattern matches the exact text with escaped metacharacters.

matches = re.findall(pattern, text)

if matches:
    for match in matches:
        print("Found:", match)
else:
    print("No matches found.")


Found: $10.99 (on sale) and (50% off)!
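Escaping every metacharacter by hand is error-prone. The standard re.escape() function does it automatically for a literal string. A minimal sketch, with a shortened sample sentence:

```python
import re

text = "The price is $10.99 (on sale)!"
literal = "$10.99 (on sale)"

# re.escape() puts a backslash before characters that are special in a regex
pattern = re.escape(literal)
match = re.search(pattern, text)
print(match.group())  # $10.99 (on sale)
```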


### Modifier Flags
You can use modifier flags like re.IGNORECASE to perform case-insensitive matching. 

In [67]:
import re

text = "The Quick Brown Fox"
pattern = "quick"  # This pattern is case-sensitive
matches = re.findall(pattern, text)
print("Case-sensitive match:", matches)

pattern = "quick"  # Same pattern; the re.IGNORECASE flag below makes the match case-insensitive
matches = re.findall(pattern, text, re.IGNORECASE)
print("Case-insensitive match:", matches)

Case-sensitive match: []
Case-insensitive match: ['Quick']


## String Searching 

Find fox in the text, ignoring uppercase or lowercase

In [69]:
text="Fox is red fox live in forest"
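One possible solution to this exercise, sketched with the re.IGNORECASE flag introduced above:

```python
import re

text = "Fox is red fox live in forest"

# re.IGNORECASE lets the lowercase pattern match "Fox" as well as "fox"
matches = re.findall(r"fox", text, re.IGNORECASE)
print(matches)  # ['Fox', 'fox']
```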

In [70]:
text = "The quick brown fox jumps over the lazy dog. The dog barks, and the fox runs away."

# Using the find() method to find the first occurrence of a substring
index = text.find("fox")
index

16

In [62]:
text = "The quick brown fox jumps over the lazy dog. The dog barks, and the fox runs away."

# Using the index() method to find the first occurrence of a substring
index = text.index("fox")
index

16

In [63]:
text = "The quick brown fox jumps over the lazy dog. The dog barks, and the fox runs away."

# Using the count() method to count the occurrences of a substring
count = text.count("fox")
count

2

In [73]:
text = "The quick brown fox jumps over the lazy dog. The dog barks, and the fox runs away."

# Using the find() method to find the first occurrence of a substring
index = text.find("fox")
if index != -1:
    print(f"Found 'fox' at position {index}")
else:
    print("'fox' not found")

# Using the index() method to find the first occurrence of a substring
try:
    index = text.index("dog")
    print(f"Found 'dog' at position {index}")
except ValueError:
    print("'dog' not found")

# Using the count() method to count the number of occurrences of a substring
count = text.count("the")
print(f"'the' appears {count} times in the text")


Found 'fox' at position 16
Found 'dog' at position 40
'the' appears 2 times in the text


## String Manipulation

In [65]:
text = "The color of the car is blue."
pattern = r"blue"
replacement = "red"

# Use re.sub() to replace "blue" with "red"
new_text = re.sub(pattern, replacement, text)
print(new_text)


The color of the car is red.
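re.sub() is not limited to fixed words; the pattern can use any of the metacharacters covered earlier. As an illustrative sketch, the following collapses runs of whitespace into a single space (the sample text is made up):

```python
import re

text = "The   color  of the   car is blue."

# \s+ matches one or more whitespace characters in a row
clean_text = re.sub(r"\s+", " ", text)
print(clean_text)  # The color of the car is blue.
```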


Replace my name with your own name

In [75]:
text = "my name is samsaan"
pattern = r"samsaan"
name = input("Enter your name: ")

new_text = re.sub(pattern, name, text)
print(new_text)

my name is Asghar


In [77]:
text = "My CNIC number is 71101-6991209-9. Call me anytime."
pattern = r"\d{5}-\d{7}-\d{1}"

# Use re.search() to find and extract the CNIC number
match = re.search(pattern, text)

if match:
    cnic_number = match.group()
    print("CNIC number:", cnic_number)
else:
    print("CNIC number not found.")


CNIC number: 71101-6991209-9


In [78]:
text = "apple,banana,kiwi,orange"
pattern = r","

# Use re.split() to split the text into a list of fruits
fruits = re.split(pattern, text)
print(fruits)


['apple', 'banana', 'kiwi', 'orange']
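For a single fixed delimiter, str.split(",") would do the same job. re.split() becomes useful when the delimiter varies. A sketch assuming the fruits may be separated by commas or semicolons with optional spaces around them:

```python
import re

text = "apple, banana;kiwi , orange"

# Split on a comma or semicolon, swallowing any whitespace around it
fruits = re.split(r"\s*[,;]\s*", text)
print(fruits)  # ['apple', 'banana', 'kiwi', 'orange']
```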


In [79]:
text = "Product A (Price: $2500), Product B (Price: $1500), Product C (Price: $3000)"
#pattern = r"\w+\s\(\w+: \$(\d+)\)"
pattern = r"\$\d+"

# Use re.findall() to find and extract the prices of products
prices = re.findall(pattern, text)
print("Product Prices:", prices)


Product Prices: ['$2500', '$1500', '$3000']
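When a pattern contains a capturing group in parentheses, re.findall() returns only the captured part rather than the whole match. The sketch below uses that to drop the $ sign and convert the prices to integers:

```python
import re

text = "Product A (Price: $2500), Product B (Price: $1500), Product C (Price: $3000)"

# The parentheses capture only the digits that follow the dollar sign
amounts = re.findall(r"\$(\d+)", text)
prices = [int(amount) for amount in amounts]
print(prices)       # [2500, 1500, 3000]
print(sum(prices))  # 7000
```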


In [80]:
text = "The activity of the actor is appreciated."
pattern = r"\bact\w*\b"

# Use re.findall() to find and extract words starting with "act"
matches = re.findall(pattern, text)
print("Words starting with 'act':", matches)


Words starting with 'act': ['activity', 'actor']


In [11]:
import re

text = "My email id is abbas.abbasi@iub.edu.pk and my phone number is 0345-8023770"

# Escape the dots so they match a literal "." instead of any character
email = r"\b[\w.]+@\w+(?:\.\w+)+\b"
number = r"\d{4}-\d{7}"

match_email = re.findall(email, text)
match_number = re.findall(number, text)
if match_email and match_number:
    print(f"email is : {match_email}")
    print(f"number is : {match_number}")
else:
    print("Not found")

email is : ['abbas.abbasi@iub.edu.pk']
number is : ['0345-8023770']


## Validation

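Validate an email address. A hardcoded sample stands in for user input here.

In [14]:

email = "1212asghar.com"
# Dots are escaped to match a literal "."; fullmatch() requires the whole string to fit
email_valid = r"[\w.]+@\w+(?:\.\w+)+"

valid = re.fullmatch(email_valid, email)

if valid:
    print("email is valid")
else:
    print("email is not valid")


email is not valid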


Take input from the user and check whether it is a valid Pakistani phone number

In [17]:
number = input("Enter your number here: ")
number_valid = r"\d{4}-\d{7}"

# fullmatch() requires the whole input to fit the pattern, not just a part of it
valid = re.fullmatch(number_valid, number)

if valid:
    print(f"the {number} is valid")

else:
    print(f"The {number} is not valid")

the 0343-2813144 is valid
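Validation usually means the whole input must fit the pattern, not just some part of it. The sketch below compares re.search(), which accepts a match anywhere, with re.fullmatch(). The helper name is_valid_phone is hypothetical and used only for illustration.

```python
import re

PHONE_PATTERN = r"\d{4}-\d{7}"

def is_valid_phone(number):
    """Hypothetical helper: True only if the entire string fits the pattern."""
    return re.fullmatch(PHONE_PATTERN, number) is not None

good = "0345-8023770"
bad = "x0345-8023770123"  # extra characters before and after the number

print(is_valid_phone(good))                  # True
print(is_valid_phone(bad))                   # False
print(bool(re.search(PHONE_PATTERN, bad)))   # True: search() finds the number inside
```

re.findall() behaves like re.search() in this respect, so it would also report the second input as a hit.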


Validate a user number that may contain only letters (A to Z, in either case) and digits

In [41]:
user_number = "121dj1231"
pattern = r"^[0-9A-Za-z]+$"

valid = re.match(pattern, user_number)

if valid:
    print("valid user number")
else:
    print("Not valid user number")

valid user number
