# Advanced Regular Expression Assignments

# Assignment 1: Extracting Phone Numbers

Raw Text: Extract all valid Pakistani phone numbers from a given text.

Example:

Text: Please contact me at 0301-1234567 or 042-35678901 for further details.

In [24]:
import re

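text = "Please contact me at 0301-1234567 or 042-35678901 for further details."

# Leading 0, a 2-3 digit code, a hyphen, then 7-8 digits. This covers both
# sample formats: 03XX-XXXXXXX (mobile) and 0XX-XXXXXXXX (landline).
pattern = r"\b0\d{2,3}-\d{7,8}\b"

re.findall(pattern, text)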


['0301-1234567', '042-35678901']

# Assignment 2: Validating Email Addresses

Raw Text: Find the email addresses in a text that end in a Pakistani domain extension (.pk).

Example:

Text: Contact us at info@example.com or support@domain.pk for assistance.

In [41]:
text1 = "Contact us at info@example.com or support@domain.pk for assistance."

# Local part, @, then a domain that ends in .pk (no trailing space in the match).
pattern1 = r"\b[\w.-]+@[\w.-]+\.pk\b"

re.findall(pattern1, text1)

['support@domain.pk']

# Assignment 3: Extracting CNIC Numbers

Raw Text: Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

Example:

Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.

In [42]:
text2 = "My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."

# A CNIC has exactly 5, 7, and 1 digits separated by hyphens.
pattern2 = r"\b\d{5}-\d{7}-\d\b"

re.findall(pattern2, text2)

['12345-6789012-3', '34567-8901234-5']

# Assignment 4: Identifying Urdu Words

Raw Text: Identify and extract Urdu words from a mixed English-Urdu text.

Example:

Text: یہ sentence میں کچھ English words بھی ہیں۔

In [58]:
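text3 = "یہ sentence میں کچھ English words بھی ہیں۔ "

# Match runs of Arabic-script letters and diacritics. The ranges skip
# digits and punctuation such as the Urdu full stop (U+06D4).
pattern3 = r"[\u0621-\u065F\u066E-\u06D3\u06D5]+"

urdu_text = re.findall(pattern3, text3)
print(urdu_text)

['یہ', 'میں', 'کچھ', 'بھی', 'ہیں']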


# Assignment 5: Finding Dates

Raw Text: Find and extract dates in the format DD-MM-YYYY from a given text.

Example:

Text: The event will take place on 15-08-2023 and 23-09-2023.

In [45]:
text4 = "The event will take place on 15-08-2023 and 23-09-2023."

# DD-MM-YYYY means exactly two, two, and four digits.
pattern4 = r"\b\d{2}-\d{2}-\d{4}\b"

re.findall(pattern4, text4)

['15-08-2023', '23-09-2023']
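
A regular expression checks only the shape of a date, so an impossible value such as 31-02-2023 still matches. The following is a minimal sketch of one way to filter such values out, assuming the standard-library `datetime.strptime` is acceptable for validation. The helper name `extract_valid_dates` and the sample sentence are illustrative and not part of the assignment.

```python
import re
from datetime import datetime

def extract_valid_dates(text):
    """Return DD-MM-YYYY strings found in `text` that are real calendar dates."""
    candidates = re.findall(r"\b\d{2}-\d{2}-\d{4}\b", text)
    valid = []
    for candidate in candidates:
        try:
            # strptime raises ValueError for dates like 31-02-2023.
            datetime.strptime(candidate, "%d-%m-%Y")
        except ValueError:
            continue
        valid.append(candidate)
    return valid

sample = "The event runs from 15-08-2023 to 31-02-2023."
print(extract_valid_dates(sample))  # ['15-08-2023']
```

Keeping the regex for extraction and delegating calendar rules to `datetime` avoids encoding month lengths and leap years in the pattern itself.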

# Assignment 6: Extracting URLs

Raw Text: Extract all URLs from a text that belong to Pakistani domains.

Example:

Text: Visit http://www.example.pk or https://website.com.pk for more information.

In [84]:
text5 = "Visit http://www.example.pk or https://website.com.pk for more information."

# Scheme, then a host made of word characters, dots, and hyphens that ends in .pk.
pattern5 = r"https?://[\w.-]+\.pk\b"

re.findall(pattern5, text5)

['http://www.example.pk', 'https://website.com.pk']

# Assignment 7: Analyzing Currency
    
Raw Text: Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

Example:

Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.

In [95]:
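text6 = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."

# A currency marker (PKR, Rs, or Rs.) followed by an amount with optional
# thousands separators. The original {2,4}+ form fails before Python 3.11.
pattern6 = r"\b(?:PKR|Rs\.?)\s*\d+(?:,\d{3})*"

re.findall(pattern6, text6)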

['PKR 1500', 'Rs. 2500']
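
The task also asks to analyze the amounts, which the extraction step alone does not do. A minimal sketch of one possible analysis follows, assuming "analyze" means converting the matches to integers and summarizing them. The helper name `extract_amounts` is hypothetical.

```python
import re

def extract_amounts(text):
    """Return the rupee amounts in `text` as integers."""
    # The capture group keeps only the digits, so findall returns them directly.
    pattern = r"\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*)"
    return [int(amount.replace(",", "")) for amount in re.findall(pattern, text)]

sample = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."
amounts = extract_amounts(sample)
print(amounts)                     # [1500, 2500]
print(sum(amounts), max(amounts))  # 4000 2500
```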

# Assignment 8: Removing Punctuation

Raw Text: Remove all punctuation marks from a text while preserving Urdu characters.

Example:

Text: کیا! آپ, یہاں؟

In [119]:
text7= "کیا! آپ, یہاں؟ "

pattern7=r'[^\w\s]'

re.sub(pattern7,"",text7)

'کیا آپ یہاں '
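
One caveat with `[^\w\s]`: combining marks (for example Urdu diacritics such as zabar) are neither word characters nor whitespace in Python's `re`, so they are removed along with the punctuation. A minimal sketch of an alternative, assuming the goal is to drop only characters in the Unicode punctuation categories, uses the standard-library `unicodedata` module. The helper name `strip_punctuation` is illustrative.

```python
import unicodedata

def strip_punctuation(text):
    """Drop every character whose Unicode general category is punctuation (P*)."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )

print(strip_punctuation("کیا! آپ, یہاں؟"))
```

Because the filter looks at categories rather than `\w`, letters and their diacritics pass through unchanged, while `!`, `,`, and the Arabic question mark are dropped.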

# Assignment 9: Extracting City Names

Raw Text: Extract names of Pakistani cities from a given text.

Example:

Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.

In [154]:
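text8 = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."

# A regex cannot tell a city from any other capitalized word, so match
# against a list of known names (illustrative, not exhaustive).
cities = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Quetta", "Multan"]
pattern8 = r"\b(?:" + "|".join(cities) + r")\b"

re.findall(pattern8, text8)

['Lahore', 'Karachi', 'Islamabad', 'Peshawar']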

# Assignment 10: Analyzing Vehicle Numbers

Raw Text: Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

Example:

Text: I saw a car with the number plate LEA-567 near the market.

In [164]:
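text9 = "I saw a car with the number plate LEA-567 near the market."

# Three uppercase letters, a hyphen, and three digits, as in ABC-123.
# Widen the quantifiers if other plate formats must be accepted.
pattern9 = r"\b[A-Z]{3}-\d{3}\b"

re.findall(pattern9, text9)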

['LEA-567']