---

## Advanced Regular Expression Assignments

### Assignment 1: Extracting Phone Numbers

**Raw Text:** 
Extract all valid Pakistani phone numbers from a given text.

**Example:**
```
Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
```

In [1]:
import re

# Sample text with phone numbers
text = "Please contact me at 0301-1234567 or 042-35678901 for further details."

# Define the regex pattern: a 3-4 digit prefix, a hyphen, then 7-8 digits
pattern = r"\b\d{3,4}-\d{7,8}\b"

# Find all matches using re.findall()
phone_numbers = re.findall(pattern, text)
print(f"Valid Pakistani phone number: {phone_numbers}")

Valid Pakistani phone number: ['0301-1234567', '042-35678901']
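If the prefix and the subscriber number are needed separately, the same pattern can carry named groups. The following is a minimal sketch; the group names `prefix` and `number` are illustrative choices, not part of the assignment.

```python
import re

text = "Please contact me at 0301-1234567 or 042-35678901 for further details."

# Same shape as the pattern above, with named groups (names are hypothetical)
pattern = re.compile(r"\b(?P<prefix>\d{3,4})-(?P<number>\d{7,8})\b")

# finditer() yields match objects, so each group can be read by name
parts = [(m.group("prefix"), m.group("number")) for m in pattern.finditer(text)]
print(parts)  # [('0301', '1234567'), ('042', '35678901')]
```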


### Assignment 2: Validating Email Addresses

**Raw Text:** 
Validate email addresses according to Pakistani domain extensions (.pk).

**Example:**
```
Text: Contact us at info@example.com or support@domain.pk for assistance.
```

In [3]:
import re

# Sample text with email addresses
text = "Contact us at info@example.com or support@domain.pk for assistance."

# Define the regex pattern for email addresses ending in .pk
email_pattern = r'\b[\w.-]+@[\w.-]+\.pk\b'

# Find all matches using re.findall()
email_address = re.findall(email_pattern,text)

# Report the .pk addresses that were found, if any
if email_address:
    print(f"Valid email address = {email_address}")
else:
    print("No .pk email address found")

Valid email address = ['support@domain.pk']
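The cell above extracts `.pk` addresses from running text. To validate a single candidate string instead, `re.fullmatch()` can be used so that the whole string has to fit the pattern. This is a rough sketch; the helper name `is_pk_email` is hypothetical, and the pattern is the same loose one used above rather than a full email grammar.

```python
import re

# Same loose pattern as above, without \b because fullmatch() anchors both ends
PK_EMAIL = re.compile(r"[\w.-]+@[\w.-]+\.pk")

def is_pk_email(address):
    """Return True if the whole string looks like an address ending in .pk."""
    return PK_EMAIL.fullmatch(address) is not None

print(is_pk_email("support@domain.pk"))  # True
print(is_pk_email("info@example.com"))   # False
```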


### Assignment 3: Extracting CNIC Numbers

**Raw Text:** 
Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

**Example:**
```
Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
```

In [31]:
import re

# Sample text with CNIC numbers
text = "My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."

# Define the regex pattern for cnic numbers
pattern = r"\b\d{5}-\d{7}-\d\b"

# Extracting all matches using re.findall()
cnic = re.findall(pattern, text)
print(f"Pakistani CNIC: {cnic}")

Pakistani CNIC: ['12345-6789012-3', '34567-8901234-5']



### Assignment 4: Identifying Urdu Words

**Raw Text:** 
Identify and extract Urdu words from a mixed English-Urdu text.

**Example:**
```
Text: یہ sentence میں کچھ English words بھی ہیں۔
```



There are two ways to extract Urdu words from the mixed English-Urdu text:
1. Using a Unicode range.
2. Using a negated character class that excludes whitespace and Latin letters.

Using a Unicode range

In [22]:
import re

# Sample text with mixed English and Urdu words
text = "یہ sentence میں کچھ English words بھی ہیں۔"

# Define the regex pattern to identify Urdu words
pattern = r"\b[\u0600-\u06FF]+\b"  # \u0600-\u06FF is the Unicode Arabic block, which covers Urdu letters

# Extracting all matches using re.findall()
urdu_words = re.findall(pattern, text)
print(f"Urdu words in the text: {urdu_words}")

Urdu words in the text: ['یہ', 'میں', 'کچھ', 'بھی', 'ہیں']


Using a negated character class

In [48]:
import re

# Sample text with mixed English and Urdu words
text = "یہ sentence میں کچھ English words بھی ہیں۔"

# Define the regex pattern to identify Urdu words
pattern = r"\b[^\sA-Za-z]+\b"  # anything that is not whitespace or a Latin letter

# Extracting all matches using re.findall()
urdu_words = re.findall(pattern, text)
print(f"Urdu words in the text: {urdu_words}")

Urdu words in the text: ['یہ', 'میں', 'کچھ', 'بھی', 'ہیں']
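A note on ranges inside a character class: `A-z` is not equivalent to `A-Za-z`, because six ASCII punctuation characters sit between `Z` and `a`. The quick check below lists them, which is why a class meant to exclude only Latin letters should spell out both ranges.

```python
import re

# Collect ASCII characters matched by [A-z] but not by [A-Za-z]
extra = [
    ch for ch in map(chr, range(128))
    if re.fullmatch(r"[A-z]", ch) and not re.fullmatch(r"[A-Za-z]", ch)
]
print(extra)  # ['[', '\\', ']', '^', '_', '`']
```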


### Assignment 5: Finding Dates

**Raw Text:** 
Find and extract dates in the format DD-MM-YYYY from a given text.

**Example:**
```
Text: The event will take place on 15-08-2023 and 23-09-2023.
```



In [29]:
import re

# Sample text with dates
text = "The event will take place on 15-08-2023 and 23-09-2023."

# Define the regex pattern for dates in DD-MM-YYYY format
pattern = r"\b\d{2}-\d{2}-\d{4}\b"

# Extracting all matches using re.findall()
dates = re.findall(pattern, text)
print(f"Extract dates: {dates}")

Extract dates: ['15-08-2023', '23-09-2023']
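A regex only checks the shape of a date, so a string such as 31-02-2023 would still match. One possible follow-up, sketched here, is to pass each candidate through `datetime.strptime()` and keep only the ones that parse. The extra date in the sample text is an assumption added for illustration.

```python
import re
from datetime import datetime

# The last date is a made-up invalid example (February has no 31st)
text = "The event will take place on 15-08-2023 and 23-09-2023, not on 31-02-2023."

candidates = re.findall(r"\b\d{2}-\d{2}-\d{4}\b", text)

valid_dates = []
for candidate in candidates:
    try:
        # %d-%m-%Y corresponds to DD-MM-YYYY
        valid_dates.append(datetime.strptime(candidate, "%d-%m-%Y").date())
    except ValueError:
        # Shape matched, but it is not a real calendar date
        pass

print(candidates)   # ['15-08-2023', '23-09-2023', '31-02-2023']
print(valid_dates)  # [datetime.date(2023, 8, 15), datetime.date(2023, 9, 23)]
```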


### Assignment 6: Extracting URLs

**Raw Text:** 
Extract all URLs from a text that belong to Pakistani domains.

**Example:**
```
Text: Visit http://www.example.pk or https://website.com.pk for more information.
```



In [36]:
import re

# Sample text with URLs
text = "Visit http://www.example.pk or https://website.com.pk for more information."

# Define the regex pattern for http/https URLs ending in .pk
pattern = r"https?://\S+\.pk\b"

# Extracting all matches using re.findall()
URLs = re.findall(pattern, text)
print(f"Pakistani domains: {URLs}")

Pakistani domains: ['http://www.example.pk', 'https://website.com.pk']
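A pattern that only looks for `.pk` at the end of the match can also accept a URL whose path, rather than its host, ends in `.pk`. A more careful variant, sketched below, extracts every URL first and then checks the hostname with `urllib.parse.urlparse()`. The third URL in the sample text is an assumption added to show the difference.

```python
import re
from urllib.parse import urlparse

# The third URL is a made-up example whose path, not host, ends in ".pk"
text = ("Visit http://www.example.pk or https://website.com.pk "
        "but not https://example.com/list.pk today.")

# Grab every http/https URL, then trim trailing sentence punctuation
candidates = [u.rstrip(".,") for u in re.findall(r"https?://\S+", text)]

# Keep only URLs whose hostname ends in .pk
pk_urls = [u for u in candidates if (urlparse(u).hostname or "").endswith(".pk")]
print(pk_urls)  # ['http://www.example.pk', 'https://website.com.pk']
```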


### Assignment 7: Analyzing Currency

**Raw Text:** 
Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

**Example:**
```
Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
```


In [44]:
import re

# Sample text with currency amount
text = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."

# Normalize "Rs." to "PKR" so that a single pattern covers both notations
new_text = re.sub(r"Rs\.", "PKR", text)

# Define the regex pattern to find rupee amounts:
pattern = r"\bPKR \d+\b"

# Extracting and Analyzing currency amounts:
pakistani_rupees = re.findall(pattern,new_text)
print(f"Currency amounts in PKR: {pakistani_rupees}")

Currency amounts in PKR: ['PKR 1500', 'PKR 2500']
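For the "analyze" part of the assignment, the amounts can be captured as numbers directly, without rewriting the text first. This is one possible sketch; the variable name `amount_pattern` and the choice of total and maximum as the analysis are assumptions.

```python
import re

text = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."

# Accept either "PKR" or "Rs"/"Rs." and capture only the digits
amount_pattern = r"\b(?:PKR|Rs\.?)\s*(\d+)"

# With a single capturing group, findall() returns just the captured digits
amounts = [int(a) for a in re.findall(amount_pattern, text)]

print(amounts)       # [1500, 2500]
print(sum(amounts))  # 4000
print(max(amounts))  # 2500
```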


### Assignment 8: Removing Punctuation

**Raw Text:** 
Remove all punctuation marks from a text while preserving Urdu characters.

**Example:**
```
Text: کیا! آپ, یہاں؟
```


There are three ways to remove punctuation from the Urdu text:
1. Using a Unicode range.
2. Using a negated character class that excludes whitespace and Latin letters.
3. Using the special sequence \w, because the text contains only Urdu words.

Using a Unicode range

In [62]:
import re

# Sample text with punctuation marks
text = "کیا! آپ, یہاں؟"

# Define the regex pattern to extract text without punctuation marks.
# The range also contains Arabic punctuation such as U+061F; \b keeps it off the ends of each match.
pattern = r"\b[\u0600-\u06FF]+\b"

# Extracting all matches using re.findall()
clean_text = re.findall(pattern, text)
print(f"Text without punctuation: {clean_text}")

Text without punctuation: ['کیا', 'آپ', 'یہاں']


Using a negated character class

In [63]:
import re

# Sample text with punctuation marks
text = "کیا! آپ, یہاں؟"

# Define the regex pattern to extract text without punctuation marks:
pattern = r"\b[^\sA-Za-z]+\b"  # anything that is not whitespace or a Latin letter

# Extracting all matches using re.findall()
clean_text = re.findall(pattern, text)
print(f"Text without punctuation: {clean_text}")

Text without punctuation: ['کیا', 'آپ', 'یہاں']
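The third approach in the list, based on `\w`, can be sketched with `re.sub()`. Unlike the two cells above, which return a list of words, this removes the punctuation in place and returns a string with the original spacing.

```python
import re

# Sample text with punctuation marks
text = "کیا! آپ, یہاں؟"

# Delete every character that is neither a word character nor whitespace
clean_text = re.sub(r"[^\w\s]", "", text)
print(clean_text)
```

In Python 3, `\w` is Unicode-aware for `str` patterns, so Urdu letters count as word characters. Combining diacritics are not matched by `\w`, so text that carries them would need those characters added to the set that is kept.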


### Assignment 9: Extracting City Names

**Raw Text:** 
Extract names of Pakistani cities from a given text.

**Example:**
```
Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
```


In [5]:
import re

# Sample text with names of Pakistani cities
text = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."

# Define the regex pattern: a capitalized word not directly followed by a period.
# The lookahead only skips "Pakistan." in this sample; it does not recognize cities as such.
pattern = r"\b[A-Z][a-z]+\b(?!\.)"

# Extracting all matches using re.findall()
pakistani_cities = re.findall(pattern,text)
print(f"Pakistani cities: {pakistani_cities}")

Pakistani cities: ['Lahore', 'Karachi', 'Islamabad', 'Peshawar']
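A pattern based on capitalization cannot tell a city from any other capitalized word. A sturdier option, sketched below, is to build an alternation from a list of known city names. The list `known_cities` is a small hypothetical sample, not a complete gazetteer.

```python
import re

text = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."

# Hypothetical sample list; a real application would load a fuller one
known_cities = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Quetta", "Multan"]

# Escape each name and join with | so only whole, known names match
pattern = r"\b(?:" + "|".join(map(re.escape, known_cities)) + r")\b"

pakistani_cities = re.findall(pattern, text)
print(pakistani_cities)  # ['Lahore', 'Karachi', 'Islamabad', 'Peshawar']
```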



### Assignment 10: Analyzing Vehicle Numbers

**Raw Text:** 
Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

**Example:**
```
Text: I saw a car with the number plate LEA-567 near the market.
```


In [6]:
import re

# Sample text with Pakistani vehicle registration numbers:
text = "I saw a car with the number plate LEA-567 near the market."

# Define the regex pattern to extract vehicle registration numbers:
pattern = r"\b[A-Z]{3}-\d{3}\b"

# Extracting all matches using re.findall()
registration_numbers = re.findall(pattern, text)
print(f"Pakistani vehicle registration number: {registration_numbers}")

Pakistani vehicle registration number: ['LEA-567']
