---

## Advanced Regular Expression Assignments

### Assignment 1: Extracting Phone Numbers

**Raw Text:** 
Extract all valid Pakistani phone numbers from a given text.

**Example:**
```
Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
```



In [20]:
import re
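text_1 = "Please contact me at 0301-1234567 or 042-35678901 for further details."
# 3-4 digit prefix, a hyphen, then a 7-8 digit subscriber number.
# No flags are needed: the pattern has no ^ or $ anchors for re.MULTILINE to affect.
pattern_1 = r"\b\d{3,4}-\d{7,8}\b"
phone_numbers = re.findall(pattern_1, text_1)
phone_numbers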


['0301-1234567', '042-35678901']
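
The pattern above accepts any 3-4 digits followed by any 7-8 digits. A stricter variant might separate mobile and landline numbers. The sketch below assumes mobiles are written as `03XX-XXXXXXX` and landlines as an area code of 3-5 digits starting with `0`, followed by 6-8 digits; these formats are illustrative assumptions, not part of the assignment.

```python
import re

# Assumed formats (illustrative only):
#   mobile:   03XX-XXXXXXX  (prefix starting with 03, then 7 digits)
#   landline: 0XX-XXXXXXXX  (area code of 3-5 digits starting with 0, then 6-8 digits)
MOBILE = r"03\d{2}-\d{7}"
LANDLINE = r"0\d{2,4}-\d{6,8}"

# Lookarounds reject candidates that are embedded in a longer run of digits.
phone_pattern = re.compile(rf"(?<!\d)(?:{MOBILE}|{LANDLINE})(?!\d)")

sample = "Call 0301-1234567 or 042-35678901, not 1234-5678901 or 0301-12345678999."
phones = phone_pattern.findall(sample)
print(phones)
```

Numbers that do not start with `0`, or that carry extra trailing digits, are skipped.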

### Assignment 2: Validating Email Addresses

**Raw Text:** 
Validate email addresses according to Pakistani domain extensions (.pk).

**Example:**
```
Text: Contact us at info@example.com or support@domain.pk for assistance.
```



In [14]:
import re


text = "Contact us at info@example.com or support@domain.pk for assistance."
pattern = r'\b[\w\.-]+@[\w\.-]+\b\.pk\b'
matches = re.findall(pattern, text)
for match in matches:
    print(match)


support@domain.pk
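
The cell above extracts `.pk` addresses from running text. To validate a single candidate string, as the task wording suggests, `re.fullmatch` can be used instead. This is a minimal sketch: the helper name `is_pk_email` and the simplified address shape are assumptions, and real email syntax is more permissive.

```python
import re

# Simplified shape: local part, "@", one or more dot-terminated labels, then "pk".
PK_EMAIL = re.compile(r"[\w.+-]+@(?:[A-Za-z0-9-]+\.)+pk", re.IGNORECASE)

def is_pk_email(address: str) -> bool:
    """Return True if the whole string looks like an address on a .pk domain."""
    return PK_EMAIL.fullmatch(address) is not None

candidates = ["info@example.com", "support@domain.pk", "sales@shop.com.pk", "bad@.pk"]
results = {c: is_pk_email(c) for c in candidates}
print(results)
```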


### Assignment 3: Extracting CNIC Numbers

**Raw Text:** 
Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

**Example:**
```
Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
```


In [10]:
import re


text = "My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."
pattern = r'\b\d{5}-\d{7}-\d\b'
matches = re.findall(pattern, text)
for match in matches:
    print(match)


12345-6789012-3
34567-8901234-5
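
CNICs are sometimes typed as 13 digits without hyphens. Assuming that variant should also be accepted, the sketch below makes the hyphens optional and normalizes every match to the `XXXXX-XXXXXXX-X` form. The function name `normalize_cnics` is hypothetical.

```python
import re

# Three digit groups (5, 7, 1) with optional hyphens; lookarounds stop the
# pattern from matching inside a longer run of digits.
cnic_pattern = re.compile(r"(?<!\d)(\d{5})-?(\d{7})-?(\d)(?!\d)")

def normalize_cnics(text):
    """Return every CNIC found in text, rewritten with hyphens."""
    return ["-".join(m.groups()) for m in cnic_pattern.finditer(text)]

sample = "IDs: 12345-6789012-3, 3456789012345 and 123456789 (too short)."
cnics = normalize_cnics(sample)
print(cnics)
```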



### Assignment 4: Identifying Urdu Words

**Raw Text:** 
Identify and extract Urdu words from a mixed English-Urdu text.

**Example:**
```
Text: یہ sentence میں کچھ English words بھی ہیں۔
```



In [15]:
import re

text = "یہ sentence میں کچھ English words بھی ہیں۔"
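# Match runs of Arabic-script characters only. Leaving \s out of the class
# keeps each word separate and avoids whitespace-only matches.
# The range also covers Arabic-script punctuation, so the full stop (U+06D4)
# stays attached to the last word.
urdu_pattern = r'[\u0600-\u06FF]+'
matches = re.findall(urdu_pattern, text)
for match in matches:
    print(match)


یہ
میں
کچھ
بھی
ہیں۔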


### Assignment 5: Finding Dates

**Raw Text:** 
Find and extract dates in the format DD-MM-YYYY from a given text.

**Example:**
```
Text: The event will take place on 15-08-2023 and 23-09-2023.
```



In [16]:
import re


text = "The event will take place on 15-08-2023 and 23-09-2023."
date_pattern = r'\b\d{2}-\d{2}-\d{4}\b'
matches = re.findall(date_pattern, text)
for match in matches:
    print(match)


15-08-2023
23-09-2023
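
The pattern checks only the shape of a date, so a string such as `31-02-2023` would also match. One way to keep only real calendar dates is to pass each candidate through `datetime.strptime`. The sketch below does that; the function name `extract_valid_dates` is an assumption for illustration.

```python
import re
from datetime import datetime

date_pattern = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")

def extract_valid_dates(text):
    """Return date objects for DD-MM-YYYY strings that are real calendar dates."""
    valid = []
    for candidate in date_pattern.findall(text):
        try:
            valid.append(datetime.strptime(candidate, "%d-%m-%Y").date())
        except ValueError:
            # Right shape, but not a real date (e.g. 31-02-2023).
            continue
    return valid

sample = "Valid: 15-08-2023 and 23-09-2023. Invalid: 31-02-2023 and 45-13-2023."
dates = extract_valid_dates(sample)
print([d.isoformat() for d in dates])
```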


### Assignment 6: Extracting URLs

**Raw Text:** 
Extract all URLs from a text that belong to Pakistani domains.

**Example:**
```
Text: Visit http://www.example.pk or https://website.com.pk for more information.
```



In [13]:
import re

text = "Visit http://www.example.pk or https://website.com.pk for more information."

url_pattern = r'https?://(?:www\.)?[\w.-]+\.pk\b'
matches = re.findall(url_pattern, text)

for match in matches:
    print(match)


http://www.example.pk
https://website.com.pk
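
Because the pattern stops at `.pk`, any path or query string is dropped, and a host that merely contains `.pk` in the middle would be truncated into a match. An alternative sketch is to grab URL-like tokens loosely and let `urllib.parse.urlsplit` check the actual hostname. The function name `pk_urls` and the set of trailing punctuation stripped are assumptions.

```python
import re
from urllib.parse import urlsplit

def pk_urls(text):
    """Return http(s) URLs in text whose hostname ends in .pk."""
    # Grab anything URL-like, then trim trailing sentence punctuation.
    candidates = [u.rstrip(".,;:!?)") for u in re.findall(r"https?://\S+", text)]
    result = []
    for url in candidates:
        host = urlsplit(url).hostname or ""
        if host.endswith(".pk"):
            result.append(url)
    return result

sample = ("Visit http://www.example.pk/contact?ref=1 or https://website.com.pk, "
          "but not https://example.pk.example.com.")
found = pk_urls(sample)
print(found)
```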


### Assignment 7: Analyzing Currency

**Raw Text:** 
Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

**Example:**
```
Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
```



In [17]:
import re


text = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."
# Accept both the "PKR" and "Rs." prefixes, then capture the amount
# (optional thousands separators and two decimal places).
currency_pattern = r'\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*(?:\.\d{2})?)'
matches = re.findall(currency_pattern, text)
total_sum = 0.0
for match in matches:
    amount = float(match.replace(',', ''))
    print(f"Found amount: PKR {amount:.2f}")
    total_sum += amount

print(f"Total sum of currency amounts: PKR {total_sum:.2f}")


Found amount: PKR 1500.00
Found amount: PKR 2500.00
Total sum of currency amounts: PKR 4000.00
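
For money, `decimal.Decimal` avoids the rounding surprises of `float`. The sketch below is one possible variant that accepts either a `PKR` or `Rs.` prefix and amounts with thousands separators. The sample sentence and variable names are made up for illustration.

```python
import re
from decimal import Decimal

# Prefix (PKR or Rs.), optional whitespace, then digits with optional
# thousands separators and an optional two-digit fraction.
amount_pattern = re.compile(r"\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*(?:\.\d{2})?)")

sample = "Rent is Rs. 45,000 and the deposit is PKR 1,250.50."
amounts = [Decimal(m.replace(",", "")) for m in amount_pattern.findall(sample)]
total = sum(amounts, Decimal("0"))
print(amounts, total)
```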


### Assignment 8: Removing Punctuation

**Raw Text:** 
Remove all punctuation marks from a text while preserving Urdu characters.

**Example:**
```
Text: کیا! آپ, یہاں؟
```



In [18]:
import re

text = "کیا! آپ, یہاں؟"

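# In Python 3, \w already matches Urdu letters, so no explicit Arabic-script
# range is needed (and including one would keep Urdu punctuation such as ؟).
# The extra ranges preserve Arabic diacritics, which \w does not cover.
punctuation_pattern = r'[^\w\s\u064B-\u065F\u0670]+'

text_without_punctuation = re.sub(punctuation_pattern, '', text)

print(text_without_punctuation)


کیا آپ یہاں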
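
Urdu text uses its own punctuation marks (such as ؟, ، and ۔) alongside ASCII ones. Instead of a hand-written character class, a sketch based on Unicode general categories drops every character classified as punctuation, whatever the script. The function name `strip_punctuation` and the sample string are assumptions for illustration.

```python
import unicodedata

def strip_punctuation(text):
    """Drop every character whose Unicode general category starts with 'P'."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))

sample = "کیا! آپ، یہاں؟ Yes, really."
cleaned = strip_punctuation(sample)
print(cleaned)
```

Letters, digits, whitespace, and combining marks are left untouched, since none of them fall in a `P*` category.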


### Assignment 9: Extracting City Names

**Raw Text:** 
Extract names of Pakistani cities from a given text.

**Example:**
```
Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
```


In [16]:
import re


text = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."
city_names = ["Lahore", "Karachi", "Islamabad", "Peshawar"]
city_pattern = r'\b(?:' + '|'.join(re.escape(city) for city in city_names) + r')\b'
matches = re.findall(city_pattern, text)

for match in matches:
    print(match)


Lahore
Karachi
Islamabad
Peshawar
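
Real text may spell city names in different cases or include multi-word names. The sketch below extends the same alternation approach with `re.IGNORECASE` and a longest-first ordering. The city list is a hypothetical lookup table, not a complete gazetteer.

```python
import re

# Hypothetical lookup list; a real one would come from a gazetteer.
CITY_NAMES = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Dera Ismail Khan"]

# Longest first, so that if one name is a prefix of another the longer one is tried first.
alternatives = "|".join(re.escape(c) for c in sorted(CITY_NAMES, key=len, reverse=True))
city_pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

sample = "Flights from LAHORE to karachi; buses run to Dera Ismail Khan."
cities = [m.group(0).title() for m in city_pattern.finditer(sample)]
print(cities)
```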



### Assignment 10: Analyzing Vehicle Numbers

**Raw Text:** 
Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

**Example:**
```
Text: I saw a car with the number plate LEA-567 near the market.
```



In [19]:
import re


text = "I saw a car with the number plate LEA-567 near the market. Another car had the plate XYZ-1234."
registration_number_pattern = r'\b[A-Z]{3}-\d{3,4}\b'
matches = re.findall(registration_number_pattern, text)

for match in matches:
    print(match)


LEA-567
XYZ-1234
