# Assignment 4


---

## Advanced Regular Expression Assignments

### Question 1: Extracting Phone Numbers

**Raw Text:**
Extract all valid Pakistani phone numbers from a given text.

**Example:**
```
Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
```




In [30]:
#Answer
import re

text = "Please contact me at 0301-1234567 or 042-35678901 for further details."
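# A leading 0 plus a 2-4 digit area/network code, written either as "0xxx-" or "(0xxx)",
# followed by a 7 or 8 digit subscriber number (landlines such as 042-35678901 use 8).
pattern = r'(?<!\w)(?:0[1-9]\d{1,3}-|\(0[1-9]\d{1,3}\))\d{7,8}\b'
matches = re.findall(pattern, text)
for number in matches:
    print("Phone Number:", number)


Phone Number: 0301-1234567
Phone Number: 042-35678901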
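
Once numbers are extracted, it can be useful to label each one. The following is a minimal sketch rather than part of the assignment: the helper name `classify_number` is hypothetical, and it assumes that mobile numbers are the ones beginning with `03` while everything else is treated as a landline.

```python
import re

# A leading 0, a 2-4 digit code, a hyphen, then a 7 or 8 digit subscriber number.
PHONE_RE = re.compile(r'\b0[1-9]\d{1,3}-\d{7,8}\b')

def classify_number(number: str) -> str:
    """Label a number as 'mobile' or 'landline' (assumption: mobiles start with 03)."""
    return "mobile" if number.startswith("03") else "landline"

text = "Please contact me at 0301-1234567 or 042-35678901 for further details."
labelled = [(n, classify_number(n)) for n in PHONE_RE.findall(text)]
print(labelled)
```

Because the pattern has no capturing group, `findall` returns each full number as a string, which keeps the labelling step simple.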


---
### Question 2: Validating Email Addresses

**Raw Text:**
Validate email addresses according to Pakistani domain extensions (.pk).

**Example:**
```
Text: Contact us at info@example.com or support@domain.pk for assistance.
```



In [34]:
#Answer
import re

Text = """Contact us at info@example.com or support@domain.pk for assistance."""

# Any syntactically plausible email address.
pattern1 = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b'
# Only addresses whose domain ends in .pk (case-insensitive via re.IGNORECASE).
pattern2 = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.pk\b'

print(re.findall(pattern1, Text))
print(re.findall(pattern2, Text, re.IGNORECASE))

['info@example.com', 'support@domain.pk']
['support@domain.pk']
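
Extracting addresses from running text and validating a single address are slightly different tasks. For validation, `re.fullmatch` is a natural fit because it requires the entire string to match. The sketch below is one possible approach; the helper name `is_pk_email` and the sample addresses are illustrative assumptions.

```python
import re

# fullmatch() requires the whole string to match, so no \b anchors are needed.
PK_EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.pk', re.IGNORECASE)

def is_pk_email(address: str) -> bool:
    """Return True if `address` looks like an email address on a .pk domain."""
    return PK_EMAIL_RE.fullmatch(address) is not None

for candidate in ["support@domain.pk", "info@example.com", "admin@uni.edu.pk"]:
    print(candidate, is_pk_email(candidate))
```

This is a shape check only; it does not confirm that the domain or mailbox exists.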


---
### Question 3: Extracting CNIC Numbers

**Raw Text:**
Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

**Example:**
```
Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
```


In [35]:
#Answer
Text = """My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."""
# A CNIC has exactly 5 digits, 7 digits, and 1 final digit, separated by hyphens.
pattern = r"\b\d{5}-\d{7}-\d\b"

print(re.findall(pattern, Text))

['12345-6789012-3', '34567-8901234-5']
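
Quantifier bounds matter for fixed-width identifiers: a range such as `{0,5}` also accepts shorter digit runs, whereas an exact count does not. As a rough illustration (the helper name `find_cnics` and the malformed sample strings are made up for this sketch), exact counts plus word boundaries reject near-misses:

```python
import re

# Exactly 5 digits, 7 digits, and 1 digit; \b stops matches inside longer digit runs.
CNIC_RE = re.compile(r'\b\d{5}-\d{7}-\d\b')

def find_cnics(text: str) -> list:
    """Return every well-formed CNIC found in `text`."""
    return CNIC_RE.findall(text)

print(find_cnics("My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."))
print(find_cnics("Malformed: 1234-6789012-3, 12345-678901-3, 12345-6789012-34"))
```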


---
### Question 4: Identifying Urdu Words

**Raw Text:**
Identify and extract Urdu words from a mixed English-Urdu text.

**Example:**
```
Text: یہ sentence میں کچھ English words بھی ہیں۔
```



In [13]:
#Answer
Text ="""یہ sentence میں کچھ English words بھی ہیں۔"""
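# Match runs of Arabic-script letters and diacritics. This excludes Latin words,
# digits, and punctuation such as the Urdu full stop (U+06D4).
pattern = r'[\u0621-\u065F\u066E-\u06D3\u06D5]+'

re.findall(pattern, Text)

['یہ', 'میں', 'کچھ', 'بھی', 'ہیں']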
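
A mixed text can also be split into both scripts in one pass. The sketch below assumes that Urdu words are runs of Arabic-script letters and diacritics and that English words are runs of ASCII letters; the helper name `split_by_script` is hypothetical.

```python
import re

# Assumed ranges: Arabic-script letters and diacritics (covers Urdu), and ASCII letters.
URDU_WORD_RE = re.compile(r'[\u0621-\u065F\u066E-\u06D3\u06D5]+')
ENGLISH_WORD_RE = re.compile(r'[A-Za-z]+')

def split_by_script(text: str) -> dict:
    """Group the words of a mixed text by script."""
    return {
        "urdu": URDU_WORD_RE.findall(text),
        "english": ENGLISH_WORD_RE.findall(text),
    }

text = "یہ sentence میں کچھ English words بھی ہیں۔"
words = split_by_script(text)
print(words["urdu"])
print(words["english"])
```

The Urdu range deliberately stops before U+06D4, so the sentence-final full stop is not attached to the last word.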

---
### Question 5: Finding Dates

**Raw Text:**
Find and extract dates in the format DD-MM-YYYY from a given text.

**Example:**
```
Text: The event will take place on 15-08-2023 and 23-09-2023.
```



In [20]:
#Answer
Text = """The event will take place on 15-08-2023 and 23-09-2023."""
# DD-MM-YYYY: two-digit day, two-digit month, four-digit year.
pattern = r"\b\d{2}-\d{2}-\d{4}\b"

re.findall(pattern, Text)

['15-08-2023', '23-09-2023']
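
A regular expression can check the shape of a date but not whether the date exists; a string like `31-02-2023` has the right shape and is still invalid. One way to handle this, sketched here with a hypothetical helper `extract_valid_dates` and a made-up sample sentence, is to pass each match through `datetime.strptime`:

```python
import re
from datetime import date, datetime

DATE_RE = re.compile(r'\b\d{2}-\d{2}-\d{4}\b')

def extract_valid_dates(text: str) -> list:
    """Return date objects for every DD-MM-YYYY match that is a real calendar date."""
    valid = []
    for candidate in DATE_RE.findall(text):
        try:
            valid.append(datetime.strptime(candidate, "%d-%m-%Y").date())
        except ValueError:
            # The shape matched, but the date does not exist (e.g. 31-02-2023).
            continue
    return valid

sample = "The event will take place on 15-08-2023 and 23-09-2023, not on 31-02-2023."
print(extract_valid_dates(sample))
```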

---
### Question 6: Extracting URLs

**Raw Text:**
Extract all URLs from a text that belong to Pakistani domains.

**Example:**
```
Text: Visit http://www.example.pk or https://website.com.pk for more information.
```



In [38]:
#Answer
import re

text = "Visit http://www.example.pk or https://website.com.pk for more information."
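# The host must end in .pk, which covers both example.pk and second-level zones such as com.pk.
pattern = r'https?://[A-Za-z0-9.-]+\.pk\b(?:/[A-Za-z0-9_]+)*'
matches = re.findall(pattern, text)

for url in matches:
    print("URL:", url)


URL: http://www.example.pk
URL: https://website.com.pk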
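
An alternative to encoding the domain rule in the regex is to find URL candidates loosely and then inspect the hostname with the standard library's `urllib.parse.urlparse`. This is a sketch under assumptions: the helper name `extract_pk_urls` and the third URL in the sample are invented for illustration.

```python
import re
from urllib.parse import urlparse

def extract_pk_urls(text: str) -> list:
    """Return http(s) URLs in `text` whose hostname ends in .pk."""
    results = []
    for candidate in re.findall(r'https?://\S+', text):
        url = candidate.rstrip('.,;:!?)')  # drop trailing sentence punctuation
        host = urlparse(url).hostname or ""
        if host.lower().endswith(".pk"):
            results.append(url)
    return results

sample = ("Visit http://www.example.pk or https://website.com.pk for more information. "
          "See also https://example.com/page.pk.")
print(extract_pk_urls(sample))
```

Checking the parsed hostname avoids false positives where `.pk` appears only in the path, as in the last URL of the sample.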


---
### Question 7: Analyzing Currency

**Raw Text:**
Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

**Example:**
```
Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
```



In [58]:
#Answer
import re

text = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."
# Accept either the "PKR" code or the "Rs"/"Rs." abbreviation before the amount.
pattern = r'\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*(?:\.\d{2})?)\b'

matches = re.findall(pattern, text)
currency_amounts = [float(match.replace(',', '')) for match in matches]

for amount in currency_amounts:
    print("Amount in PKR:", amount)


Amount in PKR: 1500.0
Amount in PKR: 2500.0
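
For the "analyze" part of the task, a simple summary of the extracted amounts may be enough. The sketch below assumes amounts are prefixed by `PKR`, `Rs`, or `Rs.`, uses `Decimal` instead of `float` to avoid rounding surprises with money, and introduces a hypothetical helper `summarize_amounts`.

```python
import re
from decimal import Decimal

AMOUNT_RE = re.compile(r'\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*(?:\.\d{2})?)\b')

def summarize_amounts(text: str) -> dict:
    """Extract rupee amounts and return simple summary statistics."""
    amounts = [Decimal(m.replace(',', '')) for m in AMOUNT_RE.findall(text)]
    if not amounts:
        return {"count": 0, "total": Decimal(0), "min": None, "max": None}
    return {
        "count": len(amounts),
        "total": sum(amounts),
        "min": min(amounts),
        "max": max(amounts),
    }

text = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."
summary = summarize_amounts(text)
print(summary)
```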


---
### Question 8: Removing Punctuation

**Raw Text:**
Remove all punctuation marks from a text while preserving Urdu characters.

**Example:**
```
Text: کیا! آپ, یہاں؟
```



In [59]:
#Answer
import re

text = "کیا! آپ, یہاں؟"
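# In Python 3, \w already matches Urdu letters, so no explicit Arabic range is needed;
# adding \u0600-\u06FF would wrongly keep Arabic punctuation such as ؟ (U+061F).
# The diacritic range \u064B-\u065F is kept because \w does not match combining marks.
pattern = r'[^\w\s\u064B-\u065F]+'
cleaned_text = re.sub(pattern, '', text)

print("Cleaned Text:", cleaned_text)


Cleaned Text: کیا آپ یہاں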
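
Punctuation can also be removed without any character ranges by consulting each character's Unicode general category. The following sketch uses the standard `unicodedata` module; the helper name `strip_punctuation` is hypothetical.

```python
import unicodedata

def strip_punctuation(text: str) -> str:
    """Remove every character whose Unicode general category is punctuation (P*)."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))

text = "کیا! آپ, یہاں؟"
cleaned = strip_punctuation(text)
print("Cleaned Text:", cleaned)
```

This treats the Arabic question mark and the Urdu full stop like any other punctuation, while letters, diacritics, and whitespace are left alone. Symbols such as currency signs belong to a different category and are not removed.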


---
### Question 9: Extracting City Names

**Raw Text:**
Extract names of Pakistani cities from a given text.

**Example:**
```
Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
```


In [60]:
#Answer
import re

text = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."
city_names = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Rawalpindi", "Multan", "Faisalabad",
              "Quetta", "Gujranwala", "Sialkot", "Sargodha", "Gujrat", "Bahawalpur", "Sukkur", "Jhang",
              "Sheikhupura", "Mardan", "Larkana", "Kasur", "Rahim Yar Khan", "Sahiwal", "Okara", "Wah",
              "Dera Ghazi Khan", "Mirpur Khas", "Nawabshah", "Kamoke", "Burewala", "Jhelum", "Sadiqabad",
              "Khanewal", "Hafizabad", "Kohat", "Jacobabad", "Shikarpur", "Muzaffargarh", "Khanpur", "Gojra",
              "Bahawalnagar", "Muridke", "Pakpattan", "Abbottabad", "Tando Adam", "Khairpur", "Chishtian",
              "Daska", "Dera Ismail Khan", "Charsadda", "Jamshoro", "Nowshera", "Mandi Bahauddin", "Wazirabad"]
pattern = r'\b(?:' + '|'.join(re.escape(city) for city in city_names) + r')\b'

matches = re.findall(pattern, text, re.IGNORECASE)

print("Extracted City Names:", matches)


Extracted City Names: ['Lahore', 'Karachi', 'Islamabad', 'Peshawar']
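
With alternation, order matters when one name is a prefix of another: the regex engine takes the first alternative that succeeds. Sorting the names longest-first is a common safeguard. The sketch below uses a deliberately short, hypothetical list and a hypothetical helper `build_city_pattern` to show the effect.

```python
import re

def build_city_pattern(city_names):
    """Compile an alternation of city names, longest first, matched case-insensitively."""
    ordered = sorted(city_names, key=len, reverse=True)
    alternation = '|'.join(re.escape(city) for city in ordered)
    return re.compile(r'\b(?:' + alternation + r')\b', re.IGNORECASE)

# Hypothetical short list where one name is a prefix of another.
cities = ["Mirpur", "Mirpur Khas", "Lahore"]
pattern = build_city_pattern(cities)
print(pattern.findall("Trains run from Lahore to Mirpur Khas."))
```

Without the sort, "Mirpur" would be tried before "Mirpur Khas" and the shorter name would win.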


---
### Question 10: Analyzing Vehicle Numbers

**Raw Text:**
Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

**Example:**
```
Text: I saw a car with the number plate LEA-567 near the market.
```



In [61]:
#Answer
import re

text = "I saw a car with the number plate LEA-567 near the market."
pattern = r'\b[A-Z]{3}-\d{3}\b'

matches = re.findall(pattern, text)

for reg_number in matches:
    print("Vehicle Registration Number:", reg_number)


Vehicle Registration Number: LEA-567
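
The task only specifies the `ABC-123` shape. If other plate layouts need to be covered, the quantifiers can be widened. The sketch below assumes, purely for illustration, that plates may have two or three uppercase letters followed by three or four digits; the helper name `extract_plates` and the second plate in the sample are made up.

```python
import re

# Assumption: two or three uppercase letters, a hyphen, then three or four digits.
PLATE_RE = re.compile(r'\b[A-Z]{2,3}-\d{3,4}\b')

def extract_plates(text: str) -> list:
    """Return registration numbers such as LEA-567 or LE-1234 (assumed formats)."""
    return PLATE_RE.findall(text)

print(extract_plates("I saw LEA-567 near the market and LE-1234 at the signal."))
```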
