---

## Advanced Regular Expression Assignments

### Assignment 1: Extracting Phone Numbers

**Raw Text:** 
Extract all valid Pakistani phone numbers from a given text.

**Example:**
```
Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
```



In [17]:
import re

text = """
Please contact me at 0301-1234567 or 042-35678901 or 074 12345567 or 0302-4567893 or 021-12546986 
or 815-562413 for further details.
"""

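# Prefix of 3-4 digits, an optional space or hyphen, then 7-8 digits.
# re.MULTILINE is not needed because the pattern has no ^ or $ anchors.
pattern = r"\b\d{3,4}[ -]?\d{7,8}\b"
phone_num = re.findall(pattern, text)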
print(phone_num)


['0301-1234567', '042-35678901', '074 12345567', '0302-4567893', '021-12546986']
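
The pattern above accepts any 3-4 digit prefix. As a stricter sketch, the following separates mobile numbers from landlines. It assumes mobile numbers look like 03XX-XXXXXXX and landlines use a three-digit area code starting with 0 followed by 7-8 digits; real area codes vary in length, so both patterns and the helper name `classify_numbers` are illustrative only.

```python
import re

# Assumption: mobile numbers are "03" + 2 digits, optional separator, 7 digits.
MOBILE = re.compile(r"\b03\d{2}[ -]?\d{7}\b")
# Assumption: landlines are "0" + 2-digit area code, optional separator, 7-8 digits.
LANDLINE = re.compile(r"\b0\d{2}[ -]?\d{7,8}\b")


def classify_numbers(text):
    """Group the numbers found in text by the two assumed formats."""
    return {
        "mobile": MOBILE.findall(text),
        "landline": LANDLINE.findall(text),
    }


sample = "Please contact me at 0301-1234567 or 042-35678901 or 815-562413."
print(classify_numbers(sample))
# {'mobile': ['0301-1234567'], 'landline': ['042-35678901']}
```

Requiring the leading 0 also rejects strings such as `815-562413` by prefix rather than only by length.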


### Assignment 2: Validating Email Addresses

**Raw Text:** 
Validate email addresses according to Pakistani domain extensions (.pk).

**Example:**
```
Text: Contact us at info@example.com or support@domain.pk for assistance.
```



In [44]:
text = """
info123@example.pk
support@domain.pk
pbs@pbs.com
contact@agpr.gov.pk
pbs@pbs.gov.pk
abc@123.com
abcd12@domain.pk
"""
# The dot before "pk" must be escaped; \w already includes digits, so \d is redundant.
pattern = r"\b[\w.-]+@[\w.-]+\.pk\b"
email = re.findall(pattern, text)
email

['info123@example.pk',
 'support@domain.pk',
 'contact@agpr.gov.pk',
 'pbs@pbs.gov.pk',
 'abcd12@domain.pk']
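
Because the task is to validate addresses rather than find them in running text, checking one candidate at a time with `re.fullmatch` may be a better fit. A minimal sketch, assuming a simplified address grammar (much looser than the full email specification) and a hypothetical helper named `is_pk_email`:

```python
import re

# Simplified grammar: local part, "@", one or more dot-separated labels, then "pk".
PK_EMAIL = re.compile(r"[\w.+-]+@(?:[A-Za-z0-9-]+\.)+pk")


def is_pk_email(address):
    """Return True if the whole string looks like an address under .pk."""
    return PK_EMAIL.fullmatch(address) is not None


for candidate in ["support@domain.pk", "contact@agpr.gov.pk", "pbs@pbs.com"]:
    print(candidate, is_pk_email(candidate))
```

`fullmatch` only succeeds when the entire string fits the pattern, so trailing text such as `support@domain.pk.example` is rejected.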

### Assignment 3: Extracting CNIC Numbers

**Raw Text:** 
Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

**Example:**
```
Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
```


In [52]:
text = """
My CNIC is 12345-6789012-3 and another one is 34567-8901234-5
"""
pattern = r"\d{5}-\d{7}-\d"
cnic = re.findall(pattern, text)
cnic

['12345-6789012-3', '34567-8901234-5']
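
The pattern above can also match inside a longer run of digits. One possible guard, sketched here with the hypothetical helper `find_cnics`, is to add lookarounds so that no digit may sit directly before or after the match:

```python
import re

# (?<!\d) and (?!\d) reject candidates embedded in longer digit sequences.
CNIC = re.compile(r"(?<!\d)\d{5}-\d{7}-\d(?!\d)")


def find_cnics(text):
    """Return every 5-7-1 digit group that is not part of a longer number."""
    return CNIC.findall(text)


sample = "My CNIC is 12345-6789012-3, reference 912345-6789012-34."
print(find_cnics(sample))
# ['12345-6789012-3']
```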


### Assignment 4: Identifying Urdu Words

**Raw Text:** 
Identify and extract Urdu words from a mixed English-Urdu text.

**Example:**
```
Text: یہ sentence میں کچھ English words بھی ہیں۔
```



In [57]:
text = "یہ sentence میں کچھ English words بھی ہیں۔"

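# Negated class: keep runs that contain no whitespace, hyphens, or ASCII letters.
pattern = r"\b([^\s\-a-zA-Z]+\b)"
# pattern = r"\b[\u0600-\u06FF]+\b"  # alternative: the Arabic Unicode block, which covers the Urdu script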
urdu = re.findall(pattern, text)
urdu

['یہ', 'میں', 'کچھ', 'بھی', 'ہیں']
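
The negated character class works for this example, but it keeps anything that is not an ASCII letter, including digits. A short sketch comparing it with the Unicode-range pattern on a made-up sentence (the sample text and the names `LOOSE` and `URDU_WORD` are illustrative):

```python
import re

# Negated class from the cell above: everything except whitespace, "-", and ASCII letters.
LOOSE = re.compile(r"\b[^\s\-a-zA-Z]+\b")
# Arabic Unicode block (U+0600 to U+06FF), which contains the Urdu letters.
URDU_WORD = re.compile(r"\b[\u0600-\u06FF]+\b")

mixed = "سال 2023 میں (test) ہوا"

print(LOOSE.findall(mixed))      # also returns the digits 2023
print(URDU_WORD.findall(mixed))  # returns only the Urdu words
```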

### Assignment 5: Finding Dates

**Raw Text:** 
Find and extract dates in the format DD-MM-YYYY from a given text.

**Example:**
```
Text: The event will take place on 15-08-2023 and 23-09-2023.
```



In [59]:
text = """
The event will take place on 15-08-2023 and 23-09-2023"""

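# Both patterns match the example; \d{2} enforces the two-digit DD-MM form the task asks for.
pattern = r"\d{2}-\d{2}-\d{4}"
# pattern = r"\d{1,2}-\d{1,2}-\d{4}"  # looser: also accepts 5-8-2023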
date = re.findall(pattern, text)
date

['15-08-2023', '23-09-2023']
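
A regular expression checks only the shape of a date, so a string like `31-02-2023` would still match. A minimal sketch that filters the matches through `datetime.strptime`; the helper name `extract_valid_dates` is an assumption for illustration:

```python
import re
from datetime import datetime

DATE = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")


def extract_valid_dates(text):
    """Return date objects for DD-MM-YYYY matches that are real calendar dates."""
    valid = []
    for candidate in DATE.findall(text):
        try:
            valid.append(datetime.strptime(candidate, "%d-%m-%Y").date())
        except ValueError:
            # Right shape, impossible date (for example 31-02-2023): skip it.
            continue
    return valid


sample = "Held on 15-08-2023, not 31-02-2023, then again on 23-09-2023."
print(extract_valid_dates(sample))
# [datetime.date(2023, 8, 15), datetime.date(2023, 9, 23)]
```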

### Assignment 6: Extracting URLs

**Raw Text:** 
Extract all URLs from a text that belong to Pakistani domains.

**Example:**
```
Text: Visit http://www.example.pk or https://website.com.pk for more information.
```



In [86]:
text = """
http://www.example.pk or https://website.com.pk"""

# "[https?://]+" is a character class, not the literal scheme, and each dot must be escaped.
pattern = r"\bhttps?://[\w.-]+\.pk\b"
url = re.findall(pattern, text)
url

['http://www.example.pk', 'https://website.com.pk']
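
When URLs carry paths or sit next to punctuation, it can be simpler to let the regex find candidates and let `urllib.parse.urlparse` decide whether the host ends in `.pk`. A sketch under those assumptions, with a hypothetical helper `pk_urls`:

```python
import re
from urllib.parse import urlparse

# Grab anything that starts with a scheme and runs to the next whitespace.
URL = re.compile(r"https?://\S+")


def pk_urls(text):
    """Return URLs whose hostname ends with the .pk top-level domain."""
    found = []
    for raw in URL.findall(text):
        candidate = raw.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        host = urlparse(candidate).hostname or ""
        if host.endswith(".pk"):
            found.append(candidate)
    return found


sample = "Visit http://www.example.pk or https://website.com.pk/page, not https://example.com."
print(pk_urls(sample))
# ['http://www.example.pk', 'https://website.com.pk/page']
```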

### Assignment 7: Analyzing Currency

**Raw Text:** 
Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

**Example:**
```
Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
```



In [96]:
text = """
The product costs PKR 1500, while the deluxe version is priced at Rs. 2500"""

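# "[PKR]+" is a character class; match either prefix ("PKR" or "Rs.") explicitly,
# then the amount with optional thousands separators.
pattern = r"\b(?:PKR|Rs\.?)\s*\d+(?:,\d{3})*"
amounts = re.findall(pattern, text)
amounts

['PKR 1500', 'Rs. 2500']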
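
The task also asks to analyze the amounts, which is easier once they are numbers. A minimal sketch that captures only the digits after a `PKR` or `Rs.` prefix and converts them to integers; the helper name `extract_amounts` and the choice of total and maximum as the "analysis" are assumptions:

```python
import re

# Prefix "PKR" or "Rs" (optional dot), optional spaces, then digits with
# optional thousands separators captured in group 1.
AMOUNT = re.compile(r"\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*)")


def extract_amounts(text):
    """Return the rupee amounts in text as integers."""
    return [int(raw.replace(",", "")) for raw in AMOUNT.findall(text)]


sample = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."
amounts = extract_amounts(sample)
print(amounts, sum(amounts), max(amounts))
# [1500, 2500] 4000 2500
```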

### Assignment 8: Removing Punctuation

**Raw Text:** 
Remove all punctuation marks from a text while preserving Urdu characters.

**Example:**
```
Text: کیا! آپ, یہاں؟
```



In [98]:
text = """کیا! آپ, یہاں؟"""

# Keep the runs between punctuation marks; this extracts the words rather than editing the string.
pattern = r"\b([^!,؟]+)\b"
words = re.findall(pattern, text)
words

['کیا', 'آپ', 'یہاں']
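
To return the text itself with punctuation removed, `re.sub` is a closer match to the task than `re.findall`. A sketch that deletes ASCII punctuation plus a small set of Urdu marks; the set in `URDU_PUNCT` is an assumption and may need extending:

```python
import re
import string

# Assumption: the Urdu comma, semicolon, question mark, and full stop are enough here.
URDU_PUNCT = "،؛؟۔"
PUNCT = re.compile("[" + re.escape(string.punctuation + URDU_PUNCT) + "]")


def strip_punctuation(text):
    """Delete punctuation characters and leave letters and spaces untouched."""
    return PUNCT.sub("", text)


print(strip_punctuation("کیا! آپ, یہاں؟"))
```

Listing the punctuation explicitly, instead of deleting everything outside `\w`, leaves any other character in the text untouched.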

### Assignment 9: Extracting City Names

**Raw Text:** 
Extract names of Pakistani cities from a given text.

**Example:**
```
Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
```


In [130]:
text = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."

# This pattern is tied to the exact "A, B, C, and D" shape of the example sentence.
pattern = r"\b(\w+),\s(\w+),\s(\w+),\sand\s(\w+)\b"
city = re.findall(pattern,text)
city

[('Lahore', 'Karachi', 'Islamabad', 'Peshawar')]
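
The pattern above depends on the sentence listing exactly four names in a fixed layout. A more flexible sketch builds an alternation from a list of known cities; `KNOWN_CITIES` is a short, hypothetical list for illustration, not a complete gazetteer:

```python
import re

# Hypothetical lookup list; a real one would be much longer.
KNOWN_CITIES = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Quetta", "Multan"]
CITY = re.compile(r"\b(?:" + "|".join(map(re.escape, KNOWN_CITIES)) + r")\b")


def find_cities(text):
    """Return known city names in the order they appear in text."""
    return CITY.findall(text)


sample = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."
print(find_cities(sample))
# ['Lahore', 'Karachi', 'Islamabad', 'Peshawar']
```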


### Assignment 10: Analyzing Vehicle Numbers

**Raw Text:** 
Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

**Example:**
```
Text: I saw a car with the number plate LEA-567 near the market.
```



In [140]:
text = "I saw a car with the number plate LEA-567 near the market"

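# Both patterns match the example, but [A-Z]{3} is stricter:
# \w{3} would also accept digits, lowercase letters, and underscores.
pattern = r"\b[A-Z]{3}-\d{3}\b"
# pattern = r"\b\w{3}-\d{3}\b"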
vehicle = re.findall(pattern,text)
vehicle

['LEA-567']
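
The difference between the two patterns shows up once the text contains other hyphenated tokens. A small comparison on a made-up sentence (the names `LOOSE` and `STRICT` are illustrative, and the ABC-123 layout is the only plate format assumed):

```python
import re

LOOSE = re.compile(r"\b\w{3}-\d{3}\b")      # any three word characters
STRICT = re.compile(r"\b[A-Z]{3}-\d{3}\b")  # three uppercase ASCII letters

sample = "Call 123-456 about the car with plate LEA-567."

print(LOOSE.findall(sample))
# ['123-456', 'LEA-567']
print(STRICT.findall(sample))
# ['LEA-567']
```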