---

## Advanced Regular Expression Assignments

### Assignment 1: Extracting Phone Numbers

**Raw Text:** 
Extract all valid Pakistani phone numbers from a given text.

**Example:**
```
Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
```



In [141]:
import re
text = "Please contact me at 0301-1234567 or 042-35678901 for further details."

pattern = r"\b0\d{2,3}-\d{7,8}\b"  # leading 0, 2-3 digit code, hyphen, 7-8 digit number

nums = re.findall(pattern, text)

for i in nums:
    print(i)

0301-1234567 
042-35678901 
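To reuse the extraction elsewhere, the pattern can be wrapped in a function that also splits each number into its parts. This is a minimal sketch, assuming numbers are written as a leading `0`, a 2-3 digit code, a hyphen, and a 7-8 digit subscriber number (as in the example above); the names `PHONE_RE` and `extract_phone_numbers` are illustrative only.

```python
import re

# Assumed format: 0 + 2-3 digit code, a hyphen, then 7-8 digits.
PHONE_RE = re.compile(r"\b(?P<code>0\d{2,3})-(?P<number>\d{7,8})\b")

def extract_phone_numbers(text):
    """Return a (code, number) pair for every phone number found in text."""
    return [(m.group("code"), m.group("number")) for m in PHONE_RE.finditer(text)]

sample = "Call 0301-1234567, or the office on 042-35678901."
print(extract_phone_numbers(sample))
# [('0301', '1234567'), ('042', '35678901')]
```

Because the pattern ends with `\b` rather than a whitespace character, numbers followed by a comma, a full stop, or the end of the string are still matched.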


### Assignment 2: Validating Email Addresses

**Raw Text:** 
Validate email addresses according to Pakistani domain extensions (.pk).

**Example:**
```
Text: Contact us at info@example.com or support@domain.pk for assistance.
```



In [54]:
text = 'Contact us at info@example.com or support@domain.pk for assistance.'

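# Allow dots, plus signs and hyphens in the local part, and multi-label domains such as example.com.pk
pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

email = re.findall(pattern, text)

for i in email:
    # Compare the whole final label so that e.g. ".apk" is not mistaken for ".pk"
    if i.lower().endswith(".pk"):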
        print(f"Email: {i} is validated as Pakistani Domain (.pk)")
    else:
        print(f"Email: {i} is NOT validated as Pakistani Domain (.pk)") 

Email: info@example.com is NOT validated as Pakistani Domain (.pk)
Email: support@domain.pk is validated as Pakistani Domain (.pk)
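When a single address has to be validated rather than found inside a sentence, `re.fullmatch` may be a better fit than `re.findall`. The following is a rough sketch, not a full RFC-compliant validator; `PK_EMAIL_RE` and `is_pk_email` are hypothetical names, and the sample addresses are made up.

```python
import re

# Local part, "@", one or more domain labels, and a final "pk" label.
PK_EMAIL_RE = re.compile(r"[\w.+-]+@(?:[\w-]+\.)+pk", re.IGNORECASE)

def is_pk_email(address):
    """Return True if the whole string looks like an email address ending in .pk."""
    return PK_EMAIL_RE.fullmatch(address) is not None

for address in ["support@domain.pk", "info@example.com", "user@mail.com.pk"]:
    print(address, is_pk_email(address))
# support@domain.pk True
# info@example.com False
# user@mail.com.pk True
```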


### Assignment 3: Extracting CNIC Numbers

**Raw Text:** 
Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

**Example:**
```
Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
```


In [43]:
text = 'My CNIC is 12345-6789012-3 and another one is 34567-8901234-5'

pattern = r"\b\d{5}-\d{7}-\d\b"  # 5-7-1 digit groups; \b stops matches inside longer digit runs

cnic = re.findall(pattern, text)

for i in cnic:
    print(i)

12345-6789012-3
34567-8901234-5
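CNIC numbers are sometimes typed without hyphens. As a hedged sketch, assuming a bare 13-digit run should also be treated as a CNIC (which may produce false positives on other 13-digit numbers), the pattern can make the hyphens optional and rebuild the hyphenated form. `CNIC_RE` and `extract_cnics` are illustrative names.

```python
import re

# Hyphens are optional; the three groups are always 5, 7 and 1 digits long.
CNIC_RE = re.compile(r"\b(\d{5})-?(\d{7})-?(\d)\b")

def extract_cnics(text):
    """Return every CNIC found in text, normalised to the 5-7-1 hyphenated form."""
    return ["-".join(m.groups()) for m in CNIC_RE.finditer(text)]

sample = "Records list 12345-6789012-3 and 3456789012345."
print(extract_cnics(sample))
# ['12345-6789012-3', '34567-8901234-5']
```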



### Assignment 4: Identifying Urdu Words

**Raw Text:** 
Identify and extract Urdu words from a mixed English-Urdu text.

**Example:**
```
Text: یہ sentence میں کچھ English words بھی ہیں۔
```



In [89]:
text = "یہ sentence میں کچھ English words بھی ہیں۔"
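# Arabic-script block (U+0600-U+06FF), skipping its punctuation marks:
# comma (U+060C), semicolon (U+061B), question mark (U+061F), full stop (U+06D4)
pattern = r'(?:(?![\u060C\u061B\u061F\u06D4])[\u0600-\u06FF])+'

urdu = re.findall(pattern, text)
print(urdu)

['یہ', 'میں', 'کچھ', 'بھی', 'ہیں']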


### Assignment 5: Finding Dates

**Raw Text:** 
Find and extract dates in the format DD-MM-YYYY from a given text.

**Example:**
```
Text: The event will take place on 15-08-2023 and 23-09-2023.
```



In [99]:
text = "The event will take place on 15-08-2023 and 23-09-2023."
pattern = r'\b\d{2}-\d{2}-\d{4}\b'  # DD-MM-YYYY: exactly 2, 2 and 4 digits

dates = re.findall(pattern, text)
print(dates)

['15-08-2023', '23-09-2023']
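A regular expression checks only the shape of a date, so a string such as `31-02-2023` would still match. One possible follow-up, sketched here with the standard `datetime` module, is to parse each candidate and keep only real calendar dates. `DATE_RE` and `extract_valid_dates` are hypothetical names.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")

def extract_valid_dates(text):
    """Return date objects for every DD-MM-YYYY match that is a real calendar date."""
    dates = []
    for candidate in DATE_RE.findall(text):
        try:
            dates.append(datetime.strptime(candidate, "%d-%m-%Y").date())
        except ValueError:
            # Matches the shape but is not a valid date (e.g. 31-02-2023)
            continue
    return dates

sample = "Valid: 15-08-2023. Invalid: 31-02-2023 and 99-99-2023."
print(extract_valid_dates(sample))
# [datetime.date(2023, 8, 15)]
```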


### Assignment 6: Extracting URLs

**Raw Text:** 
Extract all URLs from a text that belong to Pakistani domains.

**Example:**
```
Text: Visit http://www.example.pk or https://website.com.pk for more information.
```



In [122]:
text = "Visit http://www.example.pk or https://website.com.pk for more information."

pattern = r"https?://(?:www\.)?\w+(?:\.com)?\.pk"  # (?:www\.)? and (?:\.com)? are optional non-capturing groups

urls = re.findall(pattern, text)
print(urls)

['http://www.example.pk', 'https://website.com.pk']
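The pattern above covers `www.` and `.com.pk`, but not other subdomains or second-level labels. A more general sketch, assuming any host whose final label is `pk` counts as a Pakistani domain, might look like this; `PK_URL_RE`, `extract_pk_urls`, and the sample hosts are illustrative, and paths or query strings are deliberately left out.

```python
import re

# Scheme, one or more "label." parts, then a final "pk" label.
# The negative lookahead rejects hosts that continue past ".pk" (e.g. ".pk.example.com").
PK_URL_RE = re.compile(r"https?://(?:[\w-]+\.)+pk\b(?!\.[\w-])", re.IGNORECASE)

def extract_pk_urls(text):
    """Return every http(s) URL in text whose host ends in .pk."""
    return PK_URL_RE.findall(text)

sample = "Visit http://www.example.pk, https://portal.gov.pk or https://example.com."
print(extract_pk_urls(sample))
# ['http://www.example.pk', 'https://portal.gov.pk']
```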


### Assignment 7: Analyzing Currency

**Raw Text:** 
Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

**Example:**
```
Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
```



In [124]:
text = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."

pattern = r"(?:PKR|Rs\.?)\s*(\d+)"  # capture digits only when preceded by a PKR or Rs. marker

currency = re.findall(pattern, text)
print(currency)

['1500', '2500']
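For the "analyze" part of the task, the extracted strings can be converted to integers and summarised. This is a small sketch, assuming amounts are whole rupees that may contain thousands separators; `AMOUNT_RE` and `extract_amounts` are hypothetical names.

```python
import re

# A PKR or Rs. marker, optional whitespace, then digits with optional thousands separators.
AMOUNT_RE = re.compile(r"(?:PKR|Rs\.?)\s*(\d{1,3}(?:,\d{3})+|\d+)")

def extract_amounts(text):
    """Return all rupee amounts in text as integers."""
    return [int(raw.replace(",", "")) for raw in AMOUNT_RE.findall(text)]

sample = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2,500."
amounts = extract_amounts(sample)
print(amounts, sum(amounts), max(amounts))
# [1500, 2500] 4000 2500
```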


### Assignment 8: Removing Punctuation

**Raw Text:** 
Remove all punctuation marks from a text while preserving Urdu characters.

**Example:**
```
Text: کیا! آپ, یہاں؟
```



In [132]:
text = "کیا! آپ, یہاں؟"

pattern = r"[^\w\s]"  # anything that is neither a word character nor whitespace

cleaned = re.sub(pattern, "", text)
print(cleaned)

کیا آپ یہاں
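One caveat with `\w`-based patterns: in Python's `re` module, `\w` does not match combining marks, so Urdu diacritics (zabar, zer, pesh) would be dropped along with the punctuation. A non-regex alternative, sketched here with the standard `unicodedata` module, removes only characters whose Unicode category is punctuation. The function name `strip_punctuation` is illustrative.

```python
import unicodedata

def strip_punctuation(text):
    """Remove every character whose Unicode general category starts with 'P'."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))

print(strip_punctuation("Hello, world!"))
# Hello world

# U+0628 (beh) followed by U+064E (fatha, a combining mark) and "!"
print(strip_punctuation("\u0628\u064e!") == "\u0628\u064e")
# True
```

The same function handles the Arabic question mark (U+061F) and full stop (U+06D4), since both are classified as punctuation.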


### Assignment 9: Extracting City Names

**Raw Text:** 
Extract names of Pakistani cities from a given text.

**Example:**
```
Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
```


In [133]:
text = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."

# Tied to this exact sentence structure; the four groups capture the city names
pattern = r"(\w+), (\w+), (\w+), and (\w+) are major cities of Pakistan\."

cities = re.findall(pattern, text)
print(cities)

[('Lahore', 'Karachi', 'Islamabad', 'Peshawar')]
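A regular expression cannot tell a city from any other capitalised word, so a pattern that works on arbitrary sentences needs a list of names to look for. The sketch below builds an alternation from such a list; `KNOWN_CITIES` is a short, hypothetical sample rather than a complete gazetteer, and `extract_cities` is an illustrative name.

```python
import re

# Hypothetical sample list; a real application would load a fuller one.
KNOWN_CITIES = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Quetta"]

# re.escape guards against names that contain regex metacharacters.
CITY_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, KNOWN_CITIES)) + r")\b")

def extract_cities(text):
    """Return every known city name found in text, in order of appearance."""
    return CITY_RE.findall(text)

sample = "Flights from Karachi to Lahore and Quetta resume on Monday."
print(extract_cities(sample))
# ['Karachi', 'Lahore', 'Quetta']
```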



### Assignment 10: Analyzing Vehicle Numbers

**Raw Text:** 
Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

**Example:**
```
Text: I saw a car with the number plate LEA-567 near the market.
```



In [139]:
text = "I saw a car with the number plate LEA-567 near the market."

pattern = r"\b[A-Z]{3}-\d{3}\b"  # three capital letters, a hyphen, three digits (ABC-123)

number_plates = re.findall(pattern, text)
print(number_plates)

['LEA-567']
