---

## Advanced Regular Expression Assignments

### Assignment 1: Extracting Phone Numbers

**Raw Text:**
Extract all valid Pakistani phone numbers from a given text.

**Example:**
```
Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
```



In [3]:
import pandas as pd
import numpy as np
import re

In [130]:
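Text = """Please contact me at 0301-1234567 or 042-35678901 for further details."""
# Mobile numbers here use a 4-digit prefix (0301) and landlines a 3-digit area code (042),
# so allow 2-3 digits after the leading 0 and 7-8 digits after the hyphen.
pattern = r"\b0\d{2,3}-\d{7,8}\b"
phone_numbers = re.findall(pattern, Text)
phone_numbers

['0301-1234567', '042-35678901']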
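
The example text only shows the local `0XXX-XXXXXXX` style. As a rough sketch, the same idea can be extended to numbers written with the `+92` country code and normalised back to the local form. The accepted shapes below are assumptions made for illustration, not an official numbering plan, and `PHONE_RE` and `extract_phone_numbers` are hypothetical names.

```python
import re

# Assumed formats (illustrative only):
#   local:         0301-1234567, 042-35678901
#   international: +92-301-1234567, +92 42 35678901
PHONE_RE = re.compile(
    r"(?<!\w)(?:\+92[-\s]?|0)(\d{2,3})[-\s]?(\d{7,8})(?!\d)"
)

def extract_phone_numbers(text):
    """Return every match rewritten in the local 0XX(X)-XXXXXXX(X) style."""
    return [f"0{m.group(1)}-{m.group(2)}" for m in PHONE_RE.finditer(text)]

sample = "Call 0301-1234567, 042-35678901 or +92-300-7654321."
print(extract_phone_numbers(sample))
# ['0301-1234567', '042-35678901', '0300-7654321']
```

The two capture groups keep the prefix and the subscriber part separate, which is what makes the normalisation step a one-liner.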

### Assignment 2: Validating Email Addresses

**Raw Text:**
Validate email addresses according to Pakistani domain extensions (.pk).

**Example:**
```
Text: Contact us at info@example.com or support@domain.pk for assistance.
```



In [293]:
text = """Contact us at info@example.com or support@domain.pk for assistance."""
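# Escape the dot before "pk" (a bare "." matches any character) and allow
# dots, plus signs and hyphens in the local part plus optional sub-domains.
patterns = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.pk\b"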
email = re.findall(patterns, text)
email

['support@domain.pk']
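
`re.findall` extracts addresses from running text. To validate a single candidate string instead, `re.fullmatch` is a closer fit because it requires the whole string to match. The sketch below assumes a deliberately simplified address shape (real address rules are far more permissive), and `is_pk_email` is a hypothetical helper name.

```python
import re

# Simplified shape: local part, "@", one or more domain labels, then ".pk".
PK_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.pk")

def is_pk_email(address):
    """Return True when the whole string is an address ending in .pk."""
    return PK_EMAIL_RE.fullmatch(address) is not None

for candidate in ["support@domain.pk", "info@example.com", "sales@shop.com.pk"]:
    print(candidate, is_pk_email(candidate))
# support@domain.pk True
# info@example.com False
# sales@shop.com.pk True
```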

### Assignment 3: Extracting CNIC Numbers

**Raw Text:**
Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

**Example:**
```
Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
```


In [299]:
Text = """My CNIC is 12345-6789012-3 and another one is 34567-8901234-5."""
# Raw string avoids invalid-escape warnings; \b keeps longer digit runs from matching.
pattern_3 = r"\b\d{5}-\d{7}-\d\b"
cnic = re.findall(pattern_3, Text)
cnic

['12345-6789012-3', '34567-8901234-5']


### Assignment 4: Identifying Urdu Words

**Raw Text:**
Identify and extract Urdu words from a mixed English-Urdu text.

**Example:**
```
Text: یہ sentence میں کچھ English words بھی ہیں۔
```



In [323]:
Text = """یہ sentence میں کچھ English words بھی ہیں"""
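# Match runs of characters from the Arabic Unicode block (U+0600-U+06FF), which
# covers Urdu letters. Excluding only Latin letters would also pick up digits and
# punctuation. Note that this block includes Urdu punctuation such as "۔" as well.
pattern_4 = r"[\u0600-\u06FF]+"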
urdu = re.findall(pattern_4, Text)
urdu

['یہ', 'میں', 'کچھ', 'بھی', 'ہیں']

### Assignment 5: Finding Dates

**Raw Text:**
Find and extract dates in the format DD-MM-YYYY from a given text.

**Example:**
```
Text: The event will take place on 15-08-2023 and 23-09-2023.
```



In [326]:
Text = """The event will take place on 15-08-2023 and 23-09-2023."""
# DD-MM-YYYY needs exactly four year digits; {2,4} would also accept 15-08-23.
pattern_5 = r"\b\d{2}-\d{2}-\d{4}\b"
date = re.findall(pattern_5, Text)
date

['15-08-2023', '23-09-2023']
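
A pattern such as `\d{2}-\d{2}-\d{4}` only checks the shape of the string, so `31-02-2023` or `45-13-2023` would pass as well. One way to filter those out, sketched here with the standard library's `datetime.strptime`, is to parse every candidate and keep only the ones that parse. `extract_valid_dates` is a hypothetical helper name.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")

def extract_valid_dates(text):
    """Return DD-MM-YYYY strings that are also real calendar dates."""
    valid = []
    for candidate in DATE_RE.findall(text):
        try:
            datetime.strptime(candidate, "%d-%m-%Y")
        except ValueError:
            continue  # right shape, impossible date (e.g. 31-02-2023)
        valid.append(candidate)
    return valid

print(extract_valid_dates("Held on 15-08-2023, not on 31-02-2023 or 45-13-2023."))
# ['15-08-2023']
```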

### Assignment 6: Extracting URLs

**Raw Text:**
Extract all URLs from a text that belong to Pakistani domains.

**Example:**
```
Text: Visit http://www.example.pk or https://website.com.pk for more information.
```



In [338]:
Text = """Visit http://www.example.pk or https://website.com.pk for more information."""
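# "https?" accepts both http and https, "://" is matched literally, and the
# host must end in ".pk" (so .com.pk also qualifies).
pattern_6 = r"https?://[\w.-]+\.pk\b"
url = re.findall(pattern_6, Text)
url

['http://www.example.pk', 'https://website.com.pk']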
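
A regex on its own has a hard time telling a `.pk` host apart from a `.pk` that merely appears in the path. A possible alternative, sketched below, is to grab anything that looks like a URL and then let `urllib.parse.urlparse` isolate the host name. The character set used to end a URL and the name `pk_urls` are assumptions made for this example.

```python
import re
from urllib.parse import urlparse

# Loose URL shape: scheme, then everything up to whitespace or a closing delimiter.
URL_RE = re.compile(r"https?://[^\s,;)]+")

def pk_urls(text):
    """Return URLs whose host name ends in .pk."""
    urls = []
    for raw in URL_RE.findall(text):
        url = raw.rstrip(".")  # drop a sentence-ending full stop
        host = urlparse(url).hostname or ""
        if host.endswith(".pk"):
            urls.append(url)
    return urls

print(pk_urls("Visit http://www.example.pk or https://website.com.pk for more information."))
# ['http://www.example.pk', 'https://website.com.pk']
```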

### Assignment 7: Analyzing Currency

**Raw Text:**
Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

**Example:**
```
Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
```



In [426]:
Text = """The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."""
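# Only capture digits that follow a currency marker (PKR, Rs or Rs.), so that
# unrelated numbers in the text are ignored; thousands separators are allowed.
pattern_7 = r"(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*)"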
amount = re.findall(pattern_7, Text)
amount

['1500', '2500']
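
`re.findall` returns the amounts as strings, so any analysis needs a conversion step first. The sketch below turns the matches into integers and computes a total and a maximum. The `AMOUNT_RE` pattern and the helper name `rupee_amounts` are assumptions for illustration.

```python
import re

# Digits that follow "PKR", "Rs" or "Rs.", with optional thousands separators.
AMOUNT_RE = re.compile(r"(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*)")

def rupee_amounts(text):
    """Return the amounts that follow a rupee marker, as integers."""
    return [int(raw.replace(",", "")) for raw in AMOUNT_RE.findall(text)]

sample = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."
amounts = rupee_amounts(sample)
print(amounts, sum(amounts), max(amounts))
# [1500, 2500] 4000 2500
```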

### Assignment 8: Removing Punctuation

**Raw Text:**
Remove all punctuation marks from a text while preserving Urdu characters.

**Example:**
```
Text: کیا! آپ, یہاں؟
```



In [450]:
Text = """کیا! آپ, یہاں؟"""
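# The task is to remove punctuation, so substitute it away instead of listing the
# words: delete every character that is neither a word character nor whitespace.
# In Python 3, \w covers Urdu letters as well as Latin ones.
pattern_8 = r"[^\w\s]"
rem_Punc = re.sub(pattern_8, "", Text)
rem_Punc

'کیا آپ یہاں'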
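
One caveat with `\w`-based patterns: as far as I know, Python's `re` does not count combining diacritics (zabar, zer, pesh) as word characters, so vowel-marked Urdu text would lose its marks or be split at them. A possible alternative, sketched here, filters on the Unicode category of each character and removes only the punctuation categories. `strip_punctuation` is a hypothetical helper name.

```python
import unicodedata

def strip_punctuation(text):
    """Drop characters whose Unicode general category is punctuation (P*)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )

print(strip_punctuation("کیا! آپ, یہاں؟"))
```

Letters, digits, whitespace and combining marks all fall outside the `P*` categories, so they pass through unchanged.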

### Assignment 9: Extracting City Names

**Raw Text:**
Extract names of Pakistani cities from a given text.

**Example:**
```
Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
```


In [471]:
Text = """Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."""
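# Tied to the "A, B, C, and D" shape of this sentence; it captures whatever words
# sit in those positions and does not recognise city names as such.
pattern_9 = r"(\w+),\s(\w+),\s(\w+),\sand\s(\w+)"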
cities = re.findall(pattern_9, Text)
cities

[('Lahore', 'Karachi', 'Islamabad', 'Peshawar')]
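
A regex cannot know which words are cities, so a pattern that depends on sentence structure breaks as soon as the wording changes. A common workaround, sketched below, is to build an alternation from a list of known names. The `KNOWN_CITIES` list is a short, hypothetical sample and would need to be extended for real use.

```python
import re

# Hypothetical, deliberately short list of city names.
KNOWN_CITIES = ["Lahore", "Karachi", "Islamabad", "Peshawar", "Quetta", "Multan"]

# re.escape guards against names containing regex metacharacters.
CITY_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, KNOWN_CITIES)) + r")\b")

def extract_cities(text):
    """Return every known city name found in the text, in order."""
    return CITY_RE.findall(text)

print(extract_cities("Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."))
# ['Lahore', 'Karachi', 'Islamabad', 'Peshawar']
```

This returns a flat list of names instead of a single tuple, and it works regardless of how the sentence is punctuated.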


### Assignment 10: Analyzing Vehicle Numbers

**Raw Text:**
Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

**Example:**
```
Text: I saw a car with the number plate LEA-567 near the market.
```



In [480]:
Text = """I saw a car with the number plate LEA-567 near the market."""
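# Generalise beyond the literal "LEA" prefix: 2-3 capital letters, a hyphen,
# then 3-4 digits (the ABC-123 shape given in the task).
pattern_10 = r"\b[A-Z]{2,3}-\d{3,4}\b"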
reg_num = re.findall(pattern_10, Text)
reg_num

['LEA-567']