---

## Advanced Regular Expression Assignments

### Assignment 1: Extracting Phone Numbers

**Raw Text:**
Extract all valid Pakistani phone numbers from a given text.

**Example:**
```
Text: Please contact me at 0301-1234567 or 042-35678901 for further details.
```



In [161]:
import re

text = """
Please contact me at 0301-1234567 or 042-35678901 for further details.
Please contact me at 0300-1246810 or 021-35678901 for further details.
Please contact me at 0333-1234567 or 022-35678901 for further details.
"""

# Raw string, so \d is passed to the regex engine unchanged.
pattern = r"\d{4}-\d{7}|\d{3}-\d{8}"

Pak_numbers = re.findall(pattern, text)

Pak_numbers

import pandas as pd

df = pd.DataFrame(Pak_numbers, columns=["Mobile Numbers"])
df

  Mobile Numbers
0   0301-1234567
1   042-35678901
2   0300-1246810
3   021-35678901
4   0333-1234567
5   022-35678901
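
The pattern above accepts any 4+7 or 3+8 digit grouping, including one embedded in a longer digit run. A stricter variant might look like the sketch below. It assumes that mobile numbers start with `03` and that landline area codes are `0` plus two digits, which simplifies the real numbering plan. The names `MOBILE`, `LANDLINE`, and `classify_numbers` are illustrative only.

```python
import re

# Assumption: mobiles look like 03XX-XXXXXXX, landlines like 0XX-XXXXXXXX.
# The lookarounds reject candidates that sit inside a longer run of digits.
MOBILE = re.compile(r"(?<!\d)03\d{2}-\d{7}(?!\d)")
LANDLINE = re.compile(r"(?<!\d)0\d{2}-\d{8}(?!\d)")

def classify_numbers(text):
    """Return (mobiles, landlines) found in text as two lists of strings."""
    return MOBILE.findall(text), LANDLINE.findall(text)

sample = "Please contact me at 0301-1234567 or 042-35678901 for further details."
mobiles, landlines = classify_numbers(sample)
print(mobiles)    # ['0301-1234567']
print(landlines)  # ['042-35678901']
```

Splitting the two formats also makes it possible to label the DataFrame columns accurately, since `042-35678901` is a landline rather than a mobile number.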


### Assignment 2: Validating Email Addresses

**Raw Text:**
Validate email addresses according to Pakistani domain extensions (.pk).

**Example:**
```
Text: Contact us at info@example.com or support@domain.pk for assistance.
```



In [162]:
import re

contact = """
Contact us at info@example.com or support@domain.pk for assistance.
For help email at info@example.com or help@domain.pk.
For query email at info@example.com or query@domain.pk.
"""
    
# [A-z] also matches [ \ ] ^ _ and `, so spell out both letter ranges.
pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.pk"

emails = re.findall(pattern, contact)

emails

import pandas as pd

df = pd.DataFrame(emails, columns=["Email_Add"])
df

          Email_Add
0 support@domain.pk
1    help@domain.pk
2   query@domain.pk
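
The cell above extracts `.pk` addresses from running text. Validating a single address is a slightly different task, usually done with `re.fullmatch` so that the whole string must conform. The sketch below assumes a simplified address grammar and is not a full RFC 5322 validator. `PK_EMAIL` and `is_pk_email` are hypothetical names.

```python
import re

# Assumption: local part of common characters, then one or more
# dot-terminated domain labels, then the literal top-level domain "pk".
PK_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+pk")

def is_pk_email(address):
    """Return True if the entire string looks like an address under .pk."""
    return PK_EMAIL.fullmatch(address) is not None

for candidate in ["support@domain.pk", "info@example.com", "help@mail.domain.com.pk"]:
    print(candidate, is_pk_email(candidate))
# support@domain.pk True
# info@example.com False
# help@mail.domain.com.pk True
```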


### Assignment 3: Extracting CNIC Numbers

**Raw Text:**
Extract all Pakistani CNIC (Computerized National Identity Card) numbers from a given text.

**Example:**
```
Text: My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
```


In [164]:
import re

Info = """
My CNIC is 12345-6789012-3 and another one is 34567-8901234-5.
Ahmed CNIC no. is 42201-5086613-3 and Nasir CNIC is 42101-1667788-5
"""

pattern = r"\d{5}-\d{7}-\d"

CNICs = re.findall(pattern, Info)

CNICs

import pandas as pd

df = pd.DataFrame(CNICs, columns=["CNIC No."])
df

         CNIC No.
0 12345-6789012-3
1 34567-8901234-5
2 42201-5086613-3
3 42101-1667788-5
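
Because the pattern has no boundaries, it would also match the first 15 characters of a malformed number such as `42201-5086613-34`. One possible guard, sketched here with the hypothetical name `CNIC`, is to add lookarounds that reject neighbouring digits or hyphens.

```python
import re

# Lookarounds reject matches embedded in longer digit or hyphen runs.
CNIC = re.compile(r"(?<![\d-])(\d{5})-(\d{7})-(\d)(?![\d-])")

text = "Valid: 42201-5086613-3. Too long: 42201-5086613-34."
found = [m.group(0) for m in CNIC.finditer(text)]
print(found)  # ['42201-5086613-3']

# The three groups give the individual parts of the number.
print(CNIC.search(text).groups())  # ('42201', '5086613', '3')
```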



### Assignment 4: Identifying Urdu Words

**Raw Text:**
Identify and extract Urdu words from a mixed English-Urdu text.

**Example:**
```
Text: یہ sentence میں کچھ English words بھی ہیں۔
```



In [44]:
import re

raw_text ="یہ sentence میں کچھ English words بھی ہیں۔"

pattern = r'[\u0600-\u06FF]+'

urdu_words = re.findall(pattern, raw_text)

urdu_words

import pandas as pd

df = pd.DataFrame(urdu_words, columns=["Urdu Alfaaz"])
df

  Urdu Alfaaz
0          یہ
1         میں
2         کچھ
3         بھی
4        ہیں۔
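
The range `\u0600-\u06FF` covers the whole Arabic block, including punctuation, which is why the last match keeps the Urdu full stop (U+06D4). A possible refinement is to exclude the common punctuation code points with a negative lookahead. The names `URDU_PUNCT` and `URDU_WORD` are illustrative, and the punctuation list is an assumption that may need extending.

```python
import re

# Arabic-block punctuation: comma, semicolon, question mark, full stop.
URDU_PUNCT = "\u060C\u061B\u061F\u06D4"

# Match runs of Arabic-block characters, skipping anything in URDU_PUNCT.
URDU_WORD = re.compile(rf"(?:(?![{URDU_PUNCT}])[\u0600-\u06FF])+")

raw_text = "یہ sentence میں کچھ English words بھی ہیں۔"
urdu_words = URDU_WORD.findall(raw_text)
print(urdu_words)
```

With this pattern the trailing full stop is no longer part of the last word.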


### Assignment 5: Finding Dates

**Raw Text:**
Find and extract dates in the format DD-MM-YYYY from a given text.

**Example:**
```
Text: The event will take place on 15-08-2023 and 23-09-2023.
```



In [165]:
import re

Statement = """
The event will take place on 15-08-2023 and 23-09-2023.
My date of birth is 04/04/1984.
Test is on 01/09/23.
"""

# Accept "-" or "/" as the separator, with a two- to four-digit year.
pattern = r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}"

Dates = re.findall(pattern, Statement)

Dates

import pandas as pd

df = pd.DataFrame(Dates, columns=["Dates"])
df

       Dates
0 15-08-2023
1 23-09-2023
2 04/04/1984
3   01/09/23
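
A regular expression only checks the shape of a date, so a string like `31-02-2023` would still match. A minimal sketch of one way to filter such values, restricted to the DD-MM-YYYY format named in the assignment, is to pass each candidate through `datetime.strptime`. The helper name `extract_valid_dates` is hypothetical.

```python
import re
from datetime import datetime

DATE = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")

def extract_valid_dates(text):
    """Return DD-MM-YYYY strings in text that are real calendar dates."""
    valid = []
    for candidate in DATE.findall(text):
        try:
            datetime.strptime(candidate, "%d-%m-%Y")
        except ValueError:
            continue  # right shape, but not an actual date
        valid.append(candidate)
    return valid

print(extract_valid_dates("Held on 15-08-2023, not on 31-02-2023 or 45-13-2023."))
# ['15-08-2023']
```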


### Assignment 6: Extracting URLs

**Raw Text:**
Extract all URLs from a text that belong to Pakistani domains.

**Example:**
```
Text: Visit http://www.example.pk or https://website.com.pk for more information.
```



In [37]:
import re

example = """
Visit http://www.example.pk or https://website.com.pk for more information.
Contact us at http://www.pak.pk or http://www.help.pk for assistance.
For help email at http://www.123.pk.
"""
    
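# Plain-http URLs ending in .pk ("http?" would make the "p" optional, not add "https").
pattern1 = r'http://[A-Za-z0-9.-]+\.pk'

URL1 = re.findall(pattern1, example)

# https URLs under .com.pk
pattern2 = r'https://[A-Za-z0-9.-]+\.com\.pk'

URL2 = re.findall(pattern2, example)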

URLs = URL1 + URL2

import pandas as pd

df = pd.DataFrame(URLs, columns=["Website"])
df

                 Website
0  http://www.example.pk
1      http://www.pak.pk
2     http://www.help.pk
3      http://www.123.pk
4 https://website.com.pk
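
Both schemes can also be handled by one pattern, since `https?` makes only the `s` optional. The sketch below assumes hostnames consist of letters, digits, dots, and hyphens, and it ignores paths and query strings. `PK_URL` is an illustrative name.

```python
import re

# One or more dot-terminated labels followed by the top-level domain "pk".
# The lookahead rejects hosts where "pk" is only an inner label (www.pk.example.com).
PK_URL = re.compile(r"https?://(?:[A-Za-z0-9-]+\.)+pk\b(?!\.[A-Za-z0-9])")

example = "Visit http://www.example.pk or https://website.com.pk, not https://example.com."
urls = PK_URL.findall(example)
print(urls)
# ['http://www.example.pk', 'https://website.com.pk']
```

A single pass also returns the URLs in the order they appear in the text, rather than grouped by scheme.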


### Assignment 7: Analyzing Currency

**Raw Text:**
Extract and analyze currency amounts in Pakistani Rupees (PKR) from a given text.

**Example:**
```
Text: The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
```



In [40]:
import re

analysis = """
The product costs PKR 1500, while the deluxe version is priced at Rs. 2500.
The product costs PKR 5500, while the deluxe version is priced at Rs. 8000.
The product costs PKR 1000, while the deluxe version is priced at Rs. 1500.
"""

# \d+ matches any number of digits; no upper bound is needed.
pattern1 = r'PKR \d+'

Product_cost = re.findall(pattern1, analysis)

Product_cost

pattern2 = r'Rs\. \d+'

Deluxe_version = re.findall(pattern2, analysis)

Deluxe_version

import pandas as pd

df1 = pd.DataFrame(Product_cost, columns= ["Product_cost"])
print(df1)

df2 = pd.DataFrame(Deluxe_version, columns= ["Deluxe_version"])
print(df2)


  Product_cost
0     PKR 1500
1     PKR 5500
2     PKR 1000
  Deluxe_version
0       Rs. 2500
1       Rs. 8000
2       Rs. 1500
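
The cell above extracts the amounts as strings. To analyze them, the digits need to be captured and converted to numbers. A minimal sketch follows, assuming whole-rupee amounts with optional thousands separators and either the `PKR` or `Rs.` prefix. `AMOUNT` and `extract_amounts` are hypothetical names.

```python
import re

# Assumption: whole numbers, optionally written with commas (e.g. 1,500).
AMOUNT = re.compile(r"\b(?:PKR|Rs\.?)\s*(\d+(?:,\d{3})*)")

def extract_amounts(text):
    """Return the rupee amounts in text as a list of integers."""
    return [int(raw.replace(",", "")) for raw in AMOUNT.findall(text)]

line = "The product costs PKR 1500, while the deluxe version is priced at Rs. 2500."
amounts = extract_amounts(line)
print(amounts)                     # [1500, 2500]
print(sum(amounts), max(amounts))  # 4000 2500
```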


### Assignment 8: Removing Punctuation

**Raw Text:**
Remove all punctuation marks from a text while preserving Urdu characters.

**Example:**
```
Text: کیا! آپ, یہاں؟
```



In [65]:
import re

urdu = "کیا! آپ, یہاں؟"

pattern = r'[:,?!؟]'

clear_urdu = re.sub(pattern, '', urdu)

clear_urdu

'کیا آپ یہاں'
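
The character class above lists five punctuation marks explicitly. A more general alternative, sketched here under the name `strip_punctuation`, removes everything that is neither a word character nor whitespace. In Python 3, `\w` is Unicode-aware for `str` patterns, so Urdu letters are kept. As far as I know, combining diacritics are not treated as word characters, so this version would strip them as well.

```python
import re

def strip_punctuation(text):
    """Remove every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", text)

urdu = "کیا! آپ, یہاں؟"
print(strip_punctuation(urdu))
print(strip_punctuation("Hello, world! (test)"))  # Hello world test
```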

### Assignment 9: Extracting City Names

**Raw Text:**
Extract names of Pakistani cities from a given text.

**Example:**
```
Text: Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan.
```


In [63]:
import re

Text = "Lahore, Karachi, Islamabad, and Peshawar are major cities of Pakistan."

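# A regex cannot infer which words are cities, so match against an explicit list.
cities = ["Lahore", "Karachi", "Islamabad", "Peshawar"]
pattern = r'\b(?:' + '|'.join(cities) + r')\b'

Pak_cities = re.findall(pattern, Text)

Pak_cities


['Lahore', 'Karachi', 'Islamabad', 'Peshawar']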


### Assignment 10: Analyzing Vehicle Numbers

**Raw Text:**
Identify and extract Pakistani vehicle registration numbers (e.g., ABC-123) from a text.

**Example:**
```
Text: I saw a car with the number plate LEA-567 near the market.
```



In [61]:
import re

Text = """
I saw a car with the number plate LEA-567 near the market.
A car with the number plate AAB-300.
Number plate ABA-350.
"""

# Word boundaries keep the pattern from matching inside longer tokens.
pattern = r'\b[A-Z]{3}-\d{3}\b'

Num_plate = re.findall(pattern, Text)

Num_plate

import pandas as pd

df = pd.DataFrame(Num_plate, columns=["Number Plate"])
df

  Number Plate
0      LEA-567
1      AAB-300
2      ABA-350
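
To analyze the plates rather than only list them, named groups can split each match into its letter series and number. This is a sketch that assumes the three-letter, three-digit format from the assignment. The group names `series` and `number` and the name `PLATE` are illustrative.

```python
import re

PLATE = re.compile(r"\b(?P<series>[A-Z]{3})-(?P<number>\d{3})\b")

text = "I saw a car with the number plate LEA-567 near the market."
plates = [m.groupdict() for m in PLATE.finditer(text)]
print(plates)  # [{'series': 'LEA', 'number': '567'}]
```

A list of dictionaries like this can be passed directly to `pd.DataFrame` to get one column per group.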
