# Pandas Data Manipulation Practice
This notebook demonstrates basic string manipulation, column splitting, and filtering using the Pandas library in Python.

In [117]:
import pandas as pd

In [118]:
df = pd.DataFrame({
    "Name": ["John", "Jane", "Jack", "Jill", "Ali", "Ayesha", "Hassan", "Hina", "Hafsa", "Hassan"],
    "Age": [28, 22, 35, 29, 21, 24, 27, 26, 25, 28],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female", "Female", "Male"],
    "Address": [" iqbal town  Lahore", "minar-e-pakistan Lahore ", "main market Islamabad", "quadide azam market PeShawar",
    "ziaraat Quetta", "airport Faisalabad ", "faizabad   Rawalpindi", "bilal park Hyderbad   ",
    "khatme nabuwat chowk Gojra", "kotli chowk Shorkot"]
})
df.head()

|   | Name | Age | Gender | Address |
|---|------|-----|--------|---------|
| 0 | John | 28 | Male | iqbal town Lahore |
| 1 | Jane | 22 | Female | minar-e-pakistan Lahore |
| 2 | Jack | 35 | Male | main market Islamabad |
| 3 | Jill | 29 | Female | quadide azam market PeShawar |
| 4 | Ali | 21 | Male | ziaraat Quetta |


### 1. Data Cleaning: Remove Whitespace
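The `Address` column has inconsistent spacing: some values have extra spaces at the start or end, and some have several spaces between words (for example `"faizabad   Rawalpindi"`). `.str.strip()` removes only leading and trailing whitespace, so we chain `.str.replace(r"\s+", " ", regex=True)` to collapse each internal run to a single space. The rendered tables collapse repeated spaces, so the difference is not visible in the output.

In [119]:
df["Address"] = df["Address"].str.strip().str.replace(r"\s+", " ", regex=True)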
df.head()

|   | Name | Age | Gender | Address |
|---|------|-----|--------|---------|
| 0 | John | 28 | Male | iqbal town Lahore |
| 1 | Jane | 22 | Female | minar-e-pakistan Lahore |
| 2 | Jack | 35 | Male | main market Islamabad |
| 3 | Jill | 29 | Female | quadide azam market PeShawar |
| 4 | Ali | 21 | Male | ziaraat Quetta |
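
Because `.str.strip()` only trims the ends of each value, runs of spaces between words survive it. The sketch below uses a few values copied from the raw `Address` column; it flags values that contain two or more consecutive whitespace characters with `.str.contains()` and collapses those runs with `.str.replace(..., regex=True)`. The variable names are illustrative only.

```python
import pandas as pd

# A few raw values copied from the Address column above
raw = pd.Series([" iqbal town  Lahore", "faizabad   Rawalpindi", "kotli chowk Shorkot"])

# Trim leading/trailing whitespace only
stripped = raw.str.strip()

# Flag values that still contain two or more consecutive whitespace characters
has_runs = stripped.str.contains(r"\s{2,}", regex=True)
print(has_runs.tolist())  # [True, True, False]

# Collapse every remaining run of whitespace to a single space
cleaned = stripped.str.replace(r"\s+", " ", regex=True)
print(cleaned.tolist())
```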


### 2. Standardize Text Format
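To ensure consistency, we convert the addresses to **title case** with `.str.title()`, which capitalizes the first letter of each word and lowercases the rest (so `PeShawar` becomes `Peshawar`). It also treats a hyphen as a word boundary, which is why `minar-e-pakistan` becomes `Minar-E-Pakistan`.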

In [120]:
df["Address"] = df["Address"].str.title()
df.head()

|   | Name | Age | Gender | Address |
|---|------|-----|--------|---------|
| 0 | John | 28 | Male | Iqbal Town Lahore |
| 1 | Jane | 22 | Female | Minar-E-Pakistan Lahore |
| 2 | Jack | 35 | Male | Main Market Islamabad |
| 3 | Jill | 29 | Female | Quadide Azam Market Peshawar |
| 4 | Ali | 21 | Male | Ziaraat Quetta |
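
If hyphenated names should keep lowercase letters after the hyphen, one alternative (an assumption about the desired style, not what this notebook uses) is Python's `string.capwords`, which capitalizes only words separated by whitespace. A small comparison on two of the raw addresses:

```python
import string

import pandas as pd

addresses = pd.Series(["minar-e-pakistan Lahore", "quadide azam market PeShawar"])

# str.title(): uppercase after any non-letter (including hyphens), lowercase elsewhere
titled = addresses.str.title()

# string.capwords(): split on whitespace, capitalize() each word, rejoin with single spaces
capworded = addresses.map(string.capwords)

print(titled.tolist())     # ['Minar-E-Pakistan Lahore', 'Quadide Azam Market Peshawar']
print(capworded.tolist())  # ['Minar-e-pakistan Lahore', 'Quadide Azam Market Peshawar']
```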


### 3. Feature Engineering: Split Address
We split the `Address` column into two separate columns: `Town` and `City`.
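* We use `.str.rsplit(n=1, expand=True)` to split from the right at most once, so only the last word becomes `City` and everything before it, spaces included, stays in `Town`. This assumes every city name is a single word, which holds for this data.
* With no separator given, `rsplit` splits on any run of whitespace, so repeated spaces between the town and the city do not leave trailing spaces in `Town`. `expand=True` returns the parts as separate columns rather than one list per row.

In [121]:
df[["Town", "City"]] = df["Address"].str.rsplit(n=1, expand=True)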
df

|   | Name | Age | Gender | Address | Town | City |
|---|------|-----|--------|---------|------|------|
| 0 | John | 28 | Male | Iqbal Town Lahore | Iqbal Town | Lahore |
| 1 | Jane | 22 | Female | Minar-E-Pakistan Lahore | Minar-E-Pakistan | Lahore |
| 2 | Jack | 35 | Male | Main Market Islamabad | Main Market | Islamabad |
| 3 | Jill | 29 | Female | Quadide Azam Market Peshawar | Quadide Azam Market | Peshawar |
| 4 | Ali | 21 | Male | Ziaraat Quetta | Ziaraat | Quetta |
| 5 | Ayesha | 24 | Female | Airport Faisalabad | Airport | Faisalabad |
| 6 | Hassan | 27 | Male | Faizabad Rawalpindi | Faizabad | Rawalpindi |
| 7 | Hina | 26 | Female | Bilal Park Hyderbad | Bilal Park | Hyderbad |
| 8 | Hafsa | 25 | Female | Khatme Nabuwat Chowk Gojra | Khatme Nabuwat Chowk | Gojra |
| 9 | Hassan | 28 | Male | Kotli Chowk Shorkot | Kotli Chowk | Shorkot |
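
A minimal sketch comparing three ways to split off the last word, on sample values that include a double space (as the raw `Iqbal Town` address does before internal whitespace is collapsed). A literal `" "` separator leaves a trailing space on the left part, while the whitespace default and a regex passed to `.str.extract()` with named groups both produce clean `Town` values. The regex is an illustrative alternative and assumes the city is the final whitespace-free token.

```python
import pandas as pd

addresses = pd.Series(["Iqbal Town  Lahore", "Khatme Nabuwat Chowk Gojra", "Ziaraat Quetta"])

# A literal " " separator keeps the extra space on the left part
literal = addresses.str.rsplit(" ", n=1, expand=True)

# With pat=None (the default), any run of whitespace counts as one separator
by_whitespace = addresses.str.rsplit(n=1, expand=True)

# Regex alternative: the last whitespace-free token becomes City, the rest becomes Town
extracted = addresses.str.extract(r"^(?P<Town>.*\S)\s+(?P<City>\S+)$")

print(literal[0].tolist())  # ['Iqbal Town ', 'Khatme Nabuwat Chowk', 'Ziaraat']
print(by_whitespace[0].tolist())
print(extracted)
```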


### 4. Add Dummy Data
We add a `Phone No` column to the DataFrame to practice string slicing operations.

In [122]:
df["Phone No"] = [
    "0300-1234567", "0321-7654321", "0346-1234567", "0331-1234567", "0315-1234567",
    "0321-7654321", "0346-1234567", "0331-1234567", "0315-1234567", "0321-7654321",
]
df

|   | Name | Age | Gender | Address | Town | City | Phone No |
|---|------|-----|--------|---------|------|------|----------|
| 0 | John | 28 | Male | Iqbal Town Lahore | Iqbal Town | Lahore | 0300-1234567 |
| 1 | Jane | 22 | Female | Minar-E-Pakistan Lahore | Minar-E-Pakistan | Lahore | 0321-7654321 |
| 2 | Jack | 35 | Male | Main Market Islamabad | Main Market | Islamabad | 0346-1234567 |
| 3 | Jill | 29 | Female | Quadide Azam Market Peshawar | Quadide Azam Market | Peshawar | 0331-1234567 |
| 4 | Ali | 21 | Male | Ziaraat Quetta | Ziaraat | Quetta | 0315-1234567 |
| 5 | Ayesha | 24 | Female | Airport Faisalabad | Airport | Faisalabad | 0321-7654321 |
| 6 | Hassan | 27 | Male | Faizabad Rawalpindi | Faizabad | Rawalpindi | 0346-1234567 |
| 7 | Hina | 26 | Female | Bilal Park Hyderbad | Bilal Park | Hyderbad | 0331-1234567 |
| 8 | Hafsa | 25 | Female | Khatme Nabuwat Chowk Gojra | Khatme Nabuwat Chowk | Gojra | 0315-1234567 |
| 9 | Hassan | 28 | Male | Kotli Chowk Shorkot | Kotli Chowk | Shorkot | 0321-7654321 |
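
Slicing by position, as the next step does, assumes every value has the same layout. A quick check with `.str.fullmatch()` (available in pandas 1.1 and later) can flag rows that do not. The `\d{4}-\d{7}` pattern is an assumption inferred from the sample numbers, and the third value below is a deliberately malformed, hypothetical entry.

```python
import pandas as pd

# The last value is a hypothetical malformed entry (space instead of hyphen)
phones = pd.Series(["0300-1234567", "0321-7654321", "0346 1234567"])

# Assumed layout: 4-digit prefix, a hyphen, then 7 digits (matches the sample data)
is_valid = phones.str.fullmatch(r"\d{4}-\d{7}")
print(is_valid.tolist())  # [True, True, False]

# Rows that would break position-based slicing
print(phones[~is_valid].tolist())  # ['0346 1234567']
```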


### 5. Extract Network Code
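We want to extract only the network code, the four-digit mobile prefix before the hyphen (e.g. `0300`). Because the column holds strings, the slice `.str[:4]` keeps the first four characters, including the leading zero, and discards the rest. This overwrites `Phone No` in place, so the full numbers are no longer available afterwards.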

In [123]:
df["Phone No"] = df["Phone No"].str[:4]
df

|   | Name | Age | Gender | Address | Town | City | Phone No |
|---|------|-----|--------|---------|------|------|----------|
| 0 | John | 28 | Male | Iqbal Town Lahore | Iqbal Town | Lahore | 0300 |
| 1 | Jane | 22 | Female | Minar-E-Pakistan Lahore | Minar-E-Pakistan | Lahore | 0321 |
| 2 | Jack | 35 | Male | Main Market Islamabad | Main Market | Islamabad | 0346 |
| 3 | Jill | 29 | Female | Quadide Azam Market Peshawar | Quadide Azam Market | Peshawar | 0331 |
| 4 | Ali | 21 | Male | Ziaraat Quetta | Ziaraat | Quetta | 0315 |
| 5 | Ayesha | 24 | Female | Airport Faisalabad | Airport | Faisalabad | 0321 |
| 6 | Hassan | 27 | Male | Faizabad Rawalpindi | Faizabad | Rawalpindi | 0346 |
| 7 | Hina | 26 | Female | Bilal Park Hyderbad | Bilal Park | Hyderbad | 0331 |
| 8 | Hafsa | 25 | Female | Khatme Nabuwat Chowk Gojra | Khatme Nabuwat Chowk | Gojra | 0315 |
| 9 | Hassan | 28 | Male | Kotli Chowk Shorkot | Kotli Chowk | Shorkot | 0321 |
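
To keep the full number while still extracting the prefix, the slice can go into a separate column; `Network Code` is a hypothetical column name. The sketch also shows two equivalent ways to get the same prefix: `.str.slice(0, 4)`, and splitting on the hyphen with `.str.split("-").str[0]`, which does not depend on the prefix being exactly four characters long.

```python
import pandas as pd

phones = pd.DataFrame({"Phone No": ["0300-1234567", "0321-7654321", "0346-1234567"]})

# Write the prefix to a new (hypothetical) column so "Phone No" is left intact
phones["Network Code"] = phones["Phone No"].str[:4]

# Equivalent ways to get the same prefix
via_slice = phones["Phone No"].str.slice(0, 4)
via_split = phones["Phone No"].str.split("-").str[0]

print(phones)
```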


### 6. Filter Data
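Finally, we filter the DataFrame to display only the rows whose `Address` contains the string "Lahore". `.str.contains()` returns a boolean Series that is used as a row mask; by default the match is case-sensitive and the pattern is treated as a regular expression.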

In [124]:
lahore_addresses = df[df["Address"].str.contains("Lahore")]
lahore_addresses

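|   | Name | Age | Gender | Address | Town | City | Phone No |
|---|------|-----|--------|---------|------|------|----------|
| 0 | John | 28 | Male | Iqbal Town Lahore | Iqbal Town | Lahore | 0300 |
| 1 | Jane | 22 | Female | Minar-E-Pakistan Lahore | Minar-E-Pakistan | Lahore | 0321 |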
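
`.str.contains()` also accepts options that make the filter more forgiving: `case=False` ignores capitalization, `regex=False` treats the pattern as a literal string, and `na=False` treats missing values as non-matches instead of propagating `NaN` into the mask. The sketch below uses a small, hypothetical `sample` frame (named so it does not overwrite `df`); it also shows that, once a `City` column exists, comparing it directly avoids matching "Lahore" that appears elsewhere in an address.

```python
import pandas as pd

# Hypothetical data: mixed case and one missing address
sample = pd.DataFrame({
    "Address": ["Iqbal Town Lahore", "main market lahore", None, "Ziaraat Quetta"],
    "City": ["Lahore", "lahore", "Lahore", "Quetta"],
})

# Case-insensitive literal match; the missing address counts as a non-match
mask = sample["Address"].str.contains("lahore", case=False, regex=False, na=False)
print(sample[mask])

# Exact comparison on the City column after normalising case
city_mask = sample["City"].str.lower().eq("lahore")
print(sample[city_mask])
```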
