# 🧹 Data Cleaning & Transformation Project using Pandas

## 🎯 Tasks

**1. Capitalize all names**  
Use `.str.title()` to fix inconsistent casing in the `Name` column.

**2. Standardize gender values**  
Use `.replace()` or `.map()` to convert `"f"`, `"F"`, `"female"` → `"Female"` and `"m"`, `"M"`, `"male"` → `"Male"`.

**3. Normalize city names**  
Make all `City` values lowercase, then capitalize the first letter of each word (e.g., `"new york"` → `"New York"`).

**4. Fill missing Age values**  
Use the average age to replace missing values in the `Age` column.

**5. Convert Join Date to datetime**  
Use `pd.to_datetime()` to convert all date strings in the `Join Date` column to proper datetime format.

**6. Fill missing Income values**  
Use the median of the `Income` column to replace missing entries.

**7. Create a new column "Age Group"**  
Using `.apply()` and a `lambda`, create an `Age Group` column:
- If age < 18 → `"Minor"`  
- If 18 ≤ age < 60 → `"Adult"`  
- If age ≥ 60 → `"Old"`

**8. Display the final cleaned DataFrame**  
Show the complete, cleaned dataset with all changes applied.

---



In [70]:
import pandas as pd
import numpy as np

In [71]:
# Setup: build the sample DataFrame with messy values
data = {
    "Name": ["alice", "BOB", "Charlie", "diana", "Eve"],
    "Age": [24, 30, np.nan, 66, 17],
    "Gender": ["f", "M", "male", "Female", "F"],
    "City": ["new york", "los angeles", "Chicago", "mumbai", "DELHI"],
    "Income": [55000.0, 68000.5, 72000, np.nan, 42000],
    "Join Date": ["12/05/2021", "05-07-2020", "2021.08.01", "01-01-2020", "2019/03/25"]
}
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age,Gender,City,Income,Join Date
0,alice,24.0,f,new york,55000.0,12/05/2021
1,BOB,30.0,M,los angeles,68000.5,05-07-2020
2,Charlie,,male,Chicago,72000.0,2021.08.01
3,diana,66.0,Female,mumbai,,01-01-2020
4,Eve,17.0,F,DELHI,42000.0,2019/03/25


In [72]:
# Task 1: capitalize all names
df['Name'] = df['Name'].str.title()
df

Unnamed: 0,Name,Age,Gender,City,Income,Join Date
0,Alice,24.0,f,new york,55000.0,12/05/2021
1,Bob,30.0,M,los angeles,68000.5,05-07-2020
2,Charlie,,male,Chicago,72000.0,2021.08.01
3,Diana,66.0,Female,mumbai,,01-01-2020
4,Eve,17.0,F,DELHI,42000.0,2019/03/25


In [73]:
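# Task 2: standardize gender values (lowercase first so one mapping covers every spelling, including "m")
df['Gender'] = df['Gender'].str.lower().map({'f': 'Female', 'female': 'Female', 'm': 'Male', 'male': 'Male'})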
df

Unnamed: 0,Name,Age,Gender,City,Income,Join Date
0,Alice,24.0,Female,new york,55000.0,12/05/2021
1,Bob,30.0,Male,los angeles,68000.5,05-07-2020
2,Charlie,,Male,Chicago,72000.0,2021.08.01
3,Diana,66.0,Female,mumbai,,01-01-2020
4,Eve,17.0,Female,DELHI,42000.0,2019/03/25
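
A mapping-based cleanup silently produces `NaN` for any label the mapping does not cover, so it can help to list those labels before overwriting the column. The sketch below is a minimal illustration on a stand-alone Series; the `"unknown"` label and the names `raw`, `gender_map`, and `unmapped` are assumptions for the example, not part of the dataset above.

```python
import pandas as pd

# Hypothetical raw labels; "unknown" is deliberately absent from the mapping.
raw = pd.Series(["f", "M", "male", "Female", "F", "unknown"])

gender_map = {"f": "Female", "female": "Female", "m": "Male", "male": "Male"}

# Strip whitespace and lowercase first so a small mapping covers every spelling.
normalized = raw.str.strip().str.lower().map(gender_map)

# .map() returns NaN for keys missing from the mapping, which exposes the gaps.
unmapped = raw[normalized.isna()].unique().tolist()
print(unmapped)
```

If `unmapped` is not empty, the mapping can be extended or the rows reviewed before the column is replaced.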


In [74]:
# Task 3: normalize city names
df['City'] = df['City'].str.title()
df

Unnamed: 0,Name,Age,Gender,City,Income,Join Date
0,Alice,24.0,Female,New York,55000.0,12/05/2021
1,Bob,30.0,Male,Los Angeles,68000.5,05-07-2020
2,Charlie,,Male,Chicago,72000.0,2021.08.01
3,Diana,66.0,Female,Mumbai,,01-01-2020
4,Eve,17.0,Female,Delhi,42000.0,2019/03/25
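
`.str.title()` works for the cities in this dataset, but it uppercases the letter after any non-letter character, including an apostrophe. The sketch below compares it with `string.capwords` from the standard library, which only splits on whitespace. The value `"st. john's"` is a hypothetical edge case added for illustration.

```python
import string

import pandas as pd

# The last value is a hypothetical edge case containing an apostrophe.
cities = pd.Series(["new york", "DELHI", "st. john's"])

# Title-casing restarts capitalization after every non-letter character.
titled = cities.str.title()

# capwords lowercases each whitespace-separated word and capitalizes its first character.
capworded = cities.apply(string.capwords)

print(titled.tolist())
print(capworded.tolist())
```

For plain city names both approaches agree; they differ only when a value contains apostrophes or similar punctuation.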


In [75]:
# Task 4: fill missing ages with the mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
df

Unnamed: 0,Name,Age,Gender,City,Income,Join Date
0,Alice,24.0,Female,New York,55000.0,12/05/2021
1,Bob,30.0,Male,Los Angeles,68000.5,05-07-2020
2,Charlie,34.25,Male,Chicago,72000.0,2021.08.01
3,Diana,66.0,Female,Mumbai,,01-01-2020
4,Eve,17.0,Female,Delhi,42000.0,2019/03/25
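
The filled value 34.25 comes from the four known ages, because `mean()` skips `NaN` by default: (24 + 30 + 66 + 17) / 4 = 34.25. The sketch below reproduces that arithmetic on a stand-alone Series and, for comparison only, shows the median, which is less affected by the single high age of 66.

```python
import numpy as np
import pandas as pd

age = pd.Series([24, 30, np.nan, 66, 17])

# NaN is ignored, so the mean is computed over the four known values.
mean_age = age.mean()

# Shown for comparison; the task itself asks for the mean.
median_age = age.median()

filled = age.fillna(mean_age)
print(mean_age, median_age)
```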


In [76]:
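# Task 5: convert Join Date to datetime
# format='mixed' (pandas 2.0+) infers the format per value; ambiguous dates such as "12/05/2021" are read month-first
df['Join Date'] = pd.to_datetime(df['Join Date'], format='mixed', errors='coerce')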
df


Unnamed: 0,Name,Age,Gender,City,Income,Join Date
0,Alice,24.0,Female,New York,55000.0,2021-12-05
1,Bob,30.0,Male,Los Angeles,68000.5,2020-05-07
2,Charlie,34.25,Male,Chicago,72000.0,2021-08-01
3,Diana,66.0,Female,Mumbai,,2020-01-01
4,Eve,17.0,Female,Delhi,42000.0,2019-03-25
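
Strings such as `"12/05/2021"` are ambiguous: by default pandas reads them month-first, which is why the first row becomes 2021-12-05. If the source data were actually day-first (an assumption to verify against the data's origin), the same string would need `dayfirst=True` or an explicit format. A minimal sketch of the difference:

```python
import pandas as pd

s = "12/05/2021"

# Default parsing treats the first number as the month.
month_first = pd.to_datetime(s)

# dayfirst=True treats the first number as the day.
day_first = pd.to_datetime(s, dayfirst=True)

# An explicit format removes the guesswork entirely.
explicit = pd.to_datetime(s, format="%d/%m/%Y")

print(month_first.date(), day_first.date(), explicit.date())
```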


In [77]:
# Task 6: fill missing incomes with the median
df['Income'] = df['Income'].fillna(df['Income'].median())
df

Unnamed: 0,Name,Age,Gender,City,Income,Join Date
0,Alice,24.0,Female,New York,55000.0,2021-12-05
1,Bob,30.0,Male,Los Angeles,68000.5,2020-05-07
2,Charlie,34.25,Male,Chicago,72000.0,2021-08-01
3,Diana,66.0,Female,Mumbai,61500.25,2020-01-01
4,Eve,17.0,Female,Delhi,42000.0,2019-03-25


In [78]:
# Tasks 7 and 8: derive Age Group, then display the final cleaned DataFrame
df['Age Group'] = df['Age'].apply(
    lambda age: 'Minor' if age < 18 else ('Adult' if age < 60 else 'Old')
)
df

Unnamed: 0,Name,Age,Gender,City,Income,Join Date,Age Group
0,Alice,24.0,Female,New York,55000.0,2021-12-05,Adult
1,Bob,30.0,Male,Los Angeles,68000.5,2020-05-07,Adult
2,Charlie,34.25,Male,Chicago,72000.0,2021-08-01,Adult
3,Diana,66.0,Female,Mumbai,61500.25,2020-01-01,Old
4,Eve,17.0,Female,Delhi,42000.0,2019-03-25,Minor
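
The task asks for `.apply()` with a `lambda`, which is what the cell above uses. As a possible alternative, the same thresholds can be expressed with `pd.cut`, which keeps the bin edges in one place. The sketch below is self-contained; the ages 18.0 and 60.0 are hypothetical boundary values added to show which group each edge falls into.

```python
import numpy as np
import pandas as pd

# Ages from the cleaned data plus two hypothetical boundary values (18.0, 60.0).
ages = pd.Series([24.0, 30.0, 34.25, 66.0, 17.0, 18.0, 60.0])

# right=False makes each bin closed on the left: [-inf, 18), [18, 60), [60, inf).
groups = pd.cut(
    ages,
    bins=[-np.inf, 18, 60, np.inf],
    right=False,
    labels=["Minor", "Adult", "Old"],
)

print(groups.astype(str).tolist())
```

The result is a categorical column, so it may need `.astype(str)` if plain strings are expected downstream.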
