### 🔁 Pandas

1. Load & View Data: Load data.csv into a DataFrame and display the first 5 rows.
2. Select Columns: Print only the name and age columns.
3. Filter Rows: Get rows where age > 30.
4. Handle Missing Values: Show missing value count and fill missing city values with ‘Unknown’.
5. Group & Aggregate: Group by department and calculate average salary.
6. Sort Values: Sort by salary in descending order.
7. New Column Creation: Add age_group: ‘Youth’ (<25), ‘Adult’ (25–59), ‘Senior’ (60+).
8. Datetime Conversion: Convert join_date to datetime and extract the year.
9. Merge DataFrames: Merge employee_df and department_df on dept_id. (You’ll need to create both DataFrames manually.)
10. Pivot Table: Create a pivot showing average salary by department and gender.

1. Load & View Data: Load data.csv into a DataFrame and display the first 5 rows.


In [28]:
import pandas as pd
data = pd.read_csv('data/data.csv')
data.head(5)

Unnamed: 0,name,age,gender,department,salary,city,join_date
0,Alice,28,Female,Engineering,80000,New York,2021-05-12
1,Bob,35,Male,HR,60000,,2020-08-19
2,Charlie,22,Male,Marketing,50000,Los Angeles,2022-01-15
3,Diana,45,Female,Engineering,95000,Chicago,2019-03-11
4,Eva,31,Female,HR,62000,San Francisco,2021-07-23
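The cell above depends on a local `data/data.csv` that is not included here. As a minimal sketch, the same DataFrame can be rebuilt from an in-memory string so that the later steps can be tried without the file. The CSV text below is an assumption reconstructed from the five rows displayed above, and `CSV_TEXT` is a hypothetical name.

```python
import io

import pandas as pd

# Assumed file contents, reconstructed from the five rows shown in the output.
CSV_TEXT = """name,age,gender,department,salary,city,join_date
Alice,28,Female,Engineering,80000,New York,2021-05-12
Bob,35,Male,HR,60000,,2020-08-19
Charlie,22,Male,Marketing,50000,Los Angeles,2022-01-15
Diana,45,Female,Engineering,95000,Chicago,2019-03-11
Eva,31,Female,HR,62000,San Francisco,2021-07-23
"""

# read_csv accepts any file-like object, so StringIO stands in for the file.
data = pd.read_csv(io.StringIO(CSV_TEXT))

print(data.shape)    # (number of rows, number of columns)
print(data.dtypes)   # inferred type of each column
print(data.head(5))  # first five rows
```

Checking `shape` and `dtypes` right after loading is a quick way to confirm that numeric columns such as `age` and `salary` were parsed as numbers and that the empty `city` field became a missing value.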


2. Select Columns: Print only the name and age columns.

In [29]:
data[['name', 'age']]

Unnamed: 0,name,age
0,Alice,28
1,Bob,35
2,Charlie,22
3,Diana,45
4,Eva,31
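A short sketch of how the bracket style affects the result type: double brackets return a DataFrame, single brackets return a Series, and `.loc` offers an equivalent explicit form. The small DataFrame here is a stand-in built from the values shown above.

```python
import pandas as pd

# Stand-in data using the names and ages shown in the output above.
data = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eva"],
    "age": [28, 35, 22, 45, 31],
    "salary": [80000, 60000, 50000, 95000, 62000],
})

subset = data[["name", "age"]]         # list of labels -> DataFrame
names = data["name"]                   # single label -> Series
same = data.loc[:, ["name", "age"]]    # explicit row/column selection

print(type(subset).__name__, type(names).__name__)
print(subset)
```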


3. Filter Rows: Get rows where age > 30.


In [30]:
data[data['age'] > 30]

Unnamed: 0,name,age,gender,department,salary,city,join_date
1,Bob,35,Male,HR,60000,,2020-08-19
3,Diana,45,Female,Engineering,95000,Chicago,2019-03-11
4,Eva,31,Female,HR,62000,San Francisco,2021-07-23
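The boolean-mask filter extends to several conditions. The sketch below uses stand-in data from the rows above and shows the mask form, a combined condition, and the equivalent `query` call. Each condition needs its own parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

# Stand-in data using values from the rows shown above.
data = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eva"],
    "age": [28, 35, 22, 45, 31],
    "department": ["Engineering", "HR", "Marketing", "Engineering", "HR"],
})

# A comparison yields a boolean Series; indexing with it keeps the True rows.
older = data[data["age"] > 30]

# Combine conditions with & (and) or | (or), wrapping each in parentheses.
older_hr = data[(data["age"] > 30) & (data["department"] == "HR")]

# query expresses the same filter as a string.
older_query = data.query("age > 30")

print(older_hr)
```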


4. Handle Missing Values: Show missing value count and fill missing city values with ‘Unknown’.


In [31]:
print(data.isnull().sum())
data['city'] = data['city'].fillna("UNKNOWN")
print(data.isnull().sum())


name          0
age           0
gender        0
department    0
salary        0
city          1
join_date     0
dtype: int64
name          0
age           0
gender        0
department    0
salary        0
city          0
join_date     0
dtype: int64
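A self-contained sketch of the same step on stand-in data. It uses the ‘Unknown’ spelling from the task statement, whereas the cell above fills with "UNKNOWN". Passing a dictionary to `fillna` restricts the fill to the named columns and leaves any other missing values alone.

```python
import pandas as pd

# Stand-in data; None marks Bob's missing city, as in the output above.
data = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eva"],
    "city": ["New York", None, "Los Angeles", "Chicago", "San Francisco"],
})

missing_before = data.isna().sum()   # per-column count of missing values

# The dict form fills only the listed columns.
data = data.fillna({"city": "Unknown"})

missing_after = data.isna().sum()

print(missing_before)
print(missing_after)
```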



5. Group & Aggregate: Group by department and calculate average salary.


In [32]:
data.groupby(['department'])['salary'].mean()

department
Engineering    87500.0
HR             61000.0
Marketing      50000.0
Name: salary, dtype: float64
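When more than one statistic is wanted per group, `agg` accepts a list of aggregation names. This is a sketch on stand-in data using the departments and salaries shown above.

```python
import pandas as pd

# Stand-in data using the departments and salaries shown above.
data = pd.DataFrame({
    "department": ["Engineering", "HR", "Marketing", "Engineering", "HR"],
    "salary": [80000, 60000, 50000, 95000, 62000],
})

# One row per department, one column per requested statistic.
summary = data.groupby("department")["salary"].agg(["mean", "min", "max", "count"])

print(summary)
```

The `count` column helps when reading the averages: a mean over a single row, as for Marketing here, is just that row's salary.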

6. Sort Values: Sort by salary in descending order.



In [33]:
data.sort_values(by=['salary'], ascending=False)

Unnamed: 0,name,age,gender,department,salary,city,join_date
3,Diana,45,Female,Engineering,95000,Chicago,2019-03-11
0,Alice,28,Female,Engineering,80000,New York,2021-05-12
4,Eva,31,Female,HR,62000,San Francisco,2021-07-23
1,Bob,35,Male,HR,60000,UNKNOWN,2020-08-19
2,Charlie,22,Male,Marketing,50000,Los Angeles,2022-01-15
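Two related variations, sketched on stand-in data: `nlargest` returns only the top rows, and `sort_values` takes a list of keys with a matching list of directions. Note that `sort_values` returns a new DataFrame and leaves `data` itself unchanged unless the result is assigned back.

```python
import pandas as pd

# Stand-in data using values from the rows shown above.
data = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eva"],
    "department": ["Engineering", "HR", "Marketing", "Engineering", "HR"],
    "salary": [80000, 60000, 50000, 95000, 62000],
})

by_salary = data.sort_values(by="salary", ascending=False)

# Only the two highest-paid rows.
top2 = data.nlargest(2, "salary")

# Department A-Z, then salary high-to-low within each department.
by_dept_then_salary = data.sort_values(
    by=["department", "salary"], ascending=[True, False]
)

print(top2)
print(by_dept_then_salary)
```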


7. New Column Creation: Add age_group: ‘Youth’ (<25), ‘Adult’ (25–59), ‘Senior’ (60+).



In [34]:
data['age_group'] = data['age'].apply(
    lambda age: 'Youth' if age < 25 else 'Adult' if age < 60 else 'Senior'
)
data.head()

Unnamed: 0,name,age,gender,department,salary,city,join_date,age_group
0,Alice,28,Female,Engineering,80000,New York,2021-05-12,Adult
1,Bob,35,Male,HR,60000,UNKNOWN,2020-08-19,Adult
2,Charlie,22,Male,Marketing,50000,Los Angeles,2022-01-15,Youth
3,Diana,45,Female,Engineering,95000,Chicago,2019-03-11,Adult
4,Eva,31,Female,HR,62000,San Francisco,2021-07-23,Adult
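As an alternative to the row-wise `apply`, the same binning can be sketched with `pd.cut`, which is vectorized and keeps the thresholds in one place. This is a swapped-in technique, not the method used in the cell above. The bin edges are an assumption chosen to match the lambda: `right=False` makes each interval include its left edge, so 25 falls in ‘Adult’ and 60 in ‘Senior’.

```python
import pandas as pd

# Stand-in data using the names and ages shown above.
data = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eva"],
    "age": [28, 35, 22, 45, 31],
})

# Intervals: [-inf, 25) -> Youth, [25, 60) -> Adult, [60, inf) -> Senior.
bins = [-float("inf"), 25, 60, float("inf")]
labels = ["Youth", "Adult", "Senior"]

data["age_group"] = pd.cut(data["age"], bins=bins, labels=labels, right=False)

# Boundary check on hypothetical ages around the thresholds.
boundary = pd.cut(pd.Series([24, 25, 59, 60]), bins=bins, labels=labels, right=False)

print(data)
print(list(boundary))
```

The result is a categorical column rather than plain strings, so grouping and sorting follow the Youth, Adult, Senior order.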
