# 🧠 Pandas DataFrame Practice
This notebook contains practice questions involving reading, filtering, slicing, manipulating, and handling missing values in a Pandas DataFrame.

**Dataset:** `employee_data_with_nulls.csv` (simulated HR dataset with intentional nulls)

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('employee_data_with_nulls.csv')
df.head()

Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status
0,Employee_1,50.0,Female,Delhi,Finance,83053.0,2019-12-08,Terminated
1,Employee_2,36.0,Female,Mumbai,Finance,46959.0,2021-11-25,Terminated
2,Employee_3,29.0,Female,Chennai,,30530.0,2020-05-23,Active
3,Employee_4,42.0,Male,Bangalore,HR,28748.0,2021-01-07,Active
4,Employee_5,40.0,Female,Delhi,IT,,2018-12-09,Terminated


## 🔍 Section A: Data Loading and Exploration

1. Load the dataset and display the first 10 rows.

In [3]:
df.head(10)

Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status
0,Employee_1,50.0,Female,Delhi,Finance,83053.0,2019-12-08,Terminated
1,Employee_2,36.0,Female,Mumbai,Finance,46959.0,2021-11-25,Terminated
2,Employee_3,29.0,Female,Chennai,,30530.0,2020-05-23,Active
3,Employee_4,42.0,Male,Bangalore,HR,28748.0,2021-01-07,Active
4,Employee_5,40.0,Female,Delhi,IT,,2018-12-09,Terminated
5,Employee_6,44.0,Male,Chennai,IT,91199.0,2019-06-23,Terminated
6,Employee_7,,Female,Bangalore,HR,59766.0,2015-09-20,Resigned
7,Employee_8,32.0,Female,Bangalore,IT,86087.0,2019-11-23,Terminated
8,Employee_9,45.0,Male,Mumbai,IT,93840.0,2020-12-31,Resigned
9,Employee_10,45.0,Female,Chennai,HR,79384.0,2021-07-21,Resigned


2. Display data types and check memory usage of each column.

In [4]:
df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Name        50 non-null     object 
 1   Age         47 non-null     float64
 2   Gender      50 non-null     object 
 3   City        50 non-null     object 
 4   Department  46 non-null     object 
 5   Salary      45 non-null     float64
 6   JoinDate    50 non-null     object 
 7   Status      50 non-null     object 
dtypes: float64(2), object(6)
memory usage: 17.3 KB
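
`df.info` reports only the total footprint, while the question asks about each column. `DataFrame.memory_usage(deep=True)` may be a closer fit. The sketch below is illustrative only: it uses a small inline sample built from the first three rows shown above (an assumption, so that it runs without the CSV file).

```python
import pandas as pd

# Inline sample mirroring the first three rows of the dataset
# (an assumption so the snippet runs without the CSV file).
sample = pd.DataFrame({
    "Name": ["Employee_1", "Employee_2", "Employee_3"],
    "Age": [50.0, 36.0, 29.0],
    "Department": ["Finance", "Finance", None],
    "Salary": [83053.0, 46959.0, 30530.0],
})

# Bytes used by each column; deep=True also counts the Python string objects.
per_column = sample.memory_usage(deep=True, index=False)
print(per_column)
print(f"Total bytes: {per_column.sum()}")
```

On the full dataset the same call would be `df.memory_usage(deep=True)`.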


3. Display the column names in the DataFrame.

In [5]:
print("Column names:")
print(df.columns.tolist())

Column names:
['Name', 'Age', 'Gender', 'City', 'Department', 'Salary', 'JoinDate', 'Status']


4. Display the total number of elements and the shape of the DataFrame.

In [6]:
print(f"Total number of elements: {df.size}")
print(f"Shape of DataFrame: {df.shape}")
print(f"Number of rows: {df.shape[0]}")
print(f"Number of columns: {df.shape[1]}")

Total number of elements: 400
Shape of DataFrame: (50, 8)
Number of rows: 50
Number of columns: 8


5. Show a statistical summary of numerical columns.

In [7]:
df.describe()

Unnamed: 0,Age,Salary
count,47.0,45.0
mean,37.574468,63068.777778
std,9.128895,22101.322375
min,23.0,25854.0
25%,29.5,46959.0
50%,39.0,64353.0
75%,45.0,80680.0
max,54.0,93840.0


## 🟦 Section B: Filtering and Slicing

6. Display all records where `Salary` is greater than ₹50,000.

In [8]:
high_salary = df[df['Salary'] > 50000]
print(f"Number of employees with salary > 50,000: {len(high_salary)}")
high_salary

Number of employees with salary > 50,000: 31


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status
0,Employee_1,50.0,Female,Delhi,Finance,83053.0,2019-12-08,Terminated
5,Employee_6,44.0,Male,Chennai,IT,91199.0,2019-06-23,Terminated
6,Employee_7,,Female,Bangalore,HR,59766.0,2015-09-20,Resigned
7,Employee_8,32.0,Female,Bangalore,IT,86087.0,2019-11-23,Terminated
8,Employee_9,45.0,Male,Mumbai,IT,93840.0,2020-12-31,Resigned
9,Employee_10,45.0,Female,Chennai,HR,79384.0,2021-07-21,Resigned
10,Employee_11,24.0,Male,Bangalore,Finance,76005.0,2016-09-15,Resigned
11,Employee_12,43.0,Female,Delhi,HR,71576.0,2017-10-13,Resigned
12,Employee_13,23.0,Female,Delhi,Finance,64353.0,2017-05-30,Resigned
13,Employee_14,45.0,Male,Mumbai,IT,87003.0,2020-08-11,Active


7. Filter rows where `City` is 'Chennai' and `Department` is 'Sales'.

In [9]:
chennai_sales = df[(df['City'] == 'Chennai') & (df['Department'] == 'Sales')]
print(f"Number of employees in Chennai Sales: {len(chennai_sales)}")
chennai_sales

Number of employees in Chennai Sales: 1


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status
24,Employee_25,36.0,Female,Chennai,Sales,75859.0,2021-01-08,Active
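
The same two-condition filter can also be written with `DataFrame.query`, which some readers find easier to scan than chained boolean masks. This is a minimal sketch on a small inline sample (rows copied from the outputs in this notebook), not a replacement for the cell above.

```python
import pandas as pd

# Inline sample using rows that appear in this notebook's outputs
# (an assumption so the snippet runs without the CSV file).
sample = pd.DataFrame({
    "Name": ["Employee_1", "Employee_6", "Employee_10", "Employee_25"],
    "City": ["Delhi", "Chennai", "Chennai", "Chennai"],
    "Department": ["Finance", "IT", "HR", "Sales"],
})

# query() takes the condition as a string; `and` plays the role of `&`.
chennai_sales_q = sample.query("City == 'Chennai' and Department == 'Sales'")
print(chennai_sales_q)
```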


8. Display employees aged between 25 and 35.

In [10]:
age_25_35 = df[(df['Age'] >= 25) & (df['Age'] <= 35)]
print(f"Number of employees aged 25-35: {len(age_25_35)}")
age_25_35

Number of employees aged 25-35: 14


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status
2,Employee_3,29.0,Female,Chennai,,30530.0,2020-05-23,Active
7,Employee_8,32.0,Female,Bangalore,IT,86087.0,2019-11-23,Terminated
18,Employee_19,33.0,Male,Bangalore,Sales,,2020-08-18,Resigned
26,Employee_27,28.0,Male,Mumbai,Finance,88734.0,2019-08-24,Terminated
28,Employee_29,30.0,Male,Chennai,Finance,37688.0,2016-08-09,Active
30,Employee_31,25.0,Male,Mumbai,,62157.0,2018-12-14,Resigned
32,Employee_33,35.0,Male,Delhi,IT,77083.0,2015-04-14,Resigned
33,Employee_34,30.0,Male,Mumbai,Finance,90733.0,2015-09-11,Active
38,Employee_39,28.0,Female,Delhi,IT,76663.0,2020-02-28,Active
39,Employee_40,29.0,Male,Mumbai,,40708.0,2020-04-03,Active
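
An inclusive range check like this one can also be expressed with `Series.between`, which includes both endpoints by default and treats missing ages as non-matching. The sketch below uses a small inline sample built from rows shown in this notebook (an assumption, so it runs on its own).

```python
import numpy as np
import pandas as pd

# Inline sample using ages that appear in this notebook's outputs
# (an assumption so the snippet runs without the CSV file).
sample = pd.DataFrame({
    "Name": ["Employee_1", "Employee_3", "Employee_7", "Employee_31", "Employee_33"],
    "Age": [50.0, 29.0, np.nan, 25.0, 35.0],
})

# between() is inclusive on both ends by default; NaN evaluates to False.
in_range = sample[sample["Age"].between(25, 35)]
print(in_range)
```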


9. List employees who joined after the year 2020.

In [11]:
# Parse JoinDate as datetime so the .dt accessor is available
df['JoinDate'] = pd.to_datetime(df['JoinDate'])
joined_after_2020 = df[df['JoinDate'].dt.year > 2020]
print(f"Number of employees who joined after 2020: {len(joined_after_2020)}")
joined_after_2020

Number of employees who joined after 2020: 11


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status
1,Employee_2,36.0,Female,Mumbai,Finance,46959.0,2021-11-25,Terminated
3,Employee_4,42.0,Male,Bangalore,HR,28748.0,2021-01-07,Active
9,Employee_10,45.0,Female,Chennai,HR,79384.0,2021-07-21,Resigned
15,Employee_16,23.0,Male,Chennai,IT,90318.0,2021-04-20,Resigned
20,Employee_21,46.0,Female,Delhi,HR,,2022-11-03,Resigned
21,Employee_22,48.0,Male,Delhi,Sales,32392.0,2021-03-05,Active
24,Employee_25,36.0,Female,Chennai,Sales,75859.0,2021-01-08,Active
34,Employee_35,47.0,Female,Delhi,Sales,59698.0,2022-01-01,Resigned
37,Employee_38,49.0,Female,Delhi,HR,67107.0,2021-01-09,Active
42,Employee_43,25.0,Male,Chennai,HR,,2022-11-09,Terminated
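
"After the year 2020" can equally be read as "on or after 1 January 2021", which allows a direct timestamp comparison instead of extracting the year. A minimal sketch on an inline sample (dates copied from this notebook's outputs; the variable names are illustrative):

```python
import pandas as pd

# Inline sample using join dates that appear in this notebook's outputs
# (an assumption so the snippet runs without the CSV file).
sample = pd.DataFrame({
    "Name": ["Employee_1", "Employee_4", "Employee_9", "Employee_35"],
    "JoinDate": pd.to_datetime(
        ["2019-12-08", "2021-01-07", "2020-12-31", "2022-01-01"]
    ),
})

# Anything on or after 2021-01-01 joined after the year 2020.
start_2021 = pd.Timestamp("2021-01-01")
after_2020 = sample[sample["JoinDate"] >= start_2021]
print(after_2020)
```

Note that Employee_9, who joined on 2020-12-31, is excluded under either formulation.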


10. Show records where `Status` is not 'Active'.

In [12]:
not_active = df[df['Status'] != 'Active']
print(f"Number of employees who are not active: {len(not_active)}")
not_active

Number of employees who are not active: 29


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status
0,Employee_1,50.0,Female,Delhi,Finance,83053.0,2019-12-08,Terminated
1,Employee_2,36.0,Female,Mumbai,Finance,46959.0,2021-11-25,Terminated
4,Employee_5,40.0,Female,Delhi,IT,,2018-12-09,Terminated
5,Employee_6,44.0,Male,Chennai,IT,91199.0,2019-06-23,Terminated
6,Employee_7,,Female,Bangalore,HR,59766.0,2015-09-20,Resigned
7,Employee_8,32.0,Female,Bangalore,IT,86087.0,2019-11-23,Terminated
8,Employee_9,45.0,Male,Mumbai,IT,93840.0,2020-12-31,Resigned
9,Employee_10,45.0,Female,Chennai,HR,79384.0,2021-07-21,Resigned
10,Employee_11,24.0,Male,Bangalore,Finance,76005.0,2016-09-15,Resigned
11,Employee_12,43.0,Female,Delhi,HR,71576.0,2017-10-13,Resigned


11. Show first 5 rows with only `Name`, `Salary`, and `Department` columns.

In [13]:
df[['Name', 'Salary', 'Department']].head(5)

Unnamed: 0,Name,Salary,Department
0,Employee_1,83053.0,Finance
1,Employee_2,46959.0,Finance
2,Employee_3,30530.0,
3,Employee_4,28748.0,HR
4,Employee_5,,IT


12. Display rows from index 10 to 20 (inclusive).

In [14]:
# iloc slices by position and excludes the end, so 10:21 returns rows 10 through 20
df.iloc[10:21]

Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status
10,Employee_11,24.0,Male,Bangalore,Finance,76005.0,2016-09-15,Resigned
11,Employee_12,43.0,Female,Delhi,HR,71576.0,2017-10-13,Resigned
12,Employee_13,23.0,Female,Delhi,Finance,64353.0,2017-05-30,Resigned
13,Employee_14,45.0,Male,Mumbai,IT,87003.0,2020-08-11,Active
14,Employee_15,51.0,Female,Bangalore,IT,77733.0,2018-08-20,Terminated
15,Employee_16,23.0,Male,Chennai,IT,90318.0,2021-04-20,Resigned
16,Employee_17,42.0,Female,Delhi,Sales,48664.0,2016-03-28,Terminated
17,Employee_18,54.0,Male,Bangalore,Finance,92172.0,2018-03-29,Terminated
18,Employee_19,33.0,Male,Bangalore,Sales,,2020-08-18,Resigned
19,Employee_20,43.0,Female,Mumbai,Sales,25854.0,2019-08-10,Active
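
The end value 21 in the `iloc` slice is a common source of confusion: `iloc` slices by position and excludes the end, whereas `loc` slices by label and includes it. The sketch below illustrates the difference on a hypothetical 25-row frame with a default integer index.

```python
import pandas as pd

# Hypothetical frame with a default RangeIndex (0..24), for illustration only.
sample = pd.DataFrame({"Name": [f"Employee_{i + 1}" for i in range(25)]})

by_position = sample.iloc[10:21]  # positions 10..20, end is exclusive
by_label = sample.loc[10:20]      # labels 10..20, end is inclusive

# With a default index both selections pick the same 11 rows.
print(len(by_position), len(by_label))
print(by_position.equals(by_label))
```

The two forms stop agreeing once the index is no longer 0, 1, 2, ..., for example after rows are dropped without resetting the index.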


13. Show the last 7 rows of numeric columns only.

In [15]:
# Keep only numeric columns, then take the last 7 rows
df.select_dtypes(include='number').tail(7)

Unnamed: 0,Age,Salary
43,23.0,59754.0
44,,36411.0
45,25.0,27911.0
46,50.0,92270.0
47,39.0,33680.0
48,47.0,36111.0
49,31.0,62504.0


14. Display employees whose age is 42 or 45.

In [16]:
age_42_45 = df[df['Age'].isin([42, 45])]
print(f"Number of employees with age 42 or 45: {len(age_42_45)}")
age_42_45

Number of employees with age 42 or 45: 6


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status
3,Employee_4,42.0,Male,Bangalore,HR,28748.0,2021-01-07,Active
8,Employee_9,45.0,Male,Mumbai,IT,93840.0,2020-12-31,Resigned
9,Employee_10,45.0,Female,Chennai,HR,79384.0,2021-07-21,Resigned
13,Employee_14,45.0,Male,Mumbai,IT,87003.0,2020-08-11,Active
16,Employee_17,42.0,Female,Delhi,Sales,48664.0,2016-03-28,Terminated
27,Employee_28,42.0,Male,Delhi,Finance,,2016-09-19,Terminated


## 🟨 Section C: Data Manipulation

15. Add a new column `Tax` which is 10% of `Salary`.

In [17]:
df['Tax'] = df['Salary'] * 0.10
print("Tax column added successfully!")
df[['Name', 'Salary', 'Tax']].head()

Tax column added successfully!


Unnamed: 0,Name,Salary,Tax
0,Employee_1,83053.0,8305.3
1,Employee_2,46959.0,4695.9
2,Employee_3,30530.0,3053.0
3,Employee_4,28748.0,2874.8
4,Employee_5,,


16. Replace all 'HR' values in `Department` with 'Human Resources'.

In [18]:
df['Department'] = df['Department'].replace('HR', 'Human Resources')
print("HR replaced with Human Resources")
print("Unique departments after replacement:")
print(df['Department'].unique())

HR replaced with Human Resources
Unique departments after replacement:
['Finance' nan 'Human Resources' 'IT' 'Sales']


17. Add a new row to the DataFrame.

In [19]:
new_row = {
    'Name': 'New_Employee',
    'Age': 30,
    'Gender': 'Male',
    'City': 'Pune',
    'Department': 'IT',
    'Salary': 55000,
    'JoinDate': '2023-01-15',
    'Status': 'Active',
    'Tax': 5500
}

df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(f"New row added. DataFrame now has {len(df)} rows")
df.tail(3)

New row added. DataFrame now has 51 rows


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status,Tax
48,Employee_49,47.0,Male,Bangalore,Human Resources,36111.0,2016-02-02 00:00:00,Terminated,3611.1
49,Employee_50,31.0,Female,Chennai,IT,62504.0,2018-07-24 00:00:00,Active,6250.4
50,New_Employee,30.0,Male,Pune,IT,55000.0,2023-01-15,Active,5500.0
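
In the output above, the existing dates print with a time component while the new row's date does not. That is consistent with `JoinDate` having become an `object` column, because the new row supplies a plain string next to datetime values. One way to keep the datetime dtype, sketched here on a two-row inline sample (an assumption, so the snippet runs on its own), is to pass a `pd.Timestamp` in the new row:

```python
import pandas as pd

# Inline sample mirroring the last two rows of the dataset
# (an assumption so the snippet runs without the CSV file).
sample = pd.DataFrame({
    "Name": ["Employee_49", "Employee_50"],
    "JoinDate": pd.to_datetime(["2016-02-02", "2018-07-24"]),
})

# A Timestamp (rather than the string '2023-01-15') keeps the column datetime.
new_row = {"Name": "New_Employee", "JoinDate": pd.Timestamp("2023-01-15")}

combined = pd.concat([sample, pd.DataFrame([new_row])], ignore_index=True)
print(combined.dtypes)
```

Alternatively, calling `pd.to_datetime` on the column again after the concat should restore the dtype.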


18. Drop the `Status` column from the DataFrame.

In [20]:
df = df.drop('Status', axis=1)
print("Status column dropped successfully!")
print(f"Remaining columns: {df.columns.tolist()}")
df.head()

Status column dropped successfully!
Remaining columns: ['Name', 'Age', 'Gender', 'City', 'Department', 'Salary', 'JoinDate', 'Tax']


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Tax
0,Employee_1,50.0,Female,Delhi,Finance,83053.0,2019-12-08 00:00:00,8305.3
1,Employee_2,36.0,Female,Mumbai,Finance,46959.0,2021-11-25 00:00:00,4695.9
2,Employee_3,29.0,Female,Chennai,,30530.0,2020-05-23 00:00:00,3053.0
3,Employee_4,42.0,Male,Bangalore,Human Resources,28748.0,2021-01-07 00:00:00,2874.8
4,Employee_5,40.0,Female,Delhi,IT,,2018-12-09 00:00:00,


19. Drop the row with index 5.

In [21]:
df = df.drop(index=5)
print("Row with index 5 dropped successfully!")
print(f"DataFrame now has {len(df)} rows")
df.head(10)

Row with index 5 dropped successfully!
DataFrame now has 50 rows


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Tax
0,Employee_1,50.0,Female,Delhi,Finance,83053.0,2019-12-08 00:00:00,8305.3
1,Employee_2,36.0,Female,Mumbai,Finance,46959.0,2021-11-25 00:00:00,4695.9
2,Employee_3,29.0,Female,Chennai,,30530.0,2020-05-23 00:00:00,3053.0
3,Employee_4,42.0,Male,Bangalore,Human Resources,28748.0,2021-01-07 00:00:00,2874.8
4,Employee_5,40.0,Female,Delhi,IT,,2018-12-09 00:00:00,
6,Employee_7,,Female,Bangalore,Human Resources,59766.0,2015-09-20 00:00:00,5976.6
7,Employee_8,32.0,Female,Bangalore,IT,86087.0,2019-11-23 00:00:00,8608.7
8,Employee_9,45.0,Male,Mumbai,IT,93840.0,2020-12-31 00:00:00,9384.0
9,Employee_10,45.0,Female,Chennai,Human Resources,79384.0,2021-07-21 00:00:00,7938.4
10,Employee_11,24.0,Male,Bangalore,Finance,76005.0,2016-09-15 00:00:00,7600.5


20. Reset the row index.

In [22]:
df = df.reset_index(drop=True)
print("Row index reset successfully!")
df.head(10)

Row index reset successfully!


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Tax
0,Employee_1,50.0,Female,Delhi,Finance,83053.0,2019-12-08 00:00:00,8305.3
1,Employee_2,36.0,Female,Mumbai,Finance,46959.0,2021-11-25 00:00:00,4695.9
2,Employee_3,29.0,Female,Chennai,,30530.0,2020-05-23 00:00:00,3053.0
3,Employee_4,42.0,Male,Bangalore,Human Resources,28748.0,2021-01-07 00:00:00,2874.8
4,Employee_5,40.0,Female,Delhi,IT,,2018-12-09 00:00:00,
5,Employee_7,,Female,Bangalore,Human Resources,59766.0,2015-09-20 00:00:00,5976.6
6,Employee_8,32.0,Female,Bangalore,IT,86087.0,2019-11-23 00:00:00,8608.7
7,Employee_9,45.0,Male,Mumbai,IT,93840.0,2020-12-31 00:00:00,9384.0
8,Employee_10,45.0,Female,Chennai,Human Resources,79384.0,2021-07-21 00:00:00,7938.4
9,Employee_11,24.0,Male,Bangalore,Finance,76005.0,2016-09-15 00:00:00,7600.5


## 🟥 Section D: Handling Missing Values

21. Find the total number of missing values in each column.

In [23]:
missing_values = df.isnull().sum()
print("Missing values per column:")
print(missing_values)
print(f"\nTotal missing values in dataset: {missing_values.sum()}")

Missing values per column:
Name          0
Age           3
Gender        0
City          0
Department    4
Salary        5
JoinDate      0
Tax           5
dtype: int64

Total missing values in dataset: 17
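
Raw counts are easier to interpret alongside the share of rows they represent. Because `isna()` returns booleans, taking the column mean gives the missing fraction directly. A minimal sketch on an inline sample (rows copied from this notebook's outputs):

```python
import numpy as np
import pandas as pd

# Inline sample using rows that appear in this notebook's outputs
# (an assumption so the snippet runs without the CSV file).
sample = pd.DataFrame({
    "Name": ["Employee_1", "Employee_3", "Employee_5", "Employee_7"],
    "Age": [50.0, 29.0, 40.0, np.nan],
    "Department": ["Finance", None, "IT", "HR"],
    "Salary": [83053.0, 30530.0, np.nan, 59766.0],
})

# Mean of a boolean mask = fraction of True values, i.e. fraction missing.
missing_pct = sample.isna().mean() * 100
print(missing_pct.round(1))
```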


22. Display all rows where `Salary` is missing.

In [24]:
salary_missing = df[df['Salary'].isnull()]
print(f"Number of rows with missing Salary: {len(salary_missing)}")
salary_missing

Number of rows with missing Salary: 5


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Tax
4,Employee_5,40.0,Female,Delhi,IT,,2018-12-09 00:00:00,
17,Employee_19,33.0,Male,Bangalore,Sales,,2020-08-18 00:00:00,
19,Employee_21,46.0,Female,Delhi,Human Resources,,2022-11-03 00:00:00,
26,Employee_28,42.0,Male,Delhi,Finance,,2016-09-19 00:00:00,
41,Employee_43,25.0,Male,Chennai,Human Resources,,2022-11-09 00:00:00,


23. Count the non-null values in each row.

In [25]:
# Reload the original data so the earlier column and row changes do not affect the count
df = pd.read_csv('employee_data_with_nulls.csv')
df['Non_Null_Count'] = df.count(axis=1)
print("Non-null value count for each row:")
df[['Name', 'Non_Null_Count']].head(10)

Non-null value count for each row:


Unnamed: 0,Name,Non_Null_Count
0,Employee_1,8
1,Employee_2,8
2,Employee_3,7
3,Employee_4,8
4,Employee_5,7
5,Employee_6,8
6,Employee_7,7
7,Employee_8,8
8,Employee_9,8
9,Employee_10,8


24. Drop all rows with any missing values.

In [26]:
df_no_missing = df.dropna()
print(f"Original DataFrame shape: {df.shape}")
print(f"DataFrame after dropping rows with missing values: {df_no_missing.shape}")
print(f"Rows dropped: {df.shape[0] - df_no_missing.shape[0]}")
df_no_missing.head()

Original DataFrame shape: (50, 9)
DataFrame after dropping rows with missing values: (39, 9)
Rows dropped: 11


Unnamed: 0,Name,Age,Gender,City,Department,Salary,JoinDate,Status,Non_Null_Count
0,Employee_1,50.0,Female,Delhi,Finance,83053.0,2019-12-08,Terminated,8
1,Employee_2,36.0,Female,Mumbai,Finance,46959.0,2021-11-25,Terminated,8
3,Employee_4,42.0,Male,Bangalore,HR,28748.0,2021-01-07,Active,8
5,Employee_6,44.0,Male,Chennai,IT,91199.0,2019-06-23,Terminated,8
7,Employee_8,32.0,Female,Bangalore,IT,86087.0,2019-11-23,Terminated,8
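
Dropping every row with any missing value removes 11 of 50 rows here. When that is too aggressive, `dropna` also accepts `subset` (only check certain columns) and `thresh` (keep rows with at least that many non-null values). The sketch below compares the three on an inline sample built from rows shown in this notebook; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Inline sample using rows that appear in this notebook's outputs
# (an assumption so the snippet runs without the CSV file).
sample = pd.DataFrame({
    "Name": ["Employee_1", "Employee_3", "Employee_5", "Employee_7"],
    "Age": [50.0, 29.0, 40.0, np.nan],
    "Department": ["Finance", None, "IT", "HR"],
    "Salary": [83053.0, 30530.0, np.nan, 59766.0],
})

strict = sample.dropna()                       # drop rows with any missing value
salary_known = sample.dropna(subset=["Salary"])  # only require Salary
mostly_complete = sample.dropna(thresh=3)      # require at least 3 non-null values

print(len(strict), len(salary_known), len(mostly_complete))
```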


25. Drop columns where all values are missing.

In [27]:
df = pd.read_csv('employee_data_with_nulls.csv')
# how='all' drops a column only when every value in it is missing
df_clean = df.dropna(axis=1, how='all')
print(f"Original DataFrame shape: {df.shape}")
print(f"DataFrame after dropping columns with all missing values: {df_clean.shape}")
print("Remaining columns:", df_clean.columns.tolist())

Original DataFrame shape: (50, 8)
DataFrame after dropping columns with all missing values: (50, 8)
Remaining columns: ['Name', 'Age', 'Gender', 'City', 'Department', 'Salary', 'JoinDate', 'Status']


26. Fill missing values in `Department` with 'Unknown'.

In [28]:
df = pd.read_csv('employee_data_with_nulls.csv')
df['Department'] = df['Department'].fillna('Unknown')
print("Missing Department values filled with 'Unknown'")
print("Department value counts:")
print(df['Department'].value_counts())

Missing Department values filled with 'Unknown'
Department value counts:
Department
HR         14
Finance    13
IT         11
Sales       8
Unknown     4
Name: count, dtype: int64


27. Replace missing `Salary` values with the mean of the column.

In [29]:
salary_mean = df['Salary'].mean()
df['Salary'] = df['Salary'].fillna(salary_mean)
print(f"Missing Salary values filled with mean: {salary_mean:.2f}")
print(f"Missing values in Salary column after filling: {df['Salary'].isnull().sum()}")
df[['Name', 'Salary']].head(10)

Missing Salary values filled with mean: 63068.78
Missing values in Salary column after filling: 0


Unnamed: 0,Name,Salary
0,Employee_1,83053.0
1,Employee_2,46959.0
2,Employee_3,30530.0
3,Employee_4,28748.0
4,Employee_5,63068.777778
5,Employee_6,91199.0
6,Employee_7,59766.0
7,Employee_8,86087.0
8,Employee_9,93840.0
9,Employee_10,79384.0
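
A single overall mean ignores differences between departments. As a variation (not what the cell above does), missing salaries could be filled with the mean of the employee's own department using `groupby(...).transform('mean')`. The sketch uses an inline sample of rows from this notebook; `Salary_filled` is a hypothetical column name.

```python
import numpy as np
import pandas as pd

# Inline sample using rows that appear in this notebook's outputs
# (an assumption so the snippet runs without the CSV file).
sample = pd.DataFrame({
    "Name": ["Employee_1", "Employee_2", "Employee_5", "Employee_6", "Employee_8"],
    "Department": ["Finance", "Finance", "IT", "IT", "IT"],
    "Salary": [83053.0, 46959.0, np.nan, 91199.0, 86087.0],
})

# transform('mean') returns one value per row: the mean of that row's department.
dept_mean = sample.groupby("Department")["Salary"].transform("mean")

# Fill only the missing entries; known salaries are left untouched.
sample["Salary_filled"] = sample["Salary"].fillna(dept_mean)
print(sample)
```

Rows whose `Department` is itself missing receive no group mean, so they would still need a fallback such as the overall mean.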


28. Replace missing `Age` values with the median of the column.

In [30]:
age_median = df['Age'].median()
df['Age'] = df['Age'].fillna(age_median)
print(f"Missing Age values filled with median: {age_median}")
print(f"Missing values in Age column after filling: {df['Age'].isnull().sum()}")
df[['Name', 'Age']].head(10)

Missing Age values filled with median: 39.0
Missing values in Age column after filling: 0


Unnamed: 0,Name,Age
0,Employee_1,50.0
1,Employee_2,36.0
2,Employee_3,29.0
3,Employee_4,42.0
4,Employee_5,40.0
5,Employee_6,44.0
6,Employee_7,39.0
7,Employee_8,32.0
8,Employee_9,45.0
9,Employee_10,45.0


29. Display unique values from `Department`.

In [31]:
unique_departments = df['Department'].unique()
print("Unique departments:")
print(unique_departments)
print(f"\nNumber of unique departments: {len(unique_departments)}")
print("\nDepartment value counts:")
print(df['Department'].value_counts())

Unique departments:
['Finance' 'Unknown' 'HR' 'IT' 'Sales']

Number of unique departments: 5

Department value counts:
Department
HR         14
Finance    13
IT         11
Sales       8
Unknown     4
Name: count, dtype: int64


30. Convert the `City` column to uppercase.

In [32]:
df['City'] = df['City'].str.upper()
print("City column converted to uppercase")
print("Unique cities after conversion:")
print(df['City'].unique())
df[['Name', 'City']].head(10)

City column converted to uppercase
Unique cities after conversion:
['DELHI' 'MUMBAI' 'CHENNAI' 'BANGALORE']


Unnamed: 0,Name,City
0,Employee_1,DELHI
1,Employee_2,MUMBAI
2,Employee_3,CHENNAI
3,Employee_4,BANGALORE
4,Employee_5,DELHI
5,Employee_6,CHENNAI
6,Employee_7,BANGALORE
7,Employee_8,BANGALORE
8,Employee_9,MUMBAI
9,Employee_10,CHENNAI
