<a href="https://colab.research.google.com/github/Madihajavaid12345/data-manipulation-project-using-pandas-/blob/main/Employee_Data_Cleanup_%26_Filtering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

🎯 Goal:
Learn how to clean data, filter it with conditions, rename columns, and handle missing values.



🔹 1. Create a DataFrame from this data (note the missing values, written as None):

In [131]:
import pandas as pd

# None marks a missing value; pandas shows it as NaN in numeric columns
data = {
    'Emp_Name': ['Ahmed', 'Zahra', 'Bilal', 'Saad', 'Rabi'],
    'Age': [35, 37, None, 23, 27],
    'Department': ['Hr', 'Finance', 'IT', None, 'IT'],
    'Salary': [50000, 65000, 55000, 70000, None]
}
df = pd.DataFrame(data)
print(df)

  Emp_Name   Age Department   Salary
0    Ahmed  35.0         Hr  50000.0
1    Zahra  37.0    Finance  65000.0
2    Bilal   NaN         IT  55000.0
3     Saad  23.0       None  70000.0
4     Rabi  27.0         IT      NaN


🔹 2. Check basic info:

In [132]:
# info() prints its summary itself and returns None, so no print() is needed
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Emp_Name    5 non-null      object 
 1   Age         4 non-null      float64
 2   Department  4 non-null      object 
 3   Salary      4 non-null      float64
dtypes: float64(2), object(2)
memory usage: 292.0+ bytes


In [133]:
print(df.describe())

             Age        Salary
count   4.000000      4.000000
mean   30.500000  60000.000000
std     6.608076   9128.709292
min    23.000000  50000.000000
25%    26.000000  53750.000000
50%    31.000000  60000.000000
75%    35.500000  66250.000000
max    37.000000  70000.000000


🔹 3. Find rows with missing values

In [134]:
print(df.isnull())

   Emp_Name    Age  Department  Salary
0     False  False       False   False
1     False  False       False   False
2     False   True       False   False
3     False  False        True   False
4     False  False       False    True
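
The mask above marks every missing cell. To list only the rows that contain a missing value, or to count missing values per column, something like the following should work. It is a minimal sketch that rebuilds the sample data from step 1; the names `missing_per_column` and `rows_with_missing` are introduced here for illustration.

```python
import pandas as pd

# Same sample data as in step 1
df = pd.DataFrame({
    'Emp_Name': ['Ahmed', 'Zahra', 'Bilal', 'Saad', 'Rabi'],
    'Age': [35, 37, None, 23, 27],
    'Department': ['Hr', 'Finance', 'IT', None, 'IT'],
    'Salary': [50000, 65000, 55000, 70000, None],
})

# Number of missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# any(axis=1) is True for a row if at least one of its cells is missing
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```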


🔹 4. Fill missing "Age" with average age

In [135]:
print(df['Age'])

0    35.0
1    37.0
2     NaN
3    23.0
4    27.0
Name: Age, dtype: float64


In [141]:
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df['Age'])

0    35.0
1    37.0
2    30.5
3    23.0
4    27.0
Name: Age, dtype: float64
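
For the five ages from step 1, the fill value can be checked by hand: mean() skips missing values by default, so it averages the four known ages, (35 + 37 + 23 + 27) / 4 = 30.5. A small sketch of that arithmetic, using a standalone Series (`age`, `average_age` and `filled` are illustrative names):

```python
import pandas as pd

# The Age column from step 1 as a standalone Series
age = pd.Series([35, 37, None, 23, 27], name='Age', dtype='float64')

# mean() ignores NaN by default, so only the four known ages are averaged
average_age = age.mean()
print(average_age)

# fillna() returns a new Series; the original is left unchanged
filled = age.fillna(average_age)
print(filled.tolist())
```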


🔹 5. Fill missing "Department" with "Not Assigned".


In [142]:
df['Department'] = df['Department'].fillna('Not Assigned')
print(df['Department'])

0              Hr
1         Finance
2              IT
3    Not Assigned
4              IT
Name: Department, dtype: object
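
Steps 4 and 5 can also be done in one call, because fillna() accepts a dictionary that maps column names to fill values. This is an alternative sketch on the step 1 data, not a replacement for the cells above:

```python
import pandas as pd

df = pd.DataFrame({
    'Emp_Name': ['Ahmed', 'Zahra', 'Bilal', 'Saad', 'Rabi'],
    'Age': [35, 37, None, 23, 27],
    'Department': ['Hr', 'Finance', 'IT', None, 'IT'],
    'Salary': [50000, 65000, 55000, 70000, None],
})

# One fill value per column; columns not listed (Salary) are left untouched
df = df.fillna({
    'Age': df['Age'].mean(),
    'Department': 'Not Assigned',
})
print(df)
```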


🔹 6. Drop rows where "Salary" is missing.

In [143]:
df = df.dropna(subset=['Salary'])
print(df['Salary'])

0    50000.0
1    65000.0
2    55000.0
3    70000.0
Name: Salary, dtype: float64
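
The subset argument matters here. Without it, dropna() removes every row that has any missing value, not only the rows with a missing salary. A short comparison on the untouched step 1 data (`only_salary` and `any_missing` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Emp_Name': ['Ahmed', 'Zahra', 'Bilal', 'Saad', 'Rabi'],
    'Age': [35, 37, None, 23, 27],
    'Department': ['Hr', 'Finance', 'IT', None, 'IT'],
    'Salary': [50000, 65000, 55000, 70000, None],
})

# Drops only the row whose Salary is missing (Rabi)
only_salary = df.dropna(subset=['Salary'])
print(len(only_salary))

# Drops every row with a missing value in any column (Bilal, Saad, Rabi)
any_missing = df.dropna()
print(len(any_missing))
```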


🔹 7. Rename "Emp_Name" to "Employee"


In [144]:
df = df.rename(columns={'Emp_Name': 'Employee'})
print(df)

  Employee   Age    Department   Salary
0    Ahmed  35.0            Hr  50000.0
1    Zahra  37.0       Finance  65000.0
2    Bilal  30.5            IT  55000.0
3     Saad  23.0  Not Assigned  70000.0


🔹 8. Filter and print only employees from "IT" department.


In [145]:
IT_employees = df.query('Department == "IT"')
print(IT_employees)

  Employee   Age Department   Salary
2    Bilal  30.5         IT  55000.0
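
The same filter can be written with a boolean mask instead of query(), and isin() extends it to several departments. This sketch uses a small stand-in DataFrame with the cleaned column names; the variable names are illustrative.

```python
import pandas as pd

# Stand-in for the cleaned data after steps 5 to 7
df = pd.DataFrame({
    'Employee': ['Ahmed', 'Zahra', 'Bilal', 'Saad'],
    'Department': ['Hr', 'Finance', 'IT', 'Not Assigned'],
    'Salary': [50000.0, 65000.0, 55000.0, 70000.0],
})

# query() takes the condition as a string
via_query = df.query('Department == "IT"')

# A boolean mask expresses the same condition in plain Python
via_mask = df[df['Department'] == 'IT']

# isin() keeps rows whose value matches any entry in the list
it_or_finance = df[df['Department'].isin(['IT', 'Finance'])]

print(via_mask)
print(it_or_finance)
```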


🔹 9. Sort employees by "Age" in ascending order.


In [146]:
df = df.sort_values(by='Age')
print(df)

  Employee   Age    Department   Salary
3     Saad  23.0  Not Assigned  70000.0
2    Bilal  30.5            IT  55000.0
0    Ahmed  35.0            Hr  50000.0
1    Zahra  37.0       Finance  65000.0


🔹 10. Reset index of the cleaned DataFrame

In [147]:
df = df.reset_index(drop=True)
print(df)

  Employee   Age    Department   Salary
0     Saad  23.0  Not Assigned  70000.0
1    Bilal  30.5            IT  55000.0
2    Ahmed  35.0            Hr  50000.0
3    Zahra  37.0       Finance  65000.0
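
Steps 4 to 10 (leaving out the IT filter from step 8, which does not change df) can also be written as a single method chain that starts from the raw data. This avoids the stale state that appears when notebook cells are run out of order. A sketch under that assumption, with `raw` and `cleaned` as illustrative names:

```python
import pandas as pd

raw = pd.DataFrame({
    'Emp_Name': ['Ahmed', 'Zahra', 'Bilal', 'Saad', 'Rabi'],
    'Age': [35, 37, None, 23, 27],
    'Department': ['Hr', 'Finance', 'IT', None, 'IT'],
    'Salary': [50000, 65000, 55000, 70000, None],
})

cleaned = (
    raw
    # Steps 4 and 5: fill Age with the mean and Department with a label
    .assign(
        Age=lambda d: d['Age'].fillna(d['Age'].mean()),
        Department=lambda d: d['Department'].fillna('Not Assigned'),
    )
    # Step 6: drop rows without a salary
    .dropna(subset=['Salary'])
    # Step 7: rename the name column
    .rename(columns={'Emp_Name': 'Employee'})
    # Steps 9 and 10: sort by age, then renumber the rows from 0
    .sort_values(by='Age')
    .reset_index(drop=True)
)
print(cleaned)
```

Because `raw` is never modified, the chain gives the same result every time it is run.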


🔹 11. Save the final cleaned DataFrame to a file called "cleaned_employees.csv".

In [150]:
df.to_csv('cleaned_employees.csv')
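
By default, to_csv() also writes the row index as an extra first column with an empty header. If that column is not wanted in the file, index=False leaves it out. The sketch below shows the difference on a two-row stand-in DataFrame and writes to a temporary directory, so it does not touch the real "cleaned_employees.csv":

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for the cleaned data
df = pd.DataFrame({
    'Employee': ['Saad', 'Bilal'],
    'Age': [23.0, 30.5],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'cleaned_employees.csv')

    # Default: the index is saved as an unnamed first column
    df.to_csv(path)
    with_index = pd.read_csv(path)

    # index=False saves only the data columns
    df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```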

To download the file from Colab to your computer, run these two extra lines after saving:

In [155]:
from google.colab import files
files.download('cleaned_employees.csv')
