In [1]:
# Title: Data Cleaning using Pandas
# Description: Check for missing values and handle them by imputing the median.

import pandas as pd
import numpy as np

# Create a sample DataFrame with missing values
data = {
    'Age': [25, 30, np.nan, 22, 28, np.nan, 35],
    'Salary': [50000, 60000, 55000, np.nan, 58000, 62000, np.nan]
}
df = pd.DataFrame(data)

# Display missing values
print("Missing values before imputation:\n", df.isnull().sum())

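# Impute missing values with each column's median.
# Assign the result back to the column: calling fillna(..., inplace=True)
# on df['Age'] is chained assignment and stops updating df in pandas 3.0.
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Salary'] = df['Salary'].fillna(df['Salary'].median())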

# Display the updated DataFrame
print("\nData after imputing missing values with the median:\n", df)


Missing values before imputation:
 Age       2
Salary    2
dtype: int64

Data after imputing missing values with the median:
     Age   Salary
0  25.0  50000.0
1  30.0  60000.0
2  28.0  55000.0
3  22.0  58000.0
4  28.0  58000.0
5  28.0  62000.0
6  35.0  58000.0
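
The same result can probably be reached without naming each column. The sketch below assumes the same sample data as the cell above. The names `df_all`, `medians`, `df_filled` and `remaining` are illustrative and do not come from the original notebook. It relies on `DataFrame.median` skipping NaN by default and on `DataFrame.fillna` accepting a Series whose index labels match the column names.

```python
import numpy as np
import pandas as pd

# Same sample data as the cell above (rebuilt here so the sketch is self-contained)
df_all = pd.DataFrame({
    'Age': [25, 30, np.nan, 22, 28, np.nan, 35],
    'Salary': [50000, 60000, 55000, np.nan, 58000, 62000, np.nan],
})

# Per-column medians; NaN entries are skipped by default
medians = df_all.median(numeric_only=True)

# Passing a Series to fillna fills each column with the value at its label
df_filled = df_all.fillna(medians)

# Count the missing values that remain after imputation
remaining = int(df_filled.isnull().sum().sum())
print(medians)
print("Remaining missing values:", remaining)
```

This form returns a new DataFrame and leaves `df_all` unchanged, which makes it easy to compare the data before and after imputation.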


Note: recent pandas 2.x releases emit a FutureWarning for calls of the form df[col].fillna(value, inplace=True). The column selected by df[col] is an intermediate object that behaves as a copy, so from pandas 3.0 onward the inplace call no longer updates df.

To modify the original DataFrame, use df[col] = df[col].fillna(value) or df.fillna({col: value}, inplace=True).
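
As a minimal sketch of the dictionary form, `df.fillna({col: value}, inplace=True)`, the example below fills both columns in one call made on the DataFrame itself, so no intermediate column object is involved. The names `df_demo` and `fill_values` are illustrative assumptions, and the data repeats the sample from the cell above.

```python
import numpy as np
import pandas as pd

# Same sample data as the cell above (rebuilt here so the sketch is self-contained)
df_demo = pd.DataFrame({
    'Age': [25, 30, np.nan, 22, 28, np.nan, 35],
    'Salary': [50000, 60000, 55000, np.nan, 58000, 62000, np.nan],
})

# Map each column name to the value used to fill its missing entries
fill_values = {
    'Age': df_demo['Age'].median(),
    'Salary': df_demo['Salary'].median(),
}

# inplace=True is called on the DataFrame, not on df_demo['Age'],
# so this is not chained assignment
df_demo.fillna(fill_values, inplace=True)

print(df_demo.isnull().sum())
```

Whether to use `inplace=True` here or to write `df_demo = df_demo.fillna(fill_values)` is largely a matter of style. Both avoid the chained call.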
