# Missing Values Cheatsheet

Techniques for handling missing values in pandas DataFrames.

In [1]:
# Load data
import pandas as pd
df = pd.read_csv('./titanic/train.csv')
print('Data loaded')

Data loaded


In [2]:
# Check for missing values
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
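
Raw counts are easier to judge next to the share of each column that is missing. The sketch below uses a small made-up frame (`toy` and its values are illustrative, not the Titanic data) and relies on the mean of a boolean mask being the fraction of `True` entries.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; the values are made up.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# isnull() gives a boolean mask; its column-wise mean is the missing fraction.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Sorting in descending order puts the sparsest columns first, which helps when deciding whether to drop or fill them.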

In [3]:
# Drop rows with missing values
df_dropped = df.dropna()
print(df_dropped.shape)

(183, 12)


In [4]:
# Drop columns with missing values
df_dropped_col = df.dropna(axis=1)
print(df_dropped_col.shape)

(891, 9)
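
Dropping on any missing value is often too aggressive: above, it leaves 183 of 891 rows. A gentler variant, sketched here on a made-up frame (`toy` is hypothetical), restricts the check to chosen columns with `subset` or keeps columns that reach a minimum number of non-missing values with `thresh`.

```python
import numpy as np
import pandas as pd

# Toy frame; the values are made up for illustration.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})

# Drop a row only when one of the listed columns is missing.
rows_kept = toy.dropna(subset=["Age", "Embarked"])

# Keep only columns that have at least 3 non-missing values.
cols_kept = toy.dropna(axis=1, thresh=3)

print(rows_kept.shape, cols_kept.shape)
```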


In [5]:
# Fill every missing value with a constant (0 here, which also lands in text columns such as Cabin and Embarked)
df_filled = df.fillna(0)
print(df_filled.isnull().sum())

PassengerId    0
Survived       0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Cabin          0
Embarked       0
dtype: int64


In [6]:
# Fill missing values with the column mean.
# Assign the result back: df['Age'].fillna(..., inplace=True) operates on a copy
# and will stop working in pandas 3.0.
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df['Age'].isnull().sum())

0


In [7]:
# Fill missing values with the column median
# (Fare has no missing values in this file, so this is a no-op here)
df['Fare'] = df['Fare'].fillna(df['Fare'].median())


In [8]:
# Fill missing values with the mode (the most frequent value)
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])


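The mean, median, and mode fills can also be applied in one call by passing `fillna` a dict that maps column names to fill values. This is a minimal sketch on a made-up frame (`toy` and `fill_values` are hypothetical names), not a recommendation of which statistic suits which column.

```python
import numpy as np
import pandas as pd

# Toy frame; the values are made up for illustration.
toy = pd.DataFrame({
    "Age": [20.0, np.nan, 40.0],
    "Fare": [10.0, 30.0, np.nan],
    "Embarked": ["S", None, "S"],
})

# One fill value per column, computed from the non-missing entries.
fill_values = {
    "Age": toy["Age"].mean(),
    "Fare": toy["Fare"].median(),
    "Embarked": toy["Embarked"].mode()[0],
}

# fillna returns a new frame; toy itself is left unchanged.
toy_filled = toy.fillna(fill_values)
print(toy_filled)
```

Because the result is a new DataFrame, this form avoids the chained `inplace=True` pattern altogether.
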
In [9]:
# Forward fill: carry the last valid value forward
# (fillna(method='ffill') is deprecated; call ffill() directly)
df_ffill = df.ffill()


In [10]:
# Backward fill: pull the next valid value backward
# (fillna(method='bfill') is deprecated; call bfill() directly)
df_bfill = df.bfill()


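To see what forward and backward fill do to a gap, it helps to run them on a short Series. The sketch below uses made-up values and also shows the `limit` argument, which caps how many consecutive missing entries are filled.

```python
import numpy as np
import pandas as pd

# Made-up Series with a two-value gap and a trailing gap.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()          # carry the last seen value forward
backward = s.bfill()         # pull the next seen value backward
limited = s.ffill(limit=1)   # fill at most one consecutive missing value

print(pd.DataFrame({
    "raw": s,
    "ffill": forward,
    "bfill": backward,
    "ffill_limit_1": limited,
}))
```

Note that `bfill` leaves the trailing value missing because nothing follows it, and `ffill` would likewise leave a leading gap untouched.
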
In [11]:
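# Interpolate numeric columns only (interpolating object-dtype columns is deprecated)
num_cols = df.select_dtypes(include='number').columns
df_interp = df.copy()
df_interp[num_cols] = df[num_cols].interpolate()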

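Interpolation is easiest to follow on a short numeric Series. This sketch uses made-up values and the default linear method, which fills a gap with evenly spaced values between its neighbors. With default settings a leading gap is left as is, since there is no earlier value to interpolate from.

```python
import numpy as np
import pandas as pd

# Made-up Series with an interior gap.
s = pd.Series([10.0, np.nan, np.nan, 40.0])
linear = s.interpolate()  # default method is 'linear'
print(linear.tolist())

# A leading missing value stays missing under the defaults.
leading = pd.Series([np.nan, 1.0, np.nan, 3.0]).interpolate()
print(leading.tolist())
```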