# Handling Missing Data

In [46]:
import pandas as pd

Next, we look at how to handle missing values in a Series or a DataFrame.

Suppose we have the following DataFrame:

In [47]:
data = pd.DataFrame([{"Name":"Rubab", "Age":13}, {"Name":"Hira", "Age":12}, {"Name":"Fatima", "Age":15}, {"Age":12}, {"Name":"Hira"}])

In [48]:
data

|   | Name | Age |
|---|------|-----|
| 0 | Rubab | 13.0 |
| 1 | Hira | 12.0 |
| 2 | Fatima | 15.0 |
| 3 | NaN | 12.0 |
| 4 | Hira | NaN |


I have left two values missing in the data: the Name in row 3 and the Age in row 4. This is a small DataFrame of only 5 rows and 2 columns, but real-life datasets can run to hundreds of thousands of rows.

## How to find out the null values in the data

In [49]:
data.isnull()

|   | Name | Age |
|---|------|-----|
| 0 | False | False |
| 1 | False | False |
| 2 | False | False |
| 3 | True | False |
| 4 | False | True |


Wherever there is a null value, True is shown; every other cell shows False.

In [50]:
data.isnull().sum()

Name    1
Age     1
dtype: int64

The code above shows the number of null values in each column.
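Besides the raw counts, it can help to see what fraction of each column is missing. The following is a minimal sketch that rebuilds the same sample DataFrame so it runs on its own; the variable names `null_counts` and `null_share` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same sample frame as above: a dict without a key yields NaN in that column.
data = pd.DataFrame([
    {"Name": "Rubab", "Age": 13},
    {"Name": "Hira", "Age": 12},
    {"Name": "Fatima", "Age": 15},
    {"Age": 12},
    {"Name": "Hira"},
])

# isna() is an alias of isnull(); both return the same boolean mask.
null_counts = data.isna().sum()

# The mean of a boolean column is the fraction of True values,
# so this is the share of missing entries per column (1 of 5 here).
null_share = data.isna().mean()

print(null_counts)
print(null_share)
```

On a large dataset the share is usually easier to read than the count, since it does not depend on the number of rows.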

To get the total number of null values in the whole dataset, apply sum() a second time:

In [51]:
data.isnull().sum().sum()

2

So there are 2 null values in the dataset
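Knowing how many nulls there are is often followed by the question of which rows contain them. The sketch below is one way to list those rows, assuming the same sample data; `rows_with_nulls` is a name chosen here for illustration.

```python
import pandas as pd

# Rebuild the sample frame so the example is self-contained.
data = pd.DataFrame([
    {"Name": "Rubab", "Age": 13},
    {"Name": "Hira", "Age": 12},
    {"Name": "Fatima", "Age": 15},
    {"Age": 12},
    {"Name": "Hira"},
])

# any(axis=1) checks each row: True if at least one cell in that row is null.
has_null = data.isnull().any(axis=1)

# Boolean indexing keeps only the rows flagged above (rows 3 and 4 here).
rows_with_nulls = data[has_null]

print(rows_with_nulls)
```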

## How to remove the null values from the dataset

In [52]:
data.dropna()

|   | Name | Age |
|---|------|-----|
| 0 | Rubab | 13.0 |
| 1 | Hira | 12.0 |
| 2 | Fatima | 15.0 |


The rows containing null values are dropped from the result.
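By default, dropna() removes a row if any of its values is null. It also accepts a few parameters that change this behaviour. The sketch below shows some of them on the same sample data; the result names (`drop_any`, `drop_all`, and so on) are made up for this example.

```python
import pandas as pd

# Rebuild the sample frame so the example is self-contained.
data = pd.DataFrame([
    {"Name": "Rubab", "Age": 13},
    {"Name": "Hira", "Age": 12},
    {"Name": "Fatima", "Age": 15},
    {"Age": 12},
    {"Name": "Hira"},
])

# Default (how="any"): drop a row if at least one value is null.
drop_any = data.dropna()

# how="all": drop a row only if every value in it is null (no such row here).
drop_all = data.dropna(how="all")

# subset: only look at the listed columns when deciding what to drop.
drop_missing_name = data.dropna(subset=["Name"])

# thresh: keep rows that have at least this many non-null values.
keep_complete = data.dropna(thresh=2)

# axis="columns": drop columns (not rows) that contain a null.
drop_cols = data.dropna(axis="columns")

print(len(drop_any), len(drop_all), len(drop_missing_name), drop_cols.shape)
```

None of these calls modifies `data`; each one returns a new DataFrame.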

However, the original DataFrame does not change:

In [53]:
data

|   | Name | Age |
|---|------|-----|
| 0 | Rubab | 13.0 |
| 1 | Hira | 12.0 |
| 2 | Fatima | 15.0 |
| 3 | NaN | 12.0 |
| 4 | Hira | NaN |


The null values persist in the original data, because dropna() returns a new DataFrame instead of modifying the one it is called on.

## To make the change permanent, do either of the following:

In [54]:
data = data.dropna()

In [55]:
data

|   | Name | Age |
|---|------|-----|
| 0 | Rubab | 13.0 |
| 1 | Hira | 12.0 |
| 2 | Fatima | 15.0 |


or

In [56]:
data.dropna(inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data.dropna(inplace=True)
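The warning above most likely appears because `data` was already the result of the earlier `data = data.dropna()`, so pandas (in the version used here) treats it as derived from another DataFrame; this is an assumption about the cause, not something the output states. A minimal sketch of one way to use `inplace=True` on an independent object is to work on an explicit copy. The name `clean` is introduced here only for illustration.

```python
import pandas as pd

# Rebuild the sample frame so the example is self-contained.
data = pd.DataFrame([
    {"Name": "Rubab", "Age": 13},
    {"Name": "Hira", "Age": 12},
    {"Name": "Fatima", "Age": 15},
    {"Age": 12},
    {"Name": "Hira"},
])

# copy() creates an independent DataFrame that owns its own data.
clean = data.copy()

# inplace=True modifies `clean` directly instead of returning a new frame.
clean.dropna(inplace=True)

print(clean)
print(data)  # the original still contains its two null values
```

Both approaches give the same rows; reassignment (`data = data.dropna()`) is usually the simpler of the two.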


***