# Working with Missing Data in Pandas

In Pandas, data is missing when a value was never recorded or was not collected properly. Missing values are represented in two ways:

**None**: A Python object used to represent missing values in object-type arrays.

**NaN**: A special floating-point value ("Not a Number"), available in NumPy as np.nan, which is recognized by all systems that use the IEEE 754 floating-point standard.

We will see how to detect, handle and fill missing values in a DataFrame to keep the data clean and ready for analysis.
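
As a minimal sketch of the two representations described above, the snippet below builds one object-type Series and one numeric Series. The variable names and values are illustrative assumptions, not part of the original notebook. It suggests that Pandas flags both None and NaN as missing, and that None in a numeric Series is converted to NaN.

```python
import numpy as np
import pandas as pd

# Object-dtype Series: None is kept as a Python object next to the strings
names = pd.Series(["Amit", None, "Raj"], dtype=object)

# Numeric Series: None is converted to NaN, so the dtype becomes float64
scores = pd.Series([100, None, 95])

print(names.isnull().tolist())   # [False, True, False]
print(scores.isnull().tolist())  # [False, True, False]
print(scores.dtype)              # float64
```

Because NaN is a floating-point value, an integer column that contains a missing entry is stored as float64 by default.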

## Checking Missing Values in Pandas

### 1. Using isnull()

**isnull()** returns a DataFrame of Boolean values in which True marks a missing entry (NaN or None). It is a simple first step when we want to find and then fill missing data in a dataset.

#### Example 1: Finding Missing Values in a DataFrame

In [None]:
# Importing libraries
import pandas as pd
import numpy as np

In [2]:
data={"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
        "Third Score": [np.nan, 40, 80, 98]}
dataFrame=pd.DataFrame(data)

nullData=dataFrame.isnull()
nullData

Unnamed: 0,First Score,Second Score,Third Score
0,False,False,True
1,False,False,False
2,True,False,False
3,False,True,False
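
A Boolean mask like the one above is often easier to read once it is summarized. The following sketch rebuilds the same scores table under the assumed name scores and counts the missing entries per column, then selects the rows that contain at least one missing value. It is one possible way to summarize the mask, not the only one.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "First Score": [100, 90, np.nan, 95],
    "Second Score": [30, 45, 56, np.nan],
    "Third Score": [np.nan, 40, 80, 98],
})

# True counts as 1 when summed, so this gives the missing count per column
missing_per_column = scores.isnull().sum()

# Total number of missing cells in the whole DataFrame
total_missing = int(missing_per_column.sum())

# Rows in which at least one column is missing
rows_with_missing = scores[scores.isnull().any(axis=1)]

print(missing_per_column)
print(total_missing)                     # 3
print(rows_with_missing.index.tolist())  # [0, 2, 3]
```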


#### Example 2: Filtering Data Based on Missing Values

Here we use a sample employee dataset. The isnull() function is applied to the "Gender" column to filter and display the rows in which the gender is missing.

In [7]:
employees=pd.read_csv("../../datasets/employees.csv")
employees.head()

boolSeries=pd.isnull(employees["Gender"])
missingGenderData=employees[boolSeries]
missingGenderData

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
20,Lois,,4/22/1995,7:18 PM,64714,4.934,True,Legal
22,Joshua,,3/8/2012,1:58 AM,90816,18.816,True,Client Services
27,Scott,,7/11/1991,6:58 PM,122367,5.218,False,Legal
31,Joyce,,2/20/2005,2:40 PM,88657,12.752,False,Product
41,Christine,,6/28/2015,1:08 AM,66582,11.308,True,Business Development
...,...,...,...,...,...,...,...,...
961,Antonio,,6/18/1989,9:37 PM,103050,3.050,False,Legal
972,Victor,,7/28/2006,2:49 PM,76381,11.159,True,Sales
985,Stephen,,7/10/1983,8:10 PM,85668,1.909,False,Legal
989,Justin,,2/10/1991,4:58 PM,38344,3.794,False,Legal


In [None]:
# Filtering the rows of Example 1 in which "First Score" is missing

boolSeries1=nullData["First Score"]
missingFirstScoreData=dataFrame[boolSeries1]
missingFirstScoreData

### 2. Using isna()

**isna()** returns a DataFrame of Boolean values where True indicates missing data (NaN). It detects missing values in exactly the same way as isnull(), which is an alias of isna().

#### Example: Finding Missing Values in a DataFrame

In [None]:
data = {'Name': ['Amit', 'Sita', np.nan, 'Raj'],
        'Age': [25, np.nan, 22, 28]}

dataFrame2=pd.DataFrame(data)
dataFrame2.isna()

Unnamed: 0,Name,Age
0,False,False
1,False,True
2,True,False
3,False,False


In [13]:
# Applying isna() to the DataFrame from the isnull() example
dataFrame.isna()

Unnamed: 0,First Score,Second Score,Third Score
0,False,False,True
1,False,False,False
2,True,False,False
3,False,True,False
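
The two outputs above match what isnull() produced earlier. As a small check, the sketch below compares both methods on the same data. The variable names people and same_mask are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    "Name": ["Amit", "Sita", np.nan, "Raj"],
    "Age": [25, np.nan, 22, 28],
})

# equals() is True only if both masks have the same shape and values
same_mask = people.isna().equals(people.isnull())
print(same_mask)  # True

# The top-level functions behave the same way on scalars
print(pd.isna(np.nan), pd.isnull(None))  # True True
```

Since both names give the same result, choosing one and using it consistently keeps the code easier to read.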


### 3. Checking for Non-Missing Values Using notnull()

The **notnull()** function returns a DataFrame of Boolean values where True indicates non-missing (valid) data. It is useful when we want to focus only on the rows that have valid values.

#### Example 1: Identifying Non-Missing Values in a DataFrame

In [14]:
notNullData=dataFrame.notnull()
notNullData

Unnamed: 0,First Score,Second Score,Third Score
0,True,True,False
1,True,True,True
2,False,True,True
3,True,False,True


#### Example 2: Filtering Data with Non-Missing Values

The **notnull()** function is applied to the "Gender" column to filter and display the rows in which the gender is not missing.

In [17]:
boolTrueGender=employees["Gender"].notnull()
validGenderData=employees[boolTrueGender]
validGenderData

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,8/6/1993,12:42 PM,97308,6.945,True,Marketing
1,Thomas,Male,3/31/1996,6:53 AM,61933,4.170,True,
2,Maria,Female,4/23/1993,11:17 AM,130590,11.858,False,Finance
3,Jerry,Male,3/4/2005,1:00 PM,138705,9.340,True,Finance
4,Larry,Male,1/24/1998,4:47 PM,101004,1.389,True,Client Services
...,...,...,...,...,...,...,...,...
994,George,Male,6/21/2013,5:47 PM,98874,4.479,True,Marketing
996,Phillip,Male,1/31/1984,6:30 AM,42392,19.675,False,Finance
997,Russell,Male,5/20/2013,12:39 PM,96914,1.421,False,Product
998,Larry,Male,4/20/2013,4:45 PM,60500,11.985,False,Business Development
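
The employees.csv file is needed to run the cells above. As a self-contained sketch, the snippet below uses a small stand-in DataFrame (the name staff and its four rows are an assumption, chosen to resemble the dataset) to show that the notnull() mask and its negation split the rows into valid and missing groups.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv, reduced to four rows
staff = pd.DataFrame({
    "First Name": ["Douglas", "Lois", "Maria", "Joshua"],
    "Gender": ["Male", np.nan, "Female", np.nan],
    "Team": ["Marketing", "Legal", "Finance", "Client Services"],
})

# True where Gender is present
has_gender = staff["Gender"].notnull()

valid_rows = staff[has_gender]     # rows with a recorded gender
missing_rows = staff[~has_gender]  # ~ inverts the mask: rows without one

print(valid_rows["First Name"].tolist())    # ['Douglas', 'Maria']
print(missing_rows["First Name"].tolist())  # ['Lois', 'Joshua']
```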


## Filling Missing Values in Pandas

The following functions allow us to replace missing values with a specified value or to estimate them with interpolation methods.
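
As a brief preview of the two approaches, the sketch below fills a small Series once with a fixed value and once by interpolation. The Series readings and its values are assumptions made for illustration. With its default settings, interpolate() fills each gap linearly between the neighboring values.

```python
import numpy as np
import pandas as pd

readings = pd.Series([10.0, np.nan, 30.0, np.nan, 50.0])

# Replace every missing value with a specified constant
filled_constant = readings.fillna(0)

# Estimate missing values from their neighbors (linear by default)
filled_linear = readings.interpolate()

print(filled_constant.tolist())  # [10.0, 0.0, 30.0, 0.0, 50.0]
print(filled_linear.tolist())    # [10.0, 20.0, 30.0, 40.0, 50.0]
```

A constant fill keeps the data simple but can distort averages, while interpolation assumes the values change smoothly between known points.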