# Check for Missing Values in a DataFrame
In this guide, we discuss the pandas commands used to check for missing values in a DataFrame:
0. Import Pandas library
1. Load dataset
2. Overview of the dataset - .head() and .tail()
3. Check for missing values - .isna()
4. Check for missing values - .isnull()
5. Check for non-missing values - .notna()
6. Check for non-missing values - .notnull()

We practice these commands on the Titanic training dataset (train.csv).
The dataset can be downloaded from the Kaggle website:

https://www.kaggle.com/c/titanic/data?select=train.csv

---------

### List of methods and properties discussed in this notebook

**Import libraries**
- import pandas as pd
- import numpy as np

**Load dataset**
- pd.read_csv()

**Overview of the dataset**
- df.head()
- df.tail()

**Check for missing values**
- pd.isna(df)
- pd.isnull(df)

**Check for non-missing values**
- pd.notna(df)
- pd.notnull(df)

-----------


## 0. Import Pandas library

In [19]:
# First, import the pandas library (NumPy is imported alongside it)
import pandas as pd
import numpy as np

<hr style="border:2px solid blue"> </hr>

## 1. Load dataset

In [20]:
# Next, load the dataset into a pandas DataFrame
df = pd.read_csv('train.csv')

# Since the dataset is stored in a CSV file, we use pd.read_csv().
# pandas provides other functions for other kinds of input data.
# More details can be found here:
# https://pandas.pydata.org/pandas-docs/stable/reference/io.html

# That page covers the following formats:
# Table, CSV, Clipboard, Excel, JSON, HTML, XML, LaTeX, HDFStore: PyTables (HDF5), Feather, Parquet, ORC, SAS, SPSS, SQL, Google BigQuery and Stata
# Note: LaTeX is supported for output only; pandas has no LaTeX reader.
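
To try pd.read_csv() without downloading anything, the sketch below reads a small in-memory CSV. The three rows and the names csv_text and sample are made up for illustration; they only mimic a few Titanic columns. It shows that empty fields in the file become missing values in the DataFrame.

```python
import io

import pandas as pd

# Hypothetical three-row sample that mimics a few Titanic columns.
csv_text = """PassengerId,Name,Age,Cabin
1,"Braund, Mr. Owen Harris",22,
2,"Cumings, Mrs. John Bradley",38,C85
3,"Heikkinen, Miss. Laina",,
"""

# read_csv accepts a file path or any file-like object.
# Empty fields are read as missing values (NaN).
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample)
```

Because the Age column contains an empty field, pandas stores it as a floating-point column, which is why ages appear as 22.0 rather than 22.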

<hr style="border:2px solid blue"> </hr>

## 2. Overview of the dataset - head(), tail()

In [21]:
# Look at the first 5 rows of the dataset.
# head(n) takes the number of rows to show as a parameter; by default, it is 5.
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [22]:
# Look at the last 5 rows of the dataset.
# tail(n) takes the number of rows to show as a parameter; by default, it is 5.
df.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q
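
The row-count parameter mentioned above can be sketched on a small frame. The ten-row DataFrame named demo is hypothetical and stands in for the Titanic data.

```python
import pandas as pd

# Hypothetical ten-row frame, used only to show the row-count argument.
demo = pd.DataFrame({"value": range(10)})

first_three = demo.head(3)        # rows 0, 1, 2
last_two = demo.tail(2)           # rows 8, 9
all_but_last_two = demo.head(-2)  # a negative n drops rows from the end

print(first_three)
print(last_two)
print(len(all_but_last_two))
```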


<hr style="border:2px solid blue"> </hr>

## 3. Check for missing values - .isna()

In [23]:
# Check for missing values.
# pd.isna() returns a boolean DataFrame of the same shape; True marks a missing value.
pd.isna(df)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,False,False,False,False,False,False,False,False,False,False,True,False
1,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,True,False
3,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...
886,False,False,False,False,False,False,False,False,False,False,True,False
887,False,False,False,False,False,False,False,False,False,False,False,False
888,False,False,False,False,False,True,False,False,False,False,True,False
889,False,False,False,False,False,False,False,False,False,False,False,False


In [24]:
# Count the missing values in each column (each True counts as 1)
pd.isna(df).sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
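
Beyond raw counts, the same boolean mask can give the share of missing values per column and the list of columns that have any gaps. The following is a minimal sketch on a made-up four-row frame (demo is not the Titanic data); it uses the DataFrame method df.isna(), which gives the same result as pd.isna(df).

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with gaps similar to the Age and Cabin columns.
demo = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

missing_count = demo.isna().sum()          # missing values per column
missing_share = demo.isna().mean() * 100   # percentage missing per column
cols_with_gaps = demo.columns[demo.isna().any()].tolist()

print(missing_count)
print(missing_share)
print(cols_with_gaps)
```

Taking the mean of a boolean column works because True is treated as 1 and False as 0, so the mean is the fraction of missing entries.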

<hr style="border:2px solid blue"> </hr>

## 4. Check for missing values - .isnull()

pd.isnull() detects missing values in a scalar or array-like object. It is an alias of pd.isna(), so both return the same result.

The function takes a scalar or array-like object and indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetime-like arrays).
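
The kinds of values treated as missing can be checked directly on scalars. This is a small sketch; the list named values is just an illustration. Note that an empty string and zero are ordinary values, not missing ones.

```python
import numpy as np
import pandas as pd

# Three missing markers followed by two ordinary values.
values = [np.nan, None, pd.NaT, "", 0]
results = [bool(pd.isnull(v)) for v in values]

print(results)  # [True, True, True, False, False]

# In an object array, both None and NaN count as missing.
mixed = pd.Series([1, None, "a", np.nan], dtype=object)
print(pd.isnull(mixed).tolist())  # [False, True, False, True]
```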

In [25]:
# Count the missing values in each column
pd.isnull(df).sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
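
The output matches the one from pd.isna() in the previous section. A quick way to confirm that the two functions agree is to compare their results, sketched here on a made-up two-row frame (demo is hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with one gap in each column.
demo = pd.DataFrame({"Age": [22.0, np.nan], "Cabin": [None, "C85"]})

# Compare the top-level functions, then the DataFrame methods.
same_functions = pd.isnull(demo).equals(pd.isna(demo))
same_methods = demo.isnull().equals(demo.isna())

print(same_functions)  # True
print(same_methods)    # True
```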

<hr style="border:2px solid blue"> </hr>

## 5. Check for non-missing values - .notna()

In [26]:
# Count the non-missing values in each column
pd.notna(df).sum()

PassengerId    891
Survived       891
Pclass         891
Name           891
Sex            891
Age            714
SibSp          891
Parch          891
Ticket         891
Fare           891
Cabin          204
Embarked       889
dtype: int64
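
These counts are the complement of the missing counts from section 3: for Age, 714 + 177 = 891; for Cabin, 204 + 687 = 891; for Embarked, 889 + 2 = 891. The sketch below checks the same relationship on a made-up four-row frame (demo is hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with gaps in both columns.
demo = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Embarked": ["S", "C", None, "Q"],
})

present = pd.notna(demo).sum()
missing = pd.isna(demo).sum()

# Every cell is either present or missing, so the counts add up to the row count.
print(present + missing)

# DataFrame.count() also reports the non-missing values per column.
print(demo.count())
```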

<hr style="border:2px solid blue"> </hr>

## 6. Check for non-missing values - .notnull()

In [27]:
# Count the non-missing values in each column.
# pd.notnull() is an alias of pd.notna(), so the result is the same.
pd.notnull(df).sum()

PassengerId    891
Survived       891
Pclass         891
Name           891
Sex            891
Age            714
SibSp          891
Parch          891
Ticket         891
Fare           891
Cabin          204
Embarked       889
dtype: int64
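
A common use of the non-missing mask is to keep only the rows where a given column has a value. The following is a minimal sketch on a made-up three-row frame; the names demo and age_known are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one missing age.
demo = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [22.0, np.nan, 26.0],
})

# Boolean indexing with the mask keeps rows whose Age is present.
age_known = demo[demo["Age"].notnull()]

print(age_known)
```

For this frame, demo.dropna(subset=["Age"]) selects the same rows.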

<hr style="border:2px solid blue"> </hr>

# **End of Sheet**