# $\color{purple}{\text{Understanding Missing Data and How to Deal with It (Part 3)}}$

## $\color{purple}{\text{Visualizing Missing Data and Analysis of Missingness Patterns}}$

### $\color{purple}{\text{Libraries for this lesson}}$

In [None]:
import pandas as pd
import missingno

## $\color{purple}{\text{Visualizing Missing Data with missingno}}$

The following data set was taken from [Kaggle](https://www.kaggle.com/datasets/justinas/housing-in-london). This module draws heavily on this [tutorial](https://coderzcolumn.com/tutorials/data-science/missingno-visualize-missing-data-in-python).

In [None]:
london_housing = pd.read_csv('data/housing_in_london_yearly_variables.csv')

### $\color{purple}{\text{Bar Graph}}$
The first thing most people do to assess missing data is to count the nulls in each feature.

In [None]:
london_housing.isnull().sum()
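Raw counts are easier to compare when shown next to the share of rows they represent. The following is a minimal sketch, assuming a small hypothetical frame `toy` in place of `london_housing`; the column names and values are illustrative only and are not taken from the data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for london_housing (values are made up)
toy = pd.DataFrame({
    "area": ["a", "b", "c", "d"],
    "median_salary": [30000.0, np.nan, 32000.0, np.nan],
    "life_satisfaction": [np.nan, np.nan, np.nan, 7.5],
})

# Count of nulls and percentage of rows that are null, per feature
summary = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "pct_missing": toy.isnull().mean() * 100,
}).sort_values("pct_missing", ascending=False)

print(summary)
```

The same two expressions should work unchanged on `london_housing`.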

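The `bar` function of `missingno` presents this information graphically: each bar shows how many values in a feature are present, so shorter bars mean more missing data. The bars can be sorted (`sort`) and drawn on a logarithmic scale (`log=True`).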

In [None]:
missingno.bar(london_housing, color='deepskyblue', figsize=(10,5), fontsize=12, sort='descending')

### $\color{purple}{\text{Matrix}}$
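The matrix plot draws one row per record and one column per feature, leaving a white gap wherever a value is missing. Features whose gaps line up across rows tend to be missing together.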

In [None]:
missingno.matrix(london_housing, figsize=(10,5), fontsize=12, color=(0.27, 0.52, 1.0));
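The co-missing patterns that the matrix shows visually can also be tabulated. The following is a rough sketch, assuming a hypothetical frame `toy` rather than the London data: it counts how many rows share each combination of missing and present features.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (values are made up for illustration)
toy = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, np.nan, 5.0],
    "y": [1.0, np.nan, 3.0, np.nan, np.nan],
    "z": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Each distinct row of the boolean mask is one missingness pattern;
# value_counts reports how many records follow each pattern
patterns = toy.isnull().value_counts()

print(patterns)
```

In the printed index, `True` marks a missing feature, so a pattern with several `True` entries identifies features that are missing in the same records.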

### $\color{purple}{\text{Heatmap}}$
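The heatmap quantifies co-missingness as a nullity correlation between each pair of features, ranging from -1 (one is missing exactly when the other is present) to 1 (both are always missing together). A value near 0 means the missingness of one feature says little about the other.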

In [None]:
missingno.heatmap(london_housing, cmap="RdYlGn", figsize=(10,5), fontsize=12);
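The numbers behind a nullity correlation can be approximated directly in pandas by correlating the 0/1 missingness indicators. This is a sketch under the assumption that a plain Pearson correlation of the null mask is a fair stand-in for what the heatmap displays; the frame `toy` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (values are made up for illustration)
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
    "b": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],        # missing exactly where a is missing
    "c": [np.nan, 2.0, np.nan, 4.0, np.nan, np.nan],  # missing exactly where a is present
})

# 1 where a value is missing, 0 where it is present
null_mask = toy.isnull().astype(int)

# Pairwise Pearson correlation of the missingness indicators
nullity_corr = null_mask.corr()

print(nullity_corr.round(2))
```

Here `a` and `b` correlate at 1 because they are always missing together, while `a` and `c` correlate at -1 because one is missing exactly when the other is present.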

### $\color{purple}{\text{Dendrogram}}$
The dendrogram groups features into clusters with similar missingness patterns. It can be a valuable roadmap to guide imputation strategy.

In [None]:
missingno.dendrogram(london_housing, figsize=(10,5), fontsize=12);

## $\color{purple}{\text{Takeaways}}$
* Visualization during exploratory data analysis helps quickly gauge the extent of missingness.
* The matrix, heatmap, and dendrogram make different missingness patterns easy to assess.
* These plots can be used to guide imputation strategies.

### $\color{purple}{\text{References}}$
 * missingno source: https://github.com/ResidentMario/missingno
 * _missingno - Visualize Missing Data_: https://coderzcolumn.com/tutorials/data-science/missingno-visualize-missing-data-in-python
 * McDonald, A.: Using the missingno Python library to Identify and Visualise Missing Data Prior to Machine Learning, _Towards Data Science_, https://towardsdatascience.com/using-the-missingno-python-library-to-identify-and-visualise-missing-data-prior-to-machine-learning-34c8c5b5f009