# A New Package in Town: Missingno
## Visualize missing values for deep insights
<img src='images/puzzle.jpg'></img>
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://pixabay.com/users/422737-422737/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=654110'>Hebi B.</a>
        on 
        <a href='https://pixabay.com/?utm_source=link-attribution&utm_medium=referral&utm_campaign=image&utm_content=654110'>Pixabay</a>
    </strong>
</figcaption>

### Introduction <small id='intro'></small>

Missing data is an unavoidable challenge in data science. Because it is so common, there are many techniques, methods and packages for imputing it. This can be both a blessing and a curse: while having a wide range of techniques in your toolbelt may prepare you for any obstacle, choosing the right one for your specific case can be a real head-scratcher.

But is every case really unique? It turns out it is not. Regardless of the data, missingness falls into one of three categories: Missing Completely At Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR).
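To make the three categories concrete, here is a minimal sketch, assuming a hypothetical fully observed survey with only `age` and `IQ` columns, that deletes `IQ` values under each mechanism. The rates (5% and 30%) and the thresholds (age over 60, IQ under 90) are arbitrary illustrative choices, not properties of the dataset used later in this article.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 10_000

# Hypothetical, fully observed survey: age and IQ for every respondent
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "IQ": rng.normal(100, 15, size=n).round(),
})

# MCAR: every IQ value has the same 5% chance of being dropped,
# unrelated to anything else in the data
mcar = df.copy()
mcar.loc[rng.random(n) < 0.05, "IQ"] = np.nan

# MAR: the chance of a missing IQ depends on another *observed* column (age)
mar = df.copy()
mar.loc[(df["age"] > 60) & (rng.random(n) < 0.30), "IQ"] = np.nan

# MNAR: the chance of a missing IQ depends on the IQ value itself,
# e.g. respondents with low scores are less willing to report them
mnar = df.copy()
mnar.loc[(df["IQ"] < 90) & (rng.random(n) < 0.30), "IQ"] = np.nan

# Overall share of missing IQ values under each mechanism
for name, frame in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(name, round(frame["IQ"].isna().mean(), 3))
```

Note the asymmetry: in the MAR frame the cause of the gaps (age) is still visible, while in the MNAR frame the low scores that would reveal the cause are exactly the values that were deleted.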

Each of these three categories has its own patterns and features. Finding out which one your missing values fall into can significantly narrow down the set of solutions you can apply. The types differ substantially, and blindly applying an arbitrary solution may seriously compromise the later stages of your workflow.
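One quick, informal check (not a formal statistical test) is to compare a fully observed column between rows where another column is missing and rows where it is present. If the two groups differ noticeably, MCAR becomes implausible. The check cannot separate MAR from MNAR, because MNAR depends on values that were never recorded. The helper name `compare_when_missing` and the sample data below are hypothetical.

```python
import numpy as np
import pandas as pd

def compare_when_missing(df: pd.DataFrame, target: str, other: str) -> pd.Series:
    """Mean of `other` in rows where `target` is present vs. missing."""
    is_missing = df[target].isna()
    return pd.Series({
        f"{target} present": df.loc[~is_missing, other].mean(),
        f"{target} missing": df.loc[is_missing, other].mean(),
    })

# Hypothetical sample in which older respondents tend to skip the IQ question
sample = pd.DataFrame({
    "age": [22, 25, 31, 38, 64, 67, 70, 73],
    "IQ": [101, 97, 110, 104, np.nan, 99, np.nan, np.nan],
})

print(compare_when_missing(sample, target="IQ", other="age"))
```

Here respondents with a missing IQ are on average much older (69.0 vs. 36.6), which points away from MCAR.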

In the following sections, you will learn about the differences between the missingness categories in detail, with examples. We will mainly take a visual approach, using the `missingno` package to find patterns of missingness.
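Visual tools like `missingno` are built on a simple idea: a nullity mask that marks each cell as missing or present. Its correlation heatmap reports nullity correlation, that is, how strongly the presence or absence of one variable affects the presence of another. The pandas-only sketch below approximates those quantities on a hypothetical toy frame; it is an illustration of the concept, not the library's own code.

```python
import pandas as pd

# Hypothetical frame in which last_name tends to be missing whenever first_name is
df = pd.DataFrame({
    "first_name": ["Ann", None, "Bo", None, "Cy", "Di"],
    "last_name":  ["Lee", None, "Kim", None, None, "Ray"],
    "age":        [34, 51, 27, 45, 38, 60],
})

# Nullity mask: 1 where a value is missing, 0 where it is present
nullity = df.isnull().astype(int)

# Number of missing fields in each row
print(nullity.sum(axis=1))

# Keep only columns whose nullity actually varies; a column that is always
# present (like age) or always missing has an undefined correlation
nullity = nullity.loc[:, nullity.var() > 0]

# Nullity correlation: Pearson correlation between the mask columns
corr = nullity.corr()
print(corr)
```

A value near 1 means two columns tend to be missing together; here `first_name` and `last_name` have a nullity correlation of about 0.71.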

### Setup <small id='setup'></small>

```python
# Scientific libraries
import numpy as np
import pandas as pd
# Missingno to be imported later

# Visual setup (IPython/Jupyter magic: high-resolution inline plots)
%config InlineBackend.figure_format = 'retina'
```

I generated a synthetic survey dataset to show you examples of the different missingness types:

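```python
survey = pd.read_csv('data/missingness.csv')
survey.sample(5)
```

|      | first_name | last_name | age | favorite_os | IQ    |
|-----:|------------|-----------|----:|-------------|------:|
| 9571 | Andrea     | NaN       | 61  | AndroidOS   | 113.0 |
| 6969 | Daisy      | Powell    | 61  | AndroidOS   | 101.0 |
| 5941 | Josie      | Adams     | 65  | AndroidOS   | 106.0 |
| 1516 | Brooks     | White     | 46  | iOS         | 104.0 |
| 1761 | Karter     | Russel    | 49  | iOS         | 107.0 |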


```python
survey.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   first_name   9774 non-null   object 
 1   last_name    9770 non-null   object 
 2   age          10000 non-null  int64  
 3   favorite_os  9309 non-null   object 
 4   IQ           9432 non-null   float64
dtypes: float64(1), int64(1), object(3)
memory usage: 390.8+ KB
```
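The `Non-Null Count` column already tells you how much is missing. With 10,000 rows, `favorite_os` lacks 691 values (6.91%), `IQ` 568 (5.68%), `last_name` 230 (2.30%) and `first_name` 226 (2.26%), while `age` is complete. A small helper (the name `missing_summary` is hypothetical) computes these figures directly; it is demonstrated on a toy frame here, and calling it on `survey` should reproduce the numbers above.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    counts = df.isnull().sum()
    return (
        pd.DataFrame({"n_missing": counts, "pct_missing": 100 * counts / len(df)})
        .sort_values("n_missing", ascending=False)
    )

# Hypothetical toy frame with the same kinds of columns as the survey
demo = pd.DataFrame({
    "favorite_os": ["iOS", None, "AndroidOS", None],
    "IQ": [101.0, None, 99.0, 120.0],
    "age": [30, 41, 52, 63],
})

print(missing_summary(demo))
```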


### Missingno Basics <small id='basics'></small>

### Missingness Types: MCAR <small id='mcar'></small>

### Missingness Types: MAR <small id='mar'></small>

### Missingness Types: MNAR or NMAR <small id='mnar'></small>

### Missingness Correlation Heatmaps <small id='heatmap'></small>