# Missingness example

In [None]:
import pandas as pd
import numpy as np
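# Import only the names this notebook uses instead of a wildcard import
from pace.missingness import Missingness, Col, heatmap_data, value_bar_chart_data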

In [None]:
df = pd.read_csv("../data/test_data_merged_10000.csv", low_memory=False)

In [None]:
df.tail()

### Construction from a dataframe

In [None]:
m = Missingness.from_data_frame(df)

### Counting missingness patterns

`Missingness.counts()` returns each distinct missingness pattern, along with the number of records that match it (in the final column, `_count`).

In [None]:
m.counts().head()
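
To make the idea of a missingness pattern concrete, here is a minimal pandas-only sketch of the same kind of count. It is not how `pace` implements `counts()`; the toy dataframe, its column names, and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Toy data standing in for the real CSV (hypothetical columns and values).
toy = pd.DataFrame(
    {
        "Key": [1, 2, 3, 4, 5],
        "DIAG_01": ["A", None, "B", None, "C"],
        "DIAG_02": [None, None, "D", None, "E"],
    }
)

# True where a value is missing; each distinct row of this mask is one pattern.
mask = toy.isna()

# Group identical mask rows and count the records in each group,
# storing the result in a final `_count` column.
pattern_counts = mask.groupby(list(mask.columns)).size().reset_index(name="_count")
print(pattern_counts)
```

Each row of `pattern_counts` is one pattern (`True` marks a missing column), and the `_count` values sum to the number of records.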

You can also pass a subset of pattern keys. This is equivalent to `m.counts().loc[selection]`, but may be faster:

In [None]:
m.counts(pattern_selection=[2, 5])

`heatmap_data` is a convenience function that uses these counts to build the data for a heatmap:

In [None]:
heatmap_data(m)

### Selecting a subset of columns

This returns a new `Missingness` object based on the column selection. Patterns that were distinct before may be merged, because they can become identical once the other columns are dropped.

In [None]:
m.select_columns(["Key", "Num_DIAG", "DIAG_01", "DIAG_02", "DIAG_03"]).counts()
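
The merging of patterns can be seen in a small pandas-only sketch. This illustrates the effect, not the `pace` implementation; the toy data and the helper `count_patterns` are hypothetical.

```python
import pandas as pd

# Toy data with four records, each with a different missingness pattern.
toy = pd.DataFrame(
    {
        "DIAG_01": ["A", None, "B", None],
        "DIAG_02": [None, None, "D", "E"],
        "DIAG_03": [None, "F", None, None],
    }
)


def count_patterns(frame):
    """Count the records sharing each distinct missingness pattern."""
    mask = frame.isna()
    return mask.groupby(list(mask.columns)).size().reset_index(name="_count")


# All three columns: every record has its own pattern.
full = count_patterns(toy)

# Dropping DIAG_02 makes the first and third records indistinguishable,
# so their patterns merge into one with a count of 2.
reduced = count_patterns(toy[["DIAG_01", "DIAG_03"]])

print(full)
print(reduced)
```

The total of `_count` is unchanged by the selection; only the number of distinct patterns can shrink.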

### Describing missingness patterns

You can match on particular missingness patterns. For example, the following describes any pattern with `DIAG_02` and `OPDATE_02` missing (ignoring the other columns).

In [None]:
Col("DIAG_02") & Col("OPDATE_02")

You can use the description to select individual records from the dataframe. In the result, the index, `pattern_key`, identifies the missingness pattern, and `_index` is the record's index in the original dataframe.

In [None]:
m.matches(Col("DIAG_02") & Col("MYOPDATE_02"))

Extracting the matching records from the original dataframe:

In [None]:
df.loc[m.matches(Col("DIAG_02") & Col("MYOPDATE_02"))['_index']]
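
For comparison, the same kind of lookup can be sketched with plain pandas boolean masks. This is an illustration under assumptions, not the `pace` implementation: the toy dataframe, its index values, and the variable names are hypothetical.

```python
import pandas as pd

# Toy data with a non-default index, so the index labels are easy to follow.
toy = pd.DataFrame(
    {
        "DIAG_02": ["A", None, None, "B"],
        "OPDATE_02": [None, None, "2020-01-01", None],
    },
    index=[10, 11, 12, 13],
)

# A record matches when both columns are missing; other columns are ignored.
both_missing = toy["DIAG_02"].isna() & toy["OPDATE_02"].isna()

# Index labels of the matching records in the original dataframe.
matching_index = toy.index[both_missing]

# Use those labels to pull the full records back out.
matched = toy.loc[matching_index]
print(matched)
```

Keeping the index labels separate from the records mirrors the two-step approach above: first find which records match, then extract them with `.loc`.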

A helper function (based on `matches`) to extract the missingness counts for each column:

In [None]:
value_bar_chart_data(m)
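
As a rough cross-check, per-column missing counts can also be computed directly with pandas. This sketch assumes nothing about the exact output format of `value_bar_chart_data`; the toy data and variable names are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical columns and values).
toy = pd.DataFrame(
    {
        "DIAG_01": ["A", None, "B", None],
        "DIAG_02": [None, None, "D", None],
    }
)

# Summing the boolean mask column by column counts the missing values.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```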