# Data Quality Assurance Report

In this notebook, we use [`pandas_profiling`](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/index.html) to generate a quality report for a dataset, as one step in preparing a QA report.

The dataset file `telecom.csv`, which accompanies this notebook, has a target variable `churnid` indicating whether a customer of a telecommunications company has churned. Here, we use only a very small portion of the dataset to keep the final notebook below 25 MB, which satisfies GitHub's file size requirement.

The data description file `telecom_info.xlsx` is also available.
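The notebook does not say how the small portion of the dataset was selected. As a minimal sketch, assuming a simple random sample is acceptable, the subset could be drawn reproducibly as follows. The helper `sample_rows`, the synthetic `full` dataframe, and the `customer` column are hypothetical stand-ins used only for illustration; only the column name `churnid` comes from the dataset.

```python
import pandas as pd


def sample_rows(df, n, seed=0):
    """Return at most `n` randomly chosen rows of `df`, reproducibly."""
    n = min(n, len(df))  # never ask for more rows than exist
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


# Synthetic stand-in for the full dataset (hypothetical data, not telecom.csv)
full = pd.DataFrame(
    {
        "customer": range(1000),
        "churnid": [i % 5 == 0 for i in range(1000)],
    }
)

small = sample_rows(full, n=100)
print(len(small), "rows kept; churn rate in sample:", small["churnid"].mean())

# The reduced dataset could then be written out, for example:
# small.to_csv("telecom.csv", index=False)
```

A fixed `seed` keeps the subset stable between runs, so the profiling report does not change each time the notebook is executed.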

### Setup

In [1]:
# import the required libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from pandas_profiling import ProfileReport

In [2]:
# read csv file and create a dataframe
data = pd.read_csv("telecom.csv")

In [3]:
# build the profiling report
profile = ProfileReport(
    data,
    title="Data Quality Report",
    # minimal=True,
)
# display the report in widget or HTML format
# profile.to_widgets()            # for widget report
# profile.to_notebook_iframe()    # for HTML report

**Remark:** We can display the report in the notebook using

- `profile.to_widgets()` for a widget report
- `profile.to_notebook_iframe()` for an HTML report

However, the displayed report is not preserved when we reopen the notebook. Indeed, `to_notebook_iframe` (the HTML case) does not return anything; it only calls the IPython display function to render the output in the notebook. As a workaround, we generate an HTML report file as follows.

In [4]:
# generate an HTML report file
profile.to_file('profile_report.html')

Now, we can open the saved HTML file `profile_report.html`, which accompanies this notebook, to see the report.
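Because the report is shipped alongside the notebook, it may be worth confirming that the file was written and checking its size against the 25 MB figure mentioned earlier. The following is a minimal sketch using only the standard library; the helper `file_size_mb` and the constant `LIMIT_MB` are hypothetical names introduced for illustration.

```python
from pathlib import Path

LIMIT_MB = 25  # size figure mentioned in this notebook; adjust as needed


def file_size_mb(path):
    """Return the size of `path` in megabytes (1 MB = 1024 * 1024 bytes)."""
    return Path(path).stat().st_size / (1024 * 1024)


report = Path("profile_report.html")
if report.exists():
    size = file_size_mb(report)
    status = "OK" if size < LIMIT_MB else "too large"
    print(f"{report.name}: {size:.2f} MB ({status})")
else:
    print(f"{report.name} not found; run profile.to_file(...) first")
```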

Yet another way to create a report that is still displayed when the saved notebook is reopened is to use the Python library `Panel`: see the [PyPI page](https://pypi.org/project/panel/), the [Panel documentation](https://panel.holoviz.org/index.html), and [this HoloViz Discourse thread](https://discourse.holoviz.org/t/cant-display-pandas-profiling-report/760).
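One possible way to do this, sketched here under assumptions rather than taken from the linked pages, is to wrap the report's HTML in an `<iframe>` via its `srcdoc` attribute and hand that to a Panel HTML pane. The helper `as_iframe` is a hypothetical name, and the short demo string stands in for the real report HTML (which could be obtained with `profile.to_html()`). The Panel part only runs if `panel` is installed.

```python
import html


def as_iframe(report_html, height=600):
    """Wrap a full HTML document in an <iframe> using the srcdoc attribute."""
    escaped = html.escape(report_html)  # escapes &, <, > and quote characters
    return (
        f'<iframe srcdoc="{escaped}" width="100%" '
        f'height="{height}" frameborder="0"></iframe>'
    )


# Stand-in for the real report; in the notebook one might use profile.to_html()
demo_html = "<h1>Demo report</h1>"
snippet = as_iframe(demo_html)

try:
    import panel as pn  # optional dependency, assumed to be installed
except ImportError:
    pn = None

if pn is not None:
    # In a notebook, call pn.extension() first and put `pane` on the last
    # line of a cell to render it.
    pane = pn.pane.HTML(snippet, sizing_mode="stretch_width")
```

Escaping the report before placing it in `srcdoc` keeps quotes and angle brackets in the report from breaking the surrounding `<iframe>` tag.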