# **Sample Quality Check - Durham Police Department Arrest Reports**
### Introduction
<font color=#FF0000>*TBD*</font><br><br>
This document contains sample code and instructions for evaluating the quality of data once it is in table format, based on factors such as accuracy, completeness, consistency, reliability, and currency (whether the data is up to date).
- **Quality metrics**: 
    - Completeness % (counts & proportions of NAs)
        - Which NAs are relevant? Which should we try to impute or delete entirely?
    - Consistency (value counts, search for typos)
        - How to fix inconsistent categorical values?
    - Reliability (perceived vs. self-reported values: which should be consistent?)
    - Currency (dates: how old is too old?)
- **Summary statistics**:
    - Mean, min, max for continuous variables, crosstabs for discrete
    - Cross-comparison counts for discrete categorical variables 
- **Distributions**:
    - Histograms for continuous variables
    - Crosstabs, barplots for discrete categorical variables
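The quality metrics above can be sketched with plain pandas. This is a minimal sketch under stated assumptions: the helper names (`completeness`, `likely_typos`, `agreement_rate`, `record_age_days`), the toy rows, and the column names are hypothetical placeholders, not the actual arrest-report schema. The typo check only catches values that differ in letter case or surrounding whitespace; genuine misspellings still need a manual look at the value counts.

```python
import pandas as pd


def completeness(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values (NAs) in each column."""
    n_missing = df.isna().sum()
    pct_missing = (n_missing / len(df) * 100).round(1)
    report = pd.DataFrame({"n_missing": n_missing, "pct_missing": pct_missing})
    report["pct_complete"] = 100 - report["pct_missing"]
    return report.sort_values("pct_missing", ascending=False)


def likely_typos(series: pd.Series) -> dict:
    """Group raw categorical values that collapse to the same normalized form.

    Normalization is only strip() + lower(), so values differing just in
    case or surrounding whitespace are flagged as probable inconsistencies.
    """
    raw = series.dropna().astype(str)
    normalized = raw.str.strip().str.lower()
    groups = raw.groupby(normalized).unique()
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


def agreement_rate(df: pd.DataFrame, col_a: str, col_b: str) -> float:
    """Share of rows (with both values present) where two columns agree."""
    both = df[[col_a, col_b]].dropna()
    return float((both[col_a] == both[col_b]).mean())


def record_age_days(dates: pd.Series, as_of: str) -> pd.Series:
    """Age of each record in days relative to a reference date.

    Unparseable dates become NaT (and therefore NaN ages) via errors="coerce".
    """
    parsed = pd.to_datetime(dates, errors="coerce")
    return (pd.Timestamp(as_of) - parsed).dt.days


# Hypothetical toy rows; column names are placeholders, not the real schema
toy = pd.DataFrame({
    "charge": ["Larceny", "larceny ", "Assault", None, "Assault"],
    "race_perceived": ["B", "W", "B", "W", None],
    "race_self_reported": ["B", "W", "W", "W", "B"],
    "arrest_date": ["2023-01-15", "2022-06-01", "not a date", "2023-03-10", None],
})

print(completeness(toy))
print(likely_typos(toy["charge"]))  # {'larceny': ['Larceny', 'larceny ']}
print(agreement_rate(toy, "race_perceived", "race_self_reported"))  # 0.75
print(record_age_days(toy["arrest_date"], as_of="2023-12-31"))
```

Note that `"not a date"` counts as present in the completeness report but becomes missing once parsed as a date, so a currency check can surface completeness problems that raw NA counts miss.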


In [None]:
# Load the data in table format.
# https://www.practicaldatascience.org/html/pandas_series.html offers a quick tutorial on the pandas library if you are not familiar with it.
# CSV (comma-separated values) is the most common tabular data format.
# Other common loaders you may use are pd.read_excel and pd.read_stata.

try:
    import piplite
    await piplite.install(['ipywidgets'])
except ImportError:
    pass
import io

import ipywidgets as widgets
import pandas as pd
from IPython.display import display

pd.set_option("display.max_columns", 500)
pd.set_option("display.max_rows", 500)


uploader = widgets.FileUpload()
display(uploader)

In [None]:
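# Parse the uploaded file once the widget holds one
# (in ipywidgets 8, uploader.value is a tuple of uploaded files)
if uploader.value:
    uploaded_file = uploader.value[0]
    arrests = pd.read_csv(io.BytesIO(uploaded_file.content), index_col=[0])
else:
    print("No file uploaded yet: choose a CSV file above, then rerun this cell")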
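The loading comments mention `pd.read_excel` and `pd.read_stata` as alternatives to `pd.read_csv`. A minimal sketch of choosing the reader from the file extension, assuming the upload provides a file name and raw bytes (as ipywidgets 8's `FileUpload` does through `.name` and `.content`). `load_table` and `READERS` are hypothetical helpers, not part of pandas; reading `.xlsx` needs the openpyxl package and `.xls` needs xlrd.

```python
import io
from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to pandas reader
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # requires openpyxl
    ".xls": pd.read_excel,   # requires xlrd
    ".dta": pd.read_stata,
}


def load_table(filename: str, content: bytes, **kwargs) -> pd.DataFrame:
    """Pick a pandas reader from the file extension and parse raw bytes."""
    suffix = Path(filename).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    # BytesIO accepts any bytes-like object, including a memoryview
    return READERS[suffix](io.BytesIO(content), **kwargs)


df = load_table("toy.csv", b"a,b\n1,2\n3,4\n")
print(df)
```

In the upload cell this could stand in for the `pd.read_csv` call, e.g. `arrests = load_table(uploaded_file.name, uploaded_file.content, index_col=0)`.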

In [None]:
# Take a first look:
pd.set_option("display.max_rows", None)
numRows = widgets.Dropdown(
    options=['5', '10', '15', '20'],
    value='5',
    description='Number:',
    disabled=False,
)
print("Select the number of rows you would like to preview")
display(numRows)

In [None]:
# Show a random sample of the selected size; rerun this cell after changing the dropdown
arrests.sample(int(numRows.value))
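After a first look, the summary statistics and distributions listed in the introduction can be sketched as follows. This assumes hypothetical toy data with placeholder columns (`age`, `sex`, `charge_type`); on the real data you would substitute `arrests` and its actual column names. Histograms and bar plots are shown here as bin counts and value counts so the sketch runs without matplotlib; the commented `.plot` calls draw the same views when matplotlib is available.

```python
import pandas as pd

# Hypothetical toy data; column names are placeholders, not the real schema
toy = pd.DataFrame({
    "age": [19, 23, 35, 41, 52, 28],
    "sex": ["M", "F", "M", "M", "F", "M"],
    "charge_type": ["Misdemeanor", "Felony", "Felony",
                    "Misdemeanor", "Misdemeanor", "Felony"],
})

# Mean, min, max (plus count, std, quartiles) for continuous variables
numeric_summary = toy.describe()  # numeric columns only by default
print(numeric_summary)

# Cross-comparison counts for two discrete categorical variables, with totals
counts = pd.crosstab(toy["sex"], toy["charge_type"], margins=True)
print(counts)

# Same crosstab as within-row proportions
shares = pd.crosstab(toy["sex"], toy["charge_type"], normalize="index")
print(shares)

# Histogram as counts per age bin: (0, 25], (25, 40], (40, 60]
age_bins = pd.cut(toy["age"], bins=[0, 25, 40, 60]).value_counts(sort=False)
print(age_bins)

# With matplotlib installed, the graphical versions would be:
# toy["age"].plot.hist(bins=10)
# toy["charge_type"].value_counts().plot.bar()
```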