# **Sample Quality Check - Durham Police Department Arrest Reports**
### Introduction
<font color=#FF0000>*TBD*</font><br><br>
This document contains sample code and instructions for evaluating the quality of data once it is in table format, based on factors such as accuracy, completeness, consistency, reliability, and currency (whether it is up to date).
- **Quality metrics**:
    - Completeness % (counts and proportions of NAs)
        - Which NAs are relevant? Which should we try to impute or delete entirely?
    - Consistency (value counts, search for typos)
        - How to fix inconsistent categorical values?
    - Reliability (perceived vs. self-reported: which values should be consistent?)
    - Currency (dates: how old is too old?)
- **Summary statistics**:
    - Mean, min, and max for continuous variables; crosstabs for discrete variables
    - Cross-comparison counts for discrete categorical variables
- **Distributions**:
    - Histograms for continuous variables
    - Crosstabs and bar plots for discrete categorical variables
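The metrics above can be sketched in a few lines of pandas. The example below is a minimal sketch on a small hypothetical DataFrame: the column names mirror the arrest table, but the values are invented for illustration. It computes completeness as the percentage of non-missing values per column, uses `value_counts` to surface an inconsistent spelling of a categorical code, and produces summary statistics and a crosstab. The normalization rule used to fix the typo (strip whitespace, upper-case) is an assumption; real data may need a hand-built mapping of variants to canonical values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names mirror the arrest table, values are invented.
df = pd.DataFrame({
    "race": ["B", "W", "B", None, "b "],
    "sex": ["M", "F", "M", "M", "F"],
    "age": [27, 25, np.nan, 29, 38],
    "skintone": ["Dark", "Medium", "Dark", "Medium", None],
})

# Completeness: percentage of non-missing values in each column.
completeness = df.notna().mean().mul(100).round(1)
print(completeness)

# Consistency: raw value counts expose variant spellings such as "b ".
print(df["race"].value_counts(dropna=False))

# One possible fix (an assumption): strip whitespace and upper-case the codes.
df["race"] = df["race"].str.strip().str.upper()
print(df["race"].value_counts(dropna=False))

# Summary statistics for a continuous variable (NaN is skipped by default).
print(df["age"].agg(["mean", "min", "max"]))

# Cross-comparison counts for two discrete categorical variables.
print(pd.crosstab(df["race"], df["sex"]))
```

On the real table, the same calls apply directly to `arrests`, for example `arrests.notna().mean()` for completeness across all columns.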

### **Step #1: Load the data**

```python
# Load the data in table format.
# https://www.practicaldatascience.org/html/pandas_series.html offers a quick tutorial on the pandas library if you are not familiar with it.
# The most common table data format is CSV (comma-separated values).
# Other common functions you may use to load data are pd.read_excel and pd.read_stata.

import pandas as pd

# Adjust this path to wherever arrests_charges.csv is stored on your machine.
arrests = pd.read_csv('/Users/clarissaache/Documents/Data+/Small-Town-Policing-Accountability-Data-2022/10 Clean Data/arrests_charges.csv')

# Take a first look (display all rows when a DataFrame is printed):
pd.set_option('display.max_rows', None)
arrests.sample(5)
```

| Unnamed: 0.1 | Unnamed: 0 | agencyname | datetimeofarrest | file | arrestnumber | scars_tattoes_bodymarkings_etc | age | race | sex | citizenship | skintone | height | weight | haircolor | eyecolor | armed | typeofarrest | placeofarrest | page_num | charges | charge_type | charge_counts | charge_IBRcode | charge_statutenumber | charge_warrantdate | chargenum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21390 | 21390 | Durham Police Department | 04/28/2022 14:23 | 00 Raw Data/data/arrests0003659.pdf | 330292 |  | 27 | B | M | US | Dark | 5'07 | 145.0 | Black | Brown | UNARMED | ON VIEW | 3533 HILLSBOROUGH RD, DURHAM | 0.0 | Possession Of Drug Paraphernalia | Misd | 1.0 | 35B | 90-113.22 | 04/28/2022 | 2 |
| 7173 | 7173 | Durham Police Department | 02/18/2020 18:26 | 00 Raw Data/data/arrests0009320.pdf | 317162 | TATT BTH ARM; TATT LEFT FOREARM /"BROOKLYN" | 25 | B | M | US | Medium | 5'09 | 155.0 | Brown | Brown | UNARMED | ON VIEW | 143 COMMERCE ST - U, DURHAM | 0.0 | Possess Control Substance Schedule Ii | Fel | 1.0 | 35A | 90-95(A3)2 | 02/18/2020 | 1 |
| 5891 | 5891 | Durham Police Department | 03/01/2021 15:00 | 00 Raw Data/data/arrests0004980.pdf | 322331 |  | 19 | B | M | US | Dark | 5'06 | 120.0 | Black | Brown | UNARMED | ON VIEW | 1200 SPRUCE ST, DURHAM | 0.0 | Possessing Stolen Goods | Fel | 1.0 | 280 | 14-71.1 | 03/01/2021 | 1 |
| 13432 | 13432 | Durham Police Department | 01/26/2021 02:03 | 00 Raw Data/data/arrests0006011.pdf | 321820 | TATT CHEST / CROWN; TATT RIGH WRIST /CHI... | 29 | B | F | US | Medium | 5'01 | 124.0 | Black | Brown | UNARMED | TAKEN INTO CUSTODY | 219 S MANGUM ST, DURHAM | 0.0 | Communicating Threats | Misd | 1.0 | 13C | 14-277.1 | 07/04/2015 | 1 |
| 5007 | 5007 | Durham Police Department | 01/14/2019 19:57 | 00 Raw Data/data/arrests0006938.pdf | 308739 | TATT LEFT NECK / LISA | 38 | B | F | US | Medium | 5'07 | 210.0 | Black | Brown | UNARMED | TAKEN INTO CUSTODY | 219 S MANGUM ST, DURHAM | 0.0 | Warrant Service For Other Jurisdiction | Misd | 1.0 | 9910 | WARR | 01/14/2019 | 1 |
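The `Unnamed: 0.1` and `Unnamed: 0` columns in this output look like row indices that were written out when the CSV was saved, so they likely carry no information about the arrests. A minimal sketch of some post-load cleanup, assuming that interpretation, follows. It drops columns whose names start with `Unnamed`, parses `datetimeofarrest` into a datetime (the `%m/%d/%Y %H:%M` format is inferred from the sample rows), and converts `height` strings such as `5'07` into total inches so the column can be summarized as a continuous variable. The helper `height_to_inches` is hypothetical, and the three rows are copied from the sample output above (a subset of columns) to keep the example self-contained.

```python
import re

import pandas as pd

# Three rows copied from the sample output (subset of columns) so the sketch runs on its own.
arrests = pd.DataFrame({
    "Unnamed: 0.1": [21390, 7173, 5891],
    "Unnamed: 0": [21390, 7173, 5891],
    "datetimeofarrest": ["04/28/2022 14:23", "02/18/2020 18:26", "03/01/2021 15:00"],
    "height": ["5'07", "5'09", "5'06"],
})

# Drop leftover index columns (assumed to come from an earlier to_csv call).
arrests = arrests.drop(columns=[c for c in arrests.columns if c.startswith("Unnamed")])

# Parse arrest timestamps; values that do not match the format become NaT.
arrests["datetimeofarrest"] = pd.to_datetime(
    arrests["datetimeofarrest"], format="%m/%d/%Y %H:%M", errors="coerce"
)

def height_to_inches(value):
    """Hypothetical helper: convert a string like 5'07 to total inches, else NaN."""
    match = re.fullmatch(r"(\d+)'(\d{1,2})", str(value).strip())
    if match is None:
        return float("nan")
    feet, inches = (int(g) for g in match.groups())
    return feet * 12 + inches

arrests["height_in"] = arrests["height"].map(height_to_inches)

# Currency: the date range shows how up to date the extract is.
print(arrests["datetimeofarrest"].agg(["min", "max"]))
print(arrests)
```

Passing `index_col=0` to `pd.read_csv` is another way to keep a saved index out of the data columns, though with two `Unnamed` columns present only one would be absorbed that way.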


### **Step #2: Which type of data do we have?**
Typically, police records document interactions between the police and the 