In [209]:
import pandas as pd
import os
import numpy as np
import sklearn as sk
from sklearn import metrics
import matplotlib.pyplot as plt
import itertools
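import notebook_support
try:
    from importlib import reload  # Python 3; reload is a builtin in Python 2
except ImportError:
    pass
reload(notebook_support)  # pick up local edits to notebook_support without restarting the kernel
from IPython.display import Image, display, HTML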
import random

# Preventing and Diagnosing Bias in CivicScape

Increasingly, police departments are trying to harness the power of data and algorithms to improve both the safety of the communities they serve and their relationships with those communities. At CivicScape, we are committed to both aspects of law enforcement’s fundamental mission. That means we're not only concerned about the general accuracy of our tool; we're also concerned with issues of bias in data and algorithms. Unfortunately, crime data comes from humans collecting data about human actions, and so, like many other data sources, it is subject to human error. That is why the quality of our tool is unparalleled: CivicScape is built with a primary focus on using only the most reliable data and includes multiple safeguards to enforce that standard. This doesn't mean that law enforcement and communities can't use crime data to anticipate crime, but it does mean we must understand and measure bias in crime data that can result in disparate public safety outcomes within a community. 

We believe that the CivicScape tool is an important advancement in the ability to deploy officers effectively for police departments that need ways to do more with fewer resources, just as we believe that measuring how risk assessments impact different community stakeholders is an ongoing and necessary responsibility in implementing this tool. By making our code and data open-source, we are inviting feedback and conversation about CivicScape in the belief that many eyes make our tools better for all. Our motto, Safety in Numbers, is not just about the potential benefit to public safety that CivicScape represents in its numerical risk scores, but also about the additional safety that comes from openness and transparency around the tool's risk scores. 

Conversations about transparency and accountability in law enforcement and algorithms alike are vital, and we want to advance them. Here are the questions that we’re trying to answer, and keep learning about, in this notebook: 
* What is bias in data and why does it matter? 
* Why is there bias in crime data? 
* What kinds of crime data have bias? 
* What kinds of crime data does CivicScape use in models and why? 
* What can we do to prevent bias in crime risk assessment? 

Lots of research is going on about how algorithms can be transparent, accountable, and fair. We look forward to being involved in this important conversation. 

## What is bias in data and why does it matter? 

Bias in data is a systematic difference between the truth and what is captured by data. Bias is a huge concern for those who use data to draw conclusions because biased data will, on average, result in the wrong conclusion (Cochrane Review). Because we can’t take a true measure of crime, we are left to make decisions with imperfect, potentially biased alternatives. As such, any analysis that uses crime data in the form of arrests, reports, or surveys, from plotted points on a map to a sophisticated machine learning system, will produce biased estimates (Pepper, Petrie, and Sullivan, 2010). 

While biased estimates are not exclusive to crime data, there are challenges unique to crime data that have impeded a full understanding and quantification of these biases. For instance, national crime statistics are by nature aggregations of local crime statistics. To the extent that local crime data are biased, national crime statistics inherit this bias as well. The more complex issue with national aggregations is that the magnitude and direction of this bias are not known and, in some sources, are not stable over time (see Pepper, Petrie, and Sullivan, 2010; Maltz, 2006; Maltz and Targonski, 2002, 2003; Jarvis and Lynch, 2008, among others).

An especially concerning impact of bias in data is that its use in algorithms could exacerbate disparities, a primary concern given the significant racial and ethnic disparity in the rates of arrest, conviction and incarceration in the criminal justice system (Tonry and Milewski, 2008; Lum and Isaac, 2016). There is also disagreement about and continuing research on how best to measure racial and ethnic discrimination (Blank, Dabady, & Citro, 2004). One approach to measuring it is to determine if the impact of a recommendation or policy is different for different subsets of stakeholders, known as disparate impact measurement. There is not, however, a consensus about the right baseline against which to measure racial discrimination. 
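As a rough illustration of the disparate impact measurement described above, the sketch below compares how often two groups of areas are flagged. The function name `disparate_impact_ratio`, the column names, and the toy data are all invented for this example; they are not part of CivicScape's code, and the choice of comparison group is exactly the baseline question the text raises.

```python
import pandas as pd

def disparate_impact_ratio(df, group_col, flag_col, group_a, group_b):
    """Ratio of the flagged rate in group_a to the flagged rate in group_b.

    Hypothetical helper: a value near 1.0 means both groups are flagged
    at similar rates; values far from 1.0 suggest a disparate impact.
    """
    rates = df.groupby(group_col)[flag_col].mean()
    return rates[group_a] / rates[group_b]

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "group":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "flagged": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Group A is flagged 75% of the time, group B 25% of the time.
ratio = disparate_impact_ratio(toy, "group", "flagged", "A", "B")
print(ratio)
```

A ratio on its own does not say whether the difference is justified by differences in underlying crime; it only makes the difference visible so it can be examined.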

### Why is there bias in crime data? 

In crime data, bias is driven by systematic over- or under-reporting that can result from the data source and from those who record it. Data bias results from the complex actions, characteristics, and relationships of communities and law enforcement. Consider a typical police jurisdiction and the many factors that could drive bias in crime data. Police might patrol more in high-crime communities, which can increase arrests in those communities. At the same time, communities experiencing high crime might draw additional police presence. Add to this that some communities may make disproportionately many calls for service, while other communities are less likely to call. Further, the interactions between police and community can contribute to bias in crime data: poor police-community relations can cause individuals to mistrust and not cooperate with police, or cause police to act defensively if they feel unsafe or unwanted in a community. 

Different ways of measuring crime are subject to different levels of bias. The least biased crime data is the data closest to the crime event, such as incident reports. Data such as convictions, and to a lesser extent arrests or calls for service, are one or more steps removed from the crime event. For example, while African Americans make up 13% of the population, they represent 28% of total arrests and 38% of felony convictions in state courts (Nellis and King, 2009). Similarly, Latinos make up 15% of the population, but 22.3% of those in state prison. Surveys of crime victims and of police contact are also useful comparisons for addressing the opposite issue: under-reporting of crime.

### What kinds of crime are biased? 

Some estimates of the extent to which crime is under-reported come from comparing surveys of crime victims to crimes reported to police. Nationally, the percentage of violent and serious violent victimizations reported to police has been relatively stable over the past two decades. In 1993, 42% of violent victimizations were reported to police; in 2015, 47% were. Similarly, for serious violent victimization, 51% was reported in 1993 and 55% in 2015. Reporting of property crime increased from 32% in 1993 to 35% in 2015.

Specifically, in 2015, comparing victimization and arrest reports shows: 

* Robberies - 62% reported to police in 2015 nationally
* Aggravated assaults - 62% reported to police in 2015 nationally 
* Simple assaults - 42% reported to police in 2015 nationally
* Sexual assaults - 32% reported to police in 2015 nationally
* Domestic violence - 58% reported to police in 2015 nationally

Comparing property crime victimization to arrest reports, nationally, property crime went unreported to police in 65% of cases in 2015. Specifically, comparing victimization and arrest reports shows: 

* Motor vehicle theft - 69% reported to police in 2015 nationally
* Burglary - 50% reported to police in 2015 nationally
* Theft - 29% reported to police in 2015 nationally
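The reporting rates listed above can be turned into the share of each crime type that never reaches police data. The sketch below does that arithmetic with pandas. The `implied_multiplier` column is an illustrative assumption: it treats the national 2015 rate as if it applied uniformly to a local dataset, which may not hold for any particular jurisdiction.

```python
import pandas as pd

# 2015 national reporting rates, copied from the lists above (percent).
reported_pct = pd.Series({
    "Robbery": 62,
    "Aggravated assault": 62,
    "Simple assault": 42,
    "Sexual assault": 32,
    "Domestic violence": 58,
    "Motor vehicle theft": 69,
    "Burglary": 50,
    "Theft": 29,
}, name="reported_pct")

summary = reported_pct.to_frame()
# Share of victimizations that never appear in police-reported data.
summary["unreported_pct"] = 100 - summary["reported_pct"]
# Assumption for illustration: if the national rate held locally, each
# reported event would stand in for this many actual events.
summary["implied_multiplier"] = (100.0 / summary["reported_pct"]).round(2)

print(summary.sort_values("unreported_pct", ascending=False))
```

Sorting by `unreported_pct` puts theft and sexual assault at the top, the crime types whose reported counts understate victimization the most.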

Minor marijuana possession is one of the most biased categories in terms of the discrepancy between the population that uses drugs and the population arrested for using them. The American Civil Liberties Union (ACLU) finds that marijuana use is roughly equal among African Americans and whites, yet African Americans are 3.73 times as likely to be arrested for marijuana possession. Overall, data on drug use show that users are relatively representative of the general population, while drug sellers are more likely to face arrest and prison. There are no reliable surveys of drug selling, but given that people are most likely to buy drugs from someone of their same race, most researchers think that selling should be proportionate as well (Sentencing Project; Tonry et al). 

## What kind of crime data does CivicScape use? And why?

Because of the realities of crime data, **CivicScape limits the kinds of crime that are used in our models.** First, CivicScape uses types of crime data that are closest to an event and least likely to be under-reported. These include arrests, incidents, and calls for service. CivicScape doesn’t use data concerning prosecution, conviction, or incarceration outcomes, because they are not timely enough to inform many resource allocation decisions and, importantly, because the further from a crime event a data point is collected, the more biased the data becomes.

Second, **CivicScape is focused on violent crime and property crime without the inclusion of low-level misdemeanor events.** Rather than focus on narcotics-related or other lower-level crimes that we know are under-reported to a large degree, CivicScape focuses on crime data that is less likely to be under-reported. For violent crime, this means excluding simple assaults in favor of aggravated assaults and homicide. For property crime, this means excluding simple theft. This limits our dataset considerably, and the exact set of crime types is determined for each city based on its reporting practices.

Further, **CivicScape doesn’t consider race or ethnicity of individuals in our tool.** This is not to say that we aren’t using variables that might be closely correlated with race and ethnicity. We use weather and historical violent crime data to generate our risk scores. We do include a geographic component, a cell area. While this intentionally does not contain race or income information directly, we acknowledge that in some cases the location of a crime event can carry information that is indirectly related to race, ethnicity, and income. 

### When can we use crime data to accurately and precisely make decisions about law enforcement resources? 

On a daily basis, communities, police departments, prosecutors, public defenders, residents, victims, and families are already making decisions about crime with imperfect information. Though we’ve seen evidence that algorithms can make disparate outcomes worse, and even as we continue to learn about algorithmic fairness, algorithms also have the potential to increase transparency in data and in the law enforcement component of the criminal justice system. 

At CivicScape, we aim to enhance community safety and community trust. To use crime data to make accurate assessments of crime risk, we focus on the following:

* We can use the least biased crime data based on what we know about magnitude and direction of bias. That is, no narcotics or misdemeanor crime data is used to assess crime risk in our models. 
* CivicScape currently refreshes all models with new data as often as daily. It is crucial to do this in order to avoid sending police to parts of a jurisdiction based on outdated crime data and to preserve the accuracy and precision of the model.  
* CivicScape opens and makes transparent all algorithm code, and we make model evaluations available whenever possible based on our jurisdiction partners.
* CivicScape makes data audit reports available whenever possible based on our jurisdiction partners. 
* We measure models for bias in place-based outcomes to understand how the CivicScape algorithm impacts different subsets of individuals differently. In our next section, we walk through how we approach this at CivicScape. 

**We next walk through and provide code to measure risk scores for racial and ethnic minority disparate impact.**


## Evaluating Bias in CivicScape vs. CompSTAT 

In order to evaluate how the CivicScape model performs in racial and ethnic minority communities, we need an appropriate comparison to understand how CivicScape models impact police deployment. While no perfect comparison exists, we can compare CivicScape models to approaches that we know police regularly use, such as CompSTAT. Discussed further in our Police Practices Notebook, CompSTAT is a commonly used police deployment tool that uses prior crime to determine where officers are deployed. By replicating the code below with your own data, you will be able to run the same disparate impact exercises CivicScape uses internally.

**To Begin:** In the following module, edit the file paths to link to your own data.


In [210]:
# Use the following string to tell this notebook where the risk scores are located
riskscore_path = "../data/risk_assessments.csv"
# Use the following strings to tell this notebook where your census data are
census_econ_path = "../data/2000_census_econ_data.csv"
census_race_path = "../data/2000_census_race_data.csv"
# Use the following string to tell this notebook where the historic crime data are located
compstat_path = "../data/historical_grouped_3_year.csv"

If you'd like to constrict the times of the CivicScape or CompSTAT models, do so below: 

In [211]:
hours = (0, 5) # Enter the hour range to include on a 24-hour clock, e.g. (3, 15) returns 3AM-3PM
days = [] # Enter each day to include as a letter code, e.g. ['M', 'T', 'W', 'R', 'F', 'Sa', 'Su']; leave empty to include all days
date_range = () # Enter the dates as YYYY-MM-DD, e.g. ('2014-10-24', '2015-05-28')
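For readers curious how a restriction like this might be applied, here is a minimal sketch of a time filter on a DataFrame. It is not the implementation of `notebook_support.constrict_times`, which is not shown in this notebook. The function name `filter_times`, the `dt` column, and the half-open hour range (start inclusive, end exclusive) are assumptions made for illustration.

```python
import pandas as pd

# Day letter codes as used in the cell above, mapped to pandas' Monday=0 convention.
DAY_CODES = {"M": 0, "T": 1, "W": 2, "R": 3, "F": 4, "Sa": 5, "Su": 6}

def filter_times(df, time_col="dt", hours=(), days=(), date_range=()):
    """Hypothetical time filter; an empty argument means 'no restriction'."""
    stamps = pd.to_datetime(df[time_col])
    keep = pd.Series(True, index=df.index)
    if hours:
        start, end = hours
        # Assumption: start hour is included, end hour is excluded.
        keep &= (stamps.dt.hour >= start) & (stamps.dt.hour < end)
    if days:
        keep &= stamps.dt.dayofweek.isin([DAY_CODES[d] for d in days])
    if date_range:
        first = pd.to_datetime(date_range[0])
        last = pd.to_datetime(date_range[1])
        keep &= (stamps >= first) & (stamps <= last)
    return df[keep]

# Toy rows, invented for illustration. 2014-10-24 was a Friday.
toy = pd.DataFrame({
    "dt": ["2014-10-24 01:00", "2014-10-24 13:00",
           "2014-10-25 03:00", "2015-06-01 02:00"],
    "risk_assessment": [0.1, 0.4, 0.3, 0.2],
})
subset = filter_times(toy, hours=(0, 5), days=["F"],
                      date_range=("2014-10-24", "2015-05-28"))
print(subset)
```

Only the first toy row survives all three restrictions: the second fails the hour range, the third falls on a Saturday, and the fourth is outside the date range.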

Then, run the following modules:

In [212]:
riskscores = pd.read_csv(riskscore_path)
historical = pd.read_csv(compstat_path)
census = notebook_support.get_census_data(census_econ_path, census_race_path, race_keep=2000)
risk, hist = notebook_support.data_prep(riskscores, historical)
risk_keep, risk, hist = notebook_support.historical_prep(risk, hist)


final_risk = risk.merge(census, left_on='census_tra', right_on='geo_id')
final_risk = notebook_support.constrict_times(final_risk, hours=hours, days=days, date_range=date_range)
final_compstat = hist.merge(census, left_on='census_tra', right_on='geo_id')
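# NOTE: `assumptions` is built by notebook_support.bias_build_assumptions_dict in the threshold cell further down; run that cell before this line
final_compstat = notebook_support.bias_police_deployment_analysis(final_compstat, assumptions)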


#final_compstat = notebook_support.constrict_times(final_compstat, hours=hours, days=days, date_range=date_range)


Data prep done!
This file contains risk_assessments for the test date range 2014-10-24 00:00:00 through 2015-05-28 00:00:00.

Looks like your historical dataset doesn't have days, so we'll look at months instead.

The period from 2014-10-01 00:00:00 to 2014-10-24 00:00:00 will be left off the dataset because data are missing.

The period from 2015-05-28 00:00:00 to 2015-09-01 00:00:00 will be left off the dataset because data are missing.

The final overlapping period for analysis is: 2014-10-24 00:00:00 through 2015-05-28 00:00:00

You've restricted the risk score data for the following hours:
    0:00 until 5:00


First, we break out the census tracts into quintiles based on race and ethnicity. For race and ethnicity, we take the top two quintiles of census tracts for that racial or ethnic group as predominantly composed of that race or ethnic group. For income, we take the top quintile and the bottom quintile as high income and low income, respectively.
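The quintile columns used in the next cells (`black_per_q`, `income_quintile`, and so on) are produced by `notebook_support`, whose code is not shown here. As a sketch of how such columns could be built, the example below uses `pandas.qcut`. The helper `add_quintile`, the input columns `black_per` and `median_income`, and the toy values are assumptions for illustration.

```python
import pandas as pd

def add_quintile(df, value_col, out_col):
    """Label each row 1 (lowest fifth) through 5 (highest fifth) of value_col."""
    df = df.copy()
    # rank(method="first") breaks ties so qcut always finds five distinct bins.
    ranks = df[value_col].rank(method="first")
    df[out_col] = pd.qcut(ranks, 5, labels=[1, 2, 3, 4, 5]).astype(int)
    return df

# Toy census tracts, invented for illustration only.
tracts = pd.DataFrame({
    "geo_id": range(10),
    "black_per": [0.02, 0.05, 0.10, 0.15, 0.30, 0.45, 0.60, 0.75, 0.90, 0.97],
    "median_income": [21, 25, 30, 34, 41, 48, 55, 63, 80, 120],
})
tracts = add_quintile(tracts, "black_per", "black_per_q")
tracts = add_quintile(tracts, "median_income", "income_quintile")

# Same selection rules as described above: top two quintiles for race and
# ethnicity, top and bottom single quintile for income.
top_black_toy = tracts[tracts.black_per_q >= 4]
bottom_income_toy = tracts[tracts.income_quintile <= 1]
top_income_toy = tracts[tracts.income_quintile >= 5]
print(len(top_black_toy), len(bottom_income_toy), len(top_income_toy))
```

Because the race and ethnicity subsets take two quintiles while the income subsets take one, the race and ethnicity subsets cover roughly twice as many tracts, which is worth keeping in mind when comparing counts across them.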

In [213]:
#hist_merged, risk_merged, ranks = notebook_support.get_paper_comparisons(risk_keep, riskscores, final_compstat, show=False)
#average_risk, hist_merged = notebook_support.bias_police_deployment_analysis(hist_merged, risk_merged, assumptions)
#
#final_compstat = average_risk

In [214]:
## CivicScape
top_black = final_risk[final_risk.black_per_q >= 4]
top_white = final_risk[final_risk.white_per_q >= 4]
top_hisp = final_risk[final_risk.hisp_per_q >= 4]
bottom_income = final_risk[final_risk.income_quintile <= 1]
top_income = final_risk[final_risk.income_quintile >= 5]


In [215]:
## CompSTAT
top_black_compstat = final_compstat[final_compstat.black_per_q >= 4]
top_white_compstat = final_compstat[final_compstat.white_per_q >= 4]
top_hisp_compstat = final_compstat[final_compstat.hisp_per_q >= 4]
bottom_income_compstat = final_compstat[final_compstat.income_quintile <= 1]
top_income_compstat = final_compstat[final_compstat.income_quintile >= 5]


Much like in the Model Data Practices notebook, we have to set a threshold for CivicScape. You can set the threshold to any value that seems reasonable to you; however, we also provide the optimal threshold below.

In [216]:
fpr, tpr, thresholds = metrics.roc_curve(final_risk.crime_count, final_risk.risk_assessment, pos_label=1)
notebook_support.optimal_threshold(thresholds, fpr, tpr)


The optimal threshold for your data is: 0.2.
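The criterion behind `notebook_support.optimal_threshold` is not shown in this notebook. One common choice, sketched below as an assumption rather than as CivicScape's method, is Youden's J statistic: pick the threshold that maximizes the true positive rate minus the false positive rate. The function name `youden_threshold` and the toy labels and scores are invented, and the labels are assumed to be 0/1.

```python
import numpy as np
from sklearn import metrics

def youden_threshold(y_true, scores):
    """Return the ROC threshold that maximizes TPR - FPR (Youden's J)."""
    fpr, tpr, thresholds = metrics.roc_curve(y_true, scores, pos_label=1)
    best = np.argmax(tpr - fpr)
    return thresholds[best]

# Toy labels (1 = a crime occurred) and risk scores, invented for illustration.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.05, 0.10, 0.15, 0.18, 0.22, 0.30, 0.60, 0.80])

best_threshold = youden_threshold(y_true, scores)
print(best_threshold)
```

Youden's J weights false positives and false negatives equally. A department that considers a missed crime more costly than an unnecessary deployment, or the reverse, may reasonably choose a different threshold.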



In [217]:
threshold_CivicScape = 0.2
threshold_CompSTAT = 0.2
assumptions = notebook_support.bias_build_assumptions_dict(threshold_CivicScape, threshold_CompSTAT)


Now, a quick reminder on definitions: 

- **Accuracy**: percent of the time that the model correctly anticipates whether a crime occurs.
- **True Positive Rate**: when a crime does happen, the percent of the time the model correctly anticipates it.
- **True Negative Rate**: when a crime doesn't happen, the percent of the time the model correctly anticipates that none happens.
- **False Positive Rate**: when a crime doesn't happen, the percent of the time the model anticipates one would.
- **False Negative Rate**: when a crime does happen, the percent of the time the model misses it. 
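The definitions above can be computed directly from the four cells of a confusion matrix. The sketch below is a minimal illustration; `rates_at_threshold` is a name invented here, not a `notebook_support` function, and it assumes 0/1 labels with at least one positive and one negative case.

```python
import numpy as np

def rates_at_threshold(y_true, scores, threshold):
    """Compute the five rates defined above for a given score threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(scores) >= threshold
    # Counts are cast to float so division behaves the same on Python 2 and 3.
    tp = float(np.sum(y_pred & y_true))    # crime anticipated, crime occurred
    tn = float(np.sum(~y_pred & ~y_true))  # no crime anticipated, none occurred
    fp = float(np.sum(y_pred & ~y_true))   # crime anticipated, none occurred
    fn = float(np.sum(~y_pred & y_true))   # no crime anticipated, crime occurred
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }

# Toy example, invented for illustration: 2 crimes, 3 non-crimes.
rates = rates_at_threshold(
    y_true=[1, 1, 0, 0, 0],
    scores=[0.9, 0.1, 0.3, 0.05, 0.1],
    threshold=0.2,
)
print(rates)
```

Note that the true positive and false negative rates always sum to 1, as do the true negative and false positive rates, so each pair carries the same information.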

<img src="visuals/confusion_matrix.png" height="300" width="300" align="left">  

For reference, here are the confusion matrices for the CivicScape and CompSTAT models across the entire jurisdiction, before we examine subsets:

In [None]:
notebook_support.confusion_matrix(final_risk, threshold=threshold_CivicScape, title='All Census Tracts - CivicScape')
# final_compstat replaces hist_merged, which is only defined in the commented-out cell above
notebook_support.confusion_matrix(final_compstat, threshold=threshold_CompSTAT, title='All Census Tracts - CompSTAT')

Here is a side-by-side comparison for each racial neighborhood type, first for CompSTAT and next for CivicScape: 

In [219]:
## CivicScape 

black_fp = 'visuals/black_neighborhoods_{}.png'.format(random.randint(0, 100))
white_fp = 'visuals/white_neighborhoods_{}.png'.format(random.randint(0, 100))
hisp_fp = 'visuals/hisp_neighborhoods_{}.png'.format(random.randint(0, 100))
low_fp = 'visuals/lowincome_neighborhoods_{}.png'.format(random.randint(0, 100))
high_fp = 'visuals/highincome_neighborhoods_{}.png'.format(random.randint(0, 100))
# Titles match the subset definitions: top two quintiles for race/ethnicity, single top/bottom quintile for income
notebook_support.confusion_matrix(top_black, threshold=threshold_CivicScape, output=black_fp, title='Black Top Two Quintile Neighborhoods - CivicScape')
notebook_support.confusion_matrix(top_white, threshold=threshold_CivicScape, output=white_fp, title='White Top Two Quintile Neighborhoods - CivicScape')
notebook_support.confusion_matrix(top_hisp, threshold=threshold_CivicScape, output=hisp_fp, title='Hispanic Top Two Quintile Neighborhoods - CivicScape')
notebook_support.confusion_matrix(bottom_income, threshold=threshold_CivicScape, output=low_fp, title='Income Bottom Quintile Neighborhoods - CivicScape')
notebook_support.confusion_matrix(top_income, threshold=threshold_CivicScape, output=high_fp, title='Income Top Quintile Neighborhoods - CivicScape')


In [220]:
## CompSTAT 

# File names carry a "compstat" tag so they cannot collide with the CivicScape images
black_fp_compstat = 'visuals/black_neighborhoods_compstat_{}.png'.format(random.randint(0, 100))
white_fp_compstat = 'visuals/white_neighborhoods_compstat_{}.png'.format(random.randint(0, 100))
hisp_fp_compstat = 'visuals/hisp_neighborhoods_compstat_{}.png'.format(random.randint(0, 100))
low_fp_compstat = 'visuals/lowincome_neighborhoods_compstat_{}.png'.format(random.randint(0, 100))
high_fp_compstat = 'visuals/highincome_neighborhoods_compstat_{}.png'.format(random.randint(0, 100))
# Write each matrix to its *_compstat path; writing to black_fp etc. would overwrite the CivicScape images
notebook_support.confusion_matrix(top_black_compstat, threshold=threshold_CompSTAT, output=black_fp_compstat, title='Black Top Two Quintile Neighborhoods - CompSTAT')
notebook_support.confusion_matrix(top_white_compstat, threshold=threshold_CompSTAT, output=white_fp_compstat, title='White Top Two Quintile Neighborhoods - CompSTAT')
notebook_support.confusion_matrix(top_hisp_compstat, threshold=threshold_CompSTAT, output=hisp_fp_compstat, title='Hispanic Top Two Quintile Neighborhoods - CompSTAT')
notebook_support.confusion_matrix(bottom_income_compstat, threshold=threshold_CompSTAT, output=low_fp_compstat, title='Income Bottom Quintile Neighborhoods - CompSTAT')
notebook_support.confusion_matrix(top_income_compstat, threshold=threshold_CompSTAT, output=high_fp_compstat, title='Income Top Quintile Neighborhoods - CompSTAT')



In [None]:
display(HTML('<img src={} height="400" width="320" style="float: left; margin-left: 2px; margin-bottom: 5px; margin-right: 2px; margin-top: 5px"> <img src={} height="400" width="320" style="float: left; margin-left: 2px; margin-bottom: 5px; margin-right: 2px; margin-top: 5px"> <img src={} height="400" width="320" style="float: left; margin-left: 2px; margin-bottom: 5px; margin-right: 2px; margin-top: 5px">'.format(white_fp_compstat, black_fp_compstat, hisp_fp_compstat)))
display(HTML('<img src={} height="400" width="320" style="float: left; margin-left: 2px; margin-bottom: 5px; margin-right: 2px; margin-top: 5px"> <img src={} height="400" width="320" style="float: left; margin-left: 2px; margin-bottom: 5px; margin-right: 2px; margin-top: 5px"> <img src={} height="400" width="320" style="float: left; margin-left: 2px; margin-bottom: 5px; margin-right: 2px; margin-top: 5px">'.format(white_fp, black_fp, hisp_fp)))

And for the lowest income vs. highest income neighborhoods:

In [None]:
display(HTML('<img src={} height="400" width="320" style="float: left; margin-left: 2px; margin-bottom: 5px; margin-right: 2px; margin-top: 5px"><img src={} height="400" width="320" style="float: left; margin-left: 2px; margin-bottom: 5px; margin-right: 2px; margin-top: 5px">'.format(low_fp_compstat, high_fp_compstat)))
display(HTML('<img src={} height="400" width="320" style="float: left; margin-left: 2px; margin-bottom: 5px; margin-right: 2px; margin-top: 5px"><img src={} height="400" width="320" style="float: left; margin-left: 2px; margin-bottom: 5px; margin-right: 2px; margin-top: 5px">'.format(low_fp, high_fp)))

And here are all those statistics in a table for easy comparison:

In [None]:
## CivicScape
full = notebook_support.metrics_table(final_risk, 'Full Model')
black = notebook_support.metrics_table(top_black, 'Top Black Tracts')
white = notebook_support.metrics_table(top_white, 'Top White Tracts')
hisp = notebook_support.metrics_table(top_hisp, 'Top Hispanic Tracts')
low = notebook_support.metrics_table(bottom_income, 'Lowest Income Tracts')
high = notebook_support.metrics_table(top_income, 'Top Income Tracts')
all_metrics = pd.concat([full, black, white, hisp, low, high], axis=0)
all_metrics

In [None]:
## Compstat
full_compstat = notebook_support.metrics_table(final_compstat, 'Full Model')
black_compstat = notebook_support.metrics_table(top_black_compstat, 'Top Black Tracts')
white_compstat = notebook_support.metrics_table(top_white_compstat, 'Top White Tracts')
hisp_compstat = notebook_support.metrics_table(top_hisp_compstat, 'Top Hispanic Tracts')
low_compstat = notebook_support.metrics_table(bottom_income_compstat, 'Lowest Income Tracts')
high_compstat = notebook_support.metrics_table(top_income_compstat, 'Top Income Tracts')
all_metrics_compstat = pd.concat([full_compstat, black_compstat, white_compstat, hisp_compstat, low_compstat, high_compstat], axis=0)
all_metrics_compstat
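To compare the two tables directly, it can help to place them side by side and to express each subset as a gap from the full-model row. The sketch below assumes, hypothetically, that each metrics table has one row per subset and one column per metric; the actual layout returned by `notebook_support.metrics_table` may differ. All numbers are invented for illustration and are not results.

```python
import pandas as pd

subsets = ["Full Model", "Top Black Tracts", "Top White Tracts"]

# Invented stand-ins for all_metrics and all_metrics_compstat.
civicscape_toy = pd.DataFrame(
    {"false_positive_rate": [0.20, 0.25, 0.15],
     "true_positive_rate": [0.70, 0.72, 0.65]},
    index=subsets,
)
compstat_toy = pd.DataFrame(
    {"false_positive_rate": [0.30, 0.40, 0.20],
     "true_positive_rate": [0.60, 0.66, 0.50]},
    index=subsets,
)
models = {"CivicScape": civicscape_toy, "CompSTAT": compstat_toy}

# One wide table with a (model, metric) column hierarchy.
side_by_side = pd.concat(models, axis=1)

# Gap between each subset and that model's own full-jurisdiction row.
gaps = {name: table - table.loc["Full Model"] for name, table in models.items()}

print(side_by_side)
print(gaps["CompSTAT"])
```

Measuring each subset against its own model's full-jurisdiction row separates the question of which model is more accurate overall from the question of whether a model treats some neighborhoods differently than others.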


## Bias in Data and Algorithms: how can we continue to become more transparent, effective and efficient?

Data and algorithms can drive disparate outcomes but they are also an opportunity for transparency and increased efficiency. By making crime and broader police data available and open, and analyzing it for racial and ethnic differences, it is possible to improve police efficiency. 

Several initiatives are advancing this transparency, including: 
* White House Police Data Initiative - advancing collaboration between law enforcement and communities to increase transparency in law enforcement data.
* The Police Foundation - hosts a listing of departments that release crime data, along with information about each release.
* National Justice Database Initiative - the Center for Policing Equity has launched a national analysis framework for analyzing bias in police data. The effort, originally conceived of by a major city police chief, is aimed at not only making police data available but analyzing it in a way that is aimed at uncovering the racial and ethnic differences in policing practices. 

For more background on this conversation, you can access our working paper on the topic as well. 


### Please check back with us as we continue to add to discussions and new evaluations on transparency, effectiveness and accountability in crime risk assessment.



## References

1. "Women Less Likely to be Shown High Paying Ads for Jobs on Google, Study Shows." *The Guardian*. July 2015. https://www.theguardian.com/technology/2015/jul/08/women-less-likely-ads-high-paid-jobs-google-study
2. Lum, K. and Isaac, W., 2016. To predict and serve?. Significance, 13: 14–19 
3. The War on Marijuana in Black and White. American Civil Liberties Union, 2013. Available: https://www.aclu.org/feature/war-marijuana-black-and-white 
4. White House Police Data Initiative Fact Sheet. https://www.whitehouse.gov/the-press-office/2016/04/22/fact-sheet-white-house-police-data-initiative-highlights-new-commitments
5. Police Foundation data available: https://publicsafetydataportal.org 
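6. Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights. Executive Office of the President, May 2016. https://www.whitehouse.gov/sites/default/files/microsites/ostp/2016_0504_data_discrimination.pdf 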
7. National Justice Database. Center for Policing Equity. http://policingequity.org/national-justice-database/ 
