# Police Deployment Analysis

While we can go into all the math and metrics around the performance of a model, its primary use as a deployment tool is ultimately the most important test: can the CivicScape model actually give us a meaningful improvement over existing methods? 

This notebook runs through a deployment analysis that gives context to the ways in which deployment using CivicScape can have substantial impact. Every police department deploys officers in a different way that is right for their community; in this document, we make some clarifying assumptions that allow us to quantify the improvements that could be attained through the use of CivicScape. We will go through those below. 

### Comparison Point: COMPSTAT

If you have already been through the Model Data Practices document, you're familiar with COMPSTAT: we're using the same system as a baseline here.

While there are a few other machine-learning-driven systems that also do crime risk assessment, much like CivicScape, most communities in the US still rely on a system that is broadly referred to as COMPSTAT: a data-driven policing deployment approach that encompasses a wide variety of techniques and methodologies. Often, COMPSTAT uses historical crime count averages to anticipate high-crime areas for a given beat assignment. 

This seems like a pretty good comparison point for us. Since many police departments use something similar, it will help us understand how the additional features we've added--plus the algorithms used--can improve upon existing systems. So, we use *three year historical averages* for tract/day combinations and use these to establish a baseline for comparison against the same test data that we compared our model to above. You could also try configuring your own set of risk assessments and using this notebook to compare. 
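
To make the baseline concrete, here is a minimal sketch of a historical-average calculation in plain Python. It is not the code used to build the historical file: the record layout, the sample counts, and the reading of "day" as day of week are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical incident counts: (census tract, date, number of crimes).
# All three dates fall on a Friday in different years.
records = [
    ("010100", date(2012, 11, 2), 3),
    ("010100", date(2013, 11, 1), 1),
    ("010100", date(2014, 10, 31), 2),
    ("010200", date(2012, 11, 2), 0),
    ("010200", date(2013, 11, 1), 1),
    ("010200", date(2014, 10, 31), 2),
]

def historical_baseline(records):
    """Average crime count for each (tract, weekday) pair; Monday is 0."""
    grouped = defaultdict(list)
    for tract, day, count in records:
        grouped[(tract, day.weekday())].append(count)
    return {key: mean(counts) for key, counts in grouped.items()}

baseline = historical_baseline(records)
```

Each key in `baseline` is a tract/weekday pair and each value is that pair's average count over the window. Scaling those averages into scores comparable to the CivicScape risk scores is a separate step that this sketch does not cover.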

In [54]:
import folium
import pandas as pd
import notebook_support
import numpy as np

# Example City Coordinates for Folium maps; extend for new cities
# Longitudes west of the prime meridian are negative
coords = {"CHICAGO": (41.8781, -87.6298), "OAKLAND": (37.8044, -122.2711), "NEW YORK CITY": (40.7831, -73.9712), 
 "NEW ORLEANS": (29.9511, -90.0715)}

## Prepping the Data

We need a few files to run this analysis: 

1. *CivicScape Risk Scores* - a CSV file
2. *Historical Crime Data* (three-year historical) - a CSV file
3. *Census Tracts Geojson* - a geographic file for census tracts
4. *Police Beats Geojson* - the regions to which police are deployed; this may be districts, potentially, or, if there aren't available geographic information, just feed in the same census tract file. 

**To Begin**: enter the file paths for the data you wish to use.

In [55]:
census_tracts = 'geojsons/CensusTracts2000.geojson'
police_beats = 'geojsons/PoliceBeats.geojson'
riskscores_path = "../data/risk_assessments.csv"
hist_path = "../data/historical_grouped_3_year.csv"

Then, run the following two modules to load and clean the data.

In [56]:
riskscores = pd.read_csv(riskscores_path)
beats = pd.read_json(police_beats)
tracts = pd.read_json(census_tracts)
historical = pd.read_csv(hist_path)
notebook_support.data_check(riskscores, historical)


Data files were loaded! You're ready to start!



In [57]:
risk, hist = notebook_support.data_prep(riskscores, historical)
risk_keep, risk, hist = notebook_support.historical_prep(risk, hist)


Data prep done!
This file contains risk_assessments for the test date range 2014-10-24 00:00:00 through 2015-05-28 00:00:00.

Looks like your historical dataset doesn't have days, so we'll look at months instead.

The period from 2014-10-01 00:00:00 to 2014-10-24 00:00:00 will be left off the dataset because data are missing.

The period from 2015-05-28 00:00:00 to 2015-09-01 00:00:00 will be left off the dataset because data are missing.

The final overlapping period for analysis is: 2014-10-24 00:00:00 through 2015-05-28 00:00:00


## Select the Community and Timeframe

Now, select the community that you'd like to run the analysis for. We have several pre-set jurisdictions based on where CivicScape already has data, listed below. If you'd like to add another city, please use the first code module above to insert its lat/long location. As long as your other data are in the correct format, this notebook should still work.

In [58]:
print("\nAvailable Jurisdictions for Analysis: {}\n".format(coords.keys()))


Available Jurisdictions for Analysis: ['NEW YORK CITY', 'NEW ORLEANS', 'OAKLAND', 'CHICAGO']



In [59]:
city = 'chicago'
map_cs_risk = folium.Map(location=coords[city.upper()])

Now, please select a timeframe. If you want to do a deployment comparison, restrict the data to a specific timeframe and shift, ideally a single shift time (e.g. 12-8am or whatever makes sense for your city). The cell below leaves `hours` and `days` empty and restricts the dates to 2014-11-01 through 2014-11-08; edit these values for your data and interest, for example Fridays 12:00-8:00am for the period 11/01/2014 through 02/01/2015.

In [60]:
hours = () # Enter the hour range to include on a 24-hour clock, e.g. (3, 15) returns 3AM-3PM
days = [] # Enter each day to include as a letter i.e. ['M', 'T', 'W', 'R', 'F', 'Sa', 'Su']
date_range = ('2014-11-01', '2014-11-08') # Enter the dates as YYYY-MM-DD, e.g. ('2014-10-24', '2015-05-28')
riskscores, historical = notebook_support.constrict_times(risk, historical=hist, hours=hours, days=days, date_range=date_range)


You've restricted the risk score data for the following dates:
    2014-11-01 00:00:00 through 2014-11-08 00:00:00
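
For readers who want to see what this kind of restriction looks like, here is a minimal sketch in plain Python. It is not the implementation of `notebook_support.constrict_times`: the helper `in_window`, the half-open hour range, and the rule that an empty filter means "no restriction" are assumptions made for illustration.

```python
from datetime import datetime

# Weekday letters as used in the cell above, mapped to datetime.weekday() values.
DAY_CODES = {"M": 0, "T": 1, "W": 2, "R": 3, "F": 4, "Sa": 5, "Su": 6}

def in_window(ts, hours=(), days=(), date_range=()):
    """Return True if ts passes every filter; an empty filter is skipped."""
    if hours and not (hours[0] <= ts.hour < hours[1]):
        return False
    if days and ts.weekday() not in {DAY_CODES[d] for d in days}:
        return False
    if date_range:
        start = datetime.strptime(date_range[0], "%Y-%m-%d")
        end = datetime.strptime(date_range[1], "%Y-%m-%d")
        if not (start <= ts <= end):
            return False
    return True

# Hypothetical timestamps to filter.
stamps = [
    datetime(2014, 11, 7, 3),   # Friday 3 AM, inside the date range
    datetime(2014, 11, 7, 14),  # Friday 2 PM, outside the hour range
    datetime(2014, 11, 5, 3),   # Wednesday 3 AM, wrong day
    datetime(2014, 12, 5, 3),   # Friday 3 AM, outside the date range
]
kept = [
    ts for ts in stamps
    if in_window(ts, hours=(0, 8), days=["F"],
                 date_range=("2014-11-01", "2014-11-08"))
]
```

Only the first timestamp survives all three filters, which is the behavior you want when isolating a single shift on a single weekday.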


## Set Assumptions

This model makes a lot of assumptions in order to be able to quantify potential impact. 

### Assumption Set 1: How Officers are Deployed

We assume a two-tiered process for deployment. First, we assume that a subset of the officers available for a given shift are placed on regular patrol; that is, they are deployed to a standard patrol of a particular beat. We then assume that the remaining officers are available for high-risk areas; we call these 'floating' patrol officers. We need you to tell us:

1. How many officers are available.
2. What percent of those officers are for regular patrol, as a decimal.
3. How many floating officers to assign to a beat when it's considered risky.
4. The shift length (though it's recommended that you just restrict your time period to a single shift above). If the time period you selected doesn't cover a full shift, you'll get an error below. 

Enter these values below: 

In [61]:
patrol_officers_for_shift = 2500 # How many officers available
percent_regular_patrol = .60 # What percent are regular patrol
percent_floating_patrol = 1-percent_regular_patrol # We assume other officers are available for floating patrol
additional_when_highrisk_beat = 3 # How many officers to assign to high risk beats
shift_length = 8 # How long the shift is in hours
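
As a quick check on these inputs, the two-tier split can be worked out directly. The helper below is a hypothetical illustration, not part of `notebook_support`, and rounding to whole officers is an assumption.

```python
def split_officers(total_officers, regular_share):
    """Split a shift's officers into regular and floating patrol counts."""
    if not 0 <= regular_share <= 1:
        raise ValueError("regular_share must be a decimal between 0 and 1")
    regular = round(total_officers * regular_share)  # whole officers (assumed)
    floating = total_officers - regular
    return regular, floating

regular, floating = split_officers(2500, 0.60)
```

With the default values above, this gives 1,500 officers on regular patrol and 1,000 available as floating patrol.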

### Assumption Set 2: Probabilities

As with the Model Data Practices notebook, we have to set thresholds for when to deploy officers. It may be a good idea to use the optimal threshold from that notebook for the CivicScape threshold, since that should reduce False Positives and False Negatives, but you can play around with it. We have set some default values below.

In [62]:
threshold_for_CivicScape_assignment = .46
threshold_for_COMPSTAT_assignment = .66
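
To illustrate how a threshold turns scores into deployment decisions, here is a minimal sketch. The tract IDs and scores are made up, `flag_high_risk` is a hypothetical helper, and treating a score equal to the threshold as high risk is an assumption; the notebook's own comparison may differ.

```python
def flag_high_risk(scores, threshold):
    """Return the tracts whose score meets or exceeds the threshold."""
    return sorted(tract for tract, score in scores.items() if score >= threshold)

# Hypothetical scores for three tracts under each system.
civicscape_scores = {"010100": 0.52, "010200": 0.31, "010300": 0.46}
compstat_scores = {"010100": 0.70, "010200": 0.66, "010300": 0.40}

cs_flagged = flag_high_risk(civicscape_scores, 0.46)
comp_flagged = flag_high_risk(compstat_scores, 0.66)
```

Because the two systems score on different scales, the same tract can be flagged by one and not the other, which is why each system gets its own threshold.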

It's also not guaranteed that a patrol is able to stop a crime from happening, even with additional officers. For that reason, we add some uncertainty; we don't assume that additional police *will* stop a crime in every instance, only that there is some probability for doing so. Below, we have set the likelihood a CivicScape floating patrol stops a crime and the likelihood a COMPSTAT floating patrol stops a crime when the risk score is high.

We kept these the same for CivicScape and COMPSTAT:

In [63]:
CS_likelihood_patrol_stops_crime = .60 # What is the likelihood a CivicScape floating patrol stops a crime?
COMP_likelihood_patrol_stops_crime = .60 # What is the likelihood a COMPSTAT floating patrol stops a crime?
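
One simple way to read these likelihoods is as an expected value: multiply the likelihood by the number of crimes that occurred in tracts that received a floating patrol. The sketch below assumes that interpretation, which may not match the calculation in `notebook_support`; the crime counts and the helper name are hypothetical.

```python
def expected_crimes_stopped(crimes_by_tract, flagged_tracts, likelihood):
    """Expected number of crimes stopped in tracts that got a floating patrol."""
    covered = sum(crimes_by_tract.get(tract, 0) for tract in flagged_tracts)
    return likelihood * covered

# Hypothetical observed crimes per tract during the shift.
crimes = {"010100": 4, "010200": 1, "010300": 6}

# Tracts 010100 and 010300 were flagged and patrolled; 010200 was not.
stopped = expected_crimes_stopped(crimes, ["010100", "010300"], 0.60)
```

Under this reading, a system only gets credit for crimes in the tracts it flagged, so better targeting raises the estimate even when both systems share the same likelihood.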

Lastly, we do a couple of calculations and check to make sure that the time range isn't too short for the shift. If you get an error, it's because the hour period you selected above is too small.

In [64]:
officers_per_tract = percent_regular_patrol*patrol_officers_for_shift / len(tracts)
additional_when_highrisk_tract = additional_when_highrisk_beat / officers_per_tract

if hours != () and hours[1] - hours[0] < shift_length: 
    # Put the explanation in the exception so it appears in the traceback
    raise ValueError("Time range too short for shift length.")
    
assumptions = notebook_support.build_assumptions_dict(threshold_for_CivicScape_assignment, threshold_for_COMPSTAT_assignment, 
                                     CS_likelihood_patrol_stops_crime, threshold_for_COMPSTAT_assignment, 
                                     COMP_likelihood_patrol_stops_crime, additional_when_highrisk_tract, 
                                     percent_floating_patrol, patrol_officers_for_shift)

## Results

We estimate the following for COMPSTAT and CivicScape:

1. **Tracts at risk**: The number of tracts which are 'high risk' based on the threshold set above. 
2. **Additional Officers Necessary**: How many floating officers are needed to deploy to each beat containing a high-risk tract.
3. **Overtime Officers**: If there aren't enough floating officers available, how many overtime officers would need to come outside of their scheduled shift.
4. **Extra officers**: Officers that are available, but not needed for floating patrols.
5. **Estimated Crimes Stopped**: Given the probability that a floating patrol would stop a crime if sent to a high-risk beat, how many crimes are estimated to have been either deterred or halted.
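
The overtime and extra-officer figures follow from comparing the officers needed against the floating officers available. The sketch below is a hypothetical helper, not the `notebook_support` implementation, and it assumes the number of officers needed has already been computed.

```python
def staffing_summary(officers_needed, floating_officers):
    """Compare floating officers needed against those available."""
    overtime = max(0, officers_needed - floating_officers)
    extra = max(0, floating_officers - officers_needed)
    return {"needed": officers_needed, "overtime": overtime, "extra": extra}

# 1,000 floating officers corresponds to 40% of 2,500 officers.
enough = staffing_summary(268, 1000)
short = staffing_summary(1200, 1000)
```

At most one of `overtime` and `extra` is nonzero: a shortfall is covered by overtime, and a surplus shows up as extra officers. Here `enough` has 732 extra officers and no overtime, while `short` needs 200 overtime officers.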


In [65]:
hist_merged, risk_merged, ranks = notebook_support.get_paper_comparisons(risk_keep, riskscores, historical, show=False)

In [66]:
average_risk, hist_merged = notebook_support.police_deployment_analysis(hist_merged, risk_merged, assumptions)

('\n\nCrimeScape tracts at risk: ', 268)
('    Additional officers necessary: ', 268.0)
('    Overtime officers: ', 0)
('    Extra officers: ', 732.0)
    Estimated crimes stopped: 37.00
('\nCOMPSTAT tracts at risk: ', 268)
('    Additional officers necessary: ', 268.0)
('    Overtime officers: ', 0)
('    Extra officers: ', 732.0)
    Estimated crimes stopped: 31.00




Here is the CivicScape risk map for this time period:

In [None]:
map_civ = average_risk
# Left-pad tract IDs with zeros to six characters so they match the GeoJSON keys
map_civ['census_tra'] = map_civ['census_tra'].astype(str).str.zfill(6)

map_cs_risk.choropleth(geo_path=census_tracts, data=map_civ,
             columns=['census_tra', 'risk_assessment'],
             key_on='feature.properties.census_tra',
             fill_color='YlOrRd', fill_opacity=0.7, line_opacity=0.2,
             legend_name='Risk Score',
             reset=True)
# folium.GeoJson(open(police_beats), name='geojson').add_to(map_cs_risk)
map_cs_risk

And here is the corresponding map for the COMPSTAT scores:

In [68]:
map_hist = hist_merged
# Left-pad tract IDs with zeros to six characters so they match the GeoJSON keys
map_hist['census_tra'] = map_hist['census_tra'].astype(str).str.zfill(6)
map_comp_risk = folium.Map(location=coords[city.upper()])
map_comp_risk.choropleth(geo_path=census_tracts, data=map_hist,
             columns=['census_tra', 'risk_score'],
             key_on='feature.properties.census_tra',
             fill_color='YlOrRd', fill_opacity=0.7, line_opacity=0.2,
             legend_name='Risk Score',
             reset=True)
map_comp_risk