In [None]:
# modules for research report
from datascience import *
import numpy as np
import random
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

# module for YouTube video
from IPython.display import YouTubeVideo

# okpy config
from client.api.notebook import Notebook
ok = Notebook('police-scorecard-final-project.ok')
_ = ok.auth(inline=True)

# California Police Scorecard

This dataset combines data from [8 Can't Wait](https://8cantwait.org/) and [Campaign Zero](https://www.joincampaignzero.org/#vision)
on Californian police departments. It has been cleaned for your convenience: all missing values have been removed, and low-quality
observations and variables have been filtered out. A brief descriptive summary of the dataset is provided below.

**NB: You may not copy any public analyses of this dataset. Doing so will result in a zero.**

## Summary

> The **California Police Scorecard** features data from 2016-2018 regarding police department performance and policies in all 58 California counties (with information about multiple police departments per county). We split the data into three datasets: **an arrests dataset, an accountability dataset, and a demographic dataset**. 

> The **arrests data** (157 rows, 26 columns) quantifies police shootings, uses of force, arrests (with information about race) and homicides, budget, and more. The **accountability data** (157 rows, 17 columns) marks what policies are in place (1: yes, 0: no) for police departments to hold their police officers accountable. The **demographic data** (157 rows, 16 columns) provides race and economic information about the citizens in a police district to contextualize police behavior in their respective communities. 

> The dataset has also recently expanded to include information about sheriffs’ departments, which operate county jails and are evaluated using the same metrics as police departments.

For a quick glance at how each police department is performing, **Campaign Zero has rated the departments in three areas of policing: police violence, police accountability, and approach to policing**. They have averaged these scores to provide an **overall score** (the higher, the better) that summarizes each department’s performance in those three areas. Notably, **the majority of police departments have received an ‘F’ grade**, indicating excessive levels of lethal force, injured civilians, incidence of racial bias, over-policing, complaints sustained, etc. You can read more about the methodology and formula used to compute the scores here: https://policescorecard.org/about.

Lastly, **we recommend exploring the Campaign Zero analysis - https://policescorecard.org/ - of the scorecard data to see what work has already been done and serve as inspiration for your own data project**. Campaign Zero’s mission is to end police brutality in America by implementing research-backed policy solutions. From Campaign Zero: ‘The scorecard is designed to help communities, researchers, police leaders and policy-makers take informed action to reduce police use of force and improve accountability and public safety in their jurisdictions.’ Furthermore, this is an opportunity to apply the skills you’ve learned in Data 8 to social justice topics if you’ve felt moved to tackle systemic racism and police brutality in America. 



## Data Description

This dataset consists of three tables stored in the `data` folder:
1. `police-demographic`: provides basic demographic and economic information about the people living within each police department’s jurisdiction
2. `police-arrests`: provides information about the number of arrests and shootings that occurred within each police department
3. `police-accountability`: provides information about the level of accountability for each police department as a result of county laws or police union agreements

A description of each table's variables is provided below:

`police-demographic`:
- `Region of California`: Region of California (Northern, Central, Southern) where the police district is located
- `Total Population`:	Total number of people living in the police district
- `White Population`:	Total number identified as White in the police district
- `Black Population`:	Total number identified as Black in the police district
- `Hispanic Population`:	Total number identified as Hispanic in the police district
- `Native American Population`:	Total number identified as Native American in the police district
- `Asian Population`:	Total number identified as Asian in the police district
- `Pacific Islander Population`:	Total number identified as Pacific Islander in the police district
- `Other Population`:	Total number not identified as any of the races above in the police district
- `Multiracial Population`:	Total number identified as multiracial in the police district
- `Percent HS Graduates in Jurisdiction`:	Percent of the population in the police district who graduated from high school
- `Unemployment Rate`:	Unemployment rate in 2018
- `Median Income`:	Median income in 2018
- `Poverty Rate`: Percent of the population living in the police district that was under the US poverty line (earning below \\$12,490 per year) in 2018



`police-arrests`:
- `Overall Score`: Average of the Police Violence Score, Police Accountability Score, and Approach to Policing Score. More information can be found in the links below.
- `Police Violence Score`:	Average of Percentile Less Lethal Force Used per Arrest, Percentile Deadly Force Used per Arrest, Percentile Unarmed Civilians Killed or Seriously Injured, Percentile Racial Bias in Arrests and Deadly Force. More information can be found in the links below.
- `Police Accountability Score`: $\dfrac{2}{3} \cdot$ Percentile Civilian Complaints Sustained + $\dfrac{1}{6} \cdot$ Percent Discrimination and Excessive Force Complaints Sustained + $\dfrac{1}{6} \cdot$ Percent Criminal Complaints Sustained. More information can be found in the links below.
- `Approach to Policing Score`:	Average of Percentile Misdemeanor Arrests per Population, Percent Homicides Cleared. More information can be found in the links below.
- `People Killed or Seriously Injured by Police, 2016-2018`: Total number of people killed or seriously injured by police between 2016 and 2018
- `Percent who Did Not Reportedly Have a Gun`: Of all incidents where people were killed or seriously injured by police between 2016 and 2018, what percent reportedly did not have a gun?
- `Percent Who were Confirmed Unarmed`: Of all incidents where people were killed or seriously injured by police between 2016 and 2018, what percent were confirmed unarmed?
- `People Deadly Force Used Against Who were Perceived to Have a Gun`: Of all incidents where deadly force was used, how many people were perceived to have a gun?
- `People Deadly Force Used Against Who were Confirmed with a Gun`:	Of all incidents where deadly force was used, how many people were confirmed to have a gun?
- `2016 Police Shootings`: Number of police shootings in 2016
- `2017 Police Shootings`: Number of police shootings in 2017
- `2018 Police Shootings`: Number of police shootings in 2018
- `Total Arrests, 2016-2018`: Total number of arrests made between 2016 and 2018
- `Homicides (2013-2018)`:	Total number of homicides from 2013 to 2018
- `percent_police_budget`:	The percent of the county’s total budget that goes towards the police
- `Log of police_budget`:	The police’s budget from 2018 in US\\$, after being log transformed. The actual police budget can be retrieved using the following formula: $\exp \left ( {\text{Log of police_budget}} \right )$
- `Asian/Pacific Islander Drug Possession Arrests, 2016`:	Total number of Asian/Pacific Islander drug possession arrests in 2016
- `Black Drug Possession Arrests, 2016`:	Total number of Black drug possession arrests in 2016
- `Hispanic Drug Possession Arrests, 2016`:	Total number of Hispanic drug possession arrests in 2016
- `Unknown Race Drug Possession Arrests, 2016`: Total number of drug possession arrests in 2016 of people of an unknown race
- `Other Race Drug Possession Arrests, 2016`: Total number of drug possession arrests in 2016 of people of another race
- `White Drug Possession Arrests, 2016`:	Total number of White drug possession arrests in 2016
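
The two formulas in the list above (the accountability score weights and the log-transformed budget) can be checked numerically. The sketch below is illustrative only: the function names and the input values are hypothetical, not taken from the dataset, and it assumes the budget was transformed with a natural log, as the $\exp$ formula implies.

```python
import numpy as np

def accountability_score(civilian_complaints, discrimination_force, criminal):
    """Weighted average described above: weights 2/3, 1/6, and 1/6."""
    return ((2 / 3) * civilian_complaints
            + (1 / 6) * discrimination_force
            + (1 / 6) * criminal)

def budget_from_log(log_budget):
    """Undo a natural-log transform to recover the budget in US dollars."""
    return np.exp(log_budget)

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example_score = accountability_score(30, 60, 0)        # 20 + 10 + 0
example_budget = budget_from_log(np.log(1_000_000))    # round trip of a made-up budget

print(example_score, example_budget)
```

Applied to the real table, `budget_from_log` would presumably take the `Log of police_budget` column as its argument, since `np.exp` works elementwise on arrays.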

`police-accountability`:
- `disqualifies_complaints`:	Does language in the police union contract disqualify misconduct complaints that are submitted too many days after an incident occurs or if an investigation takes too long to complete?
- `restricts_delays_interrogations`:	Does language in the police union contract prevent officers from being interrogated immediately after being involved in an incident or otherwise restricts how, when or where they can be interrogated?
- `gives_officers_unfair_access_to_information`:	Does language in the police union contract grant officers access to information civilians don’t get prior to being interrogated?
- `limits_oversight_discipline`:	Does language in the police union contract limit disciplinary consequences or otherwise hinder the capacity of civilian oversight structures or the media to hold police accountable?
- `requires_city_pay_for_misconduct`:	Does language in the police union contract require cities to pay costs related to police misconduct? This includes giving officers paid leave while under investigation or paying legal fees and/or the cost of settlements.
- `erases_misconduct_records`:	Does language in the police union contract prevent information on past misconduct investigations from being recorded or retained in an officer’s personal file?
- `requires_deescalation`:	Does language in the police’s use of force policies require officers to de-escalate situations by communicating with subjects, maintaining distance or otherwise eliminating the need to use force?
- `bans_chokeholds_and_strangleholds`:	Does language in the police’s use of force policies ban the use of chokeholds and strangleholds against civilians?
- `duty_to_intervene`: Does language in the police’s use of force policies require officers to intervene and stop excessive force used by other officers and report these incidents to their supervisor?
- `requires_warning_before_shooting`:	Does language in the police’s use of force policies require officers to give a verbal warning, when possible, before shooting a civilian?
- `restricts_shooting_at_moving_vehicles`:	Does language in the police’s use of force policies restrict officers from shooting at moving vehicles?
- `requires_comprehensive_reporting`:	Does language in the police’s use of force policies require officers to report each time they use force or threaten to use force against civilians?
- `requires_exhaust_all_other_means_before_shooting`:	Does language in the police’s use of force policies require officers to exhaust all other reasonable means before resorting to deadly force?
- `has_use_of_force_continuum`:	Does language in the police’s use of force policies have a force continuum that limits the types of force and/or weapons that can be used to respond to specific types of resistance?
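
Because each accountability variable is coded as 1 (yes) or 0 (no), sums and means of these columns have a direct interpretation. The sketch below uses a small made-up array rather than the real table, so the departments and values are hypothetical. It only shows the arithmetic: a row sum counts the policies a department has in place, and a column mean gives the share of departments with a given policy.

```python
import numpy as np

# Three of the use of force policy columns described above.
policy_names = ['requires_deescalation',
                'bans_chokeholds_and_strangleholds',
                'duty_to_intervene']

# Hypothetical 1/0 indicators: one row per made-up department, one column per policy.
indicators = np.array([
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
])

# Number of listed policies each department has in place.
policies_per_department = indicators.sum(axis=1)

# Proportion of departments with each policy.
share_with_policy = indicators.mean(axis=0)

for name, share in zip(policy_names, share_with_policy):
    print(name, round(share, 2))
```

One caveat when interpreting such counts: judging by the descriptions above, a 1 in a union contract column (for example `erases_misconduct_records`) appears to flag a provision that weakens accountability, while a 1 in a use of force column flags a restriction on force, so the two groups probably should not be summed together.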

Additionally, here are some important definitions referenced in the variable descriptions above:
- Less Lethal Force: The total number of uses of tasers, batons, projectiles, pepper spray, other weapons and strangleholds against civilians.

- Civilian complaints: the total number of complaints, by type of complaint, reported by civilians against law enforcement personnel from 2016-2018.

- Deadly Force: All firearms discharges and all use of force incidents resulting in the death or serious injury of a civilian.

Lastly, the sources used for the data and descriptions are listed below. If you are confused about any of the descriptions, we recommend taking a look at these links:
- http://useofforceproject.org/#project
- https://policescorecard.org/about
- https://www.checkthepolice.org/#project

## Inspiration
A variety of exploratory analyses, hypothesis tests, and prediction problems can be tackled with this data. Here are a few ideas to get you started:

1. What is the trend over time for police shootings?
2. Are there geographic trends regarding violence and force used by police or sheriffs' departments?
3. Is there a statistically significant difference in the distribution of drug arrests among Black, Hispanic, Asian, White, and other racial groups?
4. What, if any, correlations exist between police policies/budget and civilians seriously injured or shot by officers in police departments? 
5. What is the impact of graduation and poverty rates on the amount or types of crime in a city?
6. Can you predict the overall policing score from police budgets, arrests, or amounts of deadly force used?

The data sources' websites may also provide some inspiration:
- [8 Can't Wait](https://8cantwait.org/)
- [Campaign Zero](https://www.joincampaignzero.org/#vision)

The analysis of police data has fueled media campaigns advocating for policies that increase police accountability and reduce police
violence. We're excited to see what creativity you can bring to this growing discussion.

Don't forget to review the [Final Project Guidelines](https://docs.google.com/document/d/1NuHDYTdWGwhPNRov8Y3I8y6R7Rbyf-WDOfQwovD-gmw/edit?usp=sharing) for a complete list of requirements.

## Preview

The tables are loaded in the code cells below. Take some time to explore them!

In [None]:
# Load the arrests data
arrests = Table.read_table('data/police-arrests.csv')
print('The arrests dataset has {} rows and {} columns.'.format(arrests.num_rows, arrests.num_columns))
arrests.show(5)

In [None]:
# Load the accountability data
accountability = Table.read_table('data/police-accountability.csv')
print('The accountability dataset has {} rows and {} columns.'.format(accountability.num_rows, accountability.num_columns))
accountability.show(5)

In [None]:
# Load the demographic data
demographic = Table.read_table('data/police-demographic.csv')
print('The demographic dataset has {} rows and {} columns.'.format(demographic.num_rows, demographic.num_columns))
demographic.show(5)

<br>

# Research Report

## Introduction

*Replace this text with your introduction*

## Hypothesis Testing and Prediction Questions

**Please bold your hypothesis testing and prediction questions.**

*Replace this text with your hypothesis testing and prediction questions*

## Exploratory Data Analysis

**You may change the order of the plots and tables.**

**Quantitative Plot:**

In [None]:
# Use this cell to generate your quantitative plot
...

*Replace this text with an analysis of your plot*

**Qualitative Plot:**

In [None]:
# Use this cell to generate your qualitative plot
...

*Replace this text with an analysis of your plot*

**Aggregated Data Table:**

In [None]:
# Use this cell to generate your aggregated data table
...

*Replace this text with an analysis of your table*

**Table Requiring a Join Operation:**

In [None]:
# Use this cell to join two datasets
...

*Replace this text with an analysis of your table*

## Hypothesis Testing

**Do not copy code from demo notebooks or homeworks! You may split portions of your code into distinct cells. Also, be sure to
set a random seed so that your results are reproducible.**
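
To illustrate why the seed matters, here is a minimal sketch (not part of the required analysis) showing that reseeding NumPy's global generator with the same value reproduces the same shuffle. The helper name `shuffled_labels` and the labels are hypothetical. Note that Python's `random.seed` only seeds the standard-library `random` module; NumPy keeps a separate generator, and we assume here that any NumPy-based sampling you do draws from it.

```python
import numpy as np

def shuffled_labels(seed):
    """Seed NumPy's global generator, then shuffle a small set of made-up labels."""
    np.random.seed(seed)
    return np.random.permutation(['a', 'b', 'c', 'd', 'e'])

first = shuffled_labels(1231)
second = shuffled_labels(1231)

# The same seed gives the same permutation on both calls.
same = np.array_equal(first, second)
print(same)
```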

In [None]:
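# set the random seeds so that results are reproducible
# (random.seed covers Python's random module; np.random.seed covers NumPy-based sampling)
random.seed(1231)
np.random.seed(1231)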

...

## Prediction

**Be sure to set a random seed so that your results are reproducible.**

In [None]:
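# set the random seeds so that results are reproducible
# (random.seed covers Python's random module; np.random.seed covers NumPy-based sampling)
random.seed(1231)
np.random.seed(1231)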

...

## Conclusion

*Replace this text with your conclusion*

## Presentation

*In this section, you'll need to provide a link to your video presentation. If you've uploaded your presentation to YouTube,
you can include the URL in the code below. We've provided an example to show you how to do this. Otherwise, provide the link
in a markdown cell.*

**Link:** *Replace this text with a link to your video presentation*

In [None]:
# Full Link: https://www.youtube.com/watch?v=BKgdDLrSC5s&feature=emb_logo
# Plug in the string between "v=" and "&feature":
YouTubeVideo('BKgdDLrSC5s')

# Submission

*Just as with the other assignments in this course, please submit your research notebook to Okpy. We suggest that you
submit often so that your progress is saved.*

In [None]:
# Run this line to submit your work
_ = ok.submit()