# Final Project: Police Killings in the US

## Introduction
Police violence has been a hot-button issue in the US for years. This dataset includes all recorded incidents of police violence that resulted in the victim's death during the first half of 2015. It was created by The Guardian as part of an effort to begin thoroughly documenting these killings, since existing documentation was lackluster and incomplete. You can read more about their project here: https://www.theguardian.com/us-news/ng-interactive/2015/jun/01/about-the-counted. This specific dataset was merged with census data by FiveThirtyEight (538) so that more meaningful insights can be extracted. Note that the link above also offers a more up-to-date dataset for download, but it lacks the attributes describing county statistics. Descriptions of each variable can be found in the 538 data repository: https://github.com/fivethirtyeight/data/tree/master/police-killings

## Questions to Consider
1. What are some effective visuals you can make to detail patterns of police killings?
2. Are there any states or counties that stand out in terms of numbers killed? 
3. Can you find similarities among counties/states with high numbers of killings?
4. What picture can you paint of the victims?
5. Could you build a prediction model for counties that estimates which race is more likely to be a victim of police violence?
6. Can you use PCA to find which variables explain the most variance in the data, and what might they suggest about the factors behind police killings?
7. What domain knowledge is needed here? How dangerous can bias be as we approach these problems?
8. Is there anything else you would add on to the dataset to extract even more information? 
9. Does this dataset have shortcomings that keep it from addressing other important questions related to the topic?
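For question 2, a natural first step is simply counting incidents by state and by county. The sketch below assumes `county_id` holds the combined state/county FIPS code, so it identifies a county uniquely across states; the helper name `killings_by_location` and the toy rows are hypothetical. Keep in mind that raw counts will largely track population size, and normalizing them would need population figures from outside this dataset.

```python
import pandas as pd


def killings_by_location(df, n=5):
    """Return the n states and n counties with the most recorded killings.

    Assumes df has a 'state' column (postal abbreviation) and a 'county_id'
    column holding the combined state/county FIPS code.
    """
    by_state = df["state"].value_counts().head(n)
    # Group on (state, county_id) so the state stays visible next to each county code.
    by_county = (
        df.groupby(["state", "county_id"])
        .size()
        .sort_values(ascending=False)
        .head(n)
    )
    return by_state, by_county


# Hypothetical toy rows for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "state": ["CA", "CA", "TX", "CA", "TX", "OK"],
    "county_id": [6037, 6037, 48201, 6029, 48201, 40109],
})
by_state, by_county = killings_by_location(toy, n=2)
print(by_state)
print(by_county)
```

On the real data this would be called as `killings_by_location(police_killings, n=10)`.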

In [6]:
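import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

# Load the Guardian data merged with census attributes by 538.
# ISO-8859-1 (Latin-1) is used because the file does not decode cleanly as UTF-8.
police_killings = pd.read_csv("police_killings.txt", encoding="ISO-8859-1")

# Preview the first ten rows
police_killings.head(10)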

| | name | age | gender | raceethnicity | month | day | year | streetaddress | city | state | ... | share_hispanic | p_income | h_income | county_income | comp_income | county_bucket | nat_bucket | pov | urate | college |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A'donte Washington | 16 | Male | Black | February | 23 | 2015 | Clearview Ln | Millbrook | AL | ... | 5.6 | 28375 | 51367.0 | 54766 | 0.937936 | 3.0 | 3.0 | 14.1 | 0.097686 | 0.16851 |
| 1 | Aaron Rutledge | 27 | Male | White | April | 2 | 2015 | 300 block Iris Park Dr | Pineville | LA | ... | 0.5 | 14678 | 27972.0 | 40930 | 0.683411 | 2.0 | 1.0 | 28.8 | 0.065724 | 0.111402 |
| 2 | Aaron Siler | 26 | Male | White | March | 14 | 2015 | 22nd Ave and 56th St | Kenosha | WI | ... | 16.8 | 25286 | 45365.0 | 54930 | 0.825869 | 2.0 | 3.0 | 14.6 | 0.166293 | 0.147312 |
| 3 | Aaron Valdez | 25 | Male | Hispanic/Latino | March | 11 | 2015 | 3000 Seminole Ave | South Gate | CA | ... | 98.8 | 17194 | 48295.0 | 55909 | 0.863814 | 3.0 | 3.0 | 11.7 | 0.124827 | 0.050133 |
| 4 | Adam Jovicic | 29 | Male | White | March | 19 | 2015 | 364 Hiwood Ave | Munroe Falls | OH | ... | 1.7 | 33954 | 68785.0 | 49669 | 1.384868 | 5.0 | 4.0 | 1.9 | 0.06355 | 0.403954 |
| 5 | Adam Reinhart | 29 | Male | White | March | 7 | 2015 | 18th St and Palm Ln | Phoenix | AZ | ... | 79.0 | 15523 | 20833.0 | 53596 | 0.388704 | 1.0 | 1.0 | 58.0 | 0.073651 | 0.102955 |
| 6 | Adrian Hernandez | 22 | Male | Hispanic/Latino | March | 27 | 2015 | 4000 Union Ave | Bakersfield | CA | ... | 44.2 | 25949 | 58068.0 | 48552 | 1.195996 | 4.0 | 4.0 | 17.2 | 0.131461 | 0.203801 |
| 7 | Adrian Solis | 35 | Male | Hispanic/Latino | March | 26 | 2015 | 1500 Bayview Ave | Wilmington | CA | ... | 84.1 | 25043 | 66543.0 | 55909 | 1.190202 | 4.0 | 4.0 | 12.2 | 0.094347 | 0.090438 |
| 8 | Alan Alverson | 44 | Male | White | January | 28 | 2015 | Pickett Runn Rd | Sunset | TX | ... | 66.3 | 16778 | 30391.0 | 38310 | 0.793292 | 2.0 | 1.0 | 37.7 | 0.140833 | 0.047601 |
| 9 | Alan James | 31 | Male | White | February | 7 | 2015 | 200 Abbie St SE | Wyoming | MI | ... | 26.5 | 22005 | 44553.0 | 51667 | 0.862311 | 3.0 | 2.0 | 18.4 | 0.174167 | 0.102692 |
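Judging from the preview, `comp_income` appears to be the tract's household income `h_income` divided by `county_income` (for example, 51367 / 54766 ≈ 0.9379 in the first row). That would also explain why `comp_income` has the same non-null count (465) as `h_income` in the summary statistics below. A quick check on the first five rows shown above, assuming that relationship:

```python
import numpy as np
import pandas as pd

# Values copied from rows 0-4 of the preview above.
sample = pd.DataFrame({
    "h_income": [51367.0, 27972.0, 45365.0, 48295.0, 68785.0],
    "county_income": [54766, 40930, 54930, 55909, 49669],
    "comp_income": [0.937936, 0.683411, 0.825869, 0.863814, 1.384868],
})

# Assumed definition: tract household income relative to county income.
ratio = sample["h_income"] / sample["county_income"]

# The preview shows 6 decimal places, so compare with a matching tolerance.
matches = np.allclose(ratio, sample["comp_income"], atol=1e-6)
print(matches)
```

On the full frame, `np.allclose(police_killings["h_income"] / police_killings["county_income"], police_killings["comp_income"], equal_nan=True)` should return `True` if the relationship holds for every row.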


In [7]:
# Summary statistics for the numeric columns
police_killings.describe()

| | day | year | latitude | longitude | state_fp | county_fp | tract_ce | geo_id | county_id | pop | h_income | county_income | comp_income | county_bucket | nat_bucket | urate | college |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 467.0 | 467.0 | 467.0 | 467.0 | 467.0 | 467.0 | 467.0 | 467.0 | 467.0 | 467.0 | 465.0 | 467.0 | 465.0 | 440.0 | 465.0 | 465.0 | 465.0 |
| mean | 15.830835 | 2015.0 | 36.403224 | -96.972666 | 25.342612 | 91.584582 | 236936.614561 | 25434430000.0 | 25434.197002 | 4783.719486 | 46627.182796 | 52527.331906 | 0.895913 | 2.497727 | 2.496774 | 0.117399 | 0.220217 |
| std | 8.65897 | 0.0 | 5.193357 | 16.953842 | 16.766458 | 110.185129 | 341262.721715 | 16801400000.0 | 16801.379755 | 2374.565749 | 20511.194907 | 12948.263811 | 0.333584 | 1.393115 | 1.298412 | 0.069175 | 0.158347 |
| min | 1.0 | 2015.0 | 19.915194 | -159.6427 | 1.0 | 1.0 | 100.0 | 1003010000.0 | 1003.0 | 0.0 | 10290.0 | 22545.0 | 0.184049 | 1.0 | 1.0 | 0.011335 | 0.013547 |
| 25% | 8.0 | 2015.0 | 33.33524 | -111.954636 | 8.0 | 29.0 | 5201.5 | 8022008000.0 | 8022.0 | 3357.5 | 32625.0 | 43804.0 | 0.645365 | 1.0 | 1.0 | 0.068592 | 0.106167 |
| 50% | 16.0 | 2015.0 | 35.769779 | -94.761902 | 24.0 | 63.0 | 40200.0 | 24033800000.0 | 24033.0 | 4447.0 | 42759.0 | 50856.0 | 0.869612 | 2.0 | 2.0 | 0.105181 | 0.169544 |
| 75% | 23.0 | 2015.0 | 39.937452 | -82.961582 | 40.0 | 111.0 | 378450.0 | 40112470000.0 | 40112.0 | 5815.5 | 56190.0 | 56832.0 | 1.081454 | 4.0 | 3.0 | 0.140833 | 0.284542 |
| max | 31.0 | 2015.0 | 61.218408 | -68.100007 | 56.0 | 740.0 | 980000.0 | 56005000000.0 | 56005.0 | 26826.0 | 142500.0 | 110292.0 | 2.865216 | 5.0 | 5.0 | 0.507614 | 0.82807 |
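`describe()` summarizes only numeric columns. The preview shows numeric-looking values for `share_hispanic`, `p_income` and `pov`, yet those columns (and `age`) are missing from this summary, which suggests pandas stored them as strings because some entries are not numbers; the `info()` output below lists `age` as `object`. A minimal sketch of coercing such columns, assuming unparseable entries should become `NaN`; the helper `coerce_numeric` and the placeholder strings in the toy frame are hypothetical:

```python
import pandas as pd


def coerce_numeric(df, columns):
    """Return a copy of df with the given columns converted to numeric dtypes.

    Entries that cannot be parsed as numbers become NaN (errors="coerce").
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Hypothetical toy frame: the placeholder strings "Unknown" and "-" are
# assumptions for illustration, not values confirmed from the dataset.
toy = pd.DataFrame({
    "age": ["16", "27", "Unknown"],
    "pov": ["14.1", "-", "11.7"],
})
clean = coerce_numeric(toy, ["age", "pov"])
print(clean.dtypes)
print(clean.describe())
```

On the real data this would look like `coerce_numeric(police_killings, ["age", "share_hispanic", "p_income", "pov"])`. Because coercion silently turns bad entries into `NaN`, it is worth counting the new missing values before deciding whether to drop or impute them.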


In [8]:
# Column dtypes and non-null counts
police_killings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 467 entries, 0 to 466
Data columns (total 34 columns):
name                    467 non-null object
age                     467 non-null object
gender                  467 non-null object
raceethnicity           467 non-null object
month                   467 non-null object
day                     467 non-null int64
year                    467 non-null int64
streetaddress           463 non-null object
city                    467 non-null object
state                   467 non-null object
latitude                467 non-null float64
longitude               467 non-null float64
state_fp                467 non-null int64
county_fp               467 non-null int64
tract_ce                467 non-null int64
geo_id                  467 non-null int64
county_id               467 non-null int64
namelsad                467 non-null object
lawenforcementagency    467 non-null object
cause                   467 non-null object
armed               