# Abstract
As the Black Lives Matter movement took root across the United States in the wake of George Floyd's death at the hands of Minneapolis police officers, the addition of yet another name to the list of Black people killed by police renewed the focus on racism and police brutality in the country's political and cultural conversation.

Using data sets from Kaggle and the US Census Bureau, this notebook aims to visualize the disproportionate killings of minorities by police within the tri-state area (New Jersey, New York, and Connecticut), with state-by-state maps and rates for national context.

## Imports

In [1]:
# Data manipulation
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30

# Display all cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

from IPython import get_ipython
ipython = get_ipython()

# Autoreload extension
if 'autoreload' not in ipython.extension_manager.loaded:
    %load_ext autoreload

%autoreload 2

# Visualizations
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.colors import n_colors
from plotly.subplots import make_subplots
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(theme='white')

## Data Cleaning

In [2]:
# Data import
pk = pd.read_csv('./data/police_killings.csv')
cd = pd.read_csv('./data/2018_census_5YE.csv')

# Data cleaning
us_state_abbrev = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
}

cd_cols = ['Total Population', 'Male', 'Female', 'Hispanic', 'White',
           'Black', 'Native American', 'Asian', 'Pacific Islander', 'Unknown Race']

pk.replace(to_replace=r'^Unknown race$', value='Unknown Race', regex=True, inplace=True)
cd.replace(to_replace=r',', value='', regex=True, inplace=True)
cd.replace({'State': us_state_abbrev}, inplace=True)
cd = cd.set_index('State')
cd = cd.sort_index()
cd[cd_cols] = cd[cd_cols].apply(pd.to_numeric, errors = 'coerce', axis=1)

# Debug
#pk.head()
#cd.head()
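
The cleaning steps above (strip thousands separators, map state names to abbreviations, coerce to numeric) can be checked on a small frame before running them on the full census file. The sketch below is only an illustration: the two-row `sample` table and its numbers are made up, and `abbrev` is a shortened stand-in for `us_state_abbrev`.

```python
import pandas as pd

# Toy stand-in for the census table; column names match the notebook,
# but every number here is invented for illustration.
sample = pd.DataFrame({
    "State": ["New Jersey", "Connecticut"],
    "Total Population": ["1,000,000", "500,000"],
    "Black": ["150,000", "N/A"],
})
abbrev = {"New Jersey": "NJ", "Connecticut": "CT"}  # shortened mapping
numeric_cols = ["Total Population", "Black"]

cleaned = sample.copy()
cleaned["State"] = cleaned["State"].map(abbrev)
for col in numeric_cols:
    # Remove thousands separators, then convert; unparseable cells become NaN.
    without_commas = cleaned[col].str.replace(",", "", regex=False)
    cleaned[col] = pd.to_numeric(without_commas, errors="coerce")
cleaned = cleaned.set_index("State").sort_index()
print(cleaned)
```

Restricting the comma removal to the numeric columns, rather than the whole frame, is a design choice in this sketch: it leaves any text columns untouched.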

## Feature Engineering

In [3]:
# Functions
def race_data(data, race):
    new_data = data[data["Victim's race"] == race]
    sort_data = new_data[["Victim's name", "State"]]
    data_grouped = sort_data.groupby('State')["Victim's name"].nunique()
    data_df = data_grouped.to_frame()
    data_df = data_df.rename(columns = {"Victim's name" : race + ' Police Killings'})
    return data_df

def race_data_state(data, race):
    new_data = data[data["Victim's race"] == race]
    sort_data = new_data[["Victim's name", "State"]]
    data_grouped = sort_data.groupby('State')["Victim's name"].nunique()
    data_df = data_grouped.to_frame()
    data_df = data_df.reset_index()
    data_df = data_df.rename(columns = {'index' : 'State', "Victim's name" : 'Count'})
    return data_df

def per_100K(data, race):
    data = 100000 * (data[race + ' Police Killings'] / data[race + ' Population'])
    return data
    
# Police killings total by race per state
hispanic_df = race_data(pk, 'Hispanic')
black_df = race_data(pk, 'Black')
white_df = race_data(pk, 'White')
asian_df = race_data(pk, 'Asian')
native_df = race_data(pk, 'Native American')
pacific_df = race_data(pk, 'Pacific Islander')
unknown_df = race_data(pk, 'Unknown Race')

# Police killings total across all races per state
pk_total = pk[["Victim's name", 'State']]
state_total = pk_total.groupby('State')["Victim's name"].nunique()
state_total_df = state_total.to_frame()
state_total_df = state_total_df.rename(columns = {"Victim's name": 'Total Police Killings'})

# Combining PK and population into df per race
hispanic_df['Total Police Killings'] = state_total_df['Total Police Killings']
hispanic_df[['Hispanic Population', 'Total State Population']] = cd[['Hispanic', 'Total Population']]
black_df['Total Police Killings'] = state_total_df['Total Police Killings']
black_df[['Black Population', 'Total State Population']] = cd[['Black', 'Total Population']]
white_df['Total Police Killings'] = state_total_df['Total Police Killings']
white_df[['White Population', 'Total State Population']] = cd[['White', 'Total Population']]
asian_df['Total Police Killings'] = state_total_df['Total Police Killings']
asian_df[['Asian Population', 'Total State Population']] = cd[['Asian', 'Total Population']]
native_df['Total Police Killings'] = state_total_df['Total Police Killings']
native_df[['Native American Population', 'Total State Population']] = cd[['Native American', 'Total Population']]
pacific_df['Total Police Killings'] = state_total_df['Total Police Killings']
pacific_df[['Pacific Islander Population', 'Total State Population']] = cd[['Pacific Islander', 'Total Population']]
unknown_df['Total Police Killings'] = state_total_df['Total Police Killings']
unknown_df[['Unknown Race Population', 'Total State Population']] = cd[['Unknown Race', 'Total Population']]

# Adding percentages for each state per race
hispanic_df['Hispanic PK as Percentage'] = 100 * (hispanic_df['Hispanic Police Killings'] / hispanic_df['Total Police Killings'])
hispanic_df['Hispanic Pop as Percentage'] = 100 * (hispanic_df['Hispanic Population'] / hispanic_df['Total State Population'])
black_df['Black PK as Percentage'] = 100 * (black_df['Black Police Killings'] / black_df['Total Police Killings'])
black_df['Black Pop as Percentage'] = 100 * (black_df['Black Population'] / black_df['Total State Population'])
white_df['White PK as Percentage'] = 100 * (white_df['White Police Killings'] / white_df['Total Police Killings'])
white_df['White Pop as Percentage'] = 100 * (white_df['White Population'] / white_df['Total State Population'])
asian_df['Asian PK as Percentage'] = 100 * (asian_df['Asian Police Killings'] / asian_df['Total Police Killings'])
asian_df['Asian Pop as Percentage'] = 100 * (asian_df['Asian Population'] / asian_df['Total State Population'])
native_df['Native American PK as Percentage'] = 100 * (native_df['Native American Police Killings'] / native_df['Total Police Killings'])
native_df['Native American Pop as Percentage'] = 100 * (native_df['Native American Population'] / native_df['Total State Population'])
pacific_df['Pacific Islander PK as Percentage'] = 100 * (pacific_df['Pacific Islander Police Killings'] / pacific_df['Total Police Killings'])
pacific_df['Pacific Islander Pop as Percentage'] = 100 * (pacific_df['Pacific Islander Population'] / pacific_df['Total State Population'])
unknown_df['Unknown Race PK as Percentage'] = 100 * (unknown_df['Unknown Race Police Killings'] / unknown_df['Total Police Killings'])
unknown_df['Unknown Race Pop as Percentage'] = 100 * (unknown_df['Unknown Race Population'] / unknown_df['Total State Population'])

# Combining everything into single df
unified_perc_df = pd.DataFrame([hispanic_df['Hispanic PK as Percentage'], hispanic_df['Hispanic Pop as Percentage'],
                               black_df['Black PK as Percentage'], black_df['Black Pop as Percentage'],
                               white_df['White PK as Percentage'], white_df['White Pop as Percentage'],
                               asian_df['Asian PK as Percentage'], asian_df['Asian Pop as Percentage'],
                               native_df['Native American PK as Percentage'], native_df['Native American Pop as Percentage'],
                               pacific_df['Pacific Islander PK as Percentage'], pacific_df['Pacific Islander Pop as Percentage'],
                               unknown_df['Unknown Race PK as Percentage'], unknown_df['Unknown Race Pop as Percentage']])

unified_perc_df = unified_perc_df.transpose()
unified_perc_df = unified_perc_df.fillna(0)

# Police killings by state for choropleth map
hispanic_state = race_data_state(pk, 'Hispanic')
black_state = race_data_state(pk, 'Black')
white_state = race_data_state(pk, 'White')
asian_state = race_data_state(pk, 'Asian')
native_state = race_data_state(pk, 'Native American')
pacific_state = race_data_state(pk, 'Pacific Islander')
unknown_state = race_data_state(pk, 'Unknown Race')

# Convert statistics to the traditional rate per 100K population for each race
hispanic_100K = per_100K(hispanic_df, 'Hispanic')
black_100K = per_100K(black_df, 'Black')
white_100K = per_100K(white_df, 'White')
asian_100K = per_100K(asian_df, 'Asian')
# NOTE: the next three are NOT per-100K rates. To avoid giant spikes caused by
# small populations, they are 10 x the group's share of the state's total police killings.
native_100K = 10 * (native_df['Native American Police Killings'] / native_df['Total Police Killings'])
pacific_100K = 10 * (pacific_df['Pacific Islander Police Killings'] / pacific_df['Total Police Killings'])
unknown_100K = 10 * (unknown_df['Unknown Race Police Killings'] / unknown_df['Total Police Killings'])

# Combine into single per 100k df
unified_100K_df = pd.DataFrame([hispanic_100K, black_100K, white_100K, asian_100K,
                                native_100K, pacific_100K, unknown_100K]).transpose()
unified_100K_df = unified_100K_df.reset_index()
unified_100K_df = unified_100K_df.rename(columns = {'index': 'State', 0: 'Hispanic', 1: 'Black', 2:'White', 3:'Asian',
                                                    4: 'Native American', 5:'Pacific Islander', 6:'Unknown Race'})
unified_100K_df = unified_100K_df.fillna(0)

# Debug
#hispanic_df.head()
#black_df.head()
#white_df.head()
#asian_df.head()
#native_df.head()
#pacific_df.head()
#unknown_df.head()
#print(pk_total)
#print(state_total)
#state_total_df.head()
#unified_perc_df.head()
#hispanic_state.head()
#unified_100K_df.head()
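
The per-race blocks in the cell above repeat the same steps seven times. As a possible refactor, a single helper can build the same kind of table for any race. This is a sketch under assumptions, not the notebook's code: `race_summary` and its shortened column names are hypothetical, and the toy `pk` and `census` frames use placeholder states `S1` and `S2` with made-up numbers.

```python
import pandas as pd

def race_summary(pk, census, race):
    """Per-state killings, population shares, and rate per 100K for one race."""
    names = "Victim's name"
    killed = pk[pk["Victim's race"] == race].groupby("State")[names].nunique()
    total = pk.groupby("State")[names].nunique()
    out = pd.DataFrame({
        "Police Killings": killed,
        "Total Police Killings": total,
        "Population": census[race],
        "Total State Population": census["Total Population"],
    }).dropna(subset=["Police Killings"])  # keep states with at least one victim
    out["PK as Percentage"] = 100 * out["Police Killings"] / out["Total Police Killings"]
    out["Pop as Percentage"] = 100 * out["Population"] / out["Total State Population"]
    out["Per 100K"] = 100_000 * out["Police Killings"] / out["Population"]
    return out

# Made-up example data with placeholder state codes.
pk = pd.DataFrame({
    "Victim's name": ["A", "B", "C", "D"],
    "Victim's race": ["Black", "White", "White", "Black"],
    "State": ["S1", "S1", "S1", "S2"],
})
census = pd.DataFrame(
    {"Black": [100_000, 200_000],
     "White": [400_000, 800_000],
     "Total Population": [500_000, 1_000_000]},
    index=pd.Index(["S1", "S2"], name="State"),
)

summaries = {race: race_summary(pk, census, race) for race in ["Black", "White"]}
print(summaries["Black"])
```

Keeping the summaries in a dictionary keyed by race means adding or removing a group only changes the list passed to the comprehension.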

# Analysis/Modeling


In [4]:
# Functions
state_sum = pk.groupby(['State', "Victim's race"]).count()
def create_pie_chart(input_data, state):
    labels = input_data.loc[state]["Victim's name"].index
    values = input_data.loc[state]["Victim's name"]
    trace = go.Pie(labels = labels, values = values, hole = 0.4, pull = [0, 0.2, 0, 0, 0, 0, 0])
    data = [trace]
    fig = go.Figure(data = data)
    fig.update_layout(
        title = {
            'text': state + " Police Killings",
            'y': 0.9,
            'x': 0.5,
            'xanchor': 'center',
            'yanchor': 'top'})
    iplot(fig)

def create_pie_charts(input_data, state):
    labels = 'Hispanic', 'Black', 'White', 'Asian', 'Native American', 'Pacific Islander', 'Unknown Race'
    fig = make_subplots(rows = 1, cols = 2, specs = [[{'type':'domain'}, {'type':'domain'}]],
                       subplot_titles = ('Police Murder Rate', 'Total Population'))
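    # Even positions hold the 'PK as Percentage' columns, odd positions the
    # 'Pop as Percentage' columns; .iloc selects them by position explicitly.
    fig.add_trace(go.Pie(labels = labels, values = input_data.iloc[[0, 2, 4, 6, 8, 10, 12]],
                         name = "Police Killings", pull = [0, 0.2, 0, 0, 0, 0, 0]), 1, 1)
    fig.add_trace(go.Pie(labels = labels, values = input_data.iloc[[1, 3, 5, 7, 9, 11, 13]],
                         name = "Total Population", pull = [0, 0.2, 0, 0, 0, 0, 0]), 1, 2)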
    fig.update_traces(hole = 0.4, hoverinfo = "label+percent+name")
    fig.update_layout(
        title = {
            'text': state,
            'y': 0.9,
            'x': 0.45,
            'xanchor': 'center',
            'yanchor': 'top'})
    fig.show()
    
def create_choropleth_map(input_data, race):
    fig = go.Figure(go.Choropleth(
        locations = input_data['State'],
        z = input_data['Count'].astype(float),
        locationmode = 'USA-states',
        colorscale = 'Reds',
        autocolorscale = False,
        text = input_data['State'],
        marker_line_color = 'white',
        colorbar_title = 'Fatalities', showscale = True))
    fig.update_layout(
        title_text='US Police Killings By Race: ' + race,
        title_x=0.5,
        geo=dict(
            scope='usa',
            projection=go.layout.geo.Projection(type='albers usa'),
            showlakes=True,  # draw lakes
            lakecolor='rgb(255, 255, 255)'))
    fig.update_layout(template="plotly_dark")
    fig.show()

def create_bar_chart(input_data):
    fig = go.Figure()
    fig.add_trace(go.Bar(x = input_data['Hispanic'],
                         y = input_data['State'],
                         name = 'Hispanic',
                         orientation = 'h'))
    fig.add_trace(go.Bar(x = input_data['Black'],
                         y = input_data['State'],
                         name = 'Black',
                         orientation = 'h'))
    fig.add_trace(go.Bar(x = input_data['White'],
                         y = input_data['State'],
                         name = 'White',
                         orientation = 'h'))
    fig.add_trace(go.Bar(x = input_data['Asian'],
                         y = input_data['State'],
                         name = 'Asian',
                         orientation = 'h'))
    fig.add_trace(go.Bar(x = input_data['Native American'],
                         y = input_data['State'],
                         name = 'Native American',
                         orientation = 'h'))
    fig.add_trace(go.Bar(x = input_data['Pacific Islander'],
                         y = input_data['State'],
                         name = 'Pacific Islander',
                         orientation = 'h'))
    fig.add_trace(go.Bar(x = input_data['Unknown Race'],
                         y = input_data['State'],
                         name = 'Unknown Race',
                         orientation = 'h'))
    fig.update_layout(
        title = 'Police Killings By Race Per 100K',
        xaxis = dict(
            tickfont_size = 14
        ),
        yaxis = dict(
            title = dict(text = 'State', font = dict(size = 16)),
            type = 'category',
            tickmode = 'linear',
        ),
        legend = dict(
            bgcolor='rgba(255, 255, 255, 0)',
            bordercolor='rgba(255, 255, 255, 0)'
        ),
        barmode='stack',
        bargap = 0.15,
        width=1000,
        height=1000
    )
    fig.update_layout(yaxis={'categoryorder':'array',
               'categoryarray':["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA", 
          "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", 
          "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", 
          "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", 
          "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]})
    fig.update_yaxes(autorange="reversed")
    fig.update_yaxes(automargin=True)
    fig.show()
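
The pie-chart helpers above pull out the second slice with a fixed seven-element `pull` list. In `create_pie_chart` the labels come from the grouped data, so the second slice is 'Black' only when the state's label order happens to put it there. One way to make the highlight independent of order is to derive the list from the labels. The helper name `pull_for` and its defaults are hypothetical, not part of the notebook.

```python
def pull_for(labels, highlight="Black", amount=0.2):
    """Return a pull list that offsets only the highlighted label's slice."""
    return [amount if label == highlight else 0 for label in labels]

# Works regardless of how many labels a state has or where 'Black' appears.
print(pull_for(["Asian", "Black", "Hispanic", "White"]))
print(pull_for(["Black", "White"]))
```

The result could then be passed as `pull = pull_for(labels)` when building a `go.Pie` trace.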

## Initial Analysis

In [5]:
create_pie_chart(state_sum, 'NJ')
create_pie_chart(state_sum, 'NY')
create_pie_chart(state_sum, 'CT')

Although the initial analysis shows a disparity in police killings of people of color, these statistics don't mean much without a reference to the distribution of race in each state's population.

## Accounting for Distribution of Race Per State

In [6]:
create_pie_charts(unified_perc_df.loc['NJ'], 'New Jersey')
create_pie_charts(unified_perc_df.loc['NY'], 'New York')
create_pie_charts(unified_perc_df.loc['CT'], 'Connecticut')

### Observation
Compared with each state's racial makeup, minorities, and Black Americans in particular, are disproportionately affected by police violence in New Jersey and New York. Connecticut seems to be the exception: the racial breakdown of police killings is roughly proportionate to the distribution of race in the state's population.
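
One way to put a number on this comparison is to divide a group's share of police killings by its share of the population. A value near 1 means the two shares match, and a value above 1 means the group is over-represented among victims. The function name `representation_ratio` is hypothetical and the inputs below are made-up percentages, not values from the charts.

```python
def representation_ratio(pk_pct, pop_pct):
    """Share of police killings divided by share of population.

    Returns None when the population share is zero, since the ratio
    is undefined in that case.
    """
    if pop_pct <= 0:
        return None
    return pk_pct / pop_pct

# Made-up example: a group is 40% of victims but 20% of residents.
print(representation_ratio(40.0, 20.0))  # over-represented by a factor of 2
print(representation_ratio(10.0, 10.0))  # proportionate
```

Applied to the notebook's data, the two inputs would be the matching 'PK as Percentage' and 'Pop as Percentage' columns for a state.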

## Geography of Police Killings

In [7]:
create_choropleth_map(hispanic_state, 'Hispanic')

In [8]:
create_choropleth_map(black_state, 'Black')

In [9]:
create_choropleth_map(white_state, 'White')

In [10]:
create_choropleth_map(asian_state, 'Asian')

In [11]:
create_choropleth_map(native_state, 'Native American')

In [12]:
create_choropleth_map(pacific_state, 'Pacific Islander')

In [13]:
create_choropleth_map(unknown_state, 'Unknown Race')

### Observation
These choropleth maps show a consistent pattern in police violence. With the exception of Native American and Pacific Islander deaths (which are limited to only a few states because of those groups' low population numbers), three states have the highest counts across all races: California, Texas, and Florida. These are also the three most populous states, and the maps show raw counts rather than per-capita rates, so part of the pattern reflects population size.
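
Because the maps plot raw counts, a per-capita version would separate the effect of population size. The sketch below assumes an input shaped like the output of `race_data_state` (columns `State` and `Count`) and a population Series indexed by state. The helper `add_rate_per_100k`, the placeholder states `AA` and `BB`, and all numbers are illustrative assumptions.

```python
import pandas as pd

def add_rate_per_100k(counts, population):
    """Attach each state's population and killings per 100K residents."""
    out = counts.copy()
    out["Population"] = out["State"].map(population)
    out["Per 100K"] = 100_000 * out["Count"] / out["Population"]
    return out

# Made-up counts and populations for two placeholder states.
counts = pd.DataFrame({"State": ["AA", "BB"], "Count": [50, 5]})
population = pd.Series({"AA": 10_000_000, "BB": 500_000})

rates = add_rate_per_100k(counts, population)
print(rates)
```

In this toy example the larger state has ten times as many deaths but half the per-capita rate, which is the distinction a count-based map cannot show. Passing the `Per 100K` column as `z` in the choropleth would plot rates instead.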

## Unified Police Killings Across All States By Race Per 100K

In [14]:
create_bar_chart(unified_100K_df)

### Observation
This graph visualizes deaths by race across all states. For Hispanic, Black, White, and Asian victims, the value is the standard rate of deaths per 100K people of that race in the state. Because the Native American, Pacific Islander, and Unknown Race populations are small, per-100K rates for those groups produced giant spikes in the graph, so their bars instead show 10 times the group's share of the state's total police killings and are not directly comparable with the other four. With this configuration, each state shows disproportionate numbers of Black and Hispanic deaths to police violence when compared to other races.
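
An alternative to rescaling the small groups is to keep the per-100K definition for every race and hide rates computed from very small populations. The sketch below assumes killings and population Series indexed by state. The function `rate_per_100k`, the `min_population` threshold of 50,000, and the example numbers are arbitrary illustrations rather than recommended values.

```python
import pandas as pd

def rate_per_100k(killings, population, min_population=50_000):
    """Deaths per 100K residents, masked (NaN) where the population is too small."""
    rate = 100_000 * killings / population
    return rate.where(population >= min_population)

# Made-up numbers for two placeholder states.
killings = pd.Series({"AA": 2, "BB": 1})
population = pd.Series({"AA": 1_000_000, "BB": 10_000})

masked = rate_per_100k(killings, population)
print(masked)
```

Masked values stay out of the bars instead of dominating the axis, and every remaining bar uses the same unit.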

# Conclusions

Although racism certainly plays a role in the disproportionate murder of minorities by police, there are institutional problems with US law enforcement as a whole that enable such violence.

According to Amnesty International:
* Not one US state complies with international law and standards on the use of lethal force by police.
* In the USA, the majority of deaths at the hands of police are the result of an officer using a firearm.
* In many cases, officers have shot people multiple times, indicating use of force that was neither necessary nor proportionate. Michael Brown, for instance, who was unarmed, was shot six times.
* According to Mapping Police Violence, in 2019 Black people were 24% of those killed by the police, despite being only 13% of the population.
* A 1996 law authorized the US Department of Defense to provide surplus equipment to law enforcement agencies. As a result, police have deployed equipment designed for military use at protests.

Without reform on an institutional level, those who fall victim to police violence will rarely find justice and those responsible will never be held accountable.

# Sources
police_killings.csv: https://www.kaggle.com/jpmiller/police-violence-in-the-us

2018 Census Data: https://data.census.gov/cedsci/table?q=United%20States&tid=ACSDP5Y2018.DP05&hidePreview=false

Amnesty International: https://www.amnesty.org/en/what-we-do/police-brutality/