# Abstract
As the Black Lives Matter movement gained momentum across the United States in the wake of George Floyd's death at the hands of Minneapolis police officers, the addition of yet another name to the list of black people killed by police renewed the emphasis on racism and police brutality in the country's political and cultural conversation.

Using data sets from Kaggle and the U.S. Census Bureau, this notebook visualizes the disproportionate killing of minorities by police in the tri-state area (New York, New Jersey, and Connecticut).

## Imports

In [58]:
# Data manipulation
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30

# Display all cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

from IPython import get_ipython
ipython = get_ipython()

# Autoreload extension
if 'autoreload' not in ipython.extension_manager.loaded:
    %load_ext autoreload

%autoreload 2

# Visualizations
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.colors import n_colors
from plotly.subplots import make_subplots
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(theme='white')

## Data Cleaning

In [59]:
# Data import
pk = pd.read_csv('./data/police_killings.csv')
cd = pd.read_csv('./data/2018_census_5YE.csv')

# Data cleaning
us_state_abbrev = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
}

cd_cols = ['Total Population', 'Male', 'Female', 'Hispanic', 'White',
           'Black', 'Native American', 'Asian', 'Pacific Islander', 'Unknown Race']

pk.replace(to_replace=r'^Unknown race$', value='Unknown Race', regex=True, inplace=True)
cd.replace(to_replace=r',', value='', regex=True, inplace=True)
cd.replace({'State': us_state_abbrev}, inplace=True)
cd = cd.set_index('State')
cd = cd.sort_index()
# Convert each population column to numeric; unparseable values become NaN
cd[cd_cols] = cd[cd_cols].apply(pd.to_numeric, errors = 'coerce')

# Debug
#pk.head()
#cd.head()
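
The cleaning steps above can be tried in isolation. The following is a minimal sketch on a hypothetical two-row stand-in for the census file; the figures and the `'N/A'` entry are made up for illustration, and `toy`, `state_abbrev`, and `num_cols` are names introduced only for this example.

```python
import pandas as pd

# Hypothetical stand-in for the census file; all values are made up.
toy = pd.DataFrame({
    'State': ['New Jersey', 'Connecticut'],
    'Total Population': ['1,000,000', '500,000'],
    'Black': ['250,000', 'N/A'],
})

state_abbrev = {'New Jersey': 'NJ', 'Connecticut': 'CT'}
num_cols = ['Total Population', 'Black']

toy = toy.replace(',', '', regex=True)        # strip thousands separators
toy = toy.replace({'State': state_abbrev})    # full state names -> postal codes
toy = toy.set_index('State').sort_index()

# Column-wise conversion; anything that is not a number becomes NaN
toy[num_cols] = toy[num_cols].apply(pd.to_numeric, errors='coerce')

print(toy)
```

Coercing with `errors='coerce'` keeps the pipeline running when a cell cannot be parsed, at the cost of silently introducing `NaN` values, so it is worth checking `toy.isna().sum()` afterwards.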

## Feature Engineering

In [60]:
# Functions
def race_data(data, race):
    new_data = data[data["Victim's race"] == race]
    sort_data = new_data[["Victim's name", "State"]]
    data_grouped = sort_data.groupby('State')["Victim's name"].nunique()
    data_df = data_grouped.to_frame()
    data_df = data_df.rename(columns = {"Victim's name" : race + ' Police Killings'})
    return data_df

# Police killings total by race per state
hispanic_df = race_data(pk, 'Hispanic')
black_df = race_data(pk, 'Black')
white_df = race_data(pk, 'White')
asian_df = race_data(pk, 'Asian')
native_df = race_data(pk, 'Native American')
pacific_df = race_data(pk, 'Pacific Islander')
unknown_df = race_data(pk, 'Unknown Race')

# Police killings total across all races per state
pk_total = pk[["Victim's name", 'State']]
state_total = pk_total.groupby('State')["Victim's name"].nunique()
state_total_df = state_total.to_frame()
state_total_df = state_total_df.rename(columns = {"Victim's name": 'Total Police Killings'})

# Combining PK and population into df per race, then adding percentages for each state
race_dfs = {
    'Hispanic': hispanic_df,
    'Black': black_df,
    'White': white_df,
    'Asian': asian_df,
    'Native American': native_df,
    'Pacific Islander': pacific_df,
    'Unknown Race': unknown_df,
}

for race, df in race_dfs.items():
    # Columns are added in place, so hispanic_df, black_df, etc. are updated as well
    df['Total Police Killings'] = state_total_df['Total Police Killings']
    df[[race + ' Population', 'Total State Population']] = cd[[race, 'Total Population']]
    df[race + ' PK as Percentage'] = 100 * (df[race + ' Police Killings'] / df['Total Police Killings'])
    df[race + ' Pop as Percentage'] = 100 * (df[race + ' Population'] / df['Total State Population'])

# Combining everything into single df
unified_perc_df = pd.DataFrame([hispanic_df['Hispanic PK as Percentage'], hispanic_df['Hispanic Pop as Percentage'],
                               black_df['Black PK as Percentage'], black_df['Black Pop as Percentage'],
                               white_df['White PK as Percentage'], white_df['White Pop as Percentage'],
                               asian_df['Asian PK as Percentage'], asian_df['Asian Pop as Percentage'],
                               native_df['Native American PK as Percentage'], native_df['Native American Pop as Percentage'],
                               pacific_df['Pacific Islander PK as Percentage'], pacific_df['Pacific Islander Pop as Percentage'],
                               unknown_df['Unknown Race PK as Percentage'], unknown_df['Unknown Race Pop as Percentage']])

unified_perc_df = unified_perc_df.transpose()
unified_perc_df = unified_perc_df.fillna(0)

# Debug
#hispanic_df.head()
#black_df.head()
#white_df.head()
#asian_df.head()
#native_df.head()
#pacific_df.head()
#unknown_df.head()
#print(pk_total)
#print(state_total)
#state_total_df.head()
#unified_perc_df.head()
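
As an alternative to building one frame per race, the share of police killings by race could be derived in a single pass. This is only a sketch on hypothetical records (the names are placeholders, not real data), assuming the same `"Victim's name"`, `"Victim's race"`, and `State` columns. Unlike the cell above, it divides by the sum of the per-race counts rather than by a separately computed state total.

```python
import pandas as pd

# Hypothetical records; names and states are placeholders, not real data.
toy_pk = pd.DataFrame({
    "Victim's name": ['A', 'B', 'C', 'D', 'E', 'F'],
    "Victim's race": ['Black', 'White', 'Black', 'Hispanic', 'White', 'White'],
    'State': ['NJ', 'NJ', 'NJ', 'NJ', 'NY', 'NY'],
})

# Unique victims per (state, race), pivoted so that each race becomes a column
counts = (
    toy_pk.groupby(['State', "Victim's race"])["Victim's name"]
    .nunique()
    .unstack(fill_value=0)
)

# Normalise each row so the columns give every race's share of that state's total
pk_share = 100 * counts.div(counts.sum(axis=1), axis=0)

print(pk_share.round(1))
```

Because `unstack(fill_value=0)` fills in races with no recorded killings in a state, every state keeps a full row and no later `fillna` is needed for these shares.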

# Analysis/Modeling


In [77]:
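# Row counts per state and race (input for create_pie_chart)
state_sum = pk.groupby(['State', "Victim's race"]).count()

# Functions
def create_pie_chart(input_data, state):
    """Donut chart of police killings by race for a single state."""
    values = input_data.loc[state]["Victim's name"]
    labels = values.index
    # Pull out the 'Black' slice by label rather than by position, because the
    # races present (and therefore the slice order) differ from state to state
    pull = [0.2 if label == 'Black' else 0 for label in labels]
    trace = go.Pie(labels = labels, values = values, hole = 0.4, pull = pull)
    fig = go.Figure(data = [trace])
    fig.update_layout(
        title = {
            'text': state + " Police Killings",
            'y': 0.9,
            'x': 0.5,
            'xanchor': 'center',
            'yanchor': 'top'})
    iplot(fig)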

def create_pie_charts(input_data, state):
    """Side-by-side donut charts: share of police killings vs. share of population."""
    labels = ['Hispanic', 'Black', 'White', 'Asian', 'Native American', 'Pacific Islander', 'Unknown Race']
    fig = make_subplots(rows = 1, cols = 2, specs = [[{'type':'domain'}, {'type':'domain'}]],
                       subplot_titles = ('Police Killings', 'Total Population'))
    # Even positions hold the 'PK as Percentage' values and odd positions the
    # 'Pop as Percentage' values; .iloc makes the positional lookup explicit
    fig.add_trace(go.Pie(labels = labels, values = input_data.iloc[[0, 2, 4, 6, 8, 10, 12]],
                         name = "Police Killings", pull = [0, 0.2, 0, 0, 0, 0, 0]), 1, 1)
    fig.add_trace(go.Pie(labels = labels, values = input_data.iloc[[1, 3, 5, 7, 9, 11, 13]],
                         name = "Total Population", pull = [0, 0.2, 0, 0, 0, 0, 0]), 1, 2)
    fig.update_traces(hole = 0.4, hoverinfo = "label+percent+name")
    fig.update_layout(
        title = {
            'text': state,
            'y': 0.9,
            'x': 0.45,
            'xanchor': 'center',
            'yanchor': 'top'})
    fig.show()

## Initial Analysis

In [78]:
create_pie_chart(state_sum, 'NJ')
create_pie_chart(state_sum, 'NY')
create_pie_chart(state_sum, 'CT')

Although the initial analysis shows a disparity in the deaths of people of color at the hands of police, these statistics mean little without reference to the distribution of race in each state's population.

## Accounting for Distribution of Race Per State

In [79]:
create_pie_charts(unified_perc_df.loc['NJ'], 'New Jersey')
create_pie_charts(unified_perc_df.loc['NY'], 'New York')
create_pie_charts(unified_perc_df.loc['CT'], 'Connecticut')

When compared to the race distributions in the total population, the charts show that minorities, specifically black Americans, are disproportionately affected by police violence in NJ and NY. Connecticut seems to be the exception; there, each group's share of police killings is proportionate to the distribution of race in the state's population.
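
One way to put a number on "disproportionate" is the ratio of a group's share of police killings to its share of the population. The sketch below uses hypothetical percentages and generic group labels purely for illustration; `pk_pct`, `pop_pct`, and `disparity_ratio` are names introduced here, and in practice the inputs would come from a row of `unified_perc_df`.

```python
import pandas as pd

# Hypothetical percentages for one state; not taken from the data sets above.
shares = pd.DataFrame(
    {'pk_pct': [50.0, 40.0, 10.0], 'pop_pct': [25.0, 50.0, 25.0]},
    index=['Group A', 'Group B', 'Group C'],
)

# Ratio > 1: the group's share of police killings exceeds its population share
# Ratio < 1: the group is under-represented among police killings
shares['disparity_ratio'] = shares['pk_pct'] / shares['pop_pct']

print(shares.round(2))
```

A ratio near 1 for every group would correspond to the proportionate pattern described for Connecticut, while a ratio well above 1 flags a group that is over-represented among police killings.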

# Results
Show graphs and stats here

# Conclusions and Next Steps
Summarize findings here