
# Bias in Fatal Police Encounters throughout the U.S.

The New York Times described the Black Lives Matter (BLM) movement as "potentially the largest movement in U.S. history" (NYT, 2020). The movement rapidly gained pace and new members after the murder of George Floyd on May 25, 2020. His murder was a breaking point for the black community in the U.S., but the issue the BLM movement points to is much larger: systematic racial bias of police in the U.S. towards people of colour. This work attempts to illustrate the extent of the problem throughout the U.S. and to disentangle factors that influence police bias.

Note: New datasets are described in <b><font color='DarkGoldenRod'>this colour</font></b>.<br>
Note: Make sure that "your_notebook_url" in the following cell is set to the correct address.

In [48]:
your_notebook_url = "http://localhost:8891"

# 1. Data Processing

<i>Here is an overview of the stages in "Data Processing": </i>

![Here is an overview of the stages in the "Data Processing" step.](mermaid_processing.png "Here is an overview of the stages in the 'Data Processing' step.")



## Fatal Encounters
To understand the issue, we first need an account of fatal police interactions similar to the one George Floyd got caught up in.

<b><font color='DarkGoldenRod'>fatal_encounters.csv:</font></b>
The fatal encounters dataset lists details on over 30,799 fatal encounters with the police. Details include race, age, gender, location, a description of the encounter, the name of the suspect and others.

In [49]:
from scripts.show import show_file
show_file('fatal_encounter.csv',separator =';').head(2)

Unnamed: 0,Unique ID,Name,Age,Gender,Race,Race with imputations,Imputation probability,URL of image (PLS NO HOTLINKS),Date of injury resulting in death (month/day/year),Location of injury (address),...,URL Temp,Brief description,"Dispositions/Exclusions INTERNAL USE, NOT FOR ANALYSIS",Intended use of force (Developing),Supporting document link,"Foreknowledge of mental illness? INTERNAL USE, NOT FOR ANALYSIS",Unnamed: 32,Unnamed: 33,Unique ID formula,Unique identifier (redundant)
0,25747.0,Mark A. Horton,21.0,Male,African-American/Black,African-American/Black,Not imputed,,01.01.00,Davison Freeway,...,,Two Detroit men killed when their car crashed ...,Unreported,Pursuit,https://drive.google.com/file/d/1-nK-RohgiM-tZ...,No,,,,25747.0
1,25748.0,Phillip A. Blurbridge,19.0,Male,African-American/Black,African-American/Black,Not imputed,,01.01.00,Davison Freeway,...,,Two Detroit men killed when their car crashed ...,Unreported,Pursuit,https://drive.google.com/file/d/1-nK-RohgiM-tZ...,No,,,,25748.0


As mentioned in the introduction, we are interested in factors that influence police bias in such encounters. Let's start by looking only at Age, Race, State and Gender. This will also allow us to investigate how the issue compares throughout the U.S.

### <b> First, let's have a look at race: </b>

In [50]:
from scripts import preprocessing as pp
pp.fatal_encounters('Race').head(5)

Unnamed: 0,State,Abbrv,Race,Percent_FE
0,Alabama,AL,Asian,0.151745
1,Alabama,AL,Black,26.858877
2,Alabama,AL,Other,37.936267
3,Alabama,AL,White,35.053111
4,Alaska,AK,Asian,3.252033


These numbers alone do not tell us much about bias yet. Imagine a state X where there are twice as many white people as people of colour. Now let's assume that there was an equal number of fatal encounters for white people and for people of colour.
The table above would indicate 50/50, and we would assume no bias, although people of colour would be twice as likely to end up in a fatal police encounter.

This is why we need information about the population as well.
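
The state X thought experiment can be made concrete with a few lines of arithmetic. This is only an illustrative sketch: the population and encounter counts below are made-up numbers chosen to match the description (twice as many white residents, equal encounter counts), not values from any dataset.

```python
# Hypothetical state X: twice as many white residents as people of colour,
# but the same number of fatal encounters in both groups (made-up numbers).
population = {"White": 2_000_000, "People of colour": 1_000_000}
fatal_encounters = {"White": 50, "People of colour": 50}

total_fe = sum(fatal_encounters.values())
total_pop = sum(population.values())

for group in population:
    share_fe = 100 * fatal_encounters[group] / total_fe
    share_pop = 100 * population[group] / total_pop
    rate = 100_000 * fatal_encounters[group] / population[group]
    print(f"{group}: {share_fe:.1f}% of encounters, "
          f"{share_pop:.1f}% of population, {rate:.1f} per 100,000 residents")

# Per-capita risk of each group and the ratio between the two groups.
per_capita = {g: fatal_encounters[g] / population[g] for g in population}
risk_ratio = per_capita["People of colour"] / per_capita["White"]
print(f"Risk ratio: {risk_ratio:.1f}")
```

The encounter shares are 50/50, yet the per-capita rate differs by a factor of two, which is exactly the information the raw encounter percentages hide.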

<b>Let's load some state-wise population data and select the races that are mentioned in the fatal encounters dataset: </b>

<b><font color='DarkGoldenRod'>Racebystateperc.csv:</font></b>
This dataset lists population percentages for several relevant races for every state in the United States. Races listed include: White, Black, Other, Asian, Hawaiian, and Indian.


In [51]:
show_file('Racebystateperc.csv', separator = ',',columns=['State', 'WhiteTotalPerc', 'BlackTotalPerc']).head(3)

Unnamed: 0,State,WhiteTotalPerc,BlackTotalPerc
0,Alabama,0.6809,0.2664
1,Alaska,0.6458,0.0328
2,Arizona,0.7722,0.045


This dataset gives us what we need: the share of each race in the population of every state.

There are two things we have to change here:  
1. The population percentages per race should add up to 100% within each state. 
    - We can do this by assigning the race identifier "Other" to the remaining people in every state.


2. This format is not practical for further processing. We need a "Race" column and a column that holds the population proportions.
     - Let's stack the dataframe by state and race for this.

   

In [52]:
pp.race_pop().head()

Unnamed: 0,State,Race,Proportion_pop
0,Alabama,White,68.09
1,Alabama,Black,26.64
2,Alabama,Other,5.27
3,Alaska,White,64.58
4,Alaska,Black,3.28
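
The two changes described above (adding an "Other" remainder and stacking into long format) could look roughly like the following. This is a minimal sketch, not the actual code of `pp.race_pop`: it assumes only the White and Black columns are kept, which is consistent with the output shown, and the helper column name `OtherTotalPerc` is invented for illustration.

```python
import pandas as pd

# First three rows of Racebystateperc.csv as displayed earlier (fractions).
wide = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona"],
    "WhiteTotalPerc": [0.6809, 0.6458, 0.7722],
    "BlackTotalPerc": [0.2664, 0.0328, 0.0450],
})

# Step 1: assign the remainder to "Other" so every state sums to one.
# "OtherTotalPerc" is a hypothetical column name.
wide["OtherTotalPerc"] = 1 - wide[["WhiteTotalPerc", "BlackTotalPerc"]].sum(axis=1)

# Step 2: stack into long format with one row per (State, Race).
long_df = wide.melt(id_vars="State", var_name="Race", value_name="Proportion_pop")
long_df["Race"] = long_df["Race"].str.replace("TotalPerc", "", regex=False)

# Express the proportions in percent, as in the output above.
long_df["Proportion_pop"] = (long_df["Proportion_pop"] * 100).round(2)

# A stable sort keeps the White/Black/Other order within each state.
long_df = long_df.sort_values("State", kind="stable").reset_index(drop=True)
print(long_df.head())
```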


### Estimating Bias

We now have all the information we need to calculate police bias towards race in fatal encounters.

If there is no police bias, we would expect the same distribution of race in the fatal encounters as in the population of a state. To illustrate, let's return to state X from earlier. In this state, 66% of the population is white. Given no police bias, 66% of the people involved in fatal encounters would also be white. 

If, say, only 10% of the fatal encounters involved white people, that would be a deviation of <b> -56 percentage points </b> from the expected value.

Let's call this number <b>bias</b>:
- <b>negative bias</b> represents lower likelihood than expected to end up in a fatal encounter.
- <b>positive bias</b> represents higher likelihood than expected to end up in a fatal encounter.
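
In other words, bias is the share of fatal encounters minus the share of the population, in percentage points. The sketch below applies that definition to the Alabama values from the two tables above. It is an assumption about how a function like `cc.bias_per_state` might work, not its actual implementation; in particular, the inner join (which drops groups without a population figure, such as "Asian" here) is a guess.

```python
import pandas as pd

# Alabama rows from the fatal encounters table (percent of fatal encounters).
fe = pd.DataFrame({
    "State": ["Alabama"] * 4,
    "Race": ["Asian", "Black", "Other", "White"],
    "Percent_FE": [0.151745, 26.858877, 37.936267, 35.053111],
})

# Alabama rows from the population table (percent of population).
pop = pd.DataFrame({
    "State": ["Alabama"] * 3,
    "Race": ["White", "Black", "Other"],
    "Proportion_pop": [68.09, 26.64, 5.27],
})

# Match each group's encounter share with its population share.
# An inner join keeps only groups present in both tables (assumption).
merged = fe.merge(pop, on=["State", "Race"], how="inner")

# Bias = observed share minus expected share, in percentage points.
merged["Bias"] = merged["Percent_FE"] - merged["Proportion_pop"]
print(merged)
```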

<b> Let's see how this looks in our dataset: </b>


In [53]:
import scripts.concatenating as cc

fe_race = pp.fatal_encounters(filter_var = 'Race')
population_race = pp.race_pop()

racial_bias = cc.bias_per_state(fatal_encounters_df=fe_race, population_percentage_df = population_race, filter_var = 'Race')
racial_bias.head(6)

Unnamed: 0,State,Abbrv,Race,Percent_FE,Proportion_pop,Bias
0,Alabama,AL,Black,26.858877,26.64,0.218877
1,Alabama,AL,Other,37.936267,5.27,32.666267
2,Alabama,AL,White,35.053111,68.09,-33.036889
3,Alaska,AK,Black,5.691057,3.28,2.411057
4,Alaska,AK,Other,57.723577,32.14,25.583577
5,Alaska,AK,White,33.333333,64.58,-31.246667


There seems to be a bias in favor of white people in both states displayed here.


Race seems to be a factor influencing police bias, but what about others?
Earlier on, we mentioned that we were interested in multiple factors and how they influence police bias throughout the U.S.

### <b>Let's look at gender next:</b>

In [54]:
fe_gender = pp.fatal_encounters('Gender')
fe_gender.head(4)

Unnamed: 0,State,Abbrv,Gender,Percent_FE
0,Alabama,AL,Female,11.68437
1,Alabama,AL,Male,88.31563
2,Alaska,AK,Female,8.130081
3,Alaska,AK,Male,91.869919


Possible genders that have been reported in the fatal encounters dataset are "male", "female" and "transgender":

In [55]:
set(fe_gender['Gender'])

{'Female', 'Male', 'Transgender'}

Now we need data on the state-wise gender distribution, as we did for 'Race' before:


<b><font color='DarkGoldenRod'>Gender_distribution.csv:</font></b>
This dataset lists male and female percentages of every state's total population, for every state in the U.S.

In [56]:
show_file('Gender_distribution.csv').head(3)

Unnamed: 0,Location,Male,Female,Total
0,United States,0.489,0.511,1.0
1,Alabama,0.481,0.519,1.0
2,Alaska,0.507,0.493,1.0


This dataset only holds male and female state-wise population proportions. So let's find some data on transgender too:

<b><font color='DarkGoldenRod'>Transgender_per_state.csv:</font></b>
This dataset lists estimates of transgender population percentages and absolute numbers for every state in the U.S. Additionally, it ranks the states from the highest to the lowest transgender population share.

In [57]:
show_file('Transgender_per_state.csv').head(3)

Unnamed: 0,State,Population,Percent,Rank
0,United States,1397150.0,0.58,0.0
1,Alabama,22.5,0.61,15.0
2,Alaska,2.7,0.49,33.0


This dataset includes absolute transgender population, percent-based estimates of the transgender population per state and a ranking of states based on high to low transgender population.

Before we do anything with the data, we need to clean and transform the datasets for concatenation:

In [58]:
male_female_pop = pp.load_male_female()
male_female_pop.head(3)

Unnamed: 0,State,Male,Female
0,United States,0.489,0.511
1,Alabama,0.481,0.519
2,Alaska,0.507,0.493


Let's get rid of the rank and absolute population data.

In [59]:
transgender_pop = pp.load_transgender()
transgender_pop.head(3)

Unnamed: 0,State,Population,Transgender
0,United States,1397150.0,0.58
1,Alabama,22.5,0.61
2,Alaska,2.7,0.49


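Now we need to combine both datasets. The male and female shares already add up to 100%, so adding the transgender share pushes the total above 100%. To avoid favouring any group in this operation, we scale all gender groups down by the same factor so that they add up to 100% again: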

In [60]:
gender_pop = cc.concat_gender_pop(male_female_pop,transgender_pop)
gender_pop[3:12]

Unnamed: 0,State,Gender,Proportion_pop
3,Alabama,Male,47.808369
4,Alabama,Female,51.585329
5,Alabama,Transgender,0.606302
6,Alaska,Male,50.452781
7,Alaska,Female,49.059608
8,Alaska,Transgender,0.487611
9,Arizona,Male,48.996223
10,Arizona,Female,50.387597
11,Arizona,Transgender,0.61618
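
The proportions in the table above are consistent with a simple proportional rescaling: each group's share is divided by the sum of all three shares. The sketch below reproduces the Alabama row under that assumption. `rescale_to_100` is a hypothetical helper written for illustration; the actual logic inside `cc.concat_gender_pop` may differ.

```python
def rescale_to_100(shares):
    """Scale all shares by the same factor so that they sum to 100."""
    total = sum(shares.values())
    return {group: 100 * value / total for group, value in shares.items()}

# Alabama: male and female shares (in percent) from Gender_distribution.csv,
# transgender share (in percent) from Transgender_per_state.csv.
alabama = {"Male": 48.1, "Female": 51.9, "Transgender": 0.61}

alabama_rescaled = rescale_to_100(alabama)
for group, share in alabama_rescaled.items():
    print(f"{group}: {share:.6f}")
```

Because every group is multiplied by the same factor, the ratios between the groups stay unchanged.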


Now we are ready to estimate <b>gender bias:</b>

In [61]:
bias_df_gender = cc.bias_per_state(pp.fatal_encounters('Gender'), gender_pop, filter_var='Gender')
bias_df_gender.head(4)

Unnamed: 0,State,Abbrv,Gender,Percent_FE,Proportion_pop,Bias
0,Alabama,AL,Female,11.68437,51.585329,-39.900959
1,Alabama,AL,Male,88.31563,47.808369,40.507261
2,Alaska,AK,Female,8.130081,49.059608,-40.929527
3,Alaska,AK,Male,91.869919,50.452781,41.417137


This time, there seems to be a bias in favour of women.

### Finally, let's have a look at age:

In [62]:
fe_age = pp.fatal_encounters('Age')
fe_age.head(4)

Unnamed: 0,State,Abbrv,Age,Percent_FE
0,Alabama,AL,1.0,0.30349
1,Alabama,AL,2.0,0.151745
2,Alabama,AL,3.0,0.151745
3,Alabama,AL,5.0,0.30349


Sadly, it appears that children are also subject to fatal encounters. Let's find some data on the age distribution per state and have a look at Minnesota as an example (the state where George Floyd was murdered):

<b><font color='DarkGoldenRod'>Age_distribution.csv:</font></b>
Amongst other factors, this dataset lists yearly population estimates in the 2010s for every state for the male, female and total population. It additionally groups states by region and division.

In [63]:
raw_ages = show_file('Age_distribution.csv', separator=',')
raw_ages[raw_ages['NAME'] == 'Minnesota']

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,NAME,SEX,AGE,ESTBASE2010_CIV,POPEST2010_CIV,POPEST2011_CIV,POPEST2012_CIV,POPEST2013_CIV,POPEST2014_CIV,POPEST2015_CIV,POPEST2016_CIV,POPEST2017_CIV,POPEST2018_CIV,POPEST2019_CIV
6264,40,2,4,27,Minnesota,0,0,69009,69265,68499,68103,68955,69960,69638,69710,69429,67799,67629
6265,40,2,4,27,Minnesota,0,1,69762,69406,69866,68914,68680,69557,70526,70598,70699,70351,68358
6266,40,2,4,27,Minnesota,0,2,72316,72055,69674,70119,69361,69410,70006,71282,71665,71468,70924
6267,40,2,4,27,Minnesota,0,3,72956,72726,72365,69945,70641,69881,70043,70696,72160,72283,71994
6268,40,2,4,27,Minnesota,0,4,71460,71792,72988,72529,70388,70919,70153,70706,71510,72823,72717
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6520,40,2,4,27,Minnesota,2,82,12029,11957,11581,11780,11564,11599,11322,10933,11640,11958,12183
6521,40,2,4,27,Minnesota,2,83,11488,11536,11491,11172,11289,11115,11133,10924,10483,11230,11567
6522,40,2,4,27,Minnesota,2,84,10673,10681,10812,10864,10558,10619,10445,10566,10300,9933,10670
6523,40,2,4,27,Minnesota,2,85,72357,72783,73676,75046,76097,77103,77424,78018,78461,78647,78603


This dataset includes estimates based on different years. Let's take the latest one and clean up a bit:

In [64]:
ages_pop = pp.load_ages_pop(include_total=False)
ages_pop[ages_pop['State']=='Minnesota']

Unnamed: 0,State,Age,agewise_total,statewise_total,Proportion_pop
2088,Minnesota,0,67629,5637494,1.199629
2089,Minnesota,1,68358,5637494,1.212560
2090,Minnesota,2,70924,5637494,1.258077
2091,Minnesota,3,71994,5637494,1.277057
2092,Minnesota,4,72717,5637494,1.289882
...,...,...,...,...,...
2169,Minnesota,81,23269,5637494,0.412754
2170,Minnesota,82,21322,5637494,0.378218
2171,Minnesota,83,20107,5637494,0.356666
2172,Minnesota,84,18311,5637494,0.324807
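
The clean-up step can be sketched as follows, using a handful of Minnesota rows from the raw table. This is a guess at what `pp.load_ages_pop` does, not its actual code: it assumes that `SEX == 0` marks the combined total (the 2019 values of those rows match `agewise_total` above) and it takes the state total of 5637494 directly from the output instead of recomputing it.

```python
import pandas as pd

# A few Minnesota rows from the raw table (only two estimate columns kept).
raw = pd.DataFrame({
    "NAME": ["Minnesota"] * 4,
    "SEX": [0, 0, 0, 2],
    "AGE": [0, 1, 2, 82],
    "POPEST2018_CIV": [67799, 70351, 71468, 11958],
    "POPEST2019_CIV": [67629, 68358, 70924, 12183],
})
statewise_total = 5637494  # Minnesota total, copied from the output above

# Pick the most recent estimate column by name.
latest = sorted(c for c in raw.columns if c.startswith("POPEST"))[-1]

# Keep the rows for both sexes combined (assumed to be SEX == 0) and rename.
ages = (
    raw[raw["SEX"] == 0][["NAME", "AGE", latest]]
    .rename(columns={"NAME": "State", "AGE": "Age", latest: "agewise_total"})
    .reset_index(drop=True)
)

# Share of each age in the state's population, in percent.
ages["Proportion_pop"] = 100 * ages["agewise_total"] / statewise_total
print(ages)
```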


Now we are ready to estimate <b>age bias:</b>

In [65]:
cc.bias_per_state(fe_age, ages_pop,filter_var='Age').head(4)

Unnamed: 0,State,Abbrv,Age,Percent_FE,agewise_total,statewise_total,Proportion_pop,Bias
0,Alabama,AL,1.0,0.30349,58290,4889347,1.192184,-0.888694
1,Alabama,AL,2.0,0.151745,59073,4889347,1.208198,-1.056453
2,Alabama,AL,3.0,0.151745,59799,4889347,1.223047,-1.071302
3,Alabama,AL,5.0,0.30349,59568,4889347,1.218322,-0.914832


It appears that age bias is not as pronounced as gender or racial bias.

Since we now have all our data, we are ready for plotting:

# 2. Plotting

<i> Here is an overview of the different steps in "Plotting": </i>
    

![](mermaid_plotting.png "Here an overview of the stages in the 'Plotting' step.")




In the introduction, we mentioned that the goal of this work is to illustrate the extent of police bias throughout the U.S. and disentangle factors that influence it. 

<b> Let's start with the latter. </b>

In [66]:
from bokeh.io import output_notebook
output_notebook()

## 2.1. Factors influencing bias


Characteristics include Age, Gender, Race and the state someone lives in. To see how they influence bias, let's select a state and illustrate the characteristics with bar plots. We will use Minnesota, the state where George Floyd was killed, as an example:

In [67]:
from scripts.detail_plots import detail_plot
from bokeh.io import show

my_plots = detail_plot('Minnesota')
show(my_plots)

It seems that in our example state, Minnesota, white female individuals are least likely to end up in a fatal police encounter. While there is an effect of age, it appears marginal compared to gender and racial biases.

## 2.2 Bias throughout the U.S.


The goal is to make the graphic accessible, such that it is easy to understand how bias compares between U.S. states. For this, let's plot the bias on a geographical map. This allows whoever looks at it to extract location data and compare this map with other geopolitical maps.

Further, to keep the illustration simple and clear, let's focus on a single characteristic at a time.


In [68]:
import scripts.map as smap

<b> First we need some geometric data of the outlines of each state. </b>

In [69]:
geometry = smap.load_geometry()
geometry.head(3)

Unnamed: 0,STATEFP,STATENS,AFFGEOID,GEOID,STUSPS,NAME,LSAD,ALAND,AWATER,geometry
0,24,1714934,0400000US24,24,MD,Maryland,0,25151100280,6979966958,"MULTIPOLYGON (((-76.04621 38.02553, -76.00734 ..."
1,19,1779785,0400000US19,19,IA,Iowa,0,144661267977,1084180812,"POLYGON ((-96.62187 42.77925, -96.57794 42.827..."
2,10,1779781,0400000US10,10,DE,Delaware,0,5045925646,1399985648,"POLYGON ((-75.77379 39.72220, -75.75323 39.757..."


- We want to look at one characteristic at a time.
- We want to see how a specific value of that characteristic compares to others.
- For now, let's just assume a value.

Since we wanted to look at a single characteristic at a time, we should merge the geometric data with the already preprocessed data. More specifically, we need a flexible function that takes in the geometric data, the characteristic we are interested in at the moment, and information about how the user relates to this characteristic.

For example, if we are interested in how racial bias against someone like George Floyd compares throughout the U.S., we would use "Race" and "Black".

In [70]:
my_map = smap.merge_map(geometry,'Race','Black')
my_map.head(3)

Unnamed: 0,STATEFP,STATENS,AFFGEOID,GEOID,STUSPS,NAME,LSAD,ALAND,AWATER,geometry,State,Abbrv,Race,Percent_FE,Proportion_pop,Bias
0,24,1714934,0400000US24,24,MD,Maryland,0,25151100280,6979966958,"MULTIPOLYGON (((-76.04621 38.02553, -76.00734 ...",Maryland,MD,Black,39.770554,29.89,9.880554
1,19,1779785,0400000US19,19,IA,Iowa,0,144661267977,1084180812,"POLYGON ((-96.62187 42.77925, -96.57794 42.827...",Iowa,IA,Black,11.636364,3.71,7.926364
2,10,1779781,0400000US10,10,DE,Delaware,0,5045925646,1399985648,"POLYGON ((-75.77379 39.72220, -75.75323 39.757...",Delaware,DE,Black,39.344262,22.18,17.164262
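
A function of this kind could be sketched as below: filter the bias table to the selected value, then join it onto the state outlines by state name. This is a plain pandas illustration with placeholder geometry strings, not the actual `smap.merge_map`; the function name `merge_bias`, its signature and the left join on `NAME`/`State` are assumptions.

```python
import pandas as pd

def merge_bias(geometry, bias, filter_var, value):
    """Attach the bias of one group (e.g. Race == 'Black') to each state outline."""
    selected = bias[bias[filter_var] == value]
    # A left join keeps every outline, even if a state has no bias value.
    return geometry.merge(selected, left_on="NAME", right_on="State", how="left")

# State outlines; the geometry strings are placeholders, not real polygons.
geometry = pd.DataFrame({
    "STUSPS": ["MD", "IA", "DE"],
    "NAME": ["Maryland", "Iowa", "Delaware"],
    "geometry": ["MULTIPOLYGON (...)", "POLYGON (...)", "POLYGON (...)"],
})

# Bias values copied from the outputs in this notebook.
bias = pd.DataFrame({
    "State": ["Maryland", "Iowa", "Delaware", "Alabama"],
    "Race": ["Black", "Black", "Black", "White"],
    "Bias": [9.880554, 7.926364, 17.164262, -33.036889],
})

my_map = merge_bias(geometry, bias, "Race", "Black")
print(my_map[["NAME", "Race", "Bias"]])
```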


# 3. Bringing it all together

We now have all the information we need, so we can start bringing the information from (1) and (2) together into a single illustration. 
We need to keep some things in mind when making the illustration:

1. It should be easy to see how a characteristic, for example your gender, compares across states. Let's add some colour to the U.S. map to display this. Darker colours indicate more extreme values. Red indicates more bias, green less bias.


2. It should also be clear what we are comparing on the map currently. Let's change the title dynamically to display the current selection.


3. Of course, we cannot assume that everybody who uses our illustration knows the names of all states. Let's add some hover capability to display the state name and the bias in that state.


4. The illustration should represent all values we found for all genders, races and ages in the data, whilst keeping the visuals simple. Let's allow users to change the fill colours based on a single characteristic and to fill in their own characteristics.


5. Finally, let's display our bar plots from section 2.1 whenever someone is interested in a specific state. Let's implement this with a click event on a state. 
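
The colour rule from point 1 (red for positive bias, green for negative bias, darker for more extreme values) can be illustrated with a small standalone function. This is a sketch of the idea only, not the code behind `smap.map_plotting`; `bias_to_hex` and `max_abs` are hypothetical names, and in practice a plotting library's own colour mapper would probably be used instead.

```python
def bias_to_hex(bias, max_abs):
    """Map a bias value to a hex colour: red if positive, green if negative.

    Values near zero are close to white; values at or beyond +/- max_abs
    are darkest.
    """
    strength = min(abs(bias) / max_abs, 1.0) if max_abs else 0.0
    level = round(255 - 155 * strength)  # main channel: 255 (light) to 100 (dark)
    fade = round(255 * (1 - strength))   # other channels fade from 255 to 0
    if bias > 0:
        red, green, blue = level, fade, fade
    else:
        red, green, blue = fade, level, fade
    return f"#{red:02x}{green:02x}{blue:02x}"

# Example: colours for a few bias values on a scale of +/- 40 percentage points.
for value in (-40, -10, 0, 10, 40):
    print(value, bias_to_hex(value, max_abs=40))
```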


In [71]:
show(smap.map_plotting, notebook_url=your_notebook_url)

# Conclusion

In this exploratory investigation, we found that Gender and Race are much stronger predictors than Age of the risk of ending up in a fatal police encounter.

Specifically, Race: "White" and Gender: "Female" seem to be most favoured. Black males, on the other hand, seem to be most at risk. If we relate this back to the BLM movement, there really seems to be a systematic difference in which police encounters end with the victim dead.

Additionally, there appear to be differences throughout the U.S., with more southern and eastern states showing stronger bias against black people.

It should be noted, however, that there are possible alternative explanations for the "Bias" observed. For example, it is possible that people with a specific characteristic (e.g. Male) are more likely to be involved in crime. This could then lead to a stronger Bias score without stronger police bias. Therefore, further research should be conducted to isolate police bias from possible confounds and mediators.

Overall, findings show that police bias is a problem throughout the U.S. With the issue identified, further research must be done into how to counteract these biases and work towards an equal culture.