# Fact-Checking Findings

[Findings from José](https://docs.google.com/document/d/1fe0SxvyPyrN9eK5DnAuF5THvyeiOxMf1gTUl615MSU4/edit?usp=sharing)

[Findings from Will](https://docs.google.com/document/d/13pe-Y1Gd2NyWCqGBmcDJUq8CRBrcrMa1O50-meezV-M/edit?usp=sharing)

In [1]:
import os
import pandas as pd
import altair as alt

# assumes the notebook is run from the project root, which contains data/
data_dir = os.path.join(os.getcwd(), 'data')
processed_dir = os.path.join(data_dir, 'processed')
discipline_csv = os.path.join(processed_dir, 'cleaned_discipline_final.csv')
staff_roster_csv = os.path.join(processed_dir, 'staff_roster_cleaned.csv')

In [2]:
# variables for importing the data

columns = {
    'CONTROL #': 'control_number',
    'FINAL DISP DATE': 'final_disposition_date',
    'FINAL DISPOSITION': 'final_disposition',
    'FINAL # DAYS': 'final_number_of_days',
    'EMPLOYEE LAST NAME': 'last_name',
    'EMPLOYEE FIRST NAME': 'first_name',
    'ALLEGATION': 'allegation',
    'COMP SEX': 'complaintant_gender',
    'COMP RACE': 'complaintant_race',
    'full_name': 'dirty_full_name',
    'clean_name_x': 'clean_discipline_name',
    'roster_name_match': 'clean_roster_name',
    'gender': 'officer_gender',
    'clean_race_ethnicity': 'officer_race',
}

discipline_df = pd.read_csv(
    discipline_csv,
    parse_dates = ['FINAL DISP DATE'],
    dtype = {'CONTROL #': 'object'}
)
discipline_df = discipline_df.rename(columns=columns)

# keep only a selection of the columns to make things easier to work with
discipline_df = discipline_df[[
    'control_number',
    'final_disposition_date',
    'final_disposition',
    'final_number_of_days',
    'dirty_full_name',
    'clean_roster_name',
    'officer_gender',
    'officer_race',
    'allegation'
]].copy()

# extract the disposition year with the vectorized datetime accessor
discipline_df['year'] = discipline_df['final_disposition_date'].dt.year

# drop rows with no roster match or with unknown/missing officer race
discipline_final = discipline_df[
    (discipline_df.officer_race != 'unknown')
    & (discipline_df.clean_roster_name != 'no match')
    & discipline_df.officer_race.notna()
].copy()

discipline_final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 12502 entries, 31 to 12895
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   control_number          12501 non-null  object        
 1   final_disposition_date  12502 non-null  datetime64[ns]
 2   final_disposition       12502 non-null  object        
 3   final_number_of_days    2620 non-null   float64       
 4   dirty_full_name         12502 non-null  object        
 5   clean_roster_name       12502 non-null  object        
 6   officer_gender          12502 non-null  object        
 7   officer_race            12502 non-null  object        
 8   allegation              12498 non-null  object        
 9   year                    12502 non-null  int64         
dtypes: datetime64[ns](1), float64(1), int64(1), object(7)
memory usage: 1.0+ MB


In [3]:
staff_roster_df = pd.read_csv(
    staff_roster_csv,
    parse_dates=['date_started']
)


staff_roster_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23645 entries, 0 to 23644
Data columns (total 9 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   name                  23645 non-null  object        
 1   job_desc              23645 non-null  object        
 2   dept_desc             23645 non-null  object        
 3   gender                23645 non-null  object        
 4   race_ethnicity        23643 non-null  object        
 5   date_started          23644 non-null  datetime64[ns]
 6   year                  23645 non-null  int64         
 7   clean_name            23645 non-null  object        
 8   clean_race_ethnicity  23643 non-null  object        
dtypes: datetime64[ns](1), int64(1), object(7)
memory usage: 1.6+ MB


## Demographics

✅ - Looking at the roster data, the number of officers ranged from roughly 1,800 to 2,100 per year between 2010 and 2021.

✅ Max - 2105 in 2018.

✅ Min - 1816 in 2011


In [5]:
staff_roster_df.groupby(['year']).size()

year
2010    1902
2011    1816
2012    2029
2013    1933
2014    1930
2015    1954
2016    2014
2017    2024
2018    2105
2019    1974
2020    1977
2021    1987
dtype: int64

✅ - Average: about 1,970 officers per year

In [6]:
staff_roster_df.groupby(['year']).size().mean()

1970.4166666666667
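
The minimum, maximum, and average above can also be pulled from the same grouped counts in one pass, which makes the three checks easier to repeat if the roster file changes. This is a minimal sketch, assuming each roster row represents one officer in one year. The helper name `yearly_headcount_summary` and the toy frame are illustrative and not part of the original notebook.

```python
import pandas as pd


def yearly_headcount_summary(roster: pd.DataFrame) -> dict:
    """Summarize rows per year: smallest, largest, and mean yearly count."""
    counts = roster.groupby('year').size()
    return {
        'min': int(counts.min()),
        'min_year': int(counts.idxmin()),
        'max': int(counts.max()),
        'max_year': int(counts.idxmax()),
        'mean': float(counts.mean()),
    }


# Toy data for illustration only (not the real roster).
toy_roster = pd.DataFrame({'year': [2010, 2010, 2011, 2012, 2012, 2012]})
summary = yearly_headcount_summary(toy_roster)
print(summary)
```

On the real data the call would be `yearly_headcount_summary(staff_roster_df)`.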

## Discipline Data

✅  -- 8/12 corrected in the document of findings 

Over the past 10 years, white officers have been involved in roughly 80% of all disciplinary action while Blacks make up about 13%.

-- correction: the data show 13.7% for Black officers, which rounds to 14%, not 13%. If one figure is reported as rounded, the other has to be rounded the same way.


In [11]:
discipline_final.groupby(
    ['officer_race']
).size().to_frame('total') / len(discipline_final) * 100

                            total
officer_race                     
asian/pacific islander   1.415773
black                   13.677812
hispanic                 1.887698
multiracial              2.991521
native american          0.063990
white                   79.963206
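
To keep the rounding consistent, both figures can be rounded with the same rule in code rather than by eye. A minimal sketch using the two percentages copied from the output above; the variable names are illustrative.

```python
import pandas as pd

# Percentages copied from the notebook output for the two groups in the claim.
shares = pd.Series({'white': 79.963206, 'black': 13.677812})

# Apply one rounding rule to every figure that will be reported.
rounded = shares.round(0).astype(int)
print(rounded.to_dict())  # {'white': 80, 'black': 14}
```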


✅  - there were few terminations (33)


In [23]:
len(discipline_final[
    discipline_final.final_disposition.str.contains('TERMIN')
])

33

In [26]:
discipline_final[
    discipline_final.final_disposition.str.contains('TERMIN')
].final_disposition.value_counts()

TERMINATION                         24
TERMINATED                           8
TERMINATION, DEMOTION, SUSPENDED     1
Name: final_disposition, dtype: int64

✅ - or demotions (12).

In [24]:
len(discipline_final[
    discipline_final.final_disposition.str.contains('DEMOTION')
])

12
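
One disposition in the termination counts above reads `TERMINATION, DEMOTION, SUSPENDED`, so a single row can match more than one substring search. The sketch below flags such overlaps so they can be reviewed before the counts are reported. It assumes the same three patterns used in this notebook; the helper name `outcome_flags` and the toy values are illustrative.

```python
import pandas as pd

# Substring patterns used in the notebook's counts.
PATTERNS = {'termination': 'TERMIN', 'demotion': 'DEMOTION', 'suspension': 'SUSPEN'}


def outcome_flags(dispositions: pd.Series) -> pd.DataFrame:
    """Return one boolean column per outcome pattern."""
    return pd.DataFrame({
        name: dispositions.str.contains(pattern, na=False)
        for name, pattern in PATTERNS.items()
    })


# Toy dispositions for illustration only.
toy = pd.Series([
    'TERMINATION',
    'TERMINATED',
    'TERMINATION, DEMOTION, SUSPENDED',
    'DEMOTION',
    'SUSPENDED',
])
flags = outcome_flags(toy)

# Rows that match more than one outcome are counted in several totals.
overlap = toy[flags.sum(axis=1) > 1]
print(flags.sum().to_dict())
print(overlap.tolist())
```

On the real data the input would be `discipline_final.final_disposition`.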

✅ - Of all 1,710 dispositions involving Black officers


In [29]:
len(discipline_final[
    discipline_final.officer_race == 'black'
])

1710

✅  29.8% were suspensions.

In [None]:
len(discipline_final[
    (discipline_final.final_disposition.str.contains('SUSPEN')) & (discipline_final.officer_race == 'black')
])/len(discipline_final[
    discipline_final.officer_race == 'black'
])

0.2982456140350877

✅ Of all 9,997 dispositions involving White officers


In [34]:
len(discipline_final[
    discipline_final.officer_race == 'white'
])

9997

✅ 19.8% were suspensions. 

In [35]:
len(discipline_final[
    (discipline_final.final_disposition.str.contains('SUSPEN')) & (discipline_final.officer_race == 'white')
])/len(discipline_final[
    discipline_final.officer_race == 'white'
])

0.1983595078523557
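
The two suspension shares above are computed with separate filters. As a cross-check, the same shares can be computed for every group in one grouped pass. This is a minimal sketch, assuming the `SUSPEN` substring is the right test for a suspension; the helper name `suspension_share_by` and the toy frame are illustrative.

```python
import pandas as pd


def suspension_share_by(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Share of dispositions containing 'SUSPEN' within each group."""
    is_suspension = df['final_disposition'].str.contains('SUSPEN', na=False)
    # The mean of a boolean column is the fraction of True values.
    return is_suspension.groupby(df[group_col]).mean()


# Toy data for illustration only (not the real discipline records).
toy = pd.DataFrame({
    'final_disposition': [
        'SUSPENDED', 'REPRIMAND',
        'SUSPENSION', 'TERMINATION', 'REPRIMAND', 'REPRIMAND',
    ],
    'officer_race': ['black', 'black', 'white', 'white', 'white', 'white'],
})
rates = suspension_share_by(toy, 'officer_race')
print(rates)
```

On the real data, `suspension_share_by(discipline_final, 'officer_race')` should reproduce the two figures above, and passing `'officer_gender'` would give the shares by gender.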

Of all dispositions involving female officers, 24% were suspensions.


Of all dispositions involving male officers, 21% were suspensions.


Thus, the share of dispositions resulting in suspension was 3 percentage points higher for female officers than for male officers (24% vs. 21%).
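
The gap between 24% and 21% is 3 percentage points, which is not the same as being 3% higher. The arithmetic below shows both readings. It takes the two shares as stated in the findings; no cell in this notebook recomputes them.

```python
# Suspension shares as stated in the findings (not recomputed here).
female_rate = 0.24
male_rate = 0.21

# Absolute gap, in percentage points.
point_gap = (female_rate - male_rate) * 100

# Relative gap: how much higher the female rate is, as a percent of the male rate.
relative_gap = (female_rate - male_rate) / male_rate * 100

print(f"{point_gap:.1f} percentage points, {relative_gap:.1f}% relative")
# prints: 3.0 percentage points, 14.3% relative
```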
