# Summary of Findings


### Introduction
Our project seeks to answer the following question: can we predict the outcome of a complainant's allegation from features of the dataset? We are interested in this question because we want to see whether race, gender, age, the type of complaint, the outcome of the interaction between the police officer and the civilian, the officer ID, and the year of the complaint affect the outcome in a consistent way. We plan to use a random forest classifier for this problem and to treat the classifier's accuracy as our indicator of success. We chose a classifier over a regression model because the model must select among categorical outcomes rather than predict numerical values.

### Baseline Model
Our baseline model included 9 features. We cleaned these variables by replacing or deleting null values so that they could be used in the model. There are 6 nominal variables and 3 quantitative variables. The nominal variables were one-hot encoded in our classifier pipeline. We used accuracy to evaluate the model, getting a score of 0.90 on the training data and 0.55 on the test data. We consider this a reasonable starting point, because we reached roughly 55% test accuracy without any feature engineering, although the large gap between the train and test scores suggests that the model overfits. For our final model, we aim to raise the test accuracy by modifying the features, even if that means that the train score decreases.
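One way to put a test accuracy like 0.55 in context is to compare it with a classifier that always predicts the most frequent training label. The sketch below illustrates this on synthetic labels, not on the NYPD data; the names `y_toy`, `X_toy`, and the class proportions are assumptions made for illustration only.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real labels (assumption: three imbalanced classes).
rng = np.random.default_rng(0)
y_toy = rng.choice(["A", "B", "C"], size=1000, p=[0.5, 0.3, 0.2])
X_toy = np.zeros((len(y_toy), 1))  # DummyClassifier ignores the features

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0
)

# Always predict the most frequent label seen in the training split.
majority = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
majority_acc = majority.score(X_te, y_te)

# The same number computed by hand, to show what the baseline measures.
labels, counts = np.unique(y_tr, return_counts=True)
top_label = labels[np.argmax(counts)]
manual_acc = float(np.mean(y_te == top_label))

print(majority_acc, manual_acc)
```

If the real model's test accuracy is not clearly above this kind of baseline, the features are adding little beyond the class frequencies.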

### Final Model
For our final model, we wanted to reduce the complexity of some of the variables in our data set. The first variable we simplified was our output variable, the outcome of the allegation, by reducing the potential outcomes from 7 to 3. This was reasonable because there are really only three distinct outcomes: substantiated, unsubstantiated, and exonerated. We also mapped the complainant's gender to 3 categories (male, female, and other), and reduced the outcome of the police-civilian interaction to no outcome, arrest, summons, and other. Lastly, we standardized the ages of both the officers and the complainants. We then ran a grid search on our random forest classifier to find the best values for the parameters n_estimators and max_depth. Using these values, our model got a train score of 0.72 and a test score of 0.56. Compared with the baseline (0.90 train, 0.55 test), the test accuracy is slightly higher and the gap between the train and test scores is much smaller, which suggests that the new model overfits less and generalizes somewhat better to data that it has not seen. Because the final model predicts 3 outcome classes rather than 7, the two accuracies are not strictly comparable.
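The reduction of the output variable to three outcomes can be sketched as a simple prefix-based mapping. This is a minimal illustration on a handful of disposition labels that appear in this notebook; the function name `collapse_disposition` is hypothetical and the short list is not the full set of labels in the data.

```python
import pandas as pd

def collapse_disposition(label: str) -> str:
    """Map a detailed board disposition to one of three coarse outcomes."""
    if label.startswith("Substantiated"):
        return "Substantiated"
    if label.startswith("Exonerated"):
        return "Exonerated"
    # Everything else is treated as unsubstantiated.
    return "Unsubstantiated"

# A few example labels (illustrative subset only).
raw = pd.Series([
    "Substantiated (Charges)",
    "Substantiated (Command Discipline A)",
    "Substantiated (Formalized Training)",
    "Unsubstantiated",
    "Exonerated",
])

collapsed = raw.map(collapse_disposition)
print(collapsed.tolist())
```

Mapping on the prefix keeps every "Substantiated (...)" variant in one class regardless of the penalty named in parentheses.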

### Fairness Evaluation
We chose the officer's gender for our fairness evaluation, because we wanted to understand whether the officer's gender affects the fairness of the random forest classifier. This attribute was easy to evaluate because it takes only two values, Male and Female, represented by M and F. Our parity measure was accuracy. Our null and alternative hypotheses were as follows.

Null Hypothesis: The male and female outcome predictions come from the same distribution.

Alternative Hypothesis: The male and female outcome predictions do not come from the same distribution.

We ran a permutation test to evaluate the model, getting a p-value of 0.17. Taking the conventional 0.05 significance level, we fail to reject the null hypothesis: we found no statistically significant difference in prediction accuracy between complaints against male officers and complaints against female officers. This does not prove that the predictions come from the same distribution; it only means that our test found no evidence against it.
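The idea behind the permutation test can be sketched on synthetic data as follows. This is a minimal two-sided version written for illustration, not the exact procedure used in the notebook; the function name `permutation_pvalue`, the group proportions, and the 0.55 accuracy rate are all assumptions.

```python
import numpy as np

def permutation_pvalue(groups, correct, n_perm=1000, seed=0):
    """Two-sided permutation test for a difference in accuracy between two groups."""
    groups = np.asarray(groups)
    correct = np.asarray(correct, dtype=float)
    labels = np.unique(groups)
    if len(labels) != 2:
        raise ValueError("expected exactly two groups")

    def abs_diff(g):
        # Absolute difference in mean accuracy between the two groups.
        return abs(correct[g == labels[0]].mean() - correct[g == labels[1]].mean())

    observed = abs_diff(groups)
    rng = np.random.default_rng(seed)
    # Shuffle group labels to simulate "group has no effect on accuracy".
    perm_stats = np.array(
        [abs_diff(rng.permutation(groups)) for _ in range(n_perm)]
    )
    # Share of shuffled statistics at least as extreme as the observed one.
    return observed, float(np.mean(perm_stats >= observed))

# Synthetic example: accuracy is independent of the group by construction.
rng = np.random.default_rng(1)
toy_gender = rng.choice(["M", "F"], size=2000, p=[0.9, 0.1])
toy_correct = rng.random(2000) < 0.55

obs, p_value = permutation_pvalue(toy_gender, toy_correct)
print(obs, p_value)
```

Shuffling only the group labels keeps the group sizes fixed, so each shuffled statistic is computed on the same split sizes as the observed one.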

# Code

In [1]:
# imports
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'  # Higher resolution figures

In [2]:
# sklearn imports
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler
import sklearn.preprocessing as pp

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

from sklearn import metrics
from sklearn.metrics import accuracy_score, confusion_matrix, log_loss

In [3]:
# reading in data
data = pd.read_csv("NYPD-Dataset.csv")
data

Unnamed: 0,unique_mos_id,first_name,last_name,command_now,shield_no,complaint_id,month_received,year_received,month_closed,year_closed,...,mos_age_incident,complainant_ethnicity,complainant_gender,complainant_age_incident,fado_type,allegation,precinct,contact_reason,outcome_description,board_disposition
0,10004,Jonathan,Ruiz,078 PCT,8409,42835,7,2019,5,2020,...,32,Black,Female,38.0,Abuse of Authority,Failure to provide RTKA card,78.0,Report-domestic dispute,No arrest made or summons issued,Substantiated (Command Lvl Instructions)
1,10007,John,Sears,078 PCT,5952,24601,11,2011,8,2012,...,24,Black,Male,26.0,Discourtesy,Action,67.0,Moving violation,Moving violation summons issued,Substantiated (Charges)
2,10007,John,Sears,078 PCT,5952,24601,11,2011,8,2012,...,24,Black,Male,26.0,Offensive Language,Race,67.0,Moving violation,Moving violation summons issued,Substantiated (Charges)
3,10007,John,Sears,078 PCT,5952,26146,7,2012,9,2013,...,25,Black,Male,45.0,Abuse of Authority,Question,67.0,PD suspected C/V of violation/crime - street,No arrest made or summons issued,Substantiated (Charges)
4,10009,Noemi,Sierra,078 PCT,24058,40253,8,2018,2,2019,...,39,,,16.0,Force,Physical force,67.0,Report-dispute,Arrest - other violation/crime,Substantiated (Command Discipline A)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33353,9992,Tomasz,Pulawski,078 PCT,2642,35671,8,2016,2,2017,...,36,Asian,Male,21.0,Discourtesy,Word,66.0,Moving violation,Moving violation summons issued,Unsubstantiated
33354,9992,Tomasz,Pulawski,078 PCT,2642,35671,8,2016,2,2017,...,36,Asian,Male,21.0,Abuse of Authority,Interference with recording,66.0,Moving violation,Moving violation summons issued,Unsubstantiated
33355,9992,Tomasz,Pulawski,078 PCT,2642,35671,8,2016,2,2017,...,36,Asian,Male,21.0,Abuse of Authority,Search (of person),66.0,Moving violation,Moving violation summons issued,Substantiated (Formalized Training)
33356,9992,Tomasz,Pulawski,078 PCT,2642,35671,8,2016,2,2017,...,36,Asian,Male,21.0,Abuse of Authority,Vehicle search,66.0,Moving violation,Moving violation summons issued,Substantiated (Formalized Training)


### Baseline Model

In [4]:
# selecting the columns we are looking at
data_cols = data[['fado_type', "outcome_description", "mos_gender","mos_ethnicity","complainant_ethnicity",
                  "complainant_gender", "complainant_age_incident",
                  "mos_age_incident","year_received","board_disposition","unique_mos_id"]]
df = data_cols.copy()
df

Unnamed: 0,fado_type,outcome_description,mos_gender,mos_ethnicity,complainant_ethnicity,complainant_gender,complainant_age_incident,mos_age_incident,year_received,board_disposition,unique_mos_id
0,Abuse of Authority,No arrest made or summons issued,M,Hispanic,Black,Female,38.0,32,2019,Substantiated (Command Lvl Instructions),10004
1,Discourtesy,Moving violation summons issued,M,White,Black,Male,26.0,24,2011,Substantiated (Charges),10007
2,Offensive Language,Moving violation summons issued,M,White,Black,Male,26.0,24,2011,Substantiated (Charges),10007
3,Abuse of Authority,No arrest made or summons issued,M,White,Black,Male,45.0,25,2012,Substantiated (Charges),10007
4,Force,Arrest - other violation/crime,F,Hispanic,,,16.0,39,2018,Substantiated (Command Discipline A),10009
...,...,...,...,...,...,...,...,...,...,...,...
33353,Discourtesy,Moving violation summons issued,M,White,Asian,Male,21.0,36,2016,Unsubstantiated,9992
33354,Abuse of Authority,Moving violation summons issued,M,White,Asian,Male,21.0,36,2016,Unsubstantiated,9992
33355,Abuse of Authority,Moving violation summons issued,M,White,Asian,Male,21.0,36,2016,Substantiated (Formalized Training),9992
33356,Abuse of Authority,Moving violation summons issued,M,White,Asian,Male,21.0,36,2016,Substantiated (Formalized Training),9992


In [5]:
#cleaning
df.dropna(subset=["outcome_description"], inplace = True)
df['complainant_ethnicity'] = df['complainant_ethnicity'].fillna("Unknown")
df['complainant_gender'] = df['complainant_gender'].fillna("Not described")
df['complainant_age_incident'] = df['complainant_age_incident'].fillna(0)
# no nulls left
df.isnull().sum()

fado_type                   0
outcome_description         0
mos_gender                  0
mos_ethnicity               0
complainant_ethnicity       0
complainant_gender          0
complainant_age_incident    0
mos_age_incident            0
year_received               0
board_disposition           0
unique_mos_id               0
dtype: int64

In [6]:
# pipeline
cat_feat = ['fado_type', "outcome_description","complainant_ethnicity","complainant_gender",
            "mos_gender","mos_ethnicity"]
cat_transformer =  OneHotEncoder(handle_unknown='ignore')

everything = ["mos_age_incident",'complainant_age_incident',"year_received"]
every_transformer = Pipeline(steps=[
        ('every', FunctionTransformer(lambda x: x))])

preprocessor = ColumnTransformer(
    transformers=[
        ('one_hot', cat_transformer, cat_feat),
        ("everything", every_transformer, everything)])

pl = Pipeline(steps=[('preprocessor', preprocessor), 
                     ('class', RandomForestClassifier())
                    ])

In [7]:
# pick the feature variables and output variables
X = df.drop(["board_disposition"], axis = 1 )
y = df.board_disposition
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

In [8]:
# basic model fit and scores for test and training data
pl.fit(X_train, y_train)

pred_train = pl.predict(X_train)
train_score = pl.score(X_train, y_train)

pred_test = pl.predict(X_test)
test_score = pl.score(X_test, y_test)

train_score,test_score

(0.8997037155669443, 0.5566898871006486)

### Final Model

In [9]:
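# helper functions
def outcome_fix(outcome):
    """Collapse an outcome description into a coarser category."""
    # Matching is case-sensitive: descriptions containing "No" are kept as-is,
    # and lowercase "summons" (e.g. "Moving violation summons issued") falls
    # through to "Other".
    if "No" in outcome:
        return outcome
    elif "Arrest" in outcome:
        return "Arrest"
    elif "Summons" in outcome:
        return "Summons"
    else:
        return "Other"

def _substantial(outcome):
    """Reduce a board disposition to one of three outcomes."""
    if outcome.startswith("Substantiated"):
        return "Substantiated"
    elif outcome.startswith("Exonerated"):
        return "Exonerated"
    else:
        return "Unsubstantiated"

def _gender(g):
    """Keep Male and Female; map every other value to Other."""
    if g in ("Male", "Female"):
        return g
    else:
        return "Other"

def helper(outcome):
    # applied to the output variable (a Series of board dispositions)
    return outcome.apply(_substantial)

def helper_2(g):
    # work on a copy so the caller's frame is not modified in place
    g = g.copy()
    g["complainant_gender"] = g["complainant_gender"].apply(_gender)
    return g

def helper_3(g):
    # work on a copy so the caller's frame is not modified in place
    g = g.copy()
    g["outcome_description"] = g["outcome_description"].apply(outcome_fix)
    return g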

In [10]:
# pipeline
# Note: "complainant_ethnicity" appears twice in this list, so it is one-hot encoded twice.
cat_feat = ['fado_type',"complainant_ethnicity","mos_gender","mos_ethnicity","complainant_ethnicity"]
cat_transformer =  OneHotEncoder(handle_unknown='ignore')

gender_feat = ["complainant_gender"]
gender_transformer = Pipeline(steps=[
        ('gender_map', FunctionTransformer(helper_2) ),
        ("one_hot",OneHotEncoder(handle_unknown='ignore') ) ])

outcome_feat = ["outcome_description"]
outcome_transformer = Pipeline(steps=[
        ('outcome_map', FunctionTransformer(helper_3) ),
        ("one_hot",OneHotEncoder(handle_unknown='ignore') ) ])

everything = ["mos_age_incident",'complainant_age_incident']
std_transformer = StandardScaler()

year = ["year_received"]
year_transformer = Pipeline(steps=[
        ('every', FunctionTransformer(lambda x: x))])

preprocessor = ColumnTransformer(
    transformers=[
        ('one_hot', cat_transformer, cat_feat),
        ("everything", std_transformer, everything),
        ("Outcome", outcome_transformer,outcome_feat),
        ("gender",gender_transformer, gender_feat),
        ("year",year_transformer,year)
    ])
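To make the effect of this kind of preprocessor concrete, here is a small sketch on a toy frame. It is a simplified stand-in for the real pipeline, with only three columns, and it uses the built-in `"passthrough"` option instead of an identity `FunctionTransformer`; the frame `toy` and the transformer `toy_pre` are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with one nominal and two numeric columns (values are illustrative).
toy = pd.DataFrame({
    "fado_type": ["Force", "Discourtesy", "Force", "Abuse of Authority"],
    "mos_age_incident": [32, 24, 39, 36],
    "year_received": [2019, 2011, 2018, 2016],
})

toy_pre = ColumnTransformer(transformers=[
    # one column per distinct category
    ("one_hot", OneHotEncoder(handle_unknown="ignore"), ["fado_type"]),
    # zero mean, unit variance
    ("scaled", StandardScaler(), ["mos_age_incident"]),
    # column is kept unchanged
    ("year", "passthrough", ["year_received"]),
])

out = toy_pre.fit_transform(toy)
# 3 one-hot columns + 1 scaled age + 1 untouched year
print(out.shape)  # → (4, 5)
```

Columns that are not named in any transformer are dropped by default, which is why `unique_mos_id` does not reach the classifier even though it is present in `X`.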

In [11]:
# transforming output variable and assigning test and train data
X = df.drop(["board_disposition"], axis = 1 )
y = df.board_disposition
dispo_simplifier = FunctionTransformer(helper)
dispo_simplifier.fit(y)
y = dispo_simplifier.transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

In [20]:
classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="rbf", C=0.025, probability=True),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    GradientBoostingClassifier()]

log_cols=["Classifier", "Train_Score", "Test_Score"]
log = pd.DataFrame(columns=log_cols)

In [21]:
for clf in classifiers:
    pl2 = Pipeline(steps=[('preprocessor', preprocessor), 
                     ('class',clf)
                    ])
    pl2.fit(X_train, y_train)
    name = clf.__class__.__name__
    
    train_score = pl2.score(X_train, y_train)
    test_score = pl2.score(X_test, y_test)
    
    log_entry = pd.DataFrame([[name, train_score*100, test_score*100]], columns=log_cols)
    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    log = pd.concat([log, log_entry])

In [22]:
log

Unnamed: 0,Classifier,Train_Score,Test_Score
0,KNeighborsClassifier,73.502562,52.366082
0,SVC,46.372518,46.144607
0,DecisionTreeClassifier,89.642056,53.098727
0,RandomForestClassifier,89.638053,56.281528
0,GradientBoostingClassifier,54.252082,53.459044


In [23]:
# grid search

parameters = {
    "n_estimators": [150,200,300],
    'max_depth': [10,15,20,None]    
}
pl3 = Pipeline(steps=[('preprocessor', preprocessor), 
                     ('class',GridSearchCV(RandomForestClassifier(), parameters, cv = 2, n_jobs = 5))
                    ])

In [24]:
# fitting
pl3.fit(X_train, y_train)

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('one_hot',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['fado_type',
                                                   'complainant_ethnicity',
                                                   'mos_gender',
                                                   'mos_ethnicity',
                                                   'complainant_ethnicity']),
                                                 ('everything',
                                                  StandardScaler(),
                                                  ['mos_age_incident',
                                                   'complainant_age_incident']),
                                                 ('Outcome',
                                                  Pipeline(steps=[('outcome_map',
                                      

In [25]:
# best params of grid search
pl3["class"].best_params_

{'max_depth': 15, 'n_estimators': 300}
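Beyond `best_params_`, the fitted search object also exposes the score of every parameter combination, which helps to judge how much the best setting actually differs from the others. The sketch below shows this on a small synthetic dataset with a reduced grid; the data from `make_classification` and the grid values are assumptions chosen to keep the example fast.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic 3-class problem (not the NYPD data).
X_toy, y_toy = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [3, None]},
    cv=3,
)
search.fit(X_toy, y_toy)

# One row per parameter combination, best first.
summary = (
    pd.DataFrame(search.cv_results_)[
        ["param_n_estimators", "param_max_depth", "mean_test_score", "rank_test_score"]
    ]
    .sort_values("rank_test_score")
)

print(search.best_params_)
print(summary)
```

If the mean test scores in `summary` are nearly identical across rows, the choice of `n_estimators` and `max_depth` matters little for this problem.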

In [26]:
# using best parameters
pl4 = Pipeline(steps=[('preprocessor', preprocessor), 
                     ('class',RandomForestClassifier(max_depth=15,n_estimators=300))])
pl4.fit(X_train, y_train)

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('one_hot',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['fado_type',
                                                   'complainant_ethnicity',
                                                   'mos_gender',
                                                   'mos_ethnicity',
                                                   'complainant_ethnicity']),
                                                 ('everything',
                                                  StandardScaler(),
                                                  ['mos_age_incident',
                                                   'complainant_age_incident']),
                                                 ('Outcome',
                                                  Pipeline(steps=[('outcome_map',
                                      

In [27]:
# scores of model

pred_train = pl4.predict(X_train)
train_score = pl4.score(X_train, y_train)

pred_test = pl4.predict(X_test)
test_score = pl4.score(X_test, y_test)

train_score,test_score

(0.7261771300448431, 0.553927456161422)

### Fairness Evaluation

In [18]:
# fairness data frame with gender and prediction vs. actual

results = pd.DataFrame()
results['mos_gender'] = X_test.mos_gender
results['prediction'] = pred_test
results['tag'] = y_test
results

Unnamed: 0,mos_gender,prediction,tag
11398,M,Unsubstantiated,Unsubstantiated
13127,M,Exonerated,Unsubstantiated
12129,M,Unsubstantiated,Unsubstantiated
18634,M,Unsubstantiated,Substantiated
25633,M,Unsubstantiated,Unsubstantiated
...,...,...,...
5930,M,Substantiated,Substantiated
15174,M,Unsubstantiated,Unsubstantiated
7578,M,Substantiated,Exonerated
30183,M,Unsubstantiated,Substantiated


In [19]:
# actual accuracy of prediction

results["accuracy"] = results.tag == results.prediction
# select the accuracy column explicitly so the string columns are not averaged
obs_diff = results.groupby("mos_gender")["accuracy"].mean().diff().iloc[-1]

In [21]:
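# permutation test to see if accuracy by gender comes from the same distribution

metrs = []
for _ in range(100):
    s = (
        # shuffle the gender labels; to_numpy() assigns by position,
        # so pandas does not re-align the shuffled values on the index
        results.assign(mos_gender=results.mos_gender.sample(frac=1.0, replace=False).to_numpy())
        .groupby('mos_gender')['accuracy'].mean()
        .diff()
        .iloc[-1]
    )

    metrs.append(s)
# one-sided p-value: share of shuffled differences at least as large as the observed one
(np.array(metrs) >= obs_diff).mean()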

0.45