### This notebook more clearly demonstrates the models that have been run on our dataset. The DataFrame used in part 1 differs slightly from the one used in part 2: part 1 was cleaned using an imputer, while part 2 comes from an older version of the data that was cleaned by hand. The imputer was added to clean up part 1, but it makes little difference to the models' performance.

In [4]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from scipy import stats

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

from sklearn.model_selection import train_test_split, GridSearchCV, \
    cross_val_score, RandomizedSearchCV

from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer, make_column_selector as selector

# ConfusionMatrixDisplay replaces plot_confusion_matrix, which was removed in scikit-learn 1.2
from sklearn.metrics import ConfusionMatrixDisplay, recall_score, \
    accuracy_score, precision_score, f1_score, roc_curve

from imblearn.over_sampling import SMOTE, SMOTEN
from imblearn.pipeline import Pipeline as ImPipeline

# pandas_profiling has since been renamed ydata-profiling (import ydata_profiling)
from pandas_profiling import ProfileReport

from numpy import sqrt, argmax

In [5]:
df = pd.read_csv('./Data/clean_terry.csv')
df.head()

|   | Unnamed: 0 | Subject Age Group | Weapon Type | Officer YOB | Officer Gender | Officer Race | Subject Perceived Race | Subject Perceived Gender | Arrest Flag | Frisk Flag | Sector | hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1792 | 1 - 17 |  | 1992 | M | White | American Indian or Alaska Native | Male | 0 | 1 | G | 0 |
| 1 | 1798 | 1 - 17 |  | 1982 | M | Nat Hawaiian/Oth Pac Islander | Black or African American | Male | 0 | 0 | J | 19 |
| 2 | 1804 | 1 - 17 |  | 1983 | M | White | Black or African American | Male | 0 | 1 | E | 22 |
| 3 | 1805 | 1 - 17 |  | 1985 | F | White | Black or African American | Male | 0 | 0 | K | 15 |
| 4 | 1806 | 1 - 17 |  | 1985 | F | White | Black or African American | Male | 0 | 0 | K | 15 |


In [62]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39022 entries, 0 to 39021
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Subject Age Group         39022 non-null  object
 1   Weapon Type               39022 non-null  object
 2   Officer YOB               39022 non-null  int64 
 3   Officer Gender            39022 non-null  object
 4   Officer Race              39022 non-null  object
 5   Subject Perceived Race    39022 non-null  object
 6   Subject Perceived Gender  39022 non-null  object
 7   Arrest Flag               39022 non-null  int64 
 8   Frisk Flag                39022 non-null  int64 
 9   Sector                    39022 non-null  object
 10  hour                      39022 non-null  int64 
dtypes: int64(4), object(7)
memory usage: 3.3+ MB


In [35]:
df.drop('Unnamed: 0', axis = 1, inplace=True)

In [36]:
y = df['Arrest Flag']
X = df.drop('Arrest Flag', axis = 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state= 42)

In [37]:
y_train.value_counts(normalize=True)

0    0.888471
1    0.111529
Name: Arrest Flag, dtype: float64
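The training target is about 89% non-arrests and 11% arrests, and the split above relies on random shuffling alone to keep that balance in both halves. A minimal sketch on synthetic labels (not the Terry stop data; the `_strat` names are illustrative) of how passing `stratify=y` to `train_test_split` keeps the positive rate the same in train and test:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real target: 1,000 rows, roughly 11% positives
rng = np.random.default_rng(42)
y_strat = (rng.random(1000) < 0.11).astype(int)
X_strat = np.arange(1000).reshape(-1, 1)

# stratify=y_strat preserves the class proportions in both splits
X_tr_strat, X_te_strat, y_tr_strat, y_te_strat = train_test_split(
    X_strat, y_strat, random_state=42, stratify=y_strat
)

print(f"overall positive rate: {y_strat.mean():.3f}")
print(f"train positive rate:   {y_tr_strat.mean():.3f}")
print(f"test positive rate:    {y_te_strat.mean():.3f}")
```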

In [110]:
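subpipe_cat = Pipeline(steps=[
    # Replace the 'Unknown' placeholder with the most frequent value and flag where it occurred
    ('cat_impute', SimpleImputer(missing_values='Unknown', strategy='most_frequent', add_indicator = True)),
    # Same treatment for the '-' placeholder
    ('cat_impute2', SimpleImputer(missing_values='-', strategy='most_frequent', add_indicator = True)),
    # On scikit-learn >= 1.2 this argument is spelled sparse_output=False (sparse was removed in 1.4)
    ('ohe', OneHotEncoder(sparse=False, handle_unknown='ignore'))
])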
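To make the imputer steps concrete: with `missing_values='Unknown'` and `strategy='most_frequent'`, each `'Unknown'` is replaced by the column's most common value, and `add_indicator=True` appends a boolean column marking where the replacement happened (only for columns that contained the placeholder during fitting). A toy, single-column sketch with made-up values:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy column using the same placeholder string as the real data
race_demo = np.array(
    [["White"], ["Unknown"], ["White"], ["Black or African American"]],
    dtype=object,
)

imputer_demo = SimpleImputer(
    missing_values="Unknown", strategy="most_frequent", add_indicator=True
)
out_demo = imputer_demo.fit_transform(race_demo)

# Column 0: 'Unknown' replaced by the most frequent value ('White')
# Column 1: True where the original value was 'Unknown'
print(out_demo)
```

In `subpipe_cat` those indicator columns are passed on to the second imputer and then one-hot encoded along with everything else.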

In [111]:
CT = ColumnTransformer(transformers=[
    ('subpipe_cat', subpipe_cat, selector(dtype_include=[object, int]))
], remainder='passthrough')

In [40]:
dummy_model_pipe = Pipeline(steps=[
    ('dum', DummyClassifier(strategy='most_frequent'))
])

In [41]:
class ModelWithCV():
    '''Structure to save the model and more easily see its crossvalidation'''
    
    def __init__(self, model, model_name, X, y, cv_now=True):
        self.model = model
        self.name = model_name
        self.X = X
        self.y = y
        # For CV results
        self.cv_results = None
        self.cv_mean = None
        self.cv_median = None
        self.cv_std = None
        #
        if cv_now:
            self.cross_validate()
        
    def cross_validate(self, X=None, y=None, kfolds=10):
        '''
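        Perform cross-validation and store the results on the object.
        
        Args: 
          X:
            Optional; Training features to perform CV on. Otherwise use X from object
          y:
            Optional; Training target to perform CV on. Otherwise use y from object
          kfolds:
            Optional; Number of folds for CV (default is 10)  
        '''
        
        # Compare with None explicitly: `if X` raises a ValueError for a DataFrame
        # ("The truth value of a DataFrame is ambiguous")
        cv_X = X if X is not None else self.X
        cv_y = y if y is not None else self.y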

        self.cv_results = cross_val_score(self.model, cv_X, cv_y, cv=kfolds)
        self.cv_mean = np.mean(self.cv_results)
        self.cv_median = np.median(self.cv_results)
        self.cv_std = np.std(self.cv_results)

        
    def print_cv_summary(self):
        cv_summary = (
        f'''CV Results for `{self.name}` model:
            {self.cv_mean:.5f} ± {self.cv_std:.5f} accuracy
        ''')
        print(cv_summary)

        
    def plot_cv(self, ax):
        '''
        Plot the cross-validation values using the array of results and given 
        Axis for plotting.
        '''
        ax.set_title(f'CV Results for `{self.name}` Model')
        # Thinner violinplot with higher bw
        sns.violinplot(y=self.cv_results, ax=ax, bw=.4)
        sns.swarmplot(
                y=self.cv_results,
                color='orange',
                size=10,
                alpha= 0.8,
                ax=ax
        )

        return ax

In [42]:
dummy_pipe = ModelWithCV(dummy_model_pipe, 'dummy_model', X_train, y_train)

In [43]:
dummy_pipe.print_cv_summary()

CV Results for `dummy_model` model:
            0.88847 ± 0.00016 accuracy
        


## Wow, what a wonderful baseline model! 
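The baseline's 0.88847 accuracy matches the share of non-arrests in `y_train` (0.888471): a `most_frequent` dummy always predicts "no arrest", so it never finds a single arrest. A small synthetic sketch (90 negatives, 10 positives; not the real data) of why accuracy alone is misleading here:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Synthetic imbalanced labels; the features are ignored by DummyClassifier
y_dummy_demo = np.array([0] * 90 + [1] * 10)
X_dummy_demo = np.zeros((100, 1))

dummy_demo = DummyClassifier(strategy="most_frequent")

# Accuracy equals the majority-class share...
acc_demo = cross_val_score(dummy_demo, X_dummy_demo, y_dummy_demo, cv=5, scoring="accuracy")
# ...while recall on the positive class is zero, because it never predicts 1
rec_demo = cross_val_score(dummy_demo, X_dummy_demo, y_dummy_demo, cv=5, scoring="recall")

print(acc_demo.mean(), rec_demo.mean())
```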

# I'm using a logistic regression model with balanced class weights to combat the class imbalance
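`class_weight='balanced'` scales each sample's contribution to the loss by the inverse of its class frequency, `n_samples / (n_classes * count_of_class)`. A quick check with scikit-learn's `compute_class_weight` on synthetic labels that mirror the 89/11 split (assumed counts, not the real training set):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Synthetic target with roughly the same imbalance as y_train
y_cw_demo = np.array([0] * 889 + [1] * 111)

weights_demo = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y_cw_demo
)
# The same weights computed by hand from the formula
manual_demo = len(y_cw_demo) / (2 * np.bincount(y_cw_demo))

print(weights_demo)  # each arrest row counts about 8x as much as a non-arrest row
print(manual_demo)
```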

In [44]:
logreg_model_pipe = Pipeline([
    ('ct', CT),
    ('logreg', LogisticRegression(random_state=42, max_iter=1000, class_weight='balanced'))
])

In [45]:
params = {}
params['logreg__penalty'] = ['l2']
#params['logreg__C'] = []
params['logreg__solver'] = ['liblinear']

In [46]:
gs = GridSearchCV(logreg_model_pipe, params, cv = 5, verbose = 2)

In [47]:
gs.fit(X_train, y_train)

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END .......logreg__penalty=l2, logreg__solver=liblinear; total time=   0.3s
[CV] END .......logreg__penalty=l2, logreg__solver=liblinear; total time=   0.3s
[CV] END .......logreg__penalty=l2, logreg__solver=liblinear; total time=   0.3s
[CV] END .......logreg__penalty=l2, logreg__solver=liblinear; total time=   0.3s
[CV] END .......logreg__penalty=l2, logreg__solver=liblinear; total time=   0.3s


In [48]:
gs.best_score_

0.6627827976052157

In [49]:
y_hat = gs.predict(X_test)

In [50]:
print(f"""
Our model's accuracy on the test set is {round(accuracy_score(y_test, y_hat), 2)}.
Our model's recall on the test set is {round(recall_score(y_test, y_hat), 2)}.
Our model's precision on the test set is {round(precision_score(y_test, y_hat), 2)}.
Our model's f1-score on the test set is {round(f1_score(y_test, y_hat), 2)}.
""")


Our model's accuracy on the test set is 0.67.
Our model's recall on the test set is 0.65.
Our model's precision on the test set is 0.2.
Our model's f1-score on the test set is 0.3.
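The same four-line `print` cell is repeated after every model. A small helper (the name `report_metrics` is hypothetical, not from the original notebook) would keep the reporting in one place. It also helps to remember that F1 is the harmonic mean of precision and recall, 2PR / (P + R), so a precision of about 0.2 pulls the F1 down to roughly 0.3 even with a recall of 0.65.

```python
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score


def report_metrics(y_true, y_pred, digits=2):
    """Hypothetical helper: the four test-set metrics used throughout this notebook."""
    return {
        "accuracy": round(float(accuracy_score(y_true, y_pred)), digits),
        "recall": round(float(recall_score(y_true, y_pred)), digits),
        "precision": round(float(precision_score(y_true, y_pred)), digits),
        "f1": round(float(f1_score(y_true, y_pred)), digits),
    }


# Toy check: 1 true positive, 1 false positive, 1 false negative, 2 true negatives
print(report_metrics([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]))
# {'accuracy': 0.6, 'recall': 0.5, 'precision': 0.5, 'f1': 0.5}
```

In the notebook it would be called as `report_metrics(y_test, y_hat)` after each model's predictions.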



# Well... that's not great: recall is 0.65, but precision is only 0.2. Let's try a random forest.

In [51]:
rfc_model_pipe = Pipeline([
    ('ct', CT),
    ('rfc', RandomForestClassifier(random_state=42))
])

parameters = {}

In [52]:
gs_rfc = GridSearchCV(rfc_model_pipe, parameters, cv = 5, verbose = 2)

In [53]:
gs_rfc.fit(X_train, y_train)

Fitting 5 folds for each of 1 candidates, totalling 5 fits
[CV] END .................................................... total time=   4.1s
[CV] END .................................................... total time=   4.1s
[CV] END .................................................... total time=   4.1s
[CV] END .................................................... total time=   3.9s
[CV] END .................................................... total time=   4.1s


In [54]:
gs_rfc.best_score_

0.8832433628569115

In [55]:
y_hat = gs_rfc.predict(X_test)

In [56]:
print(f"""
Our model's accuracy on the test set is {round(accuracy_score(y_test, y_hat), 2)}.
Our model's recall on the test set is {round(recall_score(y_test, y_hat), 2)}.
Our model's precision on the test set is {round(precision_score(y_test, y_hat), 2)}.
Our model's f1-score on the test set is {round(f1_score(y_test, y_hat), 2)}.
""")


Our model's accuracy on the test set is 0.88.
Our model's recall on the test set is 0.07.
Our model's precision on the test set is 0.35.
Our model's f1-score on the test set is 0.12.



# Okay, that doesn't seem all that great either: accuracy is 0.88, but recall drops to 0.07. Let's try SMOTEN.

In [58]:
imb_pipe = ImPipeline(steps=[
    ('ct', CT),
    ('sm', SMOTEN(sampling_strategy=0.8, random_state=42)),
    ('log', LogisticRegression(random_state=42, max_iter=1000))
])
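For a binary target, imbalanced-learn interprets a float `sampling_strategy` as the desired ratio of minority to majority samples *after* resampling; the majority class is left untouched. A small arithmetic sketch (the helper `smoten_target_counts` is hypothetical) using training counts reconstructed from the 0.888471/0.111529 split above and a 29,266-row training set (75% of 39,022):

```python
def smoten_target_counts(class_counts, ratio):
    """Hypothetical helper: class sizes after oversampling with a float sampling_strategy.

    Binary case only: the minority class is grown to int(n_majority * ratio),
    and the majority class keeps its original size.
    """
    n_majority = max(class_counts.values())
    return {
        label: count if count == n_majority else int(n_majority * ratio)
        for label, count in class_counts.items()
    }


train_counts_demo = {0: 26002, 1: 3264}
after_demo = smoten_target_counts(train_counts_demo, 0.8)

print(after_demo)  # {0: 26002, 1: 20801}
print(after_demo[1] - train_counts_demo[1], "synthetic arrest rows")
print(round(after_demo[1] / sum(after_demo.values()), 3), "share of arrests seen by the model")
```

So the logistic regression is fit on data that is roughly 44% arrests. Note that imbalanced-learn's `Pipeline` applies the sampler only during `fit`; `predict` on `X_test` uses the original, unresampled rows.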

In [59]:
imb_pipe.fit(X_train, y_train)

In [60]:
y_hat = imb_pipe.predict(X_test)

In [61]:
print(f"""
Our model's accuracy on the test set is {round(accuracy_score(y_test, y_hat), 2)}.
Our model's recall on the test set is {round(recall_score(y_test, y_hat), 2)}.
Our model's precision on the test set is {round(precision_score(y_test, y_hat), 2)}.
Our model's f1-score on the test set is {round(f1_score(y_test, y_hat), 2)}.
""")


Our model's accuracy on the test set is 0.89.
Our model's recall on the test set is 0.03.
Our model's precision on the test set is 0.41.
Our model's f1-score on the test set is 0.06.
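Both the random forest and the SMOTEN pipeline score well on accuracy but have recall below 0.1 at the default 0.5 decision threshold. The imports at the top include `roc_curve`, `sqrt`, and `argmax`, which can be used to move that threshold: pick the cutoff that maximizes the G-mean, sqrt(TPR × (1 − FPR)), on a validation split, then apply it to held-out data. A hedged sketch on synthetic data from `make_classification` (not the Terry stop data); whether it helps on the real data is untested here:

```python
from numpy import sqrt, argmax
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with about 11% positives
X_thr, y_thr = make_classification(n_samples=3000, weights=[0.89], random_state=42)

# Three-way split: fit the model, tune the threshold, then evaluate
X_fit, X_rest, y_fit, y_rest = train_test_split(
    X_thr, y_thr, test_size=0.4, stratify=y_thr, random_state=42)
X_val, X_hold, y_val, y_hold = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=42)

rfc_thr = RandomForestClassifier(random_state=42).fit(X_fit, y_fit)

# Choose the threshold that maximizes the G-mean on the validation split only
fpr_val, tpr_val, thr_val = roc_curve(y_val, rfc_thr.predict_proba(X_val)[:, 1])
gmeans_val = sqrt(tpr_val * (1 - fpr_val))
best_ix = argmax(gmeans_val)
best_threshold = thr_val[best_ix]

# Compare held-out recall at the default 0.5 cutoff vs. the tuned cutoff
probs_hold = rfc_thr.predict_proba(X_hold)[:, 1]
recall_default = recall_score(y_hold, (probs_hold >= 0.5).astype(int))
recall_tuned = recall_score(y_hold, (probs_hold >= best_threshold).astype(int))

print(f"tuned threshold: {best_threshold:.2f} (G-mean {gmeans_val[best_ix]:.3f})")
print(f"held-out recall at 0.5: {recall_default:.2f}, at tuned threshold: {recall_tuned:.2f}")
```

Tuning the cutoff on the test set itself would leak information, which is why the sketch uses a separate validation split.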



In [63]:
X_train.head()

|   | Subject Age Group | Weapon Type | Officer YOB | Officer Gender | Officer Race | Subject Perceived Race | Subject Perceived Gender | Frisk Flag | Sector | hour |
|---|---|---|---|---|---|---|---|---|---|---|
| 27889 | 36 - 45 |  | 1986 | M | Nat Hawaiian/Oth Pac Islander | White | Female | 0 | W | 1 |
| 22699 | 26 - 35 |  | 1986 | M | White | Black or African American | Female | 1 | E | 1 |
| 26168 | 36 - 45 |  | 1964 | M | White | White | Female | 0 | B | 15 |
| 30584 | 36 - 45 |  | 1986 | F | Hispanic or Latino | Black or African American | Male | 0 | J | 23 |
| 17539 | 26 - 35 |  | 1992 | M | White | White | Male | 0 | J | 22 |


In [64]:
df['Officer YOB'].value_counts()

1986    2806
1987    2498
1991    2305
1992    2155
1990    2034
1984    2030
1988    1803
1985    1754
1989    1736
1993    1462
1982    1405
1983    1334
1995    1282
1979    1188
1981    1165
1994    1020
1976     894
1978     884
1971     871
1996     834
1977     742
1973     678
1980     659
1967     586
1997     568
1970     461
1969     437
1968     389
1974     388
1975     385
1964     357
1962     314
1972     304
1965     286
1963     185
1961     135
1959     121
1966     118
1998     100
1958      91
1960      83
1954      41
1953      30
1999      20
2000      16
1957      16
1956      15
1955      13
1900       8
1948       6
1949       5
1946       2
1952       2
1951       1
Name: Officer YOB, dtype: int64

# It seems like we still need to do some feature engineering

#### Maybe binning officer year of birth (YOB) will help

In [65]:
df['Officer_YOB_Bins'] = pd.cut(
   df['Officer YOB'], 
   [1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000]
)

In [86]:
df[df.isna().any(axis=1)]

|   | Subject Age Group | Weapon Type | Officer YOB | Officer Gender | Officer Race | Subject Perceived Race | Subject Perceived Gender | Arrest Flag | Frisk Flag | Sector | hour | Officer_YOB_Bins |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18612 | 26 - 35 |  | 1900 | M | Unknown | Unknown | Male | 0 | 1 | E | 4 |  |
| 19791 | 26 - 35 |  | 1900 | M | Unknown | Unknown | Male | 1 | 0 | U | 2 |  |
| 20036 | 26 - 35 |  | 1900 | M | Unknown | Unknown | Male | 0 | 0 | E | 9 |  |
| 20264 | 26 - 35 |  | 1900 | M | Unknown | White | Male | 1 | 1 | E | 11 |  |
| 31470 | 36 - 45 |  | 1900 | M | Unknown | Asian | Male | 1 | 0 | S | 8 |  |
| 35200 | 46 - 55 |  | 1900 | M | Unknown | White | Male | 0 | 0 | J | 22 |  |
| 36844 | 46 - 55 |  | 1900 | M | Unknown | White | Male | 1 | 0 | D | 11 |  |
| 37804 | 56 and Above |  | 1900 | M | Unknown | Hispanic | Male | 0 | 1 | B | 14 |  |
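All eight rows with a missing bin have an Officer YOB of 1900. `pd.cut` builds right-closed intervals by default, so the lowest edge, 1900, is excluded from the first bin `(1900, 1910]` and the value becomes NaN. A small sketch of the difference `include_lowest=True` makes; since 1900 looks like a placeholder rather than a real birth year (an assumption), dropping those rows, as the notebook does, may still be the better choice.

```python
import pandas as pd

yob_demo = pd.Series([1900, 1955, 1986])
yob_edges_demo = [1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000]

# Default right=True gives (a, b] bins, so 1900 falls outside (1900, 1910]
default_bins_demo = pd.cut(yob_demo, yob_edges_demo)
# include_lowest=True closes the first bin on the left as well
inclusive_bins_demo = pd.cut(yob_demo, yob_edges_demo, include_lowest=True)

print(default_bins_demo.isna().tolist())    # [True, False, False]
print(inclusive_bins_demo.isna().tolist())  # [False, False, False]
```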


In [70]:
df_remix = df.drop('Officer YOB', axis = 1)

In [None]:
df_remix.dropna(inplace=True)

In [88]:
df_remix[df_remix.isna().any(axis=1)]

|   | Subject Age Group | Weapon Type | Officer Gender | Officer Race | Subject Perceived Race | Subject Perceived Gender | Arrest Flag | Frisk Flag | Sector | hour | Officer_YOB_Bins |
|---|---|---|---|---|---|---|---|---|---|---|---|

In [91]:
df_remix.head()

|   | Subject Age Group | Weapon Type | Officer Gender | Officer Race | Subject Perceived Race | Subject Perceived Gender | Arrest Flag | Frisk Flag | Sector | hour | Officer_YOB_Bins |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 - 17 |  | M | White | American Indian or Alaska Native | Male | 0 | 1 | G | 0 | (1990, 2000] |
| 1 | 1 - 17 |  | M | Nat Hawaiian/Oth Pac Islander | Black or African American | Male | 0 | 0 | J | 19 | (1980, 1990] |
| 2 | 1 - 17 |  | M | White | Black or African American | Male | 0 | 1 | E | 22 | (1980, 1990] |
| 3 | 1 - 17 |  | F | White | Black or African American | Male | 0 | 0 | K | 15 | (1980, 1990] |
| 4 | 1 - 17 |  | F | White | Black or African American | Male | 0 | 0 | K | 15 | (1980, 1990] |


In [97]:
df_remix.Officer_YOB_Bins = df_remix.Officer_YOB_Bins.astype('object')
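A likely reason for this cast (the notebook does not say): `pd.cut` returns a `category` column, and the `make_column_selector(dtype_include=[object, int])` inside `CT` relies on `select_dtypes`, which does not treat `category` as `object`. Without the cast, `Officer_YOB_Bins` would skip the categorical sub-pipeline and be passed through unencoded by `remainder='passthrough'`. A small check on a made-up frame:

```python
import pandas as pd
from sklearn.compose import make_column_selector

frame_sel_demo = pd.DataFrame({
    "Sector": ["E", "J"],
    "hour": [4, 22],
    "Officer_YOB_Bins": pd.cut([1985, 1992], [1980, 1990, 2000]),  # category dtype
})

pick_demo = make_column_selector(dtype_include=[object, int])

before_cast_demo = pick_demo(frame_sel_demo)  # the category column is not selected
frame_sel_demo["Officer_YOB_Bins"] = frame_sel_demo["Officer_YOB_Bins"].astype("object")
after_cast_demo = pick_demo(frame_sel_demo)   # now it is

print(before_cast_demo)
print(after_cast_demo)
```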

In [98]:
df_remix.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 39014 entries, 0 to 39021
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Subject Age Group         39014 non-null  object
 1   Weapon Type               39014 non-null  object
 2   Officer Gender            39014 non-null  object
 3   Officer Race              39014 non-null  object
 4   Subject Perceived Race    39014 non-null  object
 5   Subject Perceived Gender  39014 non-null  object
 6   Arrest Flag               39014 non-null  int64 
 7   Frisk Flag                39014 non-null  int64 
 8   Sector                    39014 non-null  object
 9   hour                      39014 non-null  int64 
 10  Officer_YOB_Bins          39014 non-null  object
dtypes: int64(3), object(8)
memory usage: 3.6+ MB


# Feeding the feature-engineered data into the SMOTEN pipeline

In [112]:
y = df_remix['Arrest Flag']
X = df_remix.drop('Arrest Flag', axis = 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state= 42)

In [113]:
imb_pipe.fit(X_train, y_train)

In [114]:
y_hat = imb_pipe.predict(X_test)

In [115]:
print(f"""
Our model's accuracy on the test set is {round(accuracy_score(y_test, y_hat), 2)}.
Our model's recall on the test set is {round(recall_score(y_test, y_hat), 2)}.
Our model's precision on the test set is {round(precision_score(y_test, y_hat), 2)}.
Our model's f1-score on the test set is {round(f1_score(y_test, y_hat), 2)}.
""")


Our model's accuracy on the test set is 0.89.
Our model's recall on the test set is 0.03.
Our model's precision on the test set is 0.39.
Our model's f1-score on the test set is 0.06.



## Well... at least we tried. Nothing seems to help with this dataset.

In [116]:
df_remix['Subject Perceived Race'].value_counts()

White                                        19678
Black or African American                    12334
Unknown                                       2527
Asian                                         1373
Hispanic                                      1183
American Indian or Alaska Native              1161
Multi-Racial                                   562
Other                                          112
Native Hawaiian or Other Pacific Islander       84
Name: Subject Perceived Race, dtype: int64

# I think it's safe to say that this model failed, but why? What can we learn?

Some ideas about why it isn't performing well:

- It should be obvious that there isn't a formula for who gets arrested; that's why both sides are upset. Cops think there is a formula (they arrest guilty people), and we know there isn't (they arrest Black and brown people), which is reflected in our inability to build a model. There is clearly a double standard. Look into how many white people get arrested with a weapon compared with how many people from minority groups do. (Why is the Black community stopped so often when it makes up only 7% of Seattle's population?)

- Totals of each race stopped

- The categories are bad: race and gender are only part of the equation.

- Lack of features
  - Categories like blood pressure, how many swear words were exchanged, how many times this person has been unjustly hassled by the police before, whether the officer has recorded prejudices, whether a gun was drawn, and whether the car smelled like weed or alcohol.

- Biased dataset that is not accurately reported (Hispanic graph); arrests are listed two different ways, and both ways are bad.

- Check your data source.

- More data
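The first two ideas (arrests with a weapon by race, and stop totals by race) can be checked with a `groupby`. A sketch on made-up rows that reuse the notebook's column names; the placeholder values treated as "no weapon" (`'None'` and `'-'`) are an assumption and should be confirmed with `value_counts()` on the real `Weapon Type` column:

```python
import pandas as pd

# Made-up rows (not real stops) that reuse the notebook's column names
stops_demo = pd.DataFrame({
    "Subject Perceived Race": ["White", "White", "Black or African American",
                               "Black or African American", "Asian", "White"],
    "Weapon Type": ["None", "Knife", "-", "Knife", "None", "None"],
    "Arrest Flag": [0, 1, 1, 1, 0, 0],
})

# Assumption: these strings mean "no weapon recorded"
NO_WEAPON_DEMO = ["None", "-"]
stops_demo["weapon_found"] = ~stops_demo["Weapon Type"].isin(NO_WEAPON_DEMO)

# Number of stops and arrest rate per perceived race, split by weapon found
summary_demo = (
    stops_demo.groupby(["Subject Perceived Race", "weapon_found"])["Arrest Flag"]
    .agg(stops="count", arrest_rate="mean")
    .reset_index()
)
print(summary_demo)
```

On the real data, the same chain would run on `df_remix` once the placeholder values are confirmed.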



In [143]:
#df_remix.to_csv('df_remix_terry.csv')
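The `Unnamed: 0` column dropped near the start of the notebook is what `read_csv` calls an index that was written to CSV with an empty header. If this export is re-enabled, `index=False` avoids creating it (alternatively, `pd.read_csv(..., index_col=0)` reads it back as the index). A quick round-trip check on a toy frame:

```python
import io

import pandas as pd

frame_demo = pd.DataFrame({"Arrest Flag": [0, 1], "Sector": ["E", "J"]})

# Default to_csv() writes the index as a first column with an empty header,
# which read_csv() then names 'Unnamed: 0'
with_index_demo = pd.read_csv(io.StringIO(frame_demo.to_csv()))
# index=False leaves the index out of the file
without_index_demo = pd.read_csv(io.StringIO(frame_demo.to_csv(index=False)))

print(with_index_demo.columns.tolist())     # ['Unnamed: 0', 'Arrest Flag', 'Sector']
print(without_index_demo.columns.tolist())  # ['Arrest Flag', 'Sector']
```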