### Predictive Modeling
In this notebook, we explore several models for predicting crime type from time and location predictors.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import OrdinalEncoder
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

### Process Data
Since the data has many categorical features, we need to encode them numerically before we can use the scikit-learn implementations of the different models. The options we will explore are:

1. One-hot encoding. Convert each level of a categorical feature into a binary indicator variable. The problem with one-hot encoding is that it can dramatically increase the dimensionality of the data, which raises both the computational cost of training and the risk of overfitting (higher model variance).

2. Ordinal encoding. Assign each level of a categorical feature an integer. This does not increase the dimensionality of the data, but it can introduce bias: the model may treat the codes as having a meaningful order and magnitude even though the integers were assigned arbitrarily. This is less of an issue with tree-based methods, however.
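
To make the difference between the two encodings concrete, here is a minimal sketch on a toy column. The `location` column and its values are hypothetical and are not taken from the crime data; the point is only to show the shape and content each encoder produces.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data; the column name and levels are illustrative only
toy = pd.DataFrame({"location": ["street", "house", "store", "house"]})

# One-hot encoding: one indicator column per level (3 levels -> 3 columns)
one_hot = pd.get_dummies(toy[["location"]])

# Ordinal encoding: a single column of codes; by default the levels are
# sorted, so here house=0, store=1, street=2
ordinal = OrdinalEncoder().fit_transform(toy[["location"]])

print(one_hot.shape)    # (4, 3)
print(ordinal.ravel())  # [2. 0. 1. 0.]
```

With only three levels the extra columns are harmless, but a feature with many levels would add one column per level, while the ordinal version always stays a single column.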

In [12]:
# read csv data
df = pd.read_csv('data/clean_data.csv')
# drop unneeded or already-processed columns
df = df.drop(columns=['OBJECTID', 'OCC_HOUR', 'OCC_DATE', 'dayofweek'])

In [14]:
## Get one-hot-encoded data
df1 = pd.get_dummies(df[['NEIGHBOURHOOD_158', 'LOCATION_TYPE']])

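# Combine with original data (the original string columns are still present
# in df_one_hot and must be dropped before fitting a model)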
df_one_hot = pd.concat([df, df1], axis=1)

In [15]:
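## Get ordinal-encoded data
ordinal_encoder = OrdinalEncoder()
df_ordinal = df.copy() # work on a copy so df is left unchanged
# assign by column name (not .loc) so the encoded columns become numeric rather than staying object dtype
df_ordinal[['NEIGHBOURHOOD_158', 'LOCATION_TYPE']] = ordinal_encoder.fit_transform(df[['NEIGHBOURHOOD_158', 'LOCATION_TYPE']]) # replace each level with its integer code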

### Random Forest
Random Forest can handle mixed data types (once categorical features are encoded numerically for scikit-learn), performs implicit feature selection, is robust to outliers, and can capture non-linear relationships. Because Random Forest averages many different decision trees, it is less prone to overfitting than a single tree and has lower variance in the bias-variance tradeoff. RF is also useful because it provides feature importance scores, although these need to be interpreted carefully if there are collinear variables or variables of high cardinality.
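
As a rough illustration of fitting a Random Forest and reading its feature importances, the sketch below uses synthetic stand-in data rather than the crime dataset. The features, labels, and hyperparameters are all assumptions chosen for the example, not the settings used in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): three numeric features, with the
# label depending only on the first one
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# n_estimators=100 is an illustrative choice, not a tuned value
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

acc = accuracy_score(y_test, rf.predict(X_test))

# Impurity-based importances, one per feature, summing to 1
importances = rf.feature_importances_
print(acc)
print(importances)
```

Here the first feature should receive most of the importance because it alone determines the label. On real data, impurity-based importances tend to favor high-cardinality features, which is one reason the scores need careful interpretation.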
