<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Practice Gridsearch and Multinomial Models with SF Crime Data

_Authors: Joseph Nelson (DC), Sam Stack (DC)_

---

### Multinomial logistic regression models

So far, we have been using logistic regression for binary problems where there are only two class labels. Logistic regression can be extended to dependent variables with multiple classes.

There are two ways sklearn solves multiple-class problems with logistic regression: a multinomial loss or a "one vs. rest" (OvR) process where a model is fit for each target class vs. all the other classes. 

**Multinomial vs. OvR**
- (both) 'k' classes
- (M) 'k-1' models with 1 reference category
- (OvR) 'k' models, one for each class vs. all the others ('k*(k-1)/2' is the count for one-vs-one, not OvR)
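
The model counts can be checked directly in sklearn. The sketch below is only an illustration: it uses the built-in iris data (3 classes, 4 features) as a stand-in for the crime data, and wraps the estimator in `OneVsRestClassifier` to get the OvR behavior. Note that sklearn's multinomial fit keeps one coefficient row per class rather than dropping a reference category, so it reports 'k' rows, not 'k-1'.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Iris has k = 3 classes and 4 features; it stands in for the crime data here.
X, y = load_iris(return_X_y=True)
k = len(set(y))

# One-vs-rest: one binary classifier per class, so k fitted models.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
n_ovr_models = len(ovr.estimators_)

# Multinomial (softmax) loss: a single model with one coefficient row per class.
multinomial = LogisticRegression(max_iter=1000).fit(X, y)
coef_shape = multinomial.coef_.shape

print(n_ovr_models)  # number of binary models fitted by OvR
print(coef_shape)    # (classes, features) for the single multinomial model
```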

You will use a grid search together with multinomial logistic regression to tune a model that predicts the category (type) of crime from features recorded by San Francisco police.

**Necessary lab imports**

In [82]:
import numpy as np
import pandas as pd
import patsy

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV


import seaborn as sns

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

### 1. Read in the data

In [83]:
crime_csv = './datasets/sf_crime_train.csv'

In [84]:
# A:
crime_df = pd.read_csv(crime_csv)

### 2. Create column for hour, month, and year from 'Dates' column.

> *Hint: `pd.to_datetime` may or may not be helpful.*


In [85]:
# A:
crime_df.Dates[:5]

0    5/13/15 23:53
1    5/13/15 23:53
2    5/13/15 23:33
3    5/13/15 23:30
4    5/13/15 23:30
Name: Dates, dtype: object

In [86]:
dt = pd.to_datetime('5/13/15')
dt.year

2015

In [87]:
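# hour: slice the two characters before the ':' (stays a string)
crime_df['hour'] = crime_df.Dates.map(lambda x: x[-5:-3])
# month and year: parse each timestamp and read off the component
crime_df['month'] = crime_df.Dates.map(lambda x: pd.to_datetime(x).month)
crime_df['year'] = crime_df.Dates.map(lambda x: pd.to_datetime(x).year)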
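
As an alternative to parsing row by row, the whole column can be parsed once and the components read through the `.dt` accessor. This is a minimal sketch on a hypothetical three-row frame (`toy`), assuming every timestamp follows the `month/day/2-digit-year hour:minute` layout shown above; unlike the string slice, it yields an integer hour.

```python
import pandas as pd

# Toy frame mirroring the format of the 'Dates' column (values are made up).
toy = pd.DataFrame({'Dates': ['5/13/15 23:53', '5/13/15 23:33', '12/31/14 08:05']})

# Parse the whole column once, with an explicit format, instead of once per row.
parsed = pd.to_datetime(toy['Dates'], format='%m/%d/%y %H:%M')

# The .dt accessor exposes each datetime component as an integer column.
toy['hour'] = parsed.dt.hour
toy['month'] = parsed.dt.month
toy['year'] = parsed.dt.year

print(toy)
```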

### 3. Validate and clean the data.

In [88]:
crime_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18000 entries, 0 to 17999
Data columns (total 12 columns):
Dates         18000 non-null object
Category      18000 non-null object
Descript      18000 non-null object
DayOfWeek     18000 non-null object
PdDistrict    18000 non-null object
Resolution    18000 non-null object
Address       18000 non-null object
X             18000 non-null float64
Y             18000 non-null float64
hour          18000 non-null object
month         18000 non-null int64
year          18000 non-null int64
dtypes: float64(2), int64(2), object(8)
memory usage: 1.6+ MB


In [89]:
# make column names lowercase
crime_df.columns = [x.lower() for x in crime_df.columns]

In [90]:
# make category values lowercase
crime_df.category = crime_df.category.map(lambda x: x.lower())

In [91]:
# A:
# remove the year column, zero variance
crime_df.drop(labels=['year'], axis=1, inplace=True)

### 4. Set up a target and predictor matrix for predicting violent crime vs. non-violent crime vs. non-crimes.

**Non-Violent Crimes:**
- bad checks
- bribery
- drug/narcotic
- drunkenness
- embezzlement
- forgery/counterfeiting
- fraud
- gambling
- liquor laws
- loitering
- trespass (also recorded as trespassing)

**Non-Crimes:**
- non-criminal
- runaway
- secondary codes
- suspicious occ
- warrants

**Violent Crimes:**
- everything else



**What type of model do you need here? What should your "baseline" category be?**

In [92]:
# A:
# Baseline category is Non-Violent Crimes
crime_df.category.unique()

array(['warrants', 'other offenses', 'larceny/theft', 'vehicle theft',
       'vandalism', 'non-criminal', 'robbery', 'assault', 'weapon laws',
       'burglary', 'suspicious occ', 'drunkenness',
       'forgery/counterfeiting', 'drug/narcotic', 'stolen property',
       'secondary codes', 'trespass', 'missing person', 'fraud',
       'kidnapping', 'runaway', 'driving under the influence',
       'sex offenses forcible', 'prostitution', 'disorderly conduct',
       'arson', 'family offenses', 'liquor laws', 'bribery',
       'embezzlement', 'assualt', 'suicide', 'loitering', 'trespassing',
       'sex offenses non forcible', 'extortion', 'gambling', 'bad checks'], dtype=object)

In [93]:
# the data records this offense as both 'trespass' and 'trespassing', so list both
non_violent_crimes = ['bad checks','bribery','drug/narcotic','drunkenness','embezzlement','forgery/counterfeiting',
                     'fraud','gambling','liquor laws','loitering','trespass','trespassing']

non_crimes = ['non-criminal','runaway','secondary codes','suspicious occ','warrants']

violent_crimes = [x for x in crime_df.category.unique() if x not in non_violent_crimes and x not in non_crimes]

In [94]:
# encode the target: 2 = non-violent crime, 1 = non-crime, 0 = violent crime (everything else)
crime_df.category = crime_df.category.map(lambda x: 2 if x in non_violent_crimes else 1 if x in non_crimes else 0)
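
The same encoding can be written as a small named function, which also makes it easy to look at the class shares. The sketch below runs on a hypothetical six-row sample rather than the real data; `encode_category` and `sample` are illustrative names. The largest class share is the accuracy you would get by always predicting that class, which is one common way to define a baseline.

```python
import pandas as pd

non_violent_crimes = {'bad checks', 'bribery', 'drug/narcotic', 'drunkenness',
                      'embezzlement', 'forgery/counterfeiting', 'fraud', 'gambling',
                      'liquor laws', 'loitering', 'trespass', 'trespassing'}
non_crimes = {'non-criminal', 'runaway', 'secondary codes', 'suspicious occ', 'warrants'}

def encode_category(name):
    """Return 2 for non-violent crimes, 1 for non-crimes, 0 for everything else."""
    if name in non_violent_crimes:
        return 2
    if name in non_crimes:
        return 1
    return 0

# Hypothetical sample of raw labels, not the real data.
sample = pd.Series(['warrants', 'assault', 'fraud', 'larceny/theft', 'trespass', 'robbery'])
encoded = sample.map(encode_category)

# Share of each class; the largest share is the accuracy of always guessing that class.
class_share = encoded.value_counts(normalize=True)
baseline_accuracy = class_share.max()

print(class_share)
```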

### 5. Standardize the predictor matrix

In [96]:
# A:
# build a patsy formula that uses every remaining column as a predictor
f = 'category ~ ' + ' + '.join([col for col in crime_df.columns if col != 'category'])
print(f)


category ~ dates + descript + dayofweek + pddistrict + resolution + address + x + y + hour + month
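
The formula alone does not standardize anything, so here is one possible way to get a standardized predictor matrix. This is a sketch under several assumptions: it swaps patsy for `pd.get_dummies` to keep dependencies small, it runs on a hypothetical four-row frame (`toy`), and it keeps only a few low-cardinality predictors, since columns such as `address` and `descript` would likely expand into a very large number of dummy columns.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical miniature of the crime frame with a few low-cardinality predictors.
toy = pd.DataFrame({
    'dayofweek': ['Wednesday', 'Tuesday', 'Wednesday', 'Monday'],
    'pddistrict': ['NORTHERN', 'PARK', 'PARK', 'NORTHERN'],
    'x': [-122.42, -122.43, -122.40, -122.45],
    'y': [37.77, 37.80, 37.75, 37.78],
    'hour': [23, 22, 9, 14],
})

# Dummy-code the categorical columns; drop_first removes one redundant level per variable.
X = pd.get_dummies(toy, columns=['dayofweek', 'pddistrict'], drop_first=True, dtype=float)

# Rescale every column to mean 0 and standard deviation 1.
X_std = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

print(X_std.round(2))
```

When a train/test split is involved, it is usually safer to fit the scaler on the training rows only and then apply it to the test rows.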


### 6. Find the optimal hyperparameters (optimal regularization) to predict your crime categories.

> **Note:** Gridsearching can be done with `GridSearchCV` or `LogisticRegressionCV`. They operate differently: the gridsearch object is more general and can be applied to any model, while `LogisticRegressionCV` is specific to tuning logistic regression hyperparameters. I recommend `LogisticRegressionCV`, but the downside is that lasso and ridge must be searched separately.

**Reference for logistic regression regularization hyperparameters:**
- `solver`: algorithm used for optimization (relevant for multiclass)
    - `newton-cg` - handles multinomial loss, L2 only
    - `sag` - handles multinomial loss, large datasets, L2 only, works best on scaled data
    - `lbfgs` - handles multinomial loss, L2 only
    - `liblinear` - small datasets, L1 or L2, one-vs-rest only, no warm starts
- `Cs`: inverse regularization strengths (smaller values are stronger penalties)
- `cv`: cross-validation generator or number of folds
- `penalty`: `'l1'` - LASSO, `'l2'` - Ridge

In [8]:
# Example:
# fit model with five folds and lasso regularization
# Cs=[1, 5, 10] tests exactly those three values; an integer such as Cs=15
# would instead test a grid of 15 values on a log scale between 1e-4 and 1e4
# remember: Cs describes the inverse of regularization strength

# logreg_cv = LogisticRegressionCV(solver='liblinear', 
#                                  Cs=[1,5,10], 
#                                  cv=5, penalty='l1')
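
To see how an integer `Cs` turns into a grid, here is a minimal runnable sketch. It is not the commented example above: it uses synthetic three-class data from `make_classification` and the estimator's default solver and ridge penalty instead of `liblinear` with lasso, purely to keep the illustration short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic three-class data standing in for the crime features.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# An integer Cs asks for that many values on a log scale from 1e-4 to 1e4.
logreg_cv = LogisticRegressionCV(Cs=5, cv=5, max_iter=1000)
logreg_cv.fit(X, y)

grid = logreg_cv.Cs_      # the candidate C values that were searched
chosen = logreg_cv.C_     # the selected C value(s)

print(grid)
print(chosen)
```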

**Split data into training and testing with 50% in testing.**

In [9]:
# A:

**Gridsearch hyperparameters for the training data.**

In [10]:
# A:
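
One possible shape for the split and the search, shown on synthetic data rather than the crime data. This sketch uses `GridSearchCV` over `C` only, with the default solver and penalty; the grid values, `random_state`, and the use of `stratify` are assumptions made for the illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic three-class data standing in for the crime predictors and target.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# Hold out half of the rows; stratify keeps class proportions similar in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Search the inverse regularization strength C with 5-fold CV on the training half only.
param_grid = {'C': [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X_train, y_train)

best_C = search.best_params_['C']
test_accuracy = search.score(X_test, y_test)

print(best_C, round(test_accuracy, 3))
```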

**Find the best parameters for each target class.**

In [11]:
# A:
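
If "per target class" is read as one-vs-rest, each class gets its own binary target and its own search. The sketch below takes that reading as an assumption and again uses synthetic data; in the lab itself the searches would be fit on the training half only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic three-class data standing in for the crime predictors and target.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

c_grid = [0.01, 0.1, 1.0, 10.0]
best_C_per_class = {}
for label in np.unique(y):
    # Binary target: this class (1) versus all other classes (0).
    y_binary = (y == label).astype(int)
    search = GridSearchCV(LogisticRegression(max_iter=1000), {'C': c_grid}, cv=5)
    search.fit(X, y_binary)
    best_C_per_class[int(label)] = search.best_params_['C']

print(best_C_per_class)
```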

**Build three logistic regression models using the best parameters for each target class.**

In [12]:
# A:

### 7. Build confusion matrices for the models above
- Use the holdout test data from the train-test split

In [13]:
# A:
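
As a reminder of how to read the output, here is a small sketch with hypothetical labels (not model results). In sklearn's `confusion_matrix`, rows are the true classes and columns are the predicted classes, in the order given by `labels`.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (0 = violent, 1 = non-crime, 2 = non-violent).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 1, 0, 2, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

print(cm)
# [[3 1 0]
#  [1 1 0]
#  [1 0 1]]
```

The diagonal holds the correct predictions; for example, the first row says three of the four true class-0 rows were predicted as 0 and one was predicted as 1.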

### 8. Print classification reports for your three models.

In [14]:
# A:

**Describe the metrics in the classification report.**

In [15]:
# A:
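
The report's columns can be tied back to counts with a small worked example. This sketch uses hypothetical labels (not model results) and reads the per-class values from `classification_report(..., output_dict=True)`.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels (0 = violent, 1 = non-crime, 2 = non-violent).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 1, 0, 2, 0]

report = classification_report(y_true, y_pred, output_dict=True)

# For class 0: 3 correct out of 5 predicted as 0, and 3 correct out of 4 true 0s.
precision_0 = report['0']['precision']   # TP / (TP + FP) = 3 / 5
recall_0 = report['0']['recall']         # TP / (TP + FN) = 3 / 4
f1_0 = report['0']['f1-score']           # harmonic mean of precision and recall
support_0 = report['0']['support']       # number of true instances of the class

print(precision_0, recall_0, round(f1_0, 3), support_0)
```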