# Model Comparison Lab

In this lab we will compare the performance of all the models we have learned about so far, using the car evaluation dataset.

## 1. Prepare the data

The [car evaluation dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/car/) is in the assets/datasets folder. By now you should be very familiar with this dataset.

1. Load the data into a pandas dataframe
- Encode the categorical features properly: define a map that preserves the scale (assigning smaller numbers to words indicating smaller quantities)
- Separate features from target into X and y

In [108]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
%matplotlib inline

In [38]:
df = pd.read_csv('./../../assets/datasets/car.csv')
df.head()

Unnamed: 0,buying,maint,doors,persons,lug_boot,safety,acceptability
0,vhigh,vhigh,2,2,small,low,unacc
1,vhigh,vhigh,2,2,small,med,unacc
2,vhigh,vhigh,2,2,small,high,unacc
3,vhigh,vhigh,2,2,med,low,unacc
4,vhigh,vhigh,2,2,med,med,unacc


In [39]:
X = pd.get_dummies(df.drop('acceptability', axis=1))
le = LabelEncoder()
y = le.fit_transform(df['acceptability'])

X.head()

Unnamed: 0,buying_high,buying_low,buying_med,buying_vhigh,maint_high,maint_low,maint_med,maint_vhigh,doors_2,doors_3,...,doors_5more,persons_2,persons_4,persons_more,lug_boot_big,lug_boot_med,lug_boot_small,safety_high,safety_low,safety_med
0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
1,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
2,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0


## 2. Useful preparation

Since we will compare several models, let's write a couple of helper functions.

1. Separate X and y between a train and test set, using 30% test set, random state = 42
    - make sure that the data is shuffled and stratified
2. Define a function called `evaluate_model`, that trains the model on the train set, tests it on the test, calculates:
    - accuracy score
    - confusion matrix
    - classification report
3. Initialize a global dictionary to store the various models for later retrieval


In [40]:
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.33, random_state=42)

In [41]:
def evaluate_model(model):
    # create model object
    mod = model
    # fit model
    mod.fit(X_train, y_train)
    y_pred = mod.predict(X_test)
    print 'Accuracy score:', mod.score(X_test, y_test)
    con = pd.DataFrame(confusion_matrix(y_pred, y_test))
    con.columns = ['acc', 'good', 'unacc', 'vgood']
    con.index = ['pred_acc', 'pred_good', 'pred_unacc', 'pred_vgood']
    print con
    print classification_report(y_pred, y_test)
    return mod

models = {}

## 3.a KNN

Let's start with `KNeighborsClassifier`.

1. Initialize a KNN model
- Evaluate it's performance with the function you previously defined
- Find the optimal value of K using grid search
    - Be careful on how you perform the cross validation in the grid search

In [42]:
models['KNN'] = evaluate_model(KNeighborsClassifier())

Accuracy score: 0.894921190893
            acc  good  unacc  vgood
pred_acc     99    12      9      2
pred_good     3     8      0      2
pred_unacc   25     2    391      4
pred_vgood    0     1      0     13
             precision    recall  f1-score   support

          0       0.78      0.81      0.80       122
          1       0.35      0.62      0.44        13
          2       0.98      0.93      0.95       422
          3       0.62      0.93      0.74        14

avg / total       0.91      0.89      0.90       571



In [49]:
pg = {'n_neighbors': [i for i in range(1,20)]}
gs = GridSearchCV(models['KNN'], param_grid=pg, cv=5)
gs.fit(X_train, y_train)
gs.best_params_

{'n_neighbors': 8}

## 3.b Bagging + KNN

Now that we have found the optimal K, let's wrap `KNeighborsClassifier` in a BaggingClassifier and see if the score improves.

1. Wrap the KNN model in a Bagging Classifier
- Evaluate performance
- Do a grid search only on the bagging classifier params

In [55]:
models['bagging_knn'] = evaluate_model(BaggingClassifier(models['KNN'].set_params(n_neighbors=8)))

Accuracy score: 0.907180385289
            acc  good  unacc  vgood
pred_acc    100    12      8      4
pred_good     0    11      0      2
pred_unacc   27     0    392      0
pred_vgood    0     0      0     15
             precision    recall  f1-score   support

          0       0.79      0.81      0.80       124
          1       0.48      0.85      0.61        13
          2       0.98      0.94      0.96       419
          3       0.71      1.00      0.83        15

avg / total       0.92      0.91      0.91       571



In [69]:
pg = {'n_estimators': [4, 8, 10, 12, 15],
     'bootstrap': [True, False],
     'bootstrap_features': [True, False]}
def grid_search(model, pg):
    gs = GridSearchCV(model, param_grid=pg, cv=5)
    gs.fit(X_train, y_train)
    print gs.best_params_
grid_search(models['bagging_knn'], pg)

{'n_estimators': 4, 'bootstrap': False, 'bootstrap_features': False}


In [70]:
evaluate_model(models['bagging_knn'].set_params(bootstrap=False, bootstrap_features=False, n_estimators=4))

Accuracy score: 0.910683012259
            acc  good  unacc  vgood
pred_acc    112    11     16      4
pred_good     2    11      0      2
pred_unacc   13     1    384      2
pred_vgood    0     0      0     13
             precision    recall  f1-score   support

          0       0.88      0.78      0.83       143
          1       0.48      0.73      0.58        15
          2       0.96      0.96      0.96       400
          3       0.62      1.00      0.76        13

avg / total       0.92      0.91      0.91       571



BaggingClassifier(base_estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=8, p=2,
           weights='uniform'),
         bootstrap=False, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=4, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False)

## 4. Logistic Regression

Let's see if logistic regression performs better

1. Initialize LR and test on Train/Test set
- Find optimal params with Grid Search
- See if Bagging improves the score

In [71]:
models['logreg'] = evaluate_model(LogisticRegression())

Accuracy score: 0.879159369527
            acc  good  unacc  vgood
pred_acc    102    16     16     11
pred_good     4     6      0      0
pred_unacc   21     0    384      0
pred_vgood    0     1      0     10
             precision    recall  f1-score   support

          0       0.80      0.70      0.75       145
          1       0.26      0.60      0.36        10
          2       0.96      0.95      0.95       405
          3       0.48      0.91      0.62        11

avg / total       0.90      0.88      0.89       571



In [74]:
pg = {
    'penalty': ['l1', 'l2'],
    'C': [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]
}
grid_search(models['logreg'], pg)

{'penalty': 'l1', 'C': 1000.0}


In [75]:
models['bagging_logreg'] = evaluate_model(BaggingClassifier(models['logreg'].set_params(penalty='l1', C=1000.0)))

Accuracy score: 0.893169877408
            acc  good  unacc  vgood
pred_acc    100    14     16      4
pred_good     7     9      0      0
pred_unacc   17     0    384      0
pred_vgood    3     0      0     17
             precision    recall  f1-score   support

          0       0.79      0.75      0.77       134
          1       0.39      0.56      0.46        16
          2       0.96      0.96      0.96       401
          3       0.81      0.85      0.83        20

avg / total       0.90      0.89      0.90       571



In [76]:
pg = {'n_estimators': [4, 8, 10, 12, 15],
     'bootstrap': [True, False],
     'bootstrap_features': [True, False]}
grid_search(models['bagging_logreg'], pg)

{'n_estimators': 12, 'bootstrap': True, 'bootstrap_features': False}


In [77]:
evaluate_model(models['bagging_logreg'].set_params(n_estimators=12, bootstrap=True, bootstrap_features=False))

Accuracy score: 0.900175131349
            acc  good  unacc  vgood
pred_acc    101    14     14      3
pred_good     6     9      0      0
pred_unacc   17     0    386      0
pred_vgood    3     0      0     18
             precision    recall  f1-score   support

          0       0.80      0.77      0.78       132
          1       0.39      0.60      0.47        15
          2       0.96      0.96      0.96       403
          3       0.86      0.86      0.86        21

avg / total       0.91      0.90      0.90       571



BaggingClassifier(base_estimator=LogisticRegression(C=1000.0, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l1', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=12, n_jobs=1, oob_score=False,
         random_state=None, verbose=0, warm_start=False)

## 5. Decision Trees

Let's see if Decision Trees perform better

1. Initialize DT and test on Train/Test set
- Find optimal params with Grid Search
- See if Bagging improves the score

In [79]:
models['dec_tree'] = evaluate_model(DecisionTreeClassifier())

Accuracy score: 0.957968476357
            acc  good  unacc  vgood
pred_acc    121     5     10      1
pred_good     2    18      0      2
pred_unacc    4     0    390      0
pred_vgood    0     0      0     18
             precision    recall  f1-score   support

          0       0.95      0.88      0.92       137
          1       0.78      0.82      0.80        22
          2       0.97      0.99      0.98       394
          3       0.86      1.00      0.92        18

avg / total       0.96      0.96      0.96       571



In [81]:
pg = {
    'criterion': ['gini', 'entropy'],
    'max_features': [None, 1, 2, 3],
    'max_depth': [None, 4, 5, 6, 7, 8, 9],
    'max_leaf_nodes': [None, 4, 5, 6, 7, 8, 9]
}
grid_search(models['dec_tree'], pg)

{'max_features': None, 'max_leaf_nodes': None, 'criterion': 'entropy', 'max_depth': None}


In [93]:
models['dec_tree'] = evaluate_model(models['dec_tree'].set_params(criterion='entropy'))

Accuracy score: 0.968476357268
            acc  good  unacc  vgood
pred_acc    120     3      7      1
pred_good     1    20      0      0
pred_unacc    6     0    393      0
pred_vgood    0     0      0     20
             precision    recall  f1-score   support

          0       0.94      0.92      0.93       131
          1       0.87      0.95      0.91        21
          2       0.98      0.98      0.98       399
          3       0.95      1.00      0.98        20

avg / total       0.97      0.97      0.97       571



In [94]:
evaluate_model(BaggingClassifier(models['dec_tree']))

Accuracy score: 0.977232924694
            acc  good  unacc  vgood
pred_acc    123     1      7      0
pred_good     0    22      0      1
pred_unacc    3     0    393      0
pred_vgood    1     0      0     20
             precision    recall  f1-score   support

          0       0.97      0.94      0.95       131
          1       0.96      0.96      0.96        23
          2       0.98      0.99      0.99       396
          3       0.95      0.95      0.95        21

avg / total       0.98      0.98      0.98       571



In [95]:
pg = {'n_estimators': [4, 8, 10, 12, 15],
     'bootstrap': [True, False],
     'bootstrap_features': [True, False]}
grid_search(models['bagging_dec_tree'], pg)

{'n_estimators': 10, 'bootstrap': False, 'bootstrap_features': False}


In [96]:
models['bagging_dec_tree'] = evaluate_model(BaggingClassifier(models['dec_tree'], n_estimators=10, 
                                                              bootstrap=False, bootstrap_features=False))

Accuracy score: 0.977232924694
            acc  good  unacc  vgood
pred_acc    122     0      7      1
pred_good     2    23      0      0
pred_unacc    3     0    393      0
pred_vgood    0     0      0     20
             precision    recall  f1-score   support

          0       0.96      0.94      0.95       130
          1       1.00      0.92      0.96        25
          2       0.98      0.99      0.99       396
          3       0.95      1.00      0.98        20

avg / total       0.98      0.98      0.98       571



## 6. Support Vector Machines

Let's see if SVM perform better

1. Initialize SVM and test on Train/Test set
- Find optimal params with Grid Search
- See if Bagging improves the score

In [101]:
models['svc'] = evaluate_model(SVC())

Accuracy score: 0.903677758319
            acc  good  unacc  vgood
pred_acc    124    22     17     12
pred_good     0     0      0      0
pred_unacc    3     0    383      0
pred_vgood    0     1      0      9
             precision    recall  f1-score   support

          0       0.98      0.71      0.82       175
          1       0.00      0.00      0.00         0
          2       0.96      0.99      0.97       386
          3       0.43      0.90      0.58        10

avg / total       0.95      0.90      0.92       571



In [103]:
pg = {
    'C': [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0],
    'kernel': ['rbf', 'linear', 'poly'],
    'degree': [1, 2, 3, 4, 5]
}
grid_search(models['svc'], pg)

{'kernel': 'poly', 'C': 1000.0, 'degree': 3}


In [104]:
models['svc'] = evaluate_model(SVC(kernel='poly', C=1000.0, degree=3))

Accuracy score: 0.998248686515
            acc  good  unacc  vgood
pred_acc    127     1      0      0
pred_good     0    22      0      0
pred_unacc    0     0    400      0
pred_vgood    0     0      0     21
             precision    recall  f1-score   support

          0       1.00      0.99      1.00       128
          1       0.96      1.00      0.98        22
          2       1.00      1.00      1.00       400
          3       1.00      1.00      1.00        21

avg / total       1.00      1.00      1.00       571



In [105]:
models['bagging_svc'] = evaluate_model(BaggingClassifier(models['svc']))

Accuracy score: 0.991243432574
            acc  good  unacc  vgood
pred_acc    127     3      1      1
pred_good     0    20      0      0
pred_unacc    0     0    399      0
pred_vgood    0     0      0     20
             precision    recall  f1-score   support

          0       1.00      0.96      0.98       132
          1       0.87      1.00      0.93        20
          2       1.00      1.00      1.00       399
          3       0.95      1.00      0.98        20

avg / total       0.99      0.99      0.99       571



In [106]:
pg = {'n_estimators': [4, 8, 10, 12, 15],
     'bootstrap': [True, False],
     'bootstrap_features': [True, False]}
grid_search(models['bagging_svc'], pg)

{'n_estimators': 4, 'bootstrap': False, 'bootstrap_features': False}


In [107]:
models['bagging_svc'] = evaluate_model(BaggingClassifier(models['svc'], n_estimators=4, 
                                                              bootstrap=False, bootstrap_features=False))

Accuracy score: 0.998248686515
            acc  good  unacc  vgood
pred_acc    127     1      0      0
pred_good     0    22      0      0
pred_unacc    0     0    400      0
pred_vgood    0     0      0     21
             precision    recall  f1-score   support

          0       1.00      0.99      1.00       128
          1       0.96      1.00      0.98        22
          2       1.00      1.00      1.00       400
          3       1.00      1.00      1.00        21

avg / total       1.00      1.00      1.00       571



## 7. Random Forest & Extra Trees

Let's see if Random Forest and Extra Trees perform better

1. Initialize RF and ET and test on Train/Test set
- Find optimal params with Grid Search

In [110]:
models['rand_for'] = evaluate_model(RandomForestClassifier())

Accuracy score: 0.933450087566
            acc  good  unacc  vgood
pred_acc    117    11     11      3
pred_good     0    10      0      1
pred_unacc    9     0    389      0
pred_vgood    1     2      0     17
             precision    recall  f1-score   support

          0       0.92      0.82      0.87       142
          1       0.43      0.91      0.59        11
          2       0.97      0.98      0.97       398
          3       0.81      0.85      0.83        20

avg / total       0.94      0.93      0.94       571



In [111]:
pg = {
    'bootstrap': [True, False],
    'criterion':['gini', 'entropy'],
    'max_depth': [None, 3, 5, 7, 9, 15]
}
grid_search(models['rand_for'], pg)

{'bootstrap': False, 'criterion': 'entropy', 'max_depth': None}


In [112]:
models['rand_for'] = evaluate_model(RandomForestClassifier(bootstrap=False, criterion='entropy'))

Accuracy score: 0.947460595447
            acc  good  unacc  vgood
pred_acc    115     5     10      2
pred_good     2    18      0      1
pred_unacc    9     0    390      0
pred_vgood    1     0      0     18
             precision    recall  f1-score   support

          0       0.91      0.87      0.89       132
          1       0.78      0.86      0.82        21
          2       0.97      0.98      0.98       399
          3       0.86      0.95      0.90        19

avg / total       0.95      0.95      0.95       571



In [113]:
models['et'] = evaluate_model(ExtraTreesClassifier())

Accuracy score: 0.956217162872
            acc  good  unacc  vgood
pred_acc    123     6     13      1
pred_good     0    16      0      0
pred_unacc    4     0    387      0
pred_vgood    0     1      0     20
             precision    recall  f1-score   support

          0       0.97      0.86      0.91       143
          1       0.70      1.00      0.82        16
          2       0.97      0.99      0.98       391
          3       0.95      0.95      0.95        21

avg / total       0.96      0.96      0.96       571



In [114]:
pg = {
    'bootstrap': [True, False],
    'criterion':['gini', 'entropy'],
    'max_depth': [None, 3, 5, 7, 9, 15]
}
grid_search(models['et'], pg)

{'bootstrap': False, 'criterion': 'entropy', 'max_depth': 15}


In [115]:
models['et'] = evaluate_model(ExtraTreesClassifier(bootstrap=False, criterion='entropy', max_depth=15))

Accuracy score: 0.952714535902
            acc  good  unacc  vgood
pred_acc    118     4     13      1
pred_good     0    19      0      0
pred_unacc    8     0    387      0
pred_vgood    1     0      0     20
             precision    recall  f1-score   support

          0       0.93      0.87      0.90       136
          1       0.83      1.00      0.90        19
          2       0.97      0.98      0.97       395
          3       0.95      0.95      0.95        21

avg / total       0.95      0.95      0.95       571



## 8. Model comparison

Let's compare the scores of the various models.

1. Do a bar chart of the scores of the best models. Who's the winner on the train/test split?
- Re-test all the models using a 3 fold stratified shuffled cross validation
- Do a bar chart with errorbars of the cross validation average scores. is the winner the same?


## Bonus

We have encoded the data using a map that preserves the scale.
Would our results have changed if we had encoded the categorical data using `pd.get_dummies` or `OneHotEncoder`  to encode them as binary variables instead?

1. Repeat the analysis for this scenario. Is it better?
- Experiment with other models or other parameters, can you beat your classmates best score?