## Summary: Compare model results and final model selection

This notebook uses the Titanic dataset from [this Kaggle competition](https://www.kaggle.com/c/titanic/overview).

In this section, we will do the following:
1. Evaluate all of our saved models on the validation set
2. Select the best model based on performance on the validation set
3. Evaluate that model on the holdout test set

### Read in Data

In [16]:
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score
from time import time

# Feature and Label columns
feature_columns = ['Pclass', 'Gender', 'Age', 'Fare', 'Family_count', 'Cabin_index']
label_column = ['Survival']

# Validation set (features and labels)
val_features = pd.read_csv('dataset/validation_dataset.csv', usecols=feature_columns)
val_labels = pd.read_csv('dataset/validation_dataset.csv', usecols=label_column)

# Test set (features and labels)
te_features = pd.read_csv('dataset/test_dataset.csv', usecols=feature_columns)
te_labels = pd.read_csv('dataset/test_dataset.csv', usecols=label_column)

### Read in Models

In [17]:
models = {}

# Load the saved logistic regression (lr) and SVM (svm) models into a dictionary
for mdl in ['lr', 'svm']:
    models[mdl] = joblib.load('models/{}_model.pkl'.format(mdl))

In [18]:
models

{'lr': LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True,
                    intercept_scaling=1, l1_ratio=None, max_iter=1000,
                    multi_class='auto', n_jobs=None, penalty='l2',
                    random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                    warm_start=False),
 'svm': SVC(C=0.1, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
     decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
     max_iter=-1, probability=False, random_state=None, shrinking=True,
     tol=0.001, verbose=False)}

### Evaluate models on the validation set

![Evaluation Metrics](img/evaluation/eval_metrics.png)
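
Accuracy, precision, and recall all follow from confusion-matrix counts. As a minimal sketch, assuming binary labels where 1 is the positive class, they can be computed by hand. The `y_true` and `y_pred` lists and both helper functions are illustrative only and are not taken from the Titanic data or from scikit-learn:

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def basic_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when nothing is predicted or labeled positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(basic_metrics(y_true, y_pred))
```

Precision penalizes false positives and recall penalizes false negatives, so the two can diverge even when accuracy looks similar.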

In [19]:
def evaluate_model(name, model, features, labels):
    """Print accuracy, precision, recall, and prediction latency for a fitted model."""
    # Time only the prediction step
    start = time()
    pred = model.predict(features)
    end = time()
    # Score the predictions against the true labels, rounded to 3 decimals
    accuracy = round(accuracy_score(labels, pred), 3)
    precision = round(precision_score(labels, pred), 3)
    recall = round(recall_score(labels, pred), 3)
    latency_ms = round((end - start) * 1000, 2)
    print('{} -- Accuracy: {} / Precision: {} / Recall: {} / Latency: {}ms'.format(
        name, accuracy, precision, recall, latency_ms))
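
A single `time()` measurement around one `predict` call can be noisy, especially for calls that take only a few milliseconds. As a hedged alternative, the sketch below repeats the call and reports the median duration using `time.perf_counter`. The names `measure_latency_ms` and `dummy_predict` are hypothetical, and the dummy function only stands in for `model.predict`:

```python
from statistics import median
from time import perf_counter


def measure_latency_ms(predict_fn, features, repeats=20):
    """Call predict_fn(features) several times and return the median duration in ms."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        predict_fn(features)
        timings.append((perf_counter() - start) * 1000)
    return median(timings)


def dummy_predict(rows):
    """Stand-in for model.predict: predict 1 when the row's values sum above 1.0."""
    return [int(sum(row) > 1.0) for row in rows]


rows = [[0.2, 0.9], [0.1, 0.3], [0.8, 0.7]]
latency = measure_latency_ms(dummy_predict, rows)
print('Median latency: {:.4f}ms'.format(latency))
```

With a real model, `measure_latency_ms(model.predict, val_features)` would be the equivalent call, under the same assumptions.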

In [21]:
# Evaluate each model on the validation set
for name, mdl in models.items():
    evaluate_model(name, mdl, val_features, val_labels)

lr -- Accuracy: 0.827 / Precision: 0.846 / Recall: 0.724 / Latency: 1.0ms
svm -- Accuracy: 0.799 / Precision: 0.794 / Recall: 0.711 / Latency: 4.01ms
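
The selection step can also be done programmatically rather than by eye. The sketch below copies the validation scores printed above into a hypothetical `val_scores` dictionary and picks the model with the highest accuracy, using recall to break ties. Treating accuracy as the primary criterion is an assumption here, since the notebook does not state which metric drives the choice:

```python
# Validation scores copied from the output above
val_scores = {
    'lr': {'accuracy': 0.827, 'precision': 0.846, 'recall': 0.724},
    'svm': {'accuracy': 0.799, 'precision': 0.794, 'recall': 0.711},
}


def select_best(scores, primary='accuracy', tie_breaker='recall'):
    """Return the model name with the highest primary metric, then tie-breaker."""
    return max(scores, key=lambda name: (scores[name][primary],
                                         scores[name][tie_breaker]))


best_name = select_best(val_scores)
print(best_name)
```

Here the logistic regression model leads on all three metrics, so the choice does not depend on which one is treated as primary.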


### Evaluate the best model on the test set

In [22]:
evaluate_model('Logistic Regression', models['lr'], te_features, te_labels)

Logistic Regression -- Accuracy: 0.775 / Precision: 0.712 / Recall: 0.646 / Latency: 2.0ms
