# Model Evaluation and Validation

In [5]:
from IPython.display import HTML, Image

## Goals

- How to create a test set for your models.
- How to use confusion matrices to evaluate false positives and false negatives.
- How to measure accuracy and other model metrics.
- How to evaluate regression.
- How to detect whether you are overfitting or underfitting based on the complexity of your model.
- How to use cross-validation to check that your model generalizes.

## Overfitting 

In [1]:
# see Video at 1:48s
from IPython.display import HTML
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/2GeMWgmx1rk?ecver=1" frameborder="0" allowfullscreen></iframe>')

## Why testing: training and test set

In [3]:
# see Video at 2:42s for linear regression
# see at 3:39 for classification
from IPython.display import HTML
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/2GeMWgmx1rk?ecver=1" frameborder="0" allowfullscreen></iframe>')

## How to train-test-split using sklearn

In [2]:
# see 4:16 for the code
# never use the testing set to train a model
from IPython.display import HTML
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/2GeMWgmx1rk?ecver=1" frameborder="0" allowfullscreen></iframe>')
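The split described in the video can be sketched by hand to show what it does. This is a minimal sketch, not the code from the video: the helper `split_train_test` and the toy data are assumptions made for illustration. The commented lines at the end show the analogous scikit-learn call.

```python
import random

def split_train_test(features, labels, test_size=0.25, seed=42):
    """Shuffle the row indices, then hold out a fraction as the test set."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # shuffle so row order cannot bias the split
    n_test = max(1, round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Toy data: 8 points with one feature each (made up for illustration).
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.25)
print(len(X_train), len(X_test))  # 6 2

# With scikit-learn installed, the analogous call is:
# from sklearn.model_selection import train_test_split
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```

The model is then fit on `X_train, y_train` only, and `X_test, y_test` are touched once, at evaluation time.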

In [4]:
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/2IOMeKnzSqI?ecver=1" frameborder="0" allowfullscreen></iframe>')

Confusion Matrix Quiz
What is the number of True Positives, False Negatives, False Positives, and True Negatives, in the model above? Please enter your answer as four numbers separated by a comma and a space. For example, if your answers are 1, 2, 3, and 4, enter the string 1, 2, 3, 4.
**answer: 6, 1, 2, 5**

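> Key:
- True Positive: the model predicts positive and the actual label is positive (correct).
- False Positive: the model predicts positive but the actual label is negative (wrong).
- False Negative: the model predicts negative but the actual label is positive (wrong).
- True Negative: the model predicts negative and the actual label is negative (correct).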
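The four counts can be tallied directly from the true and predicted labels. The sketch below is an illustration only: `confusion_counts` is a hypothetical helper, and the label lists are invented so that they reproduce the quiz answer of 6, 1, 2, 5.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for a binary classification problem."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # predicted positive, actually positive
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fn, fp, tn

# Hypothetical labels arranged to match the quiz: 7 actual positives, 7 actual negatives.
y_true = [1] * 7 + [0] * 7
y_pred = [1] * 6 + [0] * 1 + [1] * 2 + [0] * 5

print(confusion_counts(y_true, y_pred))  # (6, 1, 2, 5)
```

In scikit-learn, `sklearn.metrics.confusion_matrix(y_true, y_pred)` reports the same information as a 2x2 array. For 0/1 labels the layout is `[[TN, FP], [FN, TP]]`, so the order differs from the tuple above.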

## What is accuracy?

In [8]:
# see accuracy in graph and in sklearn at 0:38s
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/47-gJEyM9Ro?ecver=1" frameborder="0" allowfullscreen></iframe>')

In [7]:
Image(width=500, height=300, url="https://d17h27t6h515a5.cloudfront.net/topher/2017/February/58950c4e_accuracy/accuracy.png")

Accuracy Quiz
What is the accuracy of the model above? Please enter the answer as a percentage, with two decimals. For example, 54.75.

In [9]:
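TP = 6      # true positives
TN = 5      # true negatives
Total = 14  # all points: TP + FN + FP + TN = 6 + 1 + 2 + 5
accuracy = (TP + TN) / Total  # fraction of points classified correctly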

In [10]:
accuracy

0.7857142857142857
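The same number can be computed from label lists rather than from precomputed counts, and formatted the way the quiz asks (a percentage with two decimals). This is a sketch under assumptions: the `accuracy_from_labels` helper and the label lists are made up to match the counts used above.

```python
def accuracy_from_labels(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)

# Hypothetical labels: 6 TP, 1 FN, 2 FP, 5 TN (14 points in total).
y_true = [1] * 7 + [0] * 7
y_pred = [1] * 6 + [0] * 1 + [1] * 2 + [0] * 5

acc = accuracy_from_labels(y_true, y_pred)
print(f"{100 * acc:.2f}")  # 78.57
```

scikit-learn provides this as `sklearn.metrics.accuracy_score(y_true, y_pred)`.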

## How to evaluate Regression?

In [11]:
# MAE: mean absolute error, code at 0:40 (not differentiable at zero, so awkward for gradient descent)
# MSE: mean squared error, code at 1:09 (differentiable, so it works with gradient descent)
# R2 score: starts at 1:10, "compare our model to the simplest possible model" using MSE
# (the simplest model always predicts the mean of the targets)
# R2 worked out on the graph at 1:51: R2 = 1 - MSE(our model) / MSE(simplest model)
# if our model is no better than the simplest model, R2 is close to 0
# if our model is much better than the simplest model, R2 is close to 1
# R2 in sklearn: code at 2:35
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/qbjqCBjzpLc?ecver=1" frameborder="0" allowfullscreen></iframe>')
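The three metrics in the notes above can be written out in a few lines. This is a minimal sketch with invented numbers; the function names `mae`, `mse`, and `r2` are local helpers, not library calls. The scikit-learn counterparts are `mean_absolute_error`, `mean_squared_error`, and `r2_score` in `sklearn.metrics`.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average squared residual."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R2 = 1 - MSE(our model) / MSE(simplest model that always predicts the mean)."""
    mean_y = sum(y_true) / len(y_true)
    baseline = [mean_y] * len(y_true)
    return 1 - mse(y_true, y_pred) / mse(y_true, baseline)

# Toy targets and predictions (made up for illustration).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

print(mae(y_true, y_pred))           # 0.25
print(mse(y_true, y_pred))           # 0.125
print(f"{r2(y_true, y_pred):.3f}")   # 0.900
```

Here the baseline MSE is 1.25, so R2 = 1 - 0.125 / 1.25 = 0.9: the model removes about 90% of the squared error that the mean-only model leaves behind.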

## How to understand underfit vs overfit, bias vs variance

In [12]:
# Types of errors
# 1. underfitting: oversimplifying the problem; the model does poorly even on the training set,
# the error comes from bias, i.e. poor assumptions (before 1:19)
# 2. overfitting: overcomplicating the problem; the model is forced to memorize every detail of the
# training set, so it becomes too specific or complex and does not generalize to the test set (before 2:10)
# 3. bias (underfitting): the underlying pattern is complex, but the assumed model is too simple
# to learn it from the training set, so it does poorly even on the training set
# 4. variance (overfitting): the model is forced to remember and fit every detail of the training set,
# so a small change in the training set causes a large change (variance) in the model
# 5. tradeoff (3:20): underfitting = high bias vs overfitting = high variance
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/xYhpWmaL4F4?ecver=1" frameborder="0" allowfullscreen></iframe>')
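The two extremes can be made concrete with a toy experiment. The sketch below is not from the video: the data (noisy samples of y = x**2), the "constant" model, and the "memorizer" model are all assumptions chosen to exaggerate high bias and high variance.

```python
import random

def make_data(n, rng):
    """Draw n noisy samples of y = x**2 with x in [-3, 3] (toy data)."""
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [x * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_constant(xs, ys):
    """High bias: ignore x and always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_memorizer(xs, ys):
    """High variance: return the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

rng = random.Random(0)
x_train, y_train = make_data(30, rng)
x_test, y_test = make_data(30, rng)

errors = {}
for name, fit in [("constant", fit_constant), ("memorizer", fit_memorizer)]:
    model = fit(x_train, y_train)
    train_err = mse(y_train, [model(x) for x in x_train])
    test_err = mse(y_test, [model(x) for x in x_test])
    errors[name] = (train_err, test_err)
    print(f"{name:10s} train MSE = {train_err:.2f}   test MSE = {test_err:.2f}")
```

The constant model has a large error on the training set itself, which is the signature of underfitting. The memorizer has zero training error because every training point is its own nearest neighbor; any error it makes only appears on the test set, which is the signature of overfitting.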

## Model Complexity Graph

In [13]:
# How to detect these errors (0:34)
# simple solution: evaluate the model on both the training set and the test set,
# then count the errors it makes on each
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/QAmqhR6bKWA?ecver=1" frameborder="0" allowfullscreen></iframe>')

In [14]:
Image(width=500, height=300, url="https://d17h27t6h515a5.cloudfront.net/topher/2017/February/58950d4e_mcg/mcg.png")

Model Complexity Graph Quiz
In the model above, how many training and testing errors are there? Please enter your answer as a string composed of the two numbers, separated by a comma and a space. For example, if you find 1 training error and 4 testing errors, your answer should be 1, 4.
**answer: 0, 2**

In [15]:
# explains the model complexity graph: compares an underfit, a just-right, and an overfit model (0:49)
# validation set: used to judge how good a model trained on the training set is (2:12)
# underfit and overfit models vs the just-right model: performance on the training and test sets (3:00)
# see the Model Complexity Graph at 3:37: the x-axis is model complexity, the y-axis is error
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/-j0d72ixSVA?ecver=1" frameborder="0" allowfullscreen></iframe>')
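A model complexity graph can be tabulated numerically by sweeping a complexity setting and recording the training and validation error at each step. The sketch below is an illustration under assumptions: it uses a hand-written k-nearest-neighbors regressor on made-up data, where a large `k` is a simple model and `k = 1` is the most complex one. The test set is used only once, after the validation set has picked `k`.

```python
import random

def make_data(n, rng):
    """Draw n noisy samples of y = x**2 with x in [-3, 3] (toy data)."""
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [x * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

def knn_predict(x_train, y_train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def knn_mse(x_train, y_train, xs, ys, k):
    preds = [knn_predict(x_train, y_train, x, k) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

rng = random.Random(0)
x_train, y_train = make_data(40, rng)
x_val, y_val = make_data(20, rng)
x_test, y_test = make_data(20, rng)

results = {}
for k in [40, 20, 10, 5, 3, 1]:  # ordered from simplest to most complex
    train_err = knn_mse(x_train, y_train, x_train, y_train, k)
    val_err = knn_mse(x_train, y_train, x_val, y_val, k)
    results[k] = (train_err, val_err)
    print(f"k = {k:2d}   train MSE = {train_err:6.2f}   validation MSE = {val_err:6.2f}")

# Pick the complexity with the lowest validation error, then touch the test set once.
best_k = min(results, key=lambda k: results[k][1])
test_err = knn_mse(x_train, y_train, x_test, y_test, best_k)
print(f"best k = {best_k}, test MSE = {test_err:.2f}")
```

Training error can only tell you how well the model fits data it has already seen (it is exactly zero at `k = 1`), so the choice of complexity is made on the validation column instead.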

## Why K-fold cross-validation?

In [16]:
# 1. keep as much of the data as possible available for training
# 2. shuffle the whole dataset first, so that any meaningful order in the data does not bias the folds
# 3. KFold with shuffle in sklearn: code at 1:08
HTML('<iframe width="500" height="300" src="https://www.youtube.com/embed/zva46jKINis?ecver=1" frameborder="0" allowfullscreen></iframe>')
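The fold construction itself is short enough to write by hand. The sketch below is an assumption-laden illustration, not the code from the video: `k_fold_indices` is a hypothetical helper that shuffles the row indices and then lets each fold take one turn as the test set. The commented lines show the analogous scikit-learn usage.

```python
import random

def k_fold_indices(n_samples, n_splits=4, shuffle=True, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)  # remove any meaningful row order
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(12, n_splits=4))
for train_idx, test_idx in folds:
    print("train:", sorted(train_idx), "test:", sorted(test_idx))

# With scikit-learn installed, the analogous usage is:
# from sklearn.model_selection import KFold
# kf = KFold(n_splits=4, shuffle=True, random_state=0)
# for train_idx, test_idx in kf.split(X):
#     ...
```

Each of the 12 points lands in a test fold exactly once and in a training fold three times, so no data is permanently set aside. The model is trained and scored once per fold, and the scores are averaged.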