# Evaluate the performance of ML algos with Resampling

## Why is this _really_ needed?

See the 'theory/intuition' lectures.

We are going to look at 4 different techniques that we can use to split up our training dataset and create useful estimates of performance for our ML algorithms:

1. Train and Test Sets
1. k-fold Cross-Validation
1. Leave One Out Cross-Validation
1. Repeated Random Test-Train Splits

## 1. Split into Train and Test Sets

Discussed in the lecture, and also in the hands-on.

This algorithm evaluation technique is very fast. It has pros and cons:
* _Pro_. It is ideal for large datasets (millions of records): splitting a large dataset into largish sub-datasets ensures that 1) each split of the data is **not too tiny**, and 2) both splits are **representative** of the underlying problem. Because of its speed, this approach is also useful when the algorithm you are investigating is slow to train.
* _Con_. A downside of this technique is that it can have a **high variance**. This means that differences in the training and test dataset can result in meaningful differences in the estimate of accuracy.

In the example below we split our dataset into 67%/33% splits for training and test and evaluate the accuracy of a Logistic Regression model.

In [11]:
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [12]:
# data import
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]

In [13]:
# prepare for the evaluation with a train and test set
test_size = 0.33
seed = 10

In [17]:
# Evaluate using a train and a test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
model = LogisticRegression()            # choose a model
model.fit(X_train, Y_train)             # train on the training set
result = model.score(X_test, Y_test)    # get accuracy as measured on the test set
print("Accuracy: %.3f%%" % (result*100.0))

Accuracy: 74.803%


**Exercise**: Try changing the seed and re-training. Does the accuracy change? Is it reproducible? Can you measure its variance?

**Exercise**: What happens if I check accuracy on the _train_ set (conceptually wrong)? Do I see something different or not? What is the drawback if I make this mistake? (SOLUTION below)

In [18]:
#seed = 8              ###QUIZ
#test_size = 0.33      ###QUIZ 
# Evaluate using a train and a test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
model = LogisticRegression()            # choose a model
model.fit(X_train, Y_train)             # train on the training set
result = model.score(X_train, Y_train)    # get accuracy as measured on the train set now! (for the ###QUIZ)
print("Accuracy: %.3f%%" % (result*100.0))

Accuracy: 78.405%
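
The **high variance** of a single train/test split can be made visible by repeating the split with different seeds and looking at the spread of the scores. The following is only a minimal sketch using the standard library: the synthetic data, the threshold "model" and the helper names `make_toy_data` and `split_accuracy` are assumptions made for illustration, not part of the lecture material or of scikit-learn.

```python
import random
import statistics

def make_toy_data(n, rng):
    """Return a list of (x, label) pairs: two overlapping 1-D Gaussian classes."""
    data = []
    for i in range(n):
        label = i % 2
        x = rng.gauss(1.5 * label, 1.0)   # class 0 centred at 0.0, class 1 at 1.5
        data.append((x, label))
    return data

def split_accuracy(data, test_fraction, seed):
    """Shuffle with the given seed, split once, fit a threshold, score on the test part."""
    rows = data[:]                         # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    test, train = rows[:n_test], rows[n_test:]
    # "Training" (toy model): put the threshold halfway between the two class means.
    mean0 = statistics.mean(x for x, y in train if y == 0)
    mean1 = statistics.mean(x for x, y in train if y == 1)
    threshold = (mean0 + mean1) / 2
    correct = sum((x > threshold) == (y == 1) for x, y in test)
    return correct / len(test)

data = make_toy_data(200, random.Random(0))
accuracies = [split_accuracy(data, 0.33, seed) for seed in range(20)]
print("mean accuracy: %.3f" % statistics.mean(accuracies))
print("std across seeds: %.3f" % statistics.pstdev(accuracies))
```

The same seed always gives the same score, while different seeds generally give different ones; the standard deviation across seeds is a rough measure of how much a single split can mislead.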


## 2. K-fold Cross-Validation

Discussed in the hands-on.

In [19]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score   # <---
from sklearn.linear_model import LogisticRegression

In [20]:
# Evaluate using Cross Validation
num_folds = 10
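# KFold does not shuffle by default, so no random_state is passed
# (recent scikit-learn versions raise an error if it is set without shuffle=True)
kfold = KFold(n_splits=num_folds)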
model = LogisticRegression()
results = cross_val_score(model, X, Y, cv=kfold)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))

Accuracy: 76.951% (4.841%)
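
To see what the k-fold splitter does with the rows, here is a minimal pure-Python sketch of unshuffled k-fold index generation. The function name `kfold_indices` is hypothetical and this is not scikit-learn's implementation; it only illustrates the idea that every sample is used for testing exactly once and for training in all the other folds.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs for unshuffled k-fold CV."""
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop

for train, test in kfold_indices(10, 3):
    print("train:", train, "test:", test)
```

With 10 samples and 3 folds the test folds have sizes 4, 3 and 3, and together they cover all 10 indices.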


## 3. Leave One Out Cross-Validation

Discussed in the hands-on.

In [23]:
from sklearn.model_selection import LeaveOneOut       # <---
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

In [24]:
# Evaluate using Leave One Out Cross Validation
loocv = LeaveOneOut()
model = LogisticRegression()
results = cross_val_score(model, X, Y, cv=loocv)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))

Accuracy: 76.823% (42.196%)
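
The large standard deviation is not a sign of an unstable model. In leave-one-out each fold contains a single sample, so each fold score is either 0 or 1, and the spread of such 0/1 scores is fixed by their mean. A short check, using only the mean accuracy printed above:

```python
import math

p = 0.76823                      # mean LOOCV accuracy reported above
std = math.sqrt(p * (1 - p))     # population std of a 0/1 variable with mean p
print("%.3f%%" % (std * 100))    # prints 42.196%
```

For leave-one-out the mean is therefore the informative number; the standard deviation carries no extra information about the model.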


## 4. Repeated Random Test-Train Splits

In [25]:
from sklearn.model_selection import ShuffleSplit      # <---
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

In [26]:
# Evaluate using Shuffle Split Cross Validation
n_splits = 10
test_size = 0.33
seed = 7

shuffle_split = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
model = LogisticRegression()
results = cross_val_score(model, X, Y, cv=shuffle_split)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))

Accuracy: 76.496% (1.698%)
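
Unlike k-fold, repeated random splits draw a fresh random test set every time, so a sample may land in several test sets or in none. The sketch below shows the idea in pure Python; `shuffle_split_indices` is a hypothetical helper, and its rounding of the test size is simplified compared with scikit-learn's `ShuffleSplit`.

```python
import random

def shuffle_split_indices(n_samples, n_splits, test_fraction, seed):
    """Yield (train_indices, test_indices) for repeated random train/test splits."""
    rng = random.Random(seed)
    n_test = int(n_samples * test_fraction)   # simplified rounding (assumption)
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)                  # a fresh permutation for every split
        yield sorted(indices[n_test:]), sorted(indices[:n_test])

splits = list(shuffle_split_indices(12, 4, 0.33, seed=7))
for train, test in splits:
    print("test:", test)
```

Within one split the train and test indices never overlap, but test sets from different splits are free to share samples.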


## OK, fine, but.. what techniques to use when?!?

Here are some tips on which resampling technique to use in different circumstances.

* Generally, k-fold cross-validation with k set to 3, 5, or 10 is the gold standard for evaluating the performance of an ML algorithm on unseen data.

* Using a train/test split is good for speed when using a slow algorithm and produces performance estimates with lower bias when using large datasets.

* Techniques like leave-one-out cross-validation and repeated random splits can be useful intermediates when trying to balance variance in the estimated performance, model training speed and dataset size.

The best advice is to experiment and find a technique for your problem that is fast and produces reasonable estimates of performance that you can use to make decisions. If in doubt, use 10-fold cross-validation.
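
One way to make the speed trade-off concrete is to count how many times the model has to be trained and on how many rows each time. The sketch below assumes a dataset of 768 rows and reuses the settings from the cells above (10 folds, 10 repetitions, 33% test size); `cost_summary` is a hypothetical helper and the rounding is a simplification.

```python
import math

def cost_summary(n_samples, n_folds=10, n_repeats=10, test_fraction=0.33):
    """Return {technique: (number of model fits, smallest training-set size)}."""
    n_test = math.ceil(n_samples * test_fraction)
    return {
        "train/test split": (1, n_samples - n_test),
        "k-fold": (n_folds, n_samples - math.ceil(n_samples / n_folds)),
        "leave-one-out": (n_samples, n_samples - 1),
        "repeated random splits": (n_repeats, n_samples - n_test),
    }

summary = cost_summary(768)   # 768 rows is an assumed dataset size
for name, (n_fits, n_train) in summary.items():
    print("%-24s fits: %4d   training rows per fit: %d" % (name, n_fits, n_train))
```

Leave-one-out trains on almost all the data but needs one fit per row, which is why it becomes impractical for large datasets or slow algorithms.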

## Summary

What we did:

* we discovered 4 statistical techniques, known as resampling techniques, that we can use to estimate the performance of ML algorithms.

## What's next 

Next we will see how you can evaluate the performance of classification and regression algorithms using a suite of different metrics and built-in evaluation reports.