# Evaluate the performance of ML algos with Resampling

We are going to look at 4 different techniques that we can use to split up our training dataset and create useful estimates of performance for our ML algorithms:

1. Split into Train and Test Sets
1. k-fold Cross-Validation
1. Leave One Out Cross-Validation
1. Repeated Random Test-Train Splits

## 0. Import the data

In [4]:
import pandas as pd

url = 'https://raw.githubusercontent.com/dbonacorsi/AMLBas2122/main/datasets/pima-indians-diabetes.data.csv'

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pd.read_csv(url, names=names)
data

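|     | preg | plas | pres | skin | test | mass | pedi  | age | class |
|-----|------|------|------|------|------|------|-------|-----|-------|
| 0   | 6    | 148  | 72   | 35   | 0    | 33.6 | 0.627 | 50  | 1     |
| 1   | 1    | 85   | 66   | 29   | 0    | 26.6 | 0.351 | 31  | 0     |
| 2   | 8    | 183  | 64   | 0    | 0    | 23.3 | 0.672 | 32  | 1     |
| 3   | 1    | 89   | 66   | 23   | 94   | 28.1 | 0.167 | 21  | 0     |
| 4   | 0    | 137  | 40   | 35   | 168  | 43.1 | 2.288 | 33  | 1     |
| ... | ...  | ...  | ...  | ...  | ...  | ...  | ...   | ... | ...   |
| 763 | 10   | 101  | 76   | 48   | 180  | 32.9 | 0.171 | 63  | 0     |
| 764 | 2    | 122  | 70   | 27   | 0    | 36.8 | 0.340 | 27  | 0     |
| 765 | 5    | 121  | 72   | 23   | 112  | 26.2 | 0.245 | 30  | 0     |
| 766 | 1    | 126  | 60   | 0    | 0    | 30.1 | 0.349 | 47  | 1     |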


## 1. Split into Train and Test Sets

The simplest method that we can use to evaluate the performance of an ML algorithm is to split the dataset into (at least) two separate parts, a training set and a test set (e.g. 2/3 and 1/3, but choices may vary). The model is fitted on the training set and scored on the test set.

This algorithm evaluation technique is very fast, and has pros and cons:
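* _Pro_. Ideal for large datasets. Fast (so use it for algorithms that are slow to train).
* _Con_. High variance: the accuracy estimate depends on which rows happen to land in the training set and which in the test set.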
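
To get a feeling for what a random split does, here is a minimal sketch in plain Python that works on row indices only. The helper name `simple_train_test_split` and its rounding rule are assumptions made for illustration; this is not how `train_test_split` is implemented, and the library may size the two parts slightly differently.

```python
import random

def simple_train_test_split(n_samples, test_size=0.33, seed=7):
    """Shuffle the row indices and cut them into a train part and a test part.

    Hypothetical helper for illustration only (not the scikit-learn code).
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_size)  # assumption: plain rounding
    return indices[n_test:], indices[:n_test]

# 768 rows, as in the dataset loaded above
train_idx, test_idx = simple_train_test_split(768, seed=7)
print(len(train_idx), "training rows,", len(test_idx), "test rows")

# A different seed sends different rows to the test set,
# which is why a single split gives a noisy accuracy estimate.
_, other_test_idx = simple_train_test_split(768, seed=8)
shared = set(test_idx) & set(other_test_idx)
print(len(shared), "rows are in both test sets")
```

With a fixed seed the same rows end up in the test set every time, so the split is reproducible.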





In [5]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [6]:
array = data.values
X = array[:,0:8]
Y = array[:,8]

In [7]:
test_size = 0.33
seed = 7

In [11]:
%%time
# Evaluate using a train and a test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
#model = LogisticRegression()   
model = LogisticRegression(solver='lbfgs', max_iter=500)         
model.fit(X_train, Y_train)             
result = model.score(X_test, Y_test)    
print("Accuracy: %.3f%%" % (result*100.0))

Accuracy: 78.740%
CPU times: user 42.4 ms, sys: 0 ns, total: 42.4 ms
Wall time: 58.4 ms


---

### <font color='red'>Exercise 1</font>

Try changing the seed and re-training. Does the accuracy change? Is it reproducible for a fixed seed? For different seeds, could you measure its variance? (This is up to your curiosity: a few tries to get a feeling are enough, but you can do more, and cleverer, tests.)

In [None]:
# type your code below

---

### <font color='red'>Exercise 2</font>

What happens if I check accuracy on the _train_ set (conceptually wrong)? Do I see something different or not? What is the drawback if I make this mistake?

In [None]:
# type your code below

---

### <font color='red'>Exercise 3</font>

What if you change the training/test ratio?

In [None]:
# type your code below

---

## 2. K-fold Cross-Validation

K-fold cross-validation works by **splitting the dataset into $k$ parts** (e.g. $k=5$ or $k=10$). Each part is called a _fold_. The algorithm is trained on $k-1$ folds and then tested on the one held-back fold. This is repeated $k$ times, so that _each_ fold of the dataset is used once as the held-back test set. After running cross-validation you end up with $k$ different performance scores that you can summarize using a mean and a standard deviation.
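
The fold mechanics can be sketched in plain Python on row indices. The generator `k_fold_indices` below is a hypothetical helper written for this illustration; it assigns shuffled indices to folds with a stride, which may differ from how `KFold` groups rows internally.

```python
import random

def k_fold_indices(n_samples, k, seed=7):
    """Yield (train_indices, test_indices) once for each of the k folds.

    Hypothetical helper for illustration only (not the scikit-learn code).
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        # The training set is the union of all the other folds.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold

splits = list(k_fold_indices(n_samples=768, k=5))
for train, test in splits:
    print(len(train), "training rows,", len(test), "test rows")
```

Every row is used for testing exactly once and for training $k-1$ times, which is what distinguishes k-fold from a single train/test split.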


**The choice of $k$ is a trade-off** between keeping each test partition reasonably large and allowing enough repetitions of the train-test evaluation of the algorithm.

**$k$ values of $3$, $5$ and $10$ are common** (at least for modest-size datasets in the thousands or tens of thousands of records). In the example below we use 5-fold cross-validation.

In [12]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score   # <---
from sklearn.linear_model import LogisticRegression

In [15]:
# Evaluate using Cross Validation
num_folds = 5
seed = 7

kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
#model = LogisticRegression()
model = LogisticRegression(solver='lbfgs', max_iter=500)
results = cross_val_score(model, X, Y, cv=kfold)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))

Accuracy: 77.471% (1.913%)


You can see that we report both the mean and the standard deviation of the performance measure.


---

### <font color='red'>Exercise 4</font>

<div class="alert alert-block alert-info">
What if I change the number of folds?
</div>

In [None]:
# type your code below

## 3. Leave One Out Cross-Validation

You can configure cross-validation so that the size of the fold is 1 ($k=n$, i.e. $k$ is set to the number of observations in your dataset). 
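
As a rough sketch of what this implies, the plain-Python generator below (a hypothetical helper, not the `LeaveOneOut` implementation) produces one split per observation. For the 768 rows of this dataset that means 768 separate model fits.

```python
def leave_one_out_indices(n_samples):
    """Yield (train_indices, test_indices) with exactly one test row per split.

    Hypothetical helper for illustration only (not the scikit-learn code).
    """
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, [i]

loo_splits = list(leave_one_out_indices(768))
print(len(loo_splits), "splits, so one model fit per observation")
print(len(loo_splits[0][0]), "training rows and", len(loo_splits[0][1]), "test row in each split")
```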

In [16]:
from sklearn.model_selection import LeaveOneOut       # <---
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

In [17]:
# Evaluate using Leave One Out Cross Validation
loocv = LeaveOneOut()
#model = LogisticRegression()
model = LogisticRegression(solver='lbfgs', max_iter=500)
results = cross_val_score(model, X, Y, cv=loocv)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))

Accuracy: 77.604% (41.689%)


(*NOTE: probably not so visible in this small example, but the time it took to run this is larger than the previous one..*)

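You can see that the standard deviation of the scores is **much larger** than in the k-fold cross-validation results described above. Keep in mind, however, that each leave-one-out fold scores a single observation, so every individual score is either 0 or 1. For a mean accuracy $p$, the spread of such scores is $\sqrt{p(1-p)}$, so this number is not directly comparable with the k-fold standard deviation.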
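
The size of that standard deviation can be checked with a few lines of plain Python. The sketch assumes 596 correct predictions out of 768; that count is inferred from the reported 77.604% accuracy, not read from a rerun of the cell.

```python
import math
import statistics

n_samples = 768  # rows in the dataset
n_correct = 596  # assumption: inferred from 77.604% of 768
# Each leave-one-out fold scores one row: 1.0 if correct, 0.0 if wrong.
scores = [1.0] * n_correct + [0.0] * (n_samples - n_correct)

mean = statistics.fmean(scores)
std = statistics.pstdev(scores)  # population standard deviation (ddof = 0)
print("mean: %.3f%%  std: %.3f%%" % (mean * 100.0, std * 100.0))

# For 0/1 scores the standard deviation is fixed by the mean alone.
print("sqrt(p*(1-p)): %.3f%%" % (math.sqrt(mean * (1.0 - mean)) * 100.0))
```

Under that assumption both printed values should agree with the output reported above, which suggests that the large figure is determined by the mean accuracy alone.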

## 4. Repeated Random Test-Train Splits

Another variation on k-fold cross-validation is to **create a random split of the data** like the train/test split described above, but **repeat the process of splitting and evaluating the algorithm multiple times**, like cross-validation.

The example below splits the data into a 67%/33% train/test split and repeats the process 100 times.
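
One consequence of drawing a fresh random split each time is that the test sets overlap between repetitions, unlike the folds in k-fold cross-validation. The plain-Python sketch below counts how often each row is used for testing; `shuffle_split_indices` is a hypothetical helper with an assumed rounding rule, not the `ShuffleSplit` implementation.

```python
import random
from collections import Counter

def shuffle_split_indices(n_samples, n_splits, test_size=0.33, seed=7):
    """Yield n_splits independent random (train_indices, test_indices) pairs.

    Hypothetical helper for illustration only (not the scikit-learn code).
    """
    rng = random.Random(seed)
    n_test = round(n_samples * test_size)  # assumption: plain rounding
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]

# Count how many times each row lands in a test set over 100 repetitions.
test_counts = Counter()
for train, test in shuffle_split_indices(768, n_splits=100):
    test_counts.update(test)

print("least-tested row:", min(test_counts.values()), "times")
print("most-tested row:", max(test_counts.values()), "times")
```

Some rows are tested more often than others, so the scores from different repetitions are not independent of each other.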

In [None]:
from sklearn.model_selection import ShuffleSplit      # <---
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

In [None]:
# Evaluate using Shuffle Split Cross Validation
n_splits = 100
test_size = 0.33
seed = 7

kfold = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
#model = LogisticRegression()
model = LogisticRegression(solver='lbfgs', max_iter=500)
results = cross_val_score(model, X, Y, cv=kfold)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))

Running this cell, you should see that the distribution of the performance measure is on par with the k-fold cross-validation above.

## OK, fine, but.. what techniques to use when?!?

Discussion at the lecture.

## Summary

What we did:

* we discovered 4 statistical techniques that we can use to estimate the performance of ML algorithms, called Resampling. 

## What's next 

Now we will see how you can evaluate the performance of classification and regression algorithms using a suite of different metrics and built-in evaluation reports.