# Evaluate the Performance of Machine Learning Algorithms with Resampling

- The best way to evaluate the performance of an algorithm would be to make predictions for new data to which you already know the answers.
- The second best way is to use clever techniques from statistics called resampling methods that allow you to make accurate estimates for how well your algorithm will perform on new data.

### Why can't you train your machine learning algorithm on your dataset and use predictions from this same dataset to evaluate machine learning algorithms?
The simple answer is overfitting.

- Imagine an algorithm that remembers every observation it is shown during training.
- If you evaluated your machine learning algorithm on the same dataset used to train the algorithm, then an algorithm like this would have a perfect score on the training dataset.
- But the predictions it made on new data would be terrible. We must evaluate our machine learning algorithms on data that is not used to train the algorithm.

- The evaluation is an estimate that we can use to talk about how well we think the algorithm may actually do in practice. It is not a guarantee of performance.
- Once we estimate the performance of our algorithm, we can then re-train the final algorithm on the entire training dataset and get it ready for operational use.
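
The overfitting argument above can be illustrated with a small sketch. It is not part of the original notebook: the data is synthetic (generated with `make_classification`, with some label noise added through `flip_y`), and a 1-nearest-neighbour classifier stands in for "an algorithm that remembers every observation". All variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (an assumption for illustration); flip_y adds label noise
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     flip_y=0.2, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.33, random_state=7)

# With one neighbour, each training row's nearest neighbour is itself,
# so the model effectively memorizes the training set
memorizer = KNeighborsClassifier(n_neighbors=1)
memorizer.fit(X_tr, y_tr)

train_acc = memorizer.score(X_tr, y_tr)  # evaluated on data it has seen
test_acc = memorizer.score(X_te, y_te)   # evaluated on held-out data
print("train accuracy: %.3f" % train_acc)
print("test accuracy:  %.3f" % test_acc)
```

The training score looks perfect because the model is only recalling labels it was shown; the held-out score is the one that says something about new data.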

#### Four different techniques that we can use to split up our training dataset and create useful estimates of performance for our machine learning algorithms:

- Train and Test Sets.
- k-fold Cross Validation.
- Leave One Out Cross Validation.
- Repeated Random Test-Train Splits.

## Train and Test Sets
pros:
1. This algorithm evaluation technique is very fast.
2. It is ideal for large datasets (millions of records) where there is strong evidence that both splits of the data are representative of the underlying problem.
3. Because of the speed, it is useful to use this approach when the algorithm you are investigating is slow to train.

cons:
1. A downside of this technique is that its estimate can have a high variance.
2. This means that differences in the training and test dataset can result in meaningful differences in the estimate of accuracy.

In [1]:
# Pima Indians Diabetes Dataset
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [2]:
#Loading dataset
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv('pima-indians-diabetes.data',names=names)

In [3]:
# separate array into input and output components
X = df.drop('class',axis='columns')
Y = df['class']

In [4]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=7)

In [6]:
model = LogisticRegression()
model.fit(X_train, Y_train)
result = model.score(X_test, Y_test)
print(result*100)

75.5905511811
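
To get a feel for the variance of a single train/test split, one option is to repeat the split with different seeds and compare the scores. The sketch below is an illustration rather than part of the original notebook: it uses synthetic data from `make_classification` instead of the Pima dataset, and `max_iter=1000` is an assumed setting chosen only to help the solver converge.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the Pima dataset)
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=0)

scores = []
for seed in range(10):
    # Same data and model each time; only the random split changes
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=0.33, random_state=seed)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_tr, y_tr)
    scores.append(clf.score(X_te, y_te))

print("min %.3f, max %.3f, std %.3f" % (min(scores), max(scores), np.std(scores)))
```

The gap between the minimum and maximum score is a rough indication of how much a single split can mislead.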


## K-fold Cross Validation
- Cross validation is an approach that you can use to estimate the performance of a machine learning algorithm with less variance than a single train-test set split.
- It works by splitting the dataset into k-parts (e.g. k = 5 or k = 10).
- Each split of the data is called a fold. The algorithm is trained on k - 1 folds with one held back and tested on the held back fold. This is repeated so that each fold of the dataset is given a chance to be the held back test set.
- After running cross validation you end up with k different performance scores that you can summarize using a mean and a standard deviation.

1. The result is a more reliable estimate of the performance of the algorithm on new data. It is more accurate because the algorithm is trained and evaluated multiple times on different data.
2. The choice of k must allow the size of each test partition to be large enough to be a reasonable sample of the problem, whilst allowing enough repetitions of the train-test evaluation of the algorithm to provide a fair estimate of the algorithm's performance on unseen data.
3. For modest sized datasets in the thousands or tens of thousands of records, k values of 3, 5 and 10 are common.
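
The procedure described above can be written out as an explicit loop, which makes it easier to see what `cross_val_score` does behind the scenes. This is a minimal sketch on synthetic data, not the notebook's own code; `k = 5`, the shuffling, and names such as `times_tested` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic stand-in data (an assumption for illustration)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=7)

k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=7)
fold_scores = []
times_tested = np.zeros(len(X_demo), dtype=int)

for train_idx, test_idx in kf.split(X_demo):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_demo[train_idx], y_demo[train_idx])  # train on k - 1 folds
    # test on the single held-back fold
    fold_scores.append(clf.score(X_demo[test_idx], y_demo[test_idx]))
    times_tested[test_idx] += 1                    # track test-set membership

print("fold scores:", np.round(fold_scores, 3))
print("mean %.3f, std %.3f" % (np.mean(fold_scores), np.std(fold_scores)))
```

Every row ends up in the test set exactly once, which is what distinguishes k-fold cross validation from repeated random splits.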

In [7]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [8]:
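# random_state is omitted: it has no effect unless shuffle=True,
# and recent scikit-learn versions raise a ValueError if it is set without shuffling
kfold = KFold(n_splits=10)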
model2 = LogisticRegression()

In [10]:
result = cross_val_score(model2,X,Y,cv=kfold)

In [11]:
np.mean(result)*100

76.951469583048521

In [12]:
np.std(result)*100

4.8410519245671946

You can see that we report both the mean and the standard deviation of the performance measure. When summarizing performance measures, it is a good practice to summarize the distribution of the measures, in this case assuming a Gaussian distribution of performance (a very reasonable assumption) and recording the mean and standard deviation.

## Leave One Out Cross Validation
- You can congure cross validation so that the size of the fold is 1 (k is set to the number of observations in your dataset). This variation of cross validation is called leave-one-out cross validation.
- The result is a large number of performance measures that can be summarized in an eort to give a more reasonable estimate of the accuracy of your model on unseen data.
- A downside is that it can be a computationally more expensive procedure than k-fold cross validation. In the example below we use leave-one-out cross validation.

In [13]:
from sklearn.model_selection import LeaveOneOut

In [14]:
loocv = LeaveOneOut()
model = LogisticRegression()

In [15]:
results = cross_val_score(model, X, Y, cv=loocv)

In [16]:
(results.mean()*100.0, results.std()*100.0)

(76.953125, 42.113288315380629)

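You can see in the standard deviation that the individual scores vary far more than the k-fold cross
validation results described above. This is expected: each fold holds a single observation, so every
individual score is either 0 or 1.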
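
The size of that standard deviation can be checked directly. The sketch below is an illustration on synthetic data, not the Pima dataset. It confirms that leave-one-out produces one score per row, that each score is 0 or 1, and that the standard deviation of such scores is fully determined by the mean accuracy p, as sqrt(p * (1 - p)).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Small synthetic dataset (an assumption) to keep the run short
X_demo, y_demo = make_classification(n_samples=100, n_features=8,
                                     flip_y=0.1, random_state=7)

loo = LeaveOneOut()
loo_scores = cross_val_score(LogisticRegression(max_iter=1000),
                             X_demo, y_demo, cv=loo)

p = loo_scores.mean()  # overall accuracy
print("number of folds:", len(loo_scores))
print("distinct scores:", np.unique(loo_scores))
print("std: %.4f, sqrt(p*(1-p)): %.4f" % (loo_scores.std(), np.sqrt(p * (1 - p))))
```

For the notebook's result, p = 0.769531 gives sqrt(p * (1 - p)) of about 0.4211, matching the reported 42.11. The standard deviation here therefore carries no information beyond the mean and is not comparable to the k-fold figure.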

## Repeated Random Test-Train Splits
Another variation on k-fold cross validation is to create a random split of the data like the
train/test split described above, but repeat the process of splitting and evaluation of the
algorithm multiple times, like cross validation. This has the speed of using a train/test split and
the reduction in variance in the estimated performance of k-fold cross validation. You can also
repeat the process many more times as needed to improve the accuracy. A downside is that
repetitions may include much of the same data in the train or the test split from run to run,
introducing redundancy into the evaluation.

In [17]:
from sklearn.model_selection import ShuffleSplit

In [18]:
# Default ShuffleSplit: 10 random splits, each holding out 10% of the rows for testing
kfold = ShuffleSplit()
model = LogisticRegression()
results = cross_val_score(model, X, Y, cv=kfold)

In [19]:
(results.mean()*100.0, results.std()*100.0)

(76.36363636363636, 4.5287261751071943)

We can see that in this case the distribution of the performance measure is on par with
k-fold cross validation above.
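
The redundancy mentioned above can be made visible by counting how often each row lands in a test set. The following sketch is an illustration only: the settings `n_splits=10`, `test_size=0.33` and `random_state=7` are assumed values chosen to mirror the earlier train/test split, and the "data" is just a column of row indices.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

n_samples = 100
indices = np.arange(n_samples).reshape(-1, 1)  # stand-in feature matrix

# Explicit settings (assumptions) instead of the defaults
splitter = ShuffleSplit(n_splits=10, test_size=0.33, random_state=7)

test_counts = np.zeros(n_samples, dtype=int)
for train_idx, test_idx in splitter.split(indices):
    test_counts[test_idx] += 1  # count test-set appearances per row

print("rows never tested:", int((test_counts == 0).sum()))
print("rows tested more than once:", int((test_counts > 1).sum()))
```

Unlike k-fold cross validation, nothing forces each row to be tested exactly once: some rows are reused across test sets and others may never be tested at all.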

## What Techniques to Use When

- Generally k-fold cross validation is the gold standard for evaluating the performance of a machine learning algorithm on unseen data with k set to 3, 5, or 10.

- Using a train/test split is good for speed when using a slow algorithm and produces performance estimates with lower bias when using large datasets.

- Techniques like leave-one-out cross validation and repeated random splits can be useful intermediates when trying to balance variance in the estimated performance, model training speed and dataset size.

The best advice is to experiment and find a technique for your problem that is fast and
produces reasonable estimates of performance that you can use to make decisions. If in doubt,
use 10-fold cross validation.
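
Putting the advice together, a typical workflow might look like the sketch below: estimate performance with 10-fold cross validation, then fit a fresh model on all of the data for operational use. It is an illustration on synthetic data; the shuffling, the seed, and `max_iter=1000` are assumptions rather than settings taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption for illustration)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=7)

# Step 1: estimate performance with 10-fold cross validation
cv = KFold(n_splits=10, shuffle=True, random_state=7)
cv_scores = cross_val_score(LogisticRegression(max_iter=1000),
                            X_demo, y_demo, cv=cv)
print("estimated accuracy: %.3f (+/- %.3f)" % (cv_scores.mean(), cv_scores.std()))

# Step 2: retrain a fresh model on all available data for operational use
final_model = LogisticRegression(max_iter=1000)
final_model.fit(X_demo, y_demo)
predictions = final_model.predict(X_demo[:5])  # example call on a few rows
```

The cross validation scores describe the modelling procedure, not `final_model` itself; the final model is simply the same procedure applied to all the data.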