## Chapter 9 Evaluate the Performance of Machine Learning Algorithms with Resampling



In [19]:
import pandas as pd
from sklearn.model_selection import train_test_split, KFold, cross_val_score, LeaveOneOut, ShuffleSplit
from sklearn.linear_model import LogisticRegression

#### 1. Split into Train and Test Sets

In [9]:
# evaluate using a train and a test set
filename = 'data/pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv(filename, names=names)
array = df.values
X = array[:,:-1]
Y = array[:,-1]
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
model = LogisticRegression(max_iter=200)
model.fit(X_train, Y_train)
result = model.score(X_test, Y_test)
print(f'Accuracy: {result*100:.3f}%')

Accuracy: 78.740%


#### 2. k-fold Cross Validation

Cross validation is an approach that you can use to estimate the performance of a machine learning algorithm with less variance than a single train-test set split.

It works by splitting the dataset into k parts (e.g. k = 5 or k = 10). Each part is called a fold. The algorithm is trained on k − 1 folds and tested on the one fold that was held back. This is repeated so that each fold gets a turn as the held-back test set. After running cross validation you end up with k different performance scores that you can summarize using a mean and a standard deviation.
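
The fold mechanics can be sketched without any library. This is a minimal illustration, not how scikit-learn implements `KFold`; the helper name `k_fold_indices` and the 10-sample toy size are assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k, seed=7):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation (hypothetical helper)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once before cutting into folds
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]                 # the held-back fold
        train_idx = indices[:start] + indices[start + size:]   # the other k - 1 folds
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f'fold {i}: train={sorted(train_idx)} test={sorted(test_idx)}')

# Across the k folds, every sample is held out exactly once.
held_out = sorted(i for _, test_idx in folds for i in test_idx)
print(held_out == list(range(10)))
```

Each sample is used for testing once and for training k − 1 times, which is what distinguishes k-fold from repeatedly drawing independent random splits.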

In [15]:
# evaluate using cross validation
num_folds = 10
seed = 7
kfold = KFold(n_splits=num_folds, shuffle=True, random_state=seed)
model = LogisticRegression(max_iter=200)
results = cross_val_score(model, X, Y, cv=kfold)
print(f'Accuracy: mean: {results.mean()*100.0:.3f}%, std: {results.std()*100.0:.3f}%')

Accuracy: mean: 77.216%, std: 4.968%


#### 3. Leave One Out Cross Validation

You can configure cross validation so that each fold contains a single observation (k is set to the number of observations in your dataset). This variation is called leave-one-out cross validation. The result is a large number of performance measures, one per observation, that can be summarized to estimate the accuracy of your model on unseen data.
A downside is that it can be a computationally more expensive procedure than k-fold cross validation.
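
To get a feel for the extra cost, the number of model fits each splitter requires can be read from its `get_n_splits` method. The sketch below uses a placeholder array, `X_demo`, whose shape (768 rows, 8 features) is an assumption about the dataset loaded earlier; only the row count matters.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Placeholder data: assumed to match the shape of X; the values are irrelevant here.
X_demo = np.zeros((768, 8))

n_fits_kfold = KFold(n_splits=10).get_n_splits(X_demo)
n_fits_loocv = LeaveOneOut().get_n_splits(X_demo)

print(f'10-fold fits: {n_fits_kfold}')
print(f'leave-one-out fits: {n_fits_loocv}')
```

Leave-one-out fits one model per observation, so its cost grows with the size of the dataset, while k-fold always fits k models.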

In [18]:
# evaluate using leave-one-out cross validation
loocv = LeaveOneOut()
model = LogisticRegression(max_iter=300)
results = cross_val_score(model, X, Y, cv=loocv)
print(f'Accuracy: mean: {results.mean()*100.0:.3f}%, std: {results.std()*100.0:.3f}%')

Accuracy: mean: 77.604%, std: 41.689%


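The standard deviation is much larger than in the k-fold results above, but the two figures are not directly comparable. Each leave-one-out score is the accuracy on a single observation, so it is either 0 or 1, and the spread of those values reflects individual hits and misses rather than the stability of the overall estimate.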
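
The size of the leave-one-out standard deviation follows from the mean alone, because every per-observation score is 0 or 1. The sketch below works through the arithmetic; the counts (768 observations, 596 of them predicted correctly) are assumptions chosen to match the mean accuracy printed above.

```python
import math
import statistics

n_samples = 768   # assumed number of observations in the dataset
n_correct = 596   # assumed: the count implied by a mean accuracy of 77.604%

# One score per held-out observation: 1.0 if predicted correctly, else 0.0.
scores = [1.0] * n_correct + [0.0] * (n_samples - n_correct)

mean = sum(scores) / len(scores)
std = statistics.pstdev(scores)  # population standard deviation, as NumPy's default .std()

print(f'mean: {mean*100:.3f}%, std: {std*100:.3f}%')

# For 0/1 scores the standard deviation is sqrt(p * (1 - p)), where p is the mean.
print(math.isclose(std, math.sqrt(mean * (1 - mean))))
```

With a mean near 78%, the standard deviation of the 0/1 scores is close to 42% regardless of how stable the model is, so it says little about the variance of the accuracy estimate itself.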

#### 4. Repeated Random Test-Train Splits

Another variation on k-fold cross validation is to create a random split of the data like the train/test split described above, but repeat the process of splitting and evaluating the algorithm multiple times, like cross validation. This combines the speed of a train/test split with the reduced variance in estimated performance of k-fold cross validation. You can also repeat the process more times as needed to tighten the estimate. A downside is that repetitions may include much of the same data in the train or the test split from run to run, introducing redundancy into the evaluation.
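
The redundancy can be made concrete by counting how often each sample lands in a test set when the splits are drawn independently. This is a library-free sketch with assumed toy sizes (100 samples, 10 splits, 33% test); it does not reproduce how `ShuffleSplit` draws its indices.

```python
import random
from collections import Counter

n_samples, n_splits, test_size = 100, 10, 0.33
n_test = round(n_samples * test_size)  # 33 test samples per split

rng = random.Random(7)
test_counts = Counter()
for _ in range(n_splits):
    # Each repetition draws a fresh test set, independently of the previous ones.
    test_idx = rng.sample(range(n_samples), n_test)
    test_counts.update(test_idx)

never_tested = n_samples - len(test_counts)
most_tested = max(test_counts.values())
print(f'samples never in a test set: {never_tested}')
print(f'most times one sample was tested: {most_tested}')
```

With 330 test slots spread over 100 samples, some samples must be tested several times, and nothing guarantees that every sample is tested at all. In k-fold cross validation each sample is tested exactly once.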

In [22]:
# evaluate using shuffle split cross validation
n_splits = 10
test_size = 0.33
seed = 7
shuffle_split = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
model = LogisticRegression(max_iter=200)
results = cross_val_score(model, X, Y, cv=shuffle_split)
print(f'Accuracy: mean: {results.mean()*100.0:.3f}%, std: {results.std()*100.0:.3f}%')

Accuracy: mean: 76.535%, std: 2.235%


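In this case the mean accuracy is close to the k-fold result (76.5% versus 77.2%), and the scores vary less from split to split (a standard deviation of 2.2% versus 5.0%), in part because each test set here holds 33% of the data rather than 10%.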

#### What Techniques to Use When

- Generally *k-fold cross validation is the gold standard* for evaluating the performance of a machine learning algorithm on unseen data with k set to 3, 5, or 10.
- Using a train/test split is good for speed when using a slow algorithm and produces performance estimates with lower bias when using large datasets.
- Techniques like leave-one-out cross validation and repeated random splits can be useful intermediates when trying to balance variance in the estimated performance, model training speed and dataset size.

The best advice is to experiment and find a technique for your problem that is fast and produces reasonable estimates of performance that you can use to make decisions. If in doubt, use 10-fold cross validation.