# Chapter 06 - Algorithm Test Harnesses

A test harness provides a consistent way to evaluate machine learning algorithms on a dataset. It has three elements:

1. The resampling method used to split the dataset.
2. The machine learning algorithm to evaluate.
3. The performance measure by which to evaluate predictions.
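
As a minimal sketch of how these three elements fit together, the block below wires a resampling method, an algorithm, and a performance measure into one function. It does not use the chapter's helper modules: `holdout_split`, `majority_class`, `accuracy`, `run_harness`, and the toy dataset are hypothetical stand-ins invented for illustration.

```python
from random import Random


def holdout_split(dataset, train_fraction, rng):
    """Resampling method: shuffle a copy, then cut it into train and test."""
    rows = list(dataset)
    rng.shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def majority_class(train, test):
    """Algorithm: predict the most common training label for every test row."""
    labels = [row[-1] for row in train]
    most_common = max(set(labels), key=labels.count)
    return [most_common for _ in test]


def accuracy(actual, predicted):
    """Performance measure: percentage of predictions that match the labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual) * 100.0


def run_harness(dataset, resample, algorithm, metric):
    """Combine the three elements; labels are hidden from the algorithm."""
    train, test = resample(dataset)
    hidden = [row[:-1] + [None] for row in test]
    predicted = algorithm(train, hidden)
    actual = [row[-1] for row in test]
    return metric(actual, predicted)


# Hypothetical toy data: one feature column, class label in the last column.
toy = [[i, 0] for i in range(8)] + [[i, 1] for i in range(2)]
rng = Random(1)
score = run_harness(toy, lambda d: holdout_split(d, 0.6, rng),
                    majority_class, accuracy)
print('Accuracy: %.3f%%' % score)
```

Because each element is passed in as a function, any one of them can be swapped without touching the other two.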

In [7]:
from random import seed
from Codes.ch01_load_and_convert_data import load_csv, str_column_to_float
from Codes.ch03_resampling_methods import train_test_split, cross_validation_split
from Codes.ch04_evaluation_metrics import accuracy_metric
from Codes.ch05_baseline_models import zero_rule_algorithm_classification

### Train-Test Algorithm Test Harness

In [12]:
# Evaluate an algorithm using a train/test split
def evaluate_algorithm_train_test(dataset, algorithm, split, *args):
    train, test = train_test_split(dataset, split)
    test_set = list()
    for row in test:
        row_copy = list(row)
        row_copy[-1] = None
        test_set.append(row_copy)
    predicted = algorithm(train, test_set, *args)
    actual = [row[-1] for row in test]
    accuracy = accuracy_metric(actual, predicted)
    return accuracy

In [13]:
# Test the train/test harness 
seed(1)

# Load and prepare data
filename = './data/pima-indians-diabetes.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])):
    str_column_to_float(dataset, i)

# Evaluate algorithm
split = 0.6
accuracy = evaluate_algorithm_train_test(dataset, zero_rule_algorithm_classification, split)
print('Accuracy: %.3f%%' % (accuracy))

Accuracy: 67.427%


### Cross-Validation Algorithm Test Harness

In [9]:
# Evaluate an algorithm using a k-fold cross-validation split
def evaluate_algorithm_kfold(dataset, algorithm, n_folds, *args):
    folds = cross_validation_split(dataset, n_folds)
    scores = list()
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold)
        train_set = sum(train_set, [])
        test_set = list()
        for row in fold:
            row_copy = list(row)
            row_copy[-1] = None
            test_set.append(row_copy)
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        accuracy = accuracy_metric(actual, predicted)
        scores.append(accuracy)
    return scores

In [11]:
# Test the cross-validation harness
seed(1)

# Load and prepare data
filename = './data/pima-indians-diabetes.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])):
    str_column_to_float(dataset, i)

# Evaluate algorithm
n_folds = 5
scores = evaluate_algorithm_kfold(dataset, zero_rule_algorithm_classification, n_folds)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores) / len(scores)))

Scores: [62.091503267973856, 64.70588235294117, 64.70588235294117, 64.70588235294117, 69.28104575163398]
Mean Accuracy: 65.098%


## Future Work

* Parameterized Evaluation. Pass in the function used to evaluate predictions, allowing
you to seamlessly work with regression problems.
* Parameterized Resampling. Pass in the function used to calculate resampling splits,
allowing you to easily switch between the train-test and cross-validation methods.
* Standard Deviation Scores. Calculate the standard deviation to get an idea of the
spread of scores when evaluating algorithms using cross-validation.
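
The three extensions above could be combined roughly as follows. This is a sketch under stated assumptions, not the chapter's implementation: `evaluate_algorithm`, `kfold_splits`, `holdout_splits`, `rmse_metric`, `mean_regressor`, and `mean_and_stdev` are hypothetical names, the dataset is a made-up regression toy, and the k-fold helper simply drops rows that do not divide evenly into folds.

```python
from math import sqrt
from random import Random


def kfold_splits(dataset, n_folds, rng):
    """Yield (train, test) pairs for k-fold cross-validation."""
    rows = list(dataset)
    rng.shuffle(rows)
    fold_size = len(rows) // n_folds  # leftover rows are dropped in this sketch
    folds = [rows[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    for i, fold in enumerate(folds):
        train = [row for j, other in enumerate(folds) if j != i for row in other]
        yield train, fold


def holdout_splits(dataset, train_fraction, rng):
    """Yield a single (train, test) pair for a train/test split."""
    rows = list(dataset)
    rng.shuffle(rows)
    cut = int(len(rows) * train_fraction)
    yield rows[:cut], rows[cut:]


def rmse_metric(actual, predicted):
    """Root mean squared error, a common regression metric."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(total / len(actual))


def mean_regressor(train, test):
    """Baseline regressor: predict the mean training target for every row."""
    mean = sum(row[-1] for row in train) / len(train)
    return [mean for _ in test]


def evaluate_algorithm(dataset, algorithm, resample, metric, *args):
    """Harness with both the resampling method and the metric passed in."""
    scores = []
    for train, test in resample(dataset):
        hidden = [list(row[:-1]) + [None] for row in test]
        predicted = algorithm(train, hidden, *args)
        actual = [row[-1] for row in test]
        scores.append(metric(actual, predicted))
    return scores


def mean_and_stdev(scores):
    """Mean and sample standard deviation (0.0 when there is one score)."""
    mean = sum(scores) / len(scores)
    if len(scores) < 2:
        return mean, 0.0
    variance = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    return mean, sqrt(variance)


# Hypothetical regression data: target is twice the single feature.
data = [[float(x), 2.0 * x] for x in range(20)]
rng = Random(1)
scores = evaluate_algorithm(data, mean_regressor,
                            lambda d: kfold_splits(d, 5, rng), rmse_metric)
mean, stdev = mean_and_stdev(scores)
print('RMSE per fold: %s' % scores)
print('Mean RMSE: %.3f (+/- %.3f)' % (mean, stdev))
```

Switching to a train/test split only changes the `resample` argument, for example `lambda d: holdout_splits(d, 0.6, rng)`; the harness itself stays the same.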