# Classifier Evaluation using Bootstrap
Bootstrap can also be used to evaluate a classifier. Given the original data $D$, $k$ datasets $D_i$ are generated from $D$ by sampling with replacement; each $D_i$ is used to train a model $M_i$, which is then evaluated on the entire dataset $D$, yielding the evaluation $\theta_i$. The overall evaluation is reported as the mean and standard deviation of the $k$ values $\theta_i$. Note, however, that the estimates will be somewhat optimistic because the training and testing data overlap: a bootstrap sample of the same size as $D$ contains on average about 63.2% of the distinct examples in $D$, and all of them are also used for testing. Cross-validation does not suffer from this limitation since it keeps the training and testing sets disjoint.
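
The 63.2% figure can be checked empirically. The following is a minimal sketch using only the standard library; the dataset size `n` and the number of repetitions `n_repeats` are arbitrary values chosen for illustration, not taken from the datasets used later. Each of the $n$ rows is missed by a single draw with probability $1 - 1/n$, so it appears in the bootstrap sample with probability $1 - (1 - 1/n)^n$, which approaches $1 - 1/e \approx 0.632$ as $n$ grows.

```python
import random

rng = random.Random(0)   # local generator so the sketch is reproducible
n = 1000                 # hypothetical dataset size
n_repeats = 200          # number of bootstrap samples to average over

fractions = []
for _ in range(n_repeats):
    # draw n row indices with replacement and keep the distinct ones
    drawn = {rng.randrange(n) for _ in range(n)}
    fractions.append(len(drawn) / n)

mean_fraction = sum(fractions) / n_repeats
expected = 1 - (1 - 1 / n) ** n   # probability that a given row is drawn at least once

print(f"empirical: {mean_fraction:.3f}, theoretical: {expected:.3f}")
```

Both values should be close to 0.632, which is the share of the evaluation set that each model has already seen during training.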

In [4]:
import numpy as np
import pandas as pd
import random

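# seed both generators: bootstrap() below draws from Python's random module,
# which np.random.seed alone does not affect
np.random.seed(238476293)
random.seed(238476293)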

In [5]:
def bootstrap(X, y, ratio=1.0):
    
    # compute the number of rows of the generated dataset
    n_rows = int(X.shape[0]*ratio)
    
    # compute the number of columns
    n_cols = X.shape[1]
    
    # create the output dataset with all zero
    sampled_X = np.zeros((n_rows,n_cols))
    sampled_y = np.zeros(n_rows)
        
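    # randomly select a row from the original dataset (with replacement) and copy it to the output dataset;
    # the index must range over the rows of X, not over n_rows, which differs when ratio != 1.0
    for s in range(n_rows):
        sample_index = int(random.random()*X.shape[0])
        sampled_X[s,:] = X[sample_index,:]
        sampled_y[s] = y[sample_index]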
    
    return sampled_X, sampled_y

In [7]:
def bootstrap_evaluation(classifier, X, y, k, metrics):
    evaluation = []
    for i in range(k):
        bX,by = bootstrap(X,y)
        classifier.fit(bX,by)        
        yp = classifier.predict(X)
        evaluation.append(metrics(y,yp))
    return evaluation
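
One common way to reduce the optimism discussed above is to score each model only on the rows that were not drawn into its bootstrap sample (the out-of-bag rows). The sketch below is an illustrative variant, not the method used in the rest of this notebook. The names `oob_bootstrap_evaluation`, `TinyCentroidClassifier`, and `accuracy`, as well as the synthetic two-cluster data, are hypothetical and exist only to keep the example self-contained; any object with `fit` and `predict` methods could be passed instead.

```python
import copy
import numpy as np

def oob_bootstrap_evaluation(model, X, y, k, metric, seed=0):
    """Train on k bootstrap samples; score each model on its out-of-bag rows only."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scores = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)        # sample n row indices with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # rows that were never drawn
        if oob.size == 0:                       # only plausible for very small n
            continue
        m = copy.deepcopy(model)                # fresh copy so the fits do not interfere
        m.fit(X[idx], y[idx])
        scores.append(metric(y[oob], m.predict(X[oob])))
    return scores

class TinyCentroidClassifier:
    """Hypothetical stand-in classifier: predicts the class with the nearest centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # squared distance from every row to every class centroid
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# synthetic, well-separated two-class data (assumption, for illustration only)
demo_rng = np.random.default_rng(1)
X_demo = np.vstack([demo_rng.normal(0, 1, (50, 2)), demo_rng.normal(4, 1, (50, 2))])
y_demo = np.array([0] * 50 + [1] * 50)

scores = oob_bootstrap_evaluation(TinyCentroidClassifier(), X_demo, y_demo, 20, accuracy)
print(f"Out-of-bag evaluation {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

Because the test rows are disjoint from the training rows in every iteration, this estimate avoids the overlap between training and testing data, at the cost of evaluating each model on only about a third of the data.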

## Boston Housing Data
First, let us apply bootstrap evaluation to a regression problem, the Boston housing dataset.

In [8]:
# note: load_boston was removed in scikit-learn 1.2, so this cell requires an earlier version
from sklearn.datasets import load_boston
boston = load_boston()
X = boston.data
y = boston.target

In [9]:
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

evaluation = bootstrap_evaluation(LinearRegression(), X, y, 100, r2_score)

In [10]:
print("Bootstrap Evaluation %.3f +/- %.3f"%(np.array(evaluation).mean(),np.array(evaluation).std()))

Bootstrap Evaluation 0.731 +/- 0.005


In [11]:
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
xval_evaluation = cross_val_score(LinearRegression(), X, y, cv=KFold(n_splits=10, random_state=1234, shuffle=True))

In [12]:
print("Crossvalidation Evaluation %.3f +/- %.3f"%(np.array(xval_evaluation).mean(),np.array(xval_evaluation).std()))

Crossvalidation Evaluation 0.682 +/- 0.125


## Iris Data
Next, we can apply the same approach to a classification problem, the Iris dataset.

In [13]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
y = iris.target

classifier = LogisticRegression(max_iter=1000)

evaluation = bootstrap_evaluation(classifier, X, y, 100, accuracy_score)
print("Bootstrap Evaluation %.3f +/- %.3f"%(np.array(evaluation).mean(),np.array(evaluation).std()))

xval_evaluation = cross_val_score(classifier, X, y, cv=KFold(n_splits=10, random_state=1234, shuffle=True))
print("Crossvalidation Evaluation %.3f +/- %.3f"%(np.array(xval_evaluation).mean(),np.array(xval_evaluation).std()))

Bootstrap Evaluation 0.971 +/- 0.010
Crossvalidation Evaluation 0.967 +/- 0.045
