# Error Estimation

In this notebook, we will implement and test **error estimation approaches** for evaluating classifiers and/or learning algorithms.

First, we implement $k$-fold cross-validation with and without stratification.

Subsequently, we use nested $k$-fold cross-validation on an example data set to perform model selection.

### **Table of Contents**
1. [$k$-fold Cross-validation](#k-fold-cross-validation)
2. [Model Selection](#model-selection)

In [None]:
%load_ext autoreload
%autoreload 2

import matplotlib.pyplot as plt
import numpy as np

### **1. $k$-fold Cross-validation** <a class="anchor" id="k-fold-cross-validation"></a>

We implement the function [`cross_validation`](../e2ml/evaluation/_error_estimation.py) in the [`e2ml.evaluation`](../e2ml/evaluation) subpackage. Once the implementation is complete, we visualize and compare standard and stratified cross-validation.
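To make the difference between the two variants concrete, the following is a minimal sketch of how such a split could be computed. It is not the `e2ml` implementation: the name `kfold_indices_sketch` and the round-robin assignment are assumptions made for illustration, and only the parameter names (`sample_indices`, `n_folds`, `y`, `random_state`) and the `(train, test)` return value mirror the calls used in this notebook.

```python
import numpy as np


def kfold_indices_sketch(sample_indices, n_folds=5, y=None, random_state=None):
    """Hypothetical helper: split `sample_indices` into train/test folds.

    If `y` is given, the split is stratified, i.e., each class is spread
    as evenly as possible across the folds.
    """
    sample_indices = np.asarray(sample_indices)
    rng = np.random.default_rng(random_state)
    # fold_ids[i] stores the fold of the sample at position i.
    fold_ids = np.empty(len(sample_indices), dtype=int)
    if y is None:
        # Standard variant: shuffle all positions, then deal them round-robin.
        perm = rng.permutation(len(sample_indices))
        fold_ids[perm] = np.arange(len(sample_indices)) % n_folds
    else:
        # Stratified variant: deal the positions of each class separately.
        y = np.asarray(y)
        offset = 0  # continue dealing where the previous class stopped
        for class_y in np.unique(y):
            positions = rng.permutation(np.flatnonzero(y == class_y))
            fold_ids[positions] = (np.arange(len(positions)) + offset) % n_folds
            offset += len(positions)
    test = [sample_indices[fold_ids == k] for k in range(n_folds)]
    train = [sample_indices[fold_ids != k] for k in range(n_folds)]
    return train, test
```

In this sketch, `y` must be aligned with `sample_indices` position by position, and every sample appears in exactly one test fold.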

In [None]:
from e2ml.evaluation import cross_validation
# Generate artificial class labels.
y = np.zeros(100)
sample_indices = np.arange(len(y), dtype=int)
y[30:90] = 1
y[90:] = 2

# Visualize standard (k=3)-fold cross validation via a bar plot showing the
# class distribution within each fold.
# BEGIN SOLUTION

train, test = cross_validation(
    sample_indices=sample_indices, n_folds=3, random_state=0
)
class_distribution = []
plt.title("Standard $k=3$-fold Cross-validation")
for i, class_y in enumerate(np.unique(y)):
    y_values = []
    for t in test:
        y_values.append(np.sum(y[t] == class_y))
    class_distribution.append(y_values)
    plt.bar(np.arange(len(test)), y_values, bottom=np.sum(np.array(class_distribution)[:i], axis=0))
    plt.xticks([0, 1, 2])
plt.show()

# END SOLUTION
    
# Visualize stratified (k=3)-fold cross validation via a bar plot showing the class
# distribution within each fold.
# BEGIN SOLUTION

train, test = cross_validation(
    sample_indices=sample_indices, n_folds=3, random_state=0, y=y
)
class_distribution = []
plt.title("Stratified $k=3$-fold Cross-validation")
for i, class_y in enumerate(np.unique(y)):
    y_values = []
    for t in test:
        y_values.append(np.sum(y[t] == class_y))
    class_distribution.append(y_values)
    plt.bar(np.arange(len(test)), y_values, bottom=np.sum(np.array(class_distribution)[:i], axis=0))
    plt.xticks([0, 1, 2])
plt.show()

# END SOLUTION

### **2. Model Selection** <a class="anchor" id="model-selection"></a>

In the following, we perform a small evaluation study that includes model selection. Our goal is to compare the learning algorithms of a [*support vector classifier*](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC) (SVC) and a [*multi-layer perceptron*](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier) (MLP) on the [*breast cancer*](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer) data set. In each run, we generate 20 hyperparameter configurations according to one of the popular experimentation methods. The studied hyperparameters are the regularization parameter $C \in (0, 1000)$ (`C`) and the so-called bandwidth $\gamma \in (0, 1]$ (`gamma`) for the SVC, and the learning rate $\eta \in (0, 1]$ (`learning_rate_init`) and another regularization parameter $\alpha \in (0, 1)$ (`alpha`) for the MLP. Further, we use nested stratified $k=5$-fold cross-validation as the error-estimation approach. The zero-one loss serves as the performance measure, for which we report the empirical mean and standard deviation of the risk estimates.
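As a reminder of what is reported, the zero-one loss of a test fold with $n$ samples is the fraction of misclassified samples, $\frac{1}{n} \sum_{i=1}^{n} \mathbb{1}[y_i \neq \hat{y}_i]$, and the final result aggregates these per-fold risk estimates. The following is a minimal sketch under stated assumptions: the helper `zero_one_loss_sketch` is hypothetical and not the `e2ml.evaluation` function (only the keyword names `y_true` and `y_pred` match the calls in this notebook), and the labels are made up for illustration.

```python
import numpy as np


def zero_one_loss_sketch(y_true, y_pred):
    """Hypothetical helper: fraction of misclassified samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("`y_true` and `y_pred` must have the same shape.")
    return float(np.mean(y_true != y_pred))


# Made-up labels and predictions for two test folds (illustration only).
fold_risks = [
    zero_one_loss_sketch(y_true=[0, 1, 1, 0], y_pred=[0, 1, 0, 0]),  # 1 of 4 wrong
    zero_one_loss_sketch(y_true=[1, 1, 0, 0], y_pred=[1, 1, 0, 0]),  # 0 of 4 wrong
]

# Empirical mean and standard deviation of the per-fold risk estimates.
mean_risk = np.mean(fold_risks)
std_risk = np.std(fold_risks)  # NumPy's default is the population std (ddof=0)
print(f"{mean_risk:.3f} +- {std_risk:.3f}")
```

With only a handful of folds, the standard deviation is itself a rough estimate, so it is best read as an indication of the spread across folds.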

In [None]:
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_breast_cancer

# Load breast cancer data set.
X, y = load_breast_cancer(return_X_y=True)

# Perform evaluation study.
# BEGIN SOLUTION

from e2ml.experimentation import halton
from e2ml.preprocessing import StandardScaler
from e2ml.evaluation import zero_one_loss

# Define number of folds.
n_folds = 5

# Placeholder for empirical risks of SVC and MLP per fold in the outer loop.
risks_svc_outer, risks_mlp_outer = [], []

# Create hyperparameter configurations using the Halton sequence.
theta_svc_list = halton(n_samples=20, n_dimensions=2, bounds=[(0, 1000), (0, 1)])
theta_mlp_list = halton(n_samples=20, n_dimensions=2, bounds=[(0, 1), (0, 1)])

# Perform $k$-fold cross-validation as outer loop.
sample_indices = np.arange(len(y), dtype=int)
train_outer, test_outer = cross_validation(
    sample_indices=sample_indices, n_folds=n_folds, y=y, random_state=0
)
for tr_outer, te_outer in zip(train_outer, test_outer):
    # Perform $k$-fold cross-validation as inner loop.
    train_inner, test_inner = cross_validation(
        sample_indices=tr_outer, n_folds=n_folds, y=y[tr_outer], random_state=0
    )
    
    # Define initial best hyperparameters and risk estimates.
    theta_star_svc, theta_star_mlp = None, None
    minimum_risk_svc, minimum_risk_mlp = 1, 1
    
    # Perform model selection.
    for theta_svc, theta_mlp in zip(theta_svc_list, theta_mlp_list):
        # Placeholder for empirical risks of SVC and MLP per fold in the inner loop.
        risks_svc_inner, risks_mlp_inner = [], []
        
        for tr_inner, te_inner in zip(train_inner, test_inner):
            # Standardize data.
            sc_inner = StandardScaler().fit(X[tr_inner])
            X_tr_inner = sc_inner.transform(X[tr_inner]) 
            X_te_inner = sc_inner.transform(X[te_inner])
            
            # Fit SVC in the inner loop.
            svc_inner = SVC(
                C=theta_svc[0],
                gamma=theta_svc[1],
                random_state=0
            )
            svc_inner.fit(X_tr_inner, y[tr_inner])
            
            # Evaluate SVC in the inner loop.
            y_pred = svc_inner.predict(X_te_inner)
            risks_svc_inner.append(
                zero_one_loss(y_pred=y_pred, y_true=y[te_inner])
            )
            
            # Fit MLP in the inner loop.
            mlp_inner = MLPClassifier(
                learning_rate_init=theta_mlp[0],
                alpha=theta_mlp[1],
                random_state=0
            )
            mlp_inner.fit(X_tr_inner, y[tr_inner])
            
            # Evaluate MLP in the inner loop.
            y_pred = mlp_inner.predict(X_te_inner)
            risks_mlp_inner.append(
                zero_one_loss(y_pred=y_pred, y_true=y[te_inner])
            )
            
        # Update best hyperparameter configuration for SVC.
        if np.mean(risks_svc_inner) <= minimum_risk_svc:
            theta_star_svc = theta_svc
            minimum_risk_svc = np.mean(risks_svc_inner)
            
        # Update best hyperparameter configuration for MLP.
        if np.mean(risks_mlp_inner) <= minimum_risk_mlp:
            theta_star_mlp = theta_mlp
            minimum_risk_mlp = np.mean(risks_mlp_inner)
    
    # Standardize data in the outer loop.
    sc_outer = StandardScaler().fit(X[tr_outer])
    X_tr_outer = sc_outer.transform(X[tr_outer]) 
    X_te_outer = sc_outer.transform(X[te_outer])

    # Fit SVC in the outer loop.
    svc_outer = SVC(
        C=theta_star_svc[0],
        gamma=theta_star_svc[1],
        random_state=0
    )
    svc_outer.fit(X_tr_outer, y[tr_outer])

    # Evaluate SVC in the outer loop.
    y_pred = svc_outer.predict(X_te_outer)
    risks_svc_outer.append(
        zero_one_loss(y_pred=y_pred, y_true=y[te_outer])
    )
    
    # Fit MLP in the outer loop.
    mlp_outer = MLPClassifier(
        learning_rate_init=theta_star_mlp[0],
        alpha=theta_star_mlp[1], 
        random_state=0
    )
    mlp_outer.fit(X_tr_outer, y[tr_outer])

    # Evaluate MLP in the outer loop.
    y_pred = mlp_outer.predict(X_te_outer)
    risks_mlp_outer.append(
        zero_one_loss(y_pred=y_pred, y_true=y[te_outer])
    )
    
print(f"SVC: {np.mean(risks_svc_outer)} +- {np.std(risks_svc_outer)}")
print(f"MLP: {np.mean(risks_mlp_outer)} +- {np.std(risks_mlp_outer)}")
    
# END SOLUTION