# Bootstrapping Idiom
Training a neural network involves a number of hyperparameters that may be adjusted, several of which are introduced by regularization techniques. These include:
- the number of layers in the network
- the number of neurons in each layer
- the activation function used in each layer
- whether to use dropout, and if so, at what rate
- whether to use L1/L2 regularization, and if so, with what strength (the Lagrange multiplier of the penalty term)
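
Each of these choices multiplies the number of candidate configurations. As a minimal illustration, the sketch below assumes a hypothetical search space with only two values per hyperparameter. The names and values are illustrative and are not used elsewhere in this notebook. It enumerates the full grid:

```python
import itertools

# Hypothetical search space; names and values are illustrative only.
search_space = {
    "hidden_layers": [1, 2],
    "neurons_per_layer": [10, 20],
    "activation": ["relu", "tanh"],
    "dropout_rate": [0.0, 0.2],   # 0.0 stands for "no dropout"
    "l2_strength": [0.0, 1e-4],   # 0.0 stands for "no L2 penalty"
}

def hyperparameter_sets(space):
    """Yield every combination of the grid as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(hyperparameter_sets(search_space))
print(len(configs))  # 2**5 = 32 combinations
print(configs[0])
```

Even this small grid yields 32 hyperparameter sets, and each of them would have to be trained and scored. This is why a benchmark that is both cheap and stable matters.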

Measuring the effect of these hyperparameters is difficult. Training is stochastic (random weight initialization, random mini-batch order, random dropout masks), so training the exact same network setup twice may already produce very different results, independent of the hyperparameters. *Bootstrapping* can be an effective means of benchmarking sets of hyperparameters. It trains an *ensemble* of networks on many different splits and averages their scores, which mitigates some of this random contribution.
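
Averaging over many runs helps because the spread of an average of $n$ runs shrinks roughly like $\sigma/\sqrt{n}$. The minimal simulation below shows this trend. It assumes, hypothetically, that each training run returns a true score of 0.70 plus independent Gaussian noise with standard deviation 0.05:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers: true score of a hyperparameter set and the
# run-to-run noise caused by random initialization, batching, etc.
TRUE_SCORE, RUN_STD = 0.70, 0.05

def averaged_benchmark(n_runs):
    """Average score over n_runs simulated training runs."""
    return rng.normal(TRUE_SCORE, RUN_STD, size=n_runs).mean()

spread = {}
for n in (1, 10, 100):
    # repeat the whole benchmark 1000 times to measure how much it fluctuates
    estimates = [averaged_benchmark(n) for _ in range(1000)]
    spread[n] = np.std(estimates)
    print(f"{n:>3} runs: spread of the benchmark = {spread[n]:.4f}")
```

Real training runs are not independent Gaussian draws. This sketch therefore only illustrates the $1/\sqrt{n}$ trend, not the actual noise level of the networks below.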

Some recipes exist for training neural networks and determining the best set of hyperparameters, such as [this blog post by Andrej Karpathy](http://karpathy.github.io/2019/04/25/recipe/).

Similar to cross-validation, *bootstrapping* iterates over a number of splits into training and validation sets. Unlike the disjoint folds of cross-validation, however, bootstrapping draws a new random train/validation split every cycle, independently of previous cycles, so the validation sets of different cycles may overlap. This allows bootstrapping to continue for as many cycles as desired. It also means that, over enough cycles, the same rows will often reappear in several validation sets, and in principle an entire split could even be repeated.
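
The difference between fixed folds and freshly drawn splits can be checked directly with scikit-learn's `KFold` and `ShuffleSplit`, the splitter used for regression below. The following sketch uses 20 toy rows and counts how often each row lands in a validation set:

```python
import numpy as np
from sklearn.model_selection import KFold, ShuffleSplit

X = np.arange(20).reshape(-1, 1)  # 20 toy rows

def validation_counts(splitter):
    """How many times each row appears in a validation set."""
    counts = np.zeros(len(X), dtype=int)
    for train, test in splitter.split(X):
        # within a single split, training and validation rows never overlap
        assert np.intersect1d(train, test).size == 0
        counts[test] += 1
    return counts

kfold_counts = validation_counts(KFold(n_splits=5))
shuffle_counts = validation_counts(
    ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
)
print("KFold:       ", kfold_counts)    # every row exactly once
print("ShuffleSplit:", shuffle_counts)  # rows may repeat or be skipped
```

Each `ShuffleSplit` draw is independent, so `n_splits` can be raised as far as the compute budget allows. A single `KFold` run, by contrast, produces exactly `n_splits` disjoint folds.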

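We will use bootstrapping to benchmark hyperparameters. For each hyperparameter set, we train a fresh network on each of a specified number of *splits*, and then compare the average scores of the hyperparameter sets. The number of splits is not tied to a number of folds, so we can average over many more training runs. This approach should therefore be less prone to random fluctuation than, e.g., benchmarking with a single round of cross-validation.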
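
The benchmarking loop itself does not depend on neural networks. The hedged sketch below uses scikit-learn's `Ridge` regressor on synthetic data as a stand-in for a network. The data, the `bootstrap_benchmark` helper and the two `alpha` values are all hypothetical. It shows how two hyperparameter sets could be compared:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ShuffleSplit

# Synthetic stand-in data; not the dataset used in this notebook.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

def bootstrap_benchmark(make_model, n_splits=50, seed=414141):
    """Mean and std of validation RMSE over n_splits random splits."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.1, random_state=seed)
    scores = []
    for train, test in splitter.split(X):
        model = make_model()  # fresh model for every split
        model.fit(X[train], y[train])
        pred = model.predict(X[test])
        scores.append(np.sqrt(mean_squared_error(y[test], pred)))
    return np.mean(scores), np.std(scores)

# Two hypothetical hyperparameter sets (regularization strength alpha).
results = {
    alpha: bootstrap_benchmark(lambda a=alpha: Ridge(alpha=a))
    for alpha in (0.1, 100.0)
}
for alpha, (mean, std) in results.items():
    print(f"alpha={alpha}: mean RMSE {mean:.2f} (std {std:.2f})")
```

Every hyperparameter set uses the same `seed`, so both are scored on identical splits. Differences in the mean then reflect the hyperparameters rather than which rows happened to be drawn.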

Additionally, we will track the number of *epochs* each run needs before early stopping, as this may also give an indication of optimal hyperparameters. There is a caveat: the validation set used for early stopping may make the network's performance *seem* better than it really is. This is largely because we stop and evaluate on the same sample, which biases the score optimistically (ideally, we would use independent samples for each). However, since we compare averages over many runs, this should not present too large a problem.
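
The "independent samples" setup mentioned above can be sketched by carving a separate early-stopping set out of each training split. This is a minimal sketch: `n_rows = 2000` is a hypothetical size, roughly that of the dataset used below, and the extra `train_test_split` step is not part of this notebook's method:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, train_test_split

n_rows = 2000  # hypothetical size, roughly that of the dataset below
X_placeholder = np.zeros((n_rows, 1))  # only the row count matters here

# one bootstrap split, drawn as in the notebook
splitter = ShuffleSplit(n_splits=1, test_size=0.1, random_state=414141)
train, test = next(splitter.split(X_placeholder))

# carve a separate early-stopping set out of the training rows, so the rows
# that decide when to stop are never the rows used for scoring
fit_idx, stop_idx = train_test_split(train, test_size=0.1, random_state=0)

print(len(fit_idx), len(stop_idx), len(test))
```

With Keras, the stopping rows would then be passed as `validation_data=(x[stop_idx], y[stop_idx])` to `fit`, and the score computed on `x[test]` only. The notebook below keeps the simpler setup and stops and scores on the same rows.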

We define a helper function that formats a duration in seconds for display:

```python
import datetime

def hms_string(seconds):
    """Format a duration in seconds as H:MM:SS(.ffffff)."""
    return str(datetime.timedelta(seconds=seconds))
```

## Assembling the data
Since we will demonstrate the bootstrapping method on a few examples, we load the data once and apply some general preprocessing that all examples share:

```python
import pandas as pd

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA', '?']
)
df
```

| | id | job | area | income | aspect | subscriptions | dist_healthy | save_rate | dist_unhealthy | age | pop_dense | retail_dense | crime | product |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | vv | c | 50876.0 | 13.100000 | 1 | 9.017895 | 35 | 11.738935 | 49 | 0.885827 | 0.492126 | 0.071100 | b |
| 1 | 2 | kd | c | 60369.0 | 18.625000 | 2 | 7.766643 | 59 | 6.805396 | 51 | 0.874016 | 0.342520 | 0.400809 | c |
| 2 | 3 | pe | c | 55126.0 | 34.766667 | 1 | 3.632069 | 6 | 13.671772 | 44 | 0.944882 | 0.724409 | 0.207723 | b |
| 3 | 4 | 11 | c | 51690.0 | 15.808333 | 1 | 5.372942 | 16 | 4.333286 | 50 | 0.889764 | 0.444882 | 0.361216 | b |
| 4 | 5 | kl | d | 28347.0 | 40.941667 | 3 | 3.822477 | 20 | 5.967121 | 38 | 0.744094 | 0.661417 | 0.068033 | a |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 1996 | vv | c | 51017.0 | 38.233333 | 1 | 5.454545 | 34 | 14.013489 | 41 | 0.881890 | 0.744094 | 0.104838 | b |
| 1996 | 1997 | kl | d | 26576.0 | 33.358333 | 2 | 3.632069 | 20 | 8.380497 | 38 | 0.944882 | 0.877953 | 0.063851 | a |
| 1997 | 1998 | kl | d | 28595.0 | 39.425000 | 3 | 7.168218 | 99 | 4.626950 | 36 | 0.759843 | 0.744094 | 0.098703 | f |
| 1998 | 1999 | qp | c | 67949.0 | 5.733333 | 0 | 8.936292 | 26 | 3.281439 | 46 | 0.909449 | 0.598425 | 0.117803 | c |


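Generate dummy variables for the categorical columns, fill in missing incomes with the median, and standardize (z-score) selected numeric columns. A copy of the result is saved so that each example can start from the same data: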

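```python
import scipy.stats

# one-hot encode the categorical columns and drop the originals
df = pd.concat(
    [
        df,
        pd.get_dummies(df['job'], prefix='job'),
        pd.get_dummies(df['area'], prefix='area')
    ],
    axis=1
)
df.drop(['job', 'area'], axis=1, inplace=True)

# fill missing incomes with the median
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# standardize selected numeric columns to zero mean and unit variance
for col in ['income', 'aspect', 'save_rate', 'subscriptions']:
    df[col] = scipy.stats.zscore(df[col])

# keep a pristine copy so each example can start from the same data
_df = df.copy()

def renew_data():
    return _df.copy()
```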
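
`scipy.stats.zscore` subtracts the mean and divides by the population standard deviation (`ddof=0` by default). A quick check on the first five incomes shown in the table above follows. In the notebook itself, the statistics are computed over the whole column, so the actual values differ:

```python
import numpy as np
import scipy.stats

# first five income values from the table above
income = np.array([50876.0, 60369.0, 55126.0, 51690.0, 28347.0])

manual = (income - income.mean()) / income.std()  # np.std uses ddof=0
z = scipy.stats.zscore(income)

print(np.round(z, 3))
assert np.allclose(manual, z)
```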

## Defining the environment
Since we will perform much the same task for different models, we define a helper function that runs one bootstrap step (training, evaluation and bookkeeping), which we can call in a loop:

```python
import tensorflow as tf
import numpy as np
import time

def bootstrap_step(x, y, model, train, test, score_, mean_store, epoch_store):
    """Train and evaluate `model` on one bootstrap split.

    score_:      callback score_(pred, y_test) returning a float
    mean_store:  list collecting the score of every split so far
    epoch_store: list collecting the stopping epoch of every split so far
    Returns a formatted info string with the running statistics.
    """
    start_time = time.time()

    x_train, y_train = x[train], y[train]
    x_test, y_test = x[test], y[test]

    # stop once val_loss has not improved by at least 1e-3 for 5 epochs,
    # then roll back to the best weights seen
    monitor = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        min_delta=1e-3,
        patience=5,
        verbose=0,
        mode='min',
        restore_best_weights=True
    )

    model.fit(
        x_train, y_train,
        validation_data=(x_test, y_test),
        callbacks=[monitor],
        verbose=0,
        epochs=1000
    )

    # 0-based epoch at which training stopped (0 if it never stopped early)
    epochs = monitor.stopped_epoch
    epoch_store.append(epochs)

    pred = model.predict(x_test, verbose=0)

    # calculate score and store
    score = score_(pred, y_test)
    mean_store.append(score)

    # running statistics over all splits so far
    score_mean = np.mean(mean_store)
    score_std = np.std(mean_store)
    epoch_mean = np.mean(epoch_store)

    duration = time.time() - start_time
    return (
        f"score: {score:.3f} | mean: {score_mean:.3f} | std: {score_std:.3f}\n"
        f"epochs: {epochs} | mean: {int(epoch_mean)} | time: {hms_string(duration)}"
    )
```
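
The `epochs` value reported by `bootstrap_step` comes from the patience rule configured above (`patience=5`, `min_delta=1e-3`, `mode='min'`). The following is a simplified, framework-free version of that rule, for illustration only. It approximates the behaviour of Keras's `EarlyStopping` but is not its actual source, and the `early_stop_epochs` name is hypothetical:

```python
def early_stop_epochs(val_losses, patience=5, min_delta=1e-3):
    """Simplified patience rule for a loss that should decrease.

    Returns (stopped_epoch, best_epoch), both 0-based. stopped_epoch is 0
    if the rule never triggers, matching how `stopped_epoch` is read above.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:  # improved by more than min_delta
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return 0, best_epoch

# the loss plateaus after epoch 2, so training stops 5 epochs later
losses = [1.0, 0.8, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
print(early_stop_epochs(losses))  # (7, 2)
```

With `restore_best_weights=True`, the returned model corresponds to the best epoch. In this simplified rule, the best epoch lies `patience` epochs before the stopping epoch, so the logged epoch counts overstate the useful training length by about the patience.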

## Bootstrapping Regression
Regression bootstraps use the `ShuffleSplit` class which, like `KFold`, does not stratify the train/validation splits. We finish preparing the data for regression, now predicting `age`:

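```python
df = renew_data()

# one-hot encode the product column, which is now a feature
df = pd.concat(
    [df, pd.get_dummies(df['product'], prefix="product")],
    axis=1
)
df.drop('product', axis=1, inplace=True)

# predict age from all remaining columns except the row id
x_cols = df.columns.drop('age').drop('id')
# cast to float32: recent pandas versions return boolean dummy columns,
# which would otherwise turn .values into an object array Keras cannot ingest
x = df[x_cols].values.astype(np.float32)
y = df['age'].values
```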

Next, we define the model and the scoring function (root mean squared error):

```python
SPLITS = 50

import sklearn.metrics
import sklearn.model_selection

def new_model():
    # small fully connected network with a single linear output for regression
    model = tf.keras.models.Sequential([
        tf.keras.Input(shape=(x.shape[1],)),
        tf.keras.layers.Dense(20, activation='relu'),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(1)
    ])

    model.compile(
        loss='mean_squared_error',
        optimizer='adam'
    )
    return model


def score_func(pred, y_test):
    # root mean squared error, in the units of the target (age)
    return np.sqrt(
        sklearn.metrics.mean_squared_error(y_test, pred)
    )
```

Finally, we run the bootstrap, training a new model on each split:

```python
boot = sklearn.model_selection.ShuffleSplit(
    n_splits=SPLITS,
    test_size=0.1,
    random_state=414141
)

mean_benchmark = []
epochs_needed = []

for counter, (train, test) in enumerate(boot.split(x), start=1):
    model = new_model()  # fresh, untrained network for every split

    info = bootstrap_step(
        x, y,
        model,
        train, test,
        score_func,
        mean_benchmark,
        epochs_needed
    )

    print(f"--- trial {counter} ---------------------")
    print(info)
```

```text
--- trial 1 ---------------------
score: 0.561 | mean: 0.561 | std: 0.000
epochs: 120 | mean: 120 | time: 0:00:15.275265
--- trial 2 ---------------------
score: 0.597 | mean: 0.579 | std: 0.018
epochs: 116 | mean: 118 | time: 0:00:07.917888
--- trial 3 ---------------------
score: 1.033 | mean: 0.730 | std: 0.215
epochs: 94 | mean: 110 | time: 0:00:06.481990
--- trial 4 ---------------------
score: 0.993 | mean: 0.796 | std: 0.218
epochs: 95 | mean: 106 | time: 0:00:06.642428
--- trial 5 ---------------------
score: 1.151 | mean: 0.867 | std: 0.241
epochs: 104 | mean: 105 | time: 0:00:07.310214
--- trial 6 ---------------------
score: 0.508 | mean: 0.807 | std: 0.258
epochs: 113 | mean: 107 | time: 0:00:07.695861
--- trial 7 ---------------------
score: 0.683 | mean: 0.790 | std: 0.242
epochs: 151 | mean: 113 | time: 0:00:10.125791
--- trial 8 ---------------------
score: 0.797 | mean: 0.791 | std: 0.227
epochs: 100 | mean: 111 | time: 0:00:07.049929
--- trial 9 ---------------------
```
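
The running `std` above describes how much individual runs vary. When comparing two hyperparameter sets, the uncertainty of the *mean* itself is often the more useful number. This is the standard error $s/\sqrt{n}$. The minimal sketch below uses the eight regression scores printed above. Because the printed scores are rounded, the mean comes out as 0.790 rather than the logged 0.791:

```python
import numpy as np

# the eight regression scores printed above (rounded to three decimals)
scores = np.array([0.561, 0.597, 1.033, 0.993, 1.151, 0.508, 0.683, 0.797])

mean = scores.mean()
std = scores.std()  # population std, as in bootstrap_step
sem = scores.std(ddof=1) / np.sqrt(len(scores))  # standard error of the mean

print(f"mean: {mean:.3f} | std: {std:.3f} | sem: {sem:.3f}")
```

As more splits are added, the standard error keeps shrinking, whereas the run-to-run standard deviation tends to settle around a fixed value.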


## Bootstrapping Classification
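The approach to classification bootstrapping is very similar to regression. The only difference is that we now use `StratifiedShuffleSplit`, which preserves the class proportions of the target in every train/validation split, for the same reasons as with stratified $k$-fold cross-validation. This time, `product` is the target and `age` becomes a feature: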

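```python
df = renew_data()

# age is now a feature, so standardize it like the other numeric columns
df['age'] = scipy.stats.zscore(df['age'])

x_cols = df.columns.drop('product').drop('id')
# cast to float32 (dummy columns may be boolean in recent pandas)
x = df[x_cols].values.astype(np.float32)

# one-hot encode the target classes
dummies = pd.get_dummies(df['product'])
products = dummies.columns
y = dummies.values.astype(np.float32)
```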
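
The effect of stratification shows up clearly on a small, hypothetical imbalanced label set: 90 rows of class `a` and 10 of class `b`, not the `product` column. The sketch below compares the share of the minority class in each validation set:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, StratifiedShuffleSplit

# Hypothetical imbalanced labels: 90 of class "a", 10 of class "b".
labels = np.array(["a"] * 90 + ["b"] * 10)
X = np.zeros((len(labels), 1))  # features are irrelevant to the split itself

def minority_share(splitter):
    """Fraction of class 'b' in each validation set."""
    return [np.mean(labels[test] == "b") for _, test in splitter.split(X, labels)]

plain = minority_share(ShuffleSplit(n_splits=5, test_size=0.2, random_state=0))
strat = minority_share(
    StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
)
print("ShuffleSplit:          ", plain)
print("StratifiedShuffleSplit:", strat)
```

With `StratifiedShuffleSplit`, every validation set here contains exactly two `b` rows (10% of 20). Plain `ShuffleSplit` may draw more or fewer, which adds noise to a classification score such as log loss.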

Again, we define our model construction and scoring functions:

```python
SPLITS = 50

import sklearn.metrics
import sklearn.model_selection

def new_model():
    # softmax output with one unit per product class
    model = tf.keras.models.Sequential([
        tf.keras.Input(shape=(x.shape[1],)),
        tf.keras.layers.Dense(50, activation='relu'),
        tf.keras.layers.Dense(25, activation='relu'),
        tf.keras.layers.Dense(y.shape[1], activation='softmax')
    ])

    model.compile(
        loss='categorical_crossentropy',
        optimizer='adam'
    )
    return model


def score_func(pred, y_test):
    # convert one-hot targets back to class indices
    y_compare = np.argmax(y_test, axis=1)
    # pass all class labels explicitly in case a class is absent from a split
    return sklearn.metrics.log_loss(
        y_compare, pred, labels=np.arange(pred.shape[1])
    )
```
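
The log loss returned by `score_func` is the mean negative log-probability the network assigns to the true class. Here is a small, hand-checkable example with made-up labels and probabilities:

```python
import numpy as np
import sklearn.metrics

# made-up true class indices and predicted class probabilities
y_true = np.array([0, 2, 1])
pred = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])

# probability assigned to the true class of each row: 0.7, 0.6, 0.5
p_true = pred[np.arange(len(y_true)), y_true]
manual = -np.mean(np.log(p_true))

assert np.isclose(manual, sklearn.metrics.log_loss(y_true, pred))
print(round(manual, 4))
```

Lower is better. Confident wrong predictions are penalized heavily, since $-\log p$ grows without bound as $p \to 0$.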

Now we can run the bootstrap just as before, except with a different splitter class, which also needs the class labels to stratify on:

```python
boot = sklearn.model_selection.StratifiedShuffleSplit(
    n_splits=SPLITS,
    test_size=0.1,
    random_state=414141
)

mean_benchmark = []
epochs_needed = []

for counter, (train, test) in enumerate(boot.split(x, df['product']), start=1):
    model = new_model()  # fresh, untrained network for every split

    info = bootstrap_step(
        x, y,
        model,
        train, test,
        score_func,
        mean_benchmark,
        epochs_needed
    )

    print(f"--- trial {counter} ---------------------")
    print(info)
```

```text
--- trial 1 ---------------------
score: 0.697 | mean: 0.697 | std: 0.000
epochs: 40 | mean: 40 | time: 0:00:03.414883
--- trial 2 ---------------------
score: 0.653 | mean: 0.675 | std: 0.022
epochs: 28 | mean: 34 | time: 0:00:02.359109
--- trial 3 ---------------------
score: 0.685 | mean: 0.678 | std: 0.019
epochs: 30 | mean: 32 | time: 0:00:02.658177
--- trial 4 ---------------------
score: 0.618 | mean: 0.663 | std: 0.031
epochs: 41 | mean: 34 | time: 0:00:03.514177
--- trial 5 ---------------------
score: 0.619 | mean: 0.654 | std: 0.033
epochs: 39 | mean: 35 | time: 0:00:03.241857
--- trial 6 ---------------------
score: 0.705 | mean: 0.663 | std: 0.035
epochs: 17 | mean: 32 | time: 0:00:01.670558
--- trial 7 ---------------------
score: 0.713 | mean: 0.670 | std: 0.037
epochs: 27 | mean: 31 | time: 0:00:02.581623
--- trial 8 ---------------------
score: 0.740 | mean: 0.679 | std: 0.042
epochs: 14 | mean: 29 | time: 0:00:01.436481
--- trial 9 ---------------------
score: 0.577 |
```