## Cross validation example

Cross validation can provide a better estimate of performance than a single split of your dataset. We have often observed that running Glimr with a single split produces a configuration that is highly overfit to that split's validation set and that generalizes poorly to independent testing data. Glimr provides tools to perform cross validation to address this.

When performing cross validation, each model configuration is run in multiple trials, one for each cross-validation fold. Post-experiment analysis can be used to identify the model configuration with the best average performance, or to build ensembles of models trained on different portions of the data.
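As a rough illustration of the post-experiment step, the sketch below groups per-fold validation scores by configuration and selects the configuration with the highest mean. The `trials` records, their field names, and the scores are invented for illustration and are not Glimr output.

```python
from collections import defaultdict
from statistics import mean

# hypothetical trial records: one entry per (configuration, fold) pair
trials = [
    {"config_id": "A", "cv_index": 0, "auc": 0.91},
    {"config_id": "A", "cv_index": 1, "auc": 0.89},
    {"config_id": "A", "cv_index": 2, "auc": 0.90},
    {"config_id": "B", "cv_index": 0, "auc": 0.97},
    {"config_id": "B", "cv_index": 1, "auc": 0.78},
    {"config_id": "B", "cv_index": 2, "auc": 0.80},
]

# collect the per-fold scores of each configuration
scores = defaultdict(list)
for trial in trials:
    scores[trial["config_id"]].append(trial["auc"])

# average across folds, then select the best configuration
mean_scores = {config_id: mean(values) for config_id, values in scores.items()}
best_config = max(mean_scores, key=mean_scores.get)
```

In this toy data, configuration B has the best single-fold score but A has the better average, which is the kind of case a single split can get wrong.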

Revisiting the MNIST example, we demonstrate the formulation of cross validation dataloaders and the experiment analysis tools.

In [1]:
!pip install ../../glimr


# Create a cross validation data loader

Cross validation requires a dataloader that accepts `cv_index` and `cv_folds` arguments that represent the fold index and number of folds. The `Search` class will populate your data search space with these arguments automatically.

The data loader below uses stratified k-fold cross validation to build class-balanced folds. Since each trial runs a separate fold and rebuilds the split independently, random arguments like the split seed must be fixed across trials so that the folds remain consistent.

In [15]:
import numpy as np
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold


def cv_dataloader(batch_size, random_brightness, max_delta, cv_index, cv_folds):
    """Cross-validation MNIST data loader.

    Parameters
    ----------
    batch_size : int
        The number of samples to batch.
    random_brightness : bool
        Whether to apply random brightness augmentation.
    max_delta : float
        The random brightness augmentation parameter.
    cv_index : int
        The index of the requested fold.
    cv_folds : int
        The number of folds in the cross validation.

    Returns
    -------
    train_ds : tf.data.Dataset
        A batched training set for fold `cv_index` used to build models.
    validation_ds : tf.data.Dataset
        A batched validation set for fold `cv_index` used to evaluate models.
    """

    # load mnist data
    train, validation = tf.keras.datasets.mnist.load_data(path="mnist.npz")

    # combine training, validation sets
    merged = (
        np.concatenate((train[0], validation[0]), axis=0),
        np.concatenate((train[1], validation[1]), axis=0),
    )

    # flattening function
    def mnist_flat(features):
        return features.reshape(
            features.shape[0], features.shape[1] * features.shape[2]
        )

    # stratified k-fold cross validation with a fixed seed so that every
    # trial sees the same partition of the data
    skf = StratifiedKFold(n_splits=cv_folds, shuffle=True, random_state=0)
    train_index, validation_index = list(skf.split(merged[0], merged[1]))[cv_index]

    # extract features, labels
    train_features = tf.cast(mnist_flat(merged[0][train_index]), tf.float32) / 255.0
    train_labels = merged[1][train_index]
    validation_features = (
        tf.cast(mnist_flat(merged[0][validation_index]), tf.float32) / 255.0
    )
    validation_labels = merged[1][validation_index]

    # build datasets
    train_ds = tf.data.Dataset.from_tensor_slices(
        (train_features, {"mnist": tf.one_hot(train_labels, 10)})
    )
    validation_ds = tf.data.Dataset.from_tensor_slices(
        (validation_features, {"mnist": tf.one_hot(validation_labels, 10)})
    )

    # shuffle the training set, then batch both sets
    train_ds = train_ds.shuffle(len(train_labels), reshuffle_each_iteration=True)
    train_ds = train_ds.batch(batch_size)
    validation_ds = validation_ds.batch(batch_size)

    # apply augmentation
    if random_brightness:
        train_ds = train_ds.map(
            lambda x, y: (tf.image.random_brightness(x, max_delta), y)
        )

    return train_ds, validation_ds
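
The need for a fixed split seed can be checked directly. The sketch below uses toy labels rather than MNIST, and the `validation_indices` helper is hypothetical: it recomputes the split from scratch for each fold, as separate trials would. It then confirms that the validation folds are still class-balanced and partition the data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# toy labels: 3 classes with 10 samples each (illustrative, not MNIST)
labels = np.repeat([0, 1, 2], 10)
features = np.zeros((len(labels), 1))


def validation_indices(cv_index, cv_folds, seed=0):
    """Return the validation indices of fold `cv_index`, as one trial would."""
    skf = StratifiedKFold(n_splits=cv_folds, shuffle=True, random_state=seed)
    return list(skf.split(features, labels))[cv_index][1]


# each "trial" computes its own split, yet the folds still partition the data
folds = [validation_indices(i, cv_folds=5) for i in range(5)]
combined = np.sort(np.concatenate(folds))
```

If the seed varied between trials, the validation folds could overlap and some samples might never be used for validation.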

# Setting up the search space and model building function

The search space and model building function are not impacted by the choice to use cross validation. Reuse everything from the starter example.

In [2]:
from glimr.optimization import optimization_space
from pprint import pprint
from ray import tune
import tensorflow as tf

# define the possible layer activations
activations = tune.choice(
    ["elu", "gelu", "linear", "relu", "selu", "sigmoid", "softplus"]
)

# define the layer 1 hyperparameters
layer1 = {
    "activation": activations,
    "dropout": tune.quniform(0.0, 0.2, 0.05),
    "units": tune.choice([64, 48, 32, 16]),
}

# set the loss as a hyperparameter
loss = tune.choice(
    [
        {"name": "categorical_hinge", "loss": tf.keras.losses.CategoricalHinge},
        {
            "name": "categorical_crossentropy",
            "loss": tf.keras.losses.CategoricalCrossentropy,
            "kwargs": {"label_smoothing": tune.quniform(0.0, 0.2, 0.01)},
        },
    ]
)

# use a fixed loss weight
loss_weight = (1.0,)

# set fixed metrics for reporting to Ray Tune
metrics = {
    "name": "auc",
    "metric": tf.keras.metrics.AUC,
    "kwargs": {"from_logits": True},
}

# define the task (uses the loss, loss weight, and metrics defined above)
task = {
    "activation": activations,
    "dropout": tune.quniform(0.0, 0.2, 0.05),
    "units": 10,
    "loss": loss,
    "loss_weight": loss_weight,
    "metrics": metrics,
}

# optimizer search space
optimization = optimization_space()

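# data loader keyword arguments to control loading, augmentation, and batching
data = {
    "batch_size": tune.choice([32, 64, 128]),
    # whether to perform random brightness transformation
    "random_brightness": tune.choice([True, False]),
    "max_delta": tune.quniform(0.01, 0.15, 0.01),
}

# assemble the search space passed to `Search`; the top-level keys match those
# read by the builder and reported by Ray Tune, with a single task named "mnist"
space = {
    "layer1": layer1,
    "optimization": optimization,
    "tasks": {"mnist": task},
    "data": data,
}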


from glimr.keras import keras_losses, keras_metrics


def builder(config):
    # a helper function for building layers
    def _build_layer(x, units, activation, dropout, name):
        # dense layer
        x = tf.keras.layers.Dense(units, activation=activation, name=name)(x)

        # add dropout if necessary
        if dropout > 0.0:
            x = tf.keras.layers.Dropout(dropout)(x)

        return x

    # create input layer
    input_layer = tf.keras.Input([784], name="input")

    # build layer 1
    x = _build_layer(
        input_layer,
        config["layer1"]["units"],
        config["layer1"]["activation"],
        config["layer1"]["dropout"],
        "layer1",
    )

    # build output / task layer
    task_name = list(config["tasks"].keys())[0]
    output = _build_layer(
        x,
        config["tasks"][task_name]["units"],
        config["tasks"][task_name]["activation"],
        config["tasks"][task_name]["dropout"],
        task_name,
    )

    # build named output dict
    named = {f"{task_name}": output}

    # create model
    model = tf.keras.Model(inputs=input_layer, outputs=named)

    # create a loss dictionary
    losses, loss_weights = keras_losses(config)

    # create a metric dictionary
    metrics = keras_metrics(config)

    return model, losses, loss_weights, metrics

# Using Search with `cv_folds`

Creating a `Search` instance with the `cv_folds` argument is all that is needed to instruct `ray.tune` to perform a cross validation.

Since `cv_folds` trials will be run for each configuration, the total number of trials will be `cv_folds` * `num_samples`.
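
The trial count can be illustrated with a small sketch that pairs every sampled configuration with every fold index. The `sampled` configurations are hypothetical stand-ins for `num_samples` draws from the search space. This only illustrates the bookkeeping; it does not show how `Search` or Ray Tune actually schedule trials.

```python
from itertools import product

cv_folds = 5

# hypothetical sampled configurations, standing in for `num_samples` draws
sampled = [{"batch_size": 32}, {"batch_size": 64}, {"batch_size": 128}]
num_samples = len(sampled)

# one trial per (configuration, fold) pair
trials = [
    {"data": {**config, "cv_index": cv_index, "cv_folds": cv_folds}}
    for config, cv_index in product(sampled, range(cv_folds))
]
```

Each configuration appears once per fold, so `len(trials)` equals `cv_folds * num_samples`.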

In [17]:
import contextlib
from glimr.search import Search
import os
import tempfile

# pass `cv_folds` parameter to Search for cross validation
tuner = Search(space, builder, cv_dataloader, "mnist_auc", cv_folds=5)

# make a temporary directory to store outputs - cleanup at end
temp_dir = tempfile.TemporaryDirectory()

# run trials using default settings, discarding stderr output
with open(os.devnull, "w") as devnull, contextlib.redirect_stderr(devnull):
    results = tuner.experiment(local_dir=temp_dir.name, name="default", num_samples=10)

Current time: 2023-12-13 20:06:23
Running for: 01:14:18.50
Memory: 13.4/32.0 GiB

Trial name,status,loc,data/batch_size,data/cv_index,data/max_delta,data/random_brightness,layer1/activation,layer1/dropout,layer1/units,optimization/beta_1,optimization/beta_2,optimization/ema_momentum,optimization/ema_overwrite_frequency,optimization/learning_rate,optimization/method,optimization/momentum,optimization/rho,optimization/use_ema,tasks/mnist/activation,tasks/mnist/dropout,tasks/mnist/loss,tasks/mnist/loss/kwargs/label_smoothing,iter,total time (s),mnist_auc,mnist_loss
trainable_001fa_00000,TERMINATED,127.0.0.1:17719,32,0,0.05,True,sigmoid,0.05,64,0.87,0.64,0.9,3.0,0.00202,adam,0.03,0.67,True,elu,0.2,{'name': 'categ_a880,0.17,10,383.852,0.715866,6.70684
trainable_001fa_00001,TERMINATED,127.0.0.1:17720,128,1,0.04,False,elu,0.15,32,0.73,0.77,0.98,5.0,0.00082,sgd,0.04,0.51,False,gelu,0.0,{'name': 'categ_f680,0.02,17,156.834,0.901089,2.87752
trainable_001fa_00002,TERMINATED,127.0.0.1:17721,32,2,0.04,False,linear,0.15,64,0.58,0.72,0.95,1.0,0.00801,sgd,0.0,0.55,False,gelu,0.2,{'name': 'categ_6840,0.16,17,613.747,0.826779,1.78796
trainable_001fa_00003,TERMINATED,127.0.0.1:17722,32,3,0.15,False,gelu,0.0,16,0.87,0.98,0.92,4.0,0.00283,adam,0.01,0.67,False,softplus,0.05,{'name': 'categ_b800,0.19,8,301.361,0.507762,1.14431
trainable_001fa_00004,TERMINATED,127.0.0.1:17723,128,4,0.04,True,softplus,0.05,16,0.75,0.86,0.95,1.0,0.00983,adagrad,0.05,0.87,True,sigmoid,0.05,{'name': 'categ_0780,0.0,4,42.2716,0.905062,0.454144
trainable_001fa_00005,TERMINATED,127.0.0.1:17724,128,0,0.08,False,softplus,0.1,16,0.81,0.67,0.97,3.0,0.00677,sgd,0.03,0.52,False,elu,0.15,{'name': 'categ_ad80,,4,42.256,0.977813,0.343119
trainable_001fa_00006,TERMINATED,127.0.0.1:17725,32,1,0.03,True,linear,0.1,16,0.95,0.76,0.96,5.0,0.00958,sgd,0.08,0.74,False,gelu,0.2,{'name': 'categ_5580,,4,141.197,0.940969,0.324634
trainable_001fa_00007,TERMINATED,127.0.0.1:17726,128,2,0.14,False,selu,0.1,48,0.82,0.81,0.91,4.0,0.00015,sgd,0.02,0.58,False,selu,0.2,{'name': 'categ_0bc0,0.1,10,99.2069,0.396754,2.23903
trainable_001fa_00008,TERMINATED,127.0.0.1:17747,64,3,0.15,False,sigmoid,0.1,16,0.79,0.64,0.91,3.0,0.00169,adadelta,0.02,0.84,False,softplus,0.05,{'name': 'categ_96c0,,10,182.039,0.697013,1.08685
trainable_001fa_00009,TERMINATED,127.0.0.1:17748,64,4,0.08,False,elu,0.0,64,0.82,0.92,0.9,2.0,0.00861,adadelta,0.08,0.72,False,gelu,0.05,{'name': 'categ_2cc0,0.02,6,110.611,0.66408,7.21525



In [72]:
from copy import deepcopy
import json

from glimr.analysis import _parse_experiment, _checkpoints, _filter_checkpoints

# load the experiment results
exp_dir = os.path.join(temp_dir.name, "default")
metric = "mnist_auc"
df = _parse_experiment(exp_dir)

# compare the number of unique learning rates to the number of trials
rates = [c["optimization"]["learning_rate"] for c in df["config"]]
print(len(set(rates)))
print(len(rates))


def _enumerate_configs(df):
    """Add a `config_enum` column identifying trials that share a configuration."""
    # copy the configurations and drop the fold index, which differs across folds
    cleaned = [deepcopy(c) for c in df["config"]]
    for clean in cleaned:
        del clean["data"]["cv_index"]

    # number the distinct configurations in order of first appearance
    keys = [json.dumps(clean) for clean in cleaned]
    mapping = {}
    for key in keys:
        if key not in mapping:
            mapping[key] = len(mapping) + 1
    df["config_enum"] = [mapping[key] for key in keys]
    return df

221
250
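
The idea behind `_enumerate_configs` can be seen on a few toy configurations. The sketch below is not part of Glimr and the `configs` values are hypothetical. It removes `cv_index`, serializes what remains, and assigns the same number to trials whose remaining settings match. Passing `sort_keys=True` is an addition here so that key order does not affect the comparison.

```python
import json
from copy import deepcopy

# hypothetical trial configurations: two folds of one configuration, one of another
configs = [
    {"data": {"batch_size": 32, "cv_index": 0}, "layer1": {"units": 64}},
    {"data": {"batch_size": 32, "cv_index": 1}, "layer1": {"units": 64}},
    {"data": {"batch_size": 128, "cv_index": 0}, "layer1": {"units": 16}},
]

mapping = {}
config_enum = []
for config in configs:
    clean = deepcopy(config)
    del clean["data"]["cv_index"]  # ignore the fold when comparing
    key = json.dumps(clean, sort_keys=True)  # canonical, hashable form
    config_enum.append(mapping.setdefault(key, len(mapping) + 1))

# config_enum == [1, 1, 2]
```

Trials that share a number can then be grouped to average a metric across folds.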
