## MNIST example

This notebook demonstrates an end-to-end application of the glimr package.

Using MNIST classification as a simple example, we demonstrate the steps to create a search space, model builder, and dataloader for use in tuning. This provides a concrete example of topics like using the `glimr.utils` and `glimr.keras` functions to create hyperparameters and to correctly name losses and metrics for training and reporting.

This is followed by a demonstration of the `Search` class that shows how to set up and run experiments.

In [None]:
!pip install ../../glimr

# Creating the search space

First, let's create a search space for a simple two-layer network that performs multiclass MNIST classification.

This search space will consist of hyperparameters for each layer, for loss, for gradient optimization, and for data loading and preprocessing. Below we build these components incrementally and examine each in detail.

### The first layer

For the first layer we define the possible layer activations, the dropout rate, and the number of units. When defining a range hyperparameter, we use `tune.quniform`, which creates a quantized floating-point hyperparameter. When choosing among discrete options, we use `tune.choice`, which samples uniformly from the listed options.
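To make the two samplers concrete, here is a stdlib-only sketch of the values they produce. It assumes `tune.quniform(lower, upper, q)` draws uniformly and rounds to the nearest multiple of `q`, and that `tune.choice` picks uniformly from its list. `quniform_sample` and `choice_sample` are hypothetical helpers for illustration, not Ray functions.

```python
import random


def quniform_sample(lower, upper, q, rng=random):
    """Hypothetical stand-in for tune.quniform: a uniform draw rounded to a multiple of q."""
    value = rng.uniform(lower, upper)
    # snap to the nearest multiple of q; the outer round() removes float noise
    # such as 0.15000000000000002
    return round(round(value / q) * q, 10)


def choice_sample(options, rng=random):
    """Hypothetical stand-in for tune.choice: a uniform pick from a list of options."""
    return rng.choice(options)


rng = random.Random(0)
dropouts = {quniform_sample(0.0, 0.2, 0.05, rng) for _ in range(1000)}
units = choice_sample([64, 48, 32, 16], rng)
print(sorted(dropouts))
print(units)
```

Quantizing limits the layer 1 dropout to at most five distinct values (0.0, 0.05, 0.1, 0.15, 0.2), which keeps trials easy to compare.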

In [None]:
# import pretty-printing and the Ray Tune search space primitives
from pprint import pprint
from ray import tune

# define the possible layer activations
activations = tune.choice(
    ["elu", "gelu", "linear", "relu", "selu", "sigmoid", "softplus"]
)

# define the layer 1 hyperparameters
layer1 = {
    "activation": activations,
    "dropout": tune.quniform(0.0, 0.2, 0.05),
    "units": tune.choice([64, 48, 32, 16]),
}

### Defining losses and metrics

Since losses and their parameters can have a significant impact on performance, we may want to treat them as tunable hyperparameters. For example, properties like label smoothing thresholds can be searched to identify optimal values. Here we define a nested dictionary that randomly chooses between a categorical hinge loss and a categorical cross-entropy loss, and that makes label smoothing a hyperparameter of the cross-entropy loss. Each loss has a `name` that defines how the loss is registered and reported by Ray Tune, and a `loss` parameter that holds a `tf.keras.losses.Loss` subclass. An optional `kwargs` dictionary customizes the class instance when it is created during a trial.

A weight is assigned to each loss. Loss weights can also be set as hyperparameters, although here we fix the weight at 1.
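With a single task the weight simply rescales the loss. In a multi-task model, Keras combines the per-task losses $\mathcal{L}_t$ into one training objective, weighting each by the value $w_t$ given in `loss_weights` (any regularization losses are added on top):

$$\mathcal{L}_{\text{total}} = \sum_{t \in \text{tasks}} w_t \, \mathcal{L}_t$$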

Metrics provide feedback on model performance and are what Ray Tune uses to rank trials, so they are fixed rather than tuned.
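During a trial, each sampled loss or metric dictionary has to be turned into an object. The sketch below shows one plausible way to do that: call the stored class with `name` plus the optional `kwargs`. The Keras classes used in this notebook all accept a `name` argument. `instantiate_spec` and the `FakeCrossEntropy`/`FakeAUC` stubs are hypothetical; they stand in for glimr's own handling and for the real Keras classes so the example runs without TensorFlow.

```python
class FakeCrossEntropy:
    """Stub standing in for tf.keras.losses.CategoricalCrossentropy."""

    def __init__(self, label_smoothing=0.0, name="categorical_crossentropy"):
        self.label_smoothing = label_smoothing
        self.name = name


class FakeAUC:
    """Stub standing in for tf.keras.metrics.AUC."""

    def __init__(self, from_logits=False, name="auc"):
        self.from_logits = from_logits
        self.name = name


def instantiate_spec(spec, class_key):
    """Hypothetical helper: build an object from a sampled {"name", class_key, "kwargs"} dict."""
    kwargs = dict(spec.get("kwargs", {}))  # "kwargs" is optional
    return spec[class_key](name=spec["name"], **kwargs)


# a loss spec after sampling: label_smoothing is now a plain float
sampled_loss = {
    "name": "categorical_crossentropy",
    "loss": FakeCrossEntropy,
    "kwargs": {"label_smoothing": 0.05},
}
loss_obj = instantiate_spec(sampled_loss, "loss")
metric_obj = instantiate_spec(
    {"name": "auc", "metric": FakeAUC, "kwargs": {"from_logits": True}}, "metric"
)
print(loss_obj.name, loss_obj.label_smoothing)  # categorical_crossentropy 0.05
print(metric_obj.name, metric_obj.from_logits)  # auc True
```

Storing the class rather than an instance is what lets `label_smoothing` stay a hyperparameter: the object is only created once the trial's values are known.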

In [None]:
import tensorflow as tf

# set the loss as a hyperparameter
loss = tune.choice(
    [
        {"name": "categorical_hinge", "loss": tf.keras.losses.CategoricalHinge},
        {
            "name": "categorical_crossentropy",
            "loss": tf.keras.losses.CategoricalCrossentropy,
            "kwargs": {"label_smoothing": tune.quniform(0.0, 0.2, 0.01)},
        },
    ]
)

# use a fixed loss weight
loss_weight = 1.0

# set fixed metrics for reporting to Ray Tune
metrics = {
    "name": "auc",
    "metric": tf.keras.metrics.AUC,
    "kwargs": {"from_logits": True},
}

### Define the second layer / task

We refer to the terminal outputs / layers of a network as _tasks_. Each task is named to allow automatic linking of metrics and losses at compilation time for multi-task networks, and to simplify the naming and selection of the metric used by Ray to identify the best model/trial.

The specific formulation of a task depends on the model builder function, but here we define a task as a layer that has additional loss, loss weight, and metric values.
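Because each task is named, the metric Ray Tune ranks on can be derived from the task and metric names. This notebook relies on the convention `<task>_<metric>` (used when metrics are built with `glimr.keras.keras_metrics`), so the MNIST task reports `mnist_auc`. `reported_metric_names` is a hypothetical helper that applies that convention to a `tasks` dictionary shaped like the one below.

```python
def reported_metric_names(tasks):
    """Hypothetical helper: list '<task>_<metric name>' for every metric of every task."""
    names = []
    for task_name, task in tasks.items():
        metrics = task["metrics"]
        # accept a single metric dict (as in this notebook) or a list of them
        if isinstance(metrics, dict):
            metrics = [metrics]
        names.extend(f"{task_name}_{m['name']}" for m in metrics)
    return names


tasks = {"mnist": {"units": 10, "metrics": {"name": "auc", "metric": None}}}
print(reported_metric_names(tasks))  # ['mnist_auc']
```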

In [None]:
# define the task
task = {
    "activation": activations,
    "dropout": tune.quniform(0.0, 0.2, 0.05),
    "units": 10,
    "loss": loss,
    "loss_weight": loss_weight,
    "metrics": metrics,
}

### Optimization hyperparameters

Optimization hyperparameters include the maximum number of epochs for a trial, the gradient descent algorithm, and the algorithm hyperparameters like learning rate or momentum.

Glimr provides an optimization search space, `glimr.optimization.optimization_space`, and a function that builds the corresponding optimizer, `glimr.keras.keras_optimizer`.

In [None]:
from glimr.optimization import optimization_space

optimization = optimization_space()

### Data loader hyperparameters

Data loader hyperparameters include the required `batch_size`, as well as user-defined hyperparameters that control loading and preprocessing. Here we define a variable batch size, randomize whether a brightness transform is applied, and tune the maximum brightness change `max_delta`.
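Every key in `data` ends up as a keyword argument of the data loader; the test cell later in this notebook calls `dataloader(**config["data"])`. A mismatch between the two only surfaces once a trial starts, so it can be worth checking up front. `check_dataloader_kwargs` is a hypothetical helper built on the standard `inspect` module; `example_loader` mirrors the signature of the loader defined later.

```python
import inspect


def check_dataloader_kwargs(loader, data_space):
    """Hypothetical check: return (keys the loader cannot accept, required params not supplied)."""
    params = inspect.signature(loader).parameters
    accepts_var_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    unknown = [] if accepts_var_kw else [k for k in data_space if k not in params]
    missing = [
        name
        for name, p in params.items()
        if p.default is inspect.Parameter.empty
        and p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
        and name not in data_space
    ]
    return unknown, missing


def example_loader(batch_size, random_brightness, max_delta):
    """Same signature as the dataloader defined later in this notebook."""
    return None


good = check_dataloader_kwargs(
    example_loader, {"batch_size": 32, "random_brightness": True, "max_delta": 0.05}
)
bad = check_dataloader_kwargs(example_loader, {"batch_size": 32, "brightness": True})
print(good)  # ([], [])
print(bad)   # (['brightness'], ['random_brightness', 'max_delta'])
```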

In [None]:
# data loader keyword arguments to control loading, augmentation, and batching
data = {
    "batch_size": tune.choice([32, 64, 128]),
    # whether to perform a random brightness transformation
    "random_brightness": tune.choice([True, False]),
    "max_delta": tune.quniform(0.01, 0.15, 0.01),
}

### Putting it all together

The keys `data`, `optimization`, and `tasks` are required: `glimr.search.Search` uses them to build models during trials. The `tasks` value is a dictionary that maps user-designated task names to task dictionaries like the one defined above; a multi-task model contains one key/value pair per task.
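A quick structural check can catch a malformed space before any trial runs. The sketch below tests for the three required keys, the required `batch_size`, and a `loss` and `metrics` entry in each task. Which per-task keys glimr actually requires is an assumption here, and `validate_space` is a hypothetical helper, not part of glimr.

```python
REQUIRED_KEYS = ("data", "optimization", "tasks")


def validate_space(space):
    """Hypothetical check: return a list of problems found in a search space dict."""
    problems = [f"missing top-level key '{k}'" for k in REQUIRED_KEYS if k not in space]
    if "data" in space and "batch_size" not in space["data"]:
        problems.append("data has no 'batch_size'")
    for name, task in space.get("tasks", {}).items():
        # assumed per-task requirements, based on the task dictionary above
        for key in ("loss", "metrics"):
            if key not in task:
                problems.append(f"task '{name}' has no '{key}'")
    return problems


good = {
    "data": {"batch_size": 32},
    "optimization": {},
    "tasks": {"mnist": {"loss": {}, "metrics": {}}},
}
bad = {"data": {}, "optimization": {}}
print(validate_space(good))  # []
print(validate_space(bad))   # ["missing top-level key 'tasks'", "data has no 'batch_size'"]
```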

In [None]:
# put it all together
space = {
    "layer1": layer1,
    "optimization": optimization,
    "tasks": {"mnist": task},
    "data": data,
}

# display search space
pprint(space, indent=4)

### Sample a config from the search space and display

In [None]:
from glimr.utils import sample_space

config = sample_space(space)
pprint(config, indent=4)

# Implement the model-building function

The model-builder function transforms a configuration sampled from the space into a `tf.keras.Model`, along with the losses, loss weights, and metrics needed to compile it. This is a user-defined function, which gives maximum flexibility in the models that can be used with glimr.
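The four return values have to line up: with dictionary outputs, Keras pairs dictionary-valued `loss`, `loss_weights`, and `metrics` arguments with the model's outputs by key. That is why the builder below names its outputs by task. `check_builder_keys` is a hypothetical helper that reports any compile dictionary whose keys differ from the output names (in the builder, `list(named)`).

```python
def check_builder_keys(output_names, losses, loss_weights, metrics):
    """Hypothetical check: each compile dict must be keyed by exactly the model's output names."""
    expected = set(output_names)
    mismatches = {}
    for label, mapping in (("losses", losses), ("loss_weights", loss_weights), ("metrics", metrics)):
        if set(mapping) != expected:
            # keys present on only one side
            mismatches[label] = sorted(set(mapping) ^ expected)
    return mismatches


ok = check_builder_keys(["mnist"], {"mnist": "loss"}, {"mnist": 1.0}, {"mnist": []})
typo = check_builder_keys(["mnist"], {"mnst": "loss"}, {"mnist": 1.0}, {"mnist": []})
print(ok)    # {}
print(typo)  # {'losses': ['mnist', 'mnst']}
```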

In [None]:
from glimr.keras import keras_losses, keras_metrics


def builder(config):
    # a helper function for building layers
    def _build_layer(x, units, activation, dropout, name):
        # dense layer
        x = tf.keras.layers.Dense(units, activation=activation, name=name)(x)

        # add dropout if necessary
        if dropout > 0.0:
            x = tf.keras.layers.Dropout(dropout)(x)

        return x

    # create input layer
    input_layer = tf.keras.Input([784], name="input")

    # build layer 1
    x = _build_layer(
        input_layer,
        config["layer1"]["units"],
        config["layer1"]["activation"],
        config["layer1"]["dropout"],
        "layer1",
    )

    # build output / task layer on top of layer 1
    task_name = list(config["tasks"].keys())[0]
    output = _build_layer(
        x,
        config["tasks"][task_name]["units"],
        config["tasks"][task_name]["activation"],
        config["tasks"][task_name]["dropout"],
        task_name,
    )

    # build named output dict so losses and metrics can be linked by task name
    named = {task_name: output}

    # create model
    model = tf.keras.Model(inputs=input_layer, outputs=named)

    # create a loss dictionary
    losses, loss_weights = keras_losses(config)

    # create a metric dictionary
    metrics = keras_metrics(config)

    return model, losses, loss_weights, metrics

# Create a data loading function

Write a function to load and batch MNIST samples. It flattens each 28 x 28 image into a 784-element vector, scales pixel values to [0, 1], and one-hot encodes the labels.
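The two preprocessing steps are simple enough to show without TensorFlow. This stdlib sketch flattens an image row by row (the same order as NumPy's default `reshape`) and builds a one-hot vector the way `tf.one_hot(label, 10)` does; `flatten` and `one_hot` are illustrative helpers, not part of the loader.

```python
def flatten(image):
    """Flatten a 2-D list of pixel rows (28 x 28 for MNIST) into one row-major list."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Encode an integer class label as a list with a single 1.0 at index `label`."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


# a synthetic "image" whose pixel values record their own row-major position
image = [[r * 28 + c for c in range(28)] for r in range(28)]
flat = flatten(image)
print(len(flat), flat[:3], flat[28])  # 784 [0, 1, 2] 28
print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```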

In [None]:
import numpy as np


def dataloader(batch_size, random_brightness, max_delta):
    # load mnist data
    train, validation = tf.keras.datasets.mnist.load_data(path="mnist.npz")

    # flattening function
    def mnist_flat(features):
        return features.reshape(
            features.shape[0], features.shape[1] * features.shape[2]
        )

    # extract features, labels
    train_features = tf.cast(mnist_flat(train[0]), tf.float32) / 255.0
    train_labels = train[1]
    validation_features = tf.cast(mnist_flat(validation[0]), tf.float32) / 255.0
    validation_labels = validation[1]

    # build datasets
    train_ds = tf.data.Dataset.from_tensor_slices(
        (train_features, {"mnist": tf.one_hot(train_labels, 10)})
    )
    validation_ds = tf.data.Dataset.from_tensor_slices(
        (validation_features, {"mnist": tf.one_hot(validation_labels, 10)})
    )

    # shuffle the training data
    train_ds = train_ds.shuffle(len(train_labels), reshuffle_each_iteration=True)

    # apply augmentation to individual examples, before batching, so that
    # each image gets its own random brightness change
    if random_brightness:
        train_ds = train_ds.map(
            lambda x, y: (tf.image.random_brightness(x, max_delta), y)
        )

    # batch
    train_ds = train_ds.batch(batch_size)
    validation_ds = validation_ds.batch(batch_size)

    return train_ds, validation_ds

### Test the search space, model builder, and dataloader

Before running a hyperparameter search, let's test this combination to verify that the models can train.

We generate a sample configuration from the search space and build, compile, and train a model with this config.

In [None]:
from glimr.keras import keras_optimizer

# sample a configuration
config = sample_space(space)

# display the configuration
pprint(config, indent=4)

# build the model
model, losses, loss_weights, metrics = builder(config)

# build the optimizer
optimizer = keras_optimizer(config["optimization"])

# test compile the model
model.compile(
    optimizer=optimizer, loss=losses, metrics=metrics, loss_weights=loss_weights
)

# build dataset and train
train_ds, val_ds = dataloader(**config["data"])
model.fit(x=train_ds, validation_data=val_ds, epochs=10)

# Using Search for hyperparameter tuning

The `Search` class wraps the hyperparameter tuning process of Ray Tune. It provides sensible defaults for the many options available in Ray Tune, but also allows fine-grained access to these options through class methods and attributes. Options can be added or assigned incrementally to alter reporting, checkpointing, trial stopping criteria, and the resources used in experiments.

We begin with a basic experiment that combines random search with the `AsyncHyperBandScheduler`, which terminates poorly performing trials early.

If no reporter is set, Ray Tune displays a dynamic table of ongoing trials and their results when running in Jupyter.
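The early-termination idea behind the asynchronous HyperBand scheduler (asynchronous successive halving) can be sketched in a few lines. Trials are compared at "rungs" spaced geometrically in training iterations, and a trial continues past a rung only if it ranks in the top `1 / reduction_factor` of results recorded there so far. This is a simplified illustration, not Ray's implementation; the function names and the example values are hypothetical.

```python
import math


def rung_milestones(grace_period, reduction_factor, max_t):
    """Training iterations at which a trial is compared with its peers."""
    milestones = []
    t = grace_period
    while t < max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


def should_stop_at_rung(recorded, value, reduction_factor):
    """Simplified rung rule (higher is better): continue only if `value` ranks in the
    top 1/reduction_factor of all results recorded at this rung, itself included."""
    scores = sorted(recorded + [value], reverse=True)
    keep = max(1, math.floor(len(scores) / reduction_factor))
    cutoff = scores[keep - 1]
    return value < cutoff


print(rung_milestones(1, 4, 100))  # [1, 4, 16, 64]
print(should_stop_at_rung([0.90, 0.95, 0.97], 0.93, 4))  # True: stopped
print(should_stop_at_rung([0.90, 0.95, 0.97], 0.98, 4))  # False: continues
```

Decisions are made as soon as a trial reaches a rung, using whatever results have been recorded so far, so no trial waits for a whole bracket to finish. That is what makes the scheduler asynchronous.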

In [None]:
import contextlib
from glimr.search import Search
import os
import tempfile

# Initialize a Search instance with the search space, model builder, and
# data loader. The metric name tells Ray how to measure model performance
# and is given in the format task_metric, which is the naming convention
# used when building metrics with glimr.keras.keras_metrics.
tuner = Search(space, builder, dataloader, "mnist_auc")

# make a temporary directory to store outputs - cleanup at end
temp_dir = tempfile.TemporaryDirectory()

# run trials using default settings
with contextlib.redirect_stderr(open(os.devnull, "w")):
    results = tuner.experiment(local_dir=temp_dir.name, name="default", num_samples=20)

### Display information about the best trial

The output from `Search.experiment` contains information on each trial's performance and configuration, and can be used for trial analysis and for keeping the best checkpointed result.

In [None]:
# get information about the best trial
best_result = results.get_best_result()
best_auc = best_result.metrics["mnist_auc"]
best_config = best_result.metrics["config"]
best_checkpoint = best_result.checkpoint.to_directory()

# display
print(f"best result auc: {best_auc}")
print("best configuration:")
pprint(best_config, indent=4)
print("contents of best trial checkpoint:")
print(os.listdir(os.path.join(best_checkpoint, "checkpoint")))

### Experiment performance statistics for default settings

Let's plot the performance of trials from the default experiment, which uses an AsyncHyperBand scheduler with random search.
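The plotting cell ranks trials from best to worst: `-np.sort(-default_auc)` sorts the negated values in ascending order and negates them back, which amounts to a descending sort. A stdlib equivalent is sketched below. It also drops trials with no AUC; the assumption that such trials appear as `None` or NaN is ours, and `ranked_scores` is a hypothetical helper.

```python
import math


def ranked_scores(scores):
    """Sort scores best-first (higher is better), skipping missing values (None or NaN)."""
    valid = [s for s in scores if s is not None and not math.isnan(s)]
    return sorted(valid, reverse=True)


ranked = ranked_scores([0.91, float("nan"), 0.97, None, 0.95])
print(ranked)  # [0.97, 0.95, 0.91]
```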

In [None]:
# get dataframe from default experiment
default_table = results.get_dataframe()
default_auc = np.array(default_table["mnist_auc"])

# plot
import matplotlib.pyplot as plt

plt.plot(-np.sort(-default_auc))
plt.xlabel("trial")
plt.ylabel("AUC")

### Using a population-based training scheduler

Let's change the scheduler to population-based training (PBT). PBT improves resource utilization by replacing poorly performing trials with "mutated" versions of top-performing trials.

We recommend creating a new `Search` object for every experiment to avoid issues with Ray Tune.
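One round of PBT's exploit/explore cycle is sketched below in simplified form. The worst trials copy the hyperparameters of a randomly chosen top trial (exploit), and continuous values are then scaled up or down (explore). Ray's `PopulationBasedTraining` additionally restores the donor's checkpoint and can resample values from `hyperparam_mutations`. The function name, the 25% quantile, and the 0.8/1.2 factors here are illustrative choices, not Ray's API.

```python
import random


def pbt_step(population, quantile_fraction=0.25, perturb_factors=(0.8, 1.2), rng=random):
    """One simplified PBT exploit/explore round (higher score is better).

    population: list of {"score": float, "config": dict}; modified in place.
    """
    ranked = sorted(population, key=lambda trial: trial["score"], reverse=True)
    n = max(1, int(len(ranked) * quantile_fraction))
    top, bottom = ranked[:n], ranked[-n:]
    for trial in bottom:
        donor = rng.choice(top)  # exploit: copy a top performer's hyperparameters
        new_config = {}
        for key, value in donor["config"].items():
            if isinstance(value, float):
                # explore: scale continuous hyperparameters up or down
                new_config[key] = value * rng.choice(perturb_factors)
            else:
                # keep discrete values (this sketch never resamples them)
                new_config[key] = value
        trial["config"] = new_config
        # a real PBT scheduler would also restore the donor's checkpoint here
    return population


rng = random.Random(0)
population = [
    {"score": 0.97, "config": {"learning_rate": 0.01, "units": 64}},
    {"score": 0.95, "config": {"learning_rate": 0.02, "units": 32}},
    {"score": 0.93, "config": {"learning_rate": 0.03, "units": 48}},
    {"score": 0.80, "config": {"learning_rate": 0.10, "units": 16}},
]
pbt_step(population, rng=rng)
print(population[-1]["config"])
```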

In [None]:
from ray.tune.schedulers import PopulationBasedTraining

# create a Search object instance
tuner = Search(space, builder, dataloader, "mnist_auc")

# create the PBT scheduler - the entire search space is eligible for mutation
scheduler = PopulationBasedTraining(
    time_attr="training_iteration", hyperparam_mutations=space
)

# run trials with the PBT scheduler
with contextlib.redirect_stderr(open(os.devnull, "w")):
    results = tuner.experiment(
        local_dir=temp_dir.name, name="pbt", num_samples=20, scheduler=scheduler
    )

### Compare PBT and default experiment performance statistics

Plot the PBT trial performance alongside the default AsyncHyperband results.

In [None]:
# get dataframe from PBT experiment
pbt_table = results.get_dataframe()
pbt_auc = np.array(pbt_table["mnist_auc"])

# plot
f, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(-np.sort(-default_auc))
ax1.plot(-np.sort(-pbt_auc))
ax1.set_title("All trials")
ax1.set_xlabel("trial")
ax1.set_ylabel("AUC")
ax1.legend(["random", "population-based"])
ax2.plot(-np.sort(-default_auc)[0:5])
ax2.plot(-np.sort(-pbt_auc)[0:5])
ax2.set_title("Top 5 trials")
ax2.set_xlabel("trial")
ax2.set_ylabel("AUC")
ax2.legend(["random", "population-based"])
f.tight_layout()

### Change trial resources

The `ray.air.ScalingConfig` determines the resources available during experiments and can be used to set the number of workers and whether GPUs are used. Additional parameters are available for running on multiple machines using a Ray cluster.
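A back-of-envelope estimate helps when choosing these values: each trial needs roughly `num_workers * CPUs per worker`, so a machine can run about `total CPUs // per-trial CPUs` trials at once. Ray may reserve additional resources per trial (for example for a coordinating process). The `overhead_cpus` argument is a hypothetical allowance for that and is not a Ray setting; `concurrent_trials` is illustrative only.

```python
def concurrent_trials(total_cpus, num_workers, cpus_per_worker, overhead_cpus=0):
    """Rough number of trials that fit at once; overhead_cpus is an assumed per-trial extra."""
    per_trial = num_workers * cpus_per_worker + overhead_cpus
    return total_cpus // per_trial


# the settings used below on a hypothetical 8-core machine
print(concurrent_trials(8, num_workers=1, cpus_per_worker=2))                   # 4
print(concurrent_trials(8, num_workers=1, cpus_per_worker=2, overhead_cpus=1))  # 2
```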

In [None]:
# create a Search object instance
tuner = Search(space, builder, dataloader, "mnist_auc")

# alter the scaling parameters so that each trial gets 2 cores
tuner.set_scaling(
    num_workers=1,
    use_gpu=False,
    resources_per_worker={"CPU": 2, "GPU": 0},
)

# run 10 trials with the default scheduler
with contextlib.redirect_stderr(open(os.devnull, "w")):
    results = tuner.experiment(
        local_dir=temp_dir.name, name="resources", num_samples=10
    )

### Reporting

Reporting options can also be set using the `Search.set_reporter` method, or by creating a reporter object and assigning it directly to the `Search.reporter` attribute. The latter allows the use of additional parameters not exposed by `set_reporter`.

Here we set up a `CLIReporter` with customized report columns and re-run an experiment. Since we are running in Jupyter, the dynamic table is also displayed automatically.

In [None]:
from ray.tune import CLIReporter

# report every 30 seconds
max_report_frequency = 30

# set Jupyter preference
jupyter = False

# set metrics, parameters to display
metrics = [f"{t}_{space['tasks'][t]['metrics']['name']}" for t in space["tasks"]]
parameters = {
    "optimization/method": "method",
    "optimization/learning_rate": "learning rate",
    "layer1/units": "layer1_units",
}

# set reporter kwargs
reporter_kwargs = {
    "metric_columns": metrics,
    "parameter_columns": parameters,
    "max_report_frequency": max_report_frequency,
}

# create a Search object instance
tuner = Search(space, builder, dataloader, "mnist_auc")

# assign reporter to attribute and run a short experiment
tuner.reporter = CLIReporter(**reporter_kwargs)
with contextlib.redirect_stderr(open(os.devnull, "w")):
    results = tuner.experiment(local_dir=temp_dir.name, name="try", num_samples=10)
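The keys of `parameters` are `/`-separated paths into the nested trial configuration, and the values are the column headers to display. The sketch below shows that mapping: it flattens a nested config into path keys and then picks the displayed columns. `flatten_config` is a hypothetical helper, and the sampled values are made up for illustration.

```python
def flatten_config(config, prefix="", delimiter="/"):
    """Flatten nested dicts into {"outer/inner": value} entries."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{delimiter}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_config(value, path, delimiter))
        else:
            flat[path] = value
    return flat


# a made-up sampled configuration, for illustration only
sampled = {
    "optimization": {"method": "adam", "learning_rate": 0.001},
    "layer1": {"units": 32, "dropout": 0.05},
}
columns = {
    "optimization/method": "method",
    "optimization/learning_rate": "learning rate",
    "layer1/units": "layer1_units",
}
flat = flatten_config(sampled)
row = {display: flat[path] for path, display in columns.items()}
print(row)  # {'method': 'adam', 'learning rate': 0.001, 'layer1_units': 32}
```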

### Restarting an interrupted experiment

We can restart an interrupted experiment and complete its unfinished trials using `Search.restore`.

Start the next cell, interrupt it partway through, and then run the cell after it to complete the unfinished trials.

In [None]:
# create a Search object instance
tuner = Search(space, builder, dataloader, "mnist_auc")

# assign reporter to attribute and run 50 trials (interrupt this cell partway through)
tuner.reporter = CLIReporter(**reporter_kwargs)
with contextlib.redirect_stderr(open(os.devnull, "w")):
    results = tuner.experiment(local_dir=temp_dir.name, name="restore", num_samples=50)

In [None]:
# complete trials
with contextlib.redirect_stderr(open(os.devnull, "w")):
    tuner.restore(local_dir=os.path.join(temp_dir.name, "restore"))

### Cleanup storage

Hyperparameter tuning experiments can consume a lot of storage. Take care when setting `local_dir` for more extensive runs.
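To see how much an experiment directory has grown, a small stdlib helper can total the file sizes under it. `directory_size_bytes` is an illustrative helper, and the demo uses a throwaway directory so it runs on its own. In this notebook, calling `directory_size_bytes(temp_dir.name)` before `temp_dir.cleanup()` would report how much the experiments above wrote.

```python
import os
import tempfile


def directory_size_bytes(path):
    """Total size in bytes of the regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# self-contained demo on a throwaway directory holding one 1,000-byte file
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "trial_0"))
    with open(os.path.join(demo_dir, "trial_0", "weights.bin"), "wb") as fh:
        fh.write(b"\0" * 1000)
    demo_size = directory_size_bytes(demo_dir)

print(demo_size)  # 1000
```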

In [None]:
# cleanup the temporary directory
temp_dir.cleanup()