## MNIST example

This notebook demonstrates an end-to-end application of the glimr package.

Using MNIST classification as a simple example, we demonstrate the steps to create a search space, model builder, and dataloader for use in tuning. This provides a concrete example of topics like using the `glimr.utils` and `glimr.keras` functions to create hyperparameters and to correctly name losses and metrics for training and reporting.

This is followed by a demonstration of the `Search` class, showing how to set up and run experiments.

In [1]:
!pip install ../../glimr

Processing /Users/lcoop22/Desktop/glimr
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done


Building wheels for collected packages: glimr
  Building wheel for glimr (pyproject.toml) ... done
  Created wheel for glimr: filename=glimr-0.1.dev44+ga2ac0e7.d20230322-py3-none-any.whl size=18229 sha256=b6b84d7edbb7f8e33d6e8afa8cfd34c75177e74fc23537a9d65a9ebf3512a2de
  Stored in directory: /private/var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/pip-ephem-wheel-cache-xp9gwehi/wheels/58/08/39/c88c61a75aca3782dfcc11b86c8a6af860f75d1d00fddf72e2
Successfully built glimr
Installing collected packages: glimr
  Attempting uninstall: glimr
    Found existing installation: glimr 0.1.dev44+ga2ac0e7.d20230322
    Uninstalling glimr-0.1.dev44+ga2ac0e7.d20230322:
      Successfully uninstalled glimr-0.1.dev44+ga2ac0e7.d20230322
Successfully installed glimr-0.1.dev44+ga2ac0e7.d20230322


# Creating the search space

First, let's create a search space for a simple two-layer network that classifies MNIST digits into ten classes.

This search space will consist of hyperparameters for each layer, for loss, for gradient optimization, and for data loading and preprocessing. Below we build these components incrementally and examine each in detail.

### The first layer

For the first layer we define the candidate activations, the dropout rate, and the number of units. For a range hyperparameter we use `tune.quniform`, which samples a quantized floating-point value (a uniform draw rounded to a multiple of `q`). For discrete options we use `tune.choice`, which selects one of the options uniformly at random.

In [2]:
# import the pretty-printer and Ray Tune's search space samplers
from pprint import pprint
from ray import tune

# define the possible layer activations
activations = tune.choice(
    ["elu", "gelu", "linear", "relu", "selu", "sigmoid", "softplus"]
)

# define the layer 1 hyperparameters
layer1 = {
    "activation": activations,
    "dropout": tune.quniform(0.0, 0.2, 0.05),
    "units": tune.choice([64, 48, 32, 16]),
}
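
Ray's documentation describes `tune.quniform(lower, upper, q)` as a uniform draw rounded to an integer multiple of `q`. The pure-Python sketch below mimics that behavior to show which values the layer-1 dropout can take, and why sampled configs later in this notebook contain values like `0.15000000000000002`. `quniform_sketch` is a hypothetical helper for illustration, not Ray's implementation.

```python
import random


def quniform_sketch(lower, upper, q, rng=random):
    """Approximate tune.quniform: a uniform draw rounded to a multiple of q.

    Hypothetical helper for illustration only; Ray's own sampler differs
    in details such as its use of NumPy and type casting.
    """
    value = rng.uniform(lower, upper)
    # round to the nearest integer multiple of q (this can reach `upper`)
    return round(value / q) * q


rng = random.Random(0)
samples = [quniform_sketch(0.0, 0.2, 0.05, rng) for _ in range(1000)]

# only five levels are possible: 0.0, 0.05, 0.1, 0.15, 0.2
levels = sorted({round(s, 10) for s in samples})
print(levels)

# multiplying back by q reintroduces floating-point error
print(3 * 0.05)  # 0.15000000000000002
```

The rounding error is harmless for training, but it is why quantized values should be compared with a tolerance rather than `==`.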

### Defining losses and metrics

Since the loss function can have a significant impact on performance, we may want to treat it as a hyperparameter. Losses may also have their own hyperparameters, like label smoothing, that affect performance. Here we define a nested dictionary that randomly chooses between a categorical hinge loss and a categorical cross-entropy loss, with label smoothing as a hyperparameter of the cross-entropy loss. Each loss has a `name` that the model builder decodes into a `tf.keras.losses.Loss` object, and an optional `kwargs` dictionary of arguments used to create that object.

A loss weight is assigned to each loss and can also be a hyperparameter, although here we fix it at 1.

Metrics measure model performance and are what Ray Tune uses to rank trials, so they are fixed rather than tuned.

In [3]:
# set the loss as a hyperparameter
loss = tune.choice(
    [
        {"name": "categorical_hinge"},
        {
            "name": "categorical_crossentropy",
            "kwargs": {"label_smoothing": tune.quniform(0.0, 0.2, 0.01)},
        },
    ]
)

# use a fixed loss weight
loss_weight = (1.0,)

# set fixed metrics for reporting to Ray Tune
metrics = {
    "auc": {
        "name": "auc",
        "kwargs": {"from_logits": True},
    }
}
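
The `name`/`kwargs` convention can be decoded with a simple lookup table. The sketch below is a hypothetical stand-in for what the model builder does with a mapper such as `{"categorical_hinge": tf.keras.losses.CategoricalHinge}`. To stay self-contained it uses placeholder classes instead of TensorFlow, and `decode_spec` is not a glimr function.

```python
from dataclasses import dataclass


@dataclass
class FakeCrossentropy:
    """Placeholder for tf.keras.losses.CategoricalCrossentropy."""

    label_smoothing: float = 0.0


class FakeHinge:
    """Placeholder for tf.keras.losses.CategoricalHinge."""


def decode_spec(spec, mapper):
    """Build an object from a {"name": ..., "kwargs": {...}} dictionary.

    Hypothetical helper: looks up the class registered under spec["name"]
    and passes the optional kwargs to its constructor.
    """
    cls = mapper[spec["name"]]
    return cls(**spec.get("kwargs", {}))


loss_mapper_demo = {
    "categorical_crossentropy": FakeCrossentropy,
    "categorical_hinge": FakeHinge,
}

# a sampled cross-entropy specification, as it could appear in a config
decoded_loss = decode_spec(
    {"name": "categorical_crossentropy", "kwargs": {"label_smoothing": 0.1}},
    loss_mapper_demo,
)
print(decoded_loss)  # FakeCrossentropy(label_smoothing=0.1)

# "kwargs" is optional, so the hinge specification decodes as well
decoded_hinge = decode_spec({"name": "categorical_hinge"}, loss_mapper_demo)
```

An unknown name raises `KeyError`, so a typo in the search space fails as soon as a model is built rather than silently falling back to a default.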

### Define the second layer / task

We refer to the terminal outputs / layers of a network as _tasks_. Each task is named to allow automatic linking of metrics and losses at compilation time for multi-task networks, and to simplify the naming and selection of the metric used by Ray to identify the best model/trial.

The specific formulation of a task depends on the model builder function, but here we define a task as a layer that has additional loss, loss weight, and metric values.

In [4]:
# define the task
task = {
    "activation": activations,
    "dropout": tune.quniform(0.0, 0.2, 0.05),
    "units": 10,
    "loss": loss,
    "loss_weight": loss_weight,
    "metrics": metrics,
}

### Optimization hyperparameters

Optimization hyperparameters include the maximum number of epochs for a trial, the gradient descent algorithm, and the algorithm hyperparameters like learning rate or momentum.

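Glimr provides a default optimization search space, `glimr.optimization.optimization_space`, and an optimizer builder, `glimr.keras.keras_optimizer`, that converts a sampled optimization configuration into a Keras optimizer.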

In [5]:
from glimr.optimization import optimization_space

optimization = optimization_space()

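
A sampled optimization config contains hyperparameters for every method at once (for example, `beta_1` and `rho` appear together in the sampled configs later in this notebook), so a builder has to pick out the ones that apply to the chosen method. The sketch below shows one way to do that. It is not glimr's `keras_optimizer`: the `OPTIMIZER_ARGS` table and the `adam` method name are assumptions, and only the argument names are taken from the constructors of the corresponding `tf.keras.optimizers` classes.

```python
# Hypothetical table: sampled "method" -> (Keras optimizer class name,
# constructor arguments that class accepts). Not glimr's implementation.
OPTIMIZER_ARGS = {
    "adadelta": ("Adadelta", ["learning_rate", "rho"]),
    "adagrad": ("Adagrad", ["learning_rate"]),
    "adam": ("Adam", ["learning_rate", "beta_1", "beta_2"]),  # assumed method name
    "rms": ("RMSprop", ["learning_rate", "rho", "momentum"]),
    "sgd": ("SGD", ["learning_rate", "momentum"]),
}


def optimizer_kwargs(optimization):
    """Return (class name, kwargs) using only the arguments the method accepts."""
    class_name, arg_names = OPTIMIZER_ARGS[optimization["method"]]
    return class_name, {name: optimization[name] for name in arg_names}


# values adapted from a configuration sampled later in this notebook
sampled_optimization = {
    "batch": 32,
    "beta_1": 0.6,
    "beta_2": 0.62,
    "epochs": 100,
    "learning_rate": 0.0017,
    "method": "adadelta",
    "momentum": 0.1,
    "rho": 0.6,
}
print(optimizer_kwargs(sampled_optimization))
# ('Adadelta', {'learning_rate': 0.0017, 'rho': 0.6})
```

With TensorFlow installed, `getattr(tf.keras.optimizers, class_name)(**kwargs)` would then construct the optimizer. Filtering by method keeps irrelevant values, such as `beta_1` for Adadelta, from reaching a constructor that does not accept them.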


### Data loader hyperparameters

Data loader hyperparameters include the required `batch_size`, as well as user-defined hyperparameters that control loading and preprocessing. Here we sample the batch size, randomly decide whether to apply a brightness augmentation, and sample the maximum brightness change `max_delta`.

In [6]:
# data loader keyword arguments to control loading, augmentation, and batching
data = {
    "batch_size": tune.choice([32, 64, 128]),
    # whether to perform a random brightness transformation
    "random_brightness": tune.choice([True, False]),
    # maximum brightness change used when the transformation is applied
    "max_delta": tune.quniform(0.01, 0.15, 0.01),
}
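
Because the sampled `data` dictionary is passed to the data loader as keyword arguments (`dataloader(**config["data"])` below), its keys must match the loader's parameter names exactly. A quick check with `inspect.signature` can catch a mismatch before a long search starts. `check_loader_kwargs` is a hypothetical helper, and `stub_loader` only mirrors the signature of the real loader defined later.

```python
import inspect


def check_loader_kwargs(loader, data_config):
    """Raise TypeError if data_config cannot be bound to loader's parameters."""
    # bind() applies the same rules as a real call: missing required
    # arguments and unexpected keywords both raise TypeError
    inspect.signature(loader).bind(**data_config)


def stub_loader(batch_size, random_brightness, max_delta):
    """Stub with the same signature as the MNIST dataloader defined later."""
    return None


# matching keys bind without error
check_loader_kwargs(
    stub_loader, {"batch_size": 32, "random_brightness": False, "max_delta": 0.06}
)

# a misspelled key is rejected
try:
    check_loader_kwargs(
        stub_loader, {"batch_size": 32, "brightness": True, "max_delta": 0.06}
    )
except TypeError as err:
    print("mismatch:", err)
```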

### Putting it all together

The keys `data`, `optimization`, and `tasks` are required: `glimr.search.Search` uses them to build models during trials. Under `tasks`, a dictionary maps user-chosen task names to task dictionaries like the one defined above; a multi-task model has one key/value pair per task. Other keys, such as `layer1`, are free-form and in this notebook are read only by the model builder.
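
As a sketch of the multi-task case, the `tasks` dictionary below holds two entries, with plain values standing in for Ray Tune samplers. The second task, `parity` (is the digit even or odd?), is hypothetical, and it deliberately omits `metrics` so that the small `missing_task_keys` helper (also hypothetical) has something to report. The set of expected keys is an assumption drawn from what this notebook's builder reads, not a rule enforced by glimr.

```python
# Keys this notebook's builder and loss/metric helpers read from each task
# (an assumption based on this notebook, not a glimr requirement).
TASK_KEYS = {"activation", "dropout", "units", "loss", "loss_weight", "metrics"}

example_tasks = {
    "mnist": {
        "activation": "linear",
        "dropout": 0.0,
        "units": 10,
        "loss": {"name": "categorical_hinge"},
        "loss_weight": 1.0,
        "metrics": {"auc": {"name": "auc"}},
    },
    # hypothetical second task: predict whether the digit is even or odd
    "parity": {
        "activation": "sigmoid",
        "dropout": 0.0,
        "units": 1,
        "loss": {"name": "binary_crossentropy"},
        "loss_weight": 0.5,
    },
}


def missing_task_keys(tasks):
    """Map each task name to the expected keys it lacks (empty if complete)."""
    return {name: sorted(TASK_KEYS - set(spec)) for name, spec in tasks.items()}


print(missing_task_keys(example_tasks))
# {'mnist': [], 'parity': ['metrics']}
```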

In [7]:
# put it all together
space = {
    "layer1": layer1,
    "optimization": optimization,
    "tasks": {"mnist": task},
    "data": data,
}

# display search space
pprint(space, indent=4)

{   'data': {   'batch_size': <ray.tune.search.sample.Categorical object at 0x7fb822c49bb0>,
                'max_delta': <ray.tune.search.sample.Float object at 0x7fb80e8620a0>,
                'random_brightness': <ray.tune.search.sample.Categorical object at 0x7fb822c49790>},
    'layer1': {   'activation': <ray.tune.search.sample.Categorical object at 0x7fb822c74f10>,
                  'dropout': <ray.tune.search.sample.Float object at 0x7fb822c74220>,
                  'units': <ray.tune.search.sample.Categorical object at 0x7fb825bb2820>},
    'optimization': {   'batch': <ray.tune.search.sample.Categorical object at 0x7fb825bb2640>,
                        'beta_1': <ray.tune.search.sample.Float object at 0x7fb80c3a1340>,
                        'beta_2': <ray.tune.search.sample.Float object at 0x7fb80cee7d60>,
                        'epochs': 100,
                        'learning_rate': <ray.tune.search.sample.Float object at 0x7fb80c38ce50>,
                        'method':

### Sample a config from the search space and display

In [8]:
from glimr.utils import sample_space

config = sample_space(space)
pprint(config, indent=4)

{   'data': {'batch_size': 32, 'max_delta': 0.06, 'random_brightness': False},
    'layer1': {'activation': 'sigmoid', 'dropout': 0.0, 'units': 16},
    'optimization': {   'batch': 32,
                        'beta_1': 0.6,
                        'beta_2': 0.62,
                        'epochs': 100,
                        'learning_rate': 0.0017000000000000001,
                        'method': 'adadelta',
                        'momentum': 0.1,
                        'rho': 0.6},
    'tasks': {   'mnist': {   'activation': 'softplus',
                              'dropout': 0.15000000000000002,
                              'loss': {'name': 'categorical_hinge'},
                              'loss_weight': (1.0,),
                              'metrics': {   'auc': {   'kwargs': {   'from_logits': True},
                                                        'name': 'auc'}},
                              'units': 10}}}
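
A sampled config must contain only concrete values, including values nested inside a choice: the cross-entropy branch of `loss` carries its own `label_smoothing` sampler. A recursive resolver is one way to handle this. The sketch below illustrates the general technique under stated assumptions rather than glimr's `sample_space` implementation: it treats any object with a `sample()` method as a sampler (Ray Tune's sampler objects provide one) and uses tiny stand-in classes, `Choice` and `Uniform`, so it runs without Ray.

```python
import random


class Choice:
    """Stand-in for tune.choice: picks one of the given options."""

    def __init__(self, options, rng):
        self.options, self.rng = options, rng

    def sample(self):
        return self.rng.choice(self.options)


class Uniform:
    """Stand-in for a continuous sampler such as tune.quniform."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng

    def sample(self):
        return self.rng.uniform(self.low, self.high)


def resolve(node):
    """Recursively replace every sampler in a nested space with a value."""
    if hasattr(node, "sample"):
        # the sampled value may itself contain samplers, so resolve it again
        return resolve(node.sample())
    if isinstance(node, dict):
        return {key: resolve(value) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value) for value in node]
    return node


rng = random.Random(1)
toy_loss = Choice(
    [
        {"name": "categorical_hinge"},
        {
            "name": "categorical_crossentropy",
            "kwargs": {"label_smoothing": Uniform(0.0, 0.2, rng)},
        },
    ],
    rng,
)
toy_space = {"tasks": {"mnist": {"units": 10, "loss": toy_loss}}}
print(resolve(toy_space))
```

Because `resolve` builds new dictionaries, the search space itself is never modified and can be sampled repeatedly.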


# Implement the model-building function

The model-builder function transforms a sample from the space into a `tf.keras.Model`, along with the losses, loss weights, and metrics needed to compile it. This function is user-defined to provide maximum flexibility in the models that can be used with glimr.

In [9]:
from glimr.keras import keras_losses, keras_metrics
import tensorflow as tf


def builder(config):
    # a helper function for building layers
    def _build_layer(x, units, activation, dropout, name):
        # dense layer
        x = tf.keras.layers.Dense(units, activation=activation, name=name)(x)

        # add dropout if necessary
        if dropout > 0.0:
            x = tf.keras.layers.Dropout(dropout)(x)

        return x

    # create input layer
    input_layer = tf.keras.Input([784], name="input")

    # build layer 1
    x = _build_layer(
        input_layer,
        config["layer1"]["units"],
        config["layer1"]["activation"],
        config["layer1"]["dropout"],
        "layer1",
    )

    # build output / task layer on top of layer 1
    task_name = list(config["tasks"].keys())[0]
    output = _build_layer(
        x,
        config["tasks"][task_name]["units"],
        config["tasks"][task_name]["activation"],
        config["tasks"][task_name]["dropout"],
        task_name,
    )

    # build named output dict so losses and metrics can be keyed by task name
    named = {task_name: output}

    # create model
    model = tf.keras.Model(inputs=input_layer, outputs=named)

    # create a loss dictionary using utility function
    loss_mapper = {
        "categorical_crossentropy": tf.keras.losses.CategoricalCrossentropy,
        "categorical_hinge": tf.keras.losses.CategoricalHinge,
    }
    losses, loss_weights = keras_losses(config, loss_mapper)

    # create a metric dictionary using utility function
    metric_mapper = {"auc": tf.keras.metrics.AUC}
    metrics = keras_metrics(config, metric_mapper)

    return model, losses, loss_weights, metrics
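
`tf.keras.Model.compile` accepts `loss`, `loss_weights`, and `metrics` as dictionaries keyed by output name, which is why the builder names its output after the task. The sketch below shows how such dictionaries could be assembled from the `tasks` section of a config. It is a hypothetical approximation of what `keras_losses` and `keras_metrics` return, not their source; the `<task>_<metric key>` naming is an assumption based on the `mnist_auc` metric name used with `Search` below, and placeholder classes stand in for TensorFlow.

```python
class PlaceholderHinge:
    """Stand-in for tf.keras.losses.CategoricalHinge."""


class PlaceholderAUC:
    """Stand-in for tf.keras.metrics.AUC; records its constructor arguments."""

    def __init__(self, name=None, **kwargs):
        self.name, self.kwargs = name, kwargs


def build_task_dicts(tasks, loss_mapper, metric_mapper):
    """Return per-task losses, loss weights, and metrics keyed by task name."""
    losses, weights, metrics = {}, {}, {}
    for task, spec in tasks.items():
        loss_spec = spec["loss"]
        losses[task] = loss_mapper[loss_spec["name"]](**loss_spec.get("kwargs", {}))
        weights[task] = spec["loss_weight"]
        # name each metric <task>_<key>, e.g. "mnist_auc" (assumed convention)
        metrics[task] = [
            metric_mapper[m["name"]](name=f"{task}_{key}", **m.get("kwargs", {}))
            for key, m in spec["metrics"].items()
        ]
    return losses, weights, metrics


sampled_tasks = {
    "mnist": {
        "loss": {"name": "categorical_hinge"},
        "loss_weight": 1.0,
        "metrics": {"auc": {"name": "auc", "kwargs": {"from_logits": True}}},
    }
}
task_losses, task_weights, task_metrics = build_task_dicts(
    sampled_tasks, {"categorical_hinge": PlaceholderHinge}, {"auc": PlaceholderAUC}
)
print(task_metrics["mnist"][0].name)  # mnist_auc
```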

# Create a data loading function

Next, write a function that loads and batches MNIST samples. It flattens each 28x28 image into a 784-element vector, scales pixel values to [0, 1], and one-hot encodes the labels.

In [10]:


def dataloader(batch_size, random_brightness, max_delta):
    # load mnist data
    train, validation = tf.keras.datasets.mnist.load_data(path="mnist.npz")

    # flattening function
    def mnist_flat(features):
        return features.reshape(
            features.shape[0], features.shape[1] * features.shape[2]
        )

    # extract features, labels
    train_features = tf.cast(mnist_flat(train[0]), tf.float32) / 255.0
    train_labels = train[1]
    validation_features = tf.cast(mnist_flat(validation[0]), tf.float32) / 255.0
    validation_labels = validation[1]

    # build datasets
    train_ds = tf.data.Dataset.from_tensor_slices(
        (train_features, {"mnist": tf.one_hot(train_labels, 10)})
    )
    validation_ds = tf.data.Dataset.from_tensor_slices(
        (validation_features, {"mnist": tf.one_hot(validation_labels, 10)})
    )

    # batch
    train_ds = train_ds.shuffle(len(train_labels), reshuffle_each_iteration=True)
    train_ds = train_ds.batch(batch_size)
    validation_ds = validation_ds.batch(batch_size)

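    # apply augmentation; because this runs after batching, random_brightness
    # draws one delta per batch that is shared by all images in the batch
    if random_brightness:
        train_ds = train_ds.map(
            lambda x, y: (tf.image.random_brightness(x, max_delta), y)
        )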

    return train_ds, validation_ds
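
The two preprocessing steps in `dataloader` are easy to check on a small array. This NumPy sketch reproduces the reshape used by `mnist_flat` and an equivalent of `tf.one_hot(labels, 10)` (indexing rows of an identity matrix); the random images and labels are made-up test data, not MNIST.

```python
import numpy as np

# made-up batch shaped like MNIST: 5 images of 28x28 uint8 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3, 7])

# flatten each image into a 784-element row, then scale to [0, 1]
flat = images.reshape(images.shape[0], images.shape[1] * images.shape[2])
flat = flat.astype(np.float32) / 255.0

# one-hot encode: row i of the identity matrix encodes label i
one_hot = np.eye(10, dtype=np.float32)[labels]

print(flat.shape, one_hot.shape)  # (5, 784) (5, 10)
```

Row-major reshaping keeps each image's pixels contiguous, so reshaping a row back to 28x28 recovers the original image.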

### Test the search space, model builder, and dataloader

Before doing a hyperparameter search, let's test this combination to verify that the models can train.

We generate a sample configuration from the search space and build, compile, and train a model with this config.

In [11]:
from glimr.keras import keras_optimizer

# sample a configuration and display it
config = sample_space(space)
pprint(config, indent=4)

# build the model
model, losses, loss_weights, metrics = builder(config)

# build the optimizer
optimizer = keras_optimizer(config["optimization"])

# test compile the model
model.compile(
    optimizer=optimizer, loss=losses, metrics=metrics, loss_weights=loss_weights
)

{   'data': {'batch_size': 128, 'max_delta': 0.09, 'random_brightness': True},
    'layer1': {'activation': 'linear', 'dropout': 0.0, 'units': 48},
    'optimization': {   'batch': 64,
                        'beta_1': 0.68,
                        'beta_2': 0.53,
                        'epochs': 100,
                        'learning_rate': 0.00068,
                        'method': 'adadelta',
                        'momentum': 0.03,
                        'rho': 0.54},
    'tasks': {   'mnist': {   'activation': 'linear',
                              'dropout': 0.05,
                              'loss': {'name': 'categorical_hinge'},
                              'loss_weight': (1.0,),
                              'metrics': {   'auc': {   'kwargs': {   'from_logits': True},
                                                        'name': 'auc'}},
                              'units': 10}}}




In [12]:
# build dataset and train
train_ds, val_ds = dataloader(**config["data"])
model.fit(x=train_ds, validation_data=val_ds, epochs=10)

Instructions for updating:
Lambda fuctions will be no more assumed to be used in the statement where they are used, or at least in the same block. https://github.com/tensorflow/tensorflow/issues/56089
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fb825ba6ac0>

## Using Search for hyperparameter tuning

The `Search` class implements the hyperparameter tuning process of Ray Tune. It provides sensible defaults for the many options available in Ray Tune, but also allows fine-grained access to all Ray Tune options through its class attributes. It is written as a builder class that is configured incrementally to add tuning options for things like reporting, checkpointing, and experiment resources.

We start by setting up a basic experiment, and then demonstrate how to control tuning options through class methods and class attribute assignment.

In [13]:
import contextlib
import os
import tempfile

from glimr.search import Search

# Initialize the class using the search space, model builder, data loader,
# and the name of the metric to optimize. The metric name for this single-task
# model has the format task_metric, which is the standard convention when
# using glimr.keras.keras_metrics.
tuner = Search(space, builder, dataloader, "mnist_auc")

# run the experiment in a temporary directory that is removed on exit,
# discarding stderr to keep the notebook output readable
with tempfile.TemporaryDirectory() as temp_dir:
    with open(os.devnull, "w") as devnull, contextlib.redirect_stderr(devnull):
        tuner.experiment(temp_dir)

== Status ==
Current time: 2023-03-21 23:42:36 (running for 00:00:00.34)
Memory usage on this node: 12.2/16.0 GiB 
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 40.000: None | Iter 10.000: None
Resources requested: 2.0/8 CPUs, 0/0 GPUs, 0.0/2.92 GiB heap, 0.0/1.46 GiB objects
Result logdir: /var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/tmpjpedswwi/2023_03_21_23_42_28
Number of trials: 8/100 (7 PENDING, 1 RUNNING)
+-----------------------+----------+-----------------+----------+-----------------+
| Trial name            | status   | loc             | method   |   learning rate |
|-----------------------+----------+-----------------+----------+-----------------|
| trainable_f66bc_00000 | RUNNING  | 127.0.0.1:78727 | adadelta |         0.00189 |
| trainable_f66bc_00001 | PENDING  |                 | sgd      |         0.00293 |
| trainable_f66bc_00002 | PENDING  |                 | adagrad  |         0.00107 |
| trainable_f66bc_00003 | PENDING  |                 | adagrad  |       

| Trial name | date | done | episodes_total | experiment_id | hostname | iterations_since_restore | mnist_auc | node_ip | pid | should_checkpoint | time_since_restore | time_this_iter_s | time_total_s | timestamp | timesteps_since_restore | timesteps_total | training_iteration | trial_id | warmup_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| trainable_f66bc_00000 | 2023-03-21_23-43-34 | True |  | d9db0132e4634401ade3dfa9e1562aba | Lees-MacBook-Pro.local | 10 | 0.678531 | 127.0.0.1 | 78727 | True | 48.9315 | 4.91987 | 48.9315 | 1679460214 | 0 |  | 10 | f66bc_00000 | 0.0142858 |
| trainable_f66bc_00001 | 2023-03-21_23-43-09 | True |  | d715d7731f604c66b0e5ac11425600d0 | Lees-MacBook-Pro.local | 5 | 0.967127 | 127.0.0.1 | 78735 | True | 13.24 | 2.08665 | 13.24 | 1679460189 | 0 |  | 5 | f66bc_00001 | 0.0175021 |
| trainable_f66bc_00002 | 2023-03-21_23-43-24 | True |  | 257dafc66be64ad9a8bf3a00e92908d2 | Lees-MacBook-Pro.local | 6 | 0.850068 | 127.0.0.1 | 78736 | True | 28.886 | 4.16357 | 28.886 | 1679460204 | 0 |  | 6 | f66bc_00002 | 0.0177209 |
| trainable_f66bc_00003 | 2023-03-21_23-43-11 | True |  | e315cd9bd3c64197bbf1a75c15fb3ca6 | Lees-MacBook-Pro.local | 4 | 0.965212 | 127.0.0.1 | 78737 | True | 15.4015 | 3.20595 | 15.4015 | 1679460191 | 0 |  | 4 | f66bc_00003 | 0.0200629 |
| trainable_f66bc_00004 | 2023-03-21_23-43-22 | True |  | d715d7731f604c66b0e5ac11425600d0 | Lees-MacBook-Pro.local | 4 | 0.965887 | 127.0.0.1 | 78735 | True | 13.3639 | 2.88627 | 13.3639 | 1679460202 | 0 |  | 4 | f66bc_00004 | 0.0175021 |
| trainable_f66bc_00005 | 2023-03-21_23-43-42 | True |  | e315cd9bd3c64197bbf1a75c15fb3ca6 | Lees-MacBook-Pro.local | 6 | 0.561184 | 127.0.0.1 | 78737 | True | 31.1069 | 4.48724 | 31.1069 | 1679460222 | 0 |  | 6 | f66bc_00005 | 0.0200629 |
| trainable_f66bc_00006 | 2023-03-21_23-43-38 | True |  | d715d7731f604c66b0e5ac11425600d0 | Lees-MacBook-Pro.local | 5 | 0.968633 | 127.0.0.1 | 78735 | True | 15.9164 | 2.89567 | 15.9164 | 1679460218 | 0 |  | 5 | f66bc_00006 | 0.0175021 |
| trainable_f66bc_00007 | 2023-03-21_23-43-44 | True |  | 257dafc66be64ad9a8bf3a00e92908d2 | Lees-MacBook-Pro.local | 4 | 0.986881 | 127.0.0.1 | 78736 | True | 19.7609 | 4.25161 | 19.7609 | 1679460224 | 0 |  | 4 | f66bc_00007 | 0.0177209 |
| trainable_f66bc_00008 | 2023-03-21_23-44-12 | True |  | d9db0132e4634401ade3dfa9e1562aba | Lees-MacBook-Pro.local | 8 | 0.959884 | 127.0.0.1 | 78727 | True | 37.6765 | 4.51655 | 37.6765 | 1679460252 | 0 |  | 8 | f66bc_00008 | 0.0142858 |
| trainable_f66bc_00009 | 2023-03-21_23-44-02 | True |  | d715d7731f604c66b0e5ac11425600d0 | Lees-MacBook-Pro.local | 5 | 0.949996 | 127.0.0.1 | 78735 | True | 23.131 | 4.37918 | 23.131 | 1679460242 | 0 |  | 5 | f66bc_00009 | 0.0175021 |


== Status ==
Current time: 2023-03-21 23:43:07 (running for 00:00:31.42)
Memory usage on this node: 11.9/16.0 GiB 
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 40.000: None | Iter 10.000: None
Resources requested: 8.0/8 CPUs, 0/0 GPUs, 0.0/2.92 GiB heap, 0.0/1.46 GiB objects
Current best trial: f66bc_00001 with mnist_auc=0.96419358253479 and parameters={'optimization/method': 'sgd', 'optimization/learning_rate': 0.0029300000000000003}
Result logdir: /var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/tmpjpedswwi/2023_03_21_23_42_28
Number of trials: 8/100 (4 PENDING, 4 RUNNING)
+-----------------------+----------+-----------------+----------+-----------------+-------------+
| Trial name            | status   | loc             | method   |   learning rate |   mnist_auc |
|-----------------------+----------+-----------------+----------+-----------------+-------------|
| trainable_f66bc_00000 | RUNNING  | 127.0.0.1:78727 | adadelta |         0.00189 |    0.615816 |
| trainable_f66bc_000

== Status ==
Current time: 2023-03-21 23:45:09 (running for 00:02:33.78)
Memory usage on this node: 12.1/16.0 GiB 
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 40.000: None | Iter 10.000: 0.6747923344373703
Resources requested: 8.0/8 CPUs, 0/0 GPUs, 0.0/2.92 GiB heap, 0.0/1.46 GiB objects
Current best trial: f66bc_00007 with mnist_auc=0.98688143491745 and parameters={'optimization/method': 'rms', 'optimization/learning_rate': 0.0026400000000000004}
Result logdir: /var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/tmpjpedswwi/2023_03_21_23_42_28
Number of trials: 34/100 (4 PENDING, 4 RUNNING, 26 TERMINATED)
+-----------------------+------------+-----------------+----------+-----------------+-------------+
| Trial name            | status     | loc             | method   |   learning rate |   mnist_auc |
|-----------------------+------------+-----------------+----------+-----------------+-------------|
| trainable_f66bc_00024 | RUNNING    | 127.0.0.1:78735 | adadelta |         0.00539