## MNIST example

This notebook demonstrates and end-to-end application of the glimr package.

Using MNIST classification as a simple example, we demonstrate the steps to create a search space, model builder, and dataloader for use in tuning. This provides a concrete example of topics like using the `glimr.utils` and `glimr.keras` functions to create hyperparameters and to correctly name losses and metrics for training and reporting.

This is followed by a demonstration of the `Search` class to show how to setup and run experiments.

In [1]:
!pip install ../../glimr

Processing /Users/lcoop22/Desktop/glimr
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Installing backend dependencies ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone


Building wheels for collected packages: glimr
  Building wheel for glimr (pyproject.toml) ... [?25ldone
[?25h  Created wheel for glimr: filename=glimr-0.1.dev33+ge238514-py3-none-any.whl size=18689 sha256=7de82b4f8993feb0a24ea2b75e2183acf6657a6e31cd2ae787d5941edbd5263c
  Stored in directory: /private/var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/pip-ephem-wheel-cache-nbc7hzai/wheels/58/08/39/c88c61a75aca3782dfcc11b86c8a6af860f75d1d00fddf72e2
Successfully built glimr
Installing collected packages: glimr
  Attempting uninstall: glimr
    Found existing installation: glimr 0.1.dev33+ge238514
    Uninstalling glimr-0.1.dev33+ge238514:
      Successfully uninstalled glimr-0.1.dev33+ge238514
Successfully installed glimr-0.1.dev33+ge238514


### Creating the search space

First let's create a search space for a simple two layer network for a multiclass MNIST classifier.

For each layer we define hyperparameters for the number of units, dropout rate, and activation functions. We explore two losses for the single output task (named "mnist"), and explore a variety of gradient optimization algorithms and batch sizes.

In [2]:
# import optimization search space from glimr
from glimr.optimization import optimization_space
from glimr.utils import set_hyperparameter

# define the possible layer activations
activations = {"elu", "gelu", "linear", "relu", "selu", "sigmoid", "softplus"}

# define the layer 1 hyperparameters
layer1 = {
    "activation": activations,
    "dropout": [0.0, 0.2, 0.05],
    "units": {64, 48, 32, 16}
}

# define the task
task = {
    "activation": activations,
    "dropout": [0.0, 0.2, 0.05],
    "units": 10,
    "loss": {"categorical_hinge", "categorical_crossentropy"},
    "loss_weight": 1.0,
    "metrics": {"auc": "auc"}
}

# put it all together
space = {
    "layer1": layer1,
    "optimization": optimization_space(),
    "tasks": {
        "mnist": task
    }
}

# display space
from pprint import pprint
pprint(space, indent=4)

# define a recursive procedure for setting hyperparameters for list, set types
def recursive_set_hyperparameter(dictionary):
    for key in dictionary.keys():
        if isinstance(dictionary[key], (list, set)):
            dictionary[key] = set_hyperparameter(dictionary[key])
        elif isinstance(dictionary[key], dict):
            recursive_set_hyperparameter(dictionary[key])
            
# convert from glimr hyperparameter notation to Ray Tune hyperparameters
recursive_set_hyperparameter(space)

2023-03-21 00:33:46.929005: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


{   'layer1': {   'activation': {   'elu',
                                    'gelu',
                                    'linear',
                                    'relu',
                                    'selu',
                                    'sigmoid',
                                    'softplus'},
                  'dropout': [0.0, 0.2, 0.05],
                  'units': {64, 16, 48, 32}},
    'optimization': {   'batch': <ray.tune.search.sample.Categorical object at 0x7fa106e9b220>,
                        'beta_1': <ray.tune.search.sample.Float object at 0x7fa0f224e9d0>,
                        'beta_2': <ray.tune.search.sample.Float object at 0x7fa0f224ea00>,
                        'learning_rate': <ray.tune.search.sample.Float object at 0x7fa106e9b2b0>,
                        'method': <ray.tune.search.sample.Categorical object at 0x7fa106e9b310>,
                        'momentum': <ray.tune.search.sample.Float object at 0x7fa0f222b8e0>,
                        

### Implement the model-building function

The model-builder function transforms a sample of the space into a `tf.keras.Model`, and loss, loss weight, and metric inputs for model compilation. This is a user-defined function to provide maximum flexibility in the models that can be used with glimr.

In [3]:
from glimr.keras import keras_losses, keras_metrics
import tensorflow as tf


def builder(config):
    
    # a helper function for building layers
    def _build_layer(x, units, activation, dropout, name):
        # dense layer
        x = tf.keras.layers.Dense(units, activation=activation, name=name)(x)

        # add dropout if necessary
        if dropout > 0.0:
            x = tf.keras.layers.Dropout(dropout)(x)

        return x
    
    # create input layer
    input_layer = tf.keras.Input([784], name="input")
    
    # build layer 1
    x = _build_layer(input_layer, 
                     config["layer1"]["units"], 
                     config["layer1"]["activation"], 
                     config["layer1"]["dropout"],
                     "layer1")
    
    # build output / task layer
    task_name = list(config["tasks"].keys())[0]
    output = _build_layer(input_layer, 
                     config["tasks"][task_name]["units"], 
                     config["tasks"][task_name]["activation"], 
                     config["tasks"][task_name]["dropout"],
                     task_name)

    # build named output dict
    named = {f"{task_name}": output}

    # create model
    model = tf.keras.Model(inputs=input_layer, outputs=named)

    # create a loss dictionary using utility function
    metric_mapper = {
        "categorical_crossentropy": tf.keras.losses.CategoricalCrossentropy(from_logits=True),
        "categorical_hinge": tf.keras.losses.CategoricalHinge()
    }
    losses, loss_weights = keras_losses(config, metric_mapper)

    # create a metric dictionary using utility function
    loss_mapper = {
        "auc": tf.keras.metrics.AUC
    }
    metrics = keras_metrics(config, loss_mapper)
    
    return model, losses, loss_weights, metrics

### Create a data loading function

Write a function to load and batch mnist samples. Flatten the images and apply a one-hot encoding to the labels.

In [4]:
import numpy as np


def dataloader(batch):
    
    # load mnist data
    train, validation = tf.keras.datasets.mnist.load_data(path="mnist.npz")
    
    # flattening function
    def mnist_flat(features):
        return features.reshape(features.shape[0], features.shape[1]*features.shape[2])

    # extract features, labels
    train_features = tf.cast(mnist_flat(train[0]), tf.float32) / 255.
    train_labels = train[1]
    validation_features = tf.cast(mnist_flat(validation[0]), tf.float32) / 255.
    validation_labels = validation[1]
    
    # build datasets
    train_ds = tf.data.Dataset.from_tensor_slices(
        (train_features, {"mnist": tf.one_hot(train_labels, 10)})
    )
    validation_ds = tf.data.Dataset.from_tensor_slices(
        (validation_features, {"mnist": tf.one_hot(validation_labels, 10)})
    )
    
    # batch
    train_ds = train_ds.shuffle(len(train_labels), reshuffle_each_iteration=True)
    train_ds = train_ds.batch(batch)
    validation_ds = validation_ds.batch(batch)

    return train_ds, validation_ds

### Test the search space, model builder, and dataloader

Before doing a hyperparameter search, let's test this combination to verify that the models can train.

We generate a sample configuration from the search space and build, compile, and train a model with this config.

In [5]:
from glimr.keras import keras_optimizer
import ray

# define a function for sampling a config from a space - ray will handle this automatically
def sample_space(space):
    config = {}
    for key in space:
        if isinstance(space[key], dict):
            config[key] = sample_space(space[key])
        elif isinstance(space[key], (ray.tune.search.sample.Categorical,
                                     ray.tune.search.sample.Integer,
                                     ray.tune.search.sample.Float)):
            config[key] = space[key].sample()
        else: # non sampleable value
            config[key] = space[key]
    return config

# sample a configuration
config = sample_space(space)

# display the configuration
from pprint import pprint
pprint(config, indent=4)

# build the model
model, losses, loss_weights, metrics = builder(config)

# build the optimizer
optimizer = keras_optimizer(config["optimization"])

# test compile the model
model.compile(optimizer=optimizer,
              loss=losses,
              metrics=metrics,
              loss_weights=loss_weights)

{   'layer1': {   'activation': 'softplus',
                  'dropout': 0.15000000000000002,
                  'units': 32},
    'optimization': {   'batch': 64,
                        'beta_1': 0.54,
                        'beta_2': 0.64,
                        'learning_rate': 0.008060000000000001,
                        'method': 'adam',
                        'momentum': 0.06,
                        'rho': 0.77},
    'tasks': {   'mnist': {   'activation': 'softplus',
                              'dropout': 0.05,
                              'loss': 'categorical_crossentropy',
                              'loss_weight': 1.0,
                              'metrics': {'auc': 'auc'},
                              'units': 10}}}


2023-03-21 00:33:55.206294: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [6]:
# build dataset and train
train_ds, val_ds = dataloader(config["optimization"]["batch"])
model.fit(x=train_ds, validation_data=val_ds, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa0f33fd130>

## Using Search for hyperparameter tuning

The `Search` class implements the hyperparameter tuning process of Ray Tune. It is designed to provide sensible defaults for the many options available in Ray Tune, but also allows fine grained access to all Ray Tune options through it's class attributes. It is written ass a builder class that is incrementally changed to add tuning options for things like reporting, checkpointing, and experiment resources.

We start by setting up a basic experiment, and then demonstrate how to control tuning options through class methods and class attribute assignment.

In [10]:
import contextlib
from glimr.search import Search
import os
import tempfile

###############################
space["epochs"] = 100
###############################

# Initialize the class using the search space, model builder, data loader, 
# and the name of the metric to optimize. The metric name for this single-task
# model has format task_metric. This is the standard convention when using
# glimr.keras.keras_metrics.
tuner = Search(space, builder, dataloader, "mnist_auc")

# setup a temporary directory to hold tune outputs
temp_dir = tempfile.TemporaryDirectory()

# run the experiment in this folder
with contextlib.redirect_stderr(open(os.devnull, "w")):
    tuner.experiment(temp_dir.name)

# cleanup the temporary folder
temp_dir.cleanup()

== Status ==
Current time: 2023-03-21 00:35:21 (running for 00:00:00.29)
Memory usage on this node: 11.7/16.0 GiB 
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 40.000: None | Iter 10.000: None
Resources requested: 2.0/8 CPUs, 0/0 GPUs, 0.0/3.02 GiB heap, 0.0/1.51 GiB objects
Result logdir: /var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/tmp584ijb11/2023_03_21_00_35_13
Number of trials: 8/100 (7 PENDING, 1 RUNNING)
+-----------------------+----------+-----------------+----------+-----------------+
| Trial name            | status   | loc             | method   |   learning rate |
|-----------------------+----------+-----------------+----------+-----------------|
| trainable_2b2b2_00000 | RUNNING  | 127.0.0.1:38406 | rms      |         0.00515 |
| trainable_2b2b2_00001 | PENDING  |                 | adadelta |         0.00372 |
| trainable_2b2b2_00002 | PENDING  |                 | sgd      |         0.00128 |
| trainable_2b2b2_00003 | PENDING  |                 | rms      |       

Trial name,date,done,episodes_total,experiment_id,hostname,iterations_since_restore,mnist_auc,node_ip,pid,should_checkpoint,time_since_restore,time_this_iter_s,time_total_s,timestamp,timesteps_since_restore,timesteps_total,training_iteration,trial_id,warmup_time
trainable_2b2b2_00000,2023-03-21_00-35-47,True,,7f62d6d622a24cb68e58839afc4b4b77,Lees-MacBook-Pro.local,4,0.678888,127.0.0.1,38406,True,17.2701,4.03424,17.2701,1679376947,0,,4,2b2b2_00000,0.0124123
trainable_2b2b2_00001,2023-03-21_00-36-25,True,,a4a6cd44fb71432280d39eca62e0c174,Lees-MacBook-Pro.local,13,0.912039,127.0.0.1,38417,True,46.107,3.00604,46.107,1679376985,0,,13,2b2b2_00001,0.0139692
trainable_2b2b2_00002,2023-03-21_00-36-05,True,,9321de50b9454ec99518f35bcb03332f,Lees-MacBook-Pro.local,7,0.928004,127.0.0.1,38418,True,25.2455,3.20089,25.2455,1679376965,0,,7,2b2b2_00002,0.0147252
trainable_2b2b2_00003,2023-03-21_00-36-01,True,,81c4d710a5984eb79cea751efb9a4abd,Lees-MacBook-Pro.local,4,0.972908,127.0.0.1,38419,True,22.0418,4.56483,22.0418,1679376961,0,,4,2b2b2_00003,0.0148399
trainable_2b2b2_00004,2023-03-21_00-36-01,True,,7f62d6d622a24cb68e58839afc4b4b77,Lees-MacBook-Pro.local,4,0.8554,127.0.0.1,38406,True,14.1088,4.42581,14.1088,1679376961,0,,4,2b2b2_00004,0.0124123
trainable_2b2b2_00005,2023-03-21_00-36-23,True,,81c4d710a5984eb79cea751efb9a4abd,Lees-MacBook-Pro.local,4,0.927443,127.0.0.1,38419,True,20.8828,4.81903,20.8828,1679376983,0,,4,2b2b2_00005,0.0148399
trainable_2b2b2_00006,2023-03-21_00-36-14,True,,7f62d6d622a24cb68e58839afc4b4b77,Lees-MacBook-Pro.local,5,0.948686,127.0.0.1,38406,True,12.0353,2.25339,12.0353,1679376974,0,,5,2b2b2_00006,0.0124123
trainable_2b2b2_00007,2023-03-21_00-36-15,True,,9321de50b9454ec99518f35bcb03332f,Lees-MacBook-Pro.local,4,0.964558,127.0.0.1,38418,True,9.92862,1.85538,9.92862,1679376975,0,,4,2b2b2_00007,0.0147252
trainable_2b2b2_00008,2023-03-21_00-36-30,True,,7f62d6d622a24cb68e58839afc4b4b77,Lees-MacBook-Pro.local,5,0.769777,127.0.0.1,38406,True,16.1991,2.8684,16.1991,1679376990,0,,5,2b2b2_00008,0.0124123
trainable_2b2b2_00009,2023-03-21_00-36-36,True,,9321de50b9454ec99518f35bcb03332f,Lees-MacBook-Pro.local,4,0.910548,127.0.0.1,38418,True,20.8184,4.83513,20.8184,1679376996,0,,4,2b2b2_00009,0.0147252


== Status ==
Current time: 2023-03-21 00:35:51 (running for 00:00:30.38)
Memory usage on this node: 11.9/16.0 GiB 
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 40.000: None | Iter 10.000: None
Resources requested: 8.0/8 CPUs, 0/0 GPUs, 0.0/3.02 GiB heap, 0.0/1.51 GiB objects
Current best trial: 2b2b2_00003 with mnist_auc=0.978256106376648 and parameters={'optimization/method': 'rms', 'optimization/learning_rate': 0.0071200000000000005}
Result logdir: /var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/tmp584ijb11/2023_03_21_00_35_13
Number of trials: 9/100 (4 PENDING, 4 RUNNING, 1 TERMINATED)
+-----------------------+------------+-----------------+----------+-----------------+-------------+
| Trial name            | status     | loc             | method   |   learning rate |   mnist_auc |
|-----------------------+------------+-----------------+----------+-----------------+-------------|
| trainable_2b2b2_00001 | RUNNING    | 127.0.0.1:38417 | adadelta |         0.00372 |    0.655278 

== Status ==
Current time: 2023-03-21 00:37:53 (running for 00:02:32.04)
Memory usage on this node: 11.5/16.0 GiB 
Using AsyncHyperBand: num_stopped=1
Bracket: Iter 40.000: None | Iter 10.000: 0.9288615882396698
Resources requested: 8.0/8 CPUs, 0/0 GPUs, 0.0/3.02 GiB heap, 0.0/1.51 GiB objects
Current best trial: 2b2b2_00024 with mnist_auc=0.9769511222839355 and parameters={'optimization/method': 'adagrad', 'optimization/learning_rate': 0.00267}
Result logdir: /var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/tmp584ijb11/2023_03_21_00_35_13
Number of trials: 33/100 (4 PENDING, 4 RUNNING, 25 TERMINATED)
+-----------------------+------------+-----------------+----------+-----------------+-------------+
| Trial name            | status     | loc             | method   |   learning rate |   mnist_auc |
|-----------------------+------------+-----------------+----------+-----------------+-------------|
| trainable_2b2b2_00025 | RUNNING    | 127.0.0.1:38417 | adadelta |         0.00419 |    0.

== Status ==
Current time: 2023-03-21 00:39:24 (running for 00:04:03.06)
Memory usage on this node: 12.0/16.0 GiB 
Using AsyncHyperBand: num_stopped=4
Bracket: Iter 40.000: None | Iter 10.000: 0.9200438857078552
Resources requested: 8.0/8 CPUs, 0/0 GPUs, 0.0/3.02 GiB heap, 0.0/1.51 GiB objects
Current best trial: 2b2b2_00024 with mnist_auc=0.9769511222839355 and parameters={'optimization/method': 'adagrad', 'optimization/learning_rate': 0.00267}
Result logdir: /var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/tmp584ijb11/2023_03_21_00_35_13
Number of trials: 55/100 (4 PENDING, 4 RUNNING, 47 TERMINATED)
+-----------------------+------------+-----------------+----------+-----------------+-------------+
| Trial name            | status     | loc             | method   |   learning rate |   mnist_auc |
|-----------------------+------------+-----------------+----------+-----------------+-------------|
| trainable_2b2b2_00046 | RUNNING    | 127.0.0.1:38417 | adadelta |         0.00481 |    0.

== Status ==
Current time: 2023-03-21 00:40:58 (running for 00:05:36.72)
Memory usage on this node: 11.8/16.0 GiB 
Using AsyncHyperBand: num_stopped=8
Bracket: Iter 40.000: None | Iter 10.000: 0.9064149260520935
Resources requested: 8.0/8 CPUs, 0/0 GPUs, 0.0/3.02 GiB heap, 0.0/1.51 GiB objects
Current best trial: 2b2b2_00024 with mnist_auc=0.9769511222839355 and parameters={'optimization/method': 'adagrad', 'optimization/learning_rate': 0.00267}
Result logdir: /var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/tmp584ijb11/2023_03_21_00_35_13
Number of trials: 74/100 (4 PENDING, 4 RUNNING, 66 TERMINATED)
+-----------------------+------------+-----------------+----------+-----------------+-------------+
| Trial name            | status     | loc             | method   |   learning rate |   mnist_auc |
|-----------------------+------------+-----------------+----------+-----------------+-------------|
| trainable_2b2b2_00064 | RUNNING    | 127.0.0.1:38419 | adadelta |         0.00357 |    0.

== Status ==
Current time: 2023-03-21 00:42:31 (running for 00:07:09.61)
Memory usage on this node: 11.7/16.0 GiB 
Using AsyncHyperBand: num_stopped=9
Bracket: Iter 40.000: None | Iter 10.000: 0.9113542437553406
Resources requested: 8.0/8 CPUs, 0/0 GPUs, 0.0/3.02 GiB heap, 0.0/1.51 GiB objects
Current best trial: 2b2b2_00081 with mnist_auc=0.9790194630622864 and parameters={'optimization/method': 'rms', 'optimization/learning_rate': 0.00464}
Result logdir: /var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/tmp584ijb11/2023_03_21_00_35_13
Number of trials: 92/100 (4 PENDING, 4 RUNNING, 84 TERMINATED)
+-----------------------+------------+-----------------+----------+-----------------+-------------+
| Trial name            | status     | loc             | method   |   learning rate |   mnist_auc |
|-----------------------+------------+-----------------+----------+-----------------+-------------|
| trainable_2b2b2_00084 | RUNNING    | 127.0.0.1:38419 | sgd      |         0.00625 |    0.9660

== Status ==
Current time: 2023-03-21 00:43:38 (running for 00:08:16.60)
Memory usage on this node: 10.7/16.0 GiB 
Using AsyncHyperBand: num_stopped=9
Bracket: Iter 40.000: None | Iter 10.000: 0.9113542437553406
Resources requested: 0/8 CPUs, 0/0 GPUs, 0.0/3.02 GiB heap, 0.0/1.51 GiB objects
Current best trial: 2b2b2_00081 with mnist_auc=0.9790194630622864 and parameters={'optimization/method': 'rms', 'optimization/learning_rate': 0.00464}
Result logdir: /var/folders/p8/2m9hqfn51c3_zkpq1894xzf80000gn/T/tmp584ijb11/2023_03_21_00_35_13
Number of trials: 100/100 (100 TERMINATED)
+-----------------------+------------+-----------------+----------+-----------------+-------------+
| Trial name            | status     | loc             | method   |   learning rate |   mnist_auc |
|-----------------------+------------+-----------------+----------+-----------------+-------------|
| trainable_2b2b2_00081 | TERMINATED | 127.0.0.1:38419 | rms      |         0.00464 |    0.979019 |
| trainable_2b2b2

[2m[33m(raylet)[0m [2023-03-21 00:43:48,617 E 38389 13848610] (raylet) file_system_monitor.cc:105: /tmp/ray/session_2023-03-21_00-35-13_211893_38207 is over 95% full, available space: 12729610240; capacity: 1000240963584. Object creation will fail if spilling is required.
[2m[33m(raylet)[0m [2023-03-21 00:43:58,677 E 38389 13848610] (raylet) file_system_monitor.cc:105: /tmp/ray/session_2023-03-21_00-35-13_211893_38207 is over 95% full, available space: 12729614336; capacity: 1000240963584. Object creation will fail if spilling is required.
[2m[33m(raylet)[0m [2023-03-21 00:44:08,705 E 38389 13848610] (raylet) file_system_monitor.cc:105: /tmp/ray/session_2023-03-21_00-35-13_211893_38207 is over 95% full, available space: 12728401920; capacity: 1000240963584. Object creation will fail if spilling is required.
[2m[33m(raylet)[0m [2023-03-21 00:44:18,773 E 38389 13848610] (raylet) file_system_monitor.cc:105: /tmp/ray/session_2023-03-21_00-35-13_211893_38207 is over 95% full, av