# Ray Tune Lab/Demo

In this walkthrough, we'll step through a minimal but realistic tuning example adapted from one of the Ray Tune examples.

We'll use
* TensorFlow
* MNIST data
* A multilayer perceptron classifier (one hidden layer) trained with the SGD optimizer

And we'll tune
* Number of neurons in the hidden layer
* SGD learning rate
* SGD momentum

First, we'll do some imports and define the training function that Tune calls once for each trial:

In [None]:
import numpy as np
from tensorflow.keras.datasets import mnist
from ray.tune.integration.keras import TuneReportCallback

def train_mnist(config):
    # Import TensorFlow inside the trainable; see
    # https://github.com/tensorflow/tensorflow/issues/32159
    import tensorflow as tf
    batch_size = 128
    num_classes = 10
    epochs = 12

    # Load MNIST and scale pixel values from [0, 255] to [0, 1]
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # One hidden layer whose width is the sampled "hidden" value
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(config["hidden"], activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(num_classes, activation="softmax")
    ])

    # SGD learning rate and momentum also come from the sampled config.
    # `learning_rate` replaces the deprecated `lr` keyword in Keras.
    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer=tf.keras.optimizers.SGD(
            learning_rate=config["lr"], momentum=config["momentum"]),
        metrics=["accuracy"])

    # TuneReportCallback forwards the Keras "accuracy" metric to Tune
    # under the name "mean_accuracy", which the scheduler optimizes
    model.fit(
        x_train,
        y_train,
        batch_size=batch_size,
        epochs=epochs,
        verbose=0,
        validation_data=(x_test, y_test),
        callbacks=[TuneReportCallback({
            "mean_accuracy": "accuracy"
        })])
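The `{"mean_accuracy": "accuracy"}` argument to `TuneReportCallback` is essentially a rename from Keras log keys to the metric names Tune sees. The plain-Python sketch below imitates just that mapping; `to_tune_report` and the sample `logs` values are hypothetical illustrations, not Ray's implementation. Each report that reaches Tune is what it counts as one `training_iteration`.

```python
def to_tune_report(metric_map, keras_logs):
    """Rename Keras log entries to the metric names Tune will see.

    metric_map maps Tune name -> Keras log key, mirroring the dict passed
    to TuneReportCallback in train_mnist (hypothetical helper).
    """
    return {tune_name: keras_logs[keras_key]
            for tune_name, keras_key in metric_map.items()}


# Hypothetical Keras logs for one report (values made up for illustration)
logs = {"loss": 0.31, "accuracy": 0.91}
report = to_tune_report({"mean_accuracy": "accuracy"}, logs)
print(report)  # {'mean_accuracy': 0.91}
```

Keys that are not in the mapping (here `loss`) are simply not reported under a Tune name.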

We'll use the `AsyncHyperBandScheduler` for managing our trials. Ray recommends this variant (described in https://arxiv.org/abs/1810.05934) over the "base" Hyperband implementation.

For an overview of various strategies -- including Hyperband -- this is a great introduction: https://medium.com/criteo-labs/hyper-parameter-optimization-algorithms-2fe447525903
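What makes ASHA asynchronous is that, when a trial reaches a checkpoint ("rung"), it is compared only with trials that already reached that rung, so the scheduler never waits for a full batch of trials. Below is a simplified sketch of that decision, assuming "higher is better" and a reduction factor of 4 (which we believe is the default in Ray's `AsyncHyperBandScheduler`; check the docs for your version). The helper names are hypothetical and this is not Ray's code.

```python
def percentile(values, q):
    """Linear-interpolation percentile (NumPy's default rule), q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def asha_decision(recorded, value, reduction_factor=4):
    """Decide whether a trial that just reached a rung may continue.

    recorded: metric values of trials that reached this rung earlier.
    The trial continues if its value is at least the (1 - 1/eta) quantile
    of those earlier values, i.e. it is roughly in the top 1/eta.
    """
    if not recorded:
        return "continue"  # nothing to compare against yet
    cutoff = percentile(recorded, (1 - 1 / reduction_factor) * 100)
    return "continue" if value >= cutoff else "stop"


# Hypothetical accuracies of four trials that already reached this rung
earlier = [0.80, 0.85, 0.90, 0.95]
print(asha_decision(earlier, 0.93))  # continue (cutoff is about 0.9125)
print(asha_decision(earlier, 0.88))  # stop
```

Because the cutoff only uses results already recorded, early arrivals at a rung are judged against few or no peers, which is the trade-off ASHA accepts in exchange for never blocking on stragglers.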

In [None]:
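import ray
from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler

# Download MNIST once on the driver so the dataset is cached before the
# trials start and they don't all try to download it concurrently
mnist.load_data()

sched = AsyncHyperBandScheduler(
    time_attr="training_iteration",  # measure progress in reported iterations
    metric="mean_accuracy",          # the metric reported by TuneReportCallback
    mode="max",                      # higher accuracy is better
    max_t=400,                       # no trial runs past 400 iterations
    grace_period=20)                 # never stop a trial before 20 iterations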
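The `max_t` and `grace_period` values determine the rungs at which trials get compared. A minimal sketch of that schedule, assuming milestones of the form `grace_period * reduction_factor**k` capped at `max_t`, with the reduction factor of 4 we believe is the default (`asha_rungs` is a hypothetical helper, not a Ray API):

```python
def asha_rungs(max_t, grace_period, reduction_factor=4):
    """Return the iteration milestones at which ASHA compares trials.

    Milestones are grace_period * reduction_factor**k for k = 0, 1, ...,
    keeping only those that do not exceed max_t.
    """
    rungs = []
    milestone = grace_period
    while milestone <= max_t:
        rungs.append(milestone)
        milestone *= reduction_factor
    return rungs


print(asha_rungs(max_t=400, grace_period=20))  # [20, 80, 320]
```

Note that with `"training_iteration": 5` in the `stop` dict of the `tune.run` call, every trial ends before the first milestone at 20, so in this short demo the scheduler never gets to stop a trial early. For it to prune, trials have to run (and `train_mnist` has to report) past `grace_period` iterations.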

Now we start Ray and launch the experiment with a call to `tune.run`:

In [None]:
ray.init()

tune.run(
    train_mnist,
    name="exp",
    scheduler=sched,
    # A trial stops as soon as it meets either criterion
    stop={
        "mean_accuracy": 0.99,
        "training_iteration": 5
    },
    num_samples=10,  # number of trials, each with its own sampled config
    resources_per_trial={
        "cpu": 2,
        "gpu": 0
    },
    config={
        "threads": 2,  # not read by train_mnist above
        # Each sample_from lambda is evaluated once per trial
        "lr": tune.sample_from(lambda spec: np.random.uniform(0.001, 0.1)),
        "momentum": tune.sample_from(
            lambda spec: np.random.uniform(0.1, 0.9)),
        "hidden": tune.sample_from(
            lambda spec: np.random.randint(32, 512)),
    })
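Each of the `num_samples=10` trials gets its own draw from the `config` distributions, and Ray runs as many trials at once as `resources_per_trial` allows. The sketch below mimics both in plain Python. It uses the standard `random` module in place of `tune.sample_from` and NumPy, and `sample_config` and `trial_waves` are hypothetical helpers, not Ray APIs. The wave count is a rough estimate that assumes CPUs are the only constraint and that trials take similar time, which early stopping can change.

```python
import math
import random


def sample_config(rng):
    """Draw one config from the same ranges as the tune.run search space."""
    return {
        "lr": rng.uniform(0.001, 0.1),
        "momentum": rng.uniform(0.1, 0.9),
        # randrange excludes the upper bound, like np.random.randint(32, 512)
        "hidden": rng.randrange(32, 512),
    }


def trial_waves(num_samples, total_cpus, cpus_per_trial):
    """Return (concurrent trials, rough number of back-to-back waves)."""
    concurrent = total_cpus // cpus_per_trial
    if concurrent == 0:
        raise ValueError("not enough CPUs to run even one trial")
    return concurrent, math.ceil(num_samples / concurrent)


rng = random.Random(0)  # fixed seed so the draws are reproducible
configs = [sample_config(rng) for _ in range(10)]  # one per trial
for cfg in configs[:3]:
    print(cfg)

# Hypothetical 8-CPU machine with 2 CPUs per trial
print(trial_waves(num_samples=10, total_cpus=8, cpus_per_trial=2))  # (4, 3)
```

Ray Tune also provides search-space primitives such as `tune.uniform` and `tune.randint`, which express these same ranges without a lambda.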

As the trial results show, and as you've probably experienced if you've tried to hand-tune a network, even in a simple problem like this the resulting accuracy depends on a subtle interplay between the hyperparameters.

Simply put, the network size, learning rate, and momentum all have to live in a sweet spot to get optimal results.

In [None]:
ray.shutdown()