# Hyperparameter tuning with Ray Tune

[Ray Tune](https://docs.ray.io/en/latest/tune/index.html) is a library for experiment execution and hyperparameter tuning at any scale. It supports most machine learning frameworks, a variety of state-of-the-art algorithms, and integrates a wide range of dedicated hyperparameter optimization tools.

## Key Concepts

There are [six concepts to understand](https://docs.ray.io/en/latest/tune/key-concepts.html):

1. Search Spaces: A search space defines the valid values, or the distributions to sample them from, for each of your hyperparameters.
2. Trainable: In Tune, the objective you want to minimize/maximize is represented by a [Trainable](https://docs.ray.io/en/latest/tune/key-concepts.html#ray-tune-trainables). Trainables are [functions](https://docs.ray.io/en/latest/tune/api/trainable.html#tune-function-api) or [classes](https://docs.ray.io/en/latest/tune/api/trainable.html#class-trainable-api) that take a hyperparameter configuration as input and report one or more metrics.
3. Search Algorithms: Search algorithms do the heavy lifting: they decide _which_ configurations from the search space to try, i.e. they describe _how_ to tune the Trainable.
4. Schedulers: Tune can optionally use a Scheduler to stop unpromising trials early and thus speed up the hyperparameter search.
5. Trials: A trial is a single run of the Trainable with one concrete combination of hyperparameter values.
6. Analyses: After the search process has terminated, Tune presents you with a `ResultGrid`, which gives you access to the results of every trial, such as the best trial and its hyperparameter configuration.
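To see how these pieces divide the work, here is a toy, plain-Python version of the loop that Tune automates (no Ray involved). Everything in it is hypothetical and simplified: the objective and its `lr`/`momentum` hyperparameters are made up, plain random sampling stands in for the search algorithm, and a "stop if worse than the median of earlier trials at the same step" rule stands in for a scheduler. That rule is only loosely inspired by median stopping and is not Tune's implementation.

```python
import random
import statistics

# Search space: each (hypothetical) hyperparameter maps to a sampler.
search_space = {
    "lr": lambda rng: rng.uniform(0.001, 0.1),
    "momentum": lambda rng: rng.uniform(0.0, 0.9),
}


def trainable(config, step):
    # Hypothetical objective: decreases with step, best near lr = 0.05.
    return (config["lr"] - 0.05) ** 2 + 1.0 / (step + 1) + 0.01 * config["momentum"]


def run_search(num_samples=20, max_steps=10, seed=0):
    rng = random.Random(seed)
    seen_at_step = {}  # step -> losses reported by earlier trials at that step
    trials = []
    for _ in range(num_samples):
        # Search algorithm: plain random sampling from the search space.
        config = {name: sample(rng) for name, sample in search_space.items()}
        for step in range(max_steps):
            loss = trainable(config, step)
            seen = seen_at_step.setdefault(step, [])
            # Scheduler: stop this trial early if it is worse than the
            # median of at least 3 earlier trials at the same step.
            stop_early = len(seen) >= 3 and loss > statistics.median(seen)
            seen.append(loss)
            if stop_early:
                break
        # Trial: one concrete configuration and its last reported loss.
        trials.append({"config": config, "loss": loss, "steps": step + 1})
    # Analysis: the best trial has the lowest last reported loss.
    best = min(trials, key=lambda t: t["loss"])
    return trials, best


trials, best = run_search()
print(f"best config: {best['config']}, loss: {best['loss']:.4f}")
```

Tune replaces each of these hand-written parts with a configurable component and runs the trials in parallel, which is what the blueprint below wires together.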

![Tune Flow](imgs/tune_flow.png)

Image taken from the [Tune documentation](https://docs.ray.io/en/latest/tune/key-concepts.html).

With these concepts in mind, we can create a blueprint for any Tune script:

In [None]:
from ray import tune, train


def trainable(config):
    # config is a dict of hyperparameters sampled from the search space

    # train your model using the hyperparameters
    # ...
    score = 0.5

    # report the score to Tune (typically once per training iteration)
    train.report({"score": score})


# Define the search space
search_space = {
    # Your hyperparameters go here
}

# Select the search algorithm and its parameters.
# Searcher is the base class for all search algorithms (e.g. random/grid search,
# Bayesian optimization, HEBO) and cannot be used on its own, so pick a concrete
# subclass. BasicVariantGenerator (random/grid search) is Tune's default.
algo = tune.search.basic_variant.BasicVariantGenerator(
    # Your search algorithm parameters go here
)

# Select the scheduler and its parameters.
# TrialScheduler is the base class for all schedulers (e.g. HyperBandScheduler,
# ASHAScheduler). FIFOScheduler is Tune's default: it runs every trial to completion.
scheduler = tune.schedulers.FIFOScheduler(
    # Your scheduler parameters go here
)

# Define the tune_config
tune_config = tune.TuneConfig(
    # Name of the metric that the trainable returns
    # and we want to optimize.
    metric="score",
    # The mode can be "min" or "max"
    mode="max",
    # The search algorithm
    search_alg=algo,
    # The scheduler
    scheduler=scheduler,
    # Number of times to sample from the hyperparameter space
    num_samples=10,
)

# Define the run_config, e.g. a stopping criterion:
# end each trial after 20 reported training iterations.
run_config = train.RunConfig(stop={"training_iteration": 20})

tuner = tune.Tuner(
    trainable=trainable,
    tune_config=tune_config,
    run_config=run_config,
    param_space=search_space,
)

# Start the search
results = tuner.fit() # returns result grid

This is of course very abstract. Let's look at a concrete example: grid search.

### Grid Search with Ray Tune

In this example, we will be trying to minimize the following objective:

```python
def trainable(config):
    # Hyperparameters
    width, height, activation = config["width"], config["height"], config["activation"]
    for step in range(config["steps"]):
        loss = loss_fn(step, width, height, activation)
        train.report({"iterations": step, "mean_loss": loss})
```

with the loss function `loss_fn` defined as

```python
import torch
import torch.nn.functional as F

def loss_fn(step, width, height, activation):
    pre_act = torch.tensor((0.1 + width * step / 100) ** (-1) + height * 0.1)
    if "relu" == activation:
        return F.relu(pre_act)
    elif "tanh" == activation:
        return F.tanh(pre_act)
```
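Before tuning, it helps to know what a good result should look like. The sketch below re-implements the same formula with the standard library (`max(0.0, ·)` and `math.tanh` in place of the PyTorch calls); the name `loss_fn_py` is hypothetical. Because the first term $(0.1 + \text{width} \cdot \text{step} / 100)^{-1}$ shrinks as training proceeds, the final loss is close to $\mathrm{relu}(0.1 \cdot \text{height})$ or $\tanh(0.1 \cdot \text{height})$: `relu` can never go below 0, whereas `tanh` approaches -1 for strongly negative `height`. Since `ResultGrid.get_best_result` compares trials by their last reported result by default (`scope="last"`), we would expect the winning trial to use `tanh` with a large negative `height`.

```python
import math


def loss_fn_py(step, width, height, activation):
    # Same formula as loss_fn, but on plain floats instead of torch tensors.
    pre_act = (0.1 + width * step / 100) ** (-1) + height * 0.1
    if activation == "relu":
        return max(0.0, pre_act)
    if activation == "tanh":
        return math.tanh(pre_act)
    # Fail loudly instead of silently returning None for unknown values.
    raise ValueError(f"unknown activation: {activation!r}")


# Step 0: the first term is 1 / 0.1 = 10, whatever the width.
print(loss_fn_py(0, width=10, height=0, activation="relu"))  # 10.0
# Last step (99) with a large width: the first term is about 0.05, so the
# loss is dominated by the activation applied to 0.1 * height.
print(loss_fn_py(99, width=20, height=-100, activation="relu"))  # 0.0
print(loss_fn_py(99, width=20, height=-100, activation="tanh"))  # close to -1
```

Raising on an unknown activation is a deliberate deviation from the original, which would return `None` and only fail later when `.item()` is called.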

Grid / random search (implemented by the [`BasicVariantGenerator`](https://docs.ray.io/en/latest/tune/api/doc/ray.tune.search.basic_variant.BasicVariantGenerator.html#ray.tune.search.basic_variant.BasicVariantGenerator)) is Ray Tune's default search algorithm: it is selected automatically when no search algorithm is passed to the `Tuner`. You can find the complete list of search algorithms [in the docs](https://docs.ray.io/en/latest/tune/api/suggestion.html#random-search-and-grid-search-tune-search-basic-variant-basicvariantgenerator).

We will perform a grid search over `activation`: for each of its two values, `relu` and `tanh`, Tune randomly samples the same number of values for `width` and `height`.

Our search space looks as follows:

```python
search_space = {
    "steps": 100,  # We don't want to optimize the number of steps.
    "width": tune.uniform(0, 20),
    "height": tune.uniform(-100, 100),
    "activation": tune.grid_search(["relu", "tanh"]),
}
```

`tune.uniform(lower, upper)` samples a float uniformly between `lower` and `upper`. `tune.grid_search` makes Tune try every listed value, and the whole grid is repeated `num_samples` times (`num_samples` is a parameter of [`TuneConfig`](https://docs.ray.io/en/latest/tune/api/doc/ray.tune.TuneConfig.html#ray.tune.TuneConfig), see the blueprint above). With two activations and `num_samples=50`, as used below, each activation therefore appears in exactly 50 of the 100 trials. For a full list of the random distributions supported by the search space API, refer to the [corresponding page in the documentation](https://docs.ray.io/en/latest/tune/api/search_space.html#tune-search-space-api).
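The expansion can be mimicked in a few lines of plain Python. The helper `expand_trials` below is hypothetical and not part of Ray; it only illustrates the counting: the grid over `activation` is repeated `num_samples` times, constants such as `steps` are passed through unchanged, and `width` and `height` are drawn afresh for every trial.

```python
import random
from collections import Counter


def expand_trials(activations, num_samples, seed=0):
    """Mimic how a grid search combined with num_samples multiplies trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):  # the whole grid is repeated num_samples times
        for activation in activations:  # every grid value appears once per repeat
            trials.append({
                "steps": 100,  # constant: passed through unchanged
                "width": rng.uniform(0, 20),  # fresh draw for every trial
                "height": rng.uniform(-100, 100),
                "activation": activation,
            })
    return trials


trials = expand_trials(["relu", "tanh"], num_samples=50)
print(len(trials))  # 100
print(Counter(t["activation"] for t in trials))  # Counter({'relu': 50, 'tanh': 50})
```

Keep this multiplication in mind when adding more `grid_search` entries: each one multiplies the number of trials by its number of values.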

The trainable, search algorithm, and search space are all we need; this example does not use a scheduler. Let's add them to the blueprint.


In [None]:
from ray import tune, train
import torch
import torch.nn.functional as F

def loss_fn(step, width, height, activation):
    pre_act = torch.tensor((0.1 + width * step / 100) ** (-1) + height * 0.1)
    if "relu" == activation:
        return F.relu(pre_act)
    elif "tanh" == activation:
        return F.tanh(pre_act)

def trainable(config):
    # Hyperparameters
    width, height, activation = config["width"], config["height"], config["activation"]

    for step in range(config["steps"]):
        loss = loss_fn(step, width, height, activation)
        train.report({"iterations": step, "mean_loss": loss.item()})


# Define the search space
search_space = {
    "steps": 100,
    "width": tune.uniform(0, 20),
    "height": tune.uniform(-100, 100),
    "activation": tune.grid_search(["relu", "tanh"]),
}

# Select the search algorithm and its parameters
algo = tune.search.basic_variant.BasicVariantGenerator()

# We're not using a scheduler in this example
scheduler = None

# Define the tune_config
tune_config = tune.TuneConfig(
    metric="mean_loss",
    mode="min",
    search_alg=algo,
    scheduler=scheduler,
    num_samples=50,
)

# Define the run_config.
run_config = train.RunConfig(name="Grid search experiment")

tuner = tune.Tuner(
    trainable=trainable,
    tune_config=tune_config,
    run_config=run_config,
    param_space=search_space,
)

# Start the search
results = tuner.fit()  # returns result grid

Because `metric` and `mode` are set in the `TuneConfig`, you can access the best run via `ResultGrid.get_best_result()` without passing any arguments...

In [None]:
results.get_best_result()

... and the corresponding configuration:

In [None]:
results.get_best_result().config

### Walking on eggshells

Now it's your turn. Use the blueprint above to optimize the following function:

$$
f(x, y) = -(y + 47) \cdot \sin\left(\sqrt{\left| \frac{x}{2} + (y + 47) \right|}\right) - x \cdot \sin\left(\sqrt{\left| x - (y + 47) \right|}\right)
$$

also known as the Eggholder function. The search domain is $-512 \leq x, y \leq 512$.

Implement the function as a trainable and find the $(x, y)$ that minimizes the Eggholder function. Use the [`HEBOSearch`](https://docs.ray.io/en/latest/tune/api/doc/ray.tune.search.hebo.HEBOSearch.html#ray.tune.search.hebo.HEBOSearch) search algorithm.

_Note: One could of course use a gradient-based optimizer to minimize this function - but that's not the point here. ;)_
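As a starting point, and to check your answer, the formula translates directly to Python. The plain function `eggholder` below is only the objective; wrapping it in a trainable, defining the search space and wiring up `HEBOSearch` are still up to you. For reference, the commonly cited global minimum on this domain is $f(512, 404.2319) \approx -959.6407$, on the boundary $x = 512$, so a good search should get close to that value.

```python
import math


def eggholder(x, y):
    # f(x, y) = -(y + 47) sin(sqrt(|x/2 + (y + 47)|)) - x sin(sqrt(|x - (y + 47)|))
    return (
        -(y + 47) * math.sin(math.sqrt(abs(x / 2 + (y + 47))))
        - x * math.sin(math.sqrt(abs(x - (y + 47))))
    )


# Value at the known global minimum on -512 <= x, y <= 512.
print(eggholder(512, 404.2319))
```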

In [None]:
# Your code goes here.

## Tracking hyperparameter experiments

Ray Tune integrates with a range of experiment tracking tools, including MLflow. To integrate MLflow (or any tracking framework for that matter), there are two options.

1. The callback API (for MLflow: `MLflowLoggerCallback`)
2. The `setup_<integration>` function (for MLflow: `setup_mlflow`)

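The callback API is easier to set up: it automatically logs each trial's configuration and reported metrics. `setup_mlflow`, which you call inside the trainable, gives you more control, because you can then log whatever you like with the MLflow API yourself.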
We'll only show how to use the callback API, but you are of course invited to explore the `setup_*` option on your own.

Using the callback API is as easy as adding the callback to the `RunConfig`.

In [None]:
from ray.air.integrations.mlflow import MLflowLoggerCallback

run_config_with_callback = train.RunConfig(
    name="MLflow logging experiment",
    callbacks=[MLflowLoggerCallback(
        tracking_uri="http://localhost:8080",  # Replace with your MLflow tracking server URI.
        experiment_name="ray-tune-experiments",
        save_artifact=True
    )],
)

Everything else is the same as before; only the `run_config` changes!

In [None]:
def loss_fn(step, width, height, activation):
    pre_act = torch.tensor((0.1 + width * step / 100) ** (-1) + height * 0.1)
    if "relu" == activation:
        return F.relu(pre_act)
    elif "tanh" == activation:
        return F.tanh(pre_act)

def trainable(config):
    # Hyperparameters
    width, height, activation = config["width"], config["height"], config["activation"]

    for step in range(config["steps"]):
        loss = loss_fn(step, width, height, activation)
        train.report({"iterations": step, "mean_loss": loss.item()})


# Define the search space
search_space = {
    "steps": 100,
    "width": tune.uniform(0, 20),
    "height": tune.uniform(-100, 100),
    "activation": tune.grid_search(["relu", "tanh"]),
}

# Select the search algorithm and its parameters
algo = tune.search.basic_variant.BasicVariantGenerator()

# We're not using a scheduler in this example
scheduler = None

# Define the tune_config
tune_config = tune.TuneConfig(
    metric="mean_loss",
    mode="min",
    search_alg=algo,
    scheduler=scheduler,
    num_samples=50,
)

tuner = tune.Tuner(
    trainable=trainable,
    tune_config=tune_config,
    run_config=run_config_with_callback,
    param_space=search_space,
)

# Start the search
results = tuner.fit()  # returns result grid

If you head over to your MLflow tracking server, you should see the `ray-tune-experiments` experiment, with one run per trial.

![Ray Tune experiment in MLflow](imgs/mlflow_raytune_experiment.png)