# Hyperparameter tuning with Ray Tune

[Ray Tune](https://docs.ray.io/en/latest/tune/index.html) is a library for experiment execution and hyperparameter tuning at any scale. It supports most machine learning frameworks and a variety of state-of-the-art search algorithms, and it integrates with a wide range of dedicated hyperparameter optimization tools.

## Key Concepts

There are [six concepts to understand](https://docs.ray.io/en/latest/tune/key-concepts.html):

1. Search Spaces: A search space defines the valid values for each of your hyperparameters and how they are sampled.
2. Trainable: In Tune, the objective you want to minimize/maximize is represented by a [Trainable](https://docs.ray.io/en/latest/tune/key-concepts.html#ray-tune-trainables). Trainables are [functions](https://docs.ray.io/en/latest/tune/api/trainable.html#tune-function-api) or [classes](https://docs.ray.io/en/latest/tune/api/trainable.html#class-trainable-api) that take hyperparameters as input and report one or more metrics.
3. Search Algorithms: Search algorithms do the heavy lifting: they decide _how_ the search space is explored, i.e. which configurations are evaluated next.
4. Schedulers: Tune can optionally use a Scheduler to stop unpromising trials early and thus speed up the hyperparameter search process.
5. Trials: A trial is a single run of the Trainable with one concrete combination of hyperparameter values.
6. Analyses: After the search process has terminated, Tune will present you with a `ResultGrid`, which allows you to access various metrics such as the best available trial or the hyperparameter configuration for said trial.

![Tune Flow](imgs/tune_flow.png)

Image taken from the [Tune documentation](https://docs.ray.io/en/latest/tune/key-concepts.html).
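To make the idea behind schedulers a bit more tangible, here is a toy, pure-Python sketch of early stopping by successive halving. It is only an illustration of the concept, not how Tune's schedulers are implemented; the names `successive_halving` and `toy_loss` and the `lr` hyperparameter are made up for this example.

```python
import random


def successive_halving(configs, evaluate, rounds=3, keep_fraction=0.5):
    """Keep only the best fraction of configs after each round (lower loss is better)."""
    survivors = list(configs)
    for round_idx in range(rounds):
        # Rank the surviving configs by their loss in this round.
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, round_idx))
        # Stop the worse ones early, but always keep at least one config.
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]
    return survivors


def toy_loss(cfg, round_idx):
    # Hypothetical objective: the loss shrinks over rounds and is lowest at lr = 0.1.
    return (cfg["lr"] - 0.1) ** 2 / (round_idx + 1)


rng = random.Random(0)
configs = [{"lr": rng.uniform(0.0, 1.0)} for _ in range(8)]
survivors = successive_halving(configs, toy_loss)
print(survivors)
```

Starting from eight configurations, each round discards the worse half, so only one configuration is trained for all three rounds.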

With these concepts in mind, we can create a blueprint for any Tune script:

In [None]:
from ray import tune, train


def trainable(config):
    # config is a dict containing the hyperparameters, it is a sample from the search space
    
    # train your model using the hyperparameters
    # ...
    score = 0.5

    # return the score
    return {"score": score}


# Define the search space
search_space = {
    # Your hyperparameters go here
}

# Select the search algorithm and its parameters
# (e.g. random search, Bayesian optimization, HEBO, etc.; Searcher is the base class for all search algorithms)
algo = tune.search.Searcher(
    # Your search algorithm parameters go here
)

# Select the scheduler and its parameters
# (e.g. HyperBand, ASHAScheduler, etc.; TrialScheduler is the base class for all schedulers)
scheduler = tune.schedulers.TrialScheduler(
    # Your scheduler parameters go here
)

# Define the tune_config
tune_config = tune.TuneConfig(
    # Name of the metric that the trainable returns
    # and we want to optimize.
    metric="score",
    # The mode can be "min" or "max"
    mode="max",
    # The search algorithm
    search_alg=algo,
    # The scheduler
    scheduler=scheduler,
    # Number of times to sample from the hyperparameter space
    num_samples=10,
)

# Define the run_config.
run_config = train.RunConfig(stop={"training_iteration": 20})

tuner = tune.Tuner(
    trainable=trainable,
    tune_config=tune_config,
    run_config=run_config,
    param_space=search_space,
)

# Start the search
results = tuner.fit() # returns result grid

This is of course very abstract. Let's look at a concrete example: grid search.

### Grid Search with Ray Tune

In this example, we will be trying to minimize the following objective:

```python
def trainable(config):
    # Hyperparameters sampled from the search space
    width, height, activation = config["width"], config["height"], config["activation"]
    for step in range(config["steps"]):
        loss = loss_fn(step, width, height, activation)
        # Report a plain float so that Tune can compare trials on "mean_loss"
        train.report({"iterations": step, "mean_loss": loss.item()})
```

with the loss function `loss_fn` defined as

```python
def loss_fn(step, width, height, activation):
    # Wrap the scalar in a tensor so that the torch activations can be applied
    pre_act = torch.tensor((0.1 + width * step / 100) ** (-1) + height * 0.1)
    if "relu" == activation:
        return F.relu(pre_act)
    elif "tanh" == activation:
        return F.tanh(pre_act)
```

Grid / random search (implemented by the [`BasicVariantGenerator`](https://docs.ray.io/en/latest/tune/api/doc/ray.tune.search.basic_variant.BasicVariantGenerator.html#ray.tune.search.basic_variant.BasicVariantGenerator)) is the default search algorithm in Ray Tune. It is selected automatically when no search algorithm is passed to the `Tuner`. You can find the complete list of search algorithms [in the docs](https://docs.ray.io/en/latest/tune/api/suggestion.html#random-search-and-grid-search-tune-search-basic-variant-basicvariantgenerator).

We will perform a grid search over the `activation`. This means that, for each value, either `relu` or `tanh`, we'll randomly sample an equal number of values for `width` and `height`.

Our search space looks as follows:

```python
search_space = {
    "steps": 100,  # We don't want to optimize the number of steps.
    "width": tune.uniform(0, 20),
    "height": tune.uniform(-100, 100),
    "activation": tune.grid_search(["relu", "tanh"]),
}
```

`tune.uniform` describes a uniform distribution over the given interval. `tune.grid_search` guarantees that every listed value is tried: the whole grid is repeated `num_samples` times, and the random distributions are re-sampled for each repetition (`num_samples` is a parameter of [`TuneConfig`](https://docs.ray.io/en/latest/tune/api/doc/ray.tune.TuneConfig.html#ray.tune.TuneConfig), see the blueprint above). For a full list of the random distributions supported by the search space API, refer to the [corresponding page in the documentation](https://docs.ray.io/en/latest/tune/api/search_space.html#tune-search-space-api).
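To see how `grid_search` and `num_samples` interact, the following pure-Python sketch enumerates the trial configurations by hand. It only mimics the semantics described above and is not Tune's actual implementation; the helper `expand_trials` and its arguments are hypothetical.

```python
import itertools
import random


def expand_trials(grid_axes, random_axes, constants, num_samples, seed=0):
    """Enumerate trial configs: every grid combination, repeated num_samples times."""
    rng = random.Random(seed)
    grid_names = list(grid_axes)
    trials = []
    for _ in range(num_samples):
        for combo in itertools.product(*(grid_axes[name] for name in grid_names)):
            config = dict(constants)
            config.update(zip(grid_names, combo))
            # Draw a fresh random value for every non-grid hyperparameter.
            for name, (low, high) in random_axes.items():
                config[name] = rng.uniform(low, high)
            trials.append(config)
    return trials


trials = expand_trials(
    grid_axes={"activation": ["relu", "tanh"]},
    random_axes={"width": (0, 20), "height": (-100, 100)},
    constants={"steps": 100},
    num_samples=50,
)
print(len(trials))  # 2 grid values x 50 samples = 100 trials
```

With two grid values and `num_samples=50`, this yields 100 trials in total, 50 per activation.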

The trainable, search algorithm, and search space are everything we need. Let's add them to the blueprint.


In [1]:
from ray import tune, train
import torch
import torch.nn.functional as F

def loss_fn(step, width, height, activation):
    pre_act = torch.tensor((0.1 + width * step / 100) ** (-1) + height * 0.1)
    if "relu" == activation:
        return F.relu(pre_act)
    elif "tanh" == activation:
        return F.tanh(pre_act)

def trainable(config):
    # Hyperparameters
    width, height, activation = config["width"], config["height"], config["activation"]

    for step in range(config["steps"]):
        loss = loss_fn(step, width, height, activation)
        train.report({"iterations": step, "mean_loss": loss.item()})


# Define the search space
search_space = {
    "steps": 100,
    "width": tune.uniform(0, 20),
    "height": tune.uniform(-100, 100),
    "activation": tune.grid_search(["relu", "tanh"]),
}

# Select the search algorithm and its parameters
algo = tune.search.basic_variant.BasicVariantGenerator()

# We're not using a scheduler in this example
scheduler = None

# Define the tune_config
tune_config = tune.TuneConfig(
    metric="mean_loss",
    mode="min",
    search_alg=algo,
    scheduler=scheduler,
    num_samples=50,
)

# Define the run_config.
run_config = train.RunConfig(name="Grid search experiment")

tuner = tune.Tuner(
    trainable=trainable,
    tune_config=tune_config,
    run_config=run_config,
    param_space=search_space,
)

# Start the search
results = tuner.fit()  # returns result grid

- Current time: 2025-03-18 14:11:22
- Running for: 00:01:03.71
- Memory: 8.6/16.0 GiB

| Trial name | status | loc | activation | height | width | loss | iter | total time (s) | iterations |
|---|---|---|---|---|---|---|---|---|---|
| trainable_548b9_00000 | TERMINATED | 127.0.0.1:3994 | relu | -22.9221 | 8.41859 | 0.0 | 100 | 0.0107439 | 99 |
| trainable_548b9_00001 | TERMINATED | 127.0.0.1:3998 | tanh | -73.3395 | 2.9791 | -0.999998 | 100 | 0.0131283 | 99 |
| trainable_548b9_00002 | TERMINATED | 127.0.0.1:3997 | relu | -73.2647 | 15.1497 | 0.0 | 100 | 0.0156734 | 99 |
| trainable_548b9_00003 | TERMINATED | 127.0.0.1:3999 | tanh | -19.7836 | 17.0463 | -0.957872 | 100 | 0.00978827 | 99 |
| trainable_548b9_00004 | TERMINATED | 127.0.0.1:3995 | relu | -98.0011 | 18.9521 | 0.0 | 100 | 0.0171704 | 99 |
| trainable_548b9_00005 | TERMINATED | 127.0.0.1:3992 | tanh | -54.3481 | 15.8323 | -0.999957 | 100 | 0.010165 | 99 |
| trainable_548b9_00006 | TERMINATED | 127.0.0.1:3996 | relu | -2.75463 | 3.68036 | 0.0 | 100 | 0.0134754 | 99 |
| trainable_548b9_00007 | TERMINATED | 127.0.0.1:3993 | tanh | -54.1314 | 16.4875 | -0.999955 | 100 | 0.0200071 | 99 |
| trainable_548b9_00008 | TERMINATED | 127.0.0.1:4023 | relu | -74.6032 | 18.8045 | 0.0 | 100 | 0.0080533 | 99 |
| trainable_548b9_00009 | TERMINATED | 127.0.0.1:4025 | tanh | -13.7424 | 17.488 | -0.865988 | 100 | 0.00903535 | 99 |


2025-03-18 14:11:22,987	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/Users/jaron/ray_results/Grid search experiment' in 0.0481s.
2025-03-18 14:11:22,997	INFO tune.py:1041 -- Total run time: 63.78 seconds (63.66 seconds for the tuning loop).


You can access the best run via `ResultGrid.get_best_result()`...

In [2]:
results.get_best_result()

Result(
  metrics={'iterations': 99, 'mean_loss': -1.0},
  path='/Users/jaron/ray_results/Grid search experiment/trainable_548b9_00043_43_activation=tanh,height=-96.9614,width=13.9640_2025-03-18_14-10-19',
  filesystem='local',
  checkpoint=None
)

... and the corresponding configuration:

In [3]:
results.get_best_result().config

{'steps': 100,
 'width': 13.964045786388974,
 'height': -96.96137793658775,
 'activation': 'tanh'}
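Why does a `tanh` trial with a strongly negative `height` win? A quick way to check is to re-evaluate the loss for the best configuration without torch. The sketch below is a pure-Python counterpart of `loss_fn`; the helper name `loss_fn_scalar` is ours and not part of the original notebook.

```python
import math


def loss_fn_scalar(step, width, height, activation):
    """Pure-Python counterpart of loss_fn (hypothetical helper, no torch needed)."""
    pre_act = (0.1 + width * step / 100) ** (-1) + height * 0.1
    if activation == "relu":
        return max(0.0, pre_act)
    if activation == "tanh":
        return math.tanh(pre_act)
    raise ValueError(f"unknown activation: {activation}")


# Best configuration reported above, evaluated at the last step.
best = {"steps": 100, "width": 13.964045786388974, "height": -96.96137793658775}
last_step = best["steps"] - 1
tanh_loss = loss_fn_scalar(last_step, best["width"], best["height"], "tanh")
relu_loss = loss_fn_scalar(last_step, best["width"], best["height"], "relu")
print(tanh_loss, relu_loss)
```

Since `relu` is bounded below by 0 and `tanh` by -1, the minimum can only be reached by a `tanh` trial whose pre-activation is strongly negative, which is what a large negative `height` produces.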

### Walking on egg shells

Now it's your turn. Use the blueprint above to optimize the following function:

$$
f(x, y) = -(y + 47) \cdot \sin\left(\sqrt{\left| \frac{x}{2} + (y + 47) \right|}\right) - x \cdot \sin\left(\sqrt{\left| x - (y + 47) \right|}\right)
$$

also known as the Eggholder function. The search domain is $-512 \leq x, y \leq 512$.

Implement the function as a trainable and find the $(x, y)$ that minimizes the Eggholder function. Use the [`HEBOSearch`](https://docs.ray.io/en/latest/tune/api/doc/ray.tune.search.hebo.HEBOSearch.html#ray.tune.search.hebo.HEBOSearch) search algorithm.

_Note: One could of course use a gradient-based optimizer to minimize this function - but that's not the point here. ;)_

In [5]:
# Your code goes here.
from ray.tune.search.hebo import HEBOSearch

def loss_eh(x, y):
    pre_act = -(y + 47) * torch.sin(torch.sqrt(torch.abs((x / 2) + (y + 47)))) - x * torch.sin(torch.sqrt(torch.abs(x - (y + 47))))
    return F.relu(pre_act)

def trainable(config):
    # Hyperparameters
    x = torch.tensor(config["x"], dtype=torch.float32)
    y = torch.tensor(config["y"], dtype=torch.float32)

    for step in range(config["steps"]):
        loss = loss_eh(x, y)
        train.report({"iterations": step, "mean_loss": loss.item()})

# Define the search space
search_space = {
    "steps": 100,
    "x": tune.uniform(-512, 512),
    "y": tune.uniform(-512, 512)
}

# Select the search algorithm and its parameters
algo = HEBOSearch()

# We're not using a scheduler in this example
scheduler = None

# Define the tune_config
tune_config = tune.TuneConfig(
    metric="mean_loss",
    mode="min",
    search_alg=algo,
    scheduler=scheduler,
    num_samples=50,
)

# Define the run_config.
run_config = train.RunConfig(name="Grid search experiment")

tuner = tune.Tuner(
    trainable=trainable,
    tune_config=tune_config,
    run_config=run_config,
    param_space=search_space,
)

# Start the search
results = tuner.fit()  # returns result grid

- Current time: 2025-03-18 14:26:07
- Running for: 00:01:40.69
- Memory: 9.6/16.0 GiB

| Trial name | status | loc | x | y | loss | iter | total time (s) | iterations |
|---|---|---|---|---|---|---|---|---|
| trainable_636b0c88 | TERMINATED | 127.0.0.1:7774 | 91.0682 | -236.863 | 0.0 | 100 | 0.010184 | 99 |
| trainable_f40944a3 | TERMINATED | 127.0.0.1:7784 | -177.758 | 47.8817 | 0.0 | 100 | 0.00939822 | 99 |
| trainable_301615a1 | TERMINATED | 127.0.0.1:7791 | -432.637 | -284.029 | 580.256 | 100 | 0.00973058 | 99 |
| trainable_23509142 | TERMINATED | 127.0.0.1:7798 | 346.439 | 479.234 | 0.0 | 100 | 0.00901937 | 99 |
| trainable_686d23ba | TERMINATED | 127.0.0.1:7805 | 408.288 | -439.591 | 370.519 | 100 | 0.00931358 | 99 |
| trainable_1c59d2dd | TERMINATED | 127.0.0.1:7812 | -370.605 | 372.672 | 0.0 | 100 | 0.00937033 | 99 |
| trainable_16085da3 | TERMINATED | 127.0.0.1:7831 | -115.984 | -71.3011 | 0.0 | 100 | 0.00910473 | 99 |
| trainable_a78ad782 | TERMINATED | 127.0.0.1:7859 | 153.158 | 132.444 | 191.971 | 100 | 0.00977445 | 99 |
| trainable_20cf6756 | TERMINATED | 127.0.0.1:7878 | -127.663 | -51.5655 | 0.0 | 100 | 0.0106044 | 99 |
| trainable_ed49182a | TERMINATED | 127.0.0.1:7885 | -122.319 | -1.47109 | 79.4556 | 100 | 0.0095377 | 99 |






2025-03-18 14:26:07,677	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/Users/jaron/ray_results/Grid search experiment' in 0.0446s.
2025-03-18 14:26:07,683	INFO tune.py:1041 -- Total run time: 100.71 seconds (100.65 seconds for the tuning loop).


In [6]:
results.get_best_result()

Result(
  metrics={'iterations': 99, 'mean_loss': 0.0},
  path='/Users/jaron/ray_results/Grid search experiment/trainable_636b0c88_1_steps=100,x=91.0682,y=-236.8633_2025-03-18_14-24-26',
  filesystem='local',
  checkpoint=None
)

In [7]:
results.get_best_result().config

{'steps': 100, 'x': 91.0682373046875, 'y': -236.86331176757812}
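Note that the run above passes the Eggholder value through `F.relu`, so every reported `mean_loss` is clamped at zero and all negative function values look the same to the search algorithm. As a sanity check, here is a pure-Python sketch of the unclamped function (the helper name `eggholder` is ours). It is evaluated at the commonly cited location of the global minimum, roughly $(512, 404.2319)$, where the function value is about $-959.64$.

```python
import math


def eggholder(x, y):
    """Eggholder function as defined above, without any clamping."""
    term_1 = -(y + 47) * math.sin(math.sqrt(abs(x / 2 + (y + 47))))
    term_2 = -x * math.sin(math.sqrt(abs(x - (y + 47))))
    return term_1 + term_2


# Commonly cited location of the global minimum on the search domain.
x_star, y_star = 512.0, 404.2319
unclamped = eggholder(x_star, y_star)
clamped = max(0.0, unclamped)  # what a ReLU on top would report instead
print(f"unclamped: {unclamped:.2f}, clamped: {clamped:.1f}")
```

Returning the raw function value from the trainable, instead of its ReLU, would let the search algorithm distinguish between negative values and move towards this minimum.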

## Tracking hyperparameter experiments

Ray Tune integrates with a range of experiment tracking tools, including MLflow. To integrate MLflow (or any other tracking framework, for that matter), there are two options:

1. The Callback API
2. The `setup_<integration>` function

The callback API is easier to set up but gives you slightly less control over what is logged to MLflow than `setup_mlflow`.
We'll only show how to use the callback API, but you are of course invited to explore the `setup_*` option on your own.

Using the callback API is as easy as adding the callback to the `RunConfig`.

In [8]:
from ray.air.integrations.mlflow import MLflowLoggerCallback

run_config_with_callback = train.RunConfig(
    name="MLFlow logging experiment",
    callbacks=[MLflowLoggerCallback(
        tracking_uri="http://localhost:8080",  # Replace with your MLflow tracking server URI.
        experiment_name="ray-tune-experiments",
        save_artifact=True
    )],
)

Everything else is the same as before!

In [9]:
def loss_fn(step, width, height, activation):
    pre_act = torch.tensor((0.1 + width * step / 100) ** (-1) + height * 0.1)
    if "relu" == activation:
        return F.relu(pre_act)
    elif "tanh" == activation:
        return F.tanh(pre_act)

def trainable(config):
    # Hyperparameters
    width, height, activation = config["width"], config["height"], config["activation"]

    for step in range(config["steps"]):
        loss = loss_fn(step, width, height, activation)
        train.report({"iterations": step, "mean_loss": loss.item()})


# Define the search space
search_space = {
    "steps": 100,
    "width": tune.uniform(0, 20),
    "height": tune.uniform(-100, 100),
    "activation": tune.grid_search(["relu", "tanh"]),
}

# Select the search algorithm and its parameters
algo = tune.search.basic_variant.BasicVariantGenerator()

# We're not using a scheduler in this example
scheduler = None

# Define the tune_config
tune_config = tune.TuneConfig(
    metric="mean_loss",
    mode="min",
    search_alg=algo,
    scheduler=scheduler,
    num_samples=50,
)

tuner = tune.Tuner(
    trainable=trainable,
    tune_config=tune_config,
    run_config=run_config_with_callback,
    param_space=search_space,
)

# Start the search
results = tuner.fit()  # returns result grid

- Current time: 2025-03-18 14:34:22
- Running for: 00:05:47.13
- Memory: 10.7/16.0 GiB

| Trial name | status | loc | activation | height | width | loss | iter | total time (s) | iterations |
|---|---|---|---|---|---|---|---|---|---|
| trainable_e4f20_00032 | RUNNING | 127.0.0.1:9918 | relu | -36.1854 | 13.5531 | 0.0 | 61.0 | 0.0132766 | 60.0 |
| trainable_e4f20_00033 | RUNNING | 127.0.0.1:9950 | tanh | 11.1998 | 19.0929 | 0.838483 | 55.0 | 0.0121045 | 54.0 |
| trainable_e4f20_00034 | RUNNING | 127.0.0.1:9958 | relu | 1.92628 | 2.10904 | 1.18579 | 44.0 | 0.0109074 | 43.0 |
| trainable_e4f20_00035 | RUNNING | 127.0.0.1:9977 | tanh | 64.7402 | 16.9547 | 0.999996 | 50.0 | 0.0107286 | 49.0 |
| trainable_e4f20_00036 | RUNNING | 127.0.0.1:9991 | relu | 82.7227 | 11.6517 | 8.47252 | 43.0 | 0.00889397 | 42.0 |
| trainable_e4f20_00037 | RUNNING | 127.0.0.1:10000 | tanh | 41.3346 | 7.53196 | 0.999726 | 42.0 | 0.00933838 | 41.0 |
| trainable_e4f20_00038 | RUNNING | 127.0.0.1:10001 | relu | 84.9256 | 12.0189 | 8.75393 | 32.0 | 0.00770497 | 31.0 |
| trainable_e4f20_00039 | RUNNING | 127.0.0.1:10033 | tanh | 64.5812 | 0.759428 | 1.0 | 19.0 | 0.00541234 | 18.0 |
| trainable_e4f20_00040 | PENDING | | relu | -97.1368 | 13.8461 | | | | |
| trainable_e4f20_00041 | PENDING | | tanh | -27.0868 | 3.6085 | | | | |


🏃 View run trainable_e4f20_00002 at: http://localhost:8080/#/experiments/429530962525741147/runs/83d9088166ff4804b023f2e1fb7bd319
🧪 View experiment at: http://localhost:8080/#/experiments/429530962525741147
🏃 View run trainable_e4f20_00000 at: http://localhost:8080/#/experiments/429530962525741147/runs/891a2c7911574897a32f682f9473913e
🏃 View run trainable_e4f20_00001 at: http://localhost:8080/#/experiments/429530962525741147/runs/d7e75a09a61244f2877631fcfce6f2ce
🏃 View run trainable_e4f20_00007 at: http://localhost:8080/#/experiments/429530962525741147/runs/0a192d6e3f4b4ccea1206fa9e89ce1d3
🏃 View run trainable_e4f20_00006 at: http://localhost:8080/#/experiments/429530962525741147/runs/cf8c554f1aa344e598e303c170dd7509

2025-03-18 14:34:22,856	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/Users/jaron/ray_results/MLFlow logging experiment' in 0.0388s.
2025-03-18 14:34:22,910	INFO tune.py:1041 -- Total run time: 347.28 seconds (347.09 seconds for the tuning loop).
Resume experiment with: Tuner.restore(path="/Users/jaron/ray_results/MLFlow logging experiment", trainable=...)
- trainable_e4f20_00040: FileNotFoundError('Could not fetch metrics for trainable_e4f20_00040: both result.json and progress.csv were not found at /Users/jaron/ray_results/MLFlow logging experiment/trainable_e4f20_00040_40_activation=relu,height=-97.1368,width=13.8461_2025-03-18_14-28-35')
- trainable_e4f20_00041: FileNotFoundError('Could not fetch metrics for trainable_e4f20_00041: both result.json and progress.csv were not found at /Users/jaron/ray_results/MLFlow logging experiment/trainable_e4f20_00041_41_activation=tanh,height=-27.0868,width=3.6085_2025-03-18_14-28-35')
- trainable_e4

If you head over to your MLflow tracking server, you should see the experiment and the runs that were logged.

![Ray Tune experiment in MLflow](imgs/mlflow_raytune_experiment.png)