# Learning to use Ray Tune by optimizing the parameters $a$ and $b$ of the function $a^2+b$

In [1]:
from pathlib import Path
from ray import tune, air, train

The objective function is the function we want to maximize or minimize. The `config` parameter holds the parameters whose best values we want to find. In the following example, we look for the values of $a$ and $b$ that minimize the function $a^2+b$. The cell below implements the objective function and a trainable using the Ray Tune function API.

In [2]:
def objective(config):
    score = config["a"] ** 2 + config["b"]
    return {"score": score}


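def trainable(config):  # Pass a "config" dictionary into your trainable.
    for _ in range(100):  # "Train" for 100 iterations and compute intermediate scores.
        # objective() already returns a dict, so the reported metric is nested:
        # {"score": {"score": value}}. This is why it is later queried as "score/score".
        score = objective(config)
        train.report({"score": score})  # Send the score to Tune.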
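
Because `objective` returns a dictionary and `trainable` wraps it in another one, the reported result is nested. The sketch below does not use Ray; it only illustrates how a nested result can be flattened into `/`-separated keys, which is consistent with the metric name `score/score` used later in this notebook. The helper `flatten_metrics` is a hypothetical name written for this illustration and is not Ray's own implementation.

```python
def objective(config):
    score = config["a"] ** 2 + config["b"]
    return {"score": score}


def flatten_metrics(metrics, parent_key="", sep="/"):
    """Hypothetical helper: flatten nested dicts into sep-joined keys."""
    flat = {}
    for key, value in metrics.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along.
            flat.update(flatten_metrics(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Same shape as the dictionary built inside trainable().
reported = {"score": objective({"a": 5, "b": 7})}
flat_reported = flatten_metrics(reported)
print(flat_reported)  # {'score/score': 32}
```

Returning a plain number from `objective` would avoid the nesting, but the metric name used in the rest of the notebook would then have to change as well.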

In the following cell, we print the score obtained for different values of $a$ and $b$ to illustrate how the objective function behaves.

In [3]:
config_example1 = {"a": 5, "b": 7}
print(
    f"a={config_example1['a']}, b={config_example1['b']}, result={objective(config_example1)['score']}"
)
config_example2 = {"a": -4, "b": -1}
print(
    f"a={config_example2['a']}, b={config_example2['b']}, result={objective(config_example2)['score']}"
)
config_example3 = {"a": 7, "b": 5}
print(
    f"a={config_example3['a']}, b={config_example3['b']}, result={objective(config_example3)['score']}"
)

a=5, b=7, result=32
a=-4, b=-1, result=15
a=7, b=5, result=54


Next, we need to define the search space of the variables we want to optimize, here $a$ and $b$. We can define it as follows:

In [4]:
search_space = {"a": tune.uniform(0, 100), "b": tune.choice([0, 10, 500, 21])}

print(f"Variable a sample:{search_space['a'].sample()}")
print(f"Variable b sample:{search_space['b'].sample()}")

# Execute this cell multiple times to see different samples

Variable a sample:27.78486628444198
Variable b sample:21


In this case, the variable $a$ is a floating-point value sampled uniformly between 0 and 100, and $b$ is chosen at random from the options $0$, $10$, $500$, and $21$.
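
As a rough illustration of what sampling from this search space means, the sketch below mimics it with Python's standard `random` module instead of Ray. The function `sample_config` and the fixed seed are assumptions made for this example; they are not part of the Ray Tune API.

```python
import random

CHOICES_B = [0, 10, 500, 21]  # same options as in the search space above


def sample_config(rng):
    """Hypothetical stand-in: draw one configuration without using Ray."""
    return {
        "a": rng.uniform(0, 100),    # float drawn uniformly between 0 and 100
        "b": rng.choice(CHOICES_B),  # one of the listed options
    }


rng = random.Random(0)  # fixed seed (an assumption) so runs are repeatable
samples = [sample_config(rng) for _ in range(5)]
for sample in samples:
    print(sample)
```

Each call produces a new pair of values, just as re-running the cell above yields different samples.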

Other search space types are available in the [Ray Tune API](https://docs.ray.io/en/latest/tune/api/search_space.html) and can be chosen according to your needs.

## Configuring/running the Tuner to optimize the function considering the defined parameter space

It is the same `tune.Tuner` structure we used in [lesson 1 activity 2](../lesson_1/2-simple_ppo_agent_to_environment.ipynb) to implement the PPO RL agent, but here we use it without any RL. **Executing the cell below produces a summary of the results obtained by randomly sampling parameter values from the defined search space**.

In [5]:
store_results_path = str(Path("./ray_results/").resolve()) + "/nb_1/"
agent_name = "random_agent"
tuner = tune.Tuner(
    trainable,  # Here, you pass our trainable function
    param_space=search_space,
    run_config=air.RunConfig(
        verbose=2,
        storage_path=store_results_path,
        name=agent_name,
    ),
    tune_config=tune.TuneConfig(
        num_samples=100
    ),  # Running 100 different trials (combinations of hyperparameters)
)
results = tuner.fit()

2024-11-30 00:00:43,797	INFO worker.py:1783 -- Started a local Ray instance.
2024-11-30 00:00:44,338	INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
2024-11-30 00:00:44,340	INFO tune.py:616 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949


Current time:,2024-11-30 00:01:12
Running for:,00:00:26.40
Memory:,3.9/23.9 GiB

Trial name,status,loc,a,b,iter,total time (s)
trainable_2533a_00000,TERMINATED,200.239.93.233:490982,19.971,10,100,0.013304
trainable_2533a_00001,TERMINATED,200.239.93.233:490983,54.377,500,100,0.0148187
trainable_2533a_00002,TERMINATED,200.239.93.233:490984,26.1984,0,100,0.0120797
trainable_2533a_00003,TERMINATED,200.239.93.233:490985,67.0128,10,100,0.0387573
trainable_2533a_00004,TERMINATED,200.239.93.233:490986,62.6673,0,100,0.0116091
trainable_2533a_00005,TERMINATED,200.239.93.233:490992,82.3452,0,100,0.0121455
trainable_2533a_00006,TERMINATED,200.239.93.233:490987,44.0245,0,100,0.0115938
trainable_2533a_00007,TERMINATED,200.239.93.233:490988,90.7716,0,100,0.0125358
trainable_2533a_00008,TERMINATED,200.239.93.233:490989,13.0525,21,100,0.0123551
trainable_2533a_00009,TERMINATED,200.239.93.233:490990,65.7566,10,100,0.0123911


Trial name,score
trainable_2533a_00000,{'score': 408.84215311900647}
trainable_2533a_00001,{'score': 3456.8596559558337}
trainable_2533a_00002,{'score': 686.3535968931619}
trainable_2533a_00003,{'score': 4500.7088193554955}
trainable_2533a_00004,{'score': 3927.192243676915}
trainable_2533a_00005,{'score': 6780.737194389832}
trainable_2533a_00006,{'score': 1938.1545787275404}
trainable_2533a_00007,{'score': 8239.477326166882}
trainable_2533a_00008,{'score': 191.36809204443404}
trainable_2533a_00009,{'score': 4333.933801818879}


2024-11-30 00:01:12,070	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/home/lasse/ray_minicourse/lesson_2/ray_results/nb_1/random_agent' in 0.0336s.
2024-11-30 00:01:12,084	INFO tune.py:1041 -- Total run time: 27.74 seconds (26.37 seconds for the tuning loop).


We can check the best (minimum) obtained result using:

In [7]:
best_result = results.get_best_result(metric="score/score", mode="min")
print(
    f"Best obtained parameters considering random samples: {best_result.config}, score: {objective(best_result.config)['score']}"
)

Best obtained parameters considering random samples: {'a': 2.1088226524589304, 'b': 0}, score: 4.447132979523919


Remember that we are just sampling random values and evaluating them with the objective function.
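
To make this concrete, the following is a minimal sketch of the same random search written without Ray. It assumes the search space defined earlier ($a$ uniform between 0 and 100, $b$ chosen from four options); the function `random_search` and its `seed` argument are hypothetical names introduced only for this illustration, and the sketch is not how Tune schedules trials internally.

```python
import random


def objective(config):
    return {"score": config["a"] ** 2 + config["b"]}


def random_search(num_samples, seed=0):
    """Hypothetical helper: sample configs at random and keep the lowest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(num_samples):
        config = {"a": rng.uniform(0, 100), "b": rng.choice([0, 10, 500, 21])}
        score = objective(config)["score"]
        if score < best_score:  # mode="min": keep the smallest score seen so far
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(num_samples=100)
print(best_config, best_score)
```

Over this search space the smallest possible score is $0$, reached at $a=0$ and $b=0$. Random sampling only approaches that value, and drawing more samples can only keep or improve the best score found.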

Let's check the TensorBoard files containing the score obtained by each trial during the experiment (remember that each trial has 100 steps). Try looking at the different metrics, focusing on the `ray/tune/score/score` results, which contain the score values obtained by each trial.


In [8]:
%load_ext tensorboard
%tensorboard --logdir ray_results/nb_1

The results above for `ray/tune/score/score` show the values obtained by each trial over its 100 steps. Note that the score does not change across the steps of a trial, because the parameters $a$ and $b$ stay fixed within each trial.