# Optimizing the parameters $a$ and $b$ to minimize the function $a^2+b$

In [2]:
from pathlib import Path
from ray import tune, air

The objective function is the function we want to maximize or minimize. Its `config` argument holds the parameters whose best values we want to find. In the following example, we look for the values of $a$ and $b$ that minimize the given function.

In [3]:
def objective(config):
    score = config["a"] ** 2 + config["b"]
    return {"score": score}

In the following cell, we evaluate the objective function for a few values of $a$ and $b$ to illustrate how the score is computed.

In [4]:
config_example1 = {"a": 5, "b": 7}
print(
    f"a={config_example1['a']}, b={config_example1['b']}, result={objective(config_example1)['score']}"
)
config_example2 = {"a": -4, "b": -1}
print(
    f"a={config_example2['a']}, b={config_example2['b']}, result={objective(config_example2)['score']}"
)
config_example3 = {"a": 7, "b": 5}
print(
    f"a={config_example3['a']}, b={config_example3['b']}, result={objective(config_example3)['score']}"
)

a=5, b=7, result=32
a=-4, b=-1, result=15
a=7, b=5, result=54
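
The three printed scores can be checked by hand. The first and third examples use the same two numbers with the roles of $a$ and $b$ swapped, and the scores differ because only $a$ is squared:

$$
\begin{aligned}
5^2 + 7 &= 32 \\
(-4)^2 + (-1) &= 15 \\
7^2 + 5 &= 54
\end{aligned}
$$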


We need to define the search space of the variables we want to optimize, here $a$ and $b$. We can define it as follows:

In [5]:
search_space = {"a": tune.uniform(0, 100), "b": tune.choice([0, 10, 500, 21])}

print(f"Variable a sample:{search_space['a'].sample()}")
print(f"Variable b sample:{search_space['b'].sample()}")

# Execute this cell multiple times to see different samples

Variable a sample:60.37564198760494
Variable b sample:0


In this case, the variable $a$ is sampled uniformly from the interval between 0 and 100, and $b$ is chosen from the options $0$, $10$, $500$, and $21$.
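
To make the two distributions concrete, here is a minimal sketch that draws from the same ranges using only Python's standard `random` module. It is a stand-in for illustration and not how Ray Tune implements `tune.uniform` or `tune.choice`; the names `B_OPTIONS` and `sample_config` are hypothetical and do not appear elsewhere in this notebook.

```python
import random

# Stand-in for the search space above, using only the standard library.
# This is NOT Ray Tune's implementation, just the same two distributions.
B_OPTIONS = [0, 10, 500, 21]


def sample_config(rng):
    """Draw one (a, b) configuration from the illustrative search space."""
    return {
        "a": rng.uniform(0, 100),    # continuous value between 0 and 100
        "b": rng.choice(B_OPTIONS),  # one of the four options, equally likely
    }


rng = random.Random(0)  # fixed seed so the draws are repeatable
samples = [sample_config(rng) for _ in range(5)]
for s in samples:
    print(f"a={s['a']:.4f}, b={s['b']}")
```

Each draw is independent, so $a$ can land anywhere in the interval while $b$ only ever takes one of the four listed values.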

There are other search space types available in the [Ray Tune API](https://docs.ray.io/en/latest/tune/api/search_space.html) that can be used as needed.

## Configuring and running the Tuner to optimize the function over the defined search space

This is the same `tune.Tuner` structure we used in [lesson 1 activity 2](../lesson_1/2-simple_ppo_agent_to_environment.ipynb) to implement the PPO RL agent, but here we use it without RL. There is no concept of an episode, and each trial (one experiment with one combination of the parameters being optimized) reports a single result. **Executing the cell below prints a summary of the results obtained by randomly sampling parameter values from the defined search space.**

In [7]:
store_results_path = str(Path("./ray_results/").resolve()) + "/nb_1/"
agent_name = "random_agent"
tuner = tune.Tuner(
    objective,
    param_space=search_space,
    run_config=air.RunConfig(
        verbose=2,
        storage_path=store_results_path,
        name=agent_name,
    ),
    tune_config=tune.TuneConfig(
        num_samples=100
    ),  # Running 100 different trials (combinations of hyperparameters)
)
results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").config)

2024-11-29 23:06:41,694	INFO tune.py:616 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949


Current time:,2024-11-29 23:06:57
Running for:,00:00:15.34
Memory:,3.7/23.9 GiB

Trial name,status,loc,a,b,iter,total time (s),score
objective_986eb_00000,TERMINATED,200.239.93.233:441293,44.5602,10,1,0.00378323,1995.61
objective_986eb_00001,TERMINATED,200.239.93.233:441294,45.3174,21,1,0.00434995,2074.66
objective_986eb_00002,TERMINATED,200.239.93.233:441295,60.7008,21,1,0.00034976,3705.59
objective_986eb_00003,TERMINATED,200.239.93.233:441298,79.215,21,1,0.000437498,6296.02
objective_986eb_00004,TERMINATED,200.239.93.233:441297,31.3767,0,1,0.000330687,984.496
objective_986eb_00005,TERMINATED,200.239.93.233:441296,64.6036,500,1,0.000328302,4673.63
objective_986eb_00006,TERMINATED,200.239.93.233:441299,95.172,500,1,0.000333786,9557.71
objective_986eb_00007,TERMINATED,200.239.93.233:441300,37.6027,500,1,0.000344753,1913.96
objective_986eb_00008,TERMINATED,200.239.93.233:441301,92.5968,0,1,0.000422239,8574.17
objective_986eb_00009,TERMINATED,200.239.93.233:441302,53.7695,500,1,0.000395775,3391.16


Trial name,score
objective_986eb_00000,1995.61
objective_986eb_00001,2074.66
objective_986eb_00002,3705.59
objective_986eb_00003,6296.02
objective_986eb_00004,984.496
objective_986eb_00005,4673.63
objective_986eb_00006,9557.71
objective_986eb_00007,1913.96
objective_986eb_00008,8574.17
objective_986eb_00009,3391.16


2024-11-29 23:06:57,042	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/home/lasse/ray_minicourse/lesson_2/ray_results/nb_1/random_agent' in 0.0253s.
2024-11-29 23:06:57,057	INFO tune.py:1041 -- Total run time: 15.36 seconds (15.32 seconds for the tuning loop).


{'a': 2.787670560838451, 'b': 0}
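
The best configuration reported above has $b=0$ and a small $a$, which is consistent with the search space: $a^2 \ge 0$ and the smallest option for $b$ is $0$, so the lowest possible score is $0$, at $a=0$ and $b=0$. As a rough illustration of what random search does here, the following sketch repeats the same loop in plain Python. It is a simplified stand-in under stated assumptions, not Ray Tune's implementation: it runs sequentially, uses the standard `random` module, and `random_search` is a hypothetical helper name.

```python
import random


def objective(config):
    # Same objective as in the notebook: a^2 + b
    return {"score": config["a"] ** 2 + config["b"]}


def random_search(num_samples, seed=0):
    """Hypothetical helper: evaluate `num_samples` random configs and return every trial."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):
        # Sample one configuration from the same ranges as the search space
        config = {"a": rng.uniform(0, 100), "b": rng.choice([0, 10, 500, 21])}
        trials.append({"config": config, **objective(config)})
    return trials


trials = random_search(num_samples=100)
# Equivalent in spirit to selecting the best result with metric="score", mode="min"
best = min(trials, key=lambda t: t["score"])
print(best["config"], best["score"])
```

Because the configurations are drawn at random, the best trial only approaches the true minimum; more samples make a score close to $0$ more likely but never guarantee it.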


Let's inspect the TensorBoard files containing the score obtained by each trial during the experiment. Try looking at the different metrics, focusing on `ray/tune/score`, which holds the score value obtained by each trial.


In [None]:
%load_ext tensorboard
%tensorboard --logdir ray_results/nb_1

Reusing TensorBoard on port 6006 (pid 449972), started 0:00:20 ago. (Use '!kill 449972' to kill it.)