# Sweeping over hyperparameters

I advise you to read the [Hydra](https://hydra.cc/docs/intro/) documentation and its guides to better understand this notebook, in particular the [multirun](https://hydra.cc/docs/tutorials/basic/running_your_app/multi-run/) guide.

In this notebook, we will see how to [sweep](https://en.wikipedia.org/wiki/Hyperparameter_optimization) over selected parameters to reproduce some results from the paper with a single command line.

Let's say we want to reproduce the GenFL - FL-SOB - Posterior from Random Prior result with different seeds, to check how stable the results are with respect to the randomness in the code, and with the two different PAC-Bayes objectives $f_1$ and $f_2$, all at once.

Let's create a new directory `./conf/experiment`, in which we will build new configuration files. In this directory, create `sweep_over_seeds_and_objectives.yaml` with the following content:
```yaml
# @package _global_

hydra:
  sweeper:
    params:
      pbobj.objective: fclassic, fquad
      seed: 0, 1, 2, 3, 4
```

Now let's run the following command. It launches 10 runs in total: [pbobj.objective=fclassic, pbobj.objective=fquad] x [seed=0, seed=1, seed=2, seed=3, seed=4] (every pair made of one element from the first set and one element from the second set). For safety and for the sake of explanation, we use `dryrun=True` and `num_rounds=1`.
To make all these runs happen, we have to specify `--multirun` on the command line.
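The way the two lists combine can be sketched in a few lines of Python. This is only an illustration of the Cartesian product described above, not Hydra's own code; `expand_grid` is a hypothetical helper introduced for this example.

```python
from itertools import product

# Values taken from sweep_over_seeds_and_objectives.yaml
sweep = {
    "pbobj.objective": ["fclassic", "fquad"],
    "seed": [0, 1, 2, 3, 4],
}


def expand_grid(params):
    """Return one list of 'key=value' overrides per point of the grid."""
    keys = list(params)
    return [
        [f"{key}={value}" for key, value in zip(keys, values)]
        for values in product(*(params[key] for key in keys))
    ]


jobs = expand_grid(sweep)
for index, overrides in enumerate(jobs):
    print(index, " ".join(overrides))
```

With `itertools.product`, the last parameter varies fastest, so the five seeds are enumerated for `fclassic` first and then for `fquad`.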

In [1]:
%cd ..

/volatile/home/pj273170/Code/PAC_Bayes


In [None]:
%%bash

python GenFL_posterior.py --multirun \
    +scenario=perez_posterior_rand \
    +experiment=sweep_over_seeds_and_objectives \
    dryrun=True \
    num_rounds=1

Hydra created a directory `./outputs/GenFL_Posterior/[Date]/[Time]` containing a `multirun.yaml` file and 10 different directories:

```bash
(main) me@here:./outputs/GenFL_Posterior/2023-11-10/19-15-02$ tree -L 1
.
├── 0_pbobj.objective=fclassic_seed=0_+scenario=perez_posterior_rand_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1
├── 1_pbobj.objective=fclassic_seed=1_+scenario=perez_posterior_rand_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1
├── 2_pbobj.objective=fclassic_seed=2_+scenario=perez_posterior_rand_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1
├── 3_pbobj.objective=fclassic_seed=3_+scenario=perez_posterior_rand_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1
├── 4_pbobj.objective=fclassic_seed=4_+scenario=perez_posterior_rand_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1
├── 5_pbobj.objective=fquad_seed=0_+scenario=perez_posterior_rand_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1
├── 6_pbobj.objective=fquad_seed=1_+scenario=perez_posterior_rand_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1
├── 7_pbobj.objective=fquad_seed=2_+scenario=perez_posterior_rand_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1
├── 8_pbobj.objective=fquad_seed=3_+scenario=perez_posterior_rand_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1
├── 9_pbobj.objective=fquad_seed=4_+scenario=perez_posterior_rand_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1
└── multirun.yaml
```
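When gathering results afterwards, it can be handy to recover the swept values from each directory name. The following is a minimal sketch, assuming the names follow exactly the pattern shown in the listing above; `parse_run_dir` and `RUN_DIR_PATTERN` are hypothetical names, not part of the repository.

```python
import re

# Assumed pattern: "<job index>_pbobj.objective=<name>_seed=<int>_..."
RUN_DIR_PATTERN = re.compile(
    r"^(?P<index>\d+)_pbobj\.objective=(?P<objective>[^_]+)_seed=(?P<seed>\d+)"
)


def parse_run_dir(name):
    """Extract (job index, objective, seed) from a run directory name, or None."""
    match = RUN_DIR_PATTERN.match(name)
    if match is None:
        return None  # e.g. multirun.yaml, which is not a run directory
    return int(match["index"]), match["objective"], int(match["seed"])


example = (
    "5_pbobj.objective=fquad_seed=0_+scenario=perez_posterior_rand"
    "_+experiment=sweep_over_seeds_and_objectives_dryrun=True_num_rounds=1"
)
print(parse_run_dir(example))
```

Parsing names is fragile because values may themselves contain underscores. By default, Hydra also records the overrides of each job in a `.hydra/overrides.yaml` file inside the job directory, which is a more robust source if the naming scheme changes.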

## Modify directory names / Remove redundant information

As you can see, the names are quite long and redundant. They are built from the `overrides`, i.e. the values that differ from the `default` config.
It is possible to omit some of them thanks to the `my_subdir_suffix_impl` function from `./core/utils.py`. This function removes the `overrides` whose names match a given regular expression. For instance, `+experiment=sweep_over_seeds_and_objectives` is not very useful because the information is already contained in `pbobj.objective=XXXX_seed=X`. Hence, to remove it, we can modify the YAML file as follows:

```yaml
# @package _global_

my_excludes:
  experiment: experiment*

hydra:
  sweeper:
    params:
      pbobj.objective: fclassic, fquad
      seed: 0, 1, 2, 3, 4
```
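To make the filtering idea concrete, here is a minimal sketch of how overrides could be dropped by matching their names against regular expressions. It is an assumption about the general mechanism, not the actual `my_subdir_suffix_impl` from `./core/utils.py`, which may differ; `filter_overrides` is a hypothetical name.

```python
import re


def filter_overrides(overrides, exclude_patterns):
    """Join the overrides whose name matches none of the exclusion patterns."""
    kept = []
    for override in overrides:
        name = override.split("=", 1)[0]  # part before the first "="
        if any(re.search(pattern, name) for pattern in exclude_patterns):
            continue  # excluded from the directory name
        kept.append(override)
    return "_".join(kept)


overrides = [
    "pbobj.objective=fclassic",
    "seed=0",
    "+scenario=perez_posterior_rand",
    "+experiment=sweep_over_seeds_and_objectives",
    "dryrun=True",
    "num_rounds=1",
]
suffix = filter_overrides(overrides, ["experiment*"])
print(suffix)
```

Note that, read as a regular expression, `experiment*` means "experimen" followed by any number of "t", so it matches `+experiment` when searched anywhere in the name.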

This `multirun` functionality is very useful to launch multiple runs at once. It can be used to explore the hyperparameter space in an efficient way.
It is especially effective in combination with a Slurm cluster, which can run all these jobs in parallel. See the Slurm notebook for more information.

In addition, this notebook only covered exhaustive sweeping (grid search) over parameters, which can be inefficient. However, Hydra offers sweeper plugins that implement more advanced hyperparameter optimization algorithms; see this [page](https://hydra.cc/docs/plugins/nevergrad_sweeper/) on the Nevergrad sweeper for more information.
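To see why an exhaustive sweep can become expensive, the sketch below counts the runs of a grid and contrasts it with drawing a fixed budget of random configurations. Plain random sampling is used here only as a simple stand-in and is not what the Nevergrad plugin does; the `learning_rate` parameter and both helper functions are hypothetical and do not come from the repository.

```python
import math
import random


def grid_size(params):
    """Number of runs of an exhaustive sweep: the product of the list lengths."""
    return math.prod(len(values) for values in params.values())


def sample_points(params, budget, rng):
    """Draw `budget` random configurations instead of enumerating the grid."""
    return [
        {key: rng.choice(values) for key, values in params.items()}
        for _ in range(budget)
    ]


params = {
    "pbobj.objective": ["fclassic", "fquad"],
    "seed": [0, 1, 2, 3, 4],
}
print(grid_size(params))  # 2 * 5 = 10 runs, as in this notebook

# Hypothetical third parameter, added only to illustrate the growth
params["learning_rate"] = [0.1, 0.01, 0.001, 0.0001]
print(grid_size(params))  # 2 * 5 * 4 = 40 runs

# A fixed budget of 8 runs, whatever the size of the grid
points = sample_points(params, budget=8, rng=random.Random(0))
```

Each additional swept parameter multiplies the number of runs of a grid search, whereas a budgeted strategy keeps the cost fixed at the price of not covering every combination.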