We use Optuna for optimizing the hyperparameters. Not all hyperparameters are tuned, and tuning enforces certain default hyperparameter settings that may be different from the official defaults. See rl_zoo3/hyperparams_opt.py for the current settings for each agent.
Hyperparameters not specified in rl_zoo3/hyperparams_opt.py are taken from the associated YAML file and fallback to the default values of SB3 if not present.
Note: when using SuccessiveHalvingPruner (“halving”), you must specify
--n-jobs > 1
Budget of 1000 trials with a maximum of 50000 steps:
python train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \ --sampler tpe --pruner median
Distributed optimization using a shared database is also possible (see the corresponding Optuna documentation):
python train.py --algo ppo --env MountainCar-v0 -optimize --study-name test --storage sqlite:///example.db
Print and save best hyperparameters of an Optuna study:
python scripts/parse_study.py -i path/to/study.pkl --print-n-best-trials 10 --save-n-best-hyperparameters 10
The default budget for hyperparameter tuning is 500 trials and there is one intermediate evaluation for pruning/early stopping per 100k time steps.
Note that the default hyperparameters used in the zoo when tuning are not always the same as the defaults provided in stable-baselines3. Consult the latest source code to be sure of these settings. For example:
- PPO tuning assumes a network architecture with
ortho_init = False
when tuning, though it isTrue
by default. You can change that by updating rl_zoo3/hyperparams_opt.py. - Non-episodic rollout in TD3 and DDPG assumes
gradient_steps = train_freq
and so tunes onlytrain_freq
to reduce the search space.
When working with continuous actions, we recommend to enable gSDE by uncommenting lines in rl_zoo3/hyperparams_opt.py.