Train an Agent

Basic Usage

The hyperparameters for each environment are defined in hyperparams/algo_name.yml.


Note: once RL Zoo3 is installed, you can run python -m rl_zoo3.train from any folder; it is equivalent to python train.py.

If the environment has an entry in the hyperparameter file for the chosen algorithm, you can train an agent using:

python -m rl_zoo3.train --algo algo_name --env env_id


Use the -P (--progress) option to display a progress bar.

Custom Config File

To use a custom config file, pass a YAML file that contains an entry for the env_id:

python -m rl_zoo3.train --algo algo_name --env env_id --conf-file my_yaml.yml

You can also use a python file that contains a dictionary called hyperparams with an entry for each env_id. (see hyperparams/python/ for an example)

# You can pass a path to a python file
python -m rl_zoo3.train --algo ppo --env MountainCarContinuous-v0 --conf-file hyperparams/python/
# Or pass a path to a file from a module (for instance my_package.my_file)
python -m rl_zoo3.train --algo ppo --env MountainCarContinuous-v0 --conf-file hyperparams.python.ppo_config_example

The advantage of this approach is that you can specify arbitrary python dictionaries and ensure that all their dependencies are imported in the config file itself.
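As a rough illustration, a Python config file of this kind might look like the sketch below. Only the module-level name hyperparams and the per-env_id entry come from the description above; the file name and the values are placeholders, and a real config would likely need more entries than shown here.

```python
# my_config.py (hypothetical file name, for illustration only)
# A module-level dictionary named `hyperparams`, keyed by env_id.

hyperparams = {
    "MountainCarContinuous-v0": dict(
        # Illustrative values only, not tuned settings.
        learning_rate=0.001,
        # Arbitrary Python objects are allowed here, which is the point
        # of using a Python file instead of YAML.
        policy_kwargs=dict(net_arch=[64, 64]),
    ),
}

# Quick check that the entry for a given env_id can be looked up.
env_id = "MountainCarContinuous-v0"
config = hyperparams[env_id]
print(config["policy_kwargs"]["net_arch"])
```

Because the file is ordinary Python, anything a value depends on (for example an activation function class) can be imported at the top of the same file.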

Tensorboard, Checkpoints, Evaluation

For example (with tensorboard support):

python -m rl_zoo3.train --algo ppo --env CartPole-v1 --tensorboard-log /tmp/stable-baselines/

Evaluate the agent every 10000 steps using 10 episodes for evaluation (using only one evaluation env):

python -m rl_zoo3.train --algo sac --env AntBulletEnv-v0 --eval-freq 10000 --eval-episodes 10 --n-eval-envs 1

Save a checkpoint of the agent every 100000 steps:

python -m rl_zoo3.train --algo td3 --env AntBulletEnv-v0 --save-freq 100000

Resume Training

Continue training (here, load pretrained agent for Breakout and continue training for 5000 steps):

python -m rl_zoo3.train --algo a2c --env BreakoutNoFrameskip-v4 -i rl-trained-agents/a2c/BreakoutNoFrameskip-v4_1/ -n 5000

Save Replay Buffer

When using off-policy algorithms, you can also save the replay buffer after training:

python -m rl_zoo3.train --algo sac --env Pendulum-v1 --save-replay-buffer

It will be automatically loaded if present when continuing training.

Env keyword arguments

You can specify keyword arguments to pass to the env constructor in the command line, using --env-kwargs:

python -m rl_zoo3.train --algo ppo --env MountainCar-v0 --env-kwargs goal_velocity:10
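To make "passed to the env constructor" concrete, here is a minimal sketch of keyword arguments being forwarded to a constructor. ToyEnv and make_env are hypothetical stand-ins written for this example; they are not part of RL Zoo3 or of any environment library.

```python
class ToyEnv:
    """Stand-in for an environment class (hypothetical, for illustration)."""

    def __init__(self, goal_velocity: float = 0.0):
        self.goal_velocity = goal_velocity


def make_env(env_cls, env_kwargs=None):
    """Instantiate env_cls, forwarding env_kwargs as keyword arguments."""
    return env_cls(**(env_kwargs or {}))


# Roughly what `--env-kwargs goal_velocity:10` amounts to once parsed.
env = make_env(ToyEnv, {"goal_velocity": 10})
print(env.goal_velocity)
```

A key that the constructor does not accept would raise a TypeError in this sketch, so the names given on the command line need to match the constructor's parameters.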

Overwrite hyperparameters

You can easily overwrite hyperparameters in the command line, using --hyperparams:

python -m rl_zoo3.train --algo a2c --env MountainCarContinuous-v0 --hyperparams learning_rate:0.001 policy_kwargs:"dict(net_arch=[64, 64])"

Note: if you want to pass a string, you need to quote it twice, like this: my_string:"'value'"
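The quoting rule makes sense if each value is evaluated as a Python expression. The sketch below shows one way such key:value arguments could be parsed with argparse under that assumption; KeyValueAction is a hypothetical name for this example and is not presented as the actual RL Zoo3 implementation.

```python
import argparse


class KeyValueAction(argparse.Action):
    """Collect `key:value` tokens into a dict, evaluating each value as Python."""

    def __call__(self, parser, namespace, values, option_string=None):
        parsed = {}
        for token in values:
            # Split on the first colon only, so values may contain colons.
            key, _, raw_value = token.partition(":")
            # eval() is why quoting matters: 'value' is a string literal,
            # while a bare word would be looked up as a variable name.
            # Note: eval() must never be used on untrusted input.
            parsed[key] = eval(raw_value)
        setattr(namespace, self.dest, parsed)


parser = argparse.ArgumentParser()
parser.add_argument("--hyperparams", nargs="+", default={}, action=KeyValueAction)

# The shell strips the outer double quotes, so these are the tokens Python sees.
args = parser.parse_args(
    [
        "--hyperparams",
        "learning_rate:0.001",
        "policy_kwargs:dict(net_arch=[64, 64])",
        "my_string:'value'",
    ]
)
print(args.hyperparams)
```

In this sketch, the outer double quotes protect the value from the shell and the inner single quotes make it a Python string literal, which matches the my_string:"'value'" form in the note above.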