This repository is the official implementation of State Planning Policy Reinforcement Learning (SPP-RL).
Videos presenting the trained agents are available at https://sites.google.com/view/spprl
The code was run on Ubuntu 20.04. Below we list Ubuntu 20.04 install notes.
- Download mujoco200 linux from https://www.roboti.us/index.html, put it into the .mujoco directory together with your licence key, and add the following line to .bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:(path to .mujoco)/mujoco200/bin
- Install mujoco-py requirements:
sudo apt install cmake
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
- Install patchelf:
sudo add-apt-repository ppa:jamesh/snap-support
sudo apt-get update
sudo apt install patchelf
- Install the Python requirements:
pip install -r rltoolkit/requirements.txt
The requirements will install mujoco-py, which works only with MuJoCo installed together with a licence (see the Install MuJoCo section in the mujoco-py documentation).
- Install rltoolkit:
pip install -e rltoolkit/
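As an optional sanity check (not part of the original instructions), you can verify that mujoco-py builds and imports correctly by running:
python -c "import mujoco_py"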
To train the models from the paper, you can use the scripts in the train folder; we provide a separate script per environment class.
To train SPP-TD3 on Ant, simply run:
python train/mujoco/run_experiment.py td3 Ant --spp -c train/mujoco/configs/spp_td3_optimal_mujoco.yaml
SPP-SAC can be run by replacing td3 with sac and using the config file configs/spp_sac_optimal_mujoco.yaml. To run Humanoid instead of Ant, replace Ant with Humanoid.
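For example, following the pattern of the command above, SPP-SAC on Humanoid could be launched with (the exact config path is assumed to mirror the TD3 one):
python train/mujoco/run_experiment.py sac Humanoid --spp -c train/mujoco/configs/spp_sac_optimal_mujoco.yaml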
Analogously, to train vanilla TD3 on Ant (remove --spp and change the config file), run:
python train/mujoco/run_experiment.py td3 Ant -c train/mujoco/configs/td3_base.yaml
Our running script accepts several other useful parameters, including --n_runs (how many runs to launch) and --n_cores (how many cores to use in parallel). neptune.ai logging can also be enabled by providing --neptune_proj (project name) and --neptune_token (API token).
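For example, a hypothetical invocation combining these flags (the values below are placeholders, not from the original instructions) might look like:
python train/mujoco/run_experiment.py td3 Ant --spp -c train/mujoco/configs/spp_td3_optimal_mujoco.yaml --n_runs 5 --n_cores 5 --neptune_proj my-workspace/spprl --neptune_token <NEPTUNE_API_TOKEN>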
For the Safety Gym experiments, first install the requirements:
pip install --no-deps -r train/safetygym/requirements.txt
To train SPP-TD3 on Doggo Goal, simply run:
python train/safetygym/run_experiment.py td3 Goal Doggo --spp -c train/safetygym/configs/spp_td3_optimal_safetygym.yaml
To run Doggo Button instead of Doggo Goal, replace Goal with Button; proceed similarly for Doggo Columns and Car Push.
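For example, following the pattern above, SPP-TD3 on Doggo Button would be launched with:
python train/safetygym/run_experiment.py td3 Button Doggo --spp -c train/safetygym/configs/spp_td3_optimal_safetygym.yaml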
To train vanilla TD3/SAC on Doggo Goal, run (for SAC replace td3 with sac):
python train/safetygym/run_experiment.py td3 Goal Doggo -c train/safetygym/configs/td3_base.yaml
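The corresponding SAC command would then be as follows (the sac_base.yaml config name is an assumption based on the naming scheme above; adjust it if the repository uses a different file):
python train/safetygym/run_experiment.py sac Goal Doggo -c train/safetygym/configs/sac_base.yaml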
As for MuJoCo, the running script accepts the same additional parameters: --n_runs (how many runs), --n_cores (how many cores to use in parallel), and optional neptune.ai logging via --neptune_proj and --neptune_token.
To train SPP-TD3 on AntPush, simply run:
python train/antpush/run_experiment.py td3 AntPush --spp -c train/antpush/configs/spp_td3_optimal_antpush.yaml
Other algorithms and environments were not tested.
Model evaluation code is available in the Jupyter notebook notebooks/load_and_evaluate_agents.ipynb.
There you can load pre-trained models, evaluate their reward, and render them in the environment.
You can find the pre-trained models in the models directory; they can be evaluated with the notebook above.
Our model achieves the following performance on OpenAI gym MuJoCo environments:
Ant results:
Humanoid results:
Our model achieves the following performance on OpenAI safety-gym environments:
Doggo-Goal results:
Doggo-Button results:
Car-Push results: