This repository is the official implementation of State Planning Policy Reinforcement Learning (SPP-RL).
Videos presenting the trained agents are available at https://sites.google.com/view/spprl
The code was run on Ubuntu 20.04. Below we list Ubuntu 20.04 install notes.
- Download mujoco200 linux from https://www.roboti.us/index.html, put it into the .mujoco directory together with your licence key, and add the following line to .bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:(path to .mujoco)/mujoco200/bin
- Install mujoco-py requirements:
sudo apt install cmake
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
- Install patchelf:
sudo add-apt-repository ppa:jamesh/snap-support
sudo apt-get update
sudo apt install patchelf
- Install the Python requirements:
pip install -r rltoolkit/requirements.txt
The requirements will install mujoco-py, which works only with MuJoCo installed together with a licence (see the Install MuJoCo section in the mujoco-py documentation).
- Install rltoolkit:
pip install -e rltoolkit/
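As an optional sanity check (not part of the original instructions), you can verify that mujoco-py builds and imports correctly by running:
python -c "import mujoco_py"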
To train the models from the paper, you can use the scripts in the train folder; we provide a separate script per environment class.
To train SPP-TD3 on Ant, simply run:
python train/mujoco/run_experiment.py td3 Ant --spp -c train/mujoco/configs/spp_td3_optimal_mujoco.yaml
SPP-SAC can be run by replacing td3 with sac and using the config file configs/spp_sac_optimal_mujoco.yaml. To run Humanoid instead of Ant, replace Ant with Humanoid.
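For example, following the pattern of the command above, SPP-SAC on Humanoid could be launched with (the exact config path is assumed to mirror the TD3 one):
python train/mujoco/run_experiment.py sac Humanoid --spp -c train/mujoco/configs/spp_sac_optimal_mujoco.yaml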
Analogously, to train vanilla TD3 on Ant (remove --spp and change the config file), run:
python train/mujoco/run_experiment.py td3 Ant -c train/mujoco/configs/td3_base.yaml
Our running script accepts several other useful parameters, including --n_runs (how many runs to launch) and --n_cores (how many cores to use in parallel). neptune.ai logging can also be enabled by providing --neptune_proj (project name) and --neptune_token (API token).
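For example, a hypothetical invocation combining these flags (the values below are placeholders, not from the original instructions) might look like:
python train/mujoco/run_experiment.py td3 Ant --spp -c train/mujoco/configs/spp_td3_optimal_mujoco.yaml --n_runs 5 --n_cores 5 --neptune_proj my-workspace/spprl --neptune_token <NEPTUNE_API_TOKEN>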
For the Safety Gym experiments, first install the requirements:
pip install --no-deps -r train/safetygym/requirements.txt
To train SPP-TD3 on Doggo Goal, simply run:
python train/safetygym/run_experiment.py td3 Goal Doggo --spp -c train/safetygym/configs/spp_td3_optimal_safetygym.yaml
To run Doggo Button instead of Doggo Goal, replace Goal with Button; proceed similarly for Doggo Columns and Car Push.
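For example, following the pattern above, SPP-TD3 on Doggo Button would be launched with:
python train/safetygym/run_experiment.py td3 Button Doggo --spp -c train/safetygym/configs/spp_td3_optimal_safetygym.yaml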
To train vanilla TD3/SAC on Doggo Goal, run (for SAC replace td3 with sac):
python train/safetygym/run_experiment.py td3 Goal Doggo -c train/safetygym/configs/td3_base.yaml
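The corresponding SAC command would then be as follows (the sac_base.yaml config name is an assumption based on the naming scheme above; adjust it if the repository uses a different file):
python train/safetygym/run_experiment.py sac Goal Doggo -c train/safetygym/configs/sac_base.yaml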
As for MuJoCo, the running script accepts the same additional parameters: --n_runs (how many runs), --n_cores (how many cores to use in parallel), and optional neptune.ai logging via --neptune_proj and --neptune_token.
To train SPP-TD3 on AntPush, simply run:
python train/antpush/run_experiment.py td3 AntPush --spp -c train/antpush/configs/spp_td3_optimal_antpush.yaml
Other algorithms and environments were not tested.
Model evaluation code is available in the Jupyter notebook notebooks/load_and_evaluate_agents.ipynb.
There you can load pre-trained models, evaluate their reward, and render them in the environment.
You can find the pre-trained models in the models directory; they can be evaluated with the notebook above.
Our model achieves the following performance on OpenAI gym MuJoCo environments:
Ant results:
Humanoid results:
Our model achieves the following performance on OpenAI safety-gym environments:
Doggo-Goal results:
Doggo-Button results:
Car-Push results: