Software and Results for the State Planning Policy Reinforcement Learning Paper

This repository is the official implementation of State Planning Policy Reinforcement Learning (SPPRL).

Videos of the trained agents are available at https://sites.google.com/view/spprl

[SPPRL overview figure]

Requirements and Installation

The code was developed and run on Ubuntu 20.04; the installation notes below are for that platform.

  1. Download the MuJoCo 2.0 (mujoco200) Linux binaries from https://www.roboti.us/index.html and unpack them, together with your licence key, into the ~/.mujoco directory. Then add the following line to .bashrc:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin
  2. Install the mujoco-py system requirements:
sudo apt install cmake
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
  3. Install patchelf:
sudo add-apt-repository ppa:jamesh/snap-support
sudo apt-get update
sudo apt install patchelf
  4. Install the Python requirements:
pip install -r rltoolkit/requirements.txt

The requirements install mujoco-py, which works only with a licensed MuJoCo installation (see the Install MuJoCo section of the mujoco-py documentation).

Then install rltoolkit with:

pip install -e rltoolkit/
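
To sanity-check the installation before training, a minimal smoke test such as the one below can help. This is a sketch assuming the requirements installed gym with MuJoCo support; the environment id Ant-v3 is an assumption, not taken from this repository:

# Smoke test: verify that mujoco-py and a Gym MuJoCo environment load.
# Assumes gym with MuJoCo support was installed by the requirements;
# the environment id "Ant-v3" is an assumption.
import gym
import mujoco_py  # fails here if MuJoCo or the licence is misconfigured

env = gym.make("Ant-v3")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print("MuJoCo OK, observation shape:", obs.shape)
env.close()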

Training

To train the models from the paper, use the scripts in the train folder; we provide a separate script per environment class.

MuJoCo

To train SPP-TD3 on Ant, simply run:

python train/mujoco/run_experiment.py td3 Ant --spp -c train/mujoco/configs/spp_td3_optimal_mujoco.yaml

SPP-SAC can be run by replacing td3 with sac and using the config file configs/spp_sac_optimal_mujoco.yaml.

To run Humanoid instead of Ant, replace Ant with Humanoid.

Analogously, to train vanilla TD3 on Ant, remove --spp, change the config file, and run:

python train/mujoco/run_experiment.py td3 Ant -c train/mujoco/configs/td3_base.yaml

The run script accepts several other useful parameters, including --n_runs (how many runs to launch) and --n_cores (how many cores to use in parallel).

neptune.ai logging can be enabled by providing --neptune_proj (project name) and --neptune_token (API token).
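
For example, a single invocation combining these options might look like the following; this assumes the flags can be combined freely, and the project name and token below are placeholders:

python train/mujoco/run_experiment.py td3 Ant --spp -c train/mujoco/configs/spp_td3_optimal_mujoco.yaml --n_runs 5 --n_cores 5 --neptune_proj my-workspace/spprl --neptune_token <token>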

SafetyGym

First install the SafetyGym requirements:

pip install --no-deps -r train/safetygym/requirements.txt

To train SPP-TD3 on Doggo Goal, simply run:

python train/safetygym/run_experiment.py td3 Goal Doggo --spp -c train/safetygym/configs/spp_td3_optimal_safetygym.yaml

To run Doggo Button instead of Doggo Goal, replace Goal with Button; proceed similarly for Doggo Columns and Car Push (see the example below).
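
For instance, assuming the same task/robot argument order and optimal config as above, Car Push would presumably be run as:

python train/safetygym/run_experiment.py td3 Push Car --spp -c train/safetygym/configs/spp_td3_optimal_safetygym.yaml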

To train vanilla TD3/SAC on Doggo Goal run (for SAC replace td3 with sac):

python train/safetygym/run_experiment.py td3 Goal Doggo -c train/safetygym/configs/td3_base.yaml

As in the MuJoCo case, the run script accepts --n_runs and --n_cores, and neptune.ai logging can be enabled with --neptune_proj and --neptune_token.

AntPush

To train SPP-TD3 on AntPush, simply run:

python train/antpush/run_experiment.py td3 AntPush --spp -c train/antpush/configs/spp_td3_optimal_antpush.yaml

Other algorithms and environments were not tested.

Evaluation

Model evaluation code is available in the Jupyter notebook notebooks/load_and_evaluate_agents.ipynb, where you can load pre-trained models, evaluate their reward, and render them in the environment.
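
The notebook contains the actual model-loading code; conceptually, evaluating a loaded agent reduces to a standard Gym rollout loop like the sketch below. The policy callable and the environment id are illustrative assumptions, not the repository's API:

# Sketch of an evaluation loop; `policy` stands in for a loaded agent's
# action function, which the notebook provides. Env id is an assumption.
import gym

def evaluate(policy, env_name="Ant-v3", episodes=10, render=False):
    env = gym.make(env_name)
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = policy(obs)
            obs, reward, done, _ = env.step(action)
            total += reward
            if render:
                env.render()
        returns.append(total)
    env.close()
    return sum(returns) / len(returns)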

Pre-trained Models

Pre-trained models are available in the models directory and can be evaluated with the notebook described above.

Results

Our model achieves the following performance on OpenAI gym MuJoCo environments:

Ant results:

[Result plots: ant (spp)ddpg, ant (spp)sac, ant (spp)td3]

Humanoid results:

[Result plots: humanoid (spp)ddpg, humanoid (spp)sac, humanoid (spp)td3]

Our model achieves the following performance on OpenAI safety-gym environments:

Doggo-Goal results:

[Result plot: doggo goal td3]

Doggo-Button results:

[Result plot: doggo button td3]

Car-Push results:

[Result plot: car push td3]