GitHub - SamsungLabs/tqc: Implementation of Truncated Quantile Critics method for continuous reinforcement learning.

This repository implements the continuous reinforcement learning method TQC, described in the paper "Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics". The source code is based on Softlearning, and we thank its authors for a good framework. For a more exhaustive README (for example, Docker usage), please refer to the original repo.

Our method is implemented in module ${SOURCE_PATH}/softlearning/algorithms/tqc.py.

MuJoCo Installation

Download and install MuJoCo 1.50 from the MuJoCo website. We assume that the MuJoCo files are extracted to the default location (~/.mujoco/mjpro150). Gym and MuJoCo 2.0 have an integration bug: Gym doesn't process contact forces correctly for the Humanoid and Ant environments. Please use MuJoCo 1.50.
Copy your MuJoCo license key (mjkey.txt) to ~/.mujoco/mjkey.txt.
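
Before creating the conda environment, it can help to confirm that both paths exist. The following is a minimal sketch, not part of this repository: the helper name `missing_mujoco_files` is hypothetical, and it only checks the two default locations named above.

```python
from pathlib import Path


def missing_mujoco_files(home=None):
    """Return the expected MuJoCo paths that are absent under `home`.

    Hypothetical helper: it only checks the default locations named in
    this README and does not validate the license key itself.
    """
    home = Path(home) if home is not None else Path.home()
    expected = [
        home / ".mujoco" / "mjpro150",   # extracted MuJoCo 1.50 files
        home / ".mujoco" / "mjkey.txt",  # license key
    ]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_mujoco_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("MuJoCo files found in the default location.")
```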

Conda installation

Create and activate the conda environment, and install softlearning to enable the command line interface.

cd ${SOURCE_PATH}
conda env create -f environment.yml
conda activate tqc

Training and simulating an agent

To train the agent:
```
./run_tqc.sh --alg_top_crop_quantiles=2 --domain=Walker2d
```
Number of atoms to remove for each environment:

| Environment | alg_top_crop_quantiles |
| --- | --- |
| Hopper | 5 |
| HalfCheetah | 0 |
| Walker2d | 2 |
| Ant | 2 |
| Humanoid | 2 |
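
To illustrate what removing atoms means, here is a toy sketch in plain Python. It is not the code in tqc.py. It assumes that `alg_top_crop_quantiles` counts atoms dropped per critic, and that truncation pools the atoms of all critics, sorts them, and discards the largest ones, as the method's name suggests. The function name `truncate_atoms` is hypothetical.

```python
def truncate_atoms(critic_atoms, drop_per_critic):
    """Pool atoms from all critics, sort them, and drop the largest ones.

    critic_atoms: list of lists, one list of atom values per critic.
    drop_per_critic: atoms removed per critic (assumed meaning of
        alg_top_crop_quantiles); the total number removed is this value
        times the number of critics.
    """
    pooled = sorted(a for atoms in critic_atoms for a in atoms)
    n_drop = drop_per_critic * len(critic_atoms)
    if n_drop >= len(pooled):
        raise ValueError("cannot drop all atoms")
    return pooled[:len(pooled) - n_drop]


# Toy example: 2 critics with 3 atoms each, dropping 1 atom per critic.
atoms = [[1.0, 4.0, 9.0], [2.0, 3.0, 10.0]]
print(truncate_atoms(atoms, drop_per_critic=1))  # [1.0, 2.0, 3.0, 4.0]
```

With `drop_per_critic=0` (the HalfCheetah setting) all pooled atoms are kept, so no truncation takes place.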

The full list of parameters can be found in run_tqc.sh.
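
When launching several environments, the per-environment values listed in this README can be kept in a small lookup. The sketch below only builds the argument list for the two flags shown in the training command above; the names `TOP_CROP_QUANTILES` and `training_command` are hypothetical and not part of the repository.

```python
# Values copied from the per-environment list in this README.
TOP_CROP_QUANTILES = {
    "Hopper": 5,
    "HalfCheetah": 0,
    "Walker2d": 2,
    "Ant": 2,
    "Humanoid": 2,
}


def training_command(domain):
    """Build the run_tqc.sh argument list for a listed environment."""
    crop = TOP_CROP_QUANTILES[domain]  # raises KeyError for unlisted domains
    return [
        "./run_tqc.sh",
        f"--alg_top_crop_quantiles={crop}",
        f"--domain={domain}",
    ]


print(" ".join(training_command("Walker2d")))
# ./run_tqc.sh --alg_top_crop_quantiles=2 --domain=Walker2d
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root.
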
To simulate the resulting policy:

First, find the path where the checkpoint is saved. By default, data is saved under ${SOURCE_PATH}/ray_results/<universe>/<domain>/<task>/<datetimestamp>-<exp-name>/<trial-id>/<checkpoint-id>.

For example: ${SOURCE_PATH}/ray_results/gym/HalfCheetah/v3/2018-12-12T16-48-37-my-experiment-1-0/mujoco-runner_0_seed=7585_2018-12-12_16-48-37xuadh9vd/checkpoint_1000/.
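
Checkpoint directories can also be located programmatically. The sketch below assumes the directory layout described above and checkpoint names of the form `checkpoint_<integer>`, as in the example path. The function name `find_checkpoints` is hypothetical.

```python
from pathlib import Path


def find_checkpoints(ray_results):
    """List checkpoint directories, grouped by trial, lowest id first.

    Assumes the layout
    <universe>/<domain>/<task>/<experiment>/<trial-id>/checkpoint_<integer>
    under `ray_results`.
    """
    root = Path(ray_results)
    found = []
    for path in root.glob("*/*/*/*/*/checkpoint_*"):
        suffix = path.name[len("checkpoint_"):]
        if path.is_dir() and suffix.isdigit():
            found.append((str(path.parent), int(suffix), path))
    # Sort by trial directory, then by numeric checkpoint id.
    found.sort(key=lambda item: item[:2])
    return [path for _, _, path in found]
```

The last element of the returned list for a given trial is its most recent checkpoint, which can then be exported as `CHECKPOINT_DIR`.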

The next command assumes that the environment variable ${CHECKPOINT_DIR} contains ${SOURCE_PATH}/ray_results/....

python ./examples/development/simulate_policy.py \
    ${CHECKPOINT_DIR} \
    --max-path-length=1000 \
    --num-rollouts=1 \
    --render-mode=human

Run curves

tqc_curves.pkl contains the evaluation returns of the TQC agent that were used to plot the learning curves in the paper.
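
The internal structure of the pickle file is not described here, so a reasonable first step is to inspect it. The sketch below makes no assumption about the contents beyond the file being a standard Python pickle; the helper name `summarize_pickle` is hypothetical. Unpickling may require the libraries that were used to create the file, and pickle files should only be loaded from trusted sources.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and return its top-level type name and keys.

    Keys are returned as a sorted list of strings when the top-level
    object is a dict, and as None otherwise.
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    keys = sorted(map(str, data.keys())) if isinstance(data, dict) else None
    return type(data).__name__, keys
```

For example, `summarize_pickle("tqc_curves.pkl")` run from the repository root reports the top-level type of the stored curves.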

