AMMI Bootcamp II, Summer 2021

Decision Transformer AMMI

This is a re-implementation of Decision Transformer, done as part of AMMI Bootcamp II. Because of time constraints and my limited experience with Atari tasks, I re-implemented only the gym code. I took some code and scripts from the original repo (noted where they are used) because I wanted to focus on the algorithmic part. (The experiments are running now; no results are reported yet.)

Paper: Decision Transformer | Original Repo | Google Slides | W&B

Installation

Ubuntu 20.04

Create a new conda environment:

conda create -n dt-gym-ammi python=3.8

Install the following Python packages:

pip install numpy==1.20.3 torch==1.8.1 transformers==4.5.1 wandb==0.9.1 gym==0.18.3

Install MuJoCo and mujoco-py:

sudo apt-get install ffmpeg

pip install mujoco-py==2.0.2.13

macOS Big Sur

Create a new conda environment:

conda create -n dt-gym-ammi python=3.8

Install the following Python packages:

pip install numpy==1.20.3 torch==1.8.1 transformers==4.5.1 wandb==0.9.1 gym==0.18.3

Install MuJoCo and mujoco-py:

brew install ffmpeg gcc

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco200/bin

pip install mujoco-py==2.0.2.13
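Once either set of steps has finished, a quick way to confirm that the versions pinned in the pip install command actually ended up in the active environment is a small check like the one below. It is a sketch, not part of the repo: `PINS` and `check_pins` are hypothetical names, and it relies only on the standard-library `importlib.metadata` (available from Python 3.8, the version used for the conda environment).

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions pinned in the pip install command above (hypothetical constant).
PINS: Dict[str, str] = {
    "numpy": "1.20.3",
    "torch": "1.8.1",
    "transformers": "4.5.1",
    "wandb": "0.9.1",
    "gym": "0.18.3",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that is
    missing (installed is None) or installed at a different version."""
    problems: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, expected in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu111" on torch wheels.
        if installed is None or installed.split("+", 1)[0] != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {installed or 'nothing'}")
```

An empty output means every pinned package is present at the expected version.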

Run an Experiment

To download the offline datasets, install D4RL and then run:

python data/d4rl_dataset.py
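D4RL stores each dataset as flat arrays of transitions, while Decision Transformer trains on whole trajectories conditioned on the return-to-go (the sum of all remaining rewards). A minimal sketch of that preprocessing, assuming the dataset script follows the original repo's approach of ending a trajectory at any terminal or timeout flag; `split_trajectories` and `returns_to_go` are hypothetical names, and the input dictionary is assumed to have the layout of D4RL's `env.get_dataset()`:

```python
from typing import Dict, List

import numpy as np


def split_trajectories(data: Dict[str, np.ndarray]) -> List[Dict[str, np.ndarray]]:
    """Cut flat transition arrays into trajectories.

    `data` is assumed to hold equal-length arrays under the keys
    'observations', 'actions', 'rewards', 'terminals' and 'timeouts'.
    A trajectory ends at any step whose terminal or timeout flag is set.
    """
    keys = ("observations", "actions", "rewards", "terminals")
    ends = np.logical_or(data["terminals"], data["timeouts"])
    trajectories = []
    start = 0
    for end in np.flatnonzero(ends):
        # Inclusive slice: the flagged step belongs to this trajectory.
        trajectories.append({k: data[k][start:end + 1] for k in keys})
        start = end + 1
    # Trailing transitions with no end flag are dropped (an assumption).
    return trajectories


def returns_to_go(rewards: np.ndarray) -> np.ndarray:
    """Undiscounted return-to-go: rtg[t] = rewards[t] + ... + rewards[-1]."""
    return np.cumsum(rewards[::-1])[::-1]
```

Reversing, taking a cumulative sum, and reversing back gives every suffix sum in a single pass instead of one sum per time step.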

Configure your settings in config/, and then run:

python experiment.py -cfg <config file name without .py>

For example:

python experiment.py -cfg dt_gym_halfcheetah
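The example passes the config name without the .py extension. One plausible way a script can turn that name into settings, sketched here as an assumption rather than a description of what experiment.py actually does, is to parse `-cfg` with `argparse` and import the matching module from the config/ package with `importlib`. `build_parser`, `config_module_name` and `load_config` are hypothetical helpers.

```python
import argparse
import importlib
from types import ModuleType
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run a Decision Transformer experiment")
    # Name of a file in config/, e.g. dt_gym_halfcheetah
    parser.add_argument("-cfg", required=True, help="config file name, without .py")
    return parser


def config_module_name(cfg: str, package: str = "config") -> str:
    """Map 'dt_gym_halfcheetah' (or 'dt_gym_halfcheetah.py') to 'config.dt_gym_halfcheetah'."""
    if cfg.endswith(".py"):
        cfg = cfg[: -len(".py")]
    return f"{package}.{cfg}"


def load_config(argv: Optional[List[str]] = None) -> ModuleType:
    args = build_parser().parse_args(argv)
    # Requires config/ to be importable, e.g. when run from the repo root.
    return importlib.import_module(config_module_name(args.cfg))
```

Stripping an optional .py suffix makes the command tolerant of either spelling of the config name.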

Results

Due to limited time and compute resources, I chose a subset of tasks to evaluate and validate my re-implementation: HalfCheetah, Walker2d, and Hopper on the Medium-Expert and Medium offline datasets (plus HalfCheetah on Medium-Replay). See W&B for the runs.

Dataset	Environment	DT (mine)	DT (paper)
Medium-Expert	HalfCheetah	? ± ?	68.8 ± 1.3
Medium-Expert	Hopper	? ± ?	107.6 ± 1.8
Medium-Expert	Walker2d	? ± ?	108.1 ± 0.2
Medium	HalfCheetah	? ± ?	42.6 ± 0.1
Medium	Hopper	? ± ?	67.6 ± 1.0
Medium	Walker2d	? ± ?	74.0 ± 1.4
Medium-Replay	HalfCheetah	? ± ?	36.6 ± 0.8

Next, I'll run the following:

Dataset	Environment	DT (mine)	DT (paper)
Medium-Replay	Hopper	? ± ?	82.7 ± 7.0
Medium-Replay	Walker2d	? ± ?	66.6 ± 3.0
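The README does not state which spread statistic the "±" columns use. Assuming each cell is the mean and sample standard deviation of the normalized score across several training seeds, a "DT (mine)" cell could be filled with a helper like the hypothetical `format_cell` below (standard library only).

```python
import statistics
from typing import Sequence


def format_cell(scores: Sequence[float], decimals: int = 1) -> str:
    """Format normalized scores from several seeds as 'mean ± std'.

    Uses the sample standard deviation; with a single seed the
    spread is reported as 0.
    """
    if not scores:
        raise ValueError("need at least one score")
    mean = statistics.mean(scores)
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return f"{mean:.{decimals}f} ± {spread:.{decimals}f}"
```

If the comparison calls for a population standard deviation or a standard error instead, swapping `statistics.stdev` for `statistics.pstdev` (or dividing by the square root of the seed count) is a one-line change.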

Note: the results above are normalized scores. To compute the normalized score from the final return:

$\mathrm{norm\_score} = \frac{\mathrm{score} - \mathrm{min\_score}}{\mathrm{max\_score} - \mathrm{min\_score}} \times 100$

where score is the return shown in the plots, and min_score and max_score for each environment are given in the following table:

Environment	Min	Max
HalfCheetah	-280.178953	12135.0
Hopper	-20.272305	3234.3
Walker2d	1.629008	4592.3
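The formula and the min/max table above translate directly into a small helper. This is a sketch: `REF_SCORES` and `normalized_score` are hypothetical names, and the constants are copied from the table.

```python
from typing import Dict, Tuple

# (min_score, max_score) reference returns per environment, from the table above.
REF_SCORES: Dict[str, Tuple[float, float]] = {
    "halfcheetah": (-280.178953, 12135.0),
    "hopper": (-20.272305, 3234.3),
    "walker2d": (1.629008, 4592.3),
}


def normalized_score(env: str, score: float) -> float:
    """Map a raw episode return to the normalized scale (0 = min, 100 = max)."""
    min_score, max_score = REF_SCORES[env.lower()]
    return (score - min_score) / (max_score - min_score) * 100


if __name__ == "__main__":
    # A HalfCheetah return of 5000 lands at roughly 42.5 on the normalized scale.
    print(round(normalized_score("HalfCheetah", 5000.0), 2))
```

D4RL environments should also expose `env.get_normalized_score(score)`, which computes the same ratio without the factor of 100.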

RamiRibat/decision-transformer-ammi

Folders and files

Latest commit

History

Repository files navigation

AMMI Bootcamp II, Summer 2021

Decision Transformer AMMI

Installation

Ubuntu 20.04

MacOS Big Sur

Run an Experiement

Results

About

Resources

Stars

Watchers

Forks

Languages