offline-rl-sequence-modeling

Three transformer-based offline RL methods compared on the D4RL Hopper benchmark. Behavior Cloning (BC), Decision Transformer (DT), and Online Decision Transformer (ODT) are trained on hopper-medium-replay-v2 and scored with the D4RL-normalized metric. A multi-seed evaluation pipeline ships for all three.

Hero numbers

Metric	Value
Methods	3 (BC, DT, ODT)
Dataset	D4RL hopper-medium-replay-v2, ~200K transitions
Multi-seed evaluation	Scripts ship for BC, DT, and ODT (see `eval_*_multiseed.py`)
Expected DT D4RL-normalized score	60 to 80 on Hopper
Foundation baseline	`foundation_compare.py` benchmarks a frozen transformer with a linear head
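
The D4RL-normalized score rescales a raw episode return so that 0 corresponds to a random policy and 100 to the expert reference. A minimal sketch, assuming the Hopper reference returns listed in the D4RL package (-20.272305 for random, 3234.3 for expert). The name `normalized_score` is illustrative and is not necessarily what `src/utils.py` exposes:

```python
# Reference returns for Hopper as listed in D4RL's reference-score tables
# (assumed here; check them against your installed d4rl version).
HOPPER_RANDOM_RETURN = -20.272305
HOPPER_EXPERT_RETURN = 3234.3


def normalized_score(raw_return: float,
                     random_return: float = HOPPER_RANDOM_RETURN,
                     expert_return: float = HOPPER_EXPERT_RETURN) -> float:
    """Map a raw episode return to the 0-100 D4RL scale."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


if __name__ == "__main__":
    # A raw return of about 2000 lands near the low end of the 60 to 80 band.
    print(round(normalized_score(2000.0), 1))
```

Under these reference values, the expected DT band of 60 to 80 corresponds to raw Hopper returns of roughly 1930 to 2580.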

Three methods, one benchmark

Model	What it learns	Where it lives
BC	Imitates dataset actions through a transformer policy head	`train.py --model bc`
DT	Return-conditioned sequence model over state, action, and return-to-go tokens	`train.py --model dt`
ODT	Stochastic DT with an online fine-tune phase	`train.py --model odt --online_epochs 10`

Each model uses the same dataloader and tokenization, so the comparison isolates the algorithmic difference.
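
For DT and ODT, the return-to-go token at step t is the sum of rewards from t to the end of the trajectory. A minimal sketch of that computation, assuming undiscounted returns (gamma = 1) as in the original Decision Transformer setup. The name `returns_to_go` is illustrative and may differ from what `src/dataloader.py` defines:

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Suffix sums of rewards: rtg[t] = r[t] + gamma * rtg[t + 1]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for t + 1.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


if __name__ == "__main__":
    print(returns_to_go([1.0, 2.0, 3.0]))  # [6.0, 5.0, 3.0]
```

At evaluation time a return-conditioned policy is typically seeded with a target return, which is then decremented by each observed reward.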

Repository layout

offline-rl-sequence-modeling/
  src/
    dataloader.py        D4RL trajectory loader, return-to-go tokenization
    model.py             Transformer policy with mode flags for BC, DT, ODT
    utils.py             D4RL normalized score, eval rollout, seed helpers
  dataset_setup.py       D4RL download wrapper
  train.py               Single-seed training entry, mode flag picks BC, DT, ODT
  train_improved.py      Tuned hyperparameters from the multi-seed sweep
  test.py                Episode rollout with deterministic policy
  eval_bc_multiseed.py   BC eval across seeds, returns mean and std
  eval_dt_multiseed.py   DT eval across seeds
  eval_odt_multiseed.py  ODT eval across seeds
  eval_bc_checkpoints.py BC checkpoint sweep, picks best epoch
  compare.py             Side-by-side bar chart of D4RL scores
  foundation_compare.py  Frozen-foundation baseline plus DT on top
  plot_bc_final.py       Final BC curves
  plot_bc_multiseed.py   BC seed-spread plot
  requirements.txt

Train

python dataset_setup.py --output data/

python train.py --model bc  --epochs 50 --batch_size 64 --lr 1e-4 --device cuda:0 --out_dir outputs/
python train.py --model dt  --epochs 50 --batch_size 64 --lr 1e-4 --device cuda:0 --out_dir outputs/
python train.py --model odt --epochs 50 --online_epochs 10 --batch_size 64 --lr 1e-4 --device cuda:0 --out_dir outputs/

Evaluate

Single seed, single model:

python test.py --model dt --ckpt models/best_model_dt.pth --device cuda:0 --num_episodes 20

Multi-seed for fair comparison:

python eval_bc_multiseed.py  --ckpt_dir outputs/bc  --seeds 5
python eval_dt_multiseed.py  --ckpt_dir outputs/dt  --seeds 5
python eval_odt_multiseed.py --ckpt_dir outputs/odt --seeds 5
python compare.py

compare.py reads the eval JSON dumps and writes a D4RL-score bar chart with mean and std error bars.
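
The aggregation behind those error bars can be sketched as follows. This assumes each seed contributes one D4RL-normalized score and that the spread is reported as the sample standard deviation. The function name, method labels, and sample numbers are placeholders for illustration, not results or the JSON schema from this repository:

```python
import statistics
from typing import Dict, List, Tuple


def summarize(scores_by_method: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample std) of per-seed scores for each method."""
    summary = {}
    for method, scores in scores_by_method.items():
        mean = statistics.mean(scores)
        # Sample std needs at least two seeds; report 0.0 for a single seed.
        std = statistics.stdev(scores) if len(scores) > 1 else 0.0
        summary[method] = (mean, std)
    return summary


if __name__ == "__main__":
    # Placeholder numbers, NOT measured results.
    demo = {"method_a": [10.0, 12.0, 14.0], "method_b": [20.0, 20.0, 20.0]}
    for method, (mean, std) in summarize(demo).items():
        print(f"{method}: {mean:.1f} +/- {std:.1f}")
```

With only a handful of seeds the standard deviation is itself noisy, so overlapping error bars should be read as "no clear difference" rather than as a tie.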

Foundation comparison

foundation_compare.py tests whether a frozen transformer foundation gives a useful feature space for offline RL. A frozen encoder plus a linear policy head is the lower bound. DT trained from scratch is the upper bound. The gap between them shows how much sequence modeling buys over a static representation.

Stack

Python 3.12, PyTorch, D4RL, NumPy, and matplotlib for plots.

Hardware

BC trains to completion on CPU in a few hours. DT and ODT need a CUDA GPU to train in reasonable time. Reduce --batch_size to fit smaller VRAM and --epochs to shorten the run. The dataloader streams transitions, so the full dataset never has to reside on the GPU.

License

MIT, see LICENSE.
