crazyguitar/RLcade

RLcade

A PyTorch-based reinforcement learning framework for playing games. RLcade provides multiple RL algorithms (PPO, DQN, Rainbow DQN, SAC), distributed training support, and a plugin-based architecture for profiling and curriculum learning.

Key Features

Supported Mappers

| Mapper | Name | Example Games |
|---|---|---|
| 0 | NROM | Super Mario Bros, Donkey Kong |
| 1 | SxROM (MMC1) | The Legend of Zelda, Metroid |
| 2 | UxROM | Mega Man, Castlevania |
| 3 | CNROM | Paperboy, Gradius |
| 4 | TxROM (MMC3) | Super Mario Bros 2/3, Kirby's Adventure |
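
To check whether a ROM falls inside the table above before loading it, the mapper number can be read from the file's iNES header. The following is a minimal sketch, not code from this repository: it assumes a plain iNES header (the NES 2.0 extension bits are ignored), and the helper names `ines_mapper` and `describe_rom_header` are hypothetical.

```python
# Mapper numbers and names copied from the table above.
SUPPORTED_MAPPERS = {
    0: "NROM",
    1: "SxROM (MMC1)",
    2: "UxROM",
    3: "CNROM",
    4: "TxROM (MMC3)",
}


def ines_mapper(header: bytes) -> int:
    """Return the mapper number encoded in a 16-byte iNES header."""
    if len(header) < 16 or header[:4] != b"NES\x1a":
        raise ValueError("not an iNES ROM header")
    low = header[6] >> 4      # low nibble is stored in the upper bits of flags 6
    high = header[7] & 0xF0   # high nibble is stored in the upper bits of flags 7
    return high | low


def describe_rom_header(header: bytes) -> str:
    """Hypothetical helper: report whether the mapper is in the table."""
    mapper = ines_mapper(header)
    name = SUPPORTED_MAPPERS.get(mapper)
    if name is None:
        return f"mapper {mapper}: not in the supported table"
    return f"mapper {mapper}: {name}"


# Synthetic header for illustration: magic, 2 PRG banks, 1 CHR bank, flags 6 = 0x10.
example = b"NES\x1a" + bytes([2, 1, 0x10, 0x00]) + bytes(8)
print(describe_rom_header(example))
```

For a real file, pass the first 16 bytes, for example `open(path, "rb").read(16)`.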

Getting Started

Prerequisites

  • Rust (1.85+)
  • SDL2
  • Python 3.10+
  • uv (brew install uv or curl -LsSf https://astral.sh/uv/install.sh | sh)

Note: maturin is bootstrapped automatically by uv via PEP 517 (declared in pyproject.toml build-system requires).

Install SDL2

macOS:

brew install sdl2

Ubuntu/Debian:

sudo apt install libsdl2-dev

Install

uv venv .venv
source .venv/bin/activate
make install

Play with a trained agent

python -m rlcade.agent --rom <path/to/rom.nes> --checkpoint <path/to/checkpoint>

Run the NES emulator standalone

cargo build --release

# Default: loads games/super-mario-bros.nes
cargo run --release

# Specify a ROM file
cargo run --release -- path/to/rom.nes

Training

All training scripts follow the same pattern: bash <script> --train to train, omit --train to play with a checkpoint. Every script supports distributed training via --launcher, --nproc-per-node, and --nnodes, and extra args are forwarded to the trainer.

# Common flags
--train                  # Run training (omit to run inference with human rendering)
--device cuda            # Use GPU (defaults to auto: GPU if available, else CPU)
--launcher BACKEND       # Launch backend: none, elastic, mp, ray (default: elastic)
--nproc-per-node N       # Number of GPUs per node (default: 1)
--nnodes N               # Number of nodes (default: 1)
--distributed STRATEGY   # Distributed strategy: ddp, fsdp2 (default: none)
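
The flags and defaults listed above, together with the forwarding of extra arguments to the trainer, can be mirrored with `argparse`. This is an illustrative sketch only: the real scripts are shell wrappers whose internals are not shown here, `build_parser` and `split_args` are hypothetical names, and `--lr` stands in for any trainer-specific flag.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the common flags and their documented defaults."""
    parser = argparse.ArgumentParser(description="sketch of the common training flags")
    parser.add_argument("--train", action="store_true")
    parser.add_argument("--device", default="auto")
    parser.add_argument("--launcher", choices=["none", "elastic", "mp", "ray"], default="elastic")
    parser.add_argument("--nproc-per-node", type=int, default=1)
    parser.add_argument("--nnodes", type=int, default=1)
    parser.add_argument("--distributed", choices=["none", "ddp", "fsdp2"], default="none")
    return parser


def split_args(argv):
    """Return (parsed common flags, leftover args to forward to the trainer)."""
    return build_parser().parse_known_args(argv)


# "--lr 3e-4" is a made-up trainer flag used only to show forwarding.
opts, extra = split_args(["--train", "--nproc-per-node", "8", "--lr", "3e-4"])
print(opts.launcher, opts.nproc_per_node, extra)
```

`parse_known_args` is what makes the forwarding work: anything the wrapper does not recognize is returned untouched instead of raising an error.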

Launchers

| Backend | When to use |
|---|---|
| none | Single-process, or when an external scheduler (Slurm) pre-sets RANK/WORLD_SIZE env vars |
| elastic | Single or multi-node GPU training (equivalent to torchrun --standalone) |
| mp | Single-node multi-GPU via torch.multiprocessing.spawn |
| ray | Multi-node GPU training on a Ray cluster (GPU-only, auto-detects topology; pip install 'ray[default]>=2.9') |

# Single GPU (launcher=none, no distributed)
bash examples/ppo/ppo.sh --train --launcher none

# 8 GPUs, single node, elastic launch + DDP
bash examples/ppo/ppo.sh --train --launcher elastic --nproc-per-node 8

# 8 GPUs, single node, mp launch + DDP
bash examples/ppo/ppo.sh --train --launcher mp --nproc-per-node 8

# Multi-node (2 nodes x 8 GPUs) via elastic
bash examples/ppo/ppo.sh --train --launcher elastic --nproc-per-node 8 --nnodes 2

# Ray cluster (auto-detect GPUs)
bash examples/ppo/ppo.sh --train --launcher ray --ray-address ray://head:6379

# FSDP2 (sharded training)
bash examples/ppo/ppo.sh --train --launcher elastic --nproc-per-node 8 --distributed fsdp2
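
With `--launcher none`, the process is expected to find `RANK` and `WORLD_SIZE` already set by the scheduler, and in the multi-node example the total process count is nodes times GPUs per node (2 x 8 = 16). The sketch below shows that arithmetic and one way to read the variables. It is an assumption about typical usage, not the framework's launcher code; `world_size` and `rank_from_env` are hypothetical helpers, and falling back to rank 0 of 1 when the variables are unset is also an assumption.

```python
import os


def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of training processes across all nodes."""
    return nnodes * nproc_per_node


def rank_from_env(env=None):
    """Read (rank, world_size) from the environment, defaulting to a single process."""
    env = os.environ if env is None else env
    rank = int(env.get("RANK", "0"))
    size = int(env.get("WORLD_SIZE", "1"))
    if not 0 <= rank < size:
        raise ValueError(f"RANK={rank} is outside [0, {size})")
    return rank, size


print(world_size(nnodes=2, nproc_per_node=8))
print(rank_from_env({"RANK": "3", "WORLD_SIZE": "16"}))
```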

PPO

# Train PPO + CNN baseline
bash examples/ppo/ppo.sh --train

# Multi-GPU (8x H100)
bash examples/ppo/ppo.sh --train --nproc-per-node 8

# PPO + LSTM encoder
bash examples/ppo/ppo_lstm.sh --train

# PPO + LSTM + ICM (intrinsic curiosity)
bash examples/ppo/ppo_lstm_icm.sh --train

# PPO + LSTM + ICM + Curriculum learning
bash examples/ppo/ppo_lstm_icm_curriculum.sh --train

# PPO + IMPALA ResNet encoder
bash examples/ppo/ppo_resnet.sh --train
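
For readers new to PPO, the core of the algorithm is the clipped surrogate objective. The snippet below is a per-sample sketch of that textbook formula in plain Python. It is not taken from this repository's trainer, and the clip range of 0.2 is a commonly used value assumed here for illustration.

```python
import math


def ppo_clip_objective(new_logp: float, old_logp: float, advantage: float,
                       clip_eps: float = 0.2) -> float:
    """Clipped surrogate objective for one sample (to be maximized)."""
    ratio = math.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


# A ratio of 1.5 with a positive advantage is capped at 1 + clip_eps.
print(ppo_clip_objective(math.log(1.5), 0.0, advantage=1.0))
# With a negative advantage the unclipped (more pessimistic) term is kept.
print(ppo_clip_objective(math.log(1.5), 0.0, advantage=-1.0))
```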

DQN

# Double DQN + Dueling
bash examples/dqn/dqn.sh --train

# Rainbow DQN (C51 + PER + NoisyNet + Dueling + Double + N-step)
bash examples/dqn/rainbow_dqn.sh --train

# DQN + IMPALA ResNet encoder
bash examples/dqn/dqn_resnet.sh --train

# DQN + Curriculum learning
bash examples/dqn/dqn_curriculum.sh --train

# Rainbow DQN + Curriculum learning
bash examples/dqn/rainbow_dqn_curriculum.sh --train
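
The "Double" component named above refers to Double DQN: the online network picks the next action and the target network scores it. A minimal plain-Python sketch of that target follows. It illustrates the general technique under assumed inputs (lists of Q-values per action) and is not the repository's implementation.

```python
def double_dqn_target(reward, done, gamma, online_next_q, target_next_q):
    """One-step Double DQN target for a single transition.

    online_next_q / target_next_q: Q-values of the next state, one entry per action.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # Action selection uses the online network ...
    best_action = max(range(len(online_next_q)), key=online_next_q.__getitem__)
    # ... while evaluation uses the target network.
    return reward + gamma * target_next_q[best_action]


online = [1.0, 3.0, 2.0]   # online network prefers action 1
target = [10.0, 0.5, 7.0]  # target network would prefer action 0
print(double_dqn_target(1.0, False, 0.9, online, target))
```

A plain DQN target would bootstrap from `max(target)` instead, which is the overestimation that decoupling selection from evaluation is meant to reduce.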

SAC

# SAC-Discrete (dual Q-networks + auto temperature tuning)
bash examples/sac/sac.sh --train

# SAC + IMPALA ResNet encoder
bash examples/sac/sac_resnet.sh --train
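
Automatic temperature tuning in discrete SAC is commonly formulated as adjusting `log_alpha` so that the policy entropy tracks a target value. The sketch below assumes that common formulation, with the loss `-log_alpha * (E[log pi] + target_entropy)`, and shows the resulting gradient for one state. The function names are hypothetical and this repository's trainer may differ.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def log_alpha_gradient(probs, target_entropy):
    """Gradient of -log_alpha * (E[log pi] + target_entropy) w.r.t. log_alpha.

    Since E[log pi] = -H(pi), this equals H(pi) - target_entropy.
    """
    return policy_entropy(probs) - target_entropy


target = 0.5 * math.log(4)  # assumed target: half of the maximum entropy for 4 actions

# Entropy above target gives a positive gradient, so gradient descent lowers alpha.
print(log_alpha_gradient([0.25, 0.25, 0.25, 0.25], target) > 0)
# Entropy below target gives a negative gradient, so gradient descent raises alpha.
print(log_alpha_gradient([0.97, 0.01, 0.01, 0.01], target) < 0)
```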

Play with a trained checkpoint

Omit --train to run inference with human rendering:

bash examples/ppo/ppo.sh
bash examples/dqn/rainbow_dqn.sh
bash examples/sac/sac.sh

Upload to Hugging Face

Training scripts can export .safetensors via --safetensors-path. For sharing on the Hub, prefer uploading the .safetensors weights plus metadata; optionally include the .pt checkpoint if you want to preserve optimizer/resume state.

tools/hf.sh \
  --repo-id <user-or-org>/rlcade-rainbow-dqn-smb \
  --agent rainbow_dqn \
  --safetensors checkpoints/rainbow_dqn_smb.safetensors \
  --checkpoint checkpoints/rainbow_dqn_smb.pt \
  --include-pt \
  --actions complex

Script summary

| Script | Agent | Features |
|---|---|---|
| examples/ppo/ppo.sh | PPO | CNN encoder |
| examples/ppo/ppo_lstm.sh | PPO | LSTM encoder |
| examples/ppo/ppo_lstm_icm.sh | PPO | LSTM + ICM curiosity |
| examples/ppo/ppo_lstm_icm_curriculum.sh | PPO | LSTM + ICM + curriculum |
| examples/ppo/ppo_resnet.sh | PPO | IMPALA ResNet encoder |
| examples/dqn/dqn.sh | DQN | Double + Dueling |
| examples/dqn/dqn_resnet.sh | DQN | Double + Dueling + IMPALA ResNet |
| examples/dqn/rainbow_dqn.sh | Rainbow DQN | C51 + PER + NoisyNet + Dueling + Double + N-step |
| examples/dqn/dqn_curriculum.sh | DQN | Double + Dueling + curriculum |
| examples/dqn/rainbow_dqn_curriculum.sh | Rainbow DQN | Full Rainbow + curriculum |
| examples/sac/sac.sh | SAC | Discrete SAC + auto alpha |
| examples/sac/sac_resnet.sh | SAC | Discrete SAC + IMPALA ResNet |

Benchmarks

Benchmark environment step throughput and trainer throughput for PPO, DQN, and Rainbow DQN:

# Run all benchmarks (env + all agents, 8 iterations each)
python -m bench --rom <rom>

# Benchmark a specific agent
python -m bench --rom <rom> --bench ppo
python -m bench --rom <rom> --bench dqn
python -m bench --rom <rom> --bench rainbow_dqn

# Env-only benchmark (single + vectorized step throughput)
python -m bench --rom <rom> --bench env

# Options
python -m bench --rom <rom> --bench ppo --device cuda --num-steps 256 --iterations 16
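
Conceptually, a throughput benchmark with `--num-steps` and `--iterations` times a fixed number of steps several times and reports steps per second. The sketch below shows that measurement pattern with the standard library only. It is an assumption about the general approach, not the `bench` module's code; `measure_throughput` and the dummy `fake_step` are hypothetical.

```python
import time


def measure_throughput(step_fn, num_steps: int, iterations: int) -> float:
    """Return the mean steps-per-second over `iterations` timed runs."""
    rates = []
    for _ in range(iterations):
        start = time.perf_counter()
        for _ in range(num_steps):
            step_fn()
        elapsed = time.perf_counter() - start
        rates.append(num_steps / max(elapsed, 1e-9))  # guard against a zero reading
    return sum(rates) / len(rates)


def fake_step():
    """Stand-in workload; a real benchmark would call env.step() or a trainer update."""
    return sum(range(100))


rate = measure_throughput(fake_step, num_steps=256, iterations=8)
print(f"{rate:.0f} steps/s")
```

When timing GPU work, asynchronous kernel launches usually need a device synchronization before reading the clock, otherwise the numbers are optimistic.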

Profiling

Use VizTracer to profile training runs. Traces are saved to the profile/ directory.

Profiling via benchmarks

# Profile all agents (outputs profile/trace_ppo.json, profile/trace_dqn.json, etc.)
python -m bench --rom <rom> --viztracer trace

# Profile a single agent
python -m bench --rom <rom> --bench ppo --viztracer trace

Profiling via training

# Profile steps 50-60 of a training run
python -m rlcade.training \
    --viztracer profile/training.json \
    --viztracer-start 50 \
    --viztracer-end 60

# With additional VizTracer options
python -m rlcade.training --viztracer profile/training.json \
    --viztracer-start 10 \
    --viztracer-end 20 \
    --viztracer-max-stack-depth 15 \
    --viztracer-ignore-c-function \
    --viztracer-log-func-args
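
The start/end flags used here (and by the Nsight and memory-profiler options) all follow the same idea: keep the profiler off until a given training step, then turn it off again after another. A minimal sketch of such a step window follows. The `StepWindow` class is hypothetical, and treating the end step as inclusive is an assumption, since the exact boundary behavior is not stated. With VizTracer, the callbacks could be a tracer's `start()` and a function that calls `stop()` then `save()`.

```python
class StepWindow:
    """Invoke on_start before `start_step` and on_stop after `end_step` (inclusive)."""

    def __init__(self, start_step, end_step, on_start, on_stop):
        if end_step < start_step:
            raise ValueError("end_step must be >= start_step")
        self.start_step = start_step
        self.end_step = end_step
        self.on_start = on_start
        self.on_stop = on_stop
        self.active = False

    def before_step(self, step):
        if step == self.start_step and not self.active:
            self.active = True
            self.on_start()

    def after_step(self, step):
        if step == self.end_step and self.active:
            self.active = False
            self.on_stop()


events, traced = [], []
window = StepWindow(50, 60, lambda: events.append("start"), lambda: events.append("stop"))
for step in range(100):
    window.before_step(step)
    if window.active:
        traced.append(step)  # a real loop would run the training step here
    window.after_step(step)

print(events, traced[0], traced[-1], len(traced))
```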

Nsight Systems (GPU profiling)

Use Nsight Systems to profile CUDA kernels, memory transfers, and NVTX-annotated training steps.

# Profile steps 5-15 of a training run
nsys profile \
  -t cuda,nvtx,osrt,cudnn,cublas \
  --capture-range=cudaProfilerApi \
  --cuda-memory-usage=true \
  -o nsys_trace --force-overwrite=true \
  python -m rlcade.training \
    --nsys \
    --nsys-start 5 \
    --nsys-end 15 \
    --agent ppo \
    --env rlcade/SuperMarioBros-v0 \
    --rom games/super-mario-bros.nes \
    --device cuda

# View the trace
nsys-ui nsys_trace.nsys-rep

Each training step appears as an NVTX range (step_5, step_6, ...) in the timeline, making it easy to correlate CUDA kernel activity with specific iterations.

CUDA Memory Profiling

Use PyTorch's CUDA Memory Profiler to capture memory allocation history and diagnose leaks or fragmentation.

# Record memory history for steps 5-15
python -m rlcade.training \
    --memory-profiler \
    --memory-profiler-start 5 \
    --memory-profiler-end 15 \
    --agent ppo \
    --env rlcade/SuperMarioBros-v0 \
    --rom games/super-mario-bros.nes \
    --device cuda

# Custom output path and max history entries
python -m rlcade.training \
    --memory-profiler \
    --memory-profiler-start 10 \
    --memory-profiler-end 20 \
    --memory-profiler-output profile/memory.pkl \
    --memory-profiler-max-entries 200000

Upload the snapshot file to pytorch.org/memory_viz for an interactive visualization of memory allocations, including stack traces and allocation timelines.

Viewing traces

vizviewer profile/trace_ppo.json

This opens an interactive timeline in the browser showing call stacks, durations, and PyTorch operations. Rust code (the NES emulator, exposed via PyO3) appears as opaque C-extension blocks: you can see how long env.step() takes, but not the internal Rust call stack. For Rust-level profiling, use cargo flamegraph or samply.

Tests

# ROM file is not included in the repo due to licensing
# Tests are skipped when --rom is not provided
make test

# Run with a ROM file
python -m pytest tests/ -v --rom "/path/to/super-mario-bros.nes"

Controls

Player 1

| NES Button | Keyboard |
|---|---|
| D-pad | Arrow keys |
| A | J |
| B | K |
| Start | M |
| Select | N |

Player 2

| NES Button | Keyboard |
|---|---|
| D-pad | WASD |
| A | G |
| B | F |
| Start | Y |
| Select | T |

| Action | Key |
|---|---|
| Quit | Escape |
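
The key bindings above can be thought of as a lookup from keys to NES buttons, which an emulator then packs into one byte per controller. The sketch below illustrates that idea using the standard controller read order (A, B, Select, Start, Up, Down, Left, Right) as bit positions 0 to 7. It is not the emulator's input code: the key-name strings are hypothetical, and mapping W/A/S/D to Up/Left/Down/Right is the conventional reading of "WASD" assumed here.

```python
# Bit position of each button, following the order in which the controller reports them.
BUTTON_BITS = {"A": 0, "B": 1, "Select": 2, "Start": 3,
               "Up": 4, "Down": 5, "Left": 6, "Right": 7}

# Key names are illustrative strings, not SDL keycodes.
PLAYER1_KEYS = {"J": "A", "K": "B", "N": "Select", "M": "Start",
                "Up": "Up", "Down": "Down", "Left": "Left", "Right": "Right"}
PLAYER2_KEYS = {"G": "A", "F": "B", "T": "Select", "Y": "Start",
                "W": "Up", "S": "Down", "A": "Left", "D": "Right"}


def controller_byte(held_keys, keymap) -> int:
    """Pack the currently held keys into an 8-bit controller state."""
    state = 0
    for key in held_keys:
        button = keymap.get(key)
        if button is not None:  # keys outside the map (e.g. Escape) are ignored
            state |= 1 << BUTTON_BITS[button]
    return state


# Player 1 holding A (J) and Right sets bits 0 and 7.
print(bin(controller_byte({"J", "Right"}, PLAYER1_KEYS)))
```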
