A reinforcement learning lab for robotic simulation on AMD GPUs, currently focused on the robosuite Panda + Lift baseline.
This project was developed, trained, and validated end-to-end on an AMD Ryzen AI Max+ 395 laptop, making full use of its integrated Radeon 8060S GPU for both OpenGL simulation rendering and ROCm / PyTorch AI compute. It has proven to be a very capable portable development platform for robotic RL work.
For the Chinese version of this README, see README_zh.md.
```
ROCm_Robotics_RL_Lab/
├── docs/                        # Blog posts and publishing assets
├── environments/
│   ├── gym_wrapper.py           # robosuite -> Gymnasium adapters
│   └── pick_cube_place_cup.py   # Custom pick-and-place environment prototype
├── scripts/
│   ├── quickstart.py            # Quickstart example
│   ├── train_sac.py             # SAC training script for Panda Lift
│   ├── train_ppo.py             # PPO training script for Panda Lift
│   ├── evaluate.py              # Evaluation and video-recording script
│   ├── seed_sweep.py            # Multi-seed runner
│   └── param_sweep.py           # Parameter sweep runner
├── model_loading.py             # SB3 model loading helpers
├── requirements.txt
├── README.md
└── README_zh.md
```
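The `gym_wrapper.py` adapter bridges robosuite's dict-of-arrays observations to the flat vectors Stable-Baselines3 expects. As a rough illustration only (not the actual implementation in `environments/gym_wrapper.py`), the core flattening step can look like this; the observation keys shown are robosuite's standard `robot0_proprio-state` and `object-state`:

```python
# Illustrative sketch only -- not the code in environments/gym_wrapper.py.
# robosuite returns observations as a dict of per-modality arrays; SB3
# expects a single flat vector, so an adapter concatenates chosen keys
# in a fixed order.
def flatten_obs(obs: dict, keys: list[str]) -> list[float]:
    flat: list[float] = []
    for key in keys:
        flat.extend(float(x) for x in obs[key])
    return flat

# Example with robosuite-style observation keys:
obs = {"robot0_proprio-state": [0.1, 0.2], "object-state": [0.3]}
vector = flatten_obs(obs, ["robot0_proprio-state", "object-state"])
```

The real adapter additionally has to expose a matching Gymnasium `Box` observation space so SB3 can size its networks; the sketch above only shows the data path.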
```bash
# Create virtual environment
uv venv .venv --python 3.12
source .venv/bin/activate

# Install the ROCm build of PyTorch (AMD GPU)
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm7.1
```

Verify that the GPU is visible:

```bash
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
```

Install dependencies:

```bash
uv pip install -r requirements.txt
uv pip install "stable-baselines3[extra]"
```

```bash
# One-time robosuite initialization
python .venv/lib/python3.12/site-packages/robosuite/scripts/setup_macros.py

cd ROCm_Robotics_RL_Lab
python scripts/quickstart.py
```

This trains a small SAC policy on robosuite's Lift task with the Panda robot.
```bash
# SAC training
python scripts/train_sac.py --total-timesteps 500000 --n-envs 4

# PPO training
python scripts/train_ppo.py --total-timesteps 1000000 --n-envs 8

# Watch actions live during training (single environment)
python scripts/train_sac.py --total-timesteps 500000 --n-envs 1 --render

# Enable the Stable-Baselines3 progress bar explicitly if desired
python scripts/train_sac.py --total-timesteps 500000 --n-envs 4 --progress-bar
```

`train_sac.py` now includes more robust defaults for parallel environments:

- `--save-freq` and `--eval-freq` are automatically scaled by `n_envs` into SB3 callback frequencies, avoiding overly sparse checkpointing and evaluation in vectorized training
- Default hyperparameters are biased toward stable task success instead of quickly exploiting dense shaping rewards: `learning_rate=1e-4`, `batch_size=512`, `learning_starts=20000`, `tau=0.002`, and `gradient_steps=1`
- These defaults reduce critic / actor oscillation and make it less likely for the policy to overfit to shaping rewards before it truly solves Lift
- Training episodes terminate immediately on success, while time limits are treated as truncated, reducing value-learning bias caused by confusing timeouts with true terminal states
- Based on the first round of parameter sweeps, the current success-oriented default adds a `+100` terminal reward on successful episodes (`--success-bonus`) and does not apply an extra timeout penalty (`--timeout-penalty 0`); this configuration performed best on average across multiple seeds
- `best_success` now uses `20` evaluation episodes and a `20%` success threshold by default (`--n-eval-episodes 20 --min-best-success-rate 0.20`), balancing reliability and training speed
- `best_success/best_metrics.json` records the checkpoint `timestep`, `success_rate`, and `mean_reward`, and the same information is printed at the end of training
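The terminated-on-success / truncated-on-timeout split follows Gymnasium's `step()` convention of returning separate `terminated` and `truncated` flags. A minimal sketch of that logic (a hypothetical helper, not taken from `train_sac.py`):

```python
# Hypothetical sketch of the terminated/truncated split described above.
def split_done(success: bool, step: int, horizon: int) -> tuple[bool, bool]:
    terminated = success                          # true terminal state: task solved
    truncated = not success and step >= horizon   # time limit, not a real terminal
    return terminated, truncated
```

Keeping the two flags separate matters because a value function should still bootstrap through a time-limit cutoff but not through a genuine terminal state; collapsing both into one `done` flag is what causes the bias mentioned above.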
If you want a stricter threshold, set it explicitly:

```bash
python scripts/train_sac.py --total-timesteps 500000 --n-envs 4 --n-eval-episodes 20 --min-best-success-rate 0.20
```

If the policy still prefers dense shaping reward over actually finishing the task, you can further strengthen the success signal:

```bash
python scripts/train_sac.py --total-timesteps 500000 --n-envs 4 --n-eval-episodes 20 --min-best-success-rate 0.20 --success-bonus 100 --timeout-penalty 0
```

If you want to spell out the currently recommended stable training configuration, use:

```bash
python scripts/train_sac.py --total-timesteps 500000 --n-envs 4 --learning-rate 1e-4 --batch-size 512 --learning-starts 20000 --gradient-steps 1 --tau 0.002 --n-eval-episodes 20 --min-best-success-rate 0.20 --success-bonus 100 --timeout-penalty 0
```

If you suspect large seed sensitivity, you can batch-run multiple seeds directly:

```bash
python scripts/seed_sweep.py --seeds 42 123 456 --total-timesteps 500000 --n-envs 4
```

This script will:
- call the existing `scripts/train_sac.py` for each seed
- evaluate each run's `best_success` checkpoint first, falling back to the final model if needed
- summarize per-seed training and evaluation results in `models/seed_sweeps/<timestamp>/summary.json`
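Conceptually, the per-seed summary boils down to aggregating each run's evaluation metrics. A hypothetical sketch (the field names here are illustrative, not the exact `summary.json` schema):

```python
import statistics

def summarize_seeds(results: dict) -> dict:
    """Aggregate per-seed evaluation results (illustrative schema:
    seed -> {"success_rate": float}) into a summary-style dict."""
    rates = [r["success_rate"] for r in results.values()]
    return {
        "mean_success_rate": statistics.mean(rates),
        "best_seed": max(results, key=lambda s: results[s]["success_rate"]),
    }
```

Averaging across seeds like this is what lets the sweep distinguish a genuinely robust configuration from one that only happened to work for a lucky initialization.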
If you want to inspect the commands without actually launching the runs, use:

```bash
python scripts/seed_sweep.py --seeds 42 123 456 --dry-run
```

```bash
python scripts/evaluate.py --model models/sac_lift_final.zip --algo sac --n-episodes 10
python scripts/evaluate.py --model models/best/best_model.zip --algo sac --no-render
python scripts/evaluate.py --model <path> --algo ppo --record-video --video-dir videos/
```

The evaluation script counts success over the whole episode: if the task is completed at any step, the episode is marked successful instead of checking only the final frame.
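The whole-episode success rule can be expressed in a few lines. A sketch, assuming each step's `info` dict may carry a boolean success flag (the actual key used by the script may differ):

```python
def episode_success(step_infos: list[dict]) -> bool:
    # Success at ANY step marks the whole episode successful,
    # rather than checking only the final frame.
    return any(info.get("success", False) for info in step_infos)
```

This matters for tasks like Lift, where the cube can be lifted mid-episode and then dropped before the time limit: a final-frame check would wrongly count such an episode as a failure.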
The active, validated workflow in this repository is:
- robosuite Lift task
- Panda robot
- Stable-Baselines3 SAC / PPO
- AMD GPU + ROCm + PyTorch training
- OpenGL-based rendering, evaluation, and video capture
The custom `pick_cube_place_cup.py` environment remains in the repository as a prototype for future work, but the current documented and tested baseline is Panda Lift.
SAC defaults (recommended for stable Lift training):

| Parameter | Recommended value | Description |
|---|---|---|
| learning_rate | 1e-4 | Learning rate |
| buffer_size | 1,000,000 | Replay buffer size |
| batch_size | 512 | Batch size |
| learning_starts | 20,000 | Random sampling steps before learning starts |
| gradient_steps | 1 | Conservative update frequency |
| gamma | 0.99 | Discount factor |
| tau | 0.002 | Soft update coefficient |
PPO defaults:

| Parameter | Default | Description |
|---|---|---|
| learning_rate | 3e-4 | Learning rate |
| n_steps | 2048 | Steps per update |
| batch_size | 64 | Batch size |
| n_epochs | 10 | Training epochs |
| clip_range | 0.2 | Clipping range |
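The two tables above map directly onto Stable-Baselines3 constructor arguments. Collected as keyword dicts (values copied from the tables; passing them to `SAC(...)` / `PPO(...)` is the usual SB3 pattern):

```python
# Values from the tables above, in SB3 keyword-argument form.
SAC_KWARGS = dict(
    learning_rate=1e-4,
    buffer_size=1_000_000,
    batch_size=512,
    learning_starts=20_000,
    gradient_steps=1,
    gamma=0.99,
    tau=0.002,
)

PPO_KWARGS = dict(
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    clip_range=0.2,
)

# Usage sketch: model = SAC("MlpPolicy", env, **SAC_KWARGS)
```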
Use TensorBoard to inspect training curves:

```bash
tensorboard --logdir logs/
```

- The repository's validated baseline is Panda + Lift in robosuite.
- For live rendering during training, use `--render` together with `--n-envs 1`.
- For headless evaluation, prefer `--no-render` to avoid GLFW / DISPLAY issues.
- The training scripts disable SB3's rich progress bar by default to avoid cleanup tracebacks from `tqdm`/`rich` in some environments; add `--progress-bar` if you want it enabled.
MIT License