
ROCm_Robotics_RL_Lab

A reinforcement learning lab for robotic simulation on AMD GPUs, currently focused on the robosuite Panda + Lift baseline.

This project was developed, trained, and validated end-to-end on an AMD Ryzen AI Max+ 395 laptop, making full use of its integrated Radeon 8060S GPU for both OpenGL simulation rendering and ROCm / PyTorch AI compute. It has proven to be a very capable portable development platform for robotic RL work.

For the Chinese version of this README, see README_zh.md.

📁 Project Structure

ROCm_Robotics_RL_Lab/
├── docs/                        # Blog posts and publishing assets
├── environments/
│   ├── gym_wrapper.py           # robosuite -> Gymnasium adapters
│   └── pick_cube_place_cup.py   # Custom pick-and-place environment prototype
├── scripts/
│   ├── quickstart.py            # Quickstart example
│   ├── train_sac.py             # SAC training script for Panda Lift
│   ├── train_ppo.py             # PPO training script for Panda Lift
│   ├── evaluate.py              # Evaluation and video-recording script
│   ├── seed_sweep.py            # Multi-seed runner
│   └── param_sweep.py           # Parameter sweep runner
├── model_loading.py             # SB3 model loading helpers
├── requirements.txt
├── README.md
└── README_zh.md

🚀 Quick Start

1. Environment Setup (AMD GPU / ROCm)

# Create virtual environment
uv venv .venv --python 3.12
source .venv/bin/activate

# Install the ROCm build of PyTorch (AMD GPU)
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm7.1

Verify that the GPU is visible:

python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no GPU visible')"
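If you prefer a check that degrades gracefully when torch is missing or no device is visible, a small sketch (not part of the repository's scripts; `gpu_status` is an illustrative name) is:

```python
import importlib.util


def gpu_status() -> str:
    """Describe the visible GPU, or explain why none is visible."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    # ROCm builds of PyTorch report their HIP devices through torch.cuda.
    if not torch.cuda.is_available():
        return "no ROCm/HIP device visible"
    return torch.cuda.get_device_name(0)


print(gpu_status())
```

On the Radeon 8060S setup described above, this should print the device name; elsewhere it prints a short diagnostic instead of raising.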

Install dependencies:

uv pip install -r requirements.txt
uv pip install stable-baselines3[extra]

# One-time robosuite initialization
python .venv/lib/python3.12/site-packages/robosuite/scripts/setup_macros.py

2. Run the quick example

cd ROCm_Robotics_RL_Lab
python scripts/quickstart.py

This trains a small SAC policy on robosuite's Lift task with the Panda robot.

3. Full training

# SAC training
python scripts/train_sac.py --total-timesteps 500000 --n-envs 4

# PPO training
python scripts/train_ppo.py --total-timesteps 1000000 --n-envs 8

# Watch actions live during training (single environment)
python scripts/train_sac.py --total-timesteps 500000 --n-envs 1 --render

# Enable the Stable-Baselines3 progress bar explicitly if desired
python scripts/train_sac.py --total-timesteps 500000 --n-envs 4 --progress-bar

train_sac.py now includes more robust defaults for parallel environments:

  • --save-freq and --eval-freq are automatically scaled by n_envs into SB3 callback frequencies, avoiding overly sparse checkpointing and evaluation in vectorized training
  • Default hyperparameters are biased toward stable task success instead of quickly exploiting dense shaping rewards: learning_rate=1e-4, batch_size=512, learning_starts=20000, tau=0.002, and gradient_steps=1
  • These defaults reduce critic / actor oscillation and make it less likely for the policy to overfit to shaping rewards before it truly solves Lift
  • Training episodes terminate immediately on success, while time limits are treated as truncated, reducing value-learning bias caused by confusing timeouts with true terminal states
  • Based on the first round of parameter sweeps, the current success-oriented defaults add a +100 terminal reward on successful episodes (--success-bonus 100) and apply no extra timeout penalty (--timeout-penalty 0); this configuration performed best on average across multiple seeds
  • best_success now uses 20 evaluation episodes and a 20% success threshold by default (--n-eval-episodes 20 --min-best-success-rate 0.20), balancing reliability and training speed
  • best_success/best_metrics.json records the checkpoint timestep, success_rate, and mean_reward, and the same information is printed at the end of training
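The frequency scaling and success-oriented shaping described above can be sketched in plain Python (the names scaled_freq and shape_reward are illustrative, not the actual helpers in train_sac.py):

```python
def scaled_freq(freq_in_steps: int, n_envs: int) -> int:
    """Convert a frequency given in total environment steps into the
    per-callback-call frequency SB3 expects: with a vectorized env,
    each callback call corresponds to n_envs environment steps."""
    return max(freq_in_steps // n_envs, 1)


def shape_reward(base_reward: float, success: bool, timed_out: bool,
                 success_bonus: float = 100.0, timeout_penalty: float = 0.0):
    """Apply the success-oriented shaping described above.

    Returns (reward, terminated, truncated): success terminates the
    episode immediately, while hitting the time limit without success
    only truncates it, so timeouts are not learned as terminal states.
    """
    reward = base_reward
    if success:
        reward += success_bonus
    elif timed_out:
        reward -= timeout_penalty
    return reward, success, (timed_out and not success)
```

For example, with --save-freq 10000 and --n-envs 4, the checkpoint callback would fire every scaled_freq(10000, 4) == 2500 calls.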

If you want a stricter threshold, set it explicitly:

python scripts/train_sac.py --total-timesteps 500000 --n-envs 4 --n-eval-episodes 20 --min-best-success-rate 0.20

If the policy still prefers dense shaping reward over actually finishing the task, you can further strengthen the success signal:

python scripts/train_sac.py --total-timesteps 500000 --n-envs 4 --n-eval-episodes 20 --min-best-success-rate 0.20 --success-bonus 100 --timeout-penalty 0

If you want to spell out the currently recommended stable training configuration, use:

python scripts/train_sac.py --total-timesteps 500000 --n-envs 4 --learning-rate 1e-4 --batch-size 512 --learning-starts 20000 --gradient-steps 1 --tau 0.002 --n-eval-episodes 20 --min-best-success-rate 0.20 --success-bonus 100 --timeout-penalty 0

If you suspect large seed sensitivity, you can batch-run multiple seeds directly:

python scripts/seed_sweep.py --seeds 42 123 456 --total-timesteps 500000 --n-envs 4

This script will:

  • call the existing scripts/train_sac.py for each seed
  • evaluate each run's best_success checkpoint first, falling back to the final model if needed
  • summarize per-seed training and evaluation results in models/seed_sweeps/<timestamp>/summary.json
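The command-construction part of the sweep can be sketched as follows (assuming train_sac.py exposes a --seed flag; the real script also handles evaluation and the summary JSON):

```python
import sys


def build_seed_commands(seeds, total_timesteps=500_000, n_envs=4):
    """Build one train_sac.py invocation per seed, roughly mirroring
    what seed_sweep.py --dry-run would print."""
    commands = []
    for seed in seeds:
        commands.append([
            sys.executable, "scripts/train_sac.py",
            "--seed", str(seed),
            "--total-timesteps", str(total_timesteps),
            "--n-envs", str(n_envs),
        ])
    return commands


# With --dry-run the commands are only printed; otherwise each
# would be launched sequentially, e.g. via subprocess.run(cmd, check=True).
for cmd in build_seed_commands([42, 123, 456]):
    print(" ".join(cmd))
```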

If you want to inspect the commands without actually launching the runs, use:

python scripts/seed_sweep.py --seeds 42 123 456 --dry-run

4. Evaluate a model

python scripts/evaluate.py --model models/sac_lift_final.zip --algo sac --n-episodes 10
python scripts/evaluate.py --model models/best/best_model.zip --algo sac --no-render
python scripts/evaluate.py --model <path> --algo ppo --record-video --video-dir videos/

The evaluation script counts success over the whole episode: if the task is completed at any step, the episode is marked successful instead of checking only the final frame.
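In plain Python, any-step success counting amounts to the following (a sketch; the actual script reads per-step success flags from robosuite's step info):

```python
def episode_succeeded(step_successes) -> bool:
    """True if the task was completed at any step of the episode,
    not only in the final frame."""
    return any(step_successes)


def success_rate(episodes) -> float:
    """Fraction of episodes with at least one successful step."""
    if not episodes:
        return 0.0
    return sum(episode_succeeded(ep) for ep in episodes) / len(episodes)
```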

🎯 Current Focus

The active, validated workflow in this repository is:

  • robosuite Lift task
  • Panda robot
  • Stable-Baselines3 SAC / PPO
  • AMD GPU + ROCm + PyTorch training
  • OpenGL-based rendering, evaluation, and video capture

The custom pick_cube_place_cup.py environment remains in the repository as a prototype for future work, but the current documented and tested baseline is Panda Lift.

📊 Recommended Training Parameters

SAC parameters

Parameter        Recommended value  Description
learning_rate    1e-4               Learning rate
buffer_size      1,000,000          Replay buffer size
batch_size       512                Batch size
learning_starts  20,000             Random sampling steps before learning starts
gradient_steps   1                  Conservative update frequency
gamma            0.99               Discount factor
tau              0.002              Soft update coefficient

PPO parameters

Parameter      Default  Description
learning_rate  3e-4     Learning rate
n_steps        2048     Steps per update
batch_size     64       Batch size
n_epochs       10       Training epochs
clip_range     0.2      Clipping range
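Collected as keyword arguments, the two tables above map onto Stable-Baselines3's SAC and PPO constructors roughly as follows (a configuration sketch: pass them as SAC("MlpPolicy", env, **sac_kwargs) or PPO("MlpPolicy", env, **ppo_kwargs)):

```python
# Hyperparameters from the tables above, shaped as SB3 constructor kwargs.
sac_kwargs = dict(
    learning_rate=1e-4,      # conservative actor/critic step size
    buffer_size=1_000_000,   # replay buffer capacity
    batch_size=512,
    learning_starts=20_000,  # random-action steps before updates begin
    gradient_steps=1,        # one gradient update per environment step
    gamma=0.99,              # discount factor
    tau=0.002,               # soft target-network update coefficient
)

ppo_kwargs = dict(
    learning_rate=3e-4,
    n_steps=2048,            # rollout length per update
    batch_size=64,
    n_epochs=10,             # optimization passes over each rollout
    clip_range=0.2,          # PPO clipping range
)
```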

📈 Monitor Training

Use TensorBoard to inspect training curves:

tensorboard --logdir logs/

🔗 Related Resources

📝 Notes

  1. The repository's validated baseline is Panda + Lift in robosuite.
  2. For live rendering during training, use --render together with --n-envs 1.
  3. For headless evaluation, prefer --no-render to avoid GLFW / DISPLAY issues.
  4. The training scripts disable SB3's rich progress bar by default to avoid cleanup tracebacks from tqdm / rich in some environments; add --progress-bar if you want it enabled.

📜 License

MIT License
