Task: "Pick the orange cell and place it into the empty slot of the green module." Stack: SO-101 bimanual teleop → LeRobot v0.4.1 → ACT policy → AMD MI300X → HF Hub. Tracks: Best Use of AMD • Ford Industrial Robotics Grippers.
We built an end-to-end imitation-learning pipeline for a battery-cell pick-and-place task on an SO-101 leader-follower arm pair, trained an ACT policy on an AMD MI300X, and target autonomous playback on the same SO-101 follower that recorded the demos. Everything runs on AMD silicon: training on MI300X via ROCm, inference on a Radeon 890M iGPU, also via ROCm.
Artifacts:
- Dataset → wbell7/starkhacks_cell_pickplace (58 episodes, 29,166 frames, 2 cameras)
- Policy → wbell7/starkhacks_act_cell (ACT, 52M params; pushed on train completion)
- Run → wandb/wrbell7/starkhacks/nbyv34do
┌───────────────────┐ ┌───────────────────┐
│ SO-101 leader │ ── human teleoperates ──▶ │ SO-101 follower │
│ (my_leader_arm) │ │ (my_follower_arm) │
└───────────────────┘ └───────────────────┘
│
captures │ 2 cams
30 fps │ 640×480
▼
┌───────────────────────────────┐
│ top (UGREEN, overhead) │
│ wrist (ARC, follower wrist) │
└───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ record_stdin.py wraps lerobot-record │
│ • pynput → stdin (Wayland fix) │
│ • 3-Enter per episode (start-gate keeps hands │
│ out of frame; crash-durable parquet rotation) │
└─────────────────────────────────────────────────────┘
│
local parquet │ + mp4 videos
▼
┌─────────────────────────────────────────┐
│ ~/.cache/huggingface/lerobot/ │
│ local/starkhacks_cell_pickplace/ │
└─────────────────────────────────────────┘
│
push_to_hub │ hf_transfer +
│ upload_large_folder
▼
┌───────────────────────────────────────────────────┐
│ 🤗 wbell7/starkhacks_cell_pickplace (public) │
└───────────────────────────────────────────────────┘
│
snapshot_download (prefetch)
▼
┌──────────────────────────────────────────┐
│ MI300X VM — /root/.cache/huggingface/ │
│ lerobot/wbell7/starkhacks_cell_... │
└──────────────────────────────────────────┘
│
lerobot-train (ACT) │
50k steps · bf16 · │
MIOpen+TunableOp │
▼
┌──────────────────────────────────────────┐
│ ACT policy, 52M params → ckpt │
└──────────────────────────────────────────┘
│
push_to_hub │
▼
┌───────────────────────────────────────────────┐
│ 🤗 wbell7/starkhacks_act_cell (public) │
└───────────────────────────────────────────────┘
│
lerobot-record --policy.path
▼
┌───────────────────────────────────────────────┐
│ SO-101 follower drives itself — H34 ship │
└───────────────────────────────────────────────┘
Why SO-101? Affordable open-source teleop arm (Hugging Face + The Robot Studio). Usable out-of-the-box via LeRobot, with a leader-follower topology that produces clean demonstration data.
Why ACT? Action Chunking with Transformers (Zhao et al., 2023) is the workhorse for small-data bimanual/single-arm imitation learning in LeRobot. It handles temporal multimodality by predicting chunks of actions rather than single steps, and it trains well on hundreds of demos.
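To make the idea of action chunks concrete, here is a minimal sketch of chunked execution: the policy is queried once, returns several future actions, and the controller drains that queue before querying again. `toy_policy` and `run_chunked` are hypothetical names for illustration only; this is not LeRobot's implementation, and it leaves out the temporal ensembling described in the ACT paper.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a trained policy: maps one observation to a chunk
# of `chunk_size` future actions. A real ACT policy would return joint targets.
Policy = Callable[[float, int], List[float]]


def run_chunked(policy: Policy, observations: Sequence[float],
                chunk_size: int) -> Tuple[List[float], int]:
    """Execute actions from a queue, querying the policy only when it is empty."""
    queue: deque = deque()
    executed: List[float] = []
    policy_calls = 0
    for obs in observations:
        if not queue:
            # Queue is drained: ask the policy for the next chunk of actions.
            queue.extend(policy(obs, chunk_size))
            policy_calls += 1
        executed.append(queue.popleft())
    return executed, policy_calls


def toy_policy(obs: float, chunk_size: int) -> List[float]:
    # Toy behaviour: predict a ramp that starts at the current observation.
    return [obs + i for i in range(chunk_size)]


if __name__ == "__main__":
    actions, calls = run_chunked(toy_policy, list(range(10)), chunk_size=4)
    print(f"{len(actions)} actions executed with {calls} policy calls")
```

With a chunk size of 4, ten control steps need only three policy calls, which is the property that lets a chunked policy commit to one coherent motion instead of re-deciding at every frame.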
Why MI300X? It is the training target for the hackathon's AMD track. 192 GB of HBM3 on one GPU removes any memory pressure for ACT. With the ROCm bf16 + MIOpen + TunableOp recipe, we measured 0.209 s/step at batch 32, a 3.4× speedup over the fp32 baseline on the same hardware. A 50k-step run takes ~3 hours.
Why battery-cell pick-and-place? Maps directly to the brownfield handling that the Ford Industrial Robotics Grippers track cares about. Orange cell + green module is a stand-in for a factory line task (cylindrical cell insertion into a battery pack).
| Layer | Component | Notes |
|---|---|---|
| Leader arm | SO-101 (id my_leader_arm) | /dev/so101_leader → ttyACM1 |
| Follower arm | SO-101 (id my_follower_arm) | /dev/so101_follower → ttyACM0 |
| Top camera | UGREEN (USB UVC 0c45:2283) | /dev/cam_ugreen, overhead |
| Wrist camera | ARC (USB UVC 05a3:9230) | /dev/cam_arc, on follower wrist |
| Local host | Ryzen AI + Radeon 890M (gfx1150) | ROCm 6.x, 23 GiB RAM |
| Cloud training | MI300X VF, 192 GB HBM3 | DO droplet (IP stored locally), ROCm 7.2 |
| Framework | LeRobot v0.4.1 + PyTorch 2.7.1+rocm6.3 | Python 3.10 local, 3.12 on VM |
| Tracking | wandb project starkhacks | entity wrbell7 |
| Artifacts | Hugging Face Hub (public) | wbell7/* |
~/starkhacks/
├─ README.md # this file
├─ CLAUDE.md # conventions + gotchas for future Claude sessions
├─ ROADMAP.md # live phase tracker, ship-point checklist, done-log
├─ scripts/ # numbered runbook, 00_ → 08_
│ ├─ 00_anti_chaos.sh # udev rules, remove brltty
│ ├─ 01_find_port.sh # identify which arm is on which ttyACM*
│ ├─ 02_teleop.sh # H12 ship — leader→follower mirror
│ ├─ 02a_teleop_raw.sh # low-level teleop debug
│ ├─ 02b_teleop_cams.sh # teleop + camera preview
│ ├─ 03_pipeline_sanity.sh # ACT smoke on public SO-101 dataset
│ ├─ 04_record.sh # 50 episodes, cameras (top, wrist)
│ ├─ 04b_validate_dataset.sh # post-record integrity + summary
│ ├─ 04c_view_episode.sh # visual playback of one episode
│ ├─ 05_replay.sh # H24 ship — open-loop replay
│ ├─ 06_train_smoke.sh # 2k-step sanity train on our data
│ ├─ 07_train_full.sh # full train (local iGPU fallback)
│ ├─ 08_eval.sh # H34 ship — policy drives follower
│ ├─ record_stdin.py # lerobot-record wrapper: stdin + start-gate + visual banners + durability
│ └─ README.md # the runbook's own quickstart
├─ cloud/ # MI300X Developer Cloud recipes
│ ├─ 00_mi300x_bootstrap.sh # env (torch rocm, ffmpeg 7, lerobot)
│ ├─ 01_train_act_mi300x.sh # full ACT recipe with the bf16+tune flags
│ └─ 01a_smoke_public.sh # 200-step smoke on public aloha dataset
├─ amd_hackathon/ # AMD-track-specific materials, reference notebooks
├─ logs/ # run logs + watchers (HF upload, train watcher, etc.)
└─ outputs/ (on MI300X VM) # training checkpoints, tensorboard dumps
The repo contains our code + glue + vendored AMD hackathon reference materials. Two external pieces must still be obtained: lerobot (pinned commit) and the ROCm + PyTorch stack. Scripts hardcode ~/starkhacks and ~/lerobot, so clone to those exact paths.
- Ubuntu 24.04 (or similar) with Wayland session
- Python 3.10 via conda/miniforge
- 2× SO-101 arms (leader + follower), 2× USB cameras
- HuggingFace + wandb accounts; AMD Developer Cloud for MI300X training
git clone https://github.com/Garrett-R16/SH26_MindFlayer.git ~/starkhacks
git clone https://github.com/huggingface/lerobot.git ~/lerobot
cd ~/lerobot
git checkout -b v0.4.1 a5b29d43

conda create -n lerobot python=3.10 -y
conda activate lerobot
cd ~/lerobot
pip install -e '.[feetech]'
cd ~/starkhacks
pip install -r requirements.txt

For the ROCm-specific PyTorch build on a Radeon iGPU, install torch==2.7.1+rocm6.3 via the ROCm index first; otherwise pip will pull the CUDA wheel from PyPI (see Gotchas).

FFmpeg 7+ is required by lerobot for video encoding:

sudo add-apt-repository ppa:ubuntuhandbook1/ffmpeg7 -y
sudo apt update && sudo apt install -y ffmpeg

huggingface-cli login # write-scope token
wandb login

cd ~/starkhacks
# Edit scripts/00_anti_chaos.sh with your arms' serial numbers first, then:
sudo bash scripts/00_anti_chaos.sh

This creates /dev/so101_follower, /dev/so101_leader, /dev/cam_ugreen, /dev/cam_arc.
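Before teleoperating, it can help to confirm that the four udev symlinks exist and see which kernel device each one points at. The following is a small sketch, not part of the repo; `resolve_devices` is a hypothetical helper, and it assumes only that the symlink names are the ones listed above.

```python
import os
from typing import Dict, Iterable

# Symlink names taken from the udev step above.
EXPECTED_DEVICES = (
    "/dev/so101_follower",
    "/dev/so101_leader",
    "/dev/cam_ugreen",
    "/dev/cam_arc",
)


def resolve_devices(paths: Iterable[str] = EXPECTED_DEVICES) -> Dict[str, str]:
    """Map each path to its resolved target, or "" when the path is missing."""
    resolved: Dict[str, str] = {}
    for path in paths:
        # os.path.exists follows symlinks, so a dangling link counts as missing.
        resolved[path] = os.path.realpath(path) if os.path.exists(path) else ""
    return resolved


if __name__ == "__main__":
    for name, target in resolve_devices().items():
        print(f"{name:22s} -> {target or 'MISSING'}")
```

A missing entry usually means the udev rules were not reloaded or the serial number in the rule does not match the attached device.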
# Identify ports, write udev, remove brltty
./scripts/01_find_port.sh
# edit 00_anti_chaos.sh with the two serials, then:
sudo ./scripts/00_anti_chaos.sh
# Verify teleop (ship H12)
./scripts/02_teleop.sh

# 50 episodes, two cameras. Three-Enter cycle per episode (start, stop, end-reset).
./scripts/04_record.sh
# If the process crashed partway: resume from where you left off
RESUME=1 NUM_EPISODES=<how-many-more> ./scripts/04_record.sh
# Verify integrity after recording
./scripts/04b_validate_dataset.sh

./scripts/05_replay.sh

python -c "
from lerobot.datasets.lerobot_dataset import LeRobotDataset
d = LeRobotDataset('local/starkhacks_cell_pickplace',
                   root='$HOME/.cache/huggingface/lerobot/local/starkhacks_cell_pickplace')
d.repo_id = 'wbell7/starkhacks_cell_pickplace'
d.push_to_hub(tags=['lerobot','so101','starkhacks-2026'], private=False,
              upload_large_folder=True)
"

# On the cloud VM, first time:
bash cloud/00_mi300x_bootstrap.sh
# Sanity-check on a public dataset (a few minutes, covers MIOpen autotune):
bash cloud/01a_smoke_public.sh
# Full run on our data (~3 hours):
bash cloud/01_train_act_mi300x.sh

# Back on the local box:
./scripts/08_eval.sh

From ROADMAP.md's benchmark log (same policy, same dataset, same batch, one variable at a time):
| Config | s/step | Speedup | Notes |
|---|---|---|---|
| fp32 baseline, batch 32 | 0.715 | 1.0× | cold MIOpen |
| bf16 only | 0.583 | 1.23× | via ACCELERATE_MIXED_PRECISION=bf16 |
| bf16 + MIOpen + TunableOp, cold | 1.862 | 0.38× | autotune tax |
| bf16 + MIOpen + TunableOp, warm | 0.209 | 3.42× | caches persist at /root/.miopen |
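The Speedup column and the ~3 hour estimate follow directly from the s/step measurements. As a quick check, this sketch recomputes both; the numbers are copied from the table, while the function names are illustrative.

```python
from typing import Dict

# s/step values copied from the benchmark table above.
SECONDS_PER_STEP: Dict[str, float] = {
    "fp32 baseline": 0.715,
    "bf16 only": 0.583,
    "bf16 + MIOpen + TunableOp, cold": 1.862,
    "bf16 + MIOpen + TunableOp, warm": 0.209,
}


def speedup(config: str, baseline: str = "fp32 baseline") -> float:
    """Speedup of `config` relative to `baseline` (greater than 1 is faster)."""
    return SECONDS_PER_STEP[baseline] / SECONDS_PER_STEP[config]


def run_hours(config: str, steps: int) -> float:
    """Wall-clock hours for `steps` training steps at the measured rate."""
    return SECONDS_PER_STEP[config] * steps / 3600.0


if __name__ == "__main__":
    for name in SECONDS_PER_STEP:
        print(f"{name:34s} {speedup(name):5.2f}x  "
              f"{run_hours(name, 50_000):5.2f} h for 50k steps")
```

At the warm rate, 50,000 steps × 0.209 s is about 2.9 hours, versus roughly 9.9 hours at the fp32 baseline rate. The estimate covers training steps only, not checkpointing or data loading stalls.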
The full env recipe lives in cloud/01_train_act_mi300x.sh:
export ACCELERATE_MIXED_PRECISION=bf16
export MIOPEN_FIND_MODE=3
export MIOPEN_FIND_ENFORCE=3
export MIOPEN_USER_DB_PATH=/root/.miopen
export MIOPEN_CUSTOM_CACHE_DIR=/root/.miopen
export PYTORCH_TUNABLEOP_ENABLED=1
export PYTORCH_TUNABLEOP_TUNING=1
export TORCH_BLAS_PREFER_HIPBLAS_LT=1
export HSA_NO_SCRATCH_RECLAIM=1
export GPU_MAX_HW_QUEUES=2

First run at a given batch size pays ~5 min autotune; reruns are warm.
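Whether the autotune cost is worth paying depends on run length. A rough break-even estimate, assuming the ~5 min autotune is a one-off 300 s cost and that the warm and untuned s/step figures from the benchmark log hold for the whole run (both are simplifications):

```python
import math


def break_even_steps(tune_cost_s: float, slow_s_per_step: float,
                     fast_s_per_step: float) -> int:
    """Steps after which the tuned config has repaid its one-off tuning cost."""
    saved_per_step = slow_s_per_step - fast_s_per_step
    if saved_per_step <= 0:
        raise ValueError("tuned config must be faster than the untuned one")
    return math.ceil(tune_cost_s / saved_per_step)


if __name__ == "__main__":
    # 300 s is an assumed reading of "~5 min"; s/step values are from the log.
    print("vs bf16 only:", break_even_steps(300, 0.583, 0.209), "steps")
    print("vs fp32:     ", break_even_steps(300, 0.715, 0.209), "steps")
```

Under these assumptions the tuning pays for itself within the first thousand steps, so it is worthwhile for the 50k-step run but marginal for very short smoke tests.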
- pynput can't capture keys under Wayland. Workaround: scripts/record_stdin.py wraps lerobot-record and replaces the pynput listener with a stdin reader (<Enter>/n/q).
- Hands in frame on the first recorded frames. The reset-phase Enter was also the start-next-episode Enter. record_stdin.py now gates every new episode on a second explicit Enter, with a big terminal banner (no speaker on this box, so visual feedback only).
- Out-of-memory crashes corrupted parquet footers mid-record twice, losing 15 and then 6 episodes. Two defenses added:
  - --display_data=false (rerun in-memory buffer was the leak) + --dataset.num_image_writer_processes=1 (PNG writer into a subprocess).
  - meta/info.json: data_files_size_in_mb=1 and monkeypatched metadata_buffer_size=1, so the parquet writer rotates, and thus finalises its footer, after ≈ every episode. A future crash costs 1 episode, not 10.
- upload_folder wedges on wifi swap. Old sockets stuck in CLOSE-WAIT for ~15 min. Kill + restart (files already committed to the hub are skipped on retry). We now use upload_large_folder=True + HF_HUB_ENABLE_HF_TRANSFER=1 for parallel chunked uploads.
- wandb.login(key=…) rejects the wandb_v1_… token format (40-char hex check is legacy). WANDB_API_KEY as an env var bypasses that check and is what wandb.init() actually reads.
- lerobot's deps pulled torch from PyPI over our ROCm wheel. Install torch first, then install lerobot with pip install -c torch_constraint.txt to pin the ROCm build.
- Ubuntu 24.04 .bashrc returns early for non-interactive shells, so exports there never fire for SSH-invoked commands. Creds go in /etc/environment + /etc/profile.d/starkhacks_creds.sh.
- --policy.push_to_hub=false is required locally, because HF push is not configured on the local box. Omitting it makes training abort at the final checkpoint.
- Camera key order is load-bearing. top, wrist exactly (UGREEN overhead, ARC on wrist). Must match between record and inference or SmolVLA-class policies fail silently; ACT likely the same.
- wrist_roll calibration clipping. If teleop feels clamped, re-sweep both arms through their full wrist-roll range during calibration.
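The three-Enter cycle from the first two gotchas can be pictured as a small state machine over lines read from stdin. This is an illustrative sketch only, not the code in record_stdin.py: `episode_events` and the event names are hypothetical, treating `q` as "quit" is an assumption, and `n` is left unhandled because its meaning is not described here.

```python
import io
from typing import Iterator, TextIO, Tuple

# One bare Enter advances to the next phase: start, stop, then reset finished.
PHASES = ("start", "stop", "reset_done")


def episode_events(stream: TextIO) -> Iterator[Tuple[int, str]]:
    """Yield (episode_index, event) for each bare Enter read from `stream`."""
    episode, phase = 0, 0
    for line in stream:
        key = line.strip().lower()
        if key == "q":   # assumption: q ends the session
            return
        if key != "":    # any other input is ignored in this sketch
            continue
        yield episode, PHASES[phase]
        phase = (phase + 1) % len(PHASES)
        if phase == 0:
            episode += 1  # a full start/stop/reset cycle has completed


if __name__ == "__main__":
    demo = io.StringIO("\n\n\n\nq\n")
    for index, event in episode_events(demo):
        print(f"episode {index}: {event}")
```

The point of the separate `reset_done` and `start` presses is that finishing a reset never begins the next recording by itself; the operator has to press Enter again once their hands are out of frame. In a real session the stream would be `sys.stdin`.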
- scripts/record_stdin.py: lerobot-record wrapper (stdin + banners + start gate + per-episode parquet rotation).
- /tmp/upload_watcher.sh: polls local upload, fires VM prefetch when done.
- /root/prefetch_and_train.sh (on VM): waits for smoke to finish cleanly, prefetches the dataset from the Hub, then kicks off the full 50k-step train.
- /tmp/train_watcher.sh: polls the VM training every 90 s, raises a notify-send popup on crash or completion, logs step transitions.
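The train watcher's behaviour (poll, record step transitions, notify once on crash or completion) can be sketched as a loop with injected callables. The real watcher is a shell script; everything named here (`watch`, the `Status` tuple, the state strings) is hypothetical, and how the status is actually obtained from the VM is left out.

```python
import time
from typing import Callable, List, Tuple

# (state, step); state is assumed to be "running", "done", or "crashed".
Status = Tuple[str, int]


def watch(poll: Callable[[], Status], notify: Callable[[str], None],
          interval_s: float = 90.0,
          sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Poll until the run ends; return the distinct steps seen along the way."""
    seen: List[int] = []
    while True:
        state, step = poll()
        if not seen or step != seen[-1]:
            seen.append(step)  # record a step transition
        if state in ("done", "crashed"):
            notify(f"training {state} at step {step}")
            return seen
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated run so the sketch is runnable without a VM.
    fake = iter([("running", 100), ("running", 200), ("done", 200)])
    print(watch(lambda: next(fake), print, sleep=lambda _: None))
```

In practice `poll` would query the VM for the latest logged step, and `notify` could wrap a call such as `subprocess.run(["notify-send", "starkhacks", message])`; injecting both keeps the loop testable without a desktop session or network access.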
- Dataset: https://huggingface.co/datasets/wbell7/starkhacks_cell_pickplace
- Policy (pushed at train-end): https://huggingface.co/wbell7/starkhacks_act_cell
- Training run: https://wandb.ai/wrbell7/starkhacks/runs/nbyv34do
- Phase tracker: ROADMAP.md
- Epic plan (full strategy PDF): ~/Downloads/epic_plan.pdf