Garrett-R16/SH26_MindFlayer

StarkHacks 2026 — SO-101 ACT pick-and-place on MI300X

Task: "Pick the orange cell and place it into the empty slot of the green module."
Stack: SO-101 bimanual teleop → LeRobot v0.4.1 → ACT policy → AMD MI300X → HF Hub.
Tracks: Best Use of AMD • Ford Industrial Robotics Grippers.


TL;DR

We built an end-to-end imitation-learning pipeline for a battery-cell pick-and-place task on an SO-101 leader-follower arm pair, trained an ACT policy on an AMD MI300X, and target autonomous playback on the same SO-101 follower that recorded the demos. Everything runs on AMD silicon: training on the MI300X and inference on a Radeon 890M iGPU, both via ROCm.

Artifacts:


Pipeline overview

┌───────────────────┐                           ┌───────────────────┐
│   SO-101 leader   │ ── human teleoperates ──▶ │  SO-101 follower  │
│  (my_leader_arm)  │                           │ (my_follower_arm) │
└───────────────────┘                           └───────────────────┘
                                                          │
                                                 captures │ 2 cams
                                                 30 fps   │ 640×480
                                                          ▼
                                        ┌───────────────────────────────┐
                                        │  top (UGREEN, overhead)       │
                                        │  wrist (ARC, follower wrist)  │
                                        └───────────────────────────────┘
                                                          │
                                                          ▼
                   ┌─────────────────────────────────────────────────────┐
                   │  record_stdin.py wraps lerobot-record               │
                   │   • pynput → stdin (Wayland fix)                    │
                   │   • 3-Enter per episode (start-gate keeps hands     │
                   │     out of frame; crash-durable parquet rotation)   │
                   └─────────────────────────────────────────────────────┘
                                                          │
                                           local parquet  │ + mp4 videos
                                                          ▼
                               ┌─────────────────────────────────────────┐
                               │ ~/.cache/huggingface/lerobot/           │
                               │   local/starkhacks_cell_pickplace/      │
                               └─────────────────────────────────────────┘
                                                          │
                                              push_to_hub │ hf_transfer +
                                                          │ upload_large_folder
                                                          ▼
                     ┌───────────────────────────────────────────────────┐
                     │  🤗  wbell7/starkhacks_cell_pickplace  (public)   │
                     └───────────────────────────────────────────────────┘
                                                          │
                                           snapshot_download (prefetch)
                                                          ▼
                              ┌──────────────────────────────────────────┐
                              │  MI300X VM — /root/.cache/huggingface/   │
                              │    lerobot/wbell7/starkhacks_cell_...    │
                              └──────────────────────────────────────────┘
                                                          │
                                     lerobot-train (ACT)  │
                                     50k steps · bf16 ·   │
                                     MIOpen+TunableOp     │
                                                          ▼
                              ┌──────────────────────────────────────────┐
                              │  ACT policy, 52M params → ckpt           │
                              └──────────────────────────────────────────┘
                                                          │
                                              push_to_hub │
                                                          ▼
                          ┌───────────────────────────────────────────────┐
                          │  🤗  wbell7/starkhacks_act_cell  (public)     │
                          └───────────────────────────────────────────────┘
                                                          │
                                  lerobot-record --policy.path
                                                          ▼
                          ┌───────────────────────────────────────────────┐
                          │  SO-101 follower drives itself — H34 ship     │
                          └───────────────────────────────────────────────┘

Background

Why SO-101? Affordable open-source teleop arm (Hugging Face + The Robot Studio). Usable out-of-the-box via LeRobot, with a leader-follower topology that produces clean demonstration data.

Why ACT? Action Chunking Transformer (Zhao et al., 2023) is the workhorse for small-data bimanual/single-arm imitation learning in LeRobot. Handles temporal multimodality via action chunks; trains well on hundreds of demos.

Why MI300X? It is the training hardware for the hackathon's AMD track, and 192 GB of HBM3 on one GPU removes any memory pressure for ACT. With the ROCm bf16 + MIOpen + TunableOp recipe, we measured 0.209 s/step at batch 32, a 3.4× speedup over the fp32 baseline (0.715 s/step) on the same hardware. At that rate a 50k-step run takes about 3 hours (50,000 × 0.209 s ≈ 2.9 h).

Why battery-cell pick-and-place? Maps directly to the brownfield handling that the Ford Industrial Robotics Grippers track cares about. Orange cell + green module is a stand-in for a factory line task (cylindrical cell insertion into a battery pack).


Hardware / software stack

| Layer | Component | Notes |
|---|---|---|
| Leader arm | SO-101 (id my_leader_arm) | /dev/so101_leader → ttyACM1 |
| Follower arm | SO-101 (id my_follower_arm) | /dev/so101_follower → ttyACM0 |
| Top camera | UGREEN (USB UVC 0c45:2283) | /dev/cam_ugreen, overhead |
| Wrist camera | ARC (USB UVC 05a3:9230) | /dev/cam_arc, on follower wrist |
| Local host | Ryzen AI + Radeon 890M (gfx1150) | ROCm 6.x, 23 GiB RAM |
| Cloud training | MI300X VF, 192 GB HBM3 | DO droplet (IP stored locally), ROCm 7.2 |
| Framework | LeRobot v0.4.1 + PyTorch 2.7.1+rocm6.3 | Python 3.10 local, 3.12 on VM |
| Tracking | wandb project starkhacks | entity wrbell7 |
| Artifacts | Hugging Face Hub (public) | wbell7/* |

Repository layout

~/starkhacks/
├─ README.md              # this file
├─ CLAUDE.md              # conventions + gotchas for future Claude sessions
├─ ROADMAP.md             # live phase tracker, ship-point checklist, done-log
├─ scripts/               # numbered runbook, 00_ → 08_
│  ├─ 00_anti_chaos.sh       # udev rules, remove brltty
│  ├─ 01_find_port.sh        # identify which arm is on which ttyACM*
│  ├─ 02_teleop.sh           # H12 ship — leader→follower mirror
│  ├─ 02a_teleop_raw.sh      # low-level teleop debug
│  ├─ 02b_teleop_cams.sh     # teleop + camera preview
│  ├─ 03_pipeline_sanity.sh  # ACT smoke on public SO-101 dataset
│  ├─ 04_record.sh           # 50 episodes, cameras (top, wrist)
│  ├─ 04b_validate_dataset.sh # post-record integrity + summary
│  ├─ 04c_view_episode.sh    # visual playback of one episode
│  ├─ 05_replay.sh           # H24 ship — open-loop replay
│  ├─ 06_train_smoke.sh      # 2k-step sanity train on our data
│  ├─ 07_train_full.sh       # full train (local iGPU fallback)
│  ├─ 08_eval.sh             # H34 ship — policy drives follower
│  ├─ record_stdin.py        # lerobot-record wrapper: stdin + start-gate + visual banners + durability
│  └─ README.md              # the runbook's own quickstart
├─ cloud/                 # MI300X Developer Cloud recipes
│  ├─ 00_mi300x_bootstrap.sh    # env (torch rocm, ffmpeg 7, lerobot)
│  ├─ 01_train_act_mi300x.sh    # full ACT recipe with the bf16+tune flags
│  └─ 01a_smoke_public.sh       # 200-step smoke on public aloha dataset
├─ amd_hackathon/         # AMD-track-specific materials, reference notebooks
├─ logs/                  # run logs + watchers (HF upload, train watcher, etc.)
└─ outputs/ (on MI300X VM)   # training checkpoints, tensorboard dumps

Installation (from scratch, for third parties)

The repo contains our code + glue + vendored AMD hackathon reference materials. Two external pieces must still be obtained: lerobot (pinned commit) and the ROCm + PyTorch stack. Scripts hardcode ~/starkhacks and ~/lerobot, so clone to those exact paths.

Prerequisites

  • Ubuntu 24.04 (or similar) with Wayland session
  • Python 3.10 via conda/miniforge
  • 2× SO-101 arms (leader + follower), 2× USB cameras
  • HuggingFace + wandb accounts; AMD Developer Cloud for MI300X training

1. Clone this repo

git clone https://github.com/Garrett-R16/SH26_MindFlayer.git ~/starkhacks

2. Clone lerobot at the pinned commit

git clone https://github.com/huggingface/lerobot.git ~/lerobot
cd ~/lerobot
git checkout -b v0.4.1 a5b29d43

3. Create the conda env and install Python deps

conda create -n lerobot python=3.10 -y
conda activate lerobot
cd ~/lerobot
pip install -e '.[feetech]'
cd ~/starkhacks
pip install -r requirements.txt

For the ROCm-specific PyTorch build on a Radeon iGPU, install torch==2.7.1+rocm6.3 via the ROCm index first — otherwise pip will pull the CUDA wheel from PyPI (see Gotchas).
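One way to confirm that the ROCm wheel survived the later installs is to inspect the version string. The sketch below is not part of the repo; `is_rocm_build` is a hypothetical helper that only looks at the local version label (the part after `+`, as in `2.7.1+rocm6.3`).

```python
def is_rocm_build(version: str) -> bool:
    """Return True if a PyTorch version string carries a ROCm local label."""
    # Local version labels follow a '+', e.g. '2.7.1+rocm6.3'.
    _, _, local = version.partition("+")
    return local.startswith("rocm")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("torch version:", torch.__version__)
        print("ROCm build:", is_rocm_build(torch.__version__))
        # torch.version.hip is a version string on ROCm builds, None otherwise.
        print("HIP runtime:", torch.version.hip)
```

Running this inside the `lerobot` conda env after step 3 should report a ROCm build; if it does not, the PyPI wheel has most likely replaced the ROCm one.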

4. System dependencies

FFmpeg 7+ is required by lerobot for video encoding:

sudo add-apt-repository ppa:ubuntuhandbook1/ffmpeg7 -y
sudo apt update && sudo apt install -y ffmpeg

5. Authenticate HuggingFace + wandb

huggingface-cli login   # write-scope token
wandb login

6. Set up udev symlinks for stable device paths

cd ~/starkhacks
# Edit scripts/00_anti_chaos.sh with your arms' serial numbers first, then:
sudo bash scripts/00_anti_chaos.sh

This creates /dev/so101_follower, /dev/so101_leader, /dev/cam_ugreen, /dev/cam_arc.
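A quick check that all four symlinks resolve can save a confusing failure later. This is a minimal sketch, not a script from the repo; `missing_devices` is a hypothetical helper, and the paths are the ones listed above.

```python
import os
from typing import Callable, Iterable, List

# Stable device paths created by scripts/00_anti_chaos.sh.
EXPECTED_DEVICES = (
    "/dev/so101_follower",
    "/dev/so101_leader",
    "/dev/cam_ugreen",
    "/dev/cam_arc",
)


def missing_devices(
    paths: Iterable[str] = EXPECTED_DEVICES,
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return the paths that do not resolve to an existing device node."""
    # os.path.exists follows symlinks, so a dangling udev link counts as missing.
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    missing = missing_devices()
    if missing:
        print("missing devices:", ", ".join(missing))
    else:
        print("all expected devices are present")
```

The `exists` parameter is injectable only so the function can be exercised without the hardware attached.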


How to reproduce, from zero

0. Local box prep (one time)

# Identify ports, write udev, remove brltty
./scripts/01_find_port.sh
# edit 00_anti_chaos.sh with the two serials, then:
sudo ./scripts/00_anti_chaos.sh
# Verify teleop (ship H12)
./scripts/02_teleop.sh

1. Record a dataset

# 50 episodes, two cameras. Three-Enter cycle per episode (start, stop, end-reset).
./scripts/04_record.sh

# If the process crashed partway: resume from where you left off
RESUME=1 NUM_EPISODES=<how-many-more> ./scripts/04_record.sh

# Verify integrity after recording
./scripts/04b_validate_dataset.sh
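
The three-Enter cycle can be pictured as a small state machine driven by lines read from stdin. The sketch below illustrates that idea only; it is not the code in scripts/record_stdin.py, and the `Phase` names, `run_gate`, and the treatment of `q` as "stop" are assumptions made for the example.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Phase(Enum):
    WAIT_START = "wait_start"  # hands out of frame, nothing is recorded
    RECORDING = "recording"    # frames are being captured
    RESETTING = "resetting"    # operator resets the scene


# Each bare Enter advances to the next phase of the cycle.
_NEXT = {
    Phase.WAIT_START: Phase.RECORDING,
    Phase.RECORDING: Phase.RESETTING,
    Phase.RESETTING: Phase.WAIT_START,
}


def run_gate(lines: Iterable[str], num_episodes: int) -> Tuple[int, List[Phase]]:
    """Consume input lines; return (episodes completed, phases visited)."""
    phase = Phase.WAIT_START
    done = 0
    trace = [phase]
    for line in lines:
        if done >= num_episodes:
            break
        key = line.strip().lower()
        if key == "q":  # assumed meaning: stop the session
            break
        if key != "":
            continue  # ignore anything that is not a bare Enter
        if phase is Phase.RESETTING:
            done += 1  # the third Enter closes the episode
        phase = _NEXT[phase]
        trace.append(phase)
    return done, trace
```

Passing `sys.stdin` as `lines` would drive the gate from the terminal, which is the property that sidesteps the Wayland key-capture problem.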

2. Replay to confirm the data is controllable (ship H24)

./scripts/05_replay.sh

3. Push dataset to the Hub

python -c "
from lerobot.datasets.lerobot_dataset import LeRobotDataset
d = LeRobotDataset('local/starkhacks_cell_pickplace',
                   root='$HOME/.cache/huggingface/lerobot/local/starkhacks_cell_pickplace')
d.repo_id = 'wbell7/starkhacks_cell_pickplace'
d.push_to_hub(tags=['lerobot','so101','starkhacks-2026'], private=False,
              upload_large_folder=True)
"

4. Train on MI300X

# On the cloud VM, first time:
bash cloud/00_mi300x_bootstrap.sh
# Sanity-check on a public dataset (a few minutes, covers MIOpen autotune):
bash cloud/01a_smoke_public.sh
# Full run on our data (~3 hours):
bash cloud/01_train_act_mi300x.sh

5. Evaluate on hardware (ship H34)

# Back on the local box:
./scripts/08_eval.sh

Performance recipe — how we got 3.4× on MI300X

From ROADMAP.md's benchmark log (same policy, same dataset, same batch, one variable at a time):

| Config | s/step | Speedup | Notes |
|---|---|---|---|
| fp32 baseline, batch 32 | 0.715 | 1.0× | cold MIOpen |
| bf16 only | 0.583 | 1.23× | via ACCELERATE_MIXED_PRECISION=bf16 |
| bf16 + MIOpen + TunableOp, cold | 1.862 | 0.38× | autotune tax |
| bf16 + MIOpen + TunableOp, warm | 0.209 | 3.42× | caches persist at /root/.miopen |
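
The speedup column is the fp32 baseline step time divided by each configuration's step time. The short sketch below recomputes it from the s/step figures in the table; the `speedup` helper is introduced here for illustration only.

```python
# Measured step times from the benchmark table (seconds per step, batch 32).
BASELINE_S_PER_STEP = 0.715  # fp32 baseline

CONFIGS = {
    "bf16 only": 0.583,
    "bf16 + MIOpen + TunableOp, cold": 1.862,
    "bf16 + MIOpen + TunableOp, warm": 0.209,
}


def speedup(s_per_step: float, baseline: float = BASELINE_S_PER_STEP) -> float:
    """Speedup relative to the baseline: values above 1 are faster."""
    return baseline / s_per_step


for name, s_per_step in CONFIGS.items():
    print(f"{name}: {speedup(s_per_step):.2f}x")
```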

The full env recipe lives in cloud/01_train_act_mi300x.sh:

export ACCELERATE_MIXED_PRECISION=bf16
export MIOPEN_FIND_MODE=3
export MIOPEN_FIND_ENFORCE=3
export MIOPEN_USER_DB_PATH=/root/.miopen
export MIOPEN_CUSTOM_CACHE_DIR=/root/.miopen
export PYTORCH_TUNABLEOP_ENABLED=1
export PYTORCH_TUNABLEOP_TUNING=1
export TORCH_BLAS_PREFER_HIPBLAS_LT=1
export HSA_NO_SCRATCH_RECLAIM=1
export GPU_MAX_HW_QUEUES=2

First run at a given batch size pays ~5 min autotune; reruns are warm.
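
To see why the autotune cost is worth paying, it helps to estimate the wall time of a full run and the step count at which tuning breaks even against bf16 alone. The sketch below uses the step times from the table and treats the "~5 min" autotune as a flat 300 s one-off cost, which is a simplifying assumption; `run_hours` and `break_even_steps` are hypothetical helpers.

```python
import math


def run_hours(steps: int, s_per_step: float, one_off_s: float = 0.0) -> float:
    """Estimated wall time in hours for a run, plus an optional one-off cost."""
    return (steps * s_per_step + one_off_s) / 3600.0


def break_even_steps(one_off_s: float, slow_s: float, fast_s: float) -> int:
    """Steps after which paying one_off_s for the faster step time wins."""
    return math.ceil(one_off_s / (slow_s - fast_s))


AUTOTUNE_S = 300.0  # assumption: "~5 min" modelled as a flat one-off cost

print(f"tuned, first run: {run_hours(50_000, 0.209, AUTOTUNE_S):.2f} h")
print(f"bf16 only:        {run_hours(50_000, 0.583):.2f} h")
print(f"fp32 baseline:    {run_hours(50_000, 0.715):.2f} h")
print("break-even vs bf16 only:", break_even_steps(AUTOTUNE_S, 0.583, 0.209), "steps")
```

Under that assumption the tuning cost is recovered within the first thousand steps of a 50k-step run.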


Gotchas we hit (and fixed)

  • pynput can't capture keys under Wayland. Workaround: scripts/record_stdin.py wraps lerobot-record, replaces the pynput listener with a stdin reader (<Enter> / n / q).
  • Hands in frame on the first recorded frames. The reset-phase Enter was also the start-next-episode Enter. record_stdin.py now gates every new episode on a second explicit Enter, with a big terminal banner (no speaker on this box — visual feedback only).
  • Out-of-memory crashes corrupted parquet footers mid-record twice, losing 15 and then 6 episodes. Two defenses added:
    • --display_data=false (rerun in-memory buffer was the leak) + --dataset.num_image_writer_processes=1 (PNG writer into a subprocess).
    • meta/info.json: data_files_size_in_mb=1 and monkeypatched metadata_buffer_size=1, so the parquet writer rotates — and thus finalises its footer — after ≈ every episode. A future crash costs 1 episode, not 10.
  • upload_folder wedges on wifi swap. Old sockets stuck in CLOSE-WAIT for ~15 min. Kill + restart (files already committed to the hub are skipped on retry). We now use upload_large_folder=True + HF_HUB_ENABLE_HF_TRANSFER=1 for parallel chunked uploads.
  • wandb.login(key=…) rejects the wandb_v1_… token format (40-char hex check is legacy). WANDB_API_KEY as an env var bypasses that check and is what wandb.init() actually reads.
  • lerobot's deps pulled torch from PyPI over our ROCm wheel. Install torch first, then install lerobot with pip install -c torch_constraint.txt to pin the ROCm build.
  • Ubuntu 24.04 .bashrc returns early for non-interactive shells, so exports there never fire for SSH-invoked commands. Creds go in /etc/environment + /etc/profile.d/starkhacks_creds.sh.
  • --policy.push_to_hub=false is required locally — HF push is not configured on the local box. Omitting it makes training abort at the final checkpoint.
  • Camera key order is load-bearing. top, wrist exactly (UGREEN overhead, ARC on wrist). Must match between record and inference or SmolVLA-class policies fail silently; ACT likely the same.
  • wrist_roll calibration clipping. If teleop feels clamped, re-sweep both arms through their full wrist-roll range during calibration.
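
Because a camera-order mismatch can fail silently, a cheap guard before inference is to compare the ordered key lists explicitly. This is a minimal sketch of such a guard; `check_camera_keys` is a hypothetical helper and not part of LeRobot or this repo.

```python
from typing import Sequence


def check_camera_keys(recorded: Sequence[str], inference: Sequence[str]) -> None:
    """Raise if the inference camera keys differ from the recorded ones.

    Order matters here, so the comparison is on lists, not sets.
    """
    if list(recorded) != list(inference):
        raise ValueError(
            f"camera keys differ: recorded {list(recorded)}, "
            f"inference {list(inference)}"
        )


# The order used throughout this project.
check_camera_keys(["top", "wrist"], ["top", "wrist"])
```

Comparing as sets would hide exactly the swapped-order case this gotcha describes, which is why the check is order-sensitive.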

Automation glue built during this run

  • scripts/record_stdin.py — lerobot-record wrapper (stdin + banners + start gate + per-episode parquet rotation).
  • /tmp/upload_watcher.sh — polls local upload, fires VM prefetch when done.
  • /root/prefetch_and_train.sh (on VM) — waits for smoke to finish cleanly, prefetches the dataset from the Hub, then kicks off the full 50k-step train.
  • /tmp/train_watcher.sh — polls the VM training every 90 s, raises a notify-send popup on crash or completion, logs step transitions.
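
The core of a training watcher is classifying the latest log contents as running, crashed, or done. The sketch below shows that step in Python under stated assumptions: it is not the contents of /tmp/train_watcher.sh, the `step=<n>` log format is hypothetical, and a Python traceback line is used as the crash signal.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical log format: lines such as "step=1200 loss=0.41".
STEP_RE = re.compile(r"\bstep[=:]\s*(\d+)\b")
CRASH_MARKER = "Traceback (most recent call last)"


def summarize_log(lines: Iterable[str], total_steps: int) -> Tuple[str, Optional[int]]:
    """Return (status, last step seen); status is running, crashed, or done."""
    last_step: Optional[int] = None
    for line in lines:
        if CRASH_MARKER in line:
            return "crashed", last_step
        match = STEP_RE.search(line)
        if match:
            last_step = int(match.group(1))
    if last_step is not None and last_step >= total_steps:
        return "done", last_step
    return "running", last_step
```

A watcher loop would call this on the fetched log every 90 s and raise a desktop notification whenever the status changes to `crashed` or `done`.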

Links
