# Train Models Using LeRobot on MI300X

This guide walks you through setting up an environment for training imitation learning policies with the LeRobot library on a DigitalOcean (DO) instance equipped with AMD MI300X GPUs and ROCm.

## ⚙️ Requirements
- A Hugging Face dataset repo ID containing your training data (`--dataset.repo_id=${HF_USER}/${DATASET_NAME}`) and a Hugging Face access token.
  If you don't have an account yet, you can sign up for Hugging Face [here](https://huggingface.co/join). After signing up, create an access token [here](https://huggingface.co/settings/tokens).
- A Weights & Biases (wandb) account to enable training visualization and to upload your training evidence to our GitHub.
  You can sign up for wandb [here](https://wandb.ai/signup) and create an API key [here](https://wandb.ai/authorize).
- Access to a DO instance with an AMD MI300X GPU.


## Verify ROCm and GPU availability
This cell uses PyTorch to check the AMD GPU information. The expected output is:
```
CUDA compatible device availability: True
device name [0]: AMD Instinct MI300X VF
```

In [76]:
import torch
# ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API
print('CUDA compatible device availability:', torch.cuda.is_available())
print('device name [0]:', torch.cuda.get_device_name(0))

CUDA compatible device availability: True
device name [0]: AMD Instinct MI300X VF
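
The check above reports a "CUDA compatible" device on AMD hardware because ROCm builds of PyTorch back the `torch.cuda` API with HIP. The following is a minimal sketch for telling the two backends apart. It relies on `torch.version.hip`, which is a version string on ROCm builds and `None` otherwise. The helper name `describe_backend` is hypothetical and not part of LeRobot or PyTorch.

```python
def describe_backend(cuda_available, hip_version):
    """Return a short label for the GPU backend PyTorch was built against."""
    if not cuda_available:
        return "no GPU visible to PyTorch"
    if hip_version is not None:
        return f"ROCm/HIP build ({hip_version})"
    return "CUDA build"


try:
    import torch
except ImportError:
    print("PyTorch is not installed in this environment")
else:
    print(describe_backend(torch.cuda.is_available(), torch.version.hip))
```

On the MI300X instance described in this guide, the ROCm/HIP label is the one you would expect to see.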


## Install FFmpeg 7.x
This cell uses `apt` to install ffmpeg 7.x for LeRobot.

In [77]:
!add-apt-repository ppa:ubuntuhandbook1/ffmpeg7 -y # install PPA which contains ffmpeg 7.x
!apt update && apt install ffmpeg -y

Repository: 'Types: deb
URIs: https://ppa.launchpadcontent.net/ubuntuhandbook1/ffmpeg7/ubuntu/
Suites: noble
Components: main
'
Description:
unofficial build for FFmpeg 7 for Ubuntu 22.04 | 24.04, backport from Debian's deb.multimedia.org repository

If the packages here are helpful, you may buy me a coffee:

         https://ko-fi.com/ubuntuhandbook1
More info: https://launchpad.net/~ubuntuhandbook1/+archive/ubuntu/ffmpeg7
Adding repository.
Found existing deb entry in /etc/apt/sources.list.d/ubuntuhandbook1-ubuntu-ffmpeg7-noble.sources
Hit:1 https://repo.radeon.com/amdgpu/30.10/ubuntu jammy InRelease
Hit:2 http://archive.ubuntu.com/ubuntu noble InRelease                         
Hit:3 https://repo.radeon.com/rocm/apt/7.0 jammy InRelease                     
Get:4 http://archive.ubuntu.com/ubuntu noble-updates InRelease [126 kB]        
Hit:5 https://repo.radeon.com/graphics/7.0/ubuntu jammy InRelease              
Get:6 http://archive.ubuntu.com/ubuntu noble-backports InRelease [126 
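
After the install finishes, it can be worth confirming that the `ffmpeg` on your `PATH` really is a 7.x build. The sketch below parses the first line of `ffmpeg -version`. The helper names are hypothetical, and the regular expression assumes the usual `ffmpeg version 7.x...` banner, so an unusual build string may come back as unrecognized.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_major(version_output):
    """Extract the major version from the banner printed by `ffmpeg -version`."""
    match = re.search(r"ffmpeg version n?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def installed_ffmpeg_major():
    """Return the major version of the ffmpeg on PATH, or None if unavailable."""
    if shutil.which("ffmpeg") is None:
        return None
    result = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_major(result.stdout)


major = installed_ffmpeg_major()
if major is None:
    print("ffmpeg not found, or its version string was not recognized")
else:
    print(f"ffmpeg major version: {major}")
```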

## Install LeRobot v0.4.1
This cell clones Hugging Face's `lerobot` repository, checks out v0.4.1, and installs the package in editable mode. To install additional dependencies for training SmolVLA or Pi models, refer to the [LeRobot official documentation](https://huggingface.co/docs/lerobot/index).


In [78]:
!git clone https://github.com/huggingface/lerobot.git
!cd lerobot && git checkout -b v0.4.1 v0.4.1 # pin to the v0.4.1 tag so everyone uses the same version
!cd lerobot && pip install -e .

fatal: destination path 'lerobot' already exists and is not an empty directory.
fatal: a branch named 'v0.4.1' already exists
Obtaining file:///workspace/lerobot
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: lerobot
  Building editable for lerobot (pyproject.toml) ... done
  Created wheel for lerobot: filename=lerobot-0.4.1-0.editable-py3-none-any.whl size=15631 sha256=9170a6834c1eb1d7f6877b436c3a4e01c043e252c3abed5a955f5643b6655668
  Stored in directory: /tmp/pip-ephem-wheel-cache-goz73oxa/wheels/05/0a/0d/80a4c08845345c44fe1e5f70929884983b90d85f46a77f7601
Successfully built lerobot
Installing collected packages: lerobot
  Attempting uninstall: lerobot
    Found existing installation: lerobot 0.4.1
    Uninstalling lerobot-0.4.1:
    

## Weights & Biases login
This cell installs wandb and logs in to Weights & Biases to enable experiment tracking and logging.

In [79]:
!pip install wandb
import wandb
wandb.login(key="your_wandb_api_key")  # replace with your own key; never commit a real API key


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.2[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True
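
Because notebooks are easy to share by accident, one option is to keep API keys out of the cells entirely and read them from environment variables. The sketch below shows a small helper for that. The name `read_secret` is hypothetical, and the variable names `WANDB_API_KEY` and `HF_TOKEN` are assumptions about how you export your keys in the shell.

```python
import os


def read_secret(env_var):
    """Return the value of an environment variable, failing loudly if unset."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return value


# Hypothetical usage in this notebook (variable names are assumptions):
#   wandb.login(key=read_secret("WANDB_API_KEY"))
#   login(token=read_secret("HF_TOKEN"))
```

Failing with a clear message is deliberate here: an empty key would otherwise surface later as a less obvious authentication error.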

## Log in to Hugging Face Hub

In [None]:
from huggingface_hub import login
login(token="token_here")

In [107]:
!lerobot-edit-dataset \
    --repo_id Abubakar17/mission2_take_all_from_hand_2 \
    --operation.type merge \
    --operation.repo_ids "['Abubakar17/mission_2_take', 'Abubakar17/mission_2_take_ham', 'Abubakar17/mission_2_take_box']"


INFO 2025-12-14 01:31:44 _dataset.py:208 Loading 3 datasets to merge
INFO 2025-12-14 01:31:44 _dataset.py:213 Merging datasets into Abubakar17/mission2_take_all_from_hand_2
INFO 2025-12-14 01:31:44 ggregate.py:195 Start aggregate_datasets
Validate all meta data: 100%|██████████████████| 3/3 [00:00<00:00, 61680.94it/s]
INFO 2025-12-14 01:31:44 ggregate.py:226 Find all tasks
Copy data and videos: 100%|███████████████████████| 3/3 [00:03<00:00,  1.21s/it]
INFO 2025-12-14 01:31:48 ggregate.py:515 write tasks
INFO 2025-12-14 01:31:48 ggregate.py:518 write info
INFO 2025-12-14 01:31:48 ggregate.py:529 write stats
INFO 2025-12-14 01:31:48 ggregate.py:248 Aggregation complete.
INFO 2025-12-14 01:31:48 _dataset.py:220 Merged dataset saved to /root/.cache/huggingface/lerobot/Abubakar17/mission2_take_all_from_hand_2
INFO 2025-12-14 01:31:48 _dataset.py:221 Episodes: 60, Frames: 30290


In [37]:
from huggingface_hub import HfApi

hub_api = HfApi()
hub_api.create_tag("Abubakar17/mission2_take_all", tag="v3.0", repo_type="dataset")

## Start Training Models with LeRobot

This cell uses the `lerobot-train` CLI from the `lerobot` library to train a robot control policy.

Make sure to adjust the following arguments to your setup:

1. `--dataset.repo_id=YOUR_HF_USERNAME/YOUR_DATASET`:  
   Replace this with the Hugging Face Hub repo ID where your dataset is stored, e.g., `lerobot/svla_so100_pickplace`.

2. `--policy.type=act`:  
   Specifies the policy configuration to use. `act` refers to [configuration_act.py](../lerobot/common/policies/act/configuration_act.py), which will automatically adapt to your dataset’s setup (e.g., number of motors and cameras).

3. `--output_dir=outputs/train/...`:  
   Directory where training logs and model checkpoints will be saved.

4. `--job_name=...`:  
   A name for this training job, used for logging and Weights & Biases. The name typically includes the model type (e.g., act, smolvla), the dataset name, and additional descriptive tags.

5. `--policy.device=cuda`:  
   Use `cuda` when training on an AMD or NVIDIA GPU. ROCm builds of PyTorch expose AMD GPUs through the `cuda` device type.

6. `--wandb.enable=true`:  
   Enables Weights & Biases for visualizing training progress. You must be logged in via `wandb login` before running this.

7. `--policy.push_to_hub=...`:  
   Enables automatic uploading of the trained policy to the Hugging Face Hub. If it is set to `true`, you must also specify `--policy.repo_id` (e.g., `${HF_USER}/${REPO_NAME}`).
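
If you launch several runs, the options listed above can be kept in a Python dict and turned into a command line programmatically. The following is a sketch only: `build_train_command` is a hypothetical helper, and every value in `train_args` is a placeholder to replace with your own repo IDs and names.

```python
import shlex


def build_train_command(args):
    """Turn a dict of option names and values into a lerobot-train argv list."""
    command = ["lerobot-train"]
    for name, value in args.items():
        # Booleans are written lowercase, as in the command used in this guide.
        if isinstance(value, bool):
            value = str(value).lower()
        command.append(f"--{name}={value}")
    return command


# Placeholder values; replace them with your own repo IDs and names.
train_args = {
    "dataset.repo_id": "YOUR_HF_USERNAME/YOUR_DATASET",
    "policy.type": "act",
    "output_dir": "outputs/train/act_example",
    "job_name": "act_example",
    "policy.device": "cuda",
    "wandb.enable": True,
    "policy.push_to_hub": False,
}

print(shlex.join(build_train_command(train_args)))
```

The printed string can be pasted after `!` in a notebook cell, or the list can be passed to `subprocess.run` directly.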

In [108]:
!lerobot-train \
  --dataset.repo_id=Abubakar17/mission2_take_all_from_hand_2 \
  --batch_size=16 \
  --steps=15000 \
  --output_dir=outputs/train/act_so101_mission_2_take_all_from_hand_output_2 \
  --job_name=act_so101_mission_2_take_all_from_hand_job_2 \
  --policy.repo_id=Abubakar17/take_all_from_hand_2 \
  --policy.device=cuda \
  --policy.type=act \
  --policy.push_to_hub=true \
  --wandb.enable=true  \
  --dataset.image_transforms.enable=True  \
  --save_freq=2000  \
  --eval_freq=5000

INFO 2025-12-14 01:32:24 ot_train.py:163 {'batch_size': 16,
 'checkpoint_path': None,
 'dataset': {'episodes': None,
             'image_transforms': {'enable': True,
                                  'max_num_transforms': 3,
                                  'random_order': False,
                                  'tfs': {'affine': {'kwargs': {'degrees': [-5.0,
                                                                            5.0],
                                                                'translate': [0.05,
                                                                              0.05]},
                                                     'type': 'RandomAffine',
                                                     'weight': 1.0},
                                          'brightness': {'kwargs': {'brightness': [0.8,
                                                                                   1.2]},
                                                         't
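
To get a feel for how `--steps` and `--batch_size` relate to dataset size, you can estimate the number of passes over the data. This is a rough sketch that assumes each training step consumes one batch of `batch_size` frames. The frame count comes from the merge log above (30290 frames), and `approx_epochs` is a hypothetical helper.

```python
def approx_epochs(steps, batch_size, num_frames):
    """Estimate how many passes over the dataset a training run makes."""
    return steps * batch_size / num_frames


# 15000 steps at batch size 16 over the 30290 frames of the merged dataset.
print(round(approx_epochs(15000, 16, 30290), 2))  # 7.92
```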

In [26]:
!huggingface-cli repo create Abubakar17/so101-mission_2_drop_ham --type model


The --type argument is deprecated and will be removed in a future version. Use --repo-type instead.
Successfully created Abubakar17/so101-mission_2_drop_ham on the Hub.
Your repo is now available at https://huggingface.co/Abubakar17/so101-mission_2_drop_ham


In [27]:
from huggingface_hub import HfApi
api = HfApi()
api.upload_folder(
    folder_path="outputs/train/act_so101_mission_2_drop_ham_output/checkpoints/015000/pretrained_model",
    repo_id="Abubakar17/so101-mission_2_drop_ham",
    repo_type="model",
)

Processing Files (0 / 0): |          |  0.00B /  0.00B            

New Data Upload: |          |  0.00B /  0.00B            

CommitInfo(commit_url='https://huggingface.co/Abubakar17/so101-mission_2_drop_ham/commit/c38c51659966f9dd9b1f24d0a22b9e2222dc9037', commit_message='Upload folder using huggingface_hub', commit_description='', oid='c38c51659966f9dd9b1f24d0a22b9e2222dc9037', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Abubakar17/so101-mission_2_drop_ham', endpoint='https://huggingface.co', repo_type='model', repo_id='Abubakar17/so101-mission_2_drop_ham'), pr_revision=None, pr_num=None)
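
Rather than typing the checkpoint step by hand, you could locate the newest checkpoint folder before uploading. The sketch below assumes the `checkpoints/<step>/pretrained_model` layout seen in the paths used in this guide. `latest_pretrained_model` is a hypothetical helper, not a LeRobot function.

```python
from pathlib import Path


def latest_pretrained_model(output_dir):
    """Return the pretrained_model folder of the highest-numbered checkpoint."""
    checkpoints = Path(output_dir) / "checkpoints"
    if not checkpoints.is_dir():
        return None
    # Step directories are zero-padded numbers such as 015000; skip "last".
    numbered = [p for p in checkpoints.iterdir() if p.is_dir() and p.name.isdigit()]
    if not numbered:
        return None
    newest = max(numbered, key=lambda p: int(p.name))
    model_dir = newest / "pretrained_model"
    return model_dir if model_dir.is_dir() else None


print(latest_pretrained_model("outputs/train/act_so101_mission_2_drop_ham_output"))
```

The result can be passed as `folder_path` to `upload_folder`. A return value of `None` means no finished checkpoint was found under that output directory.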

X-VLA

In [54]:
!lerobot-edit-dataset \
    --repo_id Abubakar17/mission2_drop_all_xvla \
    --operation.type merge \
    --operation.repo_ids "['Abubakar17/mission_2_drop_box', 'Abubakar17/mission_2_drop_ham', 'Abubakar17/mission_2_drop_tape']"

INFO 2025-12-13 18:52:38 _dataset.py:208 Loading 3 datasets to merge
INFO 2025-12-13 18:52:38 _dataset.py:213 Merging datasets into Abubakar17/mission2_drop_all_xvla
INFO 2025-12-13 18:52:38 ggregate.py:195 Start aggregate_datasets
Validate all meta data: 100%|█████████████████| 3/3 [00:00<00:00, 104857.60it/s]
INFO 2025-12-13 18:52:38 ggregate.py:226 Find all tasks
Copy data and videos: 100%|███████████████████████| 3/3 [00:13<00:00,  4.53s/it]
INFO 2025-12-13 18:52:52 ggregate.py:515 write tasks
INFO 2025-12-13 18:52:52 ggregate.py:518 write info
INFO 2025-12-13 18:52:52 ggregate.py:529 write stats
INFO 2025-12-13 18:52:52 ggregate.py:248 Aggregation complete.
INFO 2025-12-13 18:52:52 _dataset.py:220 Merged dataset saved to /root/.cache/huggingface/lerobot/Abubakar17/mission2_drop_all_xvla
INFO 2025-12-13 18:52:52 _dataset.py:221 Episodes: 120, Frames: 79781


In [84]:
!lerobot-train \
  --dataset.repo_id=Abubakar17/mission2_drop_all_xvla \
  --batch_size=16 \
  --steps=15000 \
  --output_dir=outputs/train/act_so101_mission_2_drop_all_output \
  --job_name=act_so101_mission_2_drop_all_job \
  --policy.repo_id=Abubakar17/drop_all_hand \
  --policy.device=cuda \
  --policy.type=act \
  --policy.push_to_hub=true \
  --wandb.enable=true  \
  --dataset.image_transforms.enable=True  \
  --save_freq=2000  \
  --eval_freq=5000

INFO 2025-12-13 19:48:27 ot_train.py:163 {'batch_size': 16,
 'checkpoint_path': None,
 'dataset': {'episodes': None,
             'image_transforms': {'enable': True,
                                  'max_num_transforms': 3,
                                  'random_order': False,
                                  'tfs': {'affine': {'kwargs': {'degrees': [-5.0,
                                                                            5.0],
                                                                'translate': [0.05,
                                                                              0.05]},
                                                     'type': 'RandomAffine',
                                                     'weight': 1.0},
                                          'brightness': {'kwargs': {'brightness': [0.8,
                                                                                   1.2]},
                                                         't

In [102]:
!huggingface-cli repo create Abubakar17/ext_drop_box --type model

The --type argument is deprecated and will be removed in a future version. Use --repo-type instead.
Successfully created Abubakar17/ext_drop_box on the Hub.
Your repo is now available at https://huggingface.co/Abubakar17/ext_drop_box


In [None]:
from huggingface_hub import login
login(token="token_here")

In [103]:
from huggingface_hub import HfApi
api = HfApi()
api.upload_folder(
    folder_path="outputs/train/act_so101_mission_2_ext_drop_box_output/checkpoints/015000/pretrained_model",
    repo_id="Abubakar17/ext_drop_box",
    repo_type="model",
)

Processing Files (0 / 0): |          |  0.00B /  0.00B            

New Data Upload: |          |  0.00B /  0.00B            

CommitInfo(commit_url='https://huggingface.co/Abubakar17/ext_drop_box/commit/03c77c98234f0992fd1e9b14e76ec56b501906ec', commit_message='Upload folder using huggingface_hub', commit_description='', oid='03c77c98234f0992fd1e9b14e76ec56b501906ec', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Abubakar17/ext_drop_box', endpoint='https://huggingface.co', repo_type='model', repo_id='Abubakar17/ext_drop_box'), pr_revision=None, pr_num=None)

In [104]:
!lerobot-train \
  --dataset.repo_id=Abubakar17/mission_2_drop_tape \
  --batch_size=16 \
  --steps=13500 \
  --output_dir=outputs/train/act_so101_mission_2_ext_drop_tape_output \
  --job_name=act_so101_mission_2_ext_drop_tape_job \
  --policy.repo_id=Abubakar17/ext_drop_tape \
  --policy.device=cuda \
  --policy.type=act \
  --policy.push_to_hub=true \
  --wandb.enable=true  \
  --dataset.image_transforms.enable=True  \
  --save_freq=2000  \
  --eval_freq=5000

INFO 2025-12-14 00:06:01 ot_train.py:163 {'batch_size': 16,
 'checkpoint_path': None,
 'dataset': {'episodes': None,
             'image_transforms': {'enable': True,
                                  'max_num_transforms': 3,
                                  'random_order': False,
                                  'tfs': {'affine': {'kwargs': {'degrees': [-5.0,
                                                                            5.0],
                                                                'translate': [0.05,
                                                                              0.05]},
                                                     'type': 'RandomAffine',
                                                     'weight': 1.0},
                                          'brightness': {'kwargs': {'brightness': [0.8,
                                                                                   1.2]},
                                                         't

## Download Models from Hugging Face to Local Machine
After training is done, download the model to your local machine.

In [None]:
!huggingface-cli download ${HF_USER}/${REPO_NAME} --repo-type model --local-dir path/to/model
# To upload a local checkpoint to the Hub instead, e.g.:
# huggingface-cli upload ${HF_USER}/act_so101_3cube_1ksteps \
#  outputs/train/act_so101_3cube_1ksteps/checkpoints/last/pretrained_model

## Miscellaneous
1. Once the environment is set up, you can open a terminal session for training by navigating to `File → New Launcher → Other → Terminal`.
2. You can also upload your datasets to the container by clicking the `Upload Files` button in the left pane.

## Q&A
1. If you encounter an error like:
   ```
   FileExistsError: Output directory outputs/train/act_so101_3cube_1ksteps already exists and resume is False. Please change your output directory so that outputs/train/act_so101_3cube_1ksteps is not overwritten. 
   ```
   Remove the existing directory before proceeding:

In [None]:
!rm -fr outputs/train/act_so101_3cube_1ksteps
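
If you would rather keep the earlier run than delete it, an alternative is to pick an unused directory name for `--output_dir`. The sketch below appends a numeric suffix until the path is free. `fresh_output_dir` is a hypothetical helper, and the suffix scheme is just one possible convention.

```python
from pathlib import Path


def fresh_output_dir(base):
    """Return `base` if it is unused, else the first free `base_2`, `base_3`, ..."""
    candidate = Path(base)
    suffix = 2
    while candidate.exists():
        candidate = Path(f"{base}_{suffix}")
        suffix += 1
    return candidate


print(fresh_output_dir("outputs/train/act_so101_3cube_1ksteps"))
```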

2. When running models other than ACT, ensure you install the required additional dependencies for those models.

In [None]:
# For smolVLA
!cd lerobot && pip install -e ".[smolvla]"
# For Pi
!cd lerobot && pip install -e ".[pi]"

3. To resume training from the last checkpoint, run the command below:

In [None]:
!lerobot-train \
  --resume=true \
  --config_path=outputs/train/<job name>/checkpoints/last/pretrained_model/train_config.json \
  --steps=<new total steps>
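
Before resuming, it can help to confirm that the checkpoint config actually exists, since a mistyped job name only fails once the command runs. The sketch below builds the same resume command after checking the path. `resume_command` is a hypothetical helper, and the example directory in the comment is a placeholder.

```python
from pathlib import Path


def resume_command(output_dir, total_steps):
    """Build the lerobot-train resume argv, or raise if the config is missing."""
    config = (
        Path(output_dir) / "checkpoints" / "last" / "pretrained_model" / "train_config.json"
    )
    if not config.is_file():
        raise FileNotFoundError(f"No checkpoint config found at {config}")
    return [
        "lerobot-train",
        "--resume=true",
        f"--config_path={config}",
        f"--steps={total_steps}",
    ]


# Example (placeholder directory):
#   resume_command("outputs/train/my_job", 20000)
```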

4. If you upload your dataset with `huggingface-cli upload <repo name> <path to the dataset> --repo-type=dataset`, be sure to set a codebase version tag as shown below:

In [None]:
from huggingface_hub import HfApi
from huggingface_hub import login

login(token="your_huggingface_token")
hub_api = HfApi()
hub_api.create_tag("<HF_REPO_NAME>", tag="v3.0", revision="main", repo_type="dataset")
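
Instead of hardcoding the tag, you may be able to read it from the dataset itself. The sketch below assumes the local dataset folder contains a `meta/info.json` file with a `codebase_version` field. Treat that layout as an assumption and verify it against your own dataset. `read_codebase_version` is a hypothetical helper, and the cache location follows the path printed in the merge logs earlier in this guide.

```python
import json
from pathlib import Path


def read_codebase_version(dataset_root):
    """Return the codebase_version recorded in meta/info.json, or None."""
    info_path = Path(dataset_root) / "meta" / "info.json"
    if not info_path.is_file():
        return None
    with info_path.open(encoding="utf-8") as f:
        return json.load(f).get("codebase_version")


# Replace <HF_REPO_NAME> with your own repo ID before running.
root = Path.home() / ".cache/huggingface/lerobot" / "<HF_REPO_NAME>"
print(read_codebase_version(root))
```

If the function returns a value such as `v3.0`, that string can be passed as `tag` to `create_tag`.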