# 🤗 x 🦾: Training SmolVLA with LeRobot for Carrot Pick-and-Place

Welcome to the **LeRobot SmolVLA training notebook** for the carrot pick-and-place task!

This notebook trains a `SmolVLA` policy using your recorded dataset from the `gpt-act` repo.

For the official LeRobot SmolVLA notebook, see:
https://colab.research.google.com/github/huggingface/notebooks/blob/main/lerobot/training-smolvla.ipynb

## Requirements
- A HuggingFace dataset repo ID (e.g., `your-username/so101-pick-and-place-carrot`)
- Optional: [wandb](https://wandb.ai/) account for training visualization
- Recommended: GPU runtime (NVIDIA A100) for faster training

## Expected Training Time
Training for 20,000 steps takes **~5 hours on an NVIDIA A100**.
Note: SmolVLA needs more than 20k steps to perform well; the 20k-step run in this notebook is an undertrained example.
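
The figure above works out to roughly 0.9 seconds per step (5 hours / 20,000 steps). A minimal sketch for scaling it to longer runs, assuming time grows linearly with step count on the same hardware and batch size (`estimate_hours` is a hypothetical helper, not part of LeRobot):

```python
def estimate_hours(target_steps, ref_steps=20_000, ref_hours=5.0):
    """Scale the reference run time linearly to a different step count."""
    seconds_per_step = ref_hours * 3600 / ref_steps
    return target_steps * seconds_per_step / 3600


for steps in (20_000, 100_000, 200_000):
    print(f"{steps:>7,} steps: ~{estimate_hours(steps):.0f} h")
```

Under that assumption, a 100k-200k step run takes roughly 25-50 hours on an A100, so plan for multiple Colab sessions.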

## Instructions
1. **Update `--dataset.repo_id`** to match your HuggingFace dataset.
2. **Update `--policy.repo_id`** to where you want the trained model uploaded.
3. **Update the output directory** if you want to change where checkpoints are saved.
4. Run all cells in order.

## Install conda
Bootstrap a full Conda environment inside Google Colab.

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

## Mount Google Drive
Persist checkpoints across Colab sessions by mounting Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
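
Because the Drive folder persists across sessions, an earlier run's directory may still be there when you train again. A minimal sketch for picking an unused run directory, assuming you want each run kept separate (`fresh_run_dir` is a hypothetical helper; the base path matches the `--output_dir` used later in this notebook):

```python
from pathlib import Path


def fresh_run_dir(base_dir, job_name):
    """Return base_dir/job_name, or job_name_2, job_name_3, ... if that path exists."""
    base = Path(base_dir)
    candidate = base / job_name
    suffix = 2
    while candidate.exists():
        candidate = base / f"{job_name}_{suffix}"
        suffix += 1
    return candidate


run_dir = fresh_run_dir(
    "/content/drive/MyDrive/lerobot_runs",
    "smolvla_so101_pick_and_place_carrot",
)
print(run_dir)  # pass this value as --output_dir
```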

## Install LeRobot
Clone LeRobot, install FFmpeg, and install the base package.

In [None]:
!git clone https://github.com/huggingface/lerobot.git
!conda install ffmpeg=7.1.1 -c conda-forge
!cd lerobot && pip install -e .

## Weights & Biases login
Log into W&B for experiment tracking (optional).

In [None]:
!wandb login

## Install SmolVLA dependencies
Install transformers and other SmolVLA-specific packages.

In [None]:
!cd lerobot && pip install -e ".[smolvla]"

## Start Training SmolVLA

**UPDATE THESE VALUES:**
- `--dataset.repo_id`: Your HuggingFace dataset ID (e.g., `your-username/so101-pick-and-place-carrot`)
- `--policy.repo_id`: Where to upload the trained policy (e.g., `your-username/smolvla_so101_pick_and_place_carrot`)
- `--output_dir`: Google Drive path for checkpoints

**Training Options:**
- `--batch_size=64`: Number of samples per training step (reduce this if you run out of GPU memory)
- `--steps=20000`: Total training steps (increase to 100k-200k for production)
- `--rename_map`: Maps the camera names in your dataset to the names the model expects
- `--policy.empty_cameras=1`: Tells the model to expect 1 empty camera slot

In [None]:
!lerobot-train \
  --policy.path=lerobot/smolvla_base \
  --dataset.repo_id=sangam-101/so101-pick-and-place-carrot \
  --batch_size=64 \
  --steps=20000 \
  --policy.repo_id=sangam-101/smolvla_so101_pick_and_place_carrot \
  --output_dir=/content/drive/MyDrive/lerobot_runs/smolvla_so101_pick_and_place_carrot \
  --job_name=smolvla_so101_pick_and_place_carrot \
  --policy.device=cuda \
  --wandb.enable=true \
  --rename_map='{"observation.images.top": "observation.images.camera1","observation.images.wrist": "observation.images.camera2"}' \
  --policy.empty_cameras=1
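
The `--rename_map` value in the command above is a JSON object that has to survive shell quoting. A small sketch of one way to build it from a Python dict instead of typing it by hand; the camera keys are copied from the command above, and `rename_map_flag` is a hypothetical helper, not a LeRobot function:

```python
import json
import shlex

# Dataset camera keys -> keys the base policy expects (copied from the command above).
rename_map = {
    "observation.images.top": "observation.images.camera1",
    "observation.images.wrist": "observation.images.camera2",
}


def rename_map_flag(mapping):
    """Return a shell-safe --rename_map=... argument for the given mapping."""
    return "--rename_map=" + shlex.quote(json.dumps(mapping))


flag = rename_map_flag(rename_map)
print(flag)
```

If your dataset uses different camera names, change only the keys on the left; the values on the right are the names this notebook maps onto.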

## Login to HuggingFace Hub
After training completes, log in to upload the model.

In [None]:
!hf auth login

## Upload Model to HuggingFace

**UPDATE THESE VALUES:**
- First argument: Your policy repo ID
- Second argument: Path to the trained checkpoint (update step number if different)

As written, this uploads the checkpoint from step 20,000. Update the path if you trained for a different number of steps.

In [None]:
!hf upload sangam-101/smolvla_so101_pick_and_place_carrot \
  /content/drive/MyDrive/lerobot_runs/smolvla_so101_pick_and_place_carrot/checkpoints/20000/pretrained_model
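
If you changed `--steps`, or are unsure how the checkpoint folders are named on your Drive, the following sketch looks up the highest-numbered checkpoint instead of hard-coding the step. It assumes the `checkpoints/<step>/pretrained_model` layout used in the upload command above and that step folders have all-digit names (possibly zero-padded, which may vary by LeRobot version); `latest_checkpoint` is a hypothetical helper:

```python
from pathlib import Path


def latest_checkpoint(run_dir):
    """Return the pretrained_model dir of the highest-numbered checkpoint, or None."""
    ckpt_root = Path(run_dir) / "checkpoints"
    if not ckpt_root.is_dir():
        return None
    # Keep only step folders; skip anything with a non-numeric name.
    numbered = [d for d in ckpt_root.iterdir() if d.is_dir() and d.name.isdigit()]
    if not numbered:
        return None
    newest = max(numbered, key=lambda d: int(d.name))
    model_dir = newest / "pretrained_model"
    return model_dir if model_dir.is_dir() else None


run_dir = "/content/drive/MyDrive/lerobot_runs/smolvla_so101_pick_and_place_carrot"
print(latest_checkpoint(run_dir))  # None if no checkpoint is found at that path
```

Pass the printed path as the second argument of the upload command.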

## Training Complete!

Your SmolVLA policy is now trained and uploaded to HuggingFace.

### Next Steps:
1. Update `scripts/run_inference_smolvla_pick_and_place.py` in `gpt-act` with your policy repo ID
2. Run inference on your robot:
   ```bash
   cd gpt-act
   source setup.sh
   python scripts/run_inference_smolvla_pick_and_place.py
   ```

**Note:** SmolVLA models trained for only 20k steps are undertrained and may not perform well. For production use, train for 100k-200k steps, reducing the batch size (e.g., to 32) if GPU memory is limited.