# Train and Finetune
This section shows how to train and finetune EO-1 on the Libero datasets and on a custom dataset. The full training script is in [../experiments/2_libero](../experiments/2_libero/train.sh).

## 1. Download Libero Dataset and Qwen2.5-VL-3B-Instruct

Before running the code below, download the Libero datasets from the [libero-benchmark-dataset](https://huggingface.co/collections/IPEC-COMMUNITY/libero-benchmark-dataset-684837af28d465aa8b043950) collection and the [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) model with `huggingface-cli`.

```bash
# Install the Hugging Face CLI if it is not already installed
pip install -U "huggingface_hub[cli]"
huggingface-cli login

# Download libero dataset
datasets=(
    libero_spatial_no_noops_1.0.0_lerobot
    libero_object_no_noops_1.0.0_lerobot
    libero_90_no_noops_lerobot
    libero_10_no_noops_1.0.0_lerobot
)

HF_LEROBOT_HOME=YOUR_PATH_TO_DATASET

for dataset in "${datasets[@]}";
do
  echo "Downloading ${dataset}..."
  huggingface-cli download \
  --repo-type dataset --resume-download --local-dir-use-symlinks False \
  IPEC-COMMUNITY/${dataset} \
  --local-dir ${HF_LEROBOT_HOME}/${dataset}
done
```
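
Once the downloads finish, it can help to confirm that every dataset directory is in place before training. The sketch below is a hypothetical helper, not part of EO-1. It assumes the usual LeRobot layout in which each dataset keeps its metadata at `meta/info.json`, and that `HF_LEROBOT_HOME` is exported in your shell.

```python
import os
from pathlib import Path

# Dataset names copied from the download loop above.
DATASETS = [
    "libero_spatial_no_noops_1.0.0_lerobot",
    "libero_object_no_noops_1.0.0_lerobot",
    "libero_90_no_noops_lerobot",
    "libero_10_no_noops_1.0.0_lerobot",
]


def missing_datasets(root, names=DATASETS):
    """Return the dataset names that have no meta/info.json under root.

    Assumption: a complete LeRobot dataset always contains meta/info.json.
    """
    root = Path(root)
    return [n for n in names if not (root / n / "meta" / "info.json").is_file()]


if __name__ == "__main__":
    # Falls back to the current directory if HF_LEROBOT_HOME is not set.
    for name in missing_datasets(os.environ.get("HF_LEROBOT_HOME", ".")):
        print(f"missing or incomplete: {name}")
```

An empty result means all four datasets were found; any printed name is worth re-downloading.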

Then download the [Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) model into `../pretrained/`:

```bash
huggingface-cli download \
  --resume-download --local-dir-use-symlinks False \
  Qwen/Qwen2.5-VL-3B-Instruct \
  --local-dir ../pretrained/Qwen2.5-VL-3B-Instruct
```


## 2. Finetune on Libero Dataset

Set the dataset config in `experiments/2_libero/data-libero.yaml` to match the feature keys in each dataset's `info.json` metadata, replacing `HF_LEROBOT_HOME` with the directory you downloaded the datasets to:

```yaml
lerobot_datasets:
  - repo_id: libero_spatial_no_noops_1.0.0_lerobot
    root: HF_LEROBOT_HOME
    select_video_keys: [observation.images.image, observation.images.wrist_image]
    select_state_keys: [observation.state]
    select_action_keys: [action]

  - repo_id: libero_90_no_noops_lerobot
    root: HF_LEROBOT_HOME
    select_video_keys: [observation.images.image, observation.images.wrist_image]
    select_state_keys: [observation.state]
    select_action_keys: [action]

  - repo_id: libero_object_no_noops_1.0.0_lerobot
    root: HF_LEROBOT_HOME
    select_video_keys: [observation.images.image, observation.images.wrist_image]
    select_state_keys: [observation.state]
    select_action_keys: [action]

  - repo_id: libero_10_no_noops_1.0.0_lerobot
    root: HF_LEROBOT_HOME
    # automatically load all features if not specified

```
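
To find the keys to put in `select_video_keys`, `select_state_keys`, and `select_action_keys`, you can read them from the dataset metadata. The following is a minimal sketch, assuming `info.json` lives at `meta/info.json` and has a `features` mapping whose entries carry a `dtype`, as in the LeRobot format. The grouping rules (video or image `dtype`, `observation.state` and `action` name prefixes) are heuristics chosen for illustration, and both function names are hypothetical.

```python
import json
from pathlib import Path


def load_info(dataset_dir):
    """Read meta/info.json from a LeRobot dataset directory."""
    return json.loads((Path(dataset_dir) / "meta" / "info.json").read_text())


def candidate_keys(info):
    """Group feature names from an info.json dict by their likely role.

    Heuristic (an assumption, not an EO-1 rule): camera streams have dtype
    "video" or "image"; states and actions are recognised by name prefix.
    """
    keys = {"video": [], "state": [], "action": []}
    for name, spec in info.get("features", {}).items():
        if spec.get("dtype") in ("video", "image"):
            keys["video"].append(name)
        elif name.startswith("observation.state"):
            keys["state"].append(name)
        elif name == "action" or name.startswith("action."):
            keys["action"].append(name)
    return keys
```

For example, `candidate_keys(load_info("/path/to/libero_spatial_no_noops_1.0.0_lerobot"))` returns the three lists to copy into the YAML entry for that dataset.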

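Start training with the following command; the model is saved in `./outputs/libero_train`. `$ACCELERATE_ARGS` holds the options passed to `accelerate launch` (for example, `--config_file accelerate_config.yaml`).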

```bash
accelerate launch $ACCELERATE_ARGS scripts/train.py \
    --vlm-name-or-path ../pretrained/Qwen2.5-VL-3B-Instruct \
    --data-path experiments/2_libero/data-libero.yaml \
    --chunk-size 8 \
    --dataloader-num-workers 8 \
    --bf16 True \
    --tf32 True \
    --fp16 False \
    --num-train-epochs 50 \
    --per-device-train-batch-size 256 \
    --learning-rate 1e-4 \
    --merger-lr 1e-4 \
    --vision-lr 2e-5 \
    --weight-decay 0.1 \
    --warmup-ratio 0.03 \
    --lr-scheduler-type cosine \
    --gradient-checkpointing True \
    --save-strategy steps \
    --logging-steps 100 \
    --save-steps 5000 \
    --save-total-limit 3 \
    --report-to none \
    --run-name libero_train \
    --attn-implementation flash_attention_2
```
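
To get a feel for how the flags above interact, the sketch below estimates the global batch size, the step count, and how many checkpoints `--save-steps` will produce. It is an approximation under stated assumptions: the rounding follows the common convention of one optimizer step per global batch, warmup is taken as a ceiling of `warmup_ratio * total_steps`, and `num_samples`, `num_processes`, and `grad_accum` are hypothetical inputs that you must supply for your own setup.

```python
import math


def training_schedule(num_samples, per_device_batch, num_processes, epochs,
                      grad_accum=1, warmup_ratio=0.03, save_steps=5000):
    """Estimate schedule quantities from the training flags.

    The exact numbers depend on the trainer's rounding rules, so treat the
    result as a planning aid rather than a guarantee.
    """
    global_batch = per_device_batch * num_processes * grad_accum
    steps_per_epoch = math.ceil(num_samples / global_batch)
    total_steps = steps_per_epoch * epochs
    return {
        "global_batch": global_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
        "checkpoints_written": total_steps // save_steps,
    }


if __name__ == "__main__":
    # Hypothetical example: 1,000,000 training samples on 8 GPUs.
    print(training_schedule(1_000_000, per_device_batch=256,
                            num_processes=8, epochs=50))
```

If `checkpoints_written` comes out as 0, the run finishes before the first periodic save, so a smaller `--save-steps` may be appropriate.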

## 3. Visualize the Trained Model

Use the following command to visualize the trained model. [../tools/openloop.py](../tools/openloop.py) reads a LeRobot dataset and visualizes the action trajectory inferred by the trained model.

```bash
python tools/openloop.py \
    --repo-id libero_spatial_no_noops_1.0.0_lerobot \
    --root HF_LEROBOT_HOME \
    --model_path ./outputs/libero_train/path/to/checkpoint
```

The script visualizes the inferred action trajectory, producing a result like the following:

<img src="../.assets/openloop_example.png" width="500">
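
A plot is easier to interpret alongside a number. The sketch below is not something `tools/openloop.py` is known to compute; it is a hypothetical helper that assumes you have the predicted and the recorded actions available as lists of per-timestep vectors, and it reports the mean absolute error for each action dimension.

```python
def per_dim_mae(predicted, recorded):
    """Mean absolute error per action dimension over one trajectory.

    Both arguments are lists of action vectors (one per timestep) with the
    same length and the same number of dimensions.
    """
    if not predicted or len(predicted) != len(recorded):
        raise ValueError("trajectories must be non-empty and equally long")
    dims = len(recorded[0])
    totals = [0.0] * dims
    for pred_step, rec_step in zip(predicted, recorded):
        for i in range(dims):
            totals[i] += abs(pred_step[i] - rec_step[i])
    return [total / len(recorded) for total in totals]
```

A dimension with a much larger error than the others is a reasonable place to start when checking key selection or normalization.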

## 4. Finetune on Custom Dataset

To fine-tune **EO-1** on your own embodiment, you only need to adapt the configuration file. Specifically, convert your dataset into the LeRobot format, then define the fields that describe where your videos, states, and actions are located.

### 4.1 Dataset Conversion with Any4LeRobot

[Any4LeRobot](https://github.com/Tavish9/any4lerobot) is a comprehensive tool collection for LeRobot that provides data conversion scripts, preprocessing tools, and training workflow helpers.

Supported input formats:

- **Custom Video + State + Action**: Convert from custom data structures
- **RLDS**: Convert from RLDS (Reinforcement Learning Datasets) format
- **RoboSet**: Convert from RoboSet format
- **Custom JSON**: Convert from custom JSON configurations

To convert your dataset, clone the [Any4LeRobot](https://github.com/Tavish9/any4lerobot) repo and follow the instructions for the input format that matches your data.

### 4.2 Dataset Configuration

Once your dataset is converted to LeRobot format, create a configuration file (e.g., `experiments/custom/data-custom.yaml`):

```yaml
# @multimodal data config
# leave empty if only robot control data
mm_datasets:

lerobot_datasets:
  - repo_id: your_custom_dataset_name  # replace with your dataset name
    root: ./your_dataset_path/         # replace with your dataset root path
    select_video_keys: [
        observation.images.image,      # replace with your video feature keys
        observation.images.wrist_image,
      ]
    select_state_keys: [observation.state]  # replace with your state keys
    select_action_keys: [action]            # replace with your action keys
    # Optional fields:
    episodes: [1, 2, 3]                     # specific episodes to load (None = all)
    train_subtask: mix:0.9                  # mix sub-task instructions and overall instructions
    delta_action: false                     # train with delta actions
    state_mode: "MEAN_STD"                  # state normalization mode
    effector_indices: [14, 15]              # indices of effector channels
    weight: 1.0                             # dataset weight for sampling

  # Add more datasets if needed
  - repo_id: another_dataset
    root: ./another_dataset_path/
    # If not specified, uses all keys by default
```
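
Before launching a long run, it may be worth checking that every key named in a config entry actually exists in the dataset. The sketch below is a hypothetical check, not part of EO-1. It assumes the entry is one item of `lerobot_datasets` parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that `info` is the parsed `meta/info.json` with a `features` mapping.

```python
KEY_FIELDS = ("select_video_keys", "select_state_keys", "select_action_keys")


def unknown_keys(entry, info):
    """Return {field: [keys]} for selected keys that are absent from the dataset.

    entry: one parsed item of `lerobot_datasets`.
    info:  the parsed info.json of the dataset that entry points to.
    Fields that are omitted (or empty) are skipped, since the config then
    falls back to using all keys.
    """
    features = info.get("features", {})
    problems = {}
    for field in KEY_FIELDS:
        missing = [key for key in (entry.get(field) or []) if key not in features]
        if missing:
            problems[field] = missing
    return problems
```

An empty dict means every selected key was found; otherwise the result names the field and the offending keys.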

### 4.3 Training Configuration

Create a training script (e.g., `train_custom.sh`) based on the Libero training script:

```bash
#!/bin/bash

# Set your custom dataset path
CUSTOM_DATA_PATH="experiments/custom/data-custom.yaml"
OUTPUT_DIR="./outputs/custom_train"

# Training hyperparameters
ACCELERATE_ARGS="--config_file accelerate_config.yaml"
VLM_PATH="../pretrained/Qwen2.5-VL-3B-Instruct"

# Launch training
accelerate launch $ACCELERATE_ARGS scripts/train.py \
    --vlm-name-or-path $VLM_PATH \
    --data-path $CUSTOM_DATA_PATH \
    --chunk-size 8 \
    --dataloader-num-workers 8 \
    --bf16 True \
    --tf32 True \
    --fp16 False \
    --num-train-epochs 50 \
    --per-device-train-batch-size 256 \
    --learning-rate 1e-4 \
    --merger-lr 1e-4 \
    --vision-lr 2e-5 \
    --weight-decay 0.1 \
    --warmup-ratio 0.03 \
    --lr-scheduler-type cosine \
    --gradient-checkpointing True \
    --save-strategy steps \
    --logging-steps 100 \
    --save-steps 5000 \
    --save-total-limit 3 \
    --report-to none \
    --run-name custom_train \
    --attn-implementation flash_attention_2 \
    --output-dir $OUTPUT_DIR
```

### 4.4 Tips for Custom Datasets

1. **Data Quality**: Ensure your dataset has consistent video frame rates and action frequencies
2. **Feature Keys**: Verify that your `select_video_keys`, `select_state_keys`, and `select_action_keys` match your dataset's metadata
3. **Episode Selection**: Use the `episodes` field to select specific episodes for training/testing
4. **State Normalization**: Choose appropriate `state_mode` (MEAN_STD, MIN_MAX, or NONE) based on your data distribution
5. **Memory Management**: Adjust `--chunk-size` and `--per-device-train-batch-size` based on your GPU memory
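
To make the `state_mode` choice in tip 4 concrete, the sketch below shows the conventional definitions of the three modes. It is an illustration under assumptions, not EO-1's implementation: `MIN_MAX` is assumed to map values to `[-1, 1]`, a small epsilon guards against division by zero, and the layout of `stats` is hypothetical.

```python
def normalize(values, mode, stats):
    """Normalize one state vector element-wise.

    stats holds per-dimension lists: "mean" and "std" for MEAN_STD,
    "min" and "max" for MIN_MAX. NONE returns the values unchanged.
    """
    eps = 1e-8  # assumed guard against zero std or zero range
    if mode == "NONE":
        return list(values)
    if mode == "MEAN_STD":
        return [(v - m) / (s + eps)
                for v, m, s in zip(values, stats["mean"], stats["std"])]
    if mode == "MIN_MAX":
        # Assumed target range: [-1, 1].
        return [2 * (v - lo) / (hi - lo + eps) - 1
                for v, lo, hi in zip(values, stats["min"], stats["max"])]
    raise ValueError(f"unknown state_mode: {mode}")
```

As a rule of thumb, `MEAN_STD` suits roughly bell-shaped channels, while `MIN_MAX` suits channels with hard physical limits such as joint ranges, although outliers in the data will stretch the range.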

### 4.5 Troubleshooting

- **Data Loading Issues**: Check that your dataset follows LeRobot format and paths are correct
- **Memory Errors**: Reduce batch size or chunk size
- **Training Instability**: Adjust learning rates or add gradient clipping
- **Poor Performance**: Verify data quality and feature selection
