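# Fine-Tuning

This tutorial shows how to fine-tune a `GR00T-N1` pretrained checkpoint on a post-training dataset from the same embodiment. It illustrates the benefit of post-training: turning a generalist model into a specialist and measuring the resulting performance gain.

In this tutorial, we use the demo dataset `robot_sim.PickNPlace` from the [demo_data](./demo_data) folder.

We first load the pretrained model and evaluate it on the dataset. We then fine-tune the model on the same dataset and evaluate it again.

## Pretrained Model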

In [None]:
from gr00t.utils.eval import calc_mse_for_single_trajectory
import warnings
from gr00t.experiment.data_config import DATA_CONFIG_MAP
from gr00t.model.policy import Gr00tPolicy
from gr00t.data.schema import EmbodimentTag
from gr00t.data.dataset import LeRobotSingleDataset
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

warnings.simplefilter("ignore", category=FutureWarning)

In [None]:
PRE_TRAINED_MODEL_PATH = "nvidia/GR00T-N1-2B"
EMBODIMENT_TAG = EmbodimentTag.GR1
DATASET_PATH = "../demo_data/robot_sim.PickNPlace"


data_config = DATA_CONFIG_MAP["gr1_arms_only"]
modality_config = data_config.modality_config()
modality_transform = data_config.transform()


pre_trained_policy = Gr00tPolicy(
    model_path=PRE_TRAINED_MODEL_PATH,
    embodiment_tag=EMBODIMENT_TAG,
    modality_config=modality_config,
    modality_transform=modality_transform,
    device=device,
)

dataset = LeRobotSingleDataset(
    dataset_path=DATASET_PATH,
    modality_configs=modality_config,
    video_backend="decord",
    video_backend_kwargs=None,
    transforms=None,  # We'll handle transforms separately through the policy
    embodiment_tag=EMBODIMENT_TAG,
)


mse = calc_mse_for_single_trajectory(
    pre_trained_policy,
    dataset,
    traj_id=0,
    modality_keys=["right_arm", "right_hand"],   # we will only evaluate the right arm and right hand
    steps=150,
    action_horizon=16,
    plot=True
)

print("MSE loss for trajectory 0:", mse)

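Great! The plot shows the predicted actions alongside the ground-truth actions. The predictions are not perfect, but they track the ground truth closely, which indicates that the pretrained checkpoint works well on this dataset.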
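
To make the metric concrete, here is a minimal sketch of a mean squared error between predicted and ground-truth action arrays. It assumes the actions are stacked into arrays of shape `(timesteps, action_dims)`. The helper name `action_mse` and the toy numbers are illustrative only, and the actual implementation inside `calc_mse_for_single_trajectory` may differ in detail.

```python
import numpy as np


def action_mse(pred_actions, gt_actions) -> float:
    """Mean squared error over all timesteps and action dimensions.

    Hypothetical helper for illustration; not part of the gr00t package.
    """
    pred = np.asarray(pred_actions, dtype=np.float64)
    gt = np.asarray(gt_actions, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    return float(np.mean((pred - gt) ** 2))


# Toy data (made-up numbers): 3 timesteps, 2 action dimensions.
gt = np.array([[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]])
pred = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 1.0]])

# Exactly one of the six entries is off by 1.0, so the MSE is 1/6.
toy_mse = action_mse(pred, gt)
print("Toy MSE:", toy_mse)
```

A lower value means the predicted action curves lie closer to the recorded ones. A value of zero would mean the two arrays are identical.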

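Now let's randomly sample 10 trajectories and compute the mean MSE to get a broader picture than a single trajectory can give. Note that `np.random.choice` samples with replacement by default, so the same trajectory can appear more than once.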

In [None]:
total_trajectories = len(dataset.trajectory_lengths)

print("Total trajectories:", total_trajectories)

sampled_trajectories = np.random.choice(total_trajectories, 10)
print("Sampled trajectories:", sampled_trajectories)

all_mses = []

for traj_id in sampled_trajectories:
    mse = calc_mse_for_single_trajectory(
        pre_trained_policy,
        dataset,
        traj_id=traj_id,
        modality_keys=["right_arm", "right_hand"],   # we will only evaluate the right arm and right hand
        steps=150,
        action_horizon=16,
        plot=False
    )
    print(f"Trajectory {traj_id} MSE: {mse:.4f}")
    
    all_mses.append(mse)

print("====================================")
print("Mean MSE:", np.mean(all_mses))
print("Std MSE:", np.std(all_mses))
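
If you want each sampled trajectory to be distinct and the selection to be repeatable between runs, one option is a seeded generator with `replace=False`. The sketch below is an alternative to the cell above, not what the notebook itself does. The helper name `sample_trajectory_ids` and its default seed are assumptions made for illustration.

```python
import numpy as np


def sample_trajectory_ids(total: int, k: int = 10, seed: int = 0) -> np.ndarray:
    """Draw up to k distinct trajectory indices from range(total).

    Hypothetical helper for illustration; not part of the gr00t package.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    rng = np.random.default_rng(seed)  # seeded, so reruns give the same ids
    k = min(k, total)  # cannot draw more distinct ids than exist
    return rng.choice(total, size=k, replace=False)


# With only 5 trajectories available, asking for 10 returns all 5, each once.
ids = sample_trajectory_ids(total=5, k=10, seed=0)
print("Sampled trajectories:", ids)
```

The result can be passed to the evaluation loop in place of `sampled_trajectories`.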


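## Fine-Tuning the Model

Next, we fine-tune the model on the dataset. Without going into the details of the fine-tuning process, we use the `gr00t_finetune.py` script. Run the following command to fine-tune the model: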

```bash
python scripts/gr00t_finetune.py \
    --dataset-path ./demo_data/robot_sim.PickNPlace \
    --num-gpus 1 \
    --max-steps 500 \
    --output-dir /tmp/gr00t-1/finetuned-model \
    --data-config gr1_arms_only
```

_To see the full list of available arguments, run `python scripts/gr00t_finetune.py --help`._

The script saves the fine-tuned model in the `/tmp/gr00t-1/finetuned-model` directory. We will load the checkpoint saved at step `500`.
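
If the output directory ends up holding more than one `checkpoint-<step>` folder, a small helper can pick the one with the highest step rather than hard-coding the path. This is a sketch under that assumption. The function `latest_checkpoint` is hypothetical, and the demo uses a temporary directory with empty folders, not real checkpoints.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the largest step, if any.

    Hypothetical helper for illustration; not part of the gr00t package.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None
    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for child in root.iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            # Compare steps numerically; string order would rank "90" above "500".
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


# Demo on a throwaway directory with empty placeholder folders.
with tempfile.TemporaryDirectory() as tmp:
    for step in (90, 100, 500):
        (Path(tmp) / f"checkpoint-{step}").mkdir()
    demo_latest_name = latest_checkpoint(tmp).name

print(demo_latest_name)
```

In this tutorial you could call `latest_checkpoint("/tmp/gr00t-1/finetuned-model")` and pass the result, converted with `str(...)`, as `model_path`.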

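### Evaluating the Fine-Tuned Model

We can now evaluate the fine-tuned model by running the policy on the dataset and seeing how it performs. We use a utility function to evaluate the policy on the dataset, as in the previous tutorial, [1_pretrained_model.ipynb](1_pretrained_model.ipynb).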

In [None]:
finetuned_model_path = "/tmp/gr00t-1/finetuned-model/checkpoint-500"

from gr00t.utils.eval import calc_mse_for_single_trajectory
import warnings

finetuned_policy = Gr00tPolicy(
    model_path=finetuned_model_path,
    embodiment_tag="new_embodiment",
    modality_config=modality_config,
    modality_transform=modality_transform,
    device=device,
)

warnings.simplefilter("ignore", category=FutureWarning)

mse = calc_mse_for_single_trajectory(
    finetuned_policy,
    dataset,
    traj_id=0,
    modality_keys=["right_arm", "right_hand"],   # we will only evaluate the right arm and right hand
    steps=150,
    action_horizon=16,
    plot=True
)

print("MSE loss for trajectory 0:", mse)

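Yay! We have fine-tuned the model and evaluated it on the dataset. The model has learned the task and tracks the ground-truth actions more closely than the pretrained model, which should show up as a lower MSE on trajectory 0 than the pretrained value reported earlier.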
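
A single trajectory is a thin basis for comparison, so it can help to score both policies on the same set of trajectories and compare the means. The sketch below shows one way to organise that. The helper `compare_mse` is hypothetical, and the two stand-in evaluation functions return made-up placeholder numbers, not measured results.

```python
from statistics import mean
from typing import Callable, Dict, Iterable


def compare_mse(
    eval_fns: Dict[str, Callable[[int], float]],
    traj_ids: Iterable[int],
) -> Dict[str, float]:
    """Average each evaluation function's MSE over the same trajectory ids.

    Hypothetical helper for illustration; not part of the gr00t package.
    """
    ids = list(traj_ids)  # materialise once so every policy sees the same ids
    return {name: mean(fn(t) for t in ids) for name, fn in eval_fns.items()}


# Stand-in evaluators with placeholder values (NOT real results).
def fake_pretrained(traj_id: int) -> float:
    return 0.04 + 0.01 * traj_id


def fake_finetuned(traj_id: int) -> float:
    return 0.01 + 0.005 * traj_id


summary = compare_mse(
    {"pretrained": fake_pretrained, "finetuned": fake_finetuned},
    traj_ids=[0, 1, 2],
)
for name, value in summary.items():
    print(f"{name}: mean MSE = {value:.4f}")

# In the notebook, each evaluator could instead wrap the real utility, e.g.:
#   lambda t: calc_mse_for_single_trajectory(
#       finetuned_policy, dataset, traj_id=t,
#       modality_keys=["right_arm", "right_hand"],
#       steps=150, action_horizon=16, plot=False)
```

Using the same trajectory ids for both policies keeps the comparison fair, since per-trajectory difficulty then affects both means equally.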