KALM: Keypoint Abstraction using Large Models for Object-Relative Imitation Learning
Xiaolin Fang*,
Bo-Ruei Huang*,
Jiayuan Mao*,
Jasmine Shone,
Joshua B. Tenenbaum,
Tomás Lozano-Pérez,
Leslie Pack Kaelbling
ICRA 2025
CoRL Workshop on Language and Robot Learning, 2024. Best Paper Award
[Paper]
[Website]
@inproceedings{fang2025kalm,
title={{KALM: Keypoint Abstraction using Large Models for Object-Relative Imitation Learning}},
author={Xiaolin Fang* and Bo-Ruei Huang* and Jiayuan Mao* and Jasmine Shone and Joshua B. Tenenbaum and Tomás Lozano-Pérez and Leslie Pack Kaelbling},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2025}
}
[News]
Support for the SAM2 backbone has been added.
To use SAM2 for mask proposal, please follow the installation instructions from SAM2 and specify the backbone option USE_SAM2 in local_config.py accordingly.
Install the dependencies using conda:
conda env create -f environment.yaml
conda activate kalm
pip install git+https://github.com/mhamilton723/FeatUp
Fill in the OpenAI API key in the config (GPT_API_KEY).
cp kalm/configs/local_config_template.py kalm/configs/local_config.py
You can specify the prompt (task name and description) in TASKNAME2DESC in kalm/vlm_client.py. We provide an example of the data spec.
python -m scripts.main_kalm_distill_keypoints --save_path keypoint_files/example --use_gpt_guided_mask_in_query_image --task_name drawer --data_path keypoint_files/drawer_example_traj.npz
We assume RGB images and point clouds (in camera frame) as input.
The data is stored in a .npz file with the following format:
{
'reference_video_rgb': np.array, # (N, 512, 512, 3)
'reference_video_pcd': np.array, # (N, 256, 256, 3), in camera frame
'verification_dataset': [
{
'ee_poses_worldframe_trajectory': np.array, # (T, 4, 4)
'gripper_openness_trajectory': np.array, # (T, 1)
'joint_q_trajectory': np.array, # (T, 7)
'extrinsic': np.array, # (4, 4)
'observed_rgb': np.array, # (512, 512, 3)
'observed_pcd': np.array, # (256, 256, 3), in camera frame
} * N
]
}
Train the keypoint-conditioned policy. The parameters can be found in the script.
bash scripts/train_kalmdiffuser.sh
We provide a sample script, main_kalm_eval_robot.py, that shows how the trained models are used at inference time. Please modify it according to your hardware setup.
For debugging purposes, we provide a dummy evaluation pipeline that can be used to visualize the predicted trajectories. To run on your own tasks, please add the corresponding configs in configs/model_config.py.
Run the dummy robot evaluation with an observation loaded from a file.
python -m scripts.main_kalm_eval_robot --task drawer --dummy_data_path keypoint_files/drawer_sample_eval.npz
The data is stored in a .npz file with the following format:
{
'rgb_im': np.array,
'dep_im': np.array,
'extrinsic': np.array,
'intrinsic': np.array,
}
Run the real robot evaluation.
python -m scripts.main_kalm_eval_robot --task drawer
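The two .npz layouts described above can be mocked up for a quick format check before real data is available. The following is a minimal sketch, not the authors' tooling: the helper names (make_keypoint_data_file, make_dummy_eval_file, load_npz), the dtypes, the 512x512 size of the evaluation images, the pinhole intrinsics, and storing verification_dataset as a pickled object array are all assumptions. Only the key names and the shapes listed in the first format come from this README.

```python
import numpy as np


def make_keypoint_data_file(path, num_frames=2, num_demos=2, horizon=5):
    """Write placeholder data using the keys and shapes of the distillation format.

    Assumption: the number of reference frames and the number of verification
    entries are treated as independent here, although the README writes both as N.
    """
    demos = []
    for _ in range(num_demos):
        demos.append({
            "ee_poses_worldframe_trajectory": np.tile(np.eye(4), (horizon, 1, 1)),  # (T, 4, 4)
            "gripper_openness_trajectory": np.ones((horizon, 1)),                   # (T, 1)
            "joint_q_trajectory": np.zeros((horizon, 7)),                           # (T, 7)
            "extrinsic": np.eye(4),                                                 # (4, 4)
            "observed_rgb": np.zeros((512, 512, 3), dtype=np.uint8),
            "observed_pcd": np.zeros((256, 256, 3), dtype=np.float32),              # camera frame
        })
    np.savez(
        path,
        reference_video_rgb=np.zeros((num_frames, 512, 512, 3), dtype=np.uint8),
        reference_video_pcd=np.zeros((num_frames, 256, 256, 3), dtype=np.float32),
        # A list of dicts is stored as a 1-D object array, which NumPy pickles.
        verification_dataset=np.array(demos, dtype=object),
    )


def make_dummy_eval_file(path, height=512, width=512):
    """Write a placeholder observation with the four keys of the dummy eval format.

    Assumption: image size, dtypes, and the pinhole intrinsics are illustrative;
    the README does not specify them.
    """
    intrinsic = np.array(
        [[500.0, 0.0, width / 2.0],
         [0.0, 500.0, height / 2.0],
         [0.0, 0.0, 1.0]],
        dtype=np.float32,
    )
    np.savez(
        path,
        rgb_im=np.zeros((height, width, 3), dtype=np.uint8),
        dep_im=np.ones((height, width), dtype=np.float32),
        extrinsic=np.eye(4, dtype=np.float32),
        intrinsic=intrinsic,
    )


def load_npz(path):
    """Load every array from a .npz file into a plain dict."""
    # allow_pickle=True is required to read back the object array of dicts.
    with np.load(path, allow_pickle=True) as data:
        return {key: data[key] for key in data.files}
```

Because verification_dataset is a list of dicts, NumPy pickles it on save, so the file has to be read back with allow_pickle=True. Assuming the scripts read the keys as documented, files written this way could be passed via --data_path or --dummy_data_path in the commands above, though placeholder arrays will not yield meaningful keypoints or trajectories.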