🦾 GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs

Pu Hua$^{1*}$, Minghuan Liu$^{2,3*}$, Annabella Macaluso$^{2*}$, Yunfeng Lin$^{3}$, Weinan Zhang$^{3}$, Huazhe Xu$^{1}$, Lirui Wang$^{4}$

$^1$ Tsinghua University, $^2$ UCSD, $^3$ Shanghai Jiao Tong University, $^4$ MIT CSAIL. $^*$ Equal contribution.

Project Page | Arxiv

Conference on Robot Learning, 2024

This repo explores using an LLM code generation pipeline to generate task code and demonstrations for zero-shot and few-shot sim2real transfer.

⚙️ Install

  1. Clone the repository

    git clone --recursive
    cd gensim2
  2. Create a conda environment

    conda create -n gensim2 python=3.9 -y
    conda activate gensim2
  3. Install the PyTorch build that matches your CUDA version (check with nvcc --version); a mismatch can cause errors when installing pytorch3d later. Please refer to the PyTorch website for the installation commands. For example, for CUDA 11.8:

    conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
  4. Install other dependencies


    If the installation above fails, you can refer to a detailed step-by-step installation here.
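
For step 3, the comparison between the local CUDA toolkit and the `pytorch-cuda=...` value can be checked with a few lines of Python. This is a minimal sketch, not part of the repository: the helper names `parse_nvcc_release` and `matches_pytorch_cuda` are hypothetical, and the sample string only imitates the "release X.Y" line that `nvcc --version` prints.

```python
import re
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return the CUDA release (e.g. "11.8") found in `nvcc --version` output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def matches_pytorch_cuda(nvcc_output: str, pytorch_cuda: str) -> bool:
    """Compare the toolkit release with the value passed to pytorch-cuda=..."""
    return parse_nvcc_release(nvcc_output) == pytorch_cuda


# Sample text imitating the relevant line of `nvcc --version`.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(matches_pytorch_cuda(sample, "11.8"))
```

To check a real machine, the output of `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` can be passed in place of `sample`.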

🚶Getting Started

1. Generate Simulated Tasks with GenSim2: A Video Tutorial

First, export your OpenAI API key as the OPENAI_KEY environment variable:

export OPENAI_KEY=your_openai_api_key
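
A missing key usually surfaces only once the first LLM request fails, so it can help to check the variable up front. The sketch below assumes nothing about how GenSim2 reads the key beyond the variable name `OPENAI_KEY` given above; the function `require_openai_key` is a hypothetical helper, not a repository API.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of OPENAI_KEY, or raise a readable error if it is unset."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_KEY is not set; run `export OPENAI_KEY=...` first."
        )
    return key


# Passing a mapping explicitly makes the check easy to try without a real key.
print(require_openai_key({"OPENAI_KEY": "your_openai_api_key"}))
```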

To run the GenSim2 pipeline, you can use the following commands:

# Generate a primitive task
python gensim2/pipeline/ prompt_folder=keypoint_pipeline_articulated_3stage prompt_data_folder=data_articulated/

# Generate a long-horizon task with the top-down approach
python gensim2/pipeline/ prompt_folder=keypoint_pipeline_longhorizon_topdown prompt_data_folder=data_longhorizon/

# Generate a long-horizon task with the bottom-up approach
# Remember to store adequate tasks in the prompt_data_folder
python gensim2/pipeline/ prompt_folder=keypoint_pipeline_longhorizon_bottomup prompt_data_folder=data_longhorizon/ mode=bottomup

For more details on creating a kPAM solver with GenSim2 (especially with a multi-modal LLM and rejection sampling), please refer to solver_creation.
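
As a rough illustration of how rejection sampling can interact with the `solver_trials` and `max_regeneration` arguments documented in this section, the sketch below draws a batch of candidates per iteration and stops at the first one that passes a check. It is an assumption-laden toy, not the GenSim2 implementation: `rejection_sample`, `toy_propose`, and the acceptance test are hypothetical stand-ins, and the actual control flow is described in solver_creation.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def rejection_sample(
    propose: Callable[[int], List[T]],
    accept: Callable[[T], bool],
    solver_trials: int = 3,
    max_regeneration: int = 5,
) -> Optional[T]:
    """Return the first accepted candidate, or None when the budget runs out."""
    for _ in range(max_regeneration):
        # One generation iteration: ask for a batch of candidate configs.
        for candidate in propose(solver_trials):
            if accept(candidate):
                return candidate
    return None


# Toy stand-ins: candidates are integers, and only 21 is "accepted".
batches_requested: List[int] = []


def toy_propose(n: int) -> List[int]:
    batches_requested.append(n)
    start = len(batches_requested) * 10
    return [start + i for i in range(n)]


accepted = rejection_sample(toy_propose, lambda c: c == 21)
rejected = rejection_sample(toy_propose, lambda c: False, max_regeneration=2)
print(accepted, rejected)
```

In this toy run the second batch contains the accepted value, while the second call exhausts its two iterations and returns `None`, which corresponds to giving up on a task.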

Common command-line arguments for the pipeline script:

prompt_folder: Name of the prompt folder in prompts/ to use for the pipeline.

prompt_data_folder: Name of the data folder in prompts/ to use for the pipeline, including the asset library and the initial task library.

output_folder: Name of the output folder where the generated results are saved. (Default: logs/)

num_tasks: Number of tasks to generate. (Default: 1)

solver_trials: Number of solver configs to output in each generation iteration. (Default: 3)

max_regeneration: Maximum number of times to regenerate a task before giving up. (Default: 5)

gpt_model: GPT model to use for task proposal and task decomposition. (Default: "gpt-4o")

gpt_temperature: GPT temperature for task proposal and task decomposition. (Default: 0.3, to ensure stability)

visual_solver_generation: Whether to use a multi-modal LLM (GPT-4V) in solver generation. Requires a monitor to load a GUI. (Default: False)

solver_temperature: GPT-4V temperature for solver generation. (Default: 0.8, to encourage diversity)

reject_sampling: Whether to use rejection sampling in the pipeline. Requires a monitor to load a GUI. (Default: True)

target_task_name: Name of the target task to generate. (Default: None)

target_object_name: Name of the target object to use for generation. (Default: None)

mode: Mode of the long-horizon task generation. (Default: "topdown")

For more arguments, please refer to the pipeline config.
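
The pipeline commands pass options as `key=value` pairs rather than `--flags`. The sketch below shows one way such overrides could be merged onto the defaults listed above. It is only an illustration: `DEFAULTS`, `coerce`, and `apply_overrides` are hypothetical names, the type coercion rules are an assumption, and the repository's real option handling is whatever the pipeline config defines.

```python
from typing import Any, Dict, List

# Defaults copied from the argument list above.
DEFAULTS: Dict[str, Any] = {
    "output_folder": "logs/",
    "num_tasks": 1,
    "solver_trials": 3,
    "max_regeneration": 5,
    "gpt_model": "gpt-4o",
    "gpt_temperature": 0.3,
    "visual_solver_generation": False,
    "solver_temperature": 0.8,
    "reject_sampling": True,
    "target_task_name": None,
    "target_object_name": None,
    "mode": "topdown",
}


def coerce(raw: str) -> Any:
    """Turn a command-line string into bool, None, int, or float when possible."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("none", "null"):
        return None
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(overrides: List[str]) -> Dict[str, Any]:
    """Merge key=value overrides onto a copy of DEFAULTS."""
    config = dict(DEFAULTS)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        config[key] = coerce(value)
    return config


config = apply_overrides([
    "prompt_folder=keypoint_pipeline_longhorizon_bottomup",
    "prompt_data_folder=data_longhorizon/",
    "mode=bottomup",
    "num_tasks=2",
])
print(config["mode"], config["num_tasks"], config["reject_sampling"])
```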

To run a generated task with a kPAM solver, you can use the following commands:

# Run the task "OpenBox"
python scripts/ --env OpenBox

Common command-line arguments for the script above:

--env Name of the environment to run. (Default: "OpenBox")

--asset_id ID of the asset to use for the environment. It can be an id (number) from the asset folder assets/articulated_objs/ARTICULATED_NAME/, "" for a pre-defined instance, or "random" for a randomly chosen id from that folder. (Default: "")

--random Whether to randomize the initial poses of the objects. (Add this flag to enable.)

--render Whether to render the environment. Requires a monitor to load a GUI. (Add this flag to enable.)

--num_episode Number of episodes to run. (Default: 5)

--max_steps Maximum number of steps to run in each episode. (Default: 500)

--video Whether to save a video of the environment. (Add this flag to enable.)

--early_stop Whether to stop the episode early once the task is completed. (Add this flag to enable.)
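
The flags above map naturally onto `argparse`, with the boolean options expressed as `store_true` flags. The parser below is a sketch that mirrors the documented names and defaults; it is not copied from the repository script, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Run a generated task (sketch).")
    parser.add_argument("--env", type=str, default="OpenBox")
    parser.add_argument("--asset_id", type=str, default="")
    parser.add_argument("--random", action="store_true")
    parser.add_argument("--render", action="store_true")
    parser.add_argument("--num_episode", type=int, default=5)
    parser.add_argument("--max_steps", type=int, default=500)
    parser.add_argument("--video", action="store_true")
    parser.add_argument("--early_stop", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["--env", "OpenBox", "--random", "--early_stop"])
print(args)
```

Flags that are omitted, such as `--render` and `--video` here, stay `False`, which matches the "add this flag to enable" wording.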

2. Generate Demonstrations

To generate demonstrations for the generated tasks, you can use the following commands:

# Collect data for multiple tasks with multi-processing
# Remember to modify envs in the script to assign tasks for demonstration collection
python scripts/ --dataset gensim2 --random --asset_id random --obs_mode pointcloud --save

# Collect data for a given task (e.g. "OpenBox") without multi-processing
python scripts/ --env OpenBox --dataset gensim2 --random --asset_id random --obs_mode pointcloud --save

Common command-line arguments for the scripts above:

--env Name of the environment to run. If not None, the variable "envs" in the script is overwritten by the given task. (Default: None)

--dataset Name of the collected dataset. It will appear in the folder gensim2/agent/data/.

--asset_id ID of the asset to use for the environment. It can be an id (number) from the asset folder assets/articulated_objs/ARTICULATED_NAME/, "" for a pre-defined instance, or "random" for a randomly chosen id from that folder. (Default: "")

--random Whether to randomize the initial poses of the objects. (Add this flag to enable.)

--render Whether to render the environment. Requires a monitor to load a GUI. (Add this flag to enable.)

--num_episode Number of episodes to run. (Default: 5)

--max_steps Maximum number of steps to run in each episode. (Default: 500)

--obs_mode The modality of the observation; supports "state", "image", and "pointcloud". (Default: "pointcloud")

--save Whether to save the collected data. (Add this flag to enable.)

--nprocs Number of processes to use for data collection. Only applies to multi-process collection. (Default: 20)
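
The three accepted forms of `--asset_id` (an explicit id, `""`, or `"random"`) can be summarized in a small resolver. This is a sketch under stated assumptions: `resolve_asset_id`, the `predefined_id` parameter, and the example ids are hypothetical, and the actual scripts may choose the pre-defined instance and validate ids differently.

```python
import random
from typing import Optional, Sequence


def resolve_asset_id(
    asset_id: str,
    available_ids: Sequence[str],
    predefined_id: str,
    rng: Optional[random.Random] = None,
) -> str:
    """Map the --asset_id argument to a concrete id from the asset folder."""
    if asset_id == "":
        # Empty string: fall back to the pre-defined instance.
        return predefined_id
    if asset_id == "random":
        # "random": pick any id found in the asset folder.
        rng = rng or random.Random()
        return rng.choice(list(available_ids))
    if asset_id not in available_ids:
        raise ValueError(f"unknown asset id {asset_id!r}")
    return asset_id


# Hypothetical ids, standing in for the contents of an asset folder.
ids = ["100", "101", "102"]
print(resolve_asset_id("", ids, predefined_id="100"))
print(resolve_asset_id("101", ids, predefined_id="100"))
print(resolve_asset_id("random", ids, predefined_id="100", rng=random.Random(0)) in ids)
```

Passing a seeded `random.Random` keeps the "random" choice reproducible across runs, which can be useful when comparing collected datasets.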

For a collected dataset, you can try the following commands to check its size:

# Check the size of a dataset, e.g. gensim2
python gensim2/agent/dataset/ --dataset gensim2
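
Independently of the repository's own checker, the footprint of a collected dataset on disk can be inspected with the standard library. The sketch below only counts files and bytes under a directory such as gensim2/agent/data/gensim2; it assumes nothing about the storage format and does not report episode counts. `dataset_disk_usage` is a hypothetical helper.

```python
import os
import tempfile
from typing import Tuple


def dataset_disk_usage(dataset_dir: str) -> Tuple[int, int]:
    """Return (file_count, total_bytes) for everything under dataset_dir."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(dataset_dir):
        for name in files:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return file_count, total_bytes


# Demonstrate on a throwaway directory holding a single 128-byte file.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "episode_0"))
    with open(os.path.join(tmp, "episode_0", "data.bin"), "wb") as f:
        f.write(b"\x00" * 128)
    demo_result = dataset_disk_usage(tmp)

print(demo_result)
```

A directory that does not exist yields `(0, 0)`, because `os.walk` skips unreadable paths by default.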

3. Train & Test Multi-Task Policy

Download the pretrained models from Google Drive and put them in the gensim2/agent/experiments/pretrained_weights/ folder. After generating demonstrations, you can train (and test) a multi-task policy with the following commands:

cd gensim2/agent/
# Train
python suffix=gensim2_multitask domains=gensim2

# Test
python suffix=gensim2_multitask domains=gensim2 train.total_epochs=0 train.pretrained_dir=dir_or_path_to_the_model(.pth)

Common command-line arguments for the commands above:

suffix: Name of the current run.

domains: Name of the dataset to use for training.

env: Name of the environment to use for training. Select from gensim2/agent/experiments/configs/env. (Default: "gensim2")

dataset.action_horizon: Number of predicted action sequences. Should be set to 1 if you use an MLP as the policy head. (Default: 4)

dataset.observation_horizon: Number of historical observation sequences. (Default: 3)

train.total_epochs: Number of training epochs. Set to 0 to evaluate a trained policy. (Default: 250)

train.pretrained_dir: Directory or path of the pretrained model to load for evaluation. (Default: None)

rollout_runner.env_names: Names of the environments to use for testing. You need to modify this argument in the env config, e.g. Line 42 in gensim2 config.

For more arguments, please refer to the training config and env config.
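
To make `dataset.observation_horizon` and `dataset.action_horizon` concrete, the sketch below cuts one trajectory into training samples, pairing the last 3 observations with the next 4 actions. It assumes a plain sliding window without padding at the trajectory boundaries; `make_windows` is a hypothetical function, and the dataset code in the repository may window or pad differently.

```python
from typing import List, Sequence, Tuple


def make_windows(
    observations: Sequence,
    actions: Sequence,
    observation_horizon: int = 3,
    action_horizon: int = 4,
) -> List[Tuple[list, list]]:
    """Pair each observation history with the action chunk starting at its last step."""
    if len(observations) != len(actions):
        raise ValueError("observations and actions must have the same length")
    samples = []
    last_start = len(actions) - action_horizon
    for t in range(observation_horizon - 1, last_start + 1):
        obs_history = list(observations[t - observation_horizon + 1 : t + 1])
        action_chunk = list(actions[t : t + action_horizon])
        samples.append((obs_history, action_chunk))
    return samples


# An 8-step toy trajectory with string placeholders for observations and actions.
obs = [f"o{t}" for t in range(8)]
act = [f"a{t}" for t in range(8)]
samples = make_windows(obs, act)
print(len(samples), samples[0])
```

With `action_horizon=1`, as suggested for an MLP policy head, every sample holds a single action and the same trajectory yields more windows.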

If you are interested in trying PPT in RLBench, please follow these instructions.

4. Sim2Real transfer

For steps on performing Sim2Real transfer, please read the following README.

Task List

Please refer to the task list for a full list of supported tasks.

Acknowledgements


  • We would like to thank Professor Xiaolong Wang for his kind support and discussion of this project. We thank Yuzhe Qin and Fanbo Xiang for their generous help with SAPIEN development. We thank Mazeyu Ji for his help with the real-world experiments.
  • The dataset and modeling code reference HPT.


If you find GenSim2 useful, please consider citing:

      title={GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs}, 
      author={Pu Hua and Minghuan Liu and Annabella Macaluso and Yunfeng Lin and Weinan Zhang and Huazhe Xu and Lirui Wang},

If you have any questions, please contact Pu Hua, Lirui Wang, or Minghuan Liu.


