Wei Sui², Yuxin Guo¹,³, Hu Su³
²D-Robotics
³State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS),
Institute of Automation, Chinese Academy of Sciences
⁴Horizon Robotics
- [2026-06-18] TabletopGen has been accepted to ECCV 2026!
- [2025-12-30] We have released the Robotic Manipulation Demo code and assets on Hugging Face.
- [2025-12-30] A Scene Gallery containing diverse generated 3D tabletop scenes (GLB format) is now available on Hugging Face.
- [2025-12-10] TabletopGen is now open source!
Simulation provides a low-cost, scalable pathway to large-scale robotic manipulation data collection. However, existing 3D scene generation methods can rarely be applied directly to manipulation data synthesis, as their generated scenes often lack instance-level interactivity and physical plausibility.
Focusing on tabletop manipulation, we propose TabletopGen, a training-free and automated tabletop scene generation and interactive simulation engine. Starting from text or a single image, we first obtain independent 3D object models via generative instance extraction. Second, we introduce a novel pose and scale alignment approach that recovers a collision-free scene layout using a Differentiable Rotation Optimizer and a Top-View Spatial Alignment mechanism.
Finally, we assemble the generated scene in a physics simulator with collision geometry, yielding a stable, interactable environment for synthesizing multimodal manipulation data. Extensive experiments and user studies demonstrate that TabletopGen achieves state-of-the-art performance in visual fidelity, layout accuracy, and physical plausibility.
Furthermore, we validate the executability of the collected trajectories on a real robotic arm via zero-shot real-to-sim-to-real policy transfer, indicating that TabletopGen can serve as a reliable data engine for robotic manipulation data synthesis.
We release the 18 scenes showcased on our project website for quick preview and testing. These models cover diverse scene types (e.g., office, dining, workshop) and various styles (e.g., realistic, cartoon).
| Description | Download |
|---|---|
| Project Showcase Collection: contains all 18 high-fidelity interactive scenes featured on our website. | Browse on Hugging Face |
Note: All scenes are in `.glb` format with each object stored as a separate instance, ready to be imported into 3D renderers for visualization or assigned physical properties for robotic simulation.
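To check which instances a downloaded scene contains without installing a 3D library, the sketch below reads the JSON chunk of a `.glb` file using only the Python standard library. It relies on the binary glTF 2.0 container layout (12-byte header followed by a JSON chunk). The function name `list_glb_nodes` is hypothetical, and whether every object in the released scenes appears as a named node is an assumption.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON" read as a little-endian uint32


def list_glb_nodes(data: bytes) -> list[str]:
    """Return the names of all nodes declared in a binary glTF (.glb) buffer."""
    # 12-byte header: magic, container version, total file length.
    magic, version, _total_length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file (bad magic)")
    if version != 2:
        raise ValueError(f"unsupported glTF version: {version}")

    # The first chunk after the header must be the JSON chunk.
    chunk_length, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != CHUNK_JSON:
        raise ValueError("first chunk is not JSON")
    gltf = json.loads(data[20:20 + chunk_length].decode("utf-8"))

    # Unnamed nodes are reported with a positional placeholder.
    return [
        node.get("name", f"<node {i}>")
        for i, node in enumerate(gltf.get("nodes", []))
    ]
```

For a local file, pass its bytes: `list_glb_nodes(open("scene.glb", "rb").read())`, where `scene.glb` stands for whichever scene you downloaded.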
This project utilizes two distinct environments, tabletopgen and rotation, to handle complex dependencies.
We provide an automated setup workflow. You do not need to manually configure the two environments or compile dependencies one by one.
    git clone https://github.com/D-Robotics-AI-Lab/TabletopGen.git
    cd TabletopGen

We provide a shell script that automatically:
- Creates the primary environment `tabletopgen` (CUDA 11.8, Torch 2.6).
- Compiles Grounded-SAM-2 and installs BiRefNet.
- Creates the secondary environment `rotation` (CUDA 12.1, PyTorch3D).
For Linux Users:
Please export your local CUDA path before running the script (required for compiling Grounded-SAM-2):

    # Replace with your own CUDA path (e.g., /usr/local/cuda-11.8)
    export CUDA_HOME=/path/to/cuda-11.8
    bash install_env.sh

Note: This process involves compiling CUDA extensions locally. It may take a few minutes depending on your network and CPU.
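Because a wrong `CUDA_HOME` usually only surfaces partway through compilation, a quick pre-flight check can save time. The following is a minimal sketch and not part of the repository: the helper name `check_cuda_home` is hypothetical, and it assumes the toolkit follows the usual layout with the compiler at `$CUDA_HOME/bin/nvcc`.

```python
import os
from pathlib import Path


def check_cuda_home(env=None) -> list[str]:
    """Return a list of problems found; an empty list means the check passed."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return ["CUDA_HOME is not set; export it before running install_env.sh"]

    problems = []
    root = Path(cuda_home)
    if not root.is_dir():
        problems.append(f"CUDA_HOME points to a missing directory: {root}")
    elif not (root / "bin" / "nvcc").is_file():
        # nvcc is the compiler needed to build CUDA extensions locally.
        problems.append(f"nvcc not found under {root / 'bin'}")
    return problems


if __name__ == "__main__":
    for problem in check_cuda_home():
        print("WARNING:", problem)
```

Running it in the same shell where you exported `CUDA_HOME` prints nothing when the path looks usable.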
Run this script to automatically download the correct checkpoints for BiRefNet, SAM 2.1, and Grounding DINO to their respective directories.

    # Activate the main environment first
    conda activate tabletopgen
    # Run the auto-download script
    python install_scripts/download_weights.py

Before running the pipeline, please configure your API settings (e.g., OpenAI, Hunyuan3D, etc.) in the configuration file:

    # Edit this file with your own API settings
    configs/config.yaml

If you do not have an input image, you can generate one from text using `text2img.py`.
- Arguments:
  - `--doubao_api_key`: Your API key for the generation service.
  - `--text`: Description of the scene (e.g., "A hobby desk with some model cars and tools.").
  - `--id` (Optional): Manually specify the generated image ID. If omitted, it auto-increments.
- Output: Generated images will be saved in `scene_image/`.
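The auto-increment behavior of `--id` can be pictured with the sketch below. This is an illustration, not the code in `text2img.py`: the helper `next_image_id` is hypothetical, and the `scene_image_<N>.png` naming scheme is an assumption inferred from the example path `scene_image/scene_image_1.png` used in this README.

```python
import re
from pathlib import Path

# Assumed naming scheme: scene_image_<N>.png
_IMAGE_NAME = re.compile(r"^scene_image_(\d+)\.png$")


def next_image_id(image_dir: str = "scene_image") -> int:
    """Return one more than the highest numeric ID found in image_dir (1 if none)."""
    directory = Path(image_dir)
    if not directory.is_dir():
        return 1
    ids = [
        int(match.group(1))
        for path in directory.iterdir()
        if (match := _IMAGE_NAME.match(path.name))
    ]
    # Files that do not match the pattern are ignored.
    return max(ids, default=0) + 1
```

Under this scheme, a folder that already holds `scene_image_1.png` and `scene_image_7.png` would yield ID 8, so gaps in the numbering are not reused.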
    conda activate tabletopgen
    python text2img.py --doubao_api_key "YOUR_API_KEY" --text "A hobby desk with some model cars and tools."

Run the main pipeline to generate the 3D scene.
Arguments:
- `--input_image` (Required): Path to the input image file.
- `--scene_id` (Optional): Manually specify the Scene ID (directory name).
- `--skip_step` (Optional): Skip specific pipeline steps (space-separated integers). Useful for debugging or resuming.
Example Commands:

    conda activate tabletopgen
    python pipeline.py --input_image scene_image/scene_image_1.png
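To make the "space-separated integers" form of `--skip_step` concrete, here is a small `argparse` sketch that mirrors the documented arguments. It is an assumption about how such an interface could be declared, not a copy of `pipeline.py`; the function name `build_parser` and the defaults are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Sketch of the documented pipeline.py arguments"
    )
    parser.add_argument("--input_image", required=True,
                        help="Path to the input image file")
    parser.add_argument("--scene_id", default=None,
                        help="Scene ID (directory name); None means not specified")
    # nargs="*" collects zero or more space-separated integers into a list.
    parser.add_argument("--skip_step", type=int, nargs="*", default=[],
                        help="Pipeline steps to skip")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(
    ["--input_image", "scene_image/scene_image_1.png", "--skip_step", "1", "2"]
)
print(args.skip_step)  # [1, 2]
```

With an interface like this, skipping two steps would be written as `--skip_step 1 2` on the command line. Which step numbers exist and what each one does is defined by the pipeline code itself.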
Critical Tip for Best Results: In Step 1 of the pipeline, we strongly recommend adjusting the Grounded-SAM-2 thresholds to ensure all object instances are correctly segmented and extracted. You can tweak the following parameters in the pipeline code:
- `box_threshold`
- `text_threshold`
- `confidence_threshold`
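As a rough intuition for why these thresholds matter, the toy example below shows generic score thresholding: detections scoring under the cutoff are dropped, so a cutoff that is too high can silently remove real objects from the scene. The `Detection` class, the `filter_detections` helper, and the scores are made up for illustration; the exact role of each of the three parameters is defined by the pipeline code and Grounded-SAM-2, not by this sketch.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float  # detector confidence in [0, 1]


def filter_detections(detections, box_threshold):
    """Keep only detections whose score reaches the threshold."""
    return [d for d in detections if d.score >= box_threshold]


# Made-up detections for illustration only.
candidates = [
    Detection("mug", 0.62),
    Detection("pen", 0.31),
    Detection("keyboard", 0.48),
]

for threshold in (0.5, 0.3):
    kept = [d.label for d in filter_detections(candidates, threshold)]
    print(threshold, kept)
```

At 0.5 only the mug survives, while 0.3 keeps all three objects. Lowering a threshold recovers low-confidence instances but can also admit false positives, so it is worth inspecting the segmentation output after each change.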
View GLB Model:
Once the generation is complete, you can view the assembled 3D scene at:
output_scene/scene_{id}/scene_{id}.glb
NVIDIA Isaac Sim (Physics-based Assembly): For a scene assembly with full physical properties, use the Isaac Sim script.
- Prerequisite: Ensure NVIDIA Isaac Sim is installed (Installation Guide).
    # Run the Isaac Sim visualization script
    python isaac_final_scene.py

To demonstrate the physical interactivity and realism of the generated scenes, we provide a Pick-and-Place demo using a Franka Emika Panda robot in NVIDIA Isaac Sim.
This demo showcases the robot picking and placing generated objects within the TabletopGen scenes, verifying accurate collision meshes and physical properties.
Get the Demo Kit: Due to the large size of simulation assets, the demo code and USD files are hosted externally.
How to Run:
- Download the `manipulation_demo` folder from the link above.
- Ensure NVIDIA Isaac Sim is installed.
- Please refer to the detailed guide in `manipulation_demo/README.md` to run the following scripts:
  - `pick_place.py`: Run the interactive pick-and-place demo.
  - `collect.py`: Execute the data collection pipeline.
Please scan the QR code to connect with us on WeChat and join the community for the latest updates and discussions with the authors.
We would like to express our gratitude to the following projects and services that made this work possible:
If you use this code in your research, please cite our project:
@article{wang2025tabletopgen,
title={TabletopGen: Instance-Level Interactive 3D Tabletop Scene Generation from Text or Single Image},
author={Wang, Ziqian and He, Yonghao and Yang, Licheng and Zou, Wei and Ma, Hongxuan and Liu, Liu and Sui, Wei and Guo, Yuxin and Su, Hu},
journal={arXiv preprint arXiv:2512.01204},
year={2025}
}