Official repository for the paper "DragMesh: Interactive 3D Generation Made Easy".
Tianshan Zhang*, Zeyu Zhang*β , Hao Tang#
*Equal contribution. β Project lead. #Corresponding author.
Note
GAPartNet (https://pku-epic.github.io/GAPartNet/) is the canonical dataset source for all articulated assets used in DragMesh.
teaser.mp4
If you find DragMesh helpful, please cite:
While generative models have excelled at creating static 3D content, the pursuit of systems that understand how objects move and respond to interactions remains a fundamental challenge. Current methods for articulated motion lie at a crossroads: they are either physically consistent but too slow for real-time use, or generative but violate basic kinematic constraints. We present DragMesh, a robust framework for real-time interactive 3D articulation built around a lightweight motion generation core. Our core contribution is a novel decoupled kinematic reasoning and motion generation framework. First, we infer the latent joint parameters by decoupling semantic intent reasoning (which determines the joint type) from geometric regression (which determines the axis and origin using our Kinematics Prediction Network (KPP-Net)). Second, to leverage the compact, continuous, and singularity-free properties of dual quaternions for representing rigid body motion, we develop a novel Dual Quaternion VAE (DQ-VAE). This DQ-VAE receives these predicted priors, along with the original user drag, to generate a complete, plausible motion trajectory. To ensure strict adherence to kinematics, we inject the joint priors at every layer of the DQ-VAE's non-autoregressive Transformer decoder using FiLM (Feature-wise Linear Modulation) conditioning. This persistent, multi-scale guidance is complemented by a numerically-stable cross-product loss to guarantee axis alignment. This decoupled design allows DragMesh to achieve real-time performance and enables plausible, generative articulation on novel objects without retraining, offering a practical step toward generative 3D intelligence.
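For a concrete picture of two mechanisms named above, the sketch below shows FiLM conditioning driven by a joint-prior embedding and a cross-product axis penalty in plain PyTorch. It is illustrative only: tensor shapes and module boundaries are assumptions, and the actual implementations live in modules/model_v2.py and modules/loss.py.

```python
# Illustrative sketch only; the real layers live in modules/model_v2.py and the real
# objectives in modules/loss.py. Shapes and module boundaries here are assumptions.
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale/shift decoder features with joint priors."""
    def __init__(self, cond_dim: int, hidden_dim: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * hidden_dim)

    def forward(self, h: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # h: (B, T, hidden) decoder features; cond: (B, cond_dim) joint-prior embedding.
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        return gamma.unsqueeze(1) * h + beta.unsqueeze(1)

def axis_alignment_loss(pred_axis: torch.Tensor, gt_axis: torch.Tensor, eps: float = 1e-8):
    # |a x b| = |sin(theta)| for unit vectors, so the penalty stays smooth near perfect
    # alignment instead of relying on acos(), whose gradient blows up there.
    pred = pred_axis / (pred_axis.norm(dim=-1, keepdim=True) + eps)
    gt = gt_axis / (gt_axis.norm(dim=-1, keepdim=True) + eps)
    return torch.cross(pred, gt, dim=-1).norm(dim=-1).mean()
```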
- Upload the DragMesh paper and project page.
- Release the training and inference code.
- Provide GAPartNet processing pipeline and LMDB builder.
- Share checkpoints on Hugging Face.
- Create an interactive presentation.
- Publish a Hugging Face Space for browser-based manipulation.
We ship a full Conda specification in environment.yml (environment name: dragmesh). It targets Python 3.10, CUDA 12.1, and PyTorch 2.4.1. Create or update via:
conda env create -f environment.yml
conda activate dragmesh
# or update an existing env
conda env update -f environment.yml --prune

The spec already installs trimesh, pyrender, pygltflib, viser, Objaverse, SAPIEN, pytorch3d, and tiny-cuda-nn. If you prefer a minimal setup, install those packages manually before running the scripts.
Chamfer distance kernels are required for the VAE loss. Clone and build the upstream project:
git clone https://github.com/ThibaultGROUEIX/ChamferDistancePytorch.git
cd ChamferDistancePytorch
python setup.py install
cd ..

- Visit https://pku-epic.github.io/GAPartNet/ and download the articulated assets for the categories listed in config/category_split_v2.json.
- Arrange files so that each object folder contains mobility_annotation_gapartnet.urdf, meta.json, and textured meshes (*.obj). Example:
  data/gapartnet/<object_id>/
  |- mobility_annotation_gapartnet.urdf
  |- meta.json
  |- textured_objs/*.obj
- Convert to LMDB for fast training IO:
  python utils/build_lmdb.py \
    --dataset_root data/gapartnet \
    --output_prefix data/dragmesh \
    --config config/category_split_v2.json \
    --num_frames 16 \
    --num_points 4096
  # Produces data/dragmesh_train.lmdb and data/dragmesh_val.lmdb
- Use utils/balanced_dataset_utils.get_motion_type_weights with WeightedRandomSampler if you need balanced revolute/prismatic sampling (see the sketch below).
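A minimal sketch of balanced sampling, assuming get_motion_type_weights returns one weight per sample and that the LMDB-backed dataset object has already been constructed (the exact class and helper signatures live in utils/balanced_dataset_utils.py):

```python
# Sketch only: exact dataset class and helper signature are in utils/balanced_dataset_utils.py.
from torch.utils.data import DataLoader, WeightedRandomSampler
from utils.balanced_dataset_utils import get_motion_type_weights

# `train_dataset` is assumed to be the LMDB-backed dataset built from data/dragmesh_train.lmdb.
weights = get_motion_type_weights(train_dataset)          # one weight per sample (assumed)
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(train_dataset, batch_size=16, sampler=sampler, num_workers=4)
```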
python scripts/train_vae_v2.py \
--lmdb_train_path data/dragmesh_train.lmdb \
--lmdb_val_path data/dragmesh_val.lmdb \
--data_split_json_path config/category_split_v2.json \
--output_dir outputs/vae \
--num_epochs 300 \
--batch_size 16 \
--latent_dim 256 \
--num_frames 16 \
--mesh_recon_weight 10.0 \
--cd_weight 30.0 \
--kl_weight 0.001 \
--kl_anneal_epochs 80 \
--use_tensorboard --use_wandb

python scripts/train_predictor.py \
--lmdb_train_path data/dragmesh_train.lmdb \
--lmdb_val_path data/dragmesh_val.lmdb \
--data_split_json_path config/category_split_v2.json \
--output_dir outputs/kpp \
--batch_size 32 \
--num_epochs 200 \
--encoder_type attention \
--head_type decoupled \
--predict_type True

Both scripts log to TensorBoard and, optionally, Weights & Biases. Check modules/loss.py and modules/predictor_loss.py for objective details.
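One detail worth spelling out is how --kl_weight and --kl_anneal_epochs in the VAE command above interact. A common interpretation is a linear warm-up of the KL term toward its target weight; this is an assumption, not verified against scripts/train_vae_v2.py:

```python
# Assumed linear KL warm-up; verify against scripts/train_vae_v2.py before relying on it.
def kl_weight_at(epoch: int, target: float = 1e-3, anneal_epochs: int = 80) -> float:
    """Ramp the KL weight linearly from 0 to `target` over `anneal_epochs` epochs."""
    return target * min(1.0, epoch / max(1, anneal_epochs))
```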
python inference_animation.py \
--dataset_root data/gapartnet \
--checkpoint best_model.pth \
--sample_id 40261 \
--output_dir results_deterministic \
--num_samples 5 \
--num_frames 16

Outputs MP4, GIF, and animated GLB per object. If you plan to process a large dataset using dual-quaternion ground truth (no manual drags), prefer this script: running only KPP predictions frame-by-frame may introduce cumulative drift that eventually breaks physical alignment.
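The dual-quaternion ground truth mentioned above is built in modules/data_loader_v2.py. As a rough, self-contained illustration of the representation (helper names here are hypothetical, not the repository's API), a single revolute step defined by an axis, an origin, and an angle can be packed into a unit dual quaternion like this:

```python
# Illustrative only: how a revolute step (axis, origin, angle) maps to a dual quaternion.
# Helper names are hypothetical; see modules/data_loader_v2.py for the real label builder.
import numpy as np

def quat_mul(a, b):
    # Hamilton product, (w, x, y, z) convention.
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def revolute_to_dual_quaternion(axis, origin, angle):
    axis = axis / np.linalg.norm(axis)
    q_r = np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])  # real (rotation) part
    # Rotating about an axis through `origin` equals a rotation plus translation t = origin - R @ origin.
    w, x, y, z = q_r
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    t = origin - R @ origin
    q_d = 0.5 * quat_mul(np.concatenate([[0.0], t]), q_r)                  # dual (translation) part
    return q_r, q_d
```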
python inference_pipeline.py \
--mesh_file assets/cabinet.obj \
--mask_file assets/cabinet_vertex_labels.npy \
--mask_format vertex \
--drag_point 0.12,0.48,0.05 \
--drag_vector 0.0,0.0,0.2 \
--manual_joint_type revolute \
--kpp_checkpoint best_model_kpp.pth \
--vae_checkpoint best_model.pth \
--output_dir outputs/cabinet_demo \
--num_samples 3

In the example above, --drag_point is an x,y,z point on the movable part and --drag_vector encodes the direction and magnitude of the drag. Supply drag points/vectors directly through the CLI (no viewer UI). Use --manual_joint_type revolute or --manual_joint_type prismatic to force a specific motion family when needed. If you omit the manual override, the pipeline first trusts KPP-Net and, when --llm_endpoint and --llm_api_key are provided, falls back to the LLM-based classifier described in inference_pipeline.py. Outputs share the same MP4/GIF/GLB format as the batch pipeline.
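If you need to create the per-vertex mask yourself, a minimal sketch with trimesh and numpy might look like the following. The label convention expected by --mask_file (values, dtype) is an assumption here; check inference_pipeline.py for the exact format.

```python
# Sketch: build a per-vertex label array marking the movable part of a mesh.
# The exact values/dtype expected by --mask_file are assumptions; see inference_pipeline.py.
import numpy as np
import trimesh

mesh = trimesh.load("assets/cabinet.obj", force="mesh")
labels = np.zeros(len(mesh.vertices), dtype=np.int64)

# Example heuristic: mark every vertex inside an axis-aligned box around the door/drawer.
lo, hi = np.array([0.0, 0.3, -0.2]), np.array([0.4, 0.9, 0.3])   # hypothetical bounds
inside = np.all((mesh.vertices >= lo) & (mesh.vertices <= hi), axis=1)
labels[inside] = 1

np.save("assets/cabinet_vertex_labels.npy", labels)
```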
- GIF/MP4 exports rely on pyrender and imageio. For headless servers, set PYOPENGL_PLATFORM=osmesa (see the sketch below).
- inference_animation.py also exports animated GLB files for direct use in GLTF viewers.
- For additional visualization tooling (e.g., rerun or Blender scripts), see inference_animation.py and inference_pipeline.py.
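If you drive pyrender from your own Python wrapper instead of the shell, the platform override has to happen before the import. A minimal sketch, assuming OSMesa is installed:

```python
# Sketch for headless rendering: the platform must be selected before pyrender is imported.
import os
os.environ["PYOPENGL_PLATFORM"] = "osmesa"  # requires an OSMesa installation

import pyrender  # noqa: E402  (imported after setting the platform on purpose)
```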
| Scenario | Description |
|---|---|
| Drawer opening | Translational motion predicted entirely from drag cues. |
| Microwave door | Revolute joint inference with FiLM conditioned motion generation. |
| Bucket handle | High curvature rotations showing the benefit of dual quaternions. |
Translational drags: six demo clips (predicted_z_0.mp4 per object).

Rotational drags: six demo clips (predicted_z_0.mp4 per object).

Self-spin / free-spin: six demo clips (predicted_z_0.mp4 per object).
| Path | Content |
|---|---|
| modules/model_v2.py | Dual Quaternion VAE (encoder, decoder, FiLM Transformer). |
| modules/predictor.py | KPP-Net architecture. |
| modules/data_loader_v2.py | GAPartNet parsing and dual quaternion labels. |
| utils/balanced_dataset_utils.py | LMDB dataset builder and balanced sampling utilities. |
| scripts/train_vae_v2.py, scripts/train_predictor.py | Training entry points. |
| inference_animation*.py, inference_pipeline.py | Inference pipelines (batch and interactive). |
| ChamferDistancePytorch/ | CUDA kernels for Chamfer distance and auxiliary metrics. |
DragMesh/
├── assets/                        # Logos, teaser figures, future demo media
│   ├── dragmesh_logo.png
│   └── teaser.png
├── checkpoints/
│   ├── dqvae.pth
│   └── kpp.pth
├── ChamferDistancePytorch/        # CUDA/C++ Chamfer distance implementation (build with setup.py)
├── config/
│   └── category_split_v2.json     # GAPartNet in-domain split definition
├── modules/
│   ├── model_v2.py                # Dual Quaternion VAE architecture
│   ├── predictor.py               # KPP-Net for kinematic reasoning
│   ├── loss.py                    # VAE objectives (Chamfer, dual quaternions, constraints)
│   ├── predictor_loss.py          # Loss terms for KPP-Net
│   └── data_loader_v2.py          # GAPartNet loader + dual quaternion ground truth builder
├── scripts/
│   ├── train_vae_v2.py            # Training loop for the VAE motion prior
│   └── train_predictor.py         # Training loop for KPP-Net
├── utils/
│   ├── balanced_dataset_utils.py  # LMDB dataset class + balanced sampling helper
│   ├── dataset_utils.py           # Category-aware dataset wrappers
│   └── build_lmdb.py              # CLI to build LMDBs from GAPartNet folders
├── partnet/
│   └── Hunyuan3D-Part/            # External resources (P3-SAM, XPart docs)
├── results_deterministic/         # Placeholder for inference outputs (MP4/GIF/GLB)
├── inference_animation.py         # Batch evaluation + GLB export
├── inference_animation_kpp.py     # Dataset-driven animation tests (legacy interface)
├── inference_glb.py               # Helper for converting trajectories to GLB
├── inference_pipeline.py          # Interactive mesh manipulation pipeline
├── environment.yml                # Conda environment (name: dragmesh)
└── README.md
We thank the GAPartNet team for the articulated dataset, and upstream projects such as ChamferDistancePytorch, Objaverse, SAPIEN, and PyTorch3D for their open-source contributions.
