Jiawei Weng¹*, Saining Zhang¹*†, Zhenxin Diao²*, Peishuo Li¹, Henghaofan Zhang², Junhao Chen², Hao Zhao²†
¹Nanyang Technological University, Singapore  ²Tsinghua University, China
*Equal contribution. †Corresponding author.
PartFlow is a feedforward 3D editing network that edits an existing 3D asset to match a target edit image, with no per-asset optimisation and no 3D mask at inference. We train it on Pxform, a high-quality 3D editing dataset of 100K+ consistent before/after pairs across seven edit types, in which every edit is grounded in semantic 3D parts.
- Feedforward: a single inference run per edit, with no per-asset optimisation
- Semantic-part grounded: trained on Pxform's part-level pairs
- Mask-free at inference: only needs the source asset and a target image
- Two-stage flow: sparse-structure edit ➜ structured-latent edit
PartFlow edits in two stages, conditioning a pretrained 3D generative prior (TRELLIS) on the source asset's latent and a target edit image. Each stage is a controlled flow model with a zero-linear gated reference branch and a mask-aware training loss:
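- Stage 1: Sparse-structure (SS) flow. Takes the source SS latent and the edit condition, and predicts the edited sparse-structure latent on the 16³ grid, which the frozen SS decoder turns into the edited voxel structure.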
- Stage 2: Structured-latent (SLAT) flow. Takes the source SLAT, mapped onto the edited voxel coordinates, together with the edit condition, and predicts the edited SLAT, which the TRELLIS decoders turn into a textured edit.glb.
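The stage description above names its ingredients without spelling them out. The NumPy sketch below illustrates two of them under explicit assumptions: that the "zero-linear gated" reference branch is a linear projection initialised to zero (ControlNet-style), so the pretrained TRELLIS prediction is reproduced exactly at the start of training, and that the mask-aware loss up-weights the flow-matching error on voxels of the edited part. `ZeroGatedBranch`, `mask_aware_flow_loss` and `edit_weight` are hypothetical names, not PartFlow's implementation.

```python
import numpy as np


class ZeroGatedBranch:
    """Hypothetical reference branch: adds a linear projection of reference
    (source-asset) features to the backbone's hidden states. Weights start at
    zero, so the block initially reproduces the pretrained backbone exactly."""

    def __init__(self, ref_dim: int, hidden_dim: int):
        self.weight = np.zeros((ref_dim, hidden_dim), dtype=np.float32)
        self.bias = np.zeros(hidden_dim, dtype=np.float32)

    def __call__(self, hidden: np.ndarray, ref: np.ndarray) -> np.ndarray:
        # hidden: [N, hidden_dim] backbone tokens; ref: [N, ref_dim] reference tokens
        return hidden + ref @ self.weight + self.bias


def mask_aware_flow_loss(v_pred, v_target, edit_mask, edit_weight=1.0):
    """Hypothetical mask-aware flow-matching loss.

    v_pred, v_target: [N, C] predicted / target velocities per voxel token.
    edit_mask: [N], 1 on voxels of the edited part and 0 elsewhere.
    Edited voxels get weight 1 + edit_weight; all other voxels get weight 1.
    """
    per_voxel = np.mean((v_pred - v_target) ** 2, axis=1)  # MSE over channels
    weights = 1.0 + edit_weight * edit_mask
    return float(np.mean(weights * per_voxel))


# At initialisation the gated branch leaves the backbone output unchanged.
hidden = np.random.default_rng(0).standard_normal((4, 16)).astype(np.float32)
ref = np.ones((4, 8), dtype=np.float32)
branch = ZeroGatedBranch(ref_dim=8, hidden_dim=16)
assert np.array_equal(branch(hidden, ref), hidden)

# One edited voxel out of four: weights [2, 1, 1, 1], unit error everywhere.
loss = mask_aware_flow_loss(np.zeros((4, 2)), np.ones((4, 2)), np.array([1.0, 0.0, 0.0, 0.0]))
print(loss)  # 1.25
```

Zero initialisation is the usual way to attach a new conditioning path to a pretrained prior without degrading it on the first training step; the weighting form of the loss is only one reasonable choice.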
PartFlow reuses the TRELLIS runtime: the same CUDA extensions, the same frozen DINOv2 image encoder, and the same frozen SS / SLAT decoders. Set up TRELLIS first, then add PartFlow on top. Tested with Python 3.10, PyTorch 2.5.0 and CUDA 12.4.
1. Set up the TRELLIS environment. Follow the official TRELLIS installation guide to create the conda env and build the CUDA extensions (spconv, flash-attn, kaolin, diff_gaussian_rasterization, nvdiffrast, diffoctreerast). For convenience, an equivalent one-liner is bundled here:
. ./setup.sh --new-env --basic --flash-attn --diffoctreerast --spconv \
    --mipgaussian --kaolin --nvdiffrast
2. Install PartFlow's extra Python dependencies into the same env:
pip install -r requirements.txt
3. Download the trained weights:
python download_weights.py   # -> ./weights/{stage1_ss,stage2_slat}/
This pulls the two trained stage models from ART-3D/PartFlow_models on Hugging Face.
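Before running inference it can help to confirm that the CUDA extensions from step 1 are importable in the active env. This is a minimal stdlib-only sketch; the import names in `TRELLIS_EXTENSIONS` (for example `flash_attn` for flash-attn) are assumed, `missing_modules` is a hypothetical helper, and `importlib.util.find_spec` only checks that a module can be located, not that its CUDA kernels actually load.

```python
import importlib.util
import sys

# Assumed import names for PyTorch and the TRELLIS extensions listed in step 1.
TRELLIS_EXTENSIONS = [
    "torch",
    "spconv",
    "flash_attn",
    "kaolin",
    "diff_gaussian_rasterization",
    "nvdiffrast",
    "diffoctreerast",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 10):
        print(f"note: tested with Python 3.10, running {sys.version.split()[0]}")
    missing = missing_modules(TRELLIS_EXTENSIONS)
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty "missing" line is a necessary but not sufficient sign of a working setup; a mismatched CUDA build still only shows up on first import or first kernel call.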
Inference reads pre-encoded inputs. Each case is a directory:
<case_dir>/
ori_ss_latents.npz # key `mean`: float32 [8, 16, 16, 16] — source sparse-structure latent
ori_latents.npz # `coords` [N,3] int, `feats` [N,8] f32 — source structured latent (SLAT)
edit_img.png # the target edit image (RGB or RGBA)
case_meta.json # optional metadata (prompt, edit type, ...)
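The layout above can be checked before launching a long batch. The sketch assumes only what the listing states: a `mean` array of shape [8, 16, 16, 16] in ori_ss_latents.npz, integer `coords` [N, 3] and `feats` [N, 8] in ori_latents.npz, and an edit_img.png. `check_case` is a hypothetical helper, not part of PartFlow, and it does not decode the image.

```python
import tempfile
from pathlib import Path

import numpy as np


def check_case(case_dir):
    """Return a list of problems in one case directory (empty = layout OK)."""
    case_dir = Path(case_dir)
    problems = []

    if not (case_dir / "edit_img.png").is_file():
        problems.append("missing edit_img.png")

    ss_path = case_dir / "ori_ss_latents.npz"
    if not ss_path.is_file():
        problems.append("missing ori_ss_latents.npz")
    else:
        with np.load(ss_path) as ss:
            if "mean" not in ss.files:
                problems.append("ori_ss_latents.npz has no `mean` key")
            elif ss["mean"].shape != (8, 16, 16, 16):
                problems.append(f"`mean` has shape {ss['mean'].shape}, expected (8, 16, 16, 16)")

    slat_path = case_dir / "ori_latents.npz"
    if not slat_path.is_file():
        problems.append("missing ori_latents.npz")
    else:
        with np.load(slat_path) as slat:
            if not {"coords", "feats"} <= set(slat.files):
                problems.append("ori_latents.npz needs `coords` and `feats`")
            else:
                coords, feats = slat["coords"], slat["feats"]
                if coords.ndim != 2 or coords.shape[1] != 3:
                    problems.append(f"`coords` has shape {coords.shape}, expected (N, 3)")
                elif not np.issubdtype(coords.dtype, np.integer):
                    problems.append(f"`coords` has dtype {coords.dtype}, expected an integer type")
                if feats.ndim != 2 or feats.shape[1] != 8:
                    problems.append(f"`feats` has shape {feats.shape}, expected (N, 8)")
                elif coords.ndim == 2 and coords.shape[0] != feats.shape[0]:
                    problems.append("`coords` and `feats` disagree on N")
    return problems


# Build a synthetic, well-formed case in a temporary directory and check it.
with tempfile.TemporaryDirectory() as tmp:
    case = Path(tmp)
    np.savez(case / "ori_ss_latents.npz", mean=np.zeros((8, 16, 16, 16), np.float32))
    np.savez(case / "ori_latents.npz",
             coords=np.zeros((5, 3), np.int32), feats=np.zeros((5, 8), np.float32))
    (case / "edit_img.png").write_bytes(b"")  # placeholder; the image is never decoded
    ok_problems = check_case(case)
    (case / "edit_img.png").unlink()
    bad_problems = check_case(case)

print(ok_problems)   # []
print(bad_problems)  # ['missing edit_img.png']
```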
ori_ss_latents.npz / ori_latents.npz are the TRELLIS latents of the
source asset; produce them with the standard TRELLIS image-to-3D encoder.
Ground-truth edit_* files, if present, are ignored by inference.
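Stage 2 does not consume the source SLAT as-is: per the pipeline description, it is first mapped onto the voxel coordinates produced by Stage 1. How PartFlow performs that mapping is not described here. One plausible reading, sketched below purely as an assumption, copies the source feature for every edited voxel that also exists in the source and zero-fills voxels that are new; `map_slat_to_coords` and the zero-fill policy are hypothetical.

```python
import numpy as np


def map_slat_to_coords(src_coords, src_feats, dst_coords):
    """Hypothetical remap of a source SLAT onto edited voxel coordinates.

    src_coords: [N, 3] int voxel indices; src_feats: [N, C] features
    (the `coords` / `feats` arrays of ori_latents.npz).
    dst_coords: [M, 3] int voxel indices of the edited structure.
    Returns dst_feats [M, C] and a boolean mask [M]: True where the voxel
    existed in the source (feature copied), False where it is new (zeros).
    """
    lookup = {tuple(c): i for i, c in enumerate(src_coords.tolist())}
    dst_feats = np.zeros((len(dst_coords), src_feats.shape[1]), dtype=src_feats.dtype)
    kept = np.zeros(len(dst_coords), dtype=bool)
    for j, c in enumerate(dst_coords.tolist()):
        i = lookup.get(tuple(c))
        if i is not None:
            dst_feats[j] = src_feats[i]
            kept[j] = True
    return dst_feats, kept


src_coords = np.array([[0, 0, 0], [1, 2, 3]], dtype=np.int32)
src_feats = np.arange(16, dtype=np.float32).reshape(2, 8)
dst_coords = np.array([[1, 2, 3], [5, 5, 5]], dtype=np.int32)  # one kept voxel, one new voxel
dst_feats, kept = map_slat_to_coords(src_coords, src_feats, dst_coords)
print(kept)  # [ True False]
```

The dictionary lookup keeps the sketch readable; for hundreds of thousands of voxels a vectorised variant (for example, hashing coordinates into a single integer key and using `np.searchsorted`) would be faster.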
# single case
python inference.py --input examples/mod_glass_disc_table --output_dir outputs
# a whole directory of cases
python inference.py --input /path/to/pxform/cases --output_dir outputs
# useful flags
#   --steps 50            flow-sampling steps
#   --cfg_strength 0.0    classifier-free guidance strength (0 = condition only)
#   --manifest ids.json   restrict to a list of case ids
#   --skip_existing       resume a partial run
Each case writes outputs/<edit_id>/edit.glb and pred_slat.npz.
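To see which cases of a batch still lack outputs without launching the pipeline, a small stdlib sketch can compare case directories against the output layout stated above. It assumes each output folder is named after its case directory (the README calls it `<edit_id>`, so this is an assumption) and that a case is complete once both edit.glb and pred_slat.npz exist. `pending_cases` is a hypothetical helper, and writing the result as a JSON list for `--manifest` assumes that is the format the flag expects.

```python
import json
import tempfile
from pathlib import Path

OUTPUT_FILES = ("edit.glb", "pred_slat.npz")


def pending_cases(cases_root, output_dir):
    """Return sorted ids of case directories whose outputs are incomplete.
    Assumes the output folder name equals the case folder name."""
    pending = []
    for case in sorted(p for p in Path(cases_root).iterdir() if p.is_dir()):
        out = Path(output_dir) / case.name
        if not all((out / name).is_file() for name in OUTPUT_FILES):
            pending.append(case.name)
    return pending


# Demo on a throwaway layout: case "a" is finished, case "b" only has edit.glb.
with tempfile.TemporaryDirectory() as tmp:
    root, outputs = Path(tmp, "cases"), Path(tmp, "outputs")
    for case_id in ("a", "b"):
        (root / case_id).mkdir(parents=True)
        (outputs / case_id).mkdir(parents=True)
    for name in OUTPUT_FILES:
        (outputs / "a" / name).touch()
    (outputs / "b" / "edit.glb").touch()
    todo = pending_cases(root, outputs)
    # Assumption: --manifest accepts a JSON list of case ids.
    (Path(tmp) / "ids.json").write_text(json.dumps(todo))

print(todo)  # ['b']
```

The resulting ids.json would then be passed as `python inference.py --input /path/to/pxform/cases --output_dir outputs --manifest ids.json`; `--skip_existing` alone covers the common resume case.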
PartFlow/
├── inference.py          two-stage inference pipeline + CLI
├── dataset.py            PxformDataset (per-case loader)
├── download_weights.py   fetch weights from Hugging Face
├── configs/              Stage 1 / Stage 2 model configs
├── examples/             one ready-to-run example case
├── trellis/              TRELLIS backbone + PartFlow stage models
├── assets/               README figures
├── setup.sh              CUDA-extension installer
└── requirements.txt      pure-pip dependencies
@article{weng2026partflow,
title = {Feedforward 3D Editing Learns from Semantic-Part Transformation},
author = {Weng, Jiawei and Zhang, Saining and Diao, Zhenxin and Li, Peishuo and Zhang, Henghaofan and Chen, Junhao and Zhao, Hao},
journal = {arXiv preprint arXiv:2605.27351},
year = {2026}
}
Built on TRELLIS.



