Project Page | Paper | Hugging Face | Twitter
🔥🔥🔥DiT-3D is a novel Diffusion Transformer for 3D shape generation that performs the denoising process directly on voxelized point clouds using plain Transformers.
DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation
Shentong Mo, Enze Xie, Ruihang Chu, Lanqing Hong, Matthias Nießner, Zhenguo Li
arXiv 2023.
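To build intuition for the voxelized representation the model denoises, here is a minimal occupancy-grid sketch. This is illustrative only and an assumption on our part — the repository's actual voxelization modules may differ in details (e.g., density weighting or trilinear interpolation):

```python
import numpy as np

def voxelize(points, V=32):
    """Map an (N, 3) point cloud in [-1, 1]^3 to a V^3 occupancy grid.

    Illustrative sketch only; not the repo's voxelization code.
    """
    # Map coordinates from [-1, 1] to voxel indices in [0, V-1].
    idx = np.clip(((points + 1.0) * 0.5 * V).astype(int), 0, V - 1)
    grid = np.zeros((V, V, V), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid
```

A single point at the origin lands in the central voxel of the grid.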
Make sure the following dependencies are installed.
python==3.6
pytorch==1.7.1
torchvision==0.8.2
cudatoolkit==11.0
matplotlib==2.2.5
tqdm==4.32.1
open3d==0.9.0
trimesh==3.7.12
scipy==1.5.1
Install PyTorchEMD by
cd metrics/PyTorchEMD
python setup.py install
cp build/**/emd_cuda.cpython-36m-x86_64-linux-gnu.so .
# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
# GPU kNN
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
Alternatively, simply run
pip install -r requirements.txt
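The PyTorchEMD extension above provides a fast CUDA implementation of the Earth Mover's Distance used by the metrics. For intuition, a slow CPU-only reference (illustrative only, not the extension's API) can solve EMD for equal-size point sets exactly as an assignment problem with `scipy.optimize.linear_sum_assignment`:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_reference(a, b):
    """Earth Mover's Distance between two equal-size (N, 3) point clouds,
    solved exactly as a min-cost assignment (O(N^3), CPU-only sketch)."""
    # Pairwise Euclidean distances between all points of a and b.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()
```

For a cloud matched against a translated copy of itself, the optimal matching is the identity, so the EMD equals the translation length.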
For generation, we use the ShapeNet point cloud dataset, which can be downloaded here.
Pretrained models can be downloaded here.
Note that this pre-trained model is based on the Small model with a patch size of 4. We reported the XL models in the main table of our paper for the final comparisons.
When testing this S/4 model, you should obtain performance close to 56.31 (1-NNA-CD), 55.82 (1-NNA-EMD), 47.21 (COV-CD), and 50.75 (COV-EMD).
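For reference, 1-NNA measures how often a sample's nearest neighbor (under Chamfer or Earth Mover's distance) comes from its own set, so values near 50% indicate that generated and reference shapes are hard to tell apart. A naive Chamfer-based sketch (illustrative only; the repo's actual metric code lives under `metrics/`) could look like:

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between two (N, 3) point clouds."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(1).mean() + d.min(0).mean()

def one_nna(gen, ref):
    """1-NN accuracy over the union of generated and reference clouds.

    0.5 is ideal (sets indistinguishable); 1.0 means fully separable.
    """
    clouds = gen + ref
    labels = [0] * len(gen) + [1] * len(ref)
    correct = 0
    for i in range(len(clouds)):
        # Nearest neighbor of cloud i, excluding itself.
        dists = [chamfer(clouds[i], clouds[j]) if j != i else np.inf
                 for j in range(len(clouds))]
        correct += labels[int(np.argmin(dists))] == labels[i]
    return correct / len(clouds)
```

If the two sets are far apart, every sample's nearest neighbor is in its own set and 1-NNA saturates at 1.0.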
Our DiT-3D supports multiple configuration settings:
- voxel sizes: 16, 32, 64
- patch dimensions: 2, 4, 8
- model complexity: Small (S), Base (B), Large (L) and Extra Large (XL)
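Since DiT-3D patchifies the V^3 voxel grid into non-overlapping p^3 patches, the Transformer sequence length is (V/p)^3 tokens. A quick sketch of how the settings above interact:

```python
def num_tokens(voxel_size, patch_size):
    """Sequence length after 3D patchification: (V/p)^3 patch tokens."""
    assert voxel_size % patch_size == 0
    return (voxel_size // patch_size) ** 3
```

Notably, the pairings 16/2, 32/4, and 64/8 all yield the same 512-token sequence, while shrinking the patch size at a fixed voxel size grows the sequence cubically (e.g., 32/2 gives 4096 tokens).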
For training the DiT-3D model (Small, patch size 4) with a voxel size of 32 on the chair category, please run
$ python train.py --distribution_type 'multi' \
--dataroot /path/to/ShapeNetCore.v2.PC15k/ \
--category chair \
--experiment_name /path/to/experiments \
--model_type 'DiT-S/4' \
--bs 16 \
--voxel_size 32 \
--lr 1e-4 \
--use_tb
# to use window attention, please add the flags below
--window_size 4 --window_block_indexes '0,3,6,9'
Please check more training scripts in the scripts folder.
For testing and visualization on the chair category using the DiT-3D model (S/4, no window attention) with a voxel size of 32, please run
$ python test.py --dataroot /path/to/ShapeNetCore.v2.PC15k/ \
--category chair \
--model_type 'DiT-S/4' \
--voxel_size 32 \
--model MODEL_PATH
For point cloud rendering, we use Mitsuba for visualization.
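Point-cloud renderers built on Mitsuba typically expect the input to be centered and scaled before scene construction. A common preprocessing step (an assumption here, not part of this repo's scripts) is:

```python
import numpy as np

def normalize_for_render(points):
    """Center an (N, 3) cloud at the origin and scale it into the unit sphere.

    Hypothetical helper for rendering; not from this repository.
    """
    pts = points - points.mean(axis=0)
    return pts / np.linalg.norm(pts, axis=1).max()
```

After normalization, the cloud is zero-mean and its farthest point sits on the unit sphere, so one camera setup works across shapes.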
If you find this repository useful, please cite our paper:
@article{mo2023dit3d,
title = {DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation},
author = {Shentong Mo and Enze Xie and Ruihang Chu and Lewei Yao and Lanqing Hong and Matthias Nießner and Zhenguo Li},
journal = {arXiv preprint arXiv: 2307.01831},
year = {2023}
}
This repo is inspired by DiT and PVD. Thanks to the authors for their wonderful work.