Skip to content

GaryZhu1996/UMIFormer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

UMIFormer

This repository contains the source code for the paper UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction.

Highlight

Architecture

Performance

Methods 1 view 2 views 3 views 4 views 5 views 8 views 12 views 16 views 20 views
3D-R2N2 0.560 / 0.351 0.603 / 0.368 0.617 / 0.372 0.625 / 0.378 0.634 / 0.382 0.635 / 0.383 0.636 / 0.382 0.636 / 0.382 0.636 / 0.383
AttSets 0.642 / 0.395 0.662 / 0.418 0.670 / 0.426 0.675 / 0.430 0.677 / 0.432 0.685 / 0.444 0.688 / 0.445 0.692 / 0.447 0.693 / 0.448
Pix2Vox++ 0.670 / 0.436 0.695 / 0.452 0.704 / 0.455 0.708 / 0.457 0.711 / 0.458 0.715 / 0.459 0.717 / 0.460 0.718 / 0.461 0.719 / 0.462
GARNet 0.673 / 0.418 0.705 / 0.455 0.716 / 0.468 0.722 / 0.475 0.726 / 0.479 0.731 / 0.486 0.734 / 0.489 0.736 / 0.491 0.737 / 0.492
GARNet+ 0.655 / 0.399 0.696 / 0.446 0.712 / 0.465 0.719 / 0.475 0.725 / 0.481 0.733 / 0.491 0.737 / 0.498 0.740 / 0.501 0.742 / 0.504
EVolT - / - - / - - / - 0.609 / 0.358 - / - 0.698 / 0.448 0.720 / 0.475 0.729 / 0.486 0.735 / 0.492
LegoFormer 0.519 / 0.282 0.644 / 0.392 0.679 / 0.428 0.694 / 0.444 0.703 / 0.453 0.713 / 0.464 0.717 / 0.470 0.719 / 0.472 0.721 / 0.472
3D-C2FT 0.629 / 0.371 0.678 / 0.424 0.695 / 0.443 0.702 / 0.452 0.702 / 0.458 0.716 / 0.468 0.720 / 0.475 0.723 / 0.477 0.724 / 0.479
3D-RETR
(3 view)
0.674 / - 0.707 / - 0.716 / - 0.720 / - 0.723 / - 0.727 / - 0.729 / - 0.730 / - 0.731 / -
3D-RETR* 0.680 / - 0.701 / - 0.716 / - 0.725 / - 0.736 / - 0.739 / - 0.747 / - 0.755 / - 0.757 / -
UMIFormer 0.6802 / 0.4281 0.7384 / 0.4919 0.7518 / 0.5067 0.7573 / 0.5127 0.7612 / 0.5168 0.7661 / 0.5213 0.7682 / 0.5232 0.7696 / 0.5245 0.7702 / 0.5251
UMIFormer+ 0.5672 / 0.3177 0.7115 / 0.4568 0.7447 / 0.4947 0.7588 / 0.5104 0.7681 / 0.5216 0.7790 / 0.5348 0.7843 / 0.5415 0.7873 / 0.5451 0.7886 / 0.5466
* The results in this row are derived from models that train individually for the various number of input views.

Demo

Cite this work

@article{zhu2023umiformer,
  title={UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction},
  author={Zhu, Zhenwei and Yang, Liying and Li, Ning and Jiang, Chaohao and Liang, Yanyan},
  journal={arXiv preprint arXiv:2302.13987},
  year={2023}
}

Datasets

We use the ShapeNet in our experiments, which are available below:

Pretrained Models

The pretrained models on ShapeNet are available as follows:

Please download them and put into ./pths/

Prerequisites

Clone the Code Repository

git clone https://github.com/GaryZhu1996/UMIFormer

Install Python Dependencies

cd UMIFormer
conda env create -f environment.yml

Modify the path of datasets

Modify __C.DATASETS.SHAPENET.RENDERING_PATH and __C.DATASETS.SHAPENET.VOXEL_PATH in config.py to the correct path of ShapeNet dataset.

3D Reconstruction Model

For training, please use the following command:

CUDA_VISIBLE_DEVICES=gpu_ids python -m torch.distributed.launch --nproc_per_node=num_of_gpu runner.py

For testing, please follow the steps below:

  1. Update the setting of __C.CONST.WEIGHTS in config.py as the path of the reconstruction model;
  2. Run the following command to evaluate the performance of the model when facing the number of input views defined by __C.CONST.N_VIEWS_RENDERING in config.py:
CUDA_VISIBLE_DEVICES=gpu_ids python -m torch.distributed.launch --nproc_per_node=num_of_gpu runner.py --test
  1. Run the following command to evaluate the performance of the model when facing various numbers of input views mentioned in the paper:
CUDA_VISIBLE_DEVICES=gpu_ids python -m torch.distributed.launch --nproc_per_node=num_of_gpu runner.py --batch_test

Our Other Works on Multi-View 3D Reconstruction

@article{zhu2023garnet,
  title={GARNet: Global-Aware Multi-View 3D Reconstruction Network and the Cost-Performance Tradeoff},
  author={Zhu, Zhenwei and Yang, Liying and Lin, Xuxin and Yang, Lin and Liang, Yanyan},
  journal={Pattern Recognition},
  pages={109674},
  year={2023},
  publisher={Elsevier}
}
@InProceedings{Yang_2023_ICCV,
    author    = {Yang, Liying and Zhu, Zhenwei and Lin, Xuxin and Nong, Jian and Liang, Yanyan},
    title     = {Long-Range Grouping Transformer for Multi-View 3D Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {18257-18267}
}

License

This project is open sourced under MIT license.

About

Official implementation of "UMIFormer: Mining the Correlations between Similar Tokens for Multi-View 3D Reconstruction" [ICCV 2023]

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages