Group Multi-View Transformer for 3D Shape Analysis with Spatial Encoding
This code is tested on Python 3.6 and PyTorch 1.0+.
First, download the ModelNet datasets and unzip them into the data/ directory as follows:
- Dodecahedron-20 [this link] (rendered from 20 virtual cameras placed at the twenty vertices of a dodecahedron on a sphere bounding the 3D object).
- Circle-12 [this link] (rendered from 12 virtual cameras evenly spaced around the object at a 30-degree elevation angle).
- Pretrain the CNN model in the train_cnn/ directory:
python train_cnn.py
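The pretrained CNN acts as a shared per-view feature extractor: every rendered view of a shape passes through the same backbone, yielding one descriptor per view. A minimal sketch of that stage, where a toy convolutional backbone stands in for the ResNet-18 selected by -cnn_name resnet18 (the function name and shapes are illustrative assumptions, not the repository's actual API):

```python
import torch

# Sketch of the per-view feature-extraction stage a pretrained CNN provides:
# each rendered view goes through one shared backbone. The toy backbone below
# is a stand-in for ResNet-18; names/shapes are illustrative, not the repo's API.
def extract_view_features(views, backbone):
    """views: (B, V, C, H, W) -> per-view features (B, V, D)."""
    B, V, C, H, W = views.shape
    flat = views.reshape(B * V, C, H, W)   # fold the view axis into the batch axis
    feats = backbone(flat)                 # (B*V, D)
    return feats.reshape(B, V, -1)         # (B, V, D)

toy_backbone = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),                    # 16-d descriptor per view
)
views = torch.randn(2, 20, 3, 64, 64)      # 2 shapes x 20 rendered views
feats = extract_view_features(views, toy_backbone)  # -> (2, 20, 16)
```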
- Train GMViT:
python train.py -name GMViT -num_views 20 -group_num 12 -cnn_name resnet18
- Distill the student models:
python KD_GMViT_simple.py -name GMViT_simple -num_views 20 -group_num 12
python KD_GMViT_mini.py -name GMViT_mini -num_views 20 -group_num 12
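The two KD_* scripts train compact students against the full GMViT teacher. A minimal sketch of the standard distillation objective such scripts typically optimize, combining a temperature-scaled KL term on the teacher's soft targets with cross-entropy on the ground-truth labels (the function and hyperparameters below are illustrative assumptions, not the repository's exact loss):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Standard KD objective (Hinton-style): soft KL term + hard CE term.
    T and alpha are illustrative defaults, not the repo's settings."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),   # student log-probs
        F.softmax(teacher_logits / T, dim=1),       # teacher soft targets
        reduction="batchmean",
    ) * (T * T)                                     # rescale gradient magnitude
    hard = F.cross_entropy(student_logits, labels)  # supervised term
    return alpha * soft + (1.0 - alpha) * hard

# toy usage: a batch of 4 samples over the 40 ModelNet40 classes
s = torch.randn(4, 40)                # student logits
t = torch.randn(4, 40)                # teacher logits
y = torch.randint(0, 40, (4,))        # ground-truth labels
loss = distillation_loss(s, t, y)
```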
We provide a pre-trained model [here] (code: bs66), which achieves an Overall Accuracy (OA) of 97.77% and a mean Accuracy (mA) of 97.07% on ModelNet40 under the Dodecahedron-20 setting. Download it and place it in the models/GMViT/models/ directory.
L. Xu, Q. Cui, W. Xu, E. Chen, H. Tong, and Y. Tang, "Walk in views: Multi-view path aggregation graph network for 3D shape analysis," Information Fusion, vol. 103, 2024, 102131.
L. Xu, Q. Cui, R. Hong, W. Xu, E. Chen, X. Yuan, and Y. Tang, "Group multi-view transformer for 3D shape analysis with spatial encoding," arXiv preprint arXiv:2312.16477, 2023.