Bach Tran,
Binh-Son Hua,
Anh Tuan Tran,
Minh Hoai
VinAI Research, Vietnam
Abstract: Recently, great progress has been made in 3D deep learning with the emergence of deep neural networks specifically designed for 3D point clouds. These networks are often trained from scratch or from pre-trained models learned purely from point cloud data. Inspired by the success of deep learning in the image domain, we devise a novel pre-training technique for better model initialization by utilizing the multi-view rendering of the 3D data. Our pre-training is self-supervised by a local pixel/point level correspondence loss computed from perspective projection and a global image/point cloud level loss based on knowledge distillation, thus effectively improving upon popular point cloud networks, including PointNet, DGCNN and SR-UNet. These improved models outperform existing state-of-the-art methods on various datasets and downstream tasks. We also analyze the benefits of synthetic and real data for pre-training, and observe that pre-training on synthetic data is also useful for high-level downstream tasks.
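As a rough illustration of the two self-supervised objectives described above, here is a minimal NumPy sketch (not the implementation in this repository; all function names, the pinhole-camera setup, and the loss forms are simplified assumptions). The local loss compares each point's feature with the feature of the pixel it projects to under perspective projection; the global loss distills an image-level embedding into the point-cloud embedding:

```python
import numpy as np

def project_points(points, K):
    """Perspective-project Nx3 camera-space points to pixel coordinates
    using a 3x3 intrinsic matrix K (pinhole camera model)."""
    uvw = points @ K.T               # (N, 3) homogeneous pixel coords
    return uvw[:, :2] / uvw[:, 2:3]  # divide by depth -> (N, 2) pixels

def local_correspondence_loss(point_feats, pixel_feat_map, points, K):
    """Mean squared error between each point's feature and the feature of
    the pixel it projects to (nearest-pixel lookup); a sketch of a
    pixel/point-level correspondence loss."""
    uv = np.round(project_points(points, K)).astype(int)
    h, w, _ = pixel_feat_map.shape
    u = np.clip(uv[:, 0], 0, w - 1)  # u indexes columns
    v = np.clip(uv[:, 1], 0, h - 1)  # v indexes rows
    matched = pixel_feat_map[v, u]   # (N, C) pixel features at projections
    return float(np.mean(np.sum((point_feats - matched) ** 2, axis=1)))

def global_distillation_loss(cloud_embedding, image_embedding):
    """Cosine-distance loss pulling the point-cloud embedding toward a
    (frozen) image-network embedding, in the spirit of knowledge
    distillation at the image/point-cloud level."""
    a = cloud_embedding / np.linalg.norm(cloud_embedding)
    b = image_embedding / np.linalg.norm(image_embedding)
    return 1.0 - float(a @ b)
```

For example, with intrinsics `K = [[100, 0, 32], [0, 100, 32], [0, 0, 1]]`, the point `(0, 0, 1)` projects to pixel `(32, 32)`, the image center. The actual training objective used in the paper operates on learned features from the point cloud network and a pre-trained image network; this sketch only fixes the geometry and loss shapes.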
Details of the model architecture and experimental results can be found in our paper:
@inproceedings{tran2022selfsup,
title={Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis},
author={Bach Tran and Binh-Son Hua and Anh Tuan Tran and Minh Hoai},
booktitle={Proceedings of the Asian Conference on Computer Vision (ACCV)},
year={2022}
}
Please CITE our paper whenever our model implementation is used to help produce published results or incorporated into other software.
The codebase is tested on:
- Ubuntu
- CUDA 11.0
- MinkowskiEngine v0.5.0
- Clone this repo:
git clone https://github.com/VinAIResearch/selfsup_pcd.git
cd selfsup_pcd
- Install dependencies:
conda env create -f environment.yml
conda activate sspcd
Download the MinkowskiEngine v0.5.0 source code from https://github.com/NVIDIA/MinkowskiEngine/releases/tag/v0.5.0, then compile and install it.
- Synthetic data: we evaluate our pre-trained model on two synthetic datasets, ModelNet40 for the classification task and ShapeNetPart for the part segmentation task, with the official training and test splits.
- Real data: we also evaluate our pre-trained model on real datasets. In particular, we use ScanObjectNN with two variants (without and with background) for the classification task, S3DIS and ScanNet for the semantic segmentation task, and ScanNet and SUN RGB-D for the object detection task.
We also provide official pre-trained models.
Please follow the instructions.
Our source code is developed based on the following codebases:

We thank the authors for making their code available.
If you have any questions, please drop an email to tranxuanbach1412@gmail.com or open an issue in this repository.