This is the official PyTorch implementation of our CVPR 2022 paper: Generalized Binary Search Network for Highly-Efficient Multi-View Stereo. In this work, we propose a novel method for highly efficient MVS that remarkably decreases the memory footprint while clearly advancing state-of-the-art depth prediction performance. We investigate which search strategy is reasonably optimal for MVS, taking both efficiency and effectiveness into account. We first formulate MVS as a binary search problem and accordingly propose a generalized binary search network for MVS. Specifically, in each step, the depth range is split into 2 bins, with 1 extra error-tolerance bin on each side. A classification is performed to identify which bin contains the true depth. We also design three mechanisms to respectively handle classification errors, deal with out-of-range samples, and decrease the training memory. With the new formulation, our method samples only a very small number of depth hypotheses in each step, which is highly memory efficient and also greatly facilitates quick training convergence.
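The search step described above can be illustrated with a toy sketch. This is not the network from the paper: the learned classifier is replaced by an oracle that already knows the true depth, and the function names `gbs_step` and `search` are hypothetical. It only shows how 2 bins plus 1 error-tolerance bin on each side narrow the depth range step by step.

```python
def gbs_step(d_min, d_max, true_depth):
    """One search step: return the (low, high) edges of the bin holding true_depth.

    The current range [d_min, d_max) is split into 2 bins, and 1 extra
    error-tolerance bin of the same width is added on each side (4 bins total).
    Returns None if true_depth falls outside all 4 bins.
    """
    width = (d_max - d_min) / 2.0
    # 5 edges -> 4 bins, starting one bin width below d_min.
    edges = [d_min + (i - 1) * width for i in range(5)]
    # Oracle "classification" (assumption): a real model would predict this index.
    for k in range(4):
        if edges[k] <= true_depth < edges[k + 1]:
            return edges[k], edges[k + 1]
    return None


def search(d_min, d_max, true_depth, steps):
    """Repeat gbs_step, narrowing the range to the selected bin each time."""
    for _ in range(steps):
        selected = gbs_step(d_min, d_max, true_depth)
        if selected is None:  # out-of-range sample: stop refining
            break
        d_min, d_max = selected
    return d_min, d_max
```

For example, with an assumed initial range of 425 to 935, `search(425.0, 935.0, 600.0, 8)` halves the range at every step and ends with a bin about 2 units wide, while only 4 bins are considered per step.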
Clone repo:
git clone https://github.com/MiZhenxing/GBi-Net.git
cd GBi-Net
The code is tested with Python 3.6, PyTorch 1.9.0, and CUDA 11.1 on an NVIDIA GeForce RTX 3090. We recommend using Anaconda to manage dependencies. You may need to change the torch and CUDA versions in requirements.txt to match your machine.
conda create -n gbinet python=3.6
conda activate gbinet
pip install -r requirements.txt
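After installation, it can help to confirm which versions the environment actually picked up. The following is a minimal sketch, not part of the repository; the helper name `describe_environment` is hypothetical, and it reports `torch` as `None` when PyTorch is not installed instead of failing.

```python
import importlib.util
import sys


def describe_environment():
    """Collect the Python, PyTorch, and CUDA versions visible to this interpreter."""
    info = {"python": "{}.{}".format(*sys.version_info[:2])}
    if importlib.util.find_spec("torch") is None:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info
    import torch

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # CUDA version PyTorch was built with, or None
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

Compare the printed values with the tested versions listed above before starting a long training run.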
Download the DTU dataset pre-processed by MVSNet and extract the archive. You can use gdown to download it from Google Drive. Refer to MVSNet for detailed documentation of the file formats.
Download the original-resolution depth maps provided by YaoYao. Extract the archive and rename the folder to Depths_raw.
Download the original-resolution Rectified images from the DTU website. Extract the archive and rename the folder to Rectified_raw.
Merge the three folders together and you should get a dataset folder like below:
dtu
├── Cameras
├── Depths
├── Depths_raw
├── Rectified
└── Rectified_raw
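Before training, the merged DTU folder can be checked against the layout above. This is a small sketch that is not part of the repository; the helper name `missing_subfolders` is hypothetical, and the folder names are taken from the tree shown above.

```python
import os

# Subfolders listed in the DTU layout above.
DTU_SUBFOLDERS = ["Cameras", "Depths", "Depths_raw", "Rectified", "Rectified_raw"]


def missing_subfolders(root_dir, expected=DTU_SUBFOLDERS):
    """Return the expected subfolder names that do not exist under root_dir."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(root_dir, name))]
```

An empty list from `missing_subfolders("path/to/dtu")` means all five folders are present; any returned names point to an archive that still needs to be extracted or renamed.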
Download the low-res set from BlendedMVS. Extract the file and you should get a data folder like below:
BlendedMVS
└── low_res
Download the Tanks and Temples testing set pre-processed by MVSNet. For the intermediate subset, remember to replace the cameras with those in short_range_caemeras_for_mvsnet.zip in the intermediate folder, see here. You should get a dataset folder like below:
tankandtemples
├── advanced
│   ├── Auditorium
│   ├── Ballroom
│   ├── Courtroom
│   ├── Museum
│   ├── Palace
│   └── Temple
└── intermediate
    ├── Family
    ├── Francis
    ├── Horse
    ├── Lighthouse
    ├── M60
    ├── Panther
    ├── Playground
    └── Train
We use YAML files to set options in our code. Several key options are explained below; the other options are self-explanatory in the code. Before running our code, you may need to change true_gpu, data: root_dir, and model_path (only for testing).
output_dir
A relative or absolute folder path for writing logs and depth maps.
true_gpu
The true GPU IDs, used for setting CUDA_VISIBLE_DEVICES in the code. You may change it to your GPU IDs.
gpu
The GPU IDs used in your experiment. If true_gpu: "5, 6", then you could use gpu: [0], gpu: [1], or gpu: [0, 1].
max_depth
Max depth of the binary tree in the code, same as the stage number described in our paper. For historical reasons, the stage number in the code means the number of 3D networks.
model_path
The checkpoint file path used for testing.
stage_num
The number of 3D networks.
depth2stage
The correspondence between iterations and 3D networks.
data: root_dir
A relative or absolute folder path for training or testing data. To run the code successfully, you may need to change it to your data folder.
fusion: output_dir
A relative or absolute folder path for writing point clouds.
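The relationship between true_gpu and gpu can be sketched as follows. Once CUDA_VISIBLE_DEVICES is set, the visible GPUs are renumbered from 0, so gpu holds indices into the true_gpu list. The helper names below are hypothetical, and how the repository itself parses these options is an assumption not confirmed here.

```python
import os


def physical_gpu_ids(true_gpu, gpu):
    """Map the logical indices in `gpu` to the physical IDs listed in `true_gpu`."""
    physical = [int(part) for part in true_gpu.split(",")]
    return [physical[index] for index in gpu]


def apply_true_gpu(true_gpu):
    """Expose only the GPUs in `true_gpu`; must run before CUDA is initialized."""
    os.environ["CUDA_VISIBLE_DEVICES"] = true_gpu.replace(" ", "")
```

With true_gpu: "5, 6", `physical_gpu_ids("5, 6", [1])` returns `[6]`: logical device 1 is physical GPU 6.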
Train the model on DTU dataset with random crop augmentation
python train_gbinet.py --cfg configs/train_dtu_crop.yaml
Train the model on DTU dataset without random crop augmentation
python train_gbinet.py --cfg configs/train_dtu.yaml
Train the model on BlendedMVS dataset
python train_gbinet.py --cfg configs/train_blended.yaml
We provide several checkpoints in the checkpoints folder. You can use test_gbinet.py to reconstruct depth maps and point clouds with these checkpoints. To reproduce the DTU results in our paper, run the commands below:
python test_gbinet.py --cfg configs/test_dtu_crop.yaml
python test_gbinet.py --cfg configs/test_dtu.yaml
After you get the point clouds, you can follow the instructions on the DTU website and in PatchmatchNet to quantitatively evaluate them.
To reproduce the Tanks and Temples results in our paper, run the commands below:
python test_gbinet.py --cfg configs/test_tanks_intermediate.yaml
python test_gbinet.py --cfg configs/test_tanks_advanced.yaml
After you get the point clouds, you can submit them to the Tanks and Temples website for quantitative evaluation.
Our code is distributed under the MIT License. See the LICENSE file for more information.
@inproceedings{mi2022generalized,
title={Generalized Binary Search Network for Highly-Efficient Multi-View Stereo},
author={Zhenxing Mi and Chang Di and Dan Xu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2022}
}
If you have any questions, please raise an issue or email Zhenxing Mi (zmiaa@connect.ust.hk).
Our code follows several awesome repositories. We thank their authors for making their code publicly available.