The official code repository for the paper:
Boosting Multi-view Stereo with Late Cost Aggregation
Jiang Wu, Rui Li, Yu Zhu, Jinqiu Sun, and Yanning Zhang [arXiv]
[TL;DR] We present a simple channel-wise MVS cost aggregation scheme that achieves performance competitive with state-of-the-art methods with only minor adjustments to CasMVSNet.
Pairwise matching cost aggregation is a crucial step in modern learning-based Multi-view Stereo (MVS). Prior works adopt an early aggregation scheme, which sums pairwise costs into an intermediate cost. However, our analysis shows that this process can degrade informative pairwise matchings, preventing the depth network from fully utilizing the original geometric matching cues. To address this, we present a late aggregation approach that aggregates pairwise costs throughout the network feed-forward process, achieving accurate estimations with only minor changes to the plain CasMVSNet. Instead of building an intermediate cost by weighted sum, late aggregation preserves all pairwise costs along a distinct view channel. This enables the subsequent depth network to fully utilize the crucial geometric cues without loss of cost fidelity. Grounded in the new aggregation scheme, we propose further techniques that address view-order dependence inside the preserved cost, handle flexible numbers of testing views, and improve the depth filtering process. Despite its technical simplicity, our method improves significantly upon the baseline cascade-based approach, achieving results comparable to state-of-the-art methods with favorable computational overhead.
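For intuition, here is a minimal PyTorch sketch contrasting the two aggregation schemes. It is illustrative only, not the released implementation: the tensor names and shapes (B, V, C, D, H, W) and the softmax weighting are assumptions.

```python
import torch

B, V, C, D, H, W = 1, 4, 8, 48, 16, 16      # batch, source views, feature channels, depth hypotheses, spatial size
pair_costs = torch.randn(B, V, C, D, H, W)  # one pairwise matching cost volume per source view (random stand-in)

# Early aggregation (as in plain CasMVSNet): fuse the pairwise costs into a
# single intermediate volume before the 3D depth network sees them.
weights = torch.softmax(torch.randn(B, V, 1, 1, 1, 1), dim=1)
early_volume = (weights * pair_costs).sum(dim=1)   # (B, C, D, H, W)

# Late aggregation: keep every pairwise cost along a distinct view channel so
# the depth network can consume the raw geometric cues directly.
late_volume = pair_costs.flatten(1, 2)             # (B, V*C, D, H, W)

# Randomly permuting the view channel during training is one way to reduce
# dependence on the source-view ordering inside the preserved cost.
perm = torch.randperm(V)
late_volume_shuffled = pair_costs[:, perm].flatten(1, 2)

print(early_volume.shape, late_volume.shape, late_volume_shuffled.shape)
```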
# create a clean conda environment from scratch
conda create -n MVS python=3.8
conda activate MVS
# install required packages
pip install -r requirements.txt
Download the preprocessed DTU training data and the raw depth maps (Depths_raw) for training, and the DTU testing data for testing. Organize the data as below:
dtu_training
├── Cameras
├── Depths
├── Depths_raw
└── Rectified
dtu_testing
├── Cameras
├── scan1
├── scan2
└── ...
Download the preprocessed BlendedMVS dataset and unzip it into the dataset folder, organized as below:
blendedmvs
├── 5a0271884e62597cdee0d0eb
├── 5a3ca9cb270f0e3f14d0eddb
├── ...
├── all_list.txt
├── training_list.txt
├── ...
You can download the Tanks and Temples dataset here and unzip it. For the intermediate set, also unzip "short_range_caemeras_for_mvsnet.zip" and replace the camera parameter files inside the "cam" folder with the extracted files.
tanksandtemples
├── advanced
│ ├── Auditorium
│ ├── ...
└── intermediate
├── Family
├── ...
Set "datapath" to the path of the DTU training dataset, then run the script:
bash ./script/dtu_train.sh
Specify the path to the BlendedMVS dataset and the path to the checkpoint saved from DTU training, then run the script:
bash ./scripts/blendedmvs_finetune.sh
You can also download our pre-trained and fine-tuned models.
Specify the "datapath" and execute the testing script:
bash ./script/dtu_test.sh
Note that we employ an improved dynamic filtering strategy for point cloud fusion; please refer to the paper for details.
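The improved dynamic strategy itself is described in the paper; for orientation, below is a simplified NumPy sketch of the standard forward-backward geometric consistency check that this family of filters builds on. Everything here (the function name, nearest-neighbour sampling, and the thresholding rule in the closing comment) is an illustrative assumption, not the repository's fusion code.

```python
import numpy as np

def reprojection_errors(depth_ref, K_ref, E_ref, depth_src, K_src, E_src):
    """Project reference pixels into a source view using depth_ref, sample the
    source depth there, project back, and measure the disagreement.
    K_* are 3x3 intrinsics, E_* are 4x4 world-to-camera extrinsics."""
    h, w = depth_ref.shape
    x, y = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([x.ravel(), y.ravel(), np.ones(h * w)])  # (3, N) homogeneous pixels

    # lift reference pixels to 3D and move them into the source camera frame
    pts_ref = np.linalg.inv(K_ref) @ (pix * depth_ref.reshape(1, -1))
    rel = E_src @ np.linalg.inv(E_ref)                      # ref camera -> src camera
    pts_src = rel[:3, :3] @ pts_ref + rel[:3, 3:4]
    proj = K_src @ pts_src
    u, v = proj[0] / proj[2], proj[1] / proj[2]

    # sample the source depth map (nearest neighbour, for brevity)
    ui = np.clip(np.rint(u).astype(int), 0, w - 1)
    vi = np.clip(np.rint(v).astype(int), 0, h - 1)
    d_src = depth_src[vi, ui]

    # project back into the reference view
    pts_src2 = np.linalg.inv(K_src) @ (np.stack([u, v, np.ones(h * w)]) * d_src.reshape(1, -1))
    rel_inv = np.linalg.inv(rel)
    pts_back = rel_inv[:3, :3] @ pts_src2 + rel_inv[:3, 3:4]
    proj_back = K_ref @ pts_back

    err_pix = np.hypot(proj_back[0] / proj_back[2] - pix[0],
                       proj_back[1] / proj_back[2] - pix[1]).reshape(h, w)
    err_depth = (np.abs(pts_back[2] - depth_ref.ravel()) / depth_ref.ravel()).reshape(h, w)
    return err_pix, err_depth

# A pixel is typically kept when enough source views agree; a "dynamic" rule
# relaxes the thresholds as the number of agreeing views grows, e.g. accept if
# some k views satisfy err_pix < k * tau_pix and err_depth < k * tau_depth.
```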
To evaluate on Tanks and Temples, run the testing script:
bash ./scripts/tank_test.sh
The point cloud filtering parameters for each scene are stored in ./filter/tank_test_config.py. Fine-tuning these parameters might lead to improved results.
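For illustration, here is a hypothetical shape such a per-scene configuration might take; the actual keys and values are defined in ./filter/tank_test_config.py and may differ from this sketch.

```python
# Hypothetical per-scene filtering parameters; the real keys and values live
# in ./filter/tank_test_config.py and may differ from this sketch.
tank_test_config = {
    "Family":     {"n_consistent_views": 4, "pix_thresh": 1.0, "depth_thresh": 0.01},
    "Auditorium": {"n_consistent_views": 3, "pix_thresh": 2.0, "depth_thresh": 0.01},
    # ... one entry per scene: looser thresholds keep more points (better
    # completeness), tighter thresholds keep fewer (better accuracy).
}
```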
Quantitative results on the DTU evaluation set (lower is better):

Methods | Acc. (mm) | Comp. (mm) | Overall (mm)
---|---|---|---
CasMVSNet (baseline) | 0.325 | 0.385 | 0.355
UniMVSNet | 0.352 | 0.278 | 0.315
TransMVSNet | 0.312 | 0.298 | 0.305
Ours (pre-trained model) | 0.335 | 0.258 | 0.297
F-scores on the Tanks and Temples advanced set (higher is better):

Methods | Mean | Auditorium | Ballroom | Courtroom | Museum | Palace | Temple
---|---|---|---|---|---|---|---
CasMVSNet (baseline) | 31.12 | 19.81 | 38.46 | 29.10 | 43.87 | 27.36 | 28.11
UniMVSNet | 38.96 | 28.33 | 44.36 | 39.74 | 52.89 | 33.80 | 34.63
TransMVSNet | 37.00 | 24.84 | 44.59 | 34.77 | 46.49 | 34.69 | 36.62
Ours (pre-trained model) | 40.12 | 29.40 | 45.61 | 38.55 | 51.69 | 35.16 | 41.87
Our code is based on UniMVSNet and DH2C-MVSNet. We thank the authors for these excellent works.