This repository contains the PyTorch implementation of our ECCV 2020 paper:
Guiding Monocular Depth Estimation Using Depth-Attention Volume
Lam Huynh,
Phong Nguyen-Ha,
Jiří Matas,
Esa Rahtu,
Janne Heikkilä
University of Oulu,
Tampere University,
Czech Technical University in Prague
| Project page | arXiv | Demo video |
Recovering the scene depth from a single image is an ill-posed problem that requires additional priors, often referred to as monocular depth cues, to disambiguate different 3D interpretations. In recent works, those priors have been learned in an end-to-end manner from large datasets using deep neural networks. In this paper, we propose guiding depth estimation to favor planar structures, which are ubiquitous especially in indoor environments. This is achieved by incorporating a non-local coplanarity constraint into the network with a novel attention mechanism called depth-attention volume (DAV). Experiments on two popular indoor datasets, namely NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results while using only a fraction of the number of parameters needed by the competing methods.
The pipeline of our proposed network. An image is passed through the encoder, then the non-local depth-attention module, and finally the decoder to produce the estimated depth map. The model is trained using the L_attention and L_depth losses described in the paper.
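To make the data flow concrete, here is a minimal PyTorch sketch of the encoder → depth-attention → decoder pipeline described above. The class name, the tiny conv stacks standing in for the real backbone and decoder, and the loss weight `lam` are all hypothetical placeholders, not the repository's actual code.

```python
import torch
import torch.nn as nn

class DAVPipelineSketch(nn.Module):
    """Illustrative encoder -> depth-attention -> decoder flow.
    The conv stacks below are placeholders, not the real backbone."""
    def __init__(self, dav_module, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(           # stand-in for the real encoder
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU())
        self.dav = dav_module                   # non-local depth-attention module
        self.decoder = nn.Sequential(           # stand-in for the real decoder
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(feat_ch, 1, 3, padding=1))

    def forward(self, image):
        feats = self.encoder(image)
        feats, attention = self.dav(feats)      # attention supervised by L_attention
        depth = self.decoder(feats)             # depth map supervised by L_depth
        return depth, attention
```

Training would then combine the two losses described in the paper, e.g. `loss = l_depth + lam * l_attention`, where `lam` is an assumed weighting factor; see the paper for the actual loss definitions.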
Detailed structure of the depth-attention module.
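The sketch below shows a generic non-local attention block of the kind the depth-attention module builds on: it computes one attention weight per pair of spatial positions, yielding an HW × HW attention volume. This is an assumed, simplified structure for illustration; the paper's exact DAV formulation and its depth-based supervision live in the actual model code.

```python
import torch
import torch.nn as nn

class NonLocalAttention(nn.Module):
    """Generic non-local block (assumed structure, not the exact DAV).
    Returns re-weighted features plus the pairwise attention volume."""
    def __init__(self, channels, inner=None):
        super().__init__()
        inner = inner or channels // 2
        self.query = nn.Conv2d(channels, inner, 1)
        self.key = nn.Conv2d(channels, inner, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (B, HW, C')
        k = self.key(x).flatten(2)                    # (B, C', HW)
        attn = torch.softmax(q @ k, dim=-1)           # (B, HW, HW): weight per pixel pair
        v = self.value(x).flatten(2).transpose(1, 2)  # (B, HW, C)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out + x, attn                          # residual keeps features stable
```

Plugged into the pipeline sketch above, `DAVPipelineSketch(NonLocalAttention(64))` runs end-to-end on, for example, `torch.randn(1, 3, 128, 128)`.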
Evaluation on the NYU-Depth-v2 test set.
Comparison between the number of parameters and model performance.
Qualitative results on NYU-Depth-v2.
Cross-dataset evaluation on SUN RGB-D.
Video of results on unseen real-world data: