Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation (CVPR 2018 Spotlight)

This repository is the official implementation of the paper.
Links: [Paper][Oral Presentation]
By Dan Xu, Wei Wang, Hao Tang, Hong Liu, Nicu Sebe, Elisa Ricci

Installation & Setup

The code is based on the Caffe framework. Please first download and install the modified Caffe version. The code has been tested with CUDA 8.0, cuDNN 5.1, and Python 2.7. To install, follow these instructions:
First clone the repository:

git clone 

Then build caffe and pycaffe:

cd $Caffe_ROOT
cp Makefile.config.example Makefile.config
vim Makefile.config ### change the necessary lines to add dependencies

Data Preparation

First download the KITTI raw data from the official website to the folder ./StructuredAttentionDepthEstimation/data/KITTI. To generate the training data, run the following commands:

cd ./StructuredAttentionDepthEstimation/data

The process will generate a training pair text file 'eigen_train_pairs.txt' under ./utils/filenames for use in the training phase.
For testing, the Eigen split of 697 images is used.

Testing and Evaluation

Please first download the trained model from Google Drive and put it under ./StructuredAttentionDepthEstimation/models. The saved testing results can also be downloaded from the same link. To test the trained model, follow these instructions:

cd ./StructuredAttentionDepthEstimation/prototxt
python ### generating a network definition for the deploy network
sh ### testing and evaluating the model
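For readers who want to check numbers independently of the evaluation script, the following is a minimal sketch of the error and accuracy measures conventionally reported on the KITTI Eigen split (absolute relative error, squared relative error, RMSE, log RMSE, and the threshold accuracies δ < 1.25^k). The function name `depth_metrics` and its flat-list input are illustrative assumptions, not part of this repository, and the repository's own script may differ in details such as depth capping.

```python
from __future__ import division  # keeps the sketch valid on Python 2.7 as well

import math


def depth_metrics(pred, gt):
    """Compute common monocular depth metrics (hypothetical helper).

    pred, gt: equal-length sequences of depths in metres. Pixels whose
    ground-truth depth is not positive are treated as invalid and skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)

    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    # Threshold accuracy: fraction of pixels with max(p/g, g/p) < 1.25^k.
    ratios = [max(p / g, g / p) for p, g in pairs]
    a1, a2, a3 = [sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)]

    return {
        "abs_rel": abs_rel, "sq_rel": sq_rel,
        "rmse": rmse, "rmse_log": rmse_log,
        "a1": a1, "a2": a2, "a3": a3,
    }
```

In practice the inputs would be the predicted and ground-truth depths of the valid pixels inside the chosen evaluation crop, flattened into one sequence per image.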

We refine and fuse the multi-scale features derived from different deep semantic layers (e.g. res3d, res4f, res5c layers) using the proposed MeanFieldUpdate module as follows:

    #the first meanfield updating
    MeanFieldUpdate(n, n.res3d_dec, n.res5c_dec, 1, 1, feat_num)
    MeanFieldUpdate(n, n.res4f_dec, n.updated_f1_mf1, 2, 1, feat_num)
    MeanFieldUpdate(n, n.res5c_dec, n.updated_f2_mf1, 3, 1, feat_num)
    #the second meanfield updating
    MeanFieldUpdate(n, n.res3d_dec, n.updated_f3_mf1, 1, 2, feat_num)
    MeanFieldUpdate(n, n.res4f_dec, n.updated_f1_mf2, 2, 2, feat_num)
    MeanFieldUpdate(n, n.res5c_dec, n.updated_f2_mf2, 3, 2, feat_num)
    #the third meanfield updating
    MeanFieldUpdate(n, n.res3d_dec, n.updated_f3_mf2, 1, 3, feat_num)
    MeanFieldUpdate(n, n.res4f_dec, n.updated_f1_mf3, 2, 3, feat_num)
    MeanFieldUpdate(n, n.res5c_dec, n.updated_f2_mf3, 3, 3, feat_num)
    #the fourth meanfield updating
    MeanFieldUpdate(n, n.res3d_dec, n.updated_f3_mf3, 1, 4, feat_num)
    MeanFieldUpdate(n, n.res4f_dec, n.updated_f1_mf4, 2, 4, feat_num)
    MeanFieldUpdate(n, n.res5c_dec, n.updated_f2_mf4, 3, 4, feat_num)
    #the fifth meanfield updating
    MeanFieldUpdate(n, n.res3d_dec, n.updated_f3_mf4, 1, 5, feat_num)
    MeanFieldUpdate(n, n.res4f_dec, n.updated_f1_mf5, 2, 5, feat_num)
    MeanFieldUpdate(n, n.res5c_dec, n.updated_f2_mf5, 3, 5, feat_num)
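The fifteen calls above follow a regular pattern: within each iteration the three scales are updated in order, and each update reads the output of the previous one. The sketch below only reproduces that call schedule as plain tuples so the chaining is easier to see; it does not call Caffe. The helper name `mean_field_schedule` is hypothetical, and the output naming `updated_f<scale>_mf<iteration>` is inferred from the listing rather than taken from the implementation of `MeanFieldUpdate`.

```python
def mean_field_schedule(scales=("res3d_dec", "res4f_dec", "res5c_dec"),
                        num_iters=5):
    """Return (target, source, scale_idx, iter_idx) tuples in call order.

    Hypothetical helper: it mirrors the argument pattern of the listing,
    assuming each update writes a blob named updated_f<scale>_mf<iter>.
    """
    schedule = []
    source = scales[-1]  # the very first update reads from the deepest scale
    for it in range(1, num_iters + 1):
        for idx, target in enumerate(scales, start=1):
            schedule.append((target, source, idx, it))
            # Assumed name of the blob produced by this update; it becomes
            # the input of the next update in the chain.
            source = "updated_f{}_mf{}".format(idx, it)
    return schedule


if __name__ == "__main__":
    for target, source, idx, it in mean_field_schedule():
        print("MeanFieldUpdate(n, n.{}, n.{}, {}, {}, feat_num)".format(
            target, source, idx, it))
```

Running the script prints the same sequence of calls as the listing, which makes it straightforward to experiment with a different number of iterations.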

Testing is fast, at around 8 fps (near real-time), which is significantly faster than previous graphical model-based approaches for single-image depth estimation. The testing results on KITTI are shown in the table below using both the Eigen and the Garg crop. The table and the figure below show the quantitative and the qualitative results, respectively. The results are not exactly the same as those reported in our paper: we have further improved the accuracy.
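As a reference for the two crops, here is a small sketch that computes the pixel bounds of the evaluation region from the image size. The fractional bounds are the ones commonly used in public KITTI evaluation scripts; whether this repository's evaluation uses exactly these values is an assumption, and the helper name `crop_bounds` is hypothetical.

```python
def crop_bounds(height, width, mode="garg"):
    """Return (top, bottom, left, right) pixel bounds of the evaluation crop.

    The fractions are assumed from commonly used KITTI evaluation code,
    not read from this repository.
    """
    fractions = {
        "garg": (0.40810811, 0.99189189, 0.03594771, 0.96405229),
        "eigen": (0.3324324, 0.91351351, 0.0359477, 0.96405229),
    }
    if mode not in fractions:
        raise ValueError("mode must be 'garg' or 'eigen'")
    top, bottom, left, right = fractions[mode]
    # Truncate to integer pixel indices, as array slicing requires.
    return (int(top * height), int(bottom * height),
            int(left * width), int(right * width))
```

Only pixels inside the returned bounds (and with valid ground truth) would then contribute to the reported metrics.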

The produced visualization results can be downloaded from here.


To retrain the model, please first download the ResNet50 model pretrained on ImageNet, put it under the folder ./StructuredAttentionDepthEstimation/models/pretrained_model, and rename it to ResNet-50-pratrained-model.caffemodel. It is used to initialize our backbone network. To train the whole model, run:

cd ./StructuredAttentionDepthEstimation/prototxt
python ### generate a network definition for the training network 

Training supports multi-GPU speedup. To change the overall batch size, you can modify the iter_size in ./prototxt/solver.prototxt, the batch_size, and the number of GPUs.
Overall batch size = number of GPUs * batch_size * iter_size.
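A quick way to sanity-check a configuration is to compute the overall batch size directly from the three settings. The helper below is only an illustration of the formula; the function name and the example values are hypothetical and do not come from the repository's configuration files.

```python
def effective_batch_size(num_gpus, batch_size, iter_size):
    """Overall batch size seen by one solver update.

    num_gpus:   number of GPUs used for training
    batch_size: per-GPU batch size set in the data layer
    iter_size:  gradient accumulation steps set in solver.prototxt
    """
    for name, value in (("num_gpus", num_gpus),
                        ("batch_size", batch_size),
                        ("iter_size", iter_size)):
        if value < 1:
            raise ValueError("{} must be a positive integer".format(name))
    return num_gpus * batch_size * iter_size


if __name__ == "__main__":
    # Hypothetical setting: 2 GPUs, 4 images per GPU, 8 accumulation steps.
    print(effective_batch_size(2, 4, 8))
```

With fewer GPUs, raising iter_size by the same factor keeps the overall batch size unchanged, at the cost of longer wall-clock time per update.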

PyTorch Implementation

A PyTorch implementation of our model can be found here:


Please consider citing the following paper if the code is helpful in your research work:

  title={Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation},
  author={Xu, Dan and Wang, Wei and Tang, Hao and Liu, Hong and Sebe, Nicu and Ricci, Elisa},

