LSI-Faster R-CNN: An enhanced version of Faster R-CNN for joint object detection and viewpoint estimation

This repository contains a modified version of Faster R-CNN, the deep-learning-based object detector created by Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun (Microsoft Research). It is a fork of their Python implementation, available here.

This version, lsi-faster-rcnn, was developed by Carlos Guindel at the Intelligent Systems Laboratory research group of the Universidad Carlos III de Madrid.

Features introduced in this fork include:

  • Training (and eventually testing) on the KITTI Object Detection Dataset.
  • Mixed external/RPN proposals.
  • Discrete viewpoint prediction.
  • Four-channel input.

The last two features are introduced in two published research papers. Please check the citation section for further details.

All the included methods can be quantitatively evaluated using the companion eval_kitti repository.


The modifications try to preserve the functionality of the original Faster R-CNN code, which is largely configurable via parameters. However, only a limited set of parameter combinations has been tested, so correct operation is not guaranteed under every configuration. Pull requests that fix broken configuration setups are welcome.


This work is released under the MIT License (refer to the LICENSE file for details).

Citing this work

If you use the viewpoint estimation approach implemented in this code, please consider citing:

    author={Guindel, Carlos and Mart{\'i}n, David and Armingol, Jos{\'e} Mar{\'i}a},
    booktitle={2017 {IEEE} International Conference on Vehicular Electronics and Safety ({ICVES})},
    title={Joint object detection and viewpoint estimation using {CNN} features},

Otherwise, if you use the four-channel input solution, please consider citing:

    author={Guindel, Carlos and Mart{\'i}n, David and Armingol, Jos{\'e} Mar{\'i}a},
    editor={Moreno-D{\'i}az, Roberto and Pichler, Franz and Quesada-Arencibia, Alexis},
    title={Stereo Vision-Based Convolutional Networks for Object Detection in Driving Environments},
    booktitle={Computer Aided Systems Theory - EUROCAST 2017},
    publisher={Springer International Publishing},

You can find the original research paper presenting the Faster R-CNN approach in:

    Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
    Title = {Faster {R-CNN}: Towards Real-Time Object Detection
             with Region Proposal Networks},
    Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
    Year = {2015}


  1. Requirements: software
  2. Requirements: hardware
  3. Basic installation
  4. Demo
  5. Beyond the demo: training and testing
  6. Usage

Requirements: software

  1. Requirements for Caffe and pycaffe (see: Caffe installation instructions)

Note: Caffe must be built with support for Python layers!

# In your Makefile.config, make sure to have this line uncommented
# Unrelatedly, it's also recommended that you use CUDNN
  2. Python packages you might not have: cython, python-opencv, easydict
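
The Makefile.config comment above refers to a build switch. A minimal sketch of how to check it, assuming this fork's Caffe keeps the stock switch name WITH_PYTHON_LAYER from upstream Caffe's Makefile.config.example (the helper name has_python_layer is made up for illustration):

```shell
# Hypothetical helper: succeeds only if the given Makefile.config contains an
# uncommented "WITH_PYTHON_LAYER := 1" line (switch name assumed from stock Caffe).
has_python_layer() {
    grep -Eq '^[[:space:]]*WITH_PYTHON_LAYER[[:space:]]*:=[[:space:]]*1' "$1"
}

# Example use, once Makefile.config is in place:
# has_python_layer "$FRCN_ROOT/caffe-fast-rcnn/Makefile.config" \
#     || echo "Python layers are not enabled in Makefile.config"
```

A commented-out line (one starting with #) does not match, so the check fails until the switch is actually enabled.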

Requirements: hardware

This fork has been tested with the following GPU devices: NVIDIA Tesla K40, Titan X (Pascal), Titan Xp. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the cited devices to our research group.

For reference, training the VGG16 model uses ~6 GB of memory on the Titan Xp. Training (and inference) could be performed on less powerful devices using smaller network architectures (ZF, VGG_CNN_M_1024).

Installation (sufficient for the demo)

  1. Clone the Faster R-CNN repository
# Make sure to clone with --recursive
git clone --recursive

The --recursive flag automatically clones the caffe-fast-rcnn submodule. I use my own fork of the official repository and try to keep it up to date with the upstream Caffe repository as far as possible; this is especially relevant when a dependency (e.g. cuDNN) introduces major changes.

  2. We'll call the directory that you cloned Faster R-CNN into FRCN_ROOT

Ignore notes 1 and 2 if you followed step 1 above.

Note 1: If you didn't clone Faster R-CNN with the --recursive flag, then you'll need to manually clone the caffe-fast-rcnn submodule:

git submodule update --init --recursive

Note 2: My caffe-fast-rcnn submodule is expected to be on the lsi-faster-rcnn branch. This will happen automatically if you followed the instructions in step 1.

  3. Edit line 141 of lib/ to reflect the CUDA compute capability of your GPU. This can be done with any editor (e.g. gedit):
cd $FRCN_ROOT/lib

The line to be edited is the arch flag. For example, for the Titan X Pascal, it should read:

extra_compile_args={'gcc': ["-Wno-unused-function"],
                            'nvcc': ['-arch=sm_61',
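
The arch value is the compute capability with the dot removed and prefixed by sm_ (6.1 becomes sm_61 for the Titan X Pascal). A small sketch of that conversion; the function name cc_to_arch is hypothetical, and the commented nvidia-smi query assumes a driver recent enough to support the compute_cap field:

```shell
# Hypothetical helper: turn a compute capability such as 6.1 into the value
# expected by nvcc's -arch flag (sm_61).
cc_to_arch() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Example use (assumption: your nvidia-smi supports the compute_cap query field):
# cc_to_arch "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
```

For the Tesla K40 (compute capability 3.5), the same conversion gives sm_35.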

Then, build the Cython modules.

cd $FRCN_ROOT/lib
  4. Build Caffe and pycaffe
cd $FRCN_ROOT/caffe-fast-rcnn
# Now follow the Caffe installation instructions here:

# If you're experienced with Caffe and have all of the requirements installed
# and your Makefile.config in place, then simply do:
make -j8 && make pycaffe
  5. If you want to run our demo, please download the trained models:

This will populate the $FRCN_ROOT/data folder with lsi_models. These models were trained on KITTI.

  6. Our demo also expects to find the KITTI object dataset in $FRCN_ROOT/data/kitti/images. You will need to download the dataset from the KITTI site and then create a symbolic link at $FRCN_ROOT/data/kitti/images:
ln -s $PATH_TO_OBJECT_KITTI_DATASET $FRCN_ROOT/data/kitti/images

Please note that PATH_TO_OBJECT_KITTI_DATASET must contain, at least, the testing folder with the left color images (image_2) in it.
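
Before running the demo, it may help to confirm that the link resolves to the expected layout. A minimal sketch, assuming the testing/image_2 structure described above; the helper name has_kitti_test_images is hypothetical:

```shell
# Hypothetical helper: succeeds if <root>/data/kitti/images (typically the
# symbolic link created above) exposes the testing/image_2 folder.
has_kitti_test_images() {
    [ -d "$1/data/kitti/images/testing/image_2" ]
}

# Example use:
# has_kitti_test_images "$FRCN_ROOT" || echo "testing/image_2 not found under data/kitti/images"
```

The -d test follows symbolic links, so it also fails when the link exists but points to a missing or incomplete dataset folder.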


After successfully completing basic installation, you'll be ready to run the demo.

To run the demo


The demo performs Faster R-CNN detection and viewpoint inference using a VGG16 network trained for detection on the KITTI Object Detection Dataset.