Instance-aware Semantic Segmentation via Multi-task Network Cascades
Python Cuda Shell Other
Latest commit 3f43cc0 Jan 9, 2017 @daijifeng001 committed on GitHub Update README.md

README.md

Instance-aware Semantic Segmentation via Multi-task Network Cascades

By Jifeng Dai, Kaiming He, Jian Sun

This python version is re-implemented by Haozhi Qi when he was an intern at Microsoft Research.

Introduction

MNC is an instance-aware semantic segmentation system based on deep convolutional networks, which won the first place in COCO segmentation challenge 2015, and test at a fraction of a second per image. We decompose the task of instance-aware semantic segmentation into related sub-tasks, which are solved by multi-task network cascades (MNC) with shared features. The entire MNC network is trained end-to-end with error gradients across cascaded stages.

MNC was initially described in a CVPR 2016 oral paper.

This repository contains a python implementation of MNC, which is ~10% slower than the original matlab implementation.

This repository includes a bilinear RoI warping layer, which enables gradient back-propagation with respect to RoI coordinates.

Misc.

This code has been tested on Linux (Ubuntu 14.04), using K40/Titan X GPUs.

The code is built based on py-faster-rcnn.

MNC is released under the MIT License (refer to the LICENSE file for details).

Citing MNC

If you find MNC useful in your research, please consider citing:

@inproceedings{dai2016instance,
    title={Instance-aware Semantic Segmentation via Multi-task Network Cascades},
    author={Dai, Jifeng and He, Kaiming and Sun, Jian},
    booktitle={CVPR},
    year={2016}
}

Main Results

training data test data mAP^r@0.5 mAP^r@0.7 time (K40) time (Titian X)
MNC, VGG-16 VOC 12 train VOC 12 val 65.0% 46.3% 0.42sec/img 0.33sec/img

Installation guide

  1. Clone the MNC repository:

    # Make sure to clone with --recursive
    git clone --recursive https://github.com/daijifeng001/MNC.git
  2. Install Python packages: numpy, scipy, cython, python-opencv, easydict, yaml.

  3. Build the Cython modules and the gpu_nms, gpu_mask_voting modules by:

    cd $MNC_ROOT/lib
    make
  4. Install Caffe and pycaffe dependencies (see: Caffe installation instructions for official installation guide)

    Note: Caffe must be built with support for Python layers!

    # In your Makefile.config, make sure to have this line uncommented
    WITH_PYTHON_LAYER := 1
    # CUDNN is recommended in building to reduce memory footprint
    USE_CUDNN := 1
  5. Build Caffe and pycaffe:

    cd $MNC_ROOT/caffe-mnc
    # If you have all of the requirements installed
    # and your Makefile.config in place, then simply do:
    make -j8 && make pycaffe

Demo

First, download the trained MNC model.

./data/scripts/fetch_mnc_model.sh

Run the demo:

cd $MNC_ROOT
./tools/demo.py

Result demo images will be stored to data/demo/.

The demo performs instance-aware semantic segmentation with a trained MNC model (using VGG-16 net). The model is pre-trained on ImageNet, and finetuned on VOC 2012 train set with additional annotations from SBD. The mAP^r of the model is 65.0% on VOC 2012 validation set. The test speed per image is ~0.33sec on Titian X and ~0.42sec on K40.

Training

This repository contains code to end-to-end train MNC for instance-aware semantic segmentation, where gradients across cascaded stages are counted in training.

Preparation:

  1. Run ./data/scripts/fetch_imagenet_models.sh to download the ImageNet pre-trained VGG-16 net.
  2. Download the VOC 2007 dataset to ./data/VOCdevkit2007
  3. Run ./data/scripts/fetch_sbd_data.sh to download the VOC 2012 dataset together with the additional segmentation annotations in SBD to ./data/VOCdevkitSDS.

1. End-to-end training of MNC for instance-aware semantic segmentation

To end-to-end train a 5-stage MNC model (on VOC 2012 train), use experiments/scripts/mnc_5stage.sh. Final mAP^r@0.5 should be ~65.0% (mAP^r@0.7 should be ~46.3%), on VOC 2012 validation.

cd $MNC_ROOT
./experiments/scripts/mnc_5stage.sh [GPU_ID] VGG16 [--set ...]
# GPU_ID is the GPU you want to train on
# --set ... allows you to specify fast_rcnn.config options, e.g.
#   --set EXP_DIR seed_rng 1701 RNG_SEED 1701

2. Training of CFM for instance-aware semantic segmentation

The code also includes an entry to train a convolutional feature masking (CFM) model for instance aware semantic segmentation.

@inproceedings{dai2015convolutional,
    title={Convolutional Feature Masking for Joint Object and Stuff Segmentation},
    author={Dai, Jifeng and He, Kaiming and Sun, Jian},
    booktitle={CVPR},
    year={2015}
}
2.1. Download pre-computed MCG proposals

Download and process the pre-computed MCG proposals.

cd $MNC_ROOT
./data/scripts/fetch_mcg_data.sh
python ./tools/prepare_mcg_maskdb.py --para_job 24 --db train --output data/cache/voc_2012_train_mcg_maskdb/
python ./tools/prepare_mcg_maskdb.py --para_job 24 --db val --output data/cache/voc_2012_val_mcg_maskdb/

Resulting proposals would be at folder data/MCG/.

2.2. Train the model

Run experiments/scripts/cfm.sh to train on VOC 2012 train set. Final mAP^r@0.5 should be ~60.5% (mAP^r@0.7 should be ~42.6%), on VOC 2012 validation.

cd $MNC_ROOT
./experiments/scripts/cfm.sh [GPU_ID] VGG16 [--set ...]
# GPU_ID is the GPU you want to train on
# --set ... allows you to specify fast_rcnn.config options, e.g.
#   --set EXP_DIR seed_rng 1701 RNG_SEED 1701

3. End-to-end training of Faster-RCNN for object detection

Faster-RCNN can be viewed as a 2-stage cascades composed of region proposal network (RPN) and object detection network. Run script experiments/scripts/faster_rcnn_end2end.sh to train a Faster-RCNN model on VOC 2007 trainval. Final mAP^b should be ~69.1% on VOC 2007 test.

cd $MNC_ROOT
./experiments/scripts/faster_rcnn_end2end.sh [GPU_ID] VGG16 [--set ...]
# GPU_ID is the GPU you want to train on
# --set ... allows you to specify fast_rcnn.config options, e.g.
#   --set EXP_DIR seed_rng1701 RNG_SEED 1701