A PyTorch implementation of RefineDet
May 24, 2019

RefineDet in PyTorch

This is a PyTorch implementation of Single-Shot Refinement Neural Network for Object Detection by Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei and Stan Z. Li (CVPR 2018). The official Caffe code can be found here. This implementation is mainly based on the official Caffe RefineDet, sfzhang15/RefineDet, and on a PyTorch implementation of SSD, amdegroot/ssd.pytorch. Its dataset-processing sublibrary (libs/dataset/datasets) is taken from jwyang/faster-rcnn.pytorch with minor modifications. A short post in Chinese about this project is here.


  • Python 2.7
  • PyTorch 0.3.1
  • CUDA 8.0 or higher

Data Preparation

  • PASCAL_VOC 07+12: Follow the instructions in py-faster-rcnn or jwyang/faster-rcnn.pytorch to prepare the VOC datasets, i.e., put the data, or soft links to it, in the folder data/.

  • COCO: You can train your model on COCO2014 with the same settings as for PASCAL_VOC07+12.

The directory tree under data/ in my project is as follows:

├── coco -> /root/dataset/coco
├── VOCdevkit2007
│   └── VOC2007 -> /root/dataset/voc/VOCdevkit/VOC2007
└── VOCdevkit2012
    └── VOC2012 -> /root/dataset/voc/VOCdevkit/VOC2012
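The layout above can also be created from a script. The following is a minimal sketch, not part of this repository: the helper name link_datasets and the VOC_COCO_LINKS mapping are hypothetical, and the /root/dataset/... targets are simply the paths shown in the tree, so they need to be replaced with your own dataset locations.

```python
import os


def link_datasets(data_dir, sources):
    """Create symlinks under data_dir; hypothetical helper, not part of the repo.

    `sources` maps a link path (relative to data_dir) to the real dataset
    location. Parent folders are created as needed; existing entries are
    left untouched. Returns the list of links that were created.
    """
    created = []
    for rel_link, target in sources.items():
        link_path = os.path.join(data_dir, rel_link)
        parent = os.path.dirname(link_path)
        if not os.path.isdir(parent):
            os.makedirs(parent)
        if os.path.lexists(link_path):
            continue  # do not overwrite an existing link or folder
        os.symlink(target, link_path)
        created.append(link_path)
    return created


# Mirrors the tree above. The targets are the author's paths and are an
# assumption for any other machine.
VOC_COCO_LINKS = {
    "coco": "/root/dataset/coco",
    os.path.join("VOCdevkit2007", "VOC2007"): "/root/dataset/voc/VOCdevkit/VOC2007",
    os.path.join("VOCdevkit2012", "VOC2012"): "/root/dataset/voc/VOCdevkit/VOC2012",
}
```

Calling link_datasets("data", VOC_COCO_LINKS) from the project root would then reproduce the tree shown above.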


You can train a RefineDet detector with VGG16 or ResNet101 as the base network. The pretrained models can be downloaded from vgg16_reducedfc.pth and resnet101.pth. By default, the VGG16 weights are expected in the following directory:

mkdir -p weights/vgg16
cd weights/vgg16

For ResNet101:

mkdir -p weights/resnet101
cd weights/resnet101
wget -O resnet101.pth

To train a RefineDet model on PASCAL VOC07+12, run:

python --dataset voc --input_size 320 --network vgg16 --basenet vgg16_reducedfc.pth --save_folder "weights/vgg16"

Change --dataset to "coco" if you want to train on COCO2014.


To evaluate a trained model, run:

python --input_size 512 --dataset voc --network vgg16 --model_path your/weights/path


This project tries to reproduce the performance of RefineDet in Caffe, but some gaps remain. If you have any suggestions for improving this reproduction, please leave a message in the issues.

1). PASCAL VOC07+12 (Train/Test: VOC07+12/VOC07)

Method                 Backbone    Input Size    mAP     FPS
SSD                    VGG16       300 x 300     77.2    46
SSD                    VGG16       512 x 512     79.8    19
RefineDet (Official)   VGG16       320 x 320     80.0    40.3
RefineDet (Official)   VGG16       512 x 512     81.8    24.1
RefineDet (Ours)       VGG16       320 x 320     78.4    ~45
RefineDet (Ours)       VGG16       512 x 512     79.8    ~30

The speed is evaluated on a P40. The trained models that produce the above results can be downloaded from Dropbox at vgg16_refinedet320_voc and vgg16_refinedet512_voc, or from BaiduPan at vgg16_refinedet320_voc (password: 3wj4) and vgg16_refinedet512_voc (password: xpd1).
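The README does not say how the FPS column was measured. As a rough illustration only, one common way to estimate it is to time repeated forward passes after a warm-up phase. The helper measure_fps below is hypothetical and not part of this repository; run_once stands for any callable that performs one detection pass.

```python
import time


def measure_fps(run_once, num_iters=100, warmup=10):
    """Return the average number of run_once() calls per second.

    Hypothetical helper for illustration. When timing a GPU model,
    run_once should wait for the device to finish (for example by calling
    torch.cuda.synchronize()), because CUDA kernels run asynchronously.
    """
    # Warm-up calls are excluded from the timing.
    for _ in range(warmup):
        run_once()
    start = time.time()
    for _ in range(num_iters):
        run_once()
    elapsed = time.time() - start
    return num_iters / elapsed if elapsed > 0 else float("inf")
```

Numbers obtained this way depend on the GPU, the batch size, and whether post-processing is included, so they are only comparable under the same setup.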

2). COCO2014 (Train/Test: trainval115k/minval5k)

Method                 Backbone     Input Size    mAP     FPS
SSD321                 ResNet101    321 x 321     28.0    -
RefineDet (Official)   ResNet101    320 x 320     32.0    -
RefineDet (Ours)       ResNet101    320 x 320     31.7    ~11

The speed is evaluated on a P40. The trained model can be downloaded from Dropbox at resnet101_refinedet320_coco or from BaiduPan at resnet101_refinedet320_coco (password: iyvw). Training fails with a NaN loss when the input size is 512 x 512, and I am still looking for the cause.
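When investigating a failure like this, it can help to log the loss at every iteration and locate the first value that is no longer finite. The sketch below is only a debugging aid under that assumption. The helper first_non_finite is hypothetical, not part of this repository, and it does not explain the cause of the NaN.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for i, value in enumerate(losses):
        if math.isnan(value) or math.isinf(value):
            return i
    return None
```

For example, first_non_finite([2.1, 1.7, float("nan")]) returns 2, which points to the iteration whose inputs and gradients are worth inspecting.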


You can run demo/demo.ipynb with Jupyter Notebook to visualize detection results.



