Update SSD documentation (#987)

* Refine documentation and code.
PaddlePaddle · Jun 20, 2018 · 3257b64
1 parent 894e7ac
Showing 10 changed files with 92 additions and 175 deletions.
diff --git a/fluid/object_detection/README.md b/fluid/object_detection/README.md
@@ -4,6 +4,14 @@ The minimum PaddlePaddle version needed for the code sample in this directory is
 
 ## SSD Object Detection
 
+## Table of Contents
+- [Introduction](#introduction)
+- [Data Preparation](#data-preparation)
+- [Train](#train)
+- [Evaluate](#evaluate)
+- [Infer and Visualize](#infer-and-visualize)
+- [Released Model](#released-model)
+
 ### Introduction
 
 [Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) is a single-stage object detection framework. A single-stage detector treats object detection as a regression problem and directly predicts bounding boxes and class probabilities without a region proposal step. SSD improves on this by making predictions at different scales from different layers, as shown below. Predictions are made at six levels, on six feature maps of different scales. Each feature map has two 3x3 convolutional layers, which predict the category and a shape offset relative to the prior box (also called anchor), respectively. Thus, we get 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections per class.
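Reviewer note: the detection count quoted in the paragraph above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the `feature_maps` list restates the grid sizes and per-cell prior counts from that sentence, and the variable names are made up for this example rather than taken from the repository.

```python
# (grid size, priors per cell) for the six SSD prediction levels,
# copied from the README sentence above; names are illustrative only.
feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

# Each level contributes grid_size * grid_size * priors_per_cell boxes.
per_level = [size * size * priors for size, priors in feature_maps]
total_priors = sum(per_level)

print(per_level)
print(total_priors)
```

The first level (38x38) dominates the total, which is why most of the prior boxes cover small objects.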
@@ -19,8 +27,6 @@ SSD is readily pluggable into a wide variant standard convolutional network, suc
 
 You can use [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or [MS-COCO dataset](http://cocodataset.org/#download).
 
-#### PASCAL VOC Dataset
-
 If you want to train a model on the PASCAL VOC dataset, please download the dataset first; skip this step if you already have it.
 
 ```bash
@@ -30,8 +36,6 @@ cd data/pascalvoc
 
 The `download.sh` script also creates the training and testing file lists.
 
-#### MS-COCO Dataset
-
 If you want to train a model on the MS-COCO dataset, please download the dataset first; skip this step if you already have it.
 
 ```
@@ -71,7 +75,13 @@ We will release our own pre-trained models soon.
   python train.py --help
   ```
 
-We used RMSProp optimizer with mini-batch size 64 to train the MobileNet-SSD. The initial learning rate is 0.001, and was decayed at 40, 60, 80, 100 epochs with multiplier 0.5, 0.25, 0.1, 0.01, respectively. Weight decay is 0.00005. After 120 epochs we achive XXX% mAP under 11point metric.
+The data reader is defined in `reader.py`. All images are resized to 300x300. During training, images are randomly distorted, expanded, cropped, and flipped:
+   - distort: distort brightness, contrast, saturation, and hue.
+   - expand: place the original image into a larger canvas that is initialized with the image mean.
+   - crop: crop the image with respect to different scales, aspect ratios, and overlaps.
+   - flip: flip horizontally.
+
+We used the RMSProp optimizer with a mini-batch size of 64 to train MobileNet-SSD. The initial learning rate is 0.001 and is decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01, respectively. Weight decay is 0.00005. After 120 epochs we achieve 73.32% mAP under the 11-point metric.
 
 ### Evaluate
 
@@ -115,4 +125,4 @@ MobileNet-v1-SSD 300x300 Visualization Examples
 
 | Model                    | Pre-trained Model  | Training data    | Test data    | mAP |
 |:------------------------:|:------------------:|:----------------:|:------------:|:----:|
-|MobileNet-v1-SSD 300x300  | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | XXX%  |
+|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | 73.32%  |
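Reviewer note: the learning-rate schedule described in the `Train` hunk of this README can be sketched as a piecewise-constant function. This is a minimal illustration, not the code from `train.py`. It assumes each multiplier is applied to the initial rate (not cumulatively) and that a boundary takes effect at the epoch with that 0-based index; the names `BASE_LR`, `BOUNDARIES`, `MULTIPLIERS`, and `learning_rate` are hypothetical.

```python
import bisect

BASE_LR = 0.001
BOUNDARIES = [40, 60, 80, 100]             # epochs at which the rate changes
MULTIPLIERS = [1.0, 0.5, 0.25, 0.1, 0.01]  # 1.0 covers epochs before the first boundary


def learning_rate(epoch):
    """Return the piecewise-constant learning rate for a 0-based epoch index."""
    # bisect_right counts how many boundaries have been reached so far.
    stage = bisect.bisect_right(BOUNDARIES, epoch)
    return BASE_LR * MULTIPLIERS[stage]


for epoch in (0, 39, 40, 60, 80, 100, 119):
    print(epoch, learning_rate(epoch))
```

Keeping the boundaries and multipliers in two parallel lists makes it easy to compare the schedule against the numbers quoted in the README.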
diff --git a/fluid/object_detection/README_cn.md b/fluid/object_detection/README_cn.md
@@ -4,6 +4,14 @@
 
 ## SSD Object Detection
 
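+## Table of Contents
+- [Introduction](#introduction)
+- [Data Preparation](#data-preparation)
+- [Train](#train)
+- [Evaluate](#evaluate)
+- [Infer and Visualize](#infer-and-visualize)
+- [Released Model](#released-model)
+
 ### Introduction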
 
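 [Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) is a single-stage object detector. Unlike two-stage detection methods, a single-stage detector does not generate region proposals; instead, it regresses bounding boxes and class probabilities directly from the feature maps. SSD builds on this single-stage idea and improves it by detecting objects of a given scale on the feature map of the corresponding scale. As shown in the figure below, SSD makes predictions at six levels, on feature maps of six scales. At each level, two 3x3 convolutions predict the object class and the bounding-box offsets, respectively. Thus, for each class, the six levels of SSD produce 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections in total.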
@@ -19,8 +27,6 @@ SSD can be easily plugged into any standard convolutional network, such as VGG, Res
 
 You can use the [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or the [MS-COCO dataset](http://cocodataset.org/#download).
 
-#### PASCAL VOC Dataset
-
 If you want to train on the PASCAL VOC dataset, first download the dataset with the commands below.
 
 ```bash
@@ -30,8 +36,6 @@ cd data/pascalvoc
 
 The `download.sh` script automatically creates the list files used for training and testing.
 
-#### MS-COCO Dataset
-
 If you want to train on the MS-COCO dataset, first download the dataset with the commands below.
 
 ```
@@ -70,7 +74,13 @@ cd data/coco
   python train.py --help
   ```
 
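-We used the RMSProp optimizer to train MobileNet-SSD with a batch size of 64, a weight decay of 0.00005, and an initial learning rate of 0.001, decaying the learning rate at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01. After 120 epochs of training, the mAP under the 11-point metric is XXX%.
+Data reading is defined in `reader.py`. All images are resized to 300x300. During training, the data also goes through image augmentation and label augmentation: image augmentation covers random distortion, expansion, and flipping of the image itself, and label augmentation covers random cropping:
+   - Distort: perturb the image brightness, contrast, saturation, and hue.
+   - Expand: place the original image into an expanded image filled with the pixel mean (which is removed later by mean subtraction), then crop, resize, and flip that image.
+   - Flip: flip horizontally.
+   - Crop: generate several candidate boxes from two parameters, scale and aspect ratio, then select the crops that meet the requirements based on the intersection over union (IoU) between the candidate boxes and the ground-truth boxes.
+
+We used the RMSProp optimizer to train MobileNet-SSD with a batch size of 64, a weight decay of 0.00005, and an initial learning rate of 0.001, decaying the learning rate at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01. After 120 epochs of training, the mAP under the 11-point metric is 73.32%.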
 
 ### Evaluate
 
@@ -114,4 +124,4 @@ MobileNet-v1-SSD 300x300 Visualization Examples
 
 | Model                    | Pre-trained Model  | Training data    | Test data    | mAP |
 |:------------------------:|:------------------:|:----------------:|:------------:|:----:|
-|MobileNet-v1-SSD 300x300  | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | XXX%  |
+|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test   | 73.32%  |
diff --git a/fluid/object_detection/eval.py b/fluid/object_detection/eval.py
@@ -64,6 +64,7 @@ def if_exist(var):
         place=place, feed_list=[image, gt_box, gt_label, difficult])
 
     def test():
+        # switch network to test mode (i.e. batch norm test mode)
         test_program = fluid.default_main_program().clone(for_test=True)
         with fluid.program_guard(test_program):
             map_eval = fluid.evaluator.DetectionMAP(
@@ -79,12 +80,12 @@ def test():
         _, accum_map = map_eval.get_map_var()
         map_eval.reset(exe)
         for batch_id, data in enumerate(test_reader()):
-            test_map = exe.run(test_program,
-                               feed=feeder.feed(data),
-                               fetch_list=[accum_map])
+            test_map, = exe.run(test_program,
+                                feed=feeder.feed(data),
+                                fetch_list=[accum_map])
             if batch_id % 20 == 0:
-                print("Batch {0}, map {1}".format(batch_id, test_map[0]))
-        print("Test model {0}, map {1}".format(model_dir, test_map[0]))
+                print("Batch {0}, map {1}".format(batch_id, test_map))
+        print("Test model {0}, map {1}".format(model_dir, test_map))
 
     test()
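Reviewer note: the change from `test_map = exe.run(...)` plus `test_map[0]` to `test_map, = exe.run(...)` relies on Python's single-element unpacking. The sketch below illustrates the idiom in isolation; `fake_run` is a hypothetical stand-in that returns one placeholder value per fetched variable, and it is not the Fluid executor.

```python
def fake_run(fetch_list):
    """Hypothetical stand-in for exe.run: one result per fetched variable."""
    return [0.5 for _ in fetch_list]  # 0.5 is a placeholder, not a real mAP


# Old style: keep the list and index it wherever the value is used.
results = fake_run(fetch_list=["accum_map"])
old_value = results[0]

# New style: the trailing comma unpacks the single-element list at once.
new_value, = fake_run(fetch_list=["accum_map"])

# Unpacking also fails loudly if the number of results is unexpected.
unpack_failed = False
try:
    only_one, = fake_run(fetch_list=["a", "b"])
except ValueError:
    unpack_failed = True

print(old_value, new_value, unpack_failed)
```

Compared with indexing, the unpacking form removes the repeated `[0]` and raises an error as soon as the fetch list and the left-hand side disagree in length.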
 
@@ -101,9 +102,9 @@ def test():
         raise ValueError("The model path [%s] does not exist." %
                          (args.model_dir))
     if 'coco' in args.dataset:
-        data_dir = './data/coco'
+        data_dir = 'data/coco'
         if '2014' in args.dataset:
-            test_list = 'annotations/instances_minival2014.json'
+            test_list = 'annotations/instances_val2014.json'
         elif '2017' in args.dataset:
             test_list = 'annotations/instances_val2017.json'
 

diff --git a/fluid/object_detection/eval_coco_map.py b/fluid/object_detection/eval_coco_map.py
@@ -133,7 +133,7 @@ def test():
 
     data_dir = './data/coco'
     if '2014' in args.dataset:
-        test_list = 'annotations/instances_minival2014.json'
+        test_list = 'annotations/instances_val2014.json'
     elif '2017' in args.dataset:
         test_list = 'annotations/instances_val2017.json'
 

diff --git a/fluid/object_detection/images/009943.jpg b/fluid/object_detection/images/009943.jpg
diff --git a/fluid/object_detection/images/009956.jpg b/fluid/object_detection/images/009956.jpg
diff --git a/fluid/object_detection/images/009960.jpg b/fluid/object_detection/images/009960.jpg
diff --git a/fluid/object_detection/images/009962.jpg b/fluid/object_detection/images/009962.jpg
diff --git a/fluid/object_detection/infer.py b/fluid/object_detection/infer.py
@@ -34,8 +34,20 @@ def infer(args, data_args, image_path, model_dir):
     image_shape = [3, data_args.resize_h, data_args.resize_w]
     if 'coco' in data_args.dataset:
         num_classes = 91
+        # load COCO category names through the COCO API (pycocotools)
+        from pycocotools.coco import COCO
+        from pycocotools.cocoeval import COCOeval
+        label_fpath = os.path.join(data_dir, label_file)
+        coco = COCO(label_fpath)
+        category_ids = coco.getCatIds()
+        label_list = {
+            item['id']: item['name']
+            for item in coco.loadCats(category_ids)
+        }
+        label_list[0] = 'background'
     elif 'pascalvoc' in data_args.dataset:
         num_classes = 21
+        label_list = data_args.label_list
 
     image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
     locs, confs, box, box_var = mobile_net(num_classes, image, image_shape)
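Reviewer note: the COCO branch above builds `label_list` as a dict keyed by category id, which suits COCO because its category ids are not contiguous. The sketch below shows the same pattern without needing `pycocotools` or an annotation file. `sample_categories` is hypothetical data shaped like the dicts `coco.loadCats(...)` returns, `label_for` is an illustrative helper that does not exist in `infer.py`, and the background entry is stored as a plain string here as an assumption.

```python
# Hypothetical sample shaped like the dicts returned by COCO.loadCats();
# the real list would come from the annotation file.
sample_categories = [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 2, "name": "bicycle", "supercategory": "vehicle"},
    {"id": 18, "name": "dog", "supercategory": "animal"},
]

# Map category id -> category name, then reserve id 0 for background.
label_list = {item["id"]: item["name"] for item in sample_categories}
label_list[0] = "background"


def label_for(category_id):
    """Look up a display name; detection outputs carry the id as a float."""
    return label_list.get(int(category_id), "unknown")


print(label_for(1.0), label_for(18), label_for(0), label_for(3))
```

A dict lookup with a fallback avoids an `IndexError` or `KeyError` for ids that have no category, which a plain list indexed by id would not.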
@@ -54,13 +66,16 @@ def if_exist(var):
     feeder = fluid.DataFeeder(place=place, feed_list=[image])
 
     data = infer_reader()
-    nmsed_out_v = exe.run(fluid.default_main_program(),
-                          feed=feeder.feed([[data]]),
-                          fetch_list=[nmsed_out],
-                          return_numpy=False)
-    nmsed_out_v = np.array(nmsed_out_v[0])
+
+    # switch network to test mode (i.e. batch norm test mode)
+    test_program = fluid.default_main_program().clone(for_test=True)
+    nmsed_out_v, = exe.run(test_program,
+                           feed=feeder.feed([[data]]),
+                           fetch_list=[nmsed_out],
+                           return_numpy=False)
+    nmsed_out_v = np.array(nmsed_out_v)
     draw_bounding_box_on_image(image_path, nmsed_out_v, args.confs_threshold,
-                               data_args.label_list)
+                               label_list)
 
 
 def draw_bounding_box_on_image(image_path, nms_out, confs_threshold,
@@ -93,10 +108,20 @@ def draw_bounding_box_on_image(image_path, nms_out, confs_threshold,
     args = parser.parse_args()
     print_arguments(args)
 
+    data_dir = 'data/pascalvoc'
+    label_file = 'label_list'
+
+    if not os.path.exists(args.model_dir):
+        raise ValueError("The model path [%s] does not exist." %
+                         (args.model_dir))
+    if 'coco' in args.dataset:
+        data_dir = 'data/coco'
+        label_file = 'annotations/instances_val2014.json'
+
     data_args = reader.Settings(
         dataset=args.dataset,
-        data_dir='data/pascalvoc',
-        label_file='label_list',
+        data_dir=data_dir,
+        label_file=label_file,
         resize_h=args.resize_h,
         resize_w=args.resize_w,
         mean_value=[args.mean_value_B, args.mean_value_G, args.mean_value_R],