faster-rcnn aeon integration

use cache_dir for saving VGG weights
skip serialization of dataloader object
refactor voc_eval to skip temporary file write
update unit tests for aeon
reorganize faster-rcnn example
use config files, add kitti dataset
NervanaSystems · Nov 15, 2016 · ca28fd9
1 parent f352419

Showing 16 changed files with 1,173 additions and 1,398 deletions.
diff --git a/examples/faster-rcnn/NOTICE b/examples/faster-rcnn/NOTICE
@@ -4,13 +4,13 @@ This directory uses the open sourced code in the following file:
 voc_eval.py
 generate_anchors.py
 util.py
-# 
+#
 # --------------------------------------------------------
 # Fast R-CNN
 # Copyright (c) 2015 Microsoft
 # Licensed under The MIT License [see LICENSE for details]
 # Written by Ross Girshick
 # --------------------------------------------------------
-The mAP evaluation script and various util functions are from:
+The mAP evaluation script and various util functions are adapted from:
 https://github.com/rbgirshick/py-faster-rcnn/
 
diff --git a/examples/faster-rcnn/README.md b/examples/faster-rcnn/README.md
@@ -1,60 +1,91 @@
-## Model
+## Faster-RCNN
 
-This example demonstrates how to train and test a faster R-CNN model using PASCAL VOC dataset.
-
-The script will download the PASCAL dataset and ingest and provide the data for training and inference.
+This example demonstrates how to train and test the Faster R-CNN model on the [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) dataset. This object localization model learns to detect objects in natural scenes and to provide bounding boxes and category information for each object.
 
 Reference:
 
-"Faster R-CNN"\
-http://arxiv.org/abs/1506.01497\
-https://github.com/rbgirshick/py-faster-rcnn
+"Faster R-CNN": http://arxiv.org/abs/1506.01497, https://github.com/rbgirshick/py-faster-rcnn
 
-### Model script
-#### train.py
+### Data preparation
 
-Trains a Faster-RCNN model to do object localization using PASCAL VOC dataset.
+Note: This example requires installing our new dataloader, [aeon](https://github.com/NervanaSystems/aeon). For more information, see the [aeon documentation](http://aeon.nervanasys.com/index.html/).
 
-By default, the faster R-CNN model has several convolution and linear layers initialized from a pre-trained VGG16 model, and this script will download the VGG model from neon model zoo and load the weights for those layers. If the script is given --model_file, it will continue training the Faster R-CNN from the given model file. 
+First, download and extract the PASCAL VOC 2007 [training](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) and [testing](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar) datasets to a local directory, which we call `PASCAL_DATA_PATH`. These datasets consist of images of scenes and corresponding annotation files with bounding box and category information for each object in the scene.
+
+Then, run the `ingest_pascalvoc.py` script to ingest the data:
 
-Usage:
 ```
-python examples/faster-rcnn/train.py -r0 -e7 -s faster_rcnn.pkl -vv
-````
+python ingest_pascal.py --input_dir $PASCAL_DATA_PATH
+```
+The above script will:
+
+1. Convert the annotations from the XML format to the json format expected by our dataloader. The converted json files are written to the folders `Annotations-json` and `Annotations-json-inference`. When training the model, we exclude objects with the 'difficult' metadata tag. For evaluating the model, however, the 'difficult' objects are included (following the above reference), so we create separate folders for the two conditions.
+
+2. Write manifest files for the training and testing sets. These are written to `$PASCAL_DATA_PATH`.
+
+3. Write a configuration file to pass to neon. The config file is written to the `faster_rcnn` folder as `pascalvoc.cfg`. The config file contains the paths to the manifest files, as well as some other dataset-specific settings. For example:
+
+```
+manifest = [train:/usr/local/data/VOCdevkit/VOC2007/trainval.csv, val:/usr/local/data/VOCdevkit/VOC2007/val.csv]
+epochs = 14
+height = 1000
+width = 1000
+batch_size = 1
+```
 
-Notes:
 
-1. The training currently runs 1 image in each minibatch.
+### Training
 
-2. The original caffe model goes through 40000 iteration (mb) of training, with
-1 images per minibatch (iteration), but 2 iterations per weight updates.
+To train the model on the PASCAL VOC 2007 dataset, use:
+```
+python examples/faster-rcnn/train.py -c <path-to-config-file> --verbose --rng_seed 0 -s frcn_model.prm
+```
+
+The above command will train the model for 14 epochs (~70K iterations), saving the model to the file `frcn_model.prm`. Note that the training uses a minibatch of 1 image.
 
-3. The model converges after training for 7 epochs.
+By default, the Faster R-CNN model has several convolution and linear layers that are initialized from a pre-trained VGG16 model. These VGG weights will be automatically downloaded from the [neon model zoo](https://github.com/NervanaSystems/ModelZoo) and saved in `$PASCAL_DATA_PATH/pascalvoc_cache/`.
+
+Note: the config file passes its contents to the python script as command-line arguments. The equivalent command, passing the arguments directly, is:
+```
+python examples/faster-rcnn/train.py --manifest train:$PASCAL_DATA_PATH/VOC2007/trainval.csv \
+--manifest val:$PASCAL_DATA_PATH/VOC2007/val.csv \
+-e 14 --height 1000 --width 1000 --batch_size 1 --verbose --rng_seed 0 -s frcn_model.prm
+```
 
-4. The dataset can be cached as the preprocessed file and re-use if the same
-configuration of the dataset is used again. The cached file by default ~/nervana/data
-
-### inference.py
+### Testing
 
-Test a trained Faster-RCNN model to do object detection using PASCAL VOC dataset.
+To evaluate the trained model using the Mean Average Precision (mAP) metric, use the command below.
 
 Usage:
 ```
-    python examples/faster-rcnn/inference.py --model_file faster_rcnn.pkl
+    python examples/faster-rcnn/inference.py -c <path-to-config-file> --model_file frcn_model.prm
 ```
-Notes:
 
-1. This test currently runs 1 image at a time.
+A fully trained model should yield an mAP above 69%. The mAP evaluation script is adapted from: https://github.com/rbgirshick/py-faster-rcnn/
 
-2. The dataset can be cached as the preprocessed file and re-use that if the same
-configuration of the dataset is used again. 
+### Other files
 
-3. The mAP evaluation script is adapted from:
-https://github.com/rbgirshick/py-faster-rcnn/
+This folder includes several other key files, which we describe here:
+- `faster_rcnn.py`: Functions for creating the Faster R-CNN network and transforming the output to bounding box predictions.
+- `roi_pooling.py`: ROI-pooling layer.
+- `proposal_layer.py`: Proposal layer.
+- `objectlocalization.py`: Dataset-specific configurations and settings.
 
+Several utility functions are also included:
+- `voc_eval.py`: Computes the mAP on the PASCAL VOC dataset.
+- `util.py`: Bounding box calculations and non-max suppression.
+- `generate_anchors.py`: Generates anchor boxes.
+- `convert_xml_to_json.py`: Converts PASCAL XML format to json format.
 
 ### Tests
-There are a few unit tests for components of the model. It is setup based on the py.test framework. To run the tests,
+There are a few unit tests for components of the model, set up using the py.test framework. To run these tests, use the command below. The unit tests require defining the environment variable
 ```
 py.test examples/faster-rcnn/tests
 ```
+
+### Other datasets
+
+To extend Faster-RCNN to other datasets, write a script that ingests the data by converting the annotations into json format and generating a manifest file according to the specifications in our [aeon documentation](http://aeon.nervanasys.com/index.html/). As an example, we include the ingest script `ingest_kitti.py` for the KITTI dataset and the configuration class `KITTI` in `objectlocalization.py`.
+
+
+
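The README states that the config file's contents reach `train.py` as command-line arguments and shows the equivalent explicit command. A minimal sketch of that correspondence follows. `config_to_args` is a hypothetical helper, not neon's own argument parser; it assumes plain `key = value` lines, expands a bracketed list into one repeated flag per item, and maps `epochs` to `-e` only because the README's explicit command does so.

```python
def config_to_args(text):
    """Hypothetical helper: turn `key = value` config lines into argv-style tokens."""
    short_flags = {'epochs': '-e'}  # assumed alias, taken from the README example
    args = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue  # skip blank lines, comments and anything without a value
        key, value = (part.strip() for part in line.split('=', 1))
        flag = short_flags.get(key, '--' + key)
        if value.startswith('[') and value.endswith(']'):
            # list value: repeat the flag once per comma-separated item
            for item in value[1:-1].split(','):
                args.extend([flag, item.strip()])
        else:
            args.extend([flag, value])
    return args


if __name__ == '__main__':
    cfg = """manifest = [train:/data/VOC2007/trainval.csv, val:/data/VOC2007/val.csv]
epochs = 14
height = 1000
width = 1000
batch_size = 1"""
    print(' '.join(config_to_args(cfg)))
```

Splitting list items on commas assumes the manifest paths contain no commas; a full config parser would also handle quoting and escaping.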
diff --git a/examples/faster-rcnn/convert_xml_to_json.py b/examples/faster-rcnn/convert_xml_to_json.py
@@ -0,0 +1,149 @@
+#!/usr/bin/python
+
+import json
+import glob
+import collections
+import os
+from os.path import join
+import xml.etree.ElementTree as et
+from collections import defaultdict
+import argparse
+
+
+# http://stackoverflow.com/questions/7684333/converting-xml-to-dictionary-using-elementtree
+def etree_to_dict(t):
+    d = {t.tag: {} if t.attrib else None}
+    children = list(t)
+    if children:
+        dd = defaultdict(list)
+        for dc in map(etree_to_dict, children):
+            for k, v in dc.items():
+                dd[k].append(v)
+        d = {t.tag: {k: v[0] if len(v) == 1 else v for k, v in dd.items()}}
+    if t.attrib:
+        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
+    if t.text:
+        text = t.text.strip()
+        if children or t.attrib:
+            if text:
+                d[t.tag]['#text'] = text
+        else:
+            d[t.tag] = text
+    return d
+
+
+def validate_metadata(jobj, file):
+    boxlist = jobj['object']
+    if not isinstance(boxlist, list):  # etree_to_dict yields a list for repeated tags
+        print('{0} is not a sequence'.format(file))
+        return False
+
+    index = 0
+    for box in boxlist:
+        if 'part' in box:
+            parts = box['part']
+            if not isinstance(parts, list):
+                print('parts {0} is not a sequence'.format(file))
+                return False
+        index += 1
+    return True
+
+
+def convert_xml_to_json(input_path, output_path, difficult):
+
+    if not os.path.exists(output_path):
+        os.makedirs(output_path)
+    onlyfiles = glob.glob(join(input_path, '*.xml'))
+    onlyfiles.sort()
+    for file in onlyfiles:
+        outfile = join(output_path, os.path.basename(file))
+        outfile = os.path.splitext(outfile)[0] + '.json'
+        trimmed = parse_single_file(file, difficult)  # glob already returns the full path
+        if validate_metadata(trimmed, file):
+            result = json.dumps(trimmed, sort_keys=True, indent=4, separators=(',', ': '))
+            with open(outfile, 'w') as f:
+                f.write(result)
+        else:
+            print('error parsing metadata {0}'.format(file))
+
+
+def parse_single_file(path, difficult):
+    tree = et.parse(path)
+    root = tree.getroot()
+    d = etree_to_dict(root)
+    trimmed = d['annotation']
+    olist = trimmed['object']
+    if not isinstance(olist, list):  # a single <object> is parsed as a dict, not a list
+        trimmed['object'] = [olist]
+        olist = trimmed['object']
+    size = trimmed['size']
+
+    # Add version number to json
+    trimmed['version'] = {'major': 1, 'minor': 0}
+
+    # convert all numbers from string representation to number so json does not quote them
+    # all of the bounding box numbers are one based so subtract 1
+    size['width'] = int(size['width'])
+    size['height'] = int(size['height'])
+    size['depth'] = int(size['depth'])
+    width = trimmed['size']['width']
+    height = trimmed['size']['height']
+    for obj in olist:
+        obj['difficult'] = int(obj['difficult']) != 0
+        obj['truncated'] = int(obj['truncated']) != 0
+        box = obj['bndbox']
+        box['xmax'] = int(box['xmax']) - 1
+        box['xmin'] = int(box['xmin']) - 1
+        box['ymax'] = int(box['ymax']) - 1
+        box['ymin'] = int(box['ymin']) - 1
+        if 'part' in obj:
+            for part in obj['part']:
+                pbox = part['bndbox']  # separate name so `box` still refers to the object
+                pbox['xmax'] = float(pbox['xmax']) - 1
+                pbox['xmin'] = float(pbox['xmin']) - 1
+                pbox['ymax'] = float(pbox['ymax']) - 1
+                pbox['ymin'] = float(pbox['ymin']) - 1
+        xmax = box['xmax']
+        xmin = box['xmin']
+        ymax = box['ymax']
+        ymin = box['ymin']
+        if xmax > width - 1:
+            print('xmax {0} exceeds width {1}'.format(xmax, width))
+        if xmin < 0:
+            print('xmin {0} is negative'.format(xmin))
+        if ymax > height - 1:
+            print('ymax {0} exceeds height {1}'.format(ymax, height))
+        if ymin < 0:
+            print('ymin {0} is negative'.format(ymin))
+
+    # exclude difficult objects
+    if not difficult:
+        trimmed['object'] = [o for o in trimmed['object'] if not o['difficult']]
+
+    return trimmed
+
+
+def main(args):
+    input_path = args.input
+    output_path = args.output
+    parse_file = args.parse
+
+    if parse_file:
+        print(parse_file)
+        parsed = parse_single_file(parse_file, args.difficult)
+        json1 = json.dumps(parsed, sort_keys=True, indent=4, separators=(',', ': '))
+        print(json1)
+    elif input_path:
+        convert_xml_to_json(input_path, output_path, args.difficult)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="convert xml to json for pascalvoc dataset")
+    parser.add_argument('-i', '--input', dest='input', help='input directory with xml files.')
+    parser.add_argument('-o', '--output', dest='output', help='output directory of json files.')
+    parser.add_argument('-p', '--parse', dest='parse', help='parse a single xml file.')
+    parser.add_argument('--difficult', dest='difficult', action='store_true',
+                        help='include objects with the difficult tag. Default is to exclude.')
+
+    args = parser.parse_args()
+    main(args)
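As a rough, standalone Python 3 sketch of two steps that `parse_single_file` performs on each object (shifting one-based VOC pixel coordinates to zero-based, and dropping 'difficult' objects for the training annotations only), the hypothetical `parse_voc_annotation` below reads a small subset of the fields with `xml.etree.ElementTree`. It is not the script's own code path: it skips the generic `etree_to_dict` conversion, object parts, and validation, and the sample annotation is made up for illustration.

```python
import json
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text, include_difficult):
    """Hypothetical, simplified VOC XML -> dict conversion (keeps only a few fields)."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    result = {
        'size': {k: int(size.find(k).text) for k in ('width', 'height', 'depth')},
        'object': [],
    }
    for obj in root.findall('object'):
        difficult = int(obj.findtext('difficult', '0')) != 0
        if difficult and not include_difficult:
            continue  # training annotations leave out 'difficult' objects
        bndbox = obj.find('bndbox')
        # VOC pixel coordinates are one-based; shift them to zero-based
        box = {k: int(bndbox.find(k).text) - 1 for k in ('xmin', 'ymin', 'xmax', 'ymax')}
        result['object'].append({'name': obj.findtext('name'),
                                 'difficult': difficult,
                                 'bndbox': box})
    return result


sample = """<annotation>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object><name>dog</name><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object><name>person</name><difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>374</ymax></bndbox>
  </object>
</annotation>"""

train = parse_voc_annotation(sample, include_difficult=False)  # like Annotations-json
infer = parse_voc_annotation(sample, include_difficult=True)   # like Annotations-json-inference
print(len(train['object']), len(infer['object']))  # 1 2
print(json.dumps(train['object'][0]['bndbox'], sort_keys=True))
# {"xmax": 194, "xmin": 47, "ymax": 370, "ymin": 239}
```

Producing the two variants from one parse with a flag mirrors the `--difficult` switch of `convert_xml_to_json.py`, which is why the ingest step writes two annotation folders rather than filtering at load time.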