# **How the PASCAL VOC 2007 dataset becomes the network's input blobs in Fast R-CNN**

    By Jincheng Su@HikVision, Shanghai, 2017/07/12

## Preparing datasets

1. Download the training, validation, test data and VOCdevkit
```shell
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar  # jcsu: seems to be optional
```    
    But these links are not reachable from behind the Great Firewall. -_-

2. Extract all of these tar files; they unpack into a single directory named `VOCdevkit`
```shell
    tar xvf VOCtrainval_06-Nov-2007.tar
    tar xvf VOCtest_06-Nov-2007.tar
    tar xvf VOCdevkit_08-Jun-2007.tar
```    
3. The result should have this basic structure (jcsu: `VOCcode` seems to be optional)
```shell
    $VOCdevkit/                            # development kit
    $VOCdevkit/VOCcode/                    # VOC utility code
    $VOCdevkit/VOC2007                     # image sets, annotations, etc.
    $ ls VOC2007
      Annotations  ImageSets  JPEGImages  SegmentationClass  SegmentationObject
    # ... and several other directories ...
```
4. Create symlinks for the PASCAL VOC dataset
```shell
    cd $FRCN_ROOT/data
    ln -s $VOCdevkit VOCdevkit2007
```    
    Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects.

5. [Optional] Follow similar steps to get PASCAL VOC 2010 and 2012.


## What is in the PASCAL VOC 2007 dataset?

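The `SegmentationClass` and `SegmentationObject` directories seem to be unrelated to Fast R-CNN; the detection pipeline below only reads `JPEGImages`, `ImageSets/Main` and `Annotations`.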

### The `JPEGImages` directory
```shell
    $ cd VOCdevkit/VOC2007
    $ ls
    Annotations  ImageSets  JPEGImages  SegmentationClass
    $ ls JPEGImages
    000001.jpg  001642.jpg  003283.jpg  004924.jpg  006565.jpg
    000002.jpg  001643.jpg  003284.jpg  004925.jpg  006566.jpg
    000003.jpg  001644.jpg  003285.jpg  004926.jpg  006567.jpg
    ...
```
### The `ImageSets` directory
```shell
    $ ls ImageSets
    Layout  Main  Segmentation
    $ ls ImageSets/Main
    aeroplane_test.txt cat_test.txt person_test.txt
    aeroplane_train.txt cat_train.txt person_train.txt
    aeroplane_trainval.txt cat_trainval.txt person_trainval.txt
    ...
    $ vim ImageSets/Main/trainval.txt
    000005
    000007
    000009
    000012
    000016
    ...
```
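Each line of `trainval.txt` is one image id, and both the image and its annotation are located through that id. A minimal sketch of reading the list, assuming it mirrors what `pascal_voc._load_image_set_index` and `image_path_from_index` do; `read_image_ids`, `jpeg_path` and `annotation_path` are hypothetical helper names, not Fast R-CNN functions:

```python
import os

def read_image_ids(data_path, image_set):
    """Read ImageSets/Main/<image_set>.txt and return its image ids."""
    image_set_file = os.path.join(data_path, 'ImageSets', 'Main',
                                  image_set + '.txt')
    assert os.path.exists(image_set_file), \
        'Path does not exist: {}'.format(image_set_file)
    with open(image_set_file) as f:
        # One id per line (e.g. "000005"); blank lines are skipped.
        return [line.strip() for line in f if line.strip()]

def jpeg_path(data_path, image_id):
    """Path of the image for `image_id` under JPEGImages/."""
    return os.path.join(data_path, 'JPEGImages', image_id + '.jpg')

def annotation_path(data_path, image_id):
    """Path of the ground-truth XML for `image_id` under Annotations/."""
    return os.path.join(data_path, 'Annotations', image_id + '.xml')
```

For example, `read_image_ids('VOCdevkit/VOC2007', 'trainval')` should return the 5011 trainval ids used later in this walkthrough.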
### The `Annotations` directory
```shell
    $ ls Annotations
    000001.xml  001662.xml  003323.xml  004984.xml  006645.xml 
    000002.xml  001663.xml  003324.xml  004985.xml  006646.xml
    000003.xml  001664.xml  003325.xml  004986.xml  006647.xml
    ...
    $ vim Annotations/000001.xml
```
```xml
    <annotation>
        <folder>VOC2007</folder>
        <filename>000001.jpg</filename>
        <source>
            <database>The VOC2007 Database</database>
            <annotation>PASCAL VOC2007</annotation>
            <image>flickr</image>
            <flickrid>341012865</flickrid>
        </source>
        <owner>
            <flickrid>Fried Camels</flickrid>
            <name>Jinky the Fruit Bat</name>
        </owner>
        <size>
            <width>353</width>
            <height>500</height>
            <depth>3</depth>
        </size>
        <segmented>0</segmented>
        <object>
            <name>dog</name>
            <pose>Left</pose>
            <truncated>1</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>48</xmin>
                <ymin>240</ymin>
                <xmax>195</xmax>
                <ymax>371</ymax>
            </bndbox>
        </object>
        <object>
            <name>person</name>
            <pose>Left</pose>
            <truncated>1</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>8</xmin>
                <ymin>12</ymin>
                <xmax>352</xmax>
                <ymax>498</ymax>
            </bndbox>
        </object>
    </annotation>
```

The `Annotations` directory contains one `.xml` file per image, holding its ground-truth objects: class name, bounding box, and the `pose`, `truncated` and `difficult` fields.
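Fast R-CNN turns each of these files into one `gt_roidb` entry (the real code is `pascal_voc._load_pascal_annotation`). The sketch below only approximates that step: it uses the standard-library `xml.etree.ElementTree`, which may not be the parser the original uses, does not filter `difficult` objects, and returns plain lists instead of NumPy arrays and a SciPy sparse matrix. `parse_voc_annotation` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

CLASSES = ('__background__',  # always index 0
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor')
CLASS_TO_IND = {name: i for i, name in enumerate(CLASSES)}

def parse_voc_annotation(xml_text):
    """Build a gt_roidb-style dict from the text of one VOC .xml file."""
    root = ET.fromstring(xml_text)
    boxes, gt_classes, gt_overlaps = [], [], []
    for obj in root.findall('object'):
        bndbox = obj.find('bndbox')
        # VOC coordinates are 1-based; shift them to 0-based pixel indexes.
        box = [float(bndbox.find(tag).text) - 1
               for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        cls = CLASS_TO_IND[obj.find('name').text.lower().strip()]
        # A ground-truth box overlaps its own class with IoU 1.0.
        row = [0.0] * len(CLASSES)
        row[cls] = 1.0
        boxes.append(box)
        gt_classes.append(cls)
        gt_overlaps.append(row)
    return {'boxes': boxes, 'gt_classes': gt_classes,
            'gt_overlaps': gt_overlaps, 'flipped': False}
```

For `000001.xml` above this gives `gt_classes == [12, 15]` (dog, person) and a first box of `[47.0, 239.0, 194.0, 370.0]`.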

## [to do] Using selective search method to extract bounding boxes

## Creating the `imdb` database

<font color=red>**NOTE: To follow the code below, please switch to directory `FRCN_ROOT/lib/datasets`.**</font>

The `imdb` base class is defined under `FRCN_ROOT/lib/datasets` (in `imdb.py`), and the PASCAL VOC-specific subclass lives in the module `pascal_voc.py`.

Let's begin with it!

```shell
    $ vim pascal_voc.py
```
```python
    import datasets
    import datasets.pascal_voc
    import os
    #...
    import subprocess
    
    class pascal_voc(datasets.imdb):
        def __init__(self, image_set, year, devkit_path=None):
            datasets.imdb.__init__(self, 'voc_' + year + '_' + image_set)
            self._year = year
            self._image_set = image_set
            self._devkit_path = self._get_default_path() if devkit_path is None \
                                else devkit_path
            self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year)
            self._classes = ('__background__', # always index 0
                             'aeroplane', 'bicycle', 'bird', 'boat',
                             'bottle', 'bus', 'car', 'cat', 'chair',
                             'cow', 'diningtable', 'dog', 'horse',
                             'motorbike', 'person', 'pottedplant',
                             'sheep', 'sofa', 'train', 'tvmonitor')
            self._class_to_ind = dict(zip(self.classes, xrange(self.num_classes)))
            self._image_ext = '.jpg'
            self._image_index = self._load_image_set_index()
            # Default to roidb handler
            self._roidb_handler = self.selective_search_roidb

            # PASCAL specific config options
            self.config = {'cleanup'  : True,
                           'use_salt' : True,
                           'top_k'    : 2000}

            assert os.path.exists(self._devkit_path), \
                    'VOCdevkit path does not exist: {}'.format(self._devkit_path)
            assert os.path.exists(self._data_path), \
                    'Path does not exist: {}'.format(self._data_path)
    # ...
```

The `imdb` for the PASCAL VOC 2007 dataset is defined mainly by the class `pascal_voc`. It has 21 classes: the 20 VOC object classes plus `__background__`, which always takes index 0.

To understand how it constructs the `imdb` database, let's dive into the code!

### 1. def selective_search_roidb(self)
```python
"""
Return the database of selective search regions of interest.
Ground-truth ROIs are also included.

This function loads/saves from/to a cache file to speed up future calls.
"""
```

```python
import os
import scipy.io as sio

## Load `gt_roidb` => `self.gt_roidb()`
## The first time this runs, `self.gt_roidb()` calls
## `self._load_pascal_annotation(index)` for each image, which returns a dict:
##     {'boxes' : boxes,
##      'gt_classes': gt_classes,
##      'gt_overlaps' : overlaps,
##      'flipped' : False}

## Here we assume that `gt_roidb` has already been cached in a `.pkl` file,
## so we simply load it from there.
```

### 2. def gt_roidb(self)
```python
"""
Return the database of ground-truth regions of interest.

This function loads/saves from/to a cache file to speed up future calls.
"""
```

```python
import cPickle as cpk
data_root = '../../data/'
gt_file = os.path.join(data_root, 'cache/voc_2007_trainval_gt_roidb.pkl')
assert os.path.exists(gt_file), 'gt roidb cache not found: {}'.format(gt_file)
with open(gt_file, 'rb') as fid:
    gt_roidb = cpk.load(fid)
print "voc_2007_trainval gt roidb loaded from file:\n'{}'".format(gt_file)
```

```python
## Let's digress for a while to see what `gt_roidb` looks like
print "type(gt_roidb) = ", type(gt_roidb)

print "len(gt_roidb) = ", len(gt_roidb)
print "gt_roidb[0] = \n", gt_roidb[0]
print "\ngt_roidb[1] = \n", gt_roidb[1]

print "gt_roidb[0]['gt_overlaps']:\n", gt_roidb[0]['gt_overlaps']
print "\ngt_roidb[1]['gt_overlaps']:\n", gt_roidb[1]['gt_overlaps']
```

Now that the ground-truth annotations are loaded, the next step is to load the bounding-box proposals extracted by selective search. Let's go to this line:

`ss_roidb = self._load_selective_search_roidb(gt_roidb)`


### 3. `def _load_selective_search_roidb(self, gt_roidb):`

```python
import os
import scipy.io as sio

## Load the selective search data
ss_file = os.path.join(data_root, 'selective_search_data/voc_2007_trainval.mat')

assert os.path.exists(ss_file), 'Selective search data not found at: {}'.format(ss_file)

raw_data = sio.loadmat(ss_file)['boxes'].ravel()

## raw_data.shape = (5011,), raw_data[0].shape = (2443, 4)

box_list = []

for i in xrange(raw_data.shape[0]):
    ## Reorder the columns and subtract 1 to get 0-based (x1, y1, x2, y2) boxes
    box_list.append(raw_data[i][:, (1, 0, 3, 2)] - 1)

## `gt_roidb` was already loaded in the cell above
```
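The column permutation `(1, 0, 3, 2)` followed by `- 1` implies that the `.mat` file stores each proposal as `(y1, x1, y2, x2)` in 1-based MATLAB coordinates, which the code converts to the 0-based `(x1, y1, x2, y2)` layout used by the ground-truth boxes. A plain-Python sketch of the same conversion for one image (`ss_to_xyxy` is a hypothetical name):

```python
def ss_to_xyxy(boxes_yxyx):
    """Convert 1-based (y1, x1, y2, x2) rows to 0-based (x1, y1, x2, y2)."""
    # Same effect as `raw_data[i][:, (1, 0, 3, 2)] - 1` on a NumPy array.
    return [[b[1] - 1, b[0] - 1, b[3] - 1, b[2] - 1] for b in boxes_yxyx]

print(ss_to_xyxy([[240, 48, 371, 195]]))  # [[47, 239, 194, 370]]
```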

**Now that we have both `box_list` and `gt_roidb`, we are ready to dive into the function `create_roidb_from_box_list(box_list, gt_roidb)`.**

### 4. def create_roidb_from_box_list(self, box_list, gt_roidb):

```python
## `num_images` of the trainval set is 5011,
## see 'VOC2007/ImageSets/Main/trainval.txt'
import numpy as np
num_images = 5011
num_classes = 21
assert len(box_list) == num_images, "Number of boxes must match number of ground-truth images"

## The original code is a for-loop over images,
## but we only follow one iteration of it (the first image).
boxes = box_list[0]
num_boxes = boxes.shape[0]
print "boxes.shape = ", boxes.shape
print "num_boxes = ", num_boxes
overlaps = np.zeros((num_boxes, num_classes), dtype = np.float32)
print "overlaps.shape = ", overlaps.shape
```
```
boxes.shape =  (2443, 4)
num_boxes =  2443
overlaps.shape =  (2443, 21)
```


```python
gt_boxes = gt_roidb[0]['boxes']
gt_classes = gt_roidb[0]['gt_classes']
print 'gt_boxes.shape = ', gt_boxes.shape
print 'gt_classes.shape = ', gt_classes.shape

import sys
sys.path.append('../utils')
from cython_bbox import bbox_overlaps
help(bbox_overlaps)
```
```
gt_boxes.shape =  (5, 4)
gt_classes.shape =  (5,)
Help on built-in function bbox_overlaps:

bbox_overlaps(...)
    Parameters
    ----------
    boxes: (N, 4) ndarray of float
    query_boxes: (K, 4) ndarray of float
    Returns
    -------
    overlaps: (N, K) ndarray of overlap between boxes and query_boxes
```
```python
gt_overlaps = bbox_overlaps(boxes.astype(np.float), gt_boxes.astype(np.float))
print 'gt_overlaps.shape = ', gt_overlaps.shape
print 'gt_overlaps[:5,:]:\n', gt_overlaps[:5, :]
```
```
gt_overlaps.shape =  (2443, 5)
gt_overlaps[:5,:]:
[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
```
**Now `gt_overlaps` is a `numpy.ndarray` of shape (2443, 5): one row per selective-search box and one column per ground-truth box. Entry `[i, j]` is the IoU between proposal `i` and ground-truth box `j`.**
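`bbox_overlaps` is compiled Cython, so its body is not visible from the notebook. A pure-Python sketch of the same IoU computation, assuming the inclusive pixel convention (width = x2 - x1 + 1), which to our reading is what the Cython version uses; `bbox_overlaps_py` is a hypothetical name and is far slower than the original:

```python
def bbox_overlaps_py(boxes, query_boxes):
    """IoU between every box in `boxes` (N) and `query_boxes` (K) -> N x K."""
    overlaps = [[0.0] * len(query_boxes) for _ in boxes]
    for k, (qx1, qy1, qx2, qy2) in enumerate(query_boxes):
        q_area = (qx2 - qx1 + 1) * (qy2 - qy1 + 1)
        for n, (x1, y1, x2, y2) in enumerate(boxes):
            # Width and height of the intersection (inclusive coordinates).
            iw = min(x2, qx2) - max(x1, qx1) + 1
            ih = min(y2, qy2) - max(y1, qy1) + 1
            if iw > 0 and ih > 0:
                union = float((x2 - x1 + 1) * (y2 - y1 + 1) + q_area - iw * ih)
                overlaps[n][k] = iw * ih / union
    return overlaps
```

Under this convention a 10 x 10 box shifted by 5 pixels in both directions overlaps the original with IoU 25 / 175, about 0.14, and disjoint boxes get 0.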

```python
## For each proposal: index of the best-matching gt box, and that IoU
argmaxes = gt_overlaps.argmax(axis = 1)
maxes = gt_overlaps.max(axis = 1)
print 'len(argmaxes) = ', len(argmaxes)
print 'len(maxes) = ', len(maxes)
help(gt_overlaps.argmax)
help(gt_overlaps.max)
```
```
len(argmaxes) =  2443
len(maxes) =  2443
Help on built-in function argmax:

argmax(...)
    a.argmax(axis=None, out=None)
    
    Return indices of the maximum values along the given axis.
    
    Refer to `numpy.argmax` for full documentation.
    
    See Also
    --------
    numpy.argmax : equivalent function

Help on built-in function max:

max(...)
    a.max(axis=None, out=None)
    
    Return the maximum along a given axis.
    
    Refer to `numpy.amax` for full documentation.
    
    See Also
    --------
    numpy.amax : equivalent function
```
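```python
## Indices of the proposals that overlap at least one gt box.
## `np.where` returns a tuple of index arrays, one per dimension.
I = np.where(maxes > 0)
I = I[0]
print I.shape
```
```
(1111,)
```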

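```python
## For each of these proposals, store its best IoU in the column of the
## class of the gt box it overlaps most.
overlaps[I, gt_classes[argmaxes[I]]] = maxes[I]
```

**Now `overlaps` is a `numpy.ndarray` of shape (2443, 21). For each selective-search box it records the maximum IoU over the ground-truth boxes, placed in the column of the best-matching box's class; all other entries stay 0.**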
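Row by row, the fancy-indexing assignment above finds the ground-truth box with the highest IoU and, if that IoU is positive, writes it into the column of that box's class. A plain-Python sketch under that reading (`scatter_to_classes` is a hypothetical name):

```python
def scatter_to_classes(gt_overlaps, gt_classes, num_classes):
    """Map an N x K box-vs-gt IoU matrix to an N x num_classes matrix."""
    result = []
    for row in gt_overlaps:
        out = [0.0] * num_classes
        if row:
            # Index of the best-matching gt box (first one on ties, like argmax).
            best = max(range(len(row)), key=row.__getitem__)
            if row[best] > 0:
                out[gt_classes[best]] = row[best]
        result.append(out)
    return result
```

Each output row keeps at most one non-zero entry, which is why storing `overlaps` as a sparse matrix in the next cell saves so much memory.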

```python
import scipy.sparse
## Store `overlaps` as a sparse matrix: each row has at most one non-zero entry
overlaps = scipy.sparse.csr_matrix(overlaps)
ss_roidb = []
ss_roidb.append({'boxes' : boxes,
                 'gt_classes' : np.zeros((num_boxes,), dtype = np.int32),
                 'gt_overlaps' : overlaps,
                 'flipped': False})
```

### 5. Back to `def selective_search_roidb(self)`

The cells above walk through what these two lines do:
```python
 gt_roidb = self.gt_roidb()
 ss_roidb = self._load_selective_search_roidb(gt_roidb)
```

The next line is: 
```python
roidb = datasets.imdb.merge_roidbs(gt_roidb, ss_roidb)
```
This line merges `gt_roidb` and `ss_roidb` image by image, stacking each image's selective-search entries below its ground-truth entries to form the final `roidb`:
```python
import numpy as np
import scipy.sparse

## Body of `datasets.imdb.merge_roidbs`
## a = gt_roidb, b = ss_roidb
def merge_roidbs(a, b):
    assert len(a) == len(b)
    for i in xrange(len(a)):
        a[i]['boxes'] = np.vstack((a[i]['boxes'], b[i]['boxes']))
        a[i]['gt_classes'] = np.hstack((a[i]['gt_classes'],
                                        b[i]['gt_classes']))
        a[i]['gt_overlaps'] = scipy.sparse.vstack([a[i]['gt_overlaps'],
                                                   b[i]['gt_overlaps']])
    return a
```
Note that this code requires `len(gt_roidb) == len(ss_roidb)`, so we cannot run it as-is: our `ss_roidb` only contains the data for one image (the first one, `box_list[0]`).

Anyway, let's just run one iteration:

```python
a = gt_roidb[0]
b = ss_roidb[0]
a['boxes'] = np.vstack((a['boxes'], b['boxes']))
a['gt_classes'] = np.hstack((a['gt_classes'], b['gt_classes']))
a['gt_overlaps'] = scipy.sparse.vstack([a['gt_overlaps'], b['gt_overlaps']])

roidb = []
roidb.append(a)
```
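A quick sanity check on the merged entry, written as a plain-Python sketch: `merge_entry` is a hypothetical name that mimics one iteration of `merge_roidbs` on Python lists, so the ground-truth rows come first, followed by the selective-search rows.

```python
def merge_entry(gt_entry, ss_entry):
    """Stack one image's selective-search rows below its ground-truth rows."""
    return {'boxes': gt_entry['boxes'] + ss_entry['boxes'],
            'gt_classes': gt_entry['gt_classes'] + ss_entry['gt_classes'],
            'gt_overlaps': gt_entry['gt_overlaps'] + ss_entry['gt_overlaps'],
            'flipped': gt_entry['flipped']}

# Counts from this notebook: 5 ground-truth boxes (class 9) and 2443 proposals.
# The box and overlap rows are placeholders; only the row counts matter here.
gt = {'boxes': [[0, 0, 1, 1]] * 5, 'gt_classes': [9] * 5,
      'gt_overlaps': [[1.0]] * 5, 'flipped': False}
ss = {'boxes': [[0, 0, 1, 1]] * 2443, 'gt_classes': [0] * 2443,
      'gt_overlaps': [[0.0]] * 2443, 'flipped': False}
merged = merge_entry(gt, ss)
print(len(merged['boxes']))  # 2448
```

2448 = 5 + 2443 matches the `2448x21` sparse matrix printed in the next cell; its 1116 stored elements are the 5 ground-truth entries (one `1.0` per row) plus the 1111 proposals with a positive overlap found earlier.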

**Now we are ready to look into the code under `./lib/roi_data_layer/`.**

---
### 6. `def prepare_roidb(imdb)` in `roidb.py` (called in `train.py`)
```python
 """Enrich the imdb's roidb by adding some derived quantities that
    are useful for training. This function precomputes the maximum
    overlap, taken over ground-truth boxes, between each ROI and
    each ground-truth box. The class with maximum overlap is also
    recorded.
"""
```

```python
## Note: despite its name, `imdb` here is a single roidb entry (one image)
imdb = roidb[0]
print "imdb = \n", imdb
#print "\nimdb['gt_overlaps'][:5] = ", imdb['gt_overlaps'][:5]
```
```
imdb = 
{'boxes': array([[262, 210, 323, 338],
       [164, 263, 252, 371],
       [  4, 243,  66, 373],
       ..., 
       [349, 363, 370, 374],
       [349, 363, 371, 374],
       [349, 363, 377, 374]], dtype=uint16), 'gt_overlaps': <2448x21 sparse matrix of type '<type 'numpy.float32'>'
	with 1116 stored elements in Compressed Sparse Row format>, 'gt_classes': array([9, 9, 9, ..., 0, 0, 0], dtype=int32), 'flipped': False}
```


```python
roidb[0]['image'] = 'imdb.image_path_at(i)' # placeholder for the path to the real `.jpg` image
# need `gt_overlaps` as a dense array for argmax
gt_overlaps = roidb[0]['gt_overlaps'].toarray()
# max overlap with gt over classes (columns)
max_overlaps = gt_overlaps.max(axis = 1)
# gt class that had the max overlap
max_classes = gt_overlaps.argmax(axis = 1)

roidb[0]['max_classes'] = max_classes
roidb[0]['max_overlaps'] = max_overlaps

# sanity checks
# max overlap of 0 => class should be zero (background)
zero_inds = np.where(max_overlaps == 0)[0]
assert all(max_classes[zero_inds] == 0)

# max overlap > 0 => class should not be zero (must be a fg class)
nonzero_inds = np.where(max_overlaps > 0)[0]
assert all(max_classes[nonzero_inds] != 0)
```
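For a single RoI, the quantities above and their sanity checks reduce to the plain-Python sketch below (`max_overlap_and_class` is a hypothetical name). Because every row of `gt_overlaps` has at most one non-zero entry, `argmax` returns the column of that entry, or 0 (`__background__`) when the row is all zeros, which is exactly what the two assertions check.

```python
def max_overlap_and_class(overlap_row):
    """Return (max IoU, class index of that max) for one row of gt_overlaps."""
    # First index of the maximum, like NumPy's argmax.
    max_class = max(range(len(overlap_row)), key=overlap_row.__getitem__)
    max_overlap = overlap_row[max_class]
    # Same sanity checks as prepare_roidb: zero overlap <=> background class.
    if max_overlap == 0:
        assert max_class == 0
    else:
        assert max_class != 0
    return max_overlap, max_class
```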

### 7. **def add_bbox_regression_targets(roidb)** in `roidb.py` (called in `train.py`)

```python
"""Add information needed to train bounding-box regressors."""
assert len(roidb) > 0
assert 'max_classes' in roidb[0], 'Did you call prepare_roidb first?'

num_images = len(roidb) ## 1
num_classes = roidb[0]['gt_overlaps'].shape[1]
# for im_i in xrange(num_images):
im_i = 0
rois = roidb[im_i]['boxes']
max_overlaps = roidb[im_i]['max_overlaps']
max_classes = roidb[im_i]['max_classes']
## next cell => roidb[im_i]['bbox_targets'] = _compute_targets(rois, max_overlaps, max_classes)
print max_overlaps[np.where(max_overlaps > 0.9)]
```
```
[ 1.          1.          1.          1.          1.          0.94051784]
```


```python
rois = rois.astype(np.float, copy = False)
# Indices of ground-truth RoIs
gt_inds = np.where(max_overlaps == 1)[0]
# Indices of examples for which we try to make predictions
BBOX_THRESH = 0.5
ex_inds = np.where(max_overlaps > BBOX_THRESH)[0]

# Get IoU overlap between each ex RoI and gt RoI
import cython_bbox
ex_gt_overlaps = cython_bbox.bbox_overlaps(rois[ex_inds, :], rois[gt_inds, :])
print ex_gt_overlaps
```
```
[[ 1.          0.          0.          0.26967221  0.04045853]
 [ 0.          1.          0.          0.03106951  0.        ]
 [ 0.          0.          1.          0.          0.        ]
 [ 0.26967221  0.03106951  0.          1.          0.07799909]
 [ 0.04045853  0.          0.          0.07799909  1.        ]
 [ 0.07851504  0.          0.          0.07600596  0.54537122]
 [ 0.          0.          0.60328947  0.          0.        ]
 [ 0.01548975  0.          0.          0.08071749  0.56129032]
 [ 0.03827751  0.          0.          0.06703411  0.71949602]
 [ 0.00971997  0.          0.          0.05467996  0.57777778]
 [ 0.01295996  0.          0.          0.05731257  0.6       ]
 [ 0.05024577  0.          0.          0.06568403  0.63931624]
 [ 0.07531944  0.          0.          0.08171886  0.61538462]
 [ 0.01765121  0.          0.          0.0522897   0.51587302]
 [ 0.0355434   0.          0.          0.05325624  0.58490566]
 [ 0.03859098  0.          0.          0.05192878  0.52
```

```python
## Find which gt ROI each ex ROI has max overlap with:
## this will be the ex ROI's gt target

EPS = 1e-14
gt_assignment = ex_gt_overlaps.argmax(axis=1)
gt_rois = rois[gt_inds[gt_assignment], :]
ex_rois = rois[ex_inds, :]

ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + EPS
ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + EPS
ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights

gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + EPS
gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + EPS
gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights

targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
targets_dw = np.log(gt_widths / ex_widths)
targets_dh = np.log(gt_heights / ex_heights)

targets = np.zeros((rois.shape[0], 5), dtype=np.float32)
targets[ex_inds, 0] = max_classes[ex_inds]
targets[ex_inds, 1] = targets_dx
targets[ex_inds, 2] = targets_dy
targets[ex_inds, 3] = targets_dw
targets[ex_inds, 4] = targets_dh

roidb[im_i]['bbox_targets'] = targets
```
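The four regression targets are the center offsets normalized by the proposal's size, `dx = (gx - ex) / ew` and `dy = (gy - ey) / eh`, and the log size ratios `dw = log(gw / ew)` and `dh = log(gh / eh)`. A plain-Python sketch for one (proposal, ground truth) pair, following the cell's convention `width = x2 - x1 + EPS`, together with an inverse `apply_deltas` showing that decoding the targets recovers the ground-truth box; both names are hypothetical, not Fast R-CNN functions:

```python
import math

EPS = 1e-14

def _center_size(box):
    """(x1, y1, x2, y2) -> (ctr_x, ctr_y, width, height), as in the cell above."""
    w = box[2] - box[0] + EPS
    h = box[3] - box[1] + EPS
    return box[0] + 0.5 * w, box[1] + 0.5 * h, w, h

def compute_targets(ex_box, gt_box):
    """Regression targets (dx, dy, dw, dh) that take ex_box onto gt_box."""
    ex_x, ex_y, ex_w, ex_h = _center_size(ex_box)
    gt_x, gt_y, gt_w, gt_h = _center_size(gt_box)
    return ((gt_x - ex_x) / ex_w, (gt_y - ex_y) / ex_h,
            math.log(gt_w / ex_w), math.log(gt_h / ex_h))

def apply_deltas(ex_box, deltas):
    """Inverse of compute_targets: decode (dx, dy, dw, dh) back into a box."""
    ex_x, ex_y, ex_w, ex_h = _center_size(ex_box)
    dx, dy, dw, dh = deltas
    x = ex_x + dx * ex_w
    y = ex_y + dy * ex_h
    w = ex_w * math.exp(dw)
    h = ex_h * math.exp(dh)
    return (x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h)
```

`compute_targets(b, b)` returns zeros for any box `b`, so the ground-truth RoIs, which are also in `ex_inds` because their max overlap is 1, should get all-zero targets.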

### 8. **Now, let's go to `def get_minibatch(roidb, num_classes)` in `minibatch.py`**

```python
num_images = 2  # cfg.TRAIN.IMS_PER_BATCH: sample two images per minibatch
rois_per_image = 128 / num_images  # cfg.TRAIN.BATCH_SIZE = 128 => 128/2 = 64 RoIs per image
fg_rois_per_image = np.round(0.25 * rois_per_image)  # cfg.TRAIN.FG_FRACTION = 0.25 => 64 * .25 = 16 fg RoIs
num_classes = 21  # 21 classes (including background)
```

### 9. def _sample_rois(roidb, fg_rois_per_image, rois_per_image, num_classes):
```python
"""Generate a random sample of RoIs comprising foreground and background
    examples.
"""
```

```python
labels = roidb[0]['max_classes']
overlaps = roidb[0]['max_overlaps']
rois = roidb[0]['boxes']
# cfg.TRAIN.FG_THRESH = 0.5
FG_THRESH = 0.5
# Select foreground RoIs as those with >= FG_THRESH overlap
fg_inds = np.where(overlaps >= FG_THRESH)[0]
print fg_inds
```
```
[   0    1    2    3    4 1208 1229 1278 1280 1281 1282 1283 1299 1319 1320
 1321 1379 1381 1396 1397 1398 1413 1447 1456 1477 1478 1555 1556 1567 1745
 1747 1752 1756 1759 1760 1761 1762 1767 1779 1792 1844 1848 1849 1851 1852
 1857 1880 1881 1929 1983 2072]
```

```python
# Guard against the case when an image has fewer than `fg_rois_per_image` foreground RoIs
fg_rois_per_this_image = int(np.minimum(fg_rois_per_image, fg_inds.size))

# Select foreground regions without replacement
import numpy.random as npr
if fg_inds.size > 0:
    fg_inds = npr.choice(fg_inds, size = fg_rois_per_this_image, replace=False)
print fg_inds
help(npr.choice)
```
```
[1792 1556 1278 1396 1555 1851    4 1456 1282 1929 1848 1447 1849 1381 1779
 1857]
Help on built-in function choice:

choice(...)
    choice(a, size=None, replace=True, p=None)
    
    Generates a random sample from a given 1-D array
    
            .. versionadded:: 1.7.0
    
    Parameters
    -----------
    a : 1-D array-like or int
        If an ndarray, a random sample is generated from its elements.
        If an int, the random sample is generated as if a was np.arange(n)
    size : int or tuple of ints, optional
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.  Default is None, in which case a
        single value is returned.
    replace : boolean, optional
        Whether the sample is with or without replacement
    p : 1-D array-like, optional
        The probabilities associated with each entry in a.
        If not given the sample assumes a uniform distribution over all
        entries in a.
    
    Retu
```

```python
# Select background RoIs as those with BG_THRESH_LO < overlap < BG_THRESH_HI
BG_THRESH_LO = 0.1
BG_THRESH_HI = 0.5
bg_inds = np.where((overlaps < BG_THRESH_HI) &
                   (overlaps > BG_THRESH_LO))[0]
# Compute number of background RoIs to take from this image
# (guarding against there being fewer than desired)
bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
bg_rois_per_this_image = np.minimum(bg_rois_per_this_image, bg_inds.size)
# Sample background regions without replacement
if bg_inds.size > 0:
    bg_inds = npr.choice(bg_inds, size = bg_rois_per_this_image, replace = False)

# The indices that we're selecting (both fg and bg)
keep_inds = np.append(fg_inds, bg_inds)

labels = labels[keep_inds]
# Clamp labels for the background RoIs to 0
labels[fg_rois_per_this_image:] = 0

print labels
```
```
[9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
```
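The whole sampling step can be summarized as a plain-Python sketch that uses `random.sample` in place of `npr.choice(..., replace=False)`. `sample_rois` is a hypothetical name, and its default thresholds and budgets are the values used in the cells above:

```python
import random

def sample_rois(max_overlaps, rois_per_image=64, fg_rois_per_image=16,
                fg_thresh=0.5, bg_thresh_lo=0.1, bg_thresh_hi=0.5, rng=random):
    """Return (keep_inds, num_fg) for one image, mirroring the cells above."""
    fg_inds = [i for i, o in enumerate(max_overlaps) if o >= fg_thresh]
    bg_inds = [i for i, o in enumerate(max_overlaps)
               if bg_thresh_lo < o < bg_thresh_hi]
    # Take at most fg_rois_per_image foreground RoIs, without replacement.
    num_fg = min(fg_rois_per_image, len(fg_inds))
    fg_keep = rng.sample(fg_inds, num_fg)
    # Fill the rest of the per-image budget with background RoIs.
    num_bg = min(rois_per_image - num_fg, len(bg_inds))
    bg_keep = rng.sample(bg_inds, num_bg)
    # Foreground indices come first, so labels[num_fg:] can be set to 0.
    return fg_keep + bg_keep, num_fg
```

With these thresholds, proposals whose overlap lies in [0, 0.1] count as neither foreground nor background and are never sampled.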


```python
overlaps = overlaps[keep_inds]
rois = rois[keep_inds]
## What is left is to compute `bbox_targets` and `bbox_loss_weights`
bbox_target_data = roidb[0]['bbox_targets'][keep_inds, :]
```

### 10. def _get_bbox_regression_labels(bbox_target_data, num_classes)
``` python
"""Bounding-box regression targets are stored in a compact form in the
roidb.

This function expands those targets into the 4-of-4*K representation used
by the network (i.e. only one class has non-zero targets). The loss weights
are similarly expanded.

Returns:
    bbox_target_data (ndarray): N x 4K blob of regression targets
    bbox_loss_weights (ndarray): N x 4K blob of loss weights
"""
```


```python
print num_classes
```
```
21
```

```python
clss = bbox_target_data[:, 0]
bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
bbox_loss_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
inds = np.where(clss > 0)[0]
for ind in inds:
    cls = clss[ind]
    start = 4 * cls
    end = start + 4
    ## the following three lines are added by jcsu
    ## to convert `ind`, `start` and `end` from type `np.float64` to `np.int`
    ind = int(ind)
    start = int(start)
    end = int(end)
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
    bbox_loss_weights[ind, start:end] = [1., 1., 1., 1.]
```
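For a single RoI, the expansion places the four targets in the four columns that belong to its class and leaves every other column, and every background RoI, at zero. A plain-Python sketch (`expand_targets` is a hypothetical name):

```python
def expand_targets(target_row, num_classes=21):
    """[cls, dx, dy, dw, dh] -> (4*K targets, 4*K loss weights) for one RoI."""
    targets = [0.0] * (4 * num_classes)
    weights = [0.0] * (4 * num_classes)
    cls = int(target_row[0])
    if cls > 0:  # background RoIs (cls == 0) get no regression loss
        start = 4 * cls
        targets[start:start + 4] = [float(v) for v in target_row[1:5]]
        weights[start:start + 4] = [1.0, 1.0, 1.0, 1.0]
    return targets, weights
```

For a `chair` RoI (class 9), the targets land in columns 36 to 39 of the 84-wide row.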


**<font color = red>All five outputs of `_sample_rois` (`labels, overlaps, rois, bbox_targets, bbox_loss_weights`) are finally obtained! Together with the image blob, they are what `get_minibatch` needs to build the network's input blobs.</font>**

From here on, the data input layer `RoIDataLayer` is straightforward to follow.
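One step still happens inside `get_minibatch` before the RoIs reach the network: each image is rescaled, so its RoIs are scaled by the same factor, and every RoI row is prefixed with the index of its image in the minibatch, giving an R x 5 `rois` blob. A plain-Python sketch under that assumption (`make_rois_blob` and its arguments are hypothetical names):

```python
def make_rois_blob(rois_per_image, im_scales):
    """Stack the RoIs of all images into R x 5 rows: [batch_ind, x1, y1, x2, y2]."""
    blob = []
    for im_i, (rois, scale) in enumerate(zip(rois_per_image, im_scales)):
        for x1, y1, x2, y2 in rois:
            # Project the RoI onto the rescaled image, then prepend its image index.
            blob.append([float(im_i), x1 * scale, y1 * scale, x2 * scale, y2 * scale])
    return blob
```

The leading batch index is what lets the RoI pooling layer know which image's feature map each RoI should be pooled from.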

## [to do] A Summary