# Object Detection Using the TensorFlow Object Detection API

This notebook prepares a dataset for the TensorFlow Object Detection API and includes notes on running the API with the files it generates. It assumes that the data to be converted follows the format provided by the RPA team.

## Import the modules

Make sure that the ```models/research``` directory is on your ```PYTHONPATH```.
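
A quick way to confirm the path setup is to check whether the package can be located before running the imports. The sketch below uses only the standard library; ```is_importable``` is a hypothetical helper name and not part of the Object Detection API.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module or package can be found on sys.path."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if not is_importable('object_detection'):
    print('object_detection not found; add models/research to PYTHONPATH')
```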

In [1]:
import narau as nr
import xmltodict
import json
import os
import tensorflow as tf
from object_detection.protos.string_int_label_map_pb2 import StringIntLabelMap, StringIntLabelMapItem

## Define data related constants
*   ```SRC_*```: Paths related to the dataset provided by the RPA team
*   ```DST_*```: Save locations of the created files for the object detection API
*   ```TRAIN_SPLIT```: The fraction of the data used for training; the remainder is used for the dev set
*   ```CLASSES```: The classes/annotations expected in the dataset

In [2]:
SRC_PATH = os.path.expanduser('~/Documents/data/rpa')
SRC_ANNOTATION_PATH = os.path.join(SRC_PATH, 'annotations')
SRC_IMAGES_PATH = os.path.join(SRC_PATH, 'images')

DST_LABEL_PATH = 'rpa_label.pbtxt'
DST_TRAIN_PATH = 'rpa_train.tfrecord'
DST_DEV_PATH = 'rpa_dev.tfrecord'

TRAIN_SPLIT = 0.8
CLASSES = ['textbox']

## Create a protobuf text file that defines the labels
Create a protobuf text file that contains the labels. Each label is defined as a ```StringIntLabelMapItem```, and the items are passed to a ```StringIntLabelMap```.

In [3]:
classmap = {cls:idx for idx, cls in enumerate(CLASSES, 1)}
labelmap = StringIntLabelMap(item=[StringIntLabelMapItem(id=id_, name=name) 
                             for name, id_ in classmap.items()])
with tf.gfile.GFile(DST_LABEL_PATH, 'w') as f:
    f.write(str(labelmap))

{'textbox': 1}
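
For reference, the text below approximates what the written label file should contain. This is only a sketch that mimics the protobuf text layout with plain string formatting, assuming that just the ```name``` and ```id``` fields are set. ```render_label_map``` is a hypothetical helper; the actual file should still be written from the ```StringIntLabelMap``` message.

```python
def render_label_map(classmap):
    """Render a {name: id} mapping in the label map text layout."""
    blocks = []
    # Sort by id so the output order is stable.
    for name, id_ in sorted(classmap.items(), key=lambda kv: kv[1]):
        blocks.append('item {\n  name: "%s"\n  id: %d\n}\n' % (name, id_))
    return ''.join(blocks)


print(render_label_map({'textbox': 1}))
```

Comparing this preview with the contents of ```rpa_label.pbtxt``` is a quick sanity check that ids start at 1, since id 0 is reserved for the background class.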

## Verify the bounding box dimensions

Each bounding box is checked to make sure that it lies within the image and that its dimensions make sense, meaning that a minimum coordinate never exceeds the corresponding maximum.

In [6]:
def check_dimensions(height, width, xmin, xmax, ymin, ymax):
    # A box is valid when it lies inside the image and min does not exceed max.
    # The lower bound of 0 rejects negative coordinates.
    checks = [
        0 <= xmin <= xmax <= width,
        0 <= ymin <= ymax <= height,
    ]
    return all(checks)
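
When an annotation is rejected, it can help to know which condition failed. The following is a minimal sketch of a diagnostic variant; ```box_problems``` is a hypothetical helper, and treating negative coordinates as invalid is an assumption about the intended check.

```python
def box_problems(height, width, xmin, xmax, ymin, ymax):
    """Return a list of problems; an empty list means the box is valid."""
    problems = []
    # Assumption: coordinates are pixel offsets, so they cannot be negative.
    if xmin < 0 or ymin < 0:
        problems.append('negative coordinate')
    if xmax > width or ymax > height:
        problems.append('box extends past the image')
    if xmin > xmax or ymin > ymax:
        problems.append('min exceeds max')
    return problems


print(box_problems(100, 200, 10, 50, 20, 80))    # box inside a 200x100 image
print(box_problems(100, 200, 10, 250, 20, 80))   # xmax is larger than the width
```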

## Convert the image and its annotation to a TFRecord

```xmltodict``` is used to parse the annotations, while ```narau``` is used as a helper for creating the examples for the TFRecord.

References:
*   Object Detection Documentation on Datasets: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md

In [7]:
def annotation_to_example(annotation_path, image_basedir, classmap):
    with open(annotation_path) as f:
        annotation = xmltodict.parse(f.read())
    annotation = annotation['annotation']
    
    size = annotation['size']
    width = int(size['width'])
    height = int(size['height'])
    
    filename = os.path.basename(annotation['path'])
    format_ = os.path.splitext(filename)[-1][1:].lower()
    
    image_path = os.path.join(image_basedir, filename)
    with open(image_path, 'rb') as f:
        image_bytes = f.read()
        
    classes_text = []
    xmins = []
    ymins = []
    xmaxs = []
    ymaxs = []
    
    try:
        objects = nr.example._maybe_as_iterable(annotation['object'])
    except KeyError:
        return None, 0
    
    count = 0
    for obj in objects:
        classes_text.append(obj['name'])
        
        box = obj['bndbox']
        xmin = int(box['xmin'])
        ymin = int(box['ymin'])
        xmax = int(box['xmax'])
        ymax = int(box['ymax'])
        
        if check_dimensions(height, width, xmin, xmax, ymin, ymax):  
            xmins.append(xmin/width)
            ymins.append(ymin/height)
            xmaxs.append(xmax/width)
            ymaxs.append(ymax/height)
            count += 1
        else:
            return None, 0
    
    classes = [classmap[ct] for ct in classes_text]  
    example = nr.example.Example(
        nr.example.Features({
            'image/height': nr.example.Int64Feature(height),
            'image/width': nr.example.Int64Feature(width),
            'image/filename': nr.example.BytesFeature(filename.encode('utf-8')),
            'image/source_id': nr.example.BytesFeature(filename.encode('utf-8')),
            'image/encoded': nr.example.BytesFeature(image_bytes),
            'image/format': nr.example.BytesFeature(format_.encode('utf-8')),
            'image/object/bbox/xmin': nr.example.FloatFeature(xmins),
            'image/object/bbox/xmax': nr.example.FloatFeature(xmaxs),
            'image/object/bbox/ymin': nr.example.FloatFeature(ymins),
            'image/object/bbox/ymax': nr.example.FloatFeature(ymaxs),
            'image/object/class/text': nr.example.BytesFeature([ct.encode('utf-8') for ct in classes_text]),
            'image/object/class/label': nr.example.Int64Feature(classes), 
        })
    )
    return example, count
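
Two details of the conversion are easy to miss: ```xmltodict``` returns a single mapping when an annotation has one ```object``` element and a list when it has several, and the API expects box coordinates normalized to the range [0, 1]. The sketch below illustrates both with a hand-written dictionary standing in for the parser output. ```as_list``` and ```normalized_boxes``` are hypothetical helpers, and the assumption is that ```as_list``` behaves roughly like the ```narau``` helper used above.

```python
def as_list(value):
    """Wrap a single item in a list; leave lists untouched."""
    return value if isinstance(value, list) else [value]


def normalized_boxes(annotation):
    """Return (name, xmin, ymin, xmax, ymax) tuples scaled to [0, 1]."""
    width = int(annotation['size']['width'])
    height = int(annotation['size']['height'])
    boxes = []
    # A missing 'object' key means the image has no annotated boxes.
    for obj in as_list(annotation.get('object', [])):
        box = obj['bndbox']
        boxes.append((
            obj['name'],
            int(box['xmin']) / width,
            int(box['ymin']) / height,
            int(box['xmax']) / width,
            int(box['ymax']) / height,
        ))
    return boxes


# Hand-written stand-in for the parsed content of a one-object annotation.
sample = {
    'size': {'width': '200', 'height': '100'},
    'object': {
        'name': 'textbox',
        'bndbox': {'xmin': '20', 'ymin': '10', 'xmax': '100', 'ymax': '50'},
    },
}
print(normalized_boxes(sample))
```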

## Convert all images and annotations to examples
All of the annotations and images are converted to examples by calling the previously defined function on every annotation file. Annotations that cannot be converted are printed and skipped.

In [8]:
def annotations_to_examples(annotation_basedir, image_basedir, classmap):
    examples = []
    all_count = 0
    for annotation_name in os.listdir(annotation_basedir):
        annotation_path = os.path.join(annotation_basedir, annotation_name)
        example, count = annotation_to_example(annotation_path, image_basedir, classmap)
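        if example is not None:
            examples.append(example)
            all_count += count
        else:
            # Report annotations that were skipped (no objects or an invalid box).
            print('Skipped:', annotation_path)
    # Report the total number of bounding boxes that were converted.
    print('Total boxes:', all_count)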
    return examples

## Split the dataset and save them to TFRecords
The dataset is split according to ```TRAIN_SPLIT```, and each part is saved to its own TFRecord file. Note that the examples are not shuffled, so the split follows the order returned by ```os.listdir```.

In [10]:
examples = annotations_to_examples(SRC_ANNOTATION_PATH, SRC_IMAGES_PATH, classmap)
train_index = int(len(examples) * TRAIN_SPLIT)

with tf.python_io.TFRecordWriter(DST_TRAIN_PATH) as writer:
    for example in examples[:train_index]:
        writer.write(example.SerializeToString())
        
with tf.python_io.TFRecordWriter(DST_DEV_PATH) as writer:
    for example in examples[train_index:]:
        writer.write(example.SerializeToString())

('C:\\Users\\Raffaello Baluyot\\Documents\\python\\notebooks\\rpa',
 'rpa_train.tfrecord',
 'rpa_dev.tfrecord')
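
Because the directory listing order is arbitrary, the dev set may not be representative if similar files happen to be grouped together. If that is a concern, the examples could be shuffled before splitting. The sketch below shows one way to do this with the standard library; ```shuffled_split``` and its ```seed``` parameter are hypothetical and not part of the original notebook.

```python
import random


def shuffled_split(items, train_split, seed=0):
    """Shuffle a copy of items and split it into (train, dev) lists."""
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    train_index = int(len(shuffled) * train_split)
    return shuffled[:train_index], shuffled[train_index:]


train, dev = shuffled_split(range(10), 0.8)
print(len(train), len(dev))
```

The two returned lists could then be written with the same ```TFRecordWriter``` loops used above.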

## Creation of datasets is complete
Check that the protobuf text file and the TFRecord files were created successfully.
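
A minimal sketch of such a check is shown below. It only verifies that each file exists and is not empty, and does not validate the contents. ```missing_or_empty``` is a hypothetical helper, and the file names repeat the values of the ```DST_*``` constants.

```python
import os


def missing_or_empty(paths):
    """Return the paths that do not exist or have a size of zero bytes."""
    return [p for p in paths
            if not os.path.isfile(p) or os.path.getsize(p) == 0]


# File names taken from the DST_* constants defined earlier in the notebook.
problems = missing_or_empty(
    ['rpa_label.pbtxt', 'rpa_train.tfrecord', 'rpa_dev.tfrecord'])
print(problems if problems else 'all files present')
```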

# Training and Exporting
Choose a pre-trained model from the model zoo and create a pipeline configuration for it that points to the label file and TFRecord files generated above. Once the pipeline is configured, training can be run, and the trained model can then be exported.

References:
*   TensorFlow Object Detection Model Zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
*   Creating a Pipeline: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/configuring_jobs.md
*   Training the Model: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md
*   Exporting the Trained Model: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md
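
As a rough illustration of the training and export steps, the sketch below assembles the command lines described in the guides referenced above. It assumes a TensorFlow 1.x checkout in which ```model_main.py``` and ```export_inference_graph.py``` are available and the commands are run from ```models/research```; script names and flags differ between versions of the API, so check the guides for the version in use. All paths and the step count are placeholders.

```python
def train_command(pipeline_config_path, model_dir, num_train_steps):
    """Build the argument list for object_detection/model_main.py."""
    return [
        'python', 'object_detection/model_main.py',
        '--pipeline_config_path=' + pipeline_config_path,
        '--model_dir=' + model_dir,
        '--num_train_steps=%d' % num_train_steps,
        '--alsologtostderr',
    ]


def export_command(pipeline_config_path, checkpoint_prefix, output_directory):
    """Build the argument list for object_detection/export_inference_graph.py."""
    return [
        'python', 'object_detection/export_inference_graph.py',
        '--input_type=image_tensor',
        '--pipeline_config_path=' + pipeline_config_path,
        '--trained_checkpoint_prefix=' + checkpoint_prefix,
        '--output_directory=' + output_directory,
    ]


# Placeholder paths; replace them with the real pipeline and output locations.
print(' '.join(train_command('rpa_pipeline.config', 'rpa_model', 50000)))
print(' '.join(export_command(
    'rpa_pipeline.config', 'rpa_model/model.ckpt-50000', 'rpa_export')))
```

The resulting lists can be passed to ```subprocess.run``` or printed and pasted into a shell.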