
Final Report

AkbarShah96 committed Dec 27, 2018
1 parent 55a7f7a commit 1c79843329639964b917a81461dddf9cbdc4b4d8
@@ -1,5 +1,79 @@
# AMOD18 To Be Detected {#demo-objdet status=draft}

<div figure-id="fig:1" figure-caption="">
<img src="TBD_images/Logo.png" style='width: 30em'/>

## Introduction

It is essential for the health and safety of the citizens of duckietown that duckiebots navigate safely in the city. Therefore, the duckiebots must be able to detect and correctly identify road users (duckies and duckiebots) as well as road signals (traffic signs, traffic lights, etc.). In addition, the AprilTags (QR codes) around duckietown, which other projects already use, are detected as well. To achieve this goal, an object detection pipeline was created based on a convolutional neural network, which detects the aforementioned objects using the monocular camera only.

A high-level overview of how the detection pipeline works can be seen in the figure below. Because the Raspberry Pi (RPI) is not powerful enough to run the detection pipeline, the pipeline has to run on a laptop.

<div figure-id="fig:2" figure-caption="The logical architecture of our implementation.">
<img src="TBD_images/ObjectDetectionNodes.png" style='width: 30em'/>

The duckiebot runs the ros-picam container, which publishes the image stream from the duckiebot's camera to the detector node on the laptop. The detector node then makes its predictions, draws bounding boxes with the appropriate labels and confidence levels, and publishes a new stream of images to another topic, which can be visualized in real time through `rqt_image_view` or a similar tool. The `rqt_graph` figure in this section shows the ROS nodes, topics and their interaction when the detection is run on a stream of images coming from the camera of yanberBot.
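The post-processing step described above (keeping confident predictions and building the label text drawn next to each box) can be sketched roughly as follows. This is a minimal illustration and not the detector node's actual code: the `Detection` type, the `annotations_for_frame` helper and the `min_score` threshold of 0.5 are all assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One predicted object (hypothetical structure, not taken from the project)."""
    label: str
    score: float                     # confidence in [0, 1]
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels


def annotations_for_frame(detections: List[Detection],
                          min_score: float = 0.5):
    """Keep confident detections and build the caption shown next to each box.

    Returns a list of (box, caption) pairs, most confident first.
    The 0.5 default is an assumed value, not one reported by the authors.
    """
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.score, reverse=True)
    return [(d.box, "{}: {:.0f}%".format(d.label, d.score * 100)) for d in kept]


if __name__ == "__main__":
    frame_detections = [
        Detection("duckie", 0.87, (10, 20, 50, 60)),
        Detection("stop sign", 0.30, (0, 0, 5, 5)),
    ]
    for box, caption in annotations_for_frame(frame_detections):
        print(box, caption)
```

In a real node, each `(box, caption)` pair would be drawn onto the frame before the annotated image is published.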

<div figure-id="fig:3" figure-caption="The rqt_graph showing the ROS nodes and topics while the detection is running.">
<img src="TBD_images/rqt_graph.png" style='width: 30em'/>

## Approach

In this section we will elaborate on the steps taken by the team from the start of the project (Nov 12th 2018) to the DEMO that wrapped it up on Dec 20th 2018.

### Definition of objectives and contribution

The first thing to do at the start of any research project is to look at what has been done, identify the gaps where progress can be made and translate this notion of progress into tangible goals and milestones.

This was the first year in the history of the AMOD course that a project was assigned to object detection and pattern recognition. However, detecting objects in some shape or form was of course not new in Duckietown, perception being arguably one of the most important features of an autonomous robot.

Out of all past projects, the one we could best identify with was "Saviours". Their work focussed on obstacle avoidance rather than just detection, so one of the main requirements for their detection pipeline was speed. They therefore opted for an approach that detects objects using manually extracted features (such as image and colour transforms). Speed, of course, does not come without sacrificing performance. Extracting features using heuristics can be very efficient, but it is incredibly hard to do for a large class of objects under varying lighting conditions and environments.

Our research goals were targeted at finding another solution along the Pareto frontier between speed, performance and robustness. In other words, our goal was to use a deep learning approach to significantly outperform the Saviours' detection algorithm while staying as close as possible to their speed. The results shown in the figure below and the table below it show that we have indeed been able to outperform the previous detection algorithm in terms of Intersection over Union (IoU) and accuracy, while identifying five more object classes and being robust against cluttered environments and varying lighting conditions. Due to time constraints we have not been able to deploy our inference model on the duckiebot's RPI, which means we do not have a side-by-side comparison of speed.

<div figure-id="fig:4" figure-caption="Comparison between predictions made by the Saviours' detection algorithm (leftmost image) and our current heavy inference model (the other two images). The Saviours used the Inverse Perspective Mapping Algorithm along with a transformation of the images to the HSV color space to extract features manually, while our approach relies fully on a Convolutional Neural Network which is trained on 1800 example images.">
<img src="TBD_images/comparison.png" style='width: 30em'/>

### Building your own object detector

In this section, we briefly highlight the steps required to build your own object detector. A collection of images, known as a data set, is required to train the convolutional neural network. The images were collected from the duckietown in the Auto Lab under different lighting conditions in order to train our model to be robust against lighting changes.

The data was then labeled by an external company ( It is recommended to provide detailed instructions on how you want your images labeled and to create good qualifier/honey pot tasks in order to make sure the labeling is done effectively. The labeled images are then used to train the convolutional neural network. The TensorFlow Object Detection API provides an open source framework which makes it easy to deploy object detection models.

The CNN is then optimized to provide the desired accuracy and speed. The Duckiebot has limited computational resources, so a very light model is recommended. The inference model is then containerized using Docker. The figure below shows the steps to build an object detector.

<div figure-id="fig:5" figure-caption="Steps to build an object detector">
<img src="TBD_images/flowchart.png" style='width: 30em'/>

### Performance Figures

In this section we present the performance of the two different models. The figure below shows two graphs extracted from TensorBoard after training the two object detection models. The y-axis shows the mean average precision (mAP) and the x-axis shows the number of learning steps of the CNN optimizer. To calculate mAP, a threshold of IoU = 0.5 was set, meaning that an object counts as correctly classified with respect to the ground truth if and only if the IoU of the bounding boxes was above 0.5 and the labels matched. The heavier model, 'rfcn_resnet101', had an inference time of 92 ms with a mean average precision of 30 percent. The lighter model, 'ssdlite_mobilenet_v2' (only 14 MB), had an inference time of 27 ms and a mean average precision of 22 percent. The performance was measured on an Nvidia GeForce GTX TITAN X GPU.
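The matching criterion used for mAP (IoU above 0.5 and matching labels) can be illustrated with a small sketch. It assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`. The function names `iou` and `is_correct` are chosen for this example and are not taken from the TensorFlow Object Detection API or from our evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_correct(pred_label, pred_box, gt_label, gt_box, threshold=0.5):
    """A prediction counts as correct iff the labels match and IoU > threshold."""
    return pred_label == gt_label and iou(pred_box, gt_box) > threshold


if __name__ == "__main__":
    ground_truth = ("duckie", (0, 0, 10, 10))
    prediction = ("duckie", (0, 0, 10, 8))
    print(iou(prediction[1], ground_truth[1]))
    print(is_correct(prediction[0], prediction[1], ground_truth[0], ground_truth[1]))
```

Averaging precision over recall levels and over classes, which turns these per-box decisions into mAP, is left out of the sketch.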

<div figure-id="fig:6" figure-caption="left: rfcn_resnet101 , right: ssdlite_mobilenet_v2">
<img src="TBD_images/Performance.png" style='width: 30em'/>

## Demo

This is the demo for object detection using the camera on the Duckiebot. The Duckiebot has been trained to detect duckies, Duckiebots, traffic lights, QR codes, intersection signs, stop signs, and (traffic) signal signs. The joystick demo (or lane following pipeline) is used to navigate the Duckiebot around Duckietown.

@@ -26,17 +100,23 @@ Results: The duckiebot is able to detect objects using its camera.

## Images of expected results {#demo-objdet-expected}
## Video and Images of expected results {#demo-objdet-expected}

<div figure-id="fig:objdet" figure-caption="The video shows the result to be expected if the demo is successful.">
<dtvideo src='vimeo:308298528'/>

<div figure-id="fig:objdet2" figure-caption="The expected result is bounding boxes around objects in duckietown.">
<img src="objdet2.png" style='width: 30em'/>
<div figure-id="fig:8" figure-caption="The expected result is bounding boxes around objects in duckietown.">
<img src="TBD_images/objdet2.png" style='width: 30em'/>

<div figure-id="fig:objdet1" figure-caption="Another example, notice this one also has the duckiebot.">
<img src="objdet1.png" style='width: 30em'/>

<div figure-id="fig:9" figure-caption="Another example, notice this one also has the duckiebot.">
<img src="TBD_images/objdet1.png" style='width: 30em'/>

TODO: A high quality video will be uploaded before the deadline. For now these images are self explanatory of what is to be expected.

## Duckietown setup notes {#demo-objdet-Duckietown-setup}

@@ -53,12 +133,17 @@ The Duckietown used for this demo must have the following characteristics.
* Duckiebots on the road.

No cluttering of objects in one place. Allow enough space between each object. An example image is shown below.
No cluttering of objects in one place. Allow enough space between each object. See the image below for reference.

<div figure-id="fig:10" figure-caption="Another example, notice this one also has the duckiebot.">
<img src="TBD_images/autolab.jpg" style='width: 30em'/>

## Duckiebot setup notes {#demo-objdet-Duckiebot-setup}

Put a duckie on top of the duckiebot.(Seriously)
No extra setup is needed for the duckiebot, except to put a duckie on top of it. (Seriously)

## Pre-flight checklist {#demo-objdet-pre-flight}

@@ -85,6 +170,15 @@ The pre-flight checklist for this demo are:
The following steps must be completed in order to run the object detector on your duckiebot.

If you are lazy, here is a video guiding you through some of the steps.
<div figure-id="fig:objdetdemo" figure-caption="Demo Instructions">
<dtvideo src='vimeo:308461574'/>

**Step 1**: When the Duckiebot is powered on, make sure all the required containers are running. On your laptop, run

laptop $ docker -H ![duckie_bot].local ps
@@ -93,8 +187,8 @@ to check whether the right containers are running or not. You can also check by

Note: If the required containers are running then skip to Step 4.

<div figure-id="fig:Containers" figure-caption="The containers that are required for this demo.">
<img src="Containers.png" style='width: 30em'/>
<div figure-id="fig:12" figure-caption="The containers that are required for this demo.">
<img src="TBD_Images/Containers.png" style='width: 30em'/>

@@ -171,7 +265,7 @@ Symptom: The ros nodes cannot communicate with each other.

Resolution: If you are using Docker on Mac OSX, there seems to be an issue with the network of Docker containers. We recommend using Docker on Ubuntu 16.04, which we have tested without problems.
(Insert Image).

@@ -207,7 +301,70 @@ Repeat step 4.

## Demo failure demonstration {#demo-objdet-failure}

<div figure-id="fig:objdetfail" figure-caption="The video shows the case when the object detector is not behaving as intended.">
<dtvideo src='vimeo:308295993'/>

Failure is not an option sorry.

## AIDO Challenge (beta version!)

If after watching our object detector in action you cannot wait to build your own, you might want to stick around...

Aside from developing a CNN-based object detector, we have developed a framework that allows you to test the performance of your own object detector. This is analogous to the framework that CircleCI provides for unit tests, except this is targeted at *performance* tests.

We have created an additional (private) repository that contains testing images and labels, in addition to an evaluator which compares the labels it receives from an inference model with the ground truth. This results in a score which is displayed on the server. In the future, two of the metrics that the evaluator should be able to display are the prediction time and RAM usage, which are crucial in the context of object detection in Duckietown.
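As a rough idea of how the prediction-time metric mentioned above could be collected, here is a minimal sketch using only the Python standard library. The `time_predictions` helper and the `predict` callable are hypothetical stand-ins for a submission's inference function and are not part of the existing evaluator. RAM usage is not covered.

```python
import statistics
import time


def time_predictions(predict, images, warmup=1):
    """Measure how long `predict` takes per image, in milliseconds.

    `predict` is any callable taking one image (a hypothetical stand-in for
    a submission's inference function). `images` must be a non-empty list.
    The first `warmup` images are run once untimed, so that one-off setup
    costs (for example lazy model loading) do not distort the result.
    """
    for image in images[:warmup]:
        predict(image)

    timings_ms = []
    for image in images:
        start = time.perf_counter()
        predict(image)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "n": len(timings_ms),
        "mean_ms": statistics.mean(timings_ms),
        "max_ms": max(timings_ms),
    }


if __name__ == "__main__":
    # Dummy "model" that just sleeps for about 5 ms per image.
    report = time_predictions(lambda image: time.sleep(0.005), ["img"] * 10)
    print(report)
```

Reporting the maximum alongside the mean is a design choice for this sketch: occasional slow frames matter for a robot reacting in real time.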

As the user, once the framework depicted in the figure below is functional, you have to include your submission (inference model) in the /sub-objdet-OD folder within a docker environment, that will automatically get built when you submit to the challenge.

<div figure-id="fig:14" figure-caption="The AIDO object detection module.">
<img src="TBD_Images/AIDO.png" style='width: 30em'/>

Unfortunately we have run into many issues while setting up a local server to test the interaction between submission and evaluator containers, which means little to no testing has been done on whether this pipeline works as expected.

## Future works

* Better training data.

In dark lighting conditions our model would sometimes detect a duckiebot when it saw the white light from its own LEDs in the image. This could be improved by having more robust training data. The images used to train the model were also not labeled by the company as well as required.

* Object detection running on RPI with Movidius stick.

Right now, the object detector is run on the laptop and the duckiebot only provides the images. To run the object detector on the Raspberry Pi, a Movidius stick can be used, which provides the computational resources to handle a convolutional neural network.

* Improve speed.

Since computational resources are limited on the duckiebot, it is suggested to make the model as light as possible without compromising on accuracy.

* AIDO Challenge.

Test the AIDO challenge module and define an official Object Detection challenge for AIDO 2.

* Temporal features

Use temporal features, possibly using odometry information.

For any questions, please contact any of the team members:

David Oort Alonso (

Mian Akbar Shah (

Yannick Berdou (

Guoxiang Zhou
