<p style="text-align: center">
<img src="../../assets/images/dtlogo.png" alt="Duckietown" width="50%">
</p>

# Object Detection

Machine-learned object detection models can be extremely useful. In many settings they run faster and prove more reliable than hand-engineered computer vision pipelines. Additionally, starting from pretrained model weights can cut training time considerably.

Here's an example of what an object detector might output:



<iframe width="800" height="500"
src="https://www.youtube.com/embed/3jD02dxL6gg" 
frameborder="0" 
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" 
allowfullscreen
style="margin: auto; display: block"></iframe>


In this exercise, you will create your own Duckietown object detection dataset. You will learn about the general structure such a dataset should follow. You will train the object detection model on that dataset ([in a subsequent notebook](../03-Training/training.ipynb)). Finally, you will integrate the model into a ROS node and test the integration ([in the last notebook](../04-Integration/integration.ipynb)), so that your Duckiebot knows how to recognize duckie pedestrians (and thus avoid them). You can test your object detector in simulation and on your real Duckiebot.

### Steps:

1. Setup  
2. Investigation
3. Data collection
4. Training
5. Integration



## 1. Setup

First, we need some global variables. These allow you to change the directory where we store the data you will need. You can also change the image size to reflect what your final model uses, but you can worry about that later.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
DATASET_DIR="/code/object-detection/assets/duckietown_object_detection_dataset"
IMAGE_SIZE = 416
# this is the fraction (between 0 and 1) of real data that will go into the training set (as opposed to the testing set)
REAL_TRAIN_TEST_SPLIT_PERCENTAGE = 0.8

While you will build your own dataset with simulated images in part 2, it would be unreasonable to ask you to build your own dataset of real images. Run the cell below to download a dataset of pre-labelled real images.

In [None]:
import os

from utils.utils import runp

# create the folder structure
if not os.path.exists(DATASET_DIR):
    runp(f"rm -rf {DATASET_DIR}/*")
    runp(f"mkdir -p {DATASET_DIR}/images")
    runp(f"mkdir -p {DATASET_DIR}/labels")
    runp(f"mkdir -p {DATASET_DIR}/train/images")
    runp(f"mkdir -p {DATASET_DIR}/train/labels")
    runp(f"mkdir -p {DATASET_DIR}/val/images")
    runp(f"mkdir -p {DATASET_DIR}/val/labels")
else:
    print("Folder structure already exists!")

# download dataset
if not (len(os.listdir(f"{DATASET_DIR}/images"))):
    !wget -O /tmp/dataset.zip https://duckietown-public-storage.s3.amazonaws.com/assets/mooc/2022/duckietown_object_detection_dataset.zip
    runp(f"unzip -q /tmp/dataset.zip -d $(dirname {DATASET_DIR})")
    runp(f"rm /tmp/dataset.zip")
else:
    print("Dataset already downloaded!")

These real-world images are not the right size. Run the cells below to resize them (and rescale the associated bounding boxes accordingly).


In [None]:
import json
import os
import cv2
import numpy as np
from tqdm import tqdm
from utils.utils import xminyminxmaxymax2xywfnormalized, train_test_split, makedirs, runp

with open(f"{DATASET_DIR}/annotation/final_anns.json") as anns:
    annotations = json.load(anns)

In [None]:
npz_index = 0

all_image_names = []
    
def save_img(img, boxes, classes):
    global npz_index
    cv2.imwrite(f"{DATASET_DIR}/images/real_{npz_index}.jpg", img)
    with open(f"{DATASET_DIR}/labels/real_{npz_index}.txt", "w") as f:
        for i in range(len(boxes)):
            f.write(f"{classes[i]} "+" ".join(map(str,boxes[i]))+"\n")
    # record the name before advancing the counter so it matches the files just written
    all_image_names.append(f"real_{npz_index}")
    npz_index += 1

filenames = tqdm(os.listdir(f"{DATASET_DIR}/frames"))
for filename in filenames:
    img = cv2.imread(f"{DATASET_DIR}/frames/{filename}")

    orig_y, orig_x = img.shape[0], img.shape[1]
    scale_y, scale_x = IMAGE_SIZE/orig_y, IMAGE_SIZE/orig_x

    img = cv2.resize(img, (IMAGE_SIZE,IMAGE_SIZE))

    boxes = []
    classes = []

    if filename not in annotations:
        continue

    for detection in annotations[filename]:
        box = detection["bbox"]
        label = detection["cat_name"]

        if label not in ["duckie", "cone"]:
            continue

        orig_x_min, orig_y_min, orig_w, orig_h = box

        x_min = int(np.round(orig_x_min * scale_x))
        y_min = int(np.round(orig_y_min * scale_y))
        x_max = x_min + int(np.round(orig_w * scale_x))
        y_max = y_min + int(np.round(orig_h * scale_y))

        boxes.append([x_min, y_min, x_max, y_max])
        classes.append(1 if label == "duckie" else 2)

    if len(boxes) == 0:
        continue


    boxes = np.array([xminyminxmaxymax2xywfnormalized(box, IMAGE_SIZE) for box in boxes])
    classes = np.array(classes)-1
    
    save_img(img, boxes, classes)



train_test_split(all_image_names, REAL_TRAIN_TEST_SPLIT_PERCENTAGE, DATASET_DIR)

Once that's done, you're all set! We'll explain how the code above worked as you continue through this notebook.
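As a small worked example of the box rescaling performed in the loop above, the sketch below scales a single `x_min y_min width height` box from a 640x480 frame to a 416x416 image. The helper name `rescale_box`, the frame size, and the box values are made up for illustration; only the arithmetic mirrors the cell above.

```python
IMAGE_SIZE = 416

def rescale_box(box, orig_w, orig_h, size=IMAGE_SIZE):
    """Scale an [x_min, y_min, width, height] pixel box to a size x size image.

    Returns [x_min, y_min, x_max, y_max] in the resized image.
    """
    scale_x, scale_y = size / orig_w, size / orig_h
    x_min0, y_min0, w0, h0 = box
    x_min = int(round(x_min0 * scale_x))
    y_min = int(round(y_min0 * scale_y))
    x_max = x_min + int(round(w0 * scale_x))
    y_max = y_min + int(round(h0 * scale_y))
    return [x_min, y_min, x_max, y_max]

# Hypothetical 640x480 frame with a 200x120 box whose top-left corner is (100, 60).
print(rescale_box([100, 60, 200, 120], orig_w=640, orig_h=480))  # [65, 52, 195, 156]
```

Because the image is squashed to a square, the x and y axes are scaled by different factors (0.65 and about 0.867 here).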

## 2. Investigation

What does an object detection dataset look like? Clearly, the specifics will depend on the convention used by specific models, but the general idea is intuitive:

- We need an image
- This image might have many bounding boxes in it, so we need some sort of list of coordinates
- These bounding boxes must be associated with a class

How are the bounding boxes defined?

![image of a bounding box](../../assets/images/bbox.png)

Some conventions use `x_min y_min width height`, whereas others use `x_min y_min x_max y_max`, and others use `x_center y_center width height`. In this exercise, the model we recommend ([YoloV5](https://github.com/Velythyl/yolov5)) uses `x_center y_center width height`.
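To make the three conventions concrete, here is a minimal sketch of converting between them in pixel space. The function names are illustrative and are not part of this activity's `utils` package.

```python
def xywh_to_xyxy(box):
    """x_min, y_min, width, height -> x_min, y_min, x_max, y_max."""
    x_min, y_min, w, h = box
    return [x_min, y_min, x_min + w, y_min + h]

def xyxy_to_cxcywh(box):
    """x_min, y_min, x_max, y_max -> x_center, y_center, width, height."""
    x_min, y_min, x_max, y_max = box
    w, h = x_max - x_min, y_max - y_min
    return [x_min + w / 2, y_min + h / 2, w, h]

corner_box = xywh_to_xyxy([40, 60, 100, 80])
center_box = xyxy_to_cxcywh(corner_box)
print(corner_box)  # [40, 60, 140, 140]
print(center_box)  # [90.0, 100.0, 100, 80]
```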

And how do we actually obtain these bounding boxes? In real-life applications, you would need to label a dataset of images by hand. But if you have access to a simulator that is able to segment images, you could obtain the bounding boxes directly from the segmented images. 

If you take a look at PyTorch's object detection [tutorial](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html), you will see a similar approach. While their images were segmented by hand, the tutorial uses the same technique that we will use here to obtain the bounding boxes. Their images look like this:

![image with bounding boxes](../../assets/images/FudanPed.png)
<p align="center">[Source: https://www.cis.upenn.edu/~jshi/ped_html/]</p>

And they simply calculate the min and max x and y coordinates of the segmented objects to obtain the bounding box.
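The min/max idea can be sketched in a few lines of NumPy. This is an illustration under the assumption that the object is given as a 2D boolean mask; `mask_to_bbox` is a hypothetical helper, not the tutorial's or this activity's code.

```python
import numpy as np

def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the True pixels in a 2D mask, or None if it is empty."""
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of foreground pixels
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:7] = True  # a 3-row by 4-column object
print(mask_to_bbox(mask))  # (3, 2, 6, 4)
```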

We will use the segmented mode in the Duckietown simulator to compute the bounding boxes of non-segmented images.

#### What we want to detect

The goal of this exercise is to make Duckietown safer: we want to be able to detect duckie pedestrians on the road and avoid squishing them. We also want to detect trucks, buses, and cones. Here is the complete list, along with their corresponding IDs:

0. Duckie
1. Cone
2. Truck
3. Bus

## 3. Data collection


### Format

We are going to supplement the data from the real dataset that we already downloaded with data automatically generated from the simulator. 

The script we will use for this is [data_collection.py](../../packages/utils/data_collection.py), which automatically generates data for you from the simulator.
You will need to edit it to change the number of images generated, the map used by the simulator to generate images, and other parameters. More instructions on that come later in the notebook.

In the rest of this activity, we will walk through the process step by step.

Of course, your dataset's format depends heavily on your model. If you want to use the [YoloV5](https://github.com/duckietown/yolov5) model that we suggest, you should closely follow their [guide on how to train using custom data](https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data).

Your data should use the following directory structure:

![image of dataset save format](../../assets/images/dataset_format.png)

The dataset is called `duckietown_object_detection_dataset` and is stored inside the `assets/` directory of this learning experience.
We have created two subdirectories in that folder: `train` and `val`. Both of these directories should contain two subdirectories, `images` and `labels`. Inside `images`, you must place your images, and inside `labels`, you must place the images' bounding box data. Notice that each label file uses the same name as its corresponding image file but with a different extension. In other words, the data for `0.jpg` can be found in `0.txt`.
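One way to sanity-check this layout is to confirm that every image has a label file with the same name. The sketch below builds the folder structure in a temporary directory and reports images that lack labels; `make_layout` and `images_without_labels` are hypothetical helpers, and the `.jpg` extension is an assumption based on the example above.

```python
import tempfile
from pathlib import Path

def make_layout(root):
    """Create the train/val folders, each with images/ and labels/ inside."""
    for split in ("train", "val"):
        for kind in ("images", "labels"):
            (Path(root) / split / kind).mkdir(parents=True, exist_ok=True)

def images_without_labels(root, split):
    """Return the sorted image names in a split that have no matching .txt file."""
    images = {p.stem for p in (Path(root) / split / "images").glob("*.jpg")}
    labels = {p.stem for p in (Path(root) / split / "labels").glob("*.txt")}
    return sorted(images - labels)

with tempfile.TemporaryDirectory() as tmp:
    make_layout(tmp)
    (Path(tmp) / "train" / "images" / "0.jpg").touch()
    (Path(tmp) / "train" / "labels" / "0.txt").touch()
    (Path(tmp) / "train" / "images" / "1.jpg").touch()  # label deliberately missing
    missing = images_without_labels(tmp, "train")
    print(missing)  # ['1']
```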

The format for the label files is fairly simple. For each bounding box in the corresponding image, write a row of the form `class x_center y_center width height`. Keep in mind that the coordinates must be normalized to the 0-to-1 range (i.e., you can calculate the usual `x_center y_center width height` in pixel space, then divide the x values by the image's width and the y values by its height). For example,

    0 0.5 0.5 0.2 0.2
    1 0.60 0.70 0.4 0.2

This says: "there is a duckie (class 0) centered in the image, whose width and height are 20% of the image's. There is also a cone (class 1) whose center is at 60% of the image's maximal x value and 70% of the image's maximal y value, and its width is 40% of the image's own while its height is 20%."
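As a rough sketch of how such a row could be produced from a pixel-space box, assuming a square image and an `x_min y_min x_max y_max` input (the helper `to_label_row` is illustrative, not part of this activity's code):

```python
def to_label_row(class_id, box_xyxy, image_size):
    """Format one pixel-space x_min, y_min, x_max, y_max box as a normalized label row."""
    x_min, y_min, x_max, y_max = box_xyxy
    x_center = (x_min + x_max) / 2 / image_size
    y_center = (y_min + y_max) / 2 / image_size
    width = (x_max - x_min) / image_size
    height = (y_max - y_min) / image_size
    return f"{class_id} {x_center} {y_center} {width} {height}"

# A duckie (class 0) covering the central fifth of a hypothetical 400x400 image.
row = to_label_row(0, [160, 160, 240, 240], image_size=400)
print(row)  # 0 0.5 0.5 0.2 0.2
```

The printed row matches the first line of the example above.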

Again, it is recommended that you read the guide posted on YoloV5's GitHub: [guide on how to train using custom data](https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data).

### Setting up data collection

After you're done editing the [data_collection.py](../../packages/utils/data_collection.py) file, we will need to run it against the simulator.
We will do that through this activity's virtual desktop environment.

Access this activity's virtual desktop (also known and referred to as `VNC`) by running the following command from the root directory of this activity:

```shell
dts code workbench --simulation
```

Click on the URL that you see on screen to open VNC in your browser. Click the "Data Collection" icon on the desktop. 
This will run your [data_collection.py](../../packages/utils/data_collection.py) script file. 
If you edit the script in this editor, you need to close the application and click on the icon again for the changes to take effect.

### Generating data

#### 1. Take the segmented image (this is provided to you by the simulator's rendering engine)

In [None]:
from PIL import Image
import numpy as np
import cv2
import matplotlib.pyplot as plt
# define the mapping from objects to colours (hex RGB strings)
mapping = {
    "house": "3deb34",
    "bus": "ebd334",
    "truck": "961fad",
    "duckie": "cfa923",
    "cone": "ffa600",
    "floor": "000000",
    "grass": "000000",
    "barrier": "000099"
}
# convert each hex string into a list of [R, G, B] integers
mapping = {
    key:
        [int(h[i:i+2], 16) for i in (0,2,4)]
    for key, h in mapping.items()
}

In [None]:
# Feel free to experiment with a few other files in the images folder. All of the original/segmented pairs are labeled as *_not_seg and *_seg
obs = np.asarray(Image.open('../../assets/images/duckie_not_seg.png'))
obs_seg = np.asarray(Image.open('../../assets/images/duckie_seg.png'))


In [None]:
plt.imshow(obs)

In [None]:
plt.imshow(obs_seg)

#### 2. Remove colors we are not interested in

The function below removes all colors that do not match the given class name.
We use this to isolate the objects of interest by isolating their respective color first.

In [None]:
from solution.setup_activity import segmented_image_one_class

Let's test it out by removing everything that is not a duckie in the image above.

In [None]:
duckie_masked_image = segmented_image_one_class(np.asarray(obs_seg),"duckie")
plt.imshow(duckie_masked_image)
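The implementation of `segmented_image_one_class` is not shown in this notebook. As a hedged illustration of the idea only, the sketch below keeps the pixels that exactly match one class colour and blacks out everything else. The helper name `keep_one_class` is hypothetical, the two colours are copied from the mapping defined earlier, and exact colour matching is an assumption.

```python
import numpy as np

# Two entries copied from the class -> colour mapping defined earlier.
CLASS_COLORS = {"duckie": (0xCF, 0xA9, 0x23), "cone": (0xFF, 0xA6, 0x00)}

def keep_one_class(seg_img, class_name, colors=CLASS_COLORS):
    """Return an RGB image where every pixel not matching the class colour is black."""
    rgb = seg_img[..., :3]  # ignore an alpha channel if present
    target = np.array(colors[class_name], dtype=rgb.dtype)
    match = np.all(rgb == target, axis=-1)  # True where all three channels match
    out = np.zeros_like(rgb)
    out[match] = rgb[match]
    return out

seg = np.zeros((2, 2, 3), dtype=np.uint8)
seg[0, 0] = CLASS_COLORS["duckie"]
seg[1, 1] = CLASS_COLORS["cone"]
masked = keep_one_class(seg, "duckie")
print(masked[0, 0].tolist(), masked[1, 1].tolist())  # [207, 169, 35] [0, 0, 0]
```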

#### 3. Find bounding boxes around each unique instance within the image

The function below isolates the object by finding the contours of the colored blob in the image above.

This results in the bounding box around the object.

In [None]:
from solution.setup_activity import find_all_bboxes

The function below takes the original image and the computed bounding boxes and superimposes the boxes onto the image.

In [None]:
def show_image_with_boxes(img, boxes):
    import matplotlib.patches as patches
    fig, ax = plt.subplots()
    ax.imshow(img)
    for box in boxes:
        rect = patches.Rectangle((box[0], box[2]), box[1]-box[0], box[3]-box[2], linewidth=1, edgecolor='w', facecolor='none')
        ax.add_patch(rect)
    plt.show()
    

Let's test these functions out on the image above.

In [None]:
boxes = find_all_bboxes(duckie_masked_image)
show_image_with_boxes(obs,boxes)
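The implementation of `find_all_bboxes` is not shown here. To illustrate how one box per separate blob can be obtained, the sketch below uses a plain breadth-first flood fill over a boolean mask in place of OpenCV contour finding. The helper name `blob_bboxes` and the `x_min, y_min, x_max, y_max` output order are assumptions made for this example and may differ from what `find_all_bboxes` returns.

```python
from collections import deque

import numpy as np

def blob_bboxes(mask):
    """Return one (x_min, y_min, x_max, y_max) box per 4-connected blob of True pixels."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    height, width = mask.shape
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        # Breadth-first flood fill starting from this unvisited foreground pixel.
        queue = deque([(int(y0), int(x0))])
        seen[y0, x0] = True
        x_min = x_max = int(x0)
        y_min = y_max = int(y0)
        while queue:
            y, x = queue.popleft()
            x_min, x_max = min(x_min, x), max(x_max, x)
            y_min, y_max = min(y_min, y), max(y_max, y)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < height and 0 <= nx < width and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        boxes.append((x_min, y_min, x_max, y_max))
    return boxes

mask = np.zeros((6, 10), dtype=bool)
mask[1:3, 1:4] = True  # first blob
mask[4:6, 6:9] = True  # second blob
print(blob_bboxes(mask))  # [(1, 1, 3, 2), (6, 4, 8, 5)]
```

An RGB masked image can be turned into such a boolean mask with, for example, `masked.any(axis=-1)`.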


#### 4. Let's do that but for all classes

In [None]:
from solution.setup_activity import find_all_boxes_and_classes

In [None]:
all_boxes, all_classes = find_all_boxes_and_classes(obs_seg)
show_image_with_boxes(obs, all_boxes)


Finally, we will need to save the non-segmented version of the image and write its bounding boxes and their classes to a corresponding txt file. This is already implemented in the [data_collection.py](../../packages/utils/data_collection.py) file.

### Combining with the real dataset & training/test set splits

When training supervised learning models, one must worry about overfitting to the training set. If you keep *some* of your dataset *out* of your training data, you can check that your model does not overfit by *testing* it on the data you left out. We call this held-out chunk of data the *validation set*.

You can experiment with the `REAL_TRAIN_TEST_SPLIT_PERCENTAGE` variable defined at the top of this notebook. Tune its value (a fraction between 0 and 1) to adjust how much of the **real** data is used for training as opposed to testing. There is a similar variable defined in [data_collection.py](../../packages/utils/data_collection.py), called `SIMULATED_TRAIN_SPLIT_PERCENTAGE`, which controls how much of the **simulated** data will be used for training.
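The `train_test_split` helper from `utils` is not shown in this notebook. As a minimal sketch of what such a split involves, assuming a simple shuffled split by fraction (the function `split_names` and its `seed` parameter are hypothetical):

```python
import random

def split_names(names, train_fraction, seed=0):
    """Shuffle names reproducibly and split them into (train, val) lists."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

names = [f"real_{i}" for i in range(10)]
train, val = split_names(names, 0.8)
print(len(train), len(val))  # 8 2
```

Shuffling before cutting keeps the validation set from being dominated by whichever images happen to come last.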

# Next step

You can continue with the [Training notebook](../03-Training/training.ipynb).