# Object detection
The aim of this notebook is to perform object detection using [RetinaNet](https://arxiv.org/abs/1708.02002v2).
Object detection refers to techniques used to identify objects in images or videos. 
It can be seen as an extension of classification: instead of assigning a single class to the entire image, it detects and localizes an arbitrary number of objects, possibly belonging to different classes, within the image.
The output of an object detector consists of bounding boxes that enclose the objects of interest in the image. Each bounding box comes with a class label, as shown in the figure below.


![Object Detection](https://raw.githubusercontent.com/davin11/easy-cv-dataset/master/examples/object_detection/objdet.jpg)

We will use functions from the KerasHub and easy-cv-dataset libraries, which can be installed with the following command:

In [None]:
!pip install -q --upgrade keras-hub git+https://github.com/davin11/easy-cv-dataset

Now we import the standard libraries, keras_hub, and easy_cv_dataset (with the alias `ds`).

In [None]:
%reset -f
import numpy as np
import matplotlib.pyplot as plt
import skimage.io as io
import keras
import keras_hub
import easy_cv_dataset as ds

## Data preparation
We will use the [PascalVOC 2007](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/) dataset, which contains 20 classes:
- **Person**: person
- **Animals**: bird, cat, cow, dog, horse, sheep
- **Vehicles**: aeroplane, bicycle, boat, bus, car, motorbike, train
- **Objects**: bottle, chair, dining table, potted plant, sofa, tvmonitor

Run the following commands to download the training, validation, and test sets of PascalVOC 2007:

In [None]:
SITE="https://raw.githubusercontent.com/davin11/easy-cv-dataset/master"
!wget -nc {SITE}/examples/object_detection/voc2007_objdet_test.csv
!wget -nc {SITE}/examples/object_detection/voc2007_objdet_train.csv
!wget -nc {SITE}/examples/object_detection/voc2007_objdet_val.csv
!wget -nc {SITE}/examples/object_detection/voc2007_download.sh
!bash voc2007_download.sh

At this point, you will find two folders named `voc2007_trainval` and `voc2007_test` that contain the images of the dataset.
Additionally, you will find three CSV (Comma-Separated Values) files, `voc2007_objdet_train.csv`, `voc2007_objdet_val.csv`, and `voc2007_objdet_test.csv`, corresponding to the training, validation, and test sets, respectively.
CSV files are plain text files that store a table, using a comma (,) as the column delimiter.
These CSV files for object detection contain the position and class of each bounding box, organized in 6 columns:
- `image`: the file path of the image,
- `xmin`: the horizontal coordinate of the top-left pixel of the box,
- `ymin`: the vertical coordinate of the top-left pixel of the box,
- `xmax`: the horizontal coordinate of the bottom-right pixel of the box,
- `ymax`: the vertical coordinate of the bottom-right pixel of the box,
- `class`: the class label of the box.
Here is an excerpt from the `voc2007_objdet_test.csv` file:

```
image,xmin,ymin,xmax,ymax,class
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/002118.jpg,1,288,189,334,car
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/002118.jpg,215,109,487,211,car
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/002118.jpg,24,88,102,113,car
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/009083.jpg,4,160,497,318,aeroplane
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/005800.jpg,1,108,249,348,bus
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/005800.jpg,23,3,465,375,person
```


If an image contains multiple bounding boxes, each box is written on a separate row, as shown in the previous example.
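To make the one-row-per-box layout concrete, the following sketch groups the rows of the excerpt above by image using only the Python standard library. The helper `group_boxes_by_image` is hypothetical and only illustrates the file format; it is not how easy_cv_dataset reads the CSV internally.

```python
import csv
import io
from collections import defaultdict

# The rows of the voc2007_objdet_test.csv excerpt shown above
CSV_TEXT = """image,xmin,ymin,xmax,ymax,class
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/002118.jpg,1,288,189,334,car
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/002118.jpg,215,109,487,211,car
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/002118.jpg,24,88,102,113,car
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/009083.jpg,4,160,497,318,aeroplane
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/005800.jpg,1,108,249,348,bus
./voc2007_test/VOCdevkit/VOC2007/JPEGImages/005800.jpg,23,3,465,375,person
"""

def group_boxes_by_image(csv_file):
    """Collect the (xmin, ymin, xmax, ymax) boxes and the class names of each image."""
    grouped = defaultdict(lambda: {"boxes": [], "classes": []})
    for row in csv.DictReader(csv_file):
        box = tuple(float(row[key]) for key in ("xmin", "ymin", "xmax", "ymax"))
        grouped[row["image"]]["boxes"].append(box)
        grouped[row["image"]]["classes"].append(row["class"])
    return dict(grouped)

annotations = group_boxes_by_image(io.StringIO(CSV_TEXT))
for path, ann in annotations.items():
    # file name, number of boxes, class of each box
    print(path.split("/")[-1], len(ann["boxes"]), ann["classes"])
```

The same helper works on the downloaded file, e.g. `with open("voc2007_objdet_test.csv", newline="") as f: group_boxes_by_image(f)`.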
Now, using the function `ds.image_objdetect_dataset_from_dataframe`, 
we can specify how to prepare the images for each set.

In [None]:
BATCH_SIZE = 4
IMAGE_SIZE = 640
BOX_FORMAT = "yxyx"  # (ymin, xmin, ymax, xmax)

from keras.layers import Resizing, RandomBrightness, RandomFlip, Pipeline

# Resize every image to IMAGE_SIZE x IMAGE_SIZE, padding it to preserve the aspect ratio
pre_processing = Resizing(IMAGE_SIZE, IMAGE_SIZE,
                          pad_to_aspect_ratio=True,
                          bounding_box_format=BOX_FORMAT)

# Data augmentation: the bounding boxes are transformed together with the images
augmenter = Pipeline(layers=[
    RandomBrightness(factor=(-0.1, 0.1), value_range=(0, 255), bounding_box_format=BOX_FORMAT),
    RandomFlip("horizontal", bounding_box_format=BOX_FORMAT),
])
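Because the augmentation layers receive `bounding_box_format`, they can update the boxes together with the pixels. `RandomBrightness` only changes pixel values, so the boxes stay the same, whereas `RandomFlip("horizontal")` must mirror the horizontal coordinates. The sketch below shows only the geometry of that mirroring for a box in `yxyx` format and pixel coordinates; `hflip_yxyx` is a hypothetical helper, not the Keras implementation, and the 640-pixel width is just an example.

```python
def hflip_yxyx(box, image_width):
    """Mirror a (ymin, xmin, ymax, xmax) box, in pixels, around the vertical axis."""
    ymin, xmin, ymax, xmax = box
    # The new left edge comes from the old right edge and vice versa;
    # the vertical coordinates are unchanged.
    return (ymin, image_width - xmax, ymax, image_width - xmin)

box = (288, 1, 334, 189)  # first "car" box of 002118.jpg, reordered as yxyx
flipped = hflip_yxyx(box, image_width=640)  # hypothetical 640-pixel-wide image
print(flipped)
print(hflip_yxyx(flipped, image_width=640) == box)  # flipping twice restores the box
```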

In [None]:
print("test-set")
test_ds = ds.image_objdetect_dataset_from_dataframe("voc2007_objdet_test.csv",
      bounding_box_format=BOX_FORMAT,
      pre_batching_processing=pre_processing,
      shuffle=False, batch_size=BATCH_SIZE)
print("training-set")
train_ds = ds.image_objdetect_dataset_from_dataframe("voc2007_objdet_train.csv",
      bounding_box_format=BOX_FORMAT,
      pre_batching_processing=pre_processing,
      shuffle=True, batch_size=BATCH_SIZE,
      post_batching_processing=augmenter)
print("validation-set")
valid_ds = ds.image_objdetect_dataset_from_dataframe("voc2007_objdet_val.csv",
      bounding_box_format=BOX_FORMAT,
      pre_batching_processing=pre_processing,
      shuffle=False, batch_size=BATCH_SIZE)

The function `image_objdetect_dataset_from_dataframe` takes the CSV file as its first parameter.
The second parameter, `bounding_box_format`, specifies the format in which the bounding boxes are represented.
Note that the same `bounding_box_format` must be used consistently by all the components that process the bounding boxes (pre-processing and augmentation layers, the model, and the visualization functions).
You can find information about the supported bounding box formats in the official Keras documentation.
The other parameters of `ds.image_objdetect_dataset_from_dataframe` are the same as those of `ds.image_classification_dataset_from_dataframe`, which we have seen in other examples.
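The CSV files store each box as `xmin, ymin, xmax, ymax` (the `"xyxy"` format in Keras naming), while this notebook uses `"yxyx"`. The following sketch converts one box from the CSV excerpt into a few common formats, following the Keras conventions as we understand them (`"xywh"`: top-left corner plus width and height; `"center_xywh"`: box center plus width and height). The conversion functions are hypothetical helpers written for illustration; in the notebook the conversion is handled by the libraries.

```python
def xyxy_to_yxyx(box):
    """(xmin, ymin, xmax, ymax) -> (ymin, xmin, ymax, xmax)."""
    xmin, ymin, xmax, ymax = box
    return (ymin, xmin, ymax, xmax)

def xyxy_to_xywh(box):
    """(xmin, ymin, xmax, ymax) -> (xmin, ymin, width, height)."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)

def xyxy_to_center_xywh(box):
    """(xmin, ymin, xmax, ymax) -> (x_center, y_center, width, height)."""
    xmin, ymin, xmax, ymax = box
    width, height = xmax - xmin, ymax - ymin
    return (xmin + width / 2, ymin + height / 2, width, height)

csv_box = (215, 109, 487, 211)  # second "car" row of 002118.jpg
print(xyxy_to_yxyx(csv_box))
print(xyxy_to_xywh(csv_box))
print(xyxy_to_center_xywh(csv_box))
```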

Note that all images of the three sets are resized to `640 x 640` pixels (padding them to preserve the aspect ratio), because the PascalVOC images have different sizes.
Additionally, the RetinaNet architecture requires image dimensions that are divisible by 64.
Shuffling and data augmentation are applied only to the training set.
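Resizing with padding also moves the boxes. The sketch below shows the arithmetic under two assumptions: the image is scaled so that its longer side becomes 640 pixels, and the padding is split equally between the two sides. The exact padding placement used by Keras' `Resizing` is an assumption here, and `letterbox_params` and `letterbox_box_yxyx` are hypothetical helpers; in the notebook, the `Resizing` layer updates the boxes for us.

```python
TARGET = 640  # same value as IMAGE_SIZE in this notebook

def letterbox_params(height, width, target=TARGET):
    """Scale factor and (top, left) padding, assuming centered padding."""
    scale = target / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    pad_top = (target - new_h) // 2
    pad_left = (target - new_w) // 2
    return scale, pad_top, pad_left

def letterbox_box_yxyx(box, height, width, target=TARGET):
    """Map a (ymin, xmin, ymax, xmax) box from the original image to the padded one."""
    scale, pad_top, pad_left = letterbox_params(height, width, target)
    ymin, xmin, ymax, xmax = box
    return (ymin * scale + pad_top, xmin * scale + pad_left,
            ymax * scale + pad_top, xmax * scale + pad_left)

# Example: an image of 375 x 500 pixels (height x width)
print(letterbox_params(375, 500))
print(letterbox_box_yxyx((3, 23, 375, 465), 375, 500))
print(TARGET % 64 == 0)  # 640 = 10 * 64, so the RetinaNet constraint is met
```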
To visualize some examples from the test set, you can use the following instructions:


In [None]:
from easy_cv_dataset.visualization import plot_bounding_box_gallery
for images, boxes in test_ds.take(1):  # take the first batch of the test set
    plot_bounding_box_gallery(  # display the images with their boxes
        images, y_true=boxes,
        bounding_box_format=BOX_FORMAT,
        scale=5, font_scale=0.7,
        class_mapping=test_ds.class_names,
    )

## Neural network definition
We use the functions of KerasHub to create a RetinaNet network with a ResNet50 encoder (backbone) pre-trained on the COCO dataset.


In [None]:
from keras_hub.models import RetinaNetBackbone, RetinaNetObjectDetector, RetinaNetObjectDetectorPreprocessor
from keras_hub.layers import RetinaNetImageConverter

pretrained_model = "retinanet_resnet50_fpn_coco"
backbone = RetinaNetBackbone.from_preset(pretrained_model)
normalization = RetinaNetImageConverter.from_preset(pretrained_model, image_size=(IMAGE_SIZE, IMAGE_SIZE))
preprocessor = RetinaNetObjectDetectorPreprocessor(normalization)
model = RetinaNetObjectDetector(
    preprocessor=preprocessor,
    backbone=backbone,
    num_classes=20,
    bounding_box_format=BOX_FORMAT,
)

The parameter `num_classes` is set to 20, the number of classes in PascalVOC 2007.
To reduce the risk of overfitting, we can choose not to train the first layers of the network. To exclude the entire encoder (the backbone) from training, you can use the following code:

In [None]:
model.backbone.trainable = False
model.summary()

## Training
In this example, we use the Nadam optimizer with gradient clipping.
Clipping addresses gradient explosion, a common problem when training object detection networks.
For the learning rate, we use a scheduler that reduces it by a factor of 10 after the first 10 epochs and by another factor of 10 after 10 more epochs (i.e., at epoch 20).
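With `global_clipnorm`, Keras computes a single L2 norm over the gradients of all the weights and, when it exceeds the threshold, rescales every gradient by the same factor, so the update direction is preserved. The pure-Python sketch below illustrates this rule on toy gradients; `clip_by_global_norm` is a hypothetical helper, not the Keras internals.

```python
import math

def clip_by_global_norm(grads, clipnorm):
    """Rescale a list of gradient vectors so that their joint L2 norm is at most clipnorm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm <= clipnorm:
        return grads, global_norm  # small gradients are left untouched
    factor = clipnorm / global_norm  # the same factor for every weight
    return [[g * factor for g in grad] for grad in grads], global_norm

# Two "layers" with exploded gradients: joint norm sqrt(30^2 + 40^2) = 50
clipped, norm = clip_by_global_norm([[30.0], [40.0]], clipnorm=10.0)
print(norm, clipped)  # gradients scaled by 10 / 50
```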


In [None]:
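from keras import optimizers

base_lr = 0.001
# learning-rate scheduler: boundaries are expressed in steps (batches),
# so the epoch numbers are multiplied by the number of batches per epoch
lr_decay = optimizers.schedules.PiecewiseConstantDecay(
  boundaries=[10*len(train_ds), 20*len(train_ds)],
  values=[base_lr, 0.1 * base_lr, 0.01 * base_lr],
)
optimizer = optimizers.Nadam(
  learning_rate=lr_decay, global_clipnorm=10.0  # clip the global gradient norm to 10
)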
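To check which learning rate is used in each epoch, the sketch below reimplements the rule of `PiecewiseConstantDecay` in plain Python (`values[i]` is used while `step <= boundaries[i]`, and the last value afterwards). The number of steps per epoch is a hypothetical value; the notebook uses `len(train_ds)`.

```python
def piecewise_constant(step, boundaries, values):
    """Same rule as PiecewiseConstantDecay: values[i] while step <= boundaries[i]."""
    for boundary, value in zip(boundaries, values):
        if step <= boundary:
            return value
    return values[-1]

steps_per_epoch = 100  # hypothetical; the notebook uses len(train_ds)
base_lr = 0.001
boundaries = [10 * steps_per_epoch, 20 * steps_per_epoch]
values = [base_lr, 0.1 * base_lr, 0.01 * base_lr]

for epoch in (0, 9, 10, 19, 20, 29):
    last_step = (epoch + 1) * steps_per_epoch - 1  # optimizer steps are counted from 0
    lr = piecewise_constant(last_step, boundaries, values)
    print(f"epoch {epoch:2d}: lr = {lr:.0e}")
```

The Keras schedule object can be queried in the same way, since it is callable with a step, e.g. `lr_decay(10 * len(train_ds) + 1)`.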

For object detection, we use the default loss functions: the Smooth L1 loss for box localization and the Focal loss for box classification.
Focal loss is a variant of the cross-entropy loss that down-weights easy, well-classified examples, so that training focuses on the hard-to-classify ones.
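The sketch below writes both losses for a single prediction in plain Python. For the focal loss, $FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$, we use $\alpha = 0.25$ and $\gamma = 2$, the values recommended in the RetinaNet paper; for Smooth L1 we assume a transition point `beta = 1`. Whether the `"auto"` losses of KerasHub use exactly these hyperparameters is an assumption, and the functions are illustrative helpers, not the library code.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability p and a label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def cross_entropy(p, y):
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def smooth_l1(pred, target, beta=1.0):
    """Quadratic for errors smaller than beta, linear for larger ones."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

# Background anchor (y = 0): easy when p = 0.1 (p_t = 0.9), hard when p = 0.9 (p_t = 0.1)
for p in (0.1, 0.9):
    ce, fl = cross_entropy(p, 0), focal_loss(p, 0)
    print(f"p_t={1 - p:.1f}  CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.4f}")
print(smooth_l1(0.5, 0.0), smooth_l1(3.0, 0.0))
```

The ratio FL/CE equals $\alpha_t (1 - p_t)^\gamma$, so the easy anchor is scaled down far more than the hard one; this matters because most RetinaNet anchors are easy background.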
To select these loss functions, we pass `"auto"` for both losses to the `compile` method:

In [None]:
model.compile(
  box_loss="auto",
  classification_loss="auto",
  optimizer=optimizer)

In [None]:
model.fit(train_ds, epochs=30, validation_data=valid_ds, verbose=True)
model.save_weights('net.weights.h5')  # save the trained weights

Remember to save the weights after training.
Alternatively, you can load the weights of a network already trained on PascalVOC 2007 using the following instructions:

In [None]:
!wget -nc "https://huggingface.co/datasets/davin11/VOC2007/resolve/main/retinanet_resnet50_fpn_pascalvoc.weights.h5"
model.load_weights('retinanet_resnet50_fpn_pascalvoc.weights.h5')

We use the following instructions to see the result of the network on some examples from the test set:

In [None]:
from easy_cv_dataset.visualization import plot_bounding_box_gallery
for images, boxes in test_ds.take(4):  # first 4 batches of the test set
  boxes_pred = model.predict(images)
  plot_bounding_box_gallery(
      images, y_pred=boxes_pred,  # add y_true=boxes to also show the ground truth
      scale=5, font_scale=0.7,
      bounding_box_format=BOX_FORMAT,
      class_mapping=test_ds.class_names,
  )

The most widely used metric in object detection is the mean Average Precision (mAP), but evaluating it during training can be computationally expensive. Therefore, after training, we use the function `compute_mAP_metrics` to evaluate the mAP on the test set.
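To show what the metric measures, the sketch below computes the Average Precision of a single class. Detections are sorted by confidence, and each one counts as a true positive if its Intersection over Union (IoU) with a not-yet-matched ground-truth box is at least 0.5; AP is then the area under the precision-recall curve, and mAP is the mean of AP over the classes. This uses all-point interpolation, whereas the original VOC2007 protocol used 11-point interpolation; how `compute_mAP_metrics` computes it is not stated here, and the helpers below are hypothetical.

```python
def iou_yxyx(a, b):
    """Intersection over Union of two (ymin, xmin, ymax, xmax) boxes with non-zero area."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(detections, ground_truths, iou_threshold=0.5):
    """AP of one class.

    detections: list of (image_id, score, box); ground_truths: dict image_id -> list of boxes.
    """
    num_gt = sum(len(boxes) for boxes in ground_truths.values())
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    is_tp = []
    # Rank detections by decreasing confidence
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        ious = [iou_yxyx(box, gt) for gt in ground_truths.get(img, [])]
        best = max(range(len(ious)), key=ious.__getitem__, default=None)
        if best is not None and ious[best] >= iou_threshold and not matched[img][best]:
            matched[img][best] = True  # each ground-truth box can be matched only once
            is_tp.append(True)
        else:
            is_tp.append(False)        # duplicate or poorly localized detection
    # Precision and recall after each detection of the ranked list
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(is_tp, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / num_gt)
    # All-point interpolation: sum of recall steps times the best precision to the right
    ap, prev_recall = 0.0, 0.0
    for i, recall in enumerate(recalls):
        ap += (recall - prev_recall) * max(precisions[i:])
        prev_recall = recall
    return ap

gts = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
dets = [("img1", 0.9, (0, 0, 10, 10)),    # correct detection
        ("img1", 0.8, (0, 0, 10, 9)),     # duplicate of the first box
        ("img1", 0.7, (20, 20, 30, 30))]  # correct detection
print(average_precision(dets, gts))
```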

In [None]:
from easy_cv_dataset.metrics import compute_mAP_metrics
print(compute_mAP_metrics(model, test_ds, BOX_FORMAT))