# <font style="color:blue">4. Loss Function</font>

Let's see what our loss function should look like.

Unlike the classification and semantic segmentation tasks, detection has two branches with two independent outputs.

At each iteration, we want to know two things: how close the predicted bounding boxes are to the target bounding boxes, and whether their labels are predicted correctly.

That's why our loss function is represented as a sum of two losses: localization and classification loss.

In [1]:
from IPython.display import Code
import inspect

from detection_loss import DetectionLoss

In [2]:
Code(data=inspect.getsource(DetectionLoss)[:797])

## <font style="color:green">4.1. Localization Loss</font>

For the localization loss, we choose Smooth L1-loss, following the [Faster R-CNN paper](https://arxiv.org/pdf/1506.01497.pdf).

It was introduced for bounding box regression as a more robust alternative to L2 loss, which is sensitive to outliers: a single badly predicted box produces a very large squared error and dominates the gradient.

Smooth L1-loss can be interpreted as a combination of L1-loss and L2-loss.

It behaves like L1-loss when the absolute value of the argument is large, and like L2-loss when the absolute value of the argument is close to zero.

<img src='https://www.learnopencv.com/wp-content/uploads/2020/03/c3-w8-l1l2smoothl1.png' align='middle'>

In [3]:
Code(data=inspect.getsource(DetectionLoss)[962:1420])
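
To make the piecewise behaviour concrete, here is a minimal framework-free sketch of Smooth L1 for a single residual. It is not taken from `DetectionLoss`; the function name `smooth_l1` and the switch point `beta=1.0` are assumptions chosen for illustration.

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 for one residual x = prediction - target.

    beta (assumed to be 1.0 here) is the point where the loss switches
    from the quadratic (L2-like) regime to the linear (L1-like) regime.
    """
    abs_x = abs(x)
    if abs_x < beta:
        # Quadratic near zero: small errors get small, smooth gradients.
        return 0.5 * abs_x ** 2 / beta
    # Linear for large errors: outliers are not squared.
    return abs_x - 0.5 * beta


# Compare the three losses on a few residuals.
for residual in (0.1, 0.5, 1.0, 3.0, 10.0):
    l1 = abs(residual)
    l2 = 0.5 * residual ** 2
    print(f"x={residual:5.1f}  L1={l1:6.2f}  L2={l2:6.2f}  smooth L1={smooth_l1(residual):6.2f}")
```

For small residuals the values follow the L2 column, while for large residuals they grow linearly like L1, shifted down by a constant.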

## <font style="color:green">4.2. Classification Loss</font>

For the classification loss, we use cross-entropy loss, the most popular loss for classification tasks.

Note that class imbalance is a serious problem for single-stage detectors.
Most locations in an image are negatives, which the detector can easily classify as background.

We want our network to train on the hard examples and the positives, which make up only a small fraction of all locations.
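
A rough numeric illustration of why this matters is sketched below. All numbers (location counts and predicted probabilities) are hypothetical, picked only to show the effect of summing cross-entropy over every location.

```python
import math


def cross_entropy(prob_of_true_class):
    """Cross-entropy for one location, given the probability assigned to its true class."""
    return -math.log(prob_of_true_class)


# Hypothetical image: many easy background locations, few object locations.
num_neg, num_pos = 10_000, 20

# Each easy negative: the detector is already 99% sure it is background.
neg_loss = num_neg * cross_entropy(0.99)
# Each positive: the detector gives only 30% to the correct class.
pos_loss = num_pos * cross_entropy(0.30)

neg_share = neg_loss / (neg_loss + pos_loss)
print(f"negatives: {neg_loss:.1f}, positives: {pos_loss:.1f}, negative share: {neg_share:.0%}")
```

Although each easy negative contributes almost nothing on its own, their sheer number makes them the larger part of the summed loss in this example.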

There are several ways to deal with this issue. We choose the Online Hard Example Mining (OHEM) strategy.
It finds the hard examples in the batch, those with the greatest loss values, and back-propagates only the loss computed over the selected instances.
The number of hard examples is tied to the number of positive examples; a ratio of `3:1` is a common choice.

In [4]:
Code(data=inspect.getsource(DetectionLoss)[1597:3835])
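
The selection step can be sketched in plain Python as follows. This is an illustrative variant, not the `DetectionLoss` code: it assumes that all positives are kept, that only negatives are mined, that the `3:1` ratio means at most three negatives per positive, and that the result is normalized by the number of positives. The helper name `select_hard_examples` is hypothetical.

```python
def select_hard_examples(losses, is_positive, neg_pos_ratio=3):
    """Return indices of all positives plus the highest-loss negatives.

    losses:        per-location classification loss values
    is_positive:   per-location flag, True if the location matches an object
    neg_pos_ratio: assumed cap on negatives kept per positive
    """
    pos_idx = [i for i, pos in enumerate(is_positive) if pos]
    neg_idx = [i for i, pos in enumerate(is_positive) if not pos]

    # Keep at most neg_pos_ratio negatives for every positive.
    num_neg = min(len(neg_idx), neg_pos_ratio * len(pos_idx))

    # Hardest negatives are the ones with the largest loss.
    hardest_neg = sorted(neg_idx, key=lambda i: losses[i], reverse=True)[:num_neg]
    return sorted(pos_idx + hardest_neg)


# Toy batch: one positive (index 0) and eight negatives.
losses = [2.3, 0.01, 0.9, 0.02, 1.5, 0.03, 0.4, 0.05, 0.6]
is_positive = [True, False, False, False, False, False, False, False, False]

selected = select_hard_examples(losses, is_positive)
# Only the selected locations contribute to the loss (and so to the gradients).
mined_loss = sum(losses[i] for i in selected) / max(1, sum(is_positive))
print(selected, mined_loss)
```

The easy negatives with near-zero loss are dropped entirely, so they no longer dilute the signal from the positives and the hard negatives.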

Our final loss will be a weighted sum of localization and classification loss:

In [5]:
Code(data=inspect.getsource(DetectionLoss)[3835:])
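
As a small sketch of what a weighted sum means here, the function below combines two already computed scalar losses. The name `combine_losses`, the weight parameters, and their default values of `1.0` are assumptions for illustration and may differ from what `DetectionLoss` uses.

```python
def combine_losses(loc_loss, cls_loss, loc_weight=1.0, cls_weight=1.0):
    """Weighted sum of the localization and classification losses."""
    return loc_weight * loc_loss + cls_weight * cls_loss


# Hypothetical per-batch values.
loc_loss, cls_loss = 0.4, 1.2

equal = combine_losses(loc_loss, cls_loss)
# Doubling the localization weight makes box errors count twice as much.
box_heavy = combine_losses(loc_loss, cls_loss, loc_weight=2.0)
print(equal, box_heavy)
```

The weights let you balance the two branches when one loss is consistently much larger than the other.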