# 0- Introduction

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import torch
import tensorflow as tf
import cv2
```

# 1- Overall architecture
* Architecture: **the Yolov8 model during the training phase**
<a href="https://ibb.co/6gpLnd3"><img src="https://i.ibb.co/NZkbV48/yolov8-training.png" alt="yolov8-training" border="0"></a>



## 1.1- Model input and label encoder during the training phase

<a href="https://ibb.co/1J751qC"><img src="https://i.ibb.co/WpGwZyr/yolov8-label-encoder.png" alt="yolov8-label-encoder" border="0"></a>

## 1.2- Model's output
<a href="https://ibb.co/JKL7s4W"><img src="https://i.ibb.co/KXZqrkc/yolov8-output.png" alt="yolov8-output" border="0"></a>


The output is reshaped according to the following steps:
<a href="https://ibb.co/bXtx5Pr"><img src="https://i.ibb.co/m4pm5GH/yolov8-output-reshape.png" alt="yolov8-output-reshape" border="0"></a>
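As a rough illustration of this reshape, here is a minimal NumPy sketch. It assumes (as in the ultralytics `Detect` head, and not necessarily exactly as drawn in the figure) that each of the three levels produces a map of shape $(B, C, H, W)$ with $C = 4 N_r + N_c$ channels, where $N_r$ is the number of distribution bins per box side and $N_c$ the number of classes. The helper name `flatten_head_outputs` and the sizes below are hypothetical.

```python
import numpy as np


def flatten_head_outputs(feature_maps):
    """Hypothetical helper: merge per-level outputs (B, C, H, W) into one (B, N_a, C) array."""
    # Flatten the spatial grid of every level: (B, C, H, W) -> (B, C, H*W), row-major order.
    flat = [f.reshape(f.shape[0], f.shape[1], -1) for f in feature_maps]
    # Concatenate the levels along the anchor axis: (B, C, N_a).
    out = np.concatenate(flat, axis=2)
    # One row per candidate box: (B, N_a, C).
    return out.transpose(0, 2, 1)


rng = np.random.default_rng(0)
B, N_r, N_c = 2, 16, 80  # hypothetical batch size, number of bins, number of classes
levels = [rng.standard_normal((B, 4 * N_r + N_c, s, s)).astype(np.float32) for s in (80, 40, 20)]
out = flatten_head_outputs(levels)
print(out.shape)  # (2, 8400, 144)
```

Row $i$ of `out[b]` then holds every prediction made at anchor point $i$, with the $80 \times 80$ level first, followed by the $40 \times 40$ and $20 \times 20$ levels.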

## 1.3- Model's architecture

### 1.3.1- Anchor points
* For a $640 \times 640$ input image, Yolov8 outputs 3 feature maps of size $80 \times 80$, $40 \times 40$ and $20 \times 20$.
* **Anchor points** are points on these output feature maps: they form a grid obtained by sampling the x and y axes at $[0.5, 1.5, 2.5, \cdots, size-0.5]$, where $size$ is the feature map's size, i.e., $80$, $40$ or $20$. Each anchor point is therefore the center of one feature-map cell.
    * Anchor points can be converted to image space by multiplying their coordinates by the stride of their feature map. The following picture shows the anchor points in image space.
    * `stride = image_size / feature_size`. The 3 feature maps therefore have strides $8$, $16$ and $32$ (e.g., $640 / 80 = 8$).

<a href="https://ibb.co/Yfh2wCJ"><img src="https://i.ibb.co/5sLTw0C/anchorpoints.png" alt="anchorpoints" border="0"></a>
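The anchor-point grid described above can be sketched in NumPy as follows. This is a minimal sketch assuming a $640 \times 640$ input (so feature sizes $80, 40, 20$ and strides $8, 16, 32$) and the $0.5$ cell-center offset; the function name `make_anchor_points` is hypothetical.

```python
import numpy as np


def make_anchor_points(feature_sizes=(80, 40, 20), strides=(8, 16, 32), offset=0.5):
    """Hypothetical helper: anchor points (x, y) in feature-map units plus each point's stride."""
    points, point_strides = [], []
    for size, stride in zip(feature_sizes, strides):
        coords = np.arange(size, dtype=np.float32) + offset  # 0.5, 1.5, ..., size - 0.5
        yy, xx = np.meshgrid(coords, coords, indexing="ij")  # row index = y, column index = x
        points.append(np.stack([xx.ravel(), yy.ravel()], axis=-1))  # (size * size, 2)
        point_strides.append(np.full((size * size, 1), stride, dtype=np.float32))
    return np.concatenate(points, axis=0), np.concatenate(point_strides, axis=0)


anchor_points, stride_tensor = make_anchor_points()
anchor_points_img = anchor_points * stride_tensor  # image-space (pixel) coordinates
print(anchor_points.shape)  # (8400, 2)
```

The first anchor of the $80 \times 80$ map lands at pixel $(4, 4)$ and the last anchor of the $20 \times 20$ map at pixel $(624, 624)$, i.e., at the center of a $32 \times 32$ cell.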

### 1.3.2- Anchor points: Purpose
*  Anchor points are used to **encode** ground-truth boxes into regression targets and to **decode** the model's predictions into boxes.
*  Each anchor point is associated with exactly one predicted box, and each predicted box with exactly one anchor point.
*  The figure below describes the relationship between an anchor point and its predicted box.

<a href="https://ibb.co/rFB7KnB"><img src="https://i.ibb.co/vVNwC2N/yolov8-anchor-enc-dec.png" alt="yolov8-anchor-enc-dec" border="0"></a>
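A minimal NumPy sketch of the encode/decode step, assuming boxes in `(x1, y1, x2, y2)` format and all coordinates in feature-map units (pixel coordinates divided by the stride). The helper names, the example box, and the small clipping margin below $N_r - 1$ are illustrative assumptions, not taken from the figure.

```python
import numpy as np


def encode_boxes(anchor_points, boxes_xyxy, reg_max=16):
    """Hypothetical helper: xyxy boxes -> (LT.dx, LT.dy, RB.dx, RB.dy) distances from anchors."""
    lt = anchor_points - boxes_xyxy[:, :2]  # anchor -> top-left corner
    rb = boxes_xyxy[:, 2:] - anchor_points  # anchor -> bottom-right corner
    dist = np.concatenate([lt, rb], axis=-1)
    # Assumed margin: keep targets just below reg_max - 1 so both neighbouring bins exist.
    return np.clip(dist, 0.0, reg_max - 1 - 0.01)


def decode_boxes(anchor_points, dist):
    """Hypothetical helper: inverse of encode_boxes, distances -> xyxy boxes."""
    lt, rb = dist[:, :2], dist[:, 2:]
    return np.concatenate([anchor_points - lt, anchor_points + rb], axis=-1)


stride = 8
anchor = np.array([[10.5, 10.5]])  # one anchor point on the 80x80 map
gt_px = np.array([[40.0, 48.0, 120.0, 100.0]])  # hypothetical ground-truth box in pixels
dist = encode_boxes(anchor, gt_px / stride)  # distances: 5.5, 4.5, 4.5, 2.0
box_px = decode_boxes(anchor, dist) * stride  # back to 40, 48, 120, 100 in pixels
```

Encoding is exact only while every distance fits in $[0, N_r - 1)$; larger boxes are clipped, which is one reason large objects are assigned to the coarser (larger-stride) feature maps.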

### 1.3.3- Meaning of model's output
* Yolov8 predicts the following information for $N_a$ candidate boxes, where $N_a$ is the total number of anchor points ($80^2 + 40^2 + 20^2 = 8400$ for a $640 \times 640$ input).
  * (1) **Class probabilities**: for each box, there are $N_c$ probabilities, one per class, each in the range $[0, 1]$. They are independent sigmoid outputs rather than a softmax, so they do not have to sum to 1; this lets Yolov8 support multi-label prediction.
  * (2) **Distributions** of $LT.dx$, $LT.dy$, $RB.dx$ and $RB.dy$. Yolov8 uses a **categorical distribution** to approximate the distribution of each of these displacements.
      * $N_r$: the range of a displacement, i.e., any displacement takes a value within $[0, N_r-1]$ (measured in feature-map cells).
      * Therefore, for each candidate box, Yolov8 outputs $N_r$ (e.g., $16$) values for each of $LT.dx$, $LT.dy$, $RB.dx$ and $RB.dy$, i.e., $4 N_r$ values per box.
      * The $N_r$ values of each displacement are passed through a softmax function to obtain its distribution $p(x)$.
      * To recover the displacement itself, we compute the expectation: $\hat{d} = \sum_{x=0}^{N_r-1}{x \times p(x)}$
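The two kinds of output can be turned into class probabilities and displacements as in this NumPy sketch. It assumes each box's raw row holds the $4 N_r$ distribution logits first (ordered $LT.dx, LT.dy, RB.dx, RB.dy$, $N_r$ bins each) followed by the $N_c$ class logits, as in the ultralytics head; the function name `decode_head_output` is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def decode_head_output(raw, num_classes, reg_max=16):
    """Hypothetical helper: raw (N_a, 4 * reg_max + num_classes) -> (class probs, displacements)."""
    assert raw.shape[1] == 4 * reg_max + num_classes
    n = raw.shape[0]
    box_logits = raw[:, : 4 * reg_max].reshape(n, 4, reg_max)  # (N_a, 4 sides, N_r bins)
    cls_logits = raw[:, 4 * reg_max:]  # (N_a, N_c)
    probs = softmax(box_logits, axis=-1)  # categorical distribution p(x), x = 0 .. N_r - 1
    bins = np.arange(reg_max, dtype=raw.dtype)
    displacement = (probs * bins).sum(axis=-1)  # expectation sum_x x * p(x), shape (N_a, 4)
    cls_prob = 1.0 / (1.0 + np.exp(-cls_logits))  # independent sigmoids (multi-label)
    return cls_prob, displacement


raw = np.zeros((3, 4 * 16 + 80))  # all-zero logits: uniform bins, probability 0.5 per class
cls_prob, displacement = decode_head_output(raw, num_classes=80)
print(displacement[0])  # each side is the mean of 0..15, i.e. 7.5
```

With uniform logits every displacement is $7.5$, the mean of $0, \dots, 15$; a sharply peaked distribution instead returns (almost exactly) the index of its peak bin.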

### 1.3.4- The model
<a href="https://ibb.co/dp8rp8W"><img src="https://i.ibb.co/bvcPvcW/Yolov8.png" alt="Yolov8" border="0"></a>

# 2- LossMeter
<a href="https://ibb.co/CQ1ZyjY"><img src="https://i.ibb.co/dDWqvNV/yolov8-loss-meter.png" alt="yolov8-loss-meter" border="0"></a>

## 2.1- Matching ground-truth boxes to candidate boxes
<a href="https://ibb.co/sJVSWFv"><img src="https://i.ibb.co/yBQT48n/yolov8-matcher.png" alt="yolov8-matcher" border="0"></a>
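The ultralytics Yolov8 loss uses a task-aligned assigner (TAL) with top-$k = 10$, $\alpha = 0.5$ and $\beta = 6.0$. Below is a simplified NumPy sketch of that idea, which may differ in detail from the figure above: a candidate must have its anchor point inside the ground-truth box, candidates are ranked by the alignment metric $t = s^{\alpha} \cdot u^{\beta}$ (with $s$ the predicted score of the ground-truth class and $u$ the IoU), and an anchor claimed by several ground truths keeps the one with the highest IoU. The function names and the toy data are hypothetical, and target normalization is omitted.

```python
import numpy as np


def box_iou(a, b):
    """Pairwise IoU between xyxy boxes a (M, 4) and b (N, 4) -> (M, N)."""
    lt = np.maximum(a[:, None, :2], b[None, :, :2])
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)


def task_aligned_assign(anchor_points, pred_scores, pred_boxes, gt_labels, gt_boxes,
                        topk=10, alpha=0.5, beta=6.0):
    """Hypothetical simplified TAL: matched GT index per anchor (-1 = background)."""
    x, y = anchor_points[:, 0], anchor_points[:, 1]
    # (M, N): anchor point lies strictly inside the ground-truth box
    inside = ((x[None] > gt_boxes[:, None, 0]) & (y[None] > gt_boxes[:, None, 1]) &
              (x[None] < gt_boxes[:, None, 2]) & (y[None] < gt_boxes[:, None, 3]))
    iou = box_iou(gt_boxes, pred_boxes)  # (M, N)
    score = pred_scores[:, gt_labels].T  # (M, N): predicted prob of each GT's class
    align = (score ** alpha) * (iou ** beta) * inside  # alignment metric t
    # Keep the top-k anchors per ground truth, restricted to anchors inside that box.
    k = min(topk, align.shape[1])
    topk_idx = np.argsort(-align, axis=1)[:, :k]
    candidate = np.zeros_like(inside)
    np.put_along_axis(candidate, topk_idx, True, axis=1)
    candidate &= inside
    # An anchor claimed by several ground truths goes to the one with the highest IoU.
    assigned = np.full(anchor_points.shape[0], -1)
    claimed = candidate.any(axis=0)
    best_gt = np.argmax(np.where(candidate, iou, -1.0), axis=0)
    assigned[claimed] = best_gt[claimed]
    return assigned, align


anchors = np.array([[1, 1], [3, 1], [10, 10], [12, 10], [20, 20]], dtype=float)
gt_boxes = np.array([[0, 0, 4, 2], [9, 9, 13, 11]], dtype=float)
gt_labels = np.array([0, 1])
pred_boxes = np.array([[0, 0, 4, 2], [0, 0, 4, 2], [9, 9, 13, 11],
                       [9, 9, 13, 11], [19, 19, 21, 21]], dtype=float)
pred_scores = np.full((5, 2), 0.5)
assigned, align = task_aligned_assign(anchors, pred_scores, pred_boxes, gt_labels, gt_boxes)
print(assigned)  # anchors 0-1 -> GT 0, anchors 2-3 -> GT 1, anchor 4 -> background (-1)
```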

## 2.2- Distribution Focal Loss (DFL)
* DFL-Loss:

<a href="https://ibb.co/C7NN8Sn"><img src="https://i.ibb.co/yyGGPx8/yolov8-dfl-loss.png" alt="yolov8-dfl-loss" border="0"></a>
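DFL, as defined in the Generalized Focal Loss paper, pushes probability mass onto the two bins $y_i = \lfloor y \rfloor$ and $y_{i+1} = y_i + 1$ around a continuous target $y$: $\mathrm{DFL}(S_i, S_{i+1}) = -\big((y_{i+1} - y)\log S_i + (y - y_i)\log S_{i+1}\big)$, where $S$ is the softmax distribution. The NumPy sketch below follows that formula and averages over the four sides; it assumes targets already lie in $[0, N_r - 1)$, and the function names are hypothetical.

```python
import numpy as np


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def dfl_loss(pred_logits, target):
    """Hypothetical helper. pred_logits: (N, 4, N_r); target: (N, 4) in [0, N_r - 1)."""
    logp = log_softmax(pred_logits, axis=-1)  # log S over the N_r bins
    left = np.floor(target).astype(int)  # y_i
    right = left + 1  # y_{i+1}
    w_left = right - target  # y_{i+1} - y
    w_right = target - left  # y - y_i
    lp_left = np.take_along_axis(logp, left[..., None], axis=-1)[..., 0]  # log S_i
    lp_right = np.take_along_axis(logp, right[..., None], axis=-1)[..., 0]  # log S_{i+1}
    loss = -(w_left * lp_left + w_right * lp_right)  # (N, 4)
    return loss.mean(axis=-1)  # average over LT.dx, LT.dy, RB.dx, RB.dy


logits = np.zeros((2, 4, 16))  # uniform prediction over 16 bins
target = np.array([[0.3, 5.5, 7.0, 14.2], [1.0, 2.25, 9.9, 13.0]])
print(dfl_loss(logits, target))  # log(16) for every box, since the two weights sum to 1
```

Because $w_{left} + w_{right} = 1$, a uniform prediction costs $\log N_r$ whatever the target, while a distribution concentrated on the two bins that bracket $y$ (in the right proportions) drives the loss toward its minimum.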
