<a href="https://colab.research.google.com/github/DarekGit/Documents/blob/master/notebooks/03_00_Miary.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>




# Face detection metric
The goal of this task was to prepare a single, uniform evaluation measure for all face detection models used in this work.
Most repositories and competitions rely mainly on mAP, but its implementation may differ slightly from one to another. Such implementation differences should not significantly affect the result, yet even differences below 1 pp can decide which model is chosen for further work. That is why we wanted one consistent measure for evaluating all of the models under study.


The prepared module makes the metric and its visualization very easy to use.
The metric operates on lists of detected boxes and of boxes taken from the dataset annotations (ground truth).

Basic information on the definition of mAP can be found at:
https://towardsdatascience.com/breaking-down-mean-average-precision-map-ae462f623a52

## Definitions

### Intersection Over Union (IOU)


Intersection Over Union (IOU) is a measure based on the Jaccard index. It evaluates the relative overlap between two bounding boxes as the ratio of the area of their intersection to the area of their union. With IOU we can decide whether a detection is valid (True Positive) or not (False Positive).

$\text{IOU}=\frac{\text{area}\left(B_{p} \cap B_{gt} \right)}{\text{area}\left(B_{p} \cup B_{gt} \right)}$


The image below illustrates the IOU between a ground truth bounding box (in green) and a detected bounding box (in red).

<!--- IOU --->
<p align="center">
<img src="../Figures/iou.png" align="center"/></p>
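
As a minimal sketch of the IOU formula, the function below computes the overlap of two axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the name `iou` are assumptions made for this illustration; the module used in this work may store boxes differently.

```python
def iou(box_a, box_b):
    """Intersection Over Union of two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 (hypothetical box format for this sketch).
    """
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
detection = (5, 5, 15, 15)
# Intersection is 5 x 5 = 25, union is 100 + 100 - 25 = 175.
print(iou(ground_truth, detection))
```

Boxes that do not overlap give an IOU of 0, and identical boxes give 1.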

### True Positive, False Positive, False Negative and True Negative  

Basic concepts used by the mAP metric:  

* **True Positive (TP)**: A correct detection. Detection with IOU ≥ _threshold_  
* **False Positive (FP)**: A wrong detection. Detection with IOU < _threshold_  
* **False Negative (FN)**: A ground truth not detected  
* **True Negative (TN)**: Does not apply. It would represent a correctly rejected detection. In the object detection task there are many possible bounding boxes that should not be detected within an image, so TN would be all the possible bounding boxes that were correctly not detected (a practically unlimited number within an image). That is why the metrics do not use it.

_threshold_: depending on the metric, it is usually set to 50%.
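
The definitions above can be sketched as a small matching routine for a single image. The text does not specify how detections are assigned to ground truths, so the greedy one-to-one rule used here (highest confidence first, each ground truth matched at most once) is an assumption, as are the names `match_detections` and `iou` and the `(x1, y1, x2, y2)` box format.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, ground_truths, iou_threshold=0.5):
    """Label each detection as TP (True) or FP (False) and count FN.

    detections: list of (box, confidence); ground_truths: list of boxes.
    Returns (flags, fn), with flags ordered by descending confidence.
    """
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    matched = set()  # indices of ground truths already assigned
    flags = []
    for box, _confidence in ordered:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            value = iou(box, gt)
            if value > best_iou:
                best_iou, best_idx = value, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            flags.append(True)   # TP: IOU >= threshold
        else:
            flags.append(False)  # FP: IOU < threshold or no free ground truth
    fn = len(ground_truths) - len(matched)  # ground truths never detected
    return flags, fn


ground_truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [
    ((0, 0, 10, 9), 0.9),     # overlaps the first ground truth well
    ((0, 0, 10, 5), 0.8),     # duplicate of an already matched face
    ((50, 50, 60, 60), 0.7),  # overlaps nothing
]
flags, fn = match_detections(detections, ground_truths)
print(flags, fn)  # [True, False, False] 1
```

In this made-up example the second ground truth is never detected, so it counts as one False Negative.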

### Precision

Precision is the ability of a model to identify **only** the relevant objects. It is the percentage of correct positive predictions and is given by:

<br>
$\text{Precision} = \frac{\text{TP}}{\text{TP}+\text{FP}}=\frac{\text{TP}}{\text{all detections}}$ <br>


### Recall 

Recall is the ability of a model to find all the relevant cases (all ground truth bounding boxes). It is the percentage of true positives detected among all relevant ground truths and is given by:

<br>
$\text{Recall} = \frac{\text{TP}}{\text{TP}+\text{FN}}=\frac{\text{TP}}{\text{all ground truths}}$ <br>
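
A minimal sketch of both formulas, given the TP, FP and FN counts. The function name `precision_recall` is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the text prescribes.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true/false positive and false negative counts."""
    detections = tp + fp      # all detections
    ground_truths = tp + fn   # all ground truths
    # Assumed convention: an empty denominator yields 0.0.
    precision = tp / detections if detections > 0 else 0.0
    recall = tp / ground_truths if ground_truths > 0 else 0.0
    return precision, recall


# Illustrative counts: 7 correct detections, 3 wrong ones, 1 missed face.
precision, recall = precision_recall(tp=7, fp=3, fn=1)
print(precision, recall)  # 0.7 0.875
```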

## Metrics

In the topics below there are some comments on the most popular metrics used for object detection.

### Precision x Recall curve

The Precision x Recall curve is a good way to evaluate the performance of an object detector as the confidence is changed by plotting a curve for each object class. An object detector of a particular class is considered good if its precision stays high as recall increases, which means that if you vary the confidence threshold, the precision and recall will still be high. Another way to identify a good object detector is to look for a detector that can identify only relevant objects (0 False Positives = high precision), finding all ground truth objects (0 False Negatives = high recall).  

A poor object detector needs to increase the number of detected objects (increasing False Positives = lower precision) in order to retrieve all ground truth objects (high recall). That's why the Precision x Recall curve usually starts with high precision values, decreasing as recall increases. You can see an example of the Precision x Recall curve in the next topic (Average Precision). This kind of curve is used by the PASCAL VOC 2012 challenge and is available in our implementation.  
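
To make the construction of the curve concrete, the sketch below accumulates TP and FP counts over detections sorted by descending confidence and records one (precision, recall) point per detection. The function name `precision_recall_points` and the TP/FP flags are made up for this illustration.

```python
def precision_recall_points(flags, num_ground_truths):
    """Build Precision x Recall points from TP/FP flags.

    flags: list of booleans (True = TP, False = FP), already sorted by
    descending detection confidence.
    """
    precisions, recalls = [], []
    tp = fp = 0
    for is_tp in flags:
        if is_tp:
            tp += 1
        else:
            fp += 1
        # Lowering the confidence threshold admits one more detection per step.
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truths if num_ground_truths else 0.0)
    return precisions, recalls


# Five detections on a set with four ground truth faces (illustrative values).
flags = [True, False, True, True, False]
precisions, recalls = precision_recall_points(flags, num_ground_truths=4)
for p, r in zip(precisions, recalls):
    print(f"precision={p:.3f} recall={r:.2f}")
```

In this example precision drops from 1.0 to 0.5 after the first False Positive and then partially recovers, which is the zigzag shape typical of such curves.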

### Average Precision

Another way to compare the performance of object detectors is to calculate the area under the curve (AUC) of the Precision x Recall curve. As Precision x Recall curves are often zigzag curves going up and down, comparing different curves (different detectors) in the same plot is usually not an easy task, because the curves tend to cross each other frequently. That's why Average Precision (AP), a numerical metric, can also help us compare different detectors. In practice AP is the precision averaged across all recall values between 0 and 1.  

From 2010 on, the method of computing AP by the PASCAL VOC challenge has changed. Currently, **the interpolation performed by PASCAL VOC challenge uses all data points, rather than interpolating only 11 equally spaced points as stated in their [paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.5766&rep=rep1&type=pdf)**. As we want to reproduce their default implementation, our default code (as seen further) follows their most recent application (interpolating all data points). However, we also offer the 11-point interpolation approach. 

#### 11-point interpolation

The 11-point interpolation tries to summarize the shape of the Precision x Recall curve by averaging the precision at a set of eleven equally spaced recall levels [0, 0.1, 0.2, ... , 1]:

<br>
$\text{AP}=\frac{1}{11} \sum_{r\in \left \{ 0, 0.1, ...,1 \right \}}\rho_{\text{interp}}\left ( r \right )$

with
<br>
$\rho_{\text{interp}}\left ( r \right ) = \max_{\tilde{r}:\tilde{r} \geq r} \rho\left ( \tilde{r} \right )$<br>


where $\rho\left ( \tilde{r} \right )$ is the measured precision at recall $\tilde{r}$ .

Instead of using the precision observed at each point, the AP is obtained by interpolating the precision only at the 11 levels $r$, taking the **maximum precision whose recall value is greater than or equal to $r$**.
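
A minimal sketch of the 11-point formula in plain Python. The function name `average_precision_11_point` and the sample curve are assumptions for this illustration; when no measured point has recall at or above a level $r$, the interpolated precision is taken as 0.

```python
def average_precision_11_point(precisions, recalls):
    """11-point interpolated AP from paired precision/recall measurements."""
    total = 0.0
    for i in range(11):
        r = i / 10  # recall levels 0, 0.1, ..., 1
        # Interpolated precision: maximum precision at any recall >= r.
        candidates = [p for p, rec in zip(precisions, recalls) if rec >= r]
        total += max(candidates) if candidates else 0.0
    return total / 11


# Illustrative curve with five measured points.
precisions = [1.0, 0.5, 2 / 3, 0.75, 0.6]
recalls = [0.25, 0.25, 0.5, 0.75, 0.75]
# Levels 0 to 0.2 give 1.0, levels 0.3 to 0.7 give 0.75, levels 0.8 to 1 give 0.
print(average_precision_11_point(precisions, recalls))  # 6.75 / 11, about 0.6136
```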

#### Interpolating all points

Instead of interpolating only at the 11 equally spaced points, you could interpolate through all points $n$ in such a way that:

$\text{AP}=\sum_{n=0} \left ( r_{n+1} - r_{n} \right ) \rho_{\text{interp}}\left ( r_{n+1} \right )$
 
with

<br>
$\rho_{\text{interp}}\left ( r_{n+1} \right ) = \max_{\tilde{r}:\tilde{r} \ge r_{n+1}} \rho \left ( \tilde{r} \right )$ <br><br>


where $\rho \left ( \tilde{r} \right )$  is the measured precision at recall $\tilde{r}$.

In this case, instead of using the precision observed at only a few points, the AP is obtained by interpolating the precision at **each level** $r_{n+1}$, taking the **maximum precision whose recall value is greater than or equal to $r_{n+1}$**. This way we calculate the estimated area under the curve.
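
The all-point interpolation can be sketched as follows. The precision is first replaced by its envelope (the running maximum taken from the highest recall downwards), and the rectangles between consecutive recall values are then summed. The function name `average_precision_all_points`, the sentinel points at recall 0 and 1, and the sample curve are assumptions for this illustration, not necessarily what the module in this work does.

```python
def average_precision_all_points(precisions, recalls):
    """AP as the area under the interpolated Precision x Recall curve.

    precisions and recalls are paired measurements ordered by increasing recall.
    """
    # Sentinel points so the curve spans recall 0 to 1 (assumed convention).
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]

    # Envelope: each precision becomes the maximum precision found
    # at any recall greater than or equal to its own.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum of (r_{n+1} - r_n) * rho_interp(r_{n+1}).
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap


# Illustrative curve with five measured points.
precisions = [1.0, 0.5, 2 / 3, 0.75, 0.6]
recalls = [0.25, 0.25, 0.5, 0.75, 0.75]
# 0.25 * 1.0 + 0.25 * 0.75 + 0.25 * 0.75 + 0.25 * 0.0 = 0.625
print(average_precision_all_points(precisions, recalls))
```

On this sample curve the all-point value (0.625) differs slightly from the 11-point value (about 0.614), which is the kind of implementation-dependent difference that motivates using one consistent metric.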


## Example
An example of using the defined mAP is presented in:<br> [mAP notebook](../notebooks/03_01_mAP.ipynb "mAP notebook")



  


## References

* The Relationship Between Precision-Recall and ROC Curves (Jesse Davis and Mark Goadrich), Department of Computer Sciences and Department of Biostatistics and Medical Informatics, University of Wisconsin  
http://pages.cs.wisc.edu/~jdavis/davisgoadrichcamera2.pdf

* The PASCAL Visual Object Classes (VOC) Challenge  
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.5766&rep=rep1&type=pdf

* Evaluation of ranked retrieval results (Salton and McGill 1986)  
https://www.amazon.com/Introduction-Information-Retrieval-COMPUTER-SCIENCE/dp/0070544840  
https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html