## Problem
1. (**0.25p**) Please describe the output tensor and how to interpret it. In particular, explain the meaning of the 5th number, i.e. the objectness score. How is it calculated, and how is it related to the class scores?
2. (**1p**) Implement the NMS and thresholding yourself.
3. (**1p**) Visualize the raw predictions and the predictions after NMS and thresholding.
4. (**0.25p**) Compare the results with the original predictions.
5. (**0.5p**) Vary IoU threshold for NMS and show its effect on detections.
6. (**1p**) Calculate precision, recall, F1 for different IoU thresholds. Calculate `mAP@0.5` (mean Average Precision at an Intersection over Union (IoU) threshold of 0.5) for the model predictions; see https://www.v7labs.com/blog/mean-average-precision
7. (**1p**) Experiment with your own images. Find examples where NMS removes a valid detection (e.g., overlapping people).


### Bonus points:
8. (**1p**) Implement Soft-NMS and compare results; see https://arxiv.org/abs/1704.04503
Does it help in scenes with many overlapping objects?


## Solution

### 1
(**0.25p**) Please describe the output tensor and how to interpret it. In particular, explain the meaning of the 5th number, i.e. the objectness score. How is it calculated, and how is it related to the class scores?

The last dimension of the output tensor has size 85: each predicted box consists of 4 bounding-box coordinates, 1 objectness score, and 80 class scores.  

The objectness score estimates how likely it is that the bounding box contains any object at all, regardless of its class. It is not computed from the class scores: in YOLO-style detectors the network predicts it directly as a separate output, squashed to the range [0, 1].  
The class scores are conditional: they estimate how likely each class is, given that the box contains an object. The final confidence for a class is therefore the product of the objectness score and that class's score.  
The objectness score (or the combined confidence) is used to filter out low-confidence detections and keep only high-confidence ones before NMS.
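
As a rough illustration of how the objectness score combines with the class scores, here is a minimal sketch on a single made-up prediction row. The numbers are hypothetical, and only 3 classes are used instead of 80 to keep the example short.

```python
import numpy as np

# Hypothetical prediction row: 4 box values, 1 objectness score, 3 class scores.
pred = np.array([320.0, 240.0, 100.0, 50.0, 0.9, 0.1, 0.8, 0.3])

box = pred[:4]           # bounding-box coordinates
objectness = pred[4]     # the 5th number: "is there any object here?"
class_scores = pred[5:]  # per-class scores, conditional on an object being present

# Combined confidence per class = objectness * class score.
class_conf = objectness * class_scores
best_class = int(np.argmax(class_conf))
best_conf = float(class_conf[best_class])

print(best_class, round(best_conf, 2))
```

A box with a high class score but a low objectness score still ends up with a low combined confidence, which is why both values matter when thresholding.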

### 2
(**1p**) Implement the NMS and thresholding yourself.

In [None]:
import numpy as np


def non_max_suppression(boxes, scores, threshold):
    """
    Perform non-maximum suppression on bounding boxes.

    Parameters:
    - boxes: List of bounding boxes (x1, y1, x2, y2)
    - scores: List of scores for each bounding box
    - threshold: IoU threshold for suppression

    Returns:
    - List of indices of the remaining boxes after suppression
    """
    if len(boxes) == 0:
        return []

    # Convert to numpy arrays for easier manipulation
    boxes = np.array(boxes)
    scores = np.array(scores)

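    # Compute the area of each box. The +1 treats coordinates as inclusive
    # integer pixel indices; the intersection below uses the same convention.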
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    area = (x2 - x1 + 1) * (y2 - y1 + 1)

    # Sort box indices by score, highest first
    indices = np.argsort(scores)[::-1]

    keep = []
    
    while len(indices) > 0:
        i = indices[0]
        keep.append(i)

        # Compute IoU with the remaining boxes
        xx1 = np.maximum(x1[i], x1[indices[1:]])
        yy1 = np.maximum(y1[i], y1[indices[1:]])
        xx2 = np.minimum(x2[i], x2[indices[1:]])
        yy2 = np.minimum(y2[i], y2[indices[1:]])

        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        inter_area = w * h

        iou = inter_area / (area[i] + area[indices[1:]] - inter_area)

        # Keep only boxes whose IoU with the selected box does not exceed the
        # threshold; the +1 offset maps positions in indices[1:] back to indices
        indices = indices[np.where(iou <= threshold)[0] + 1]

    return keep
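
The function above covers NMS only. The thresholding step could be sketched as follows. This is a minimal sketch under stated assumptions: the helper name `threshold_predictions` and the default `conf_threshold=0.25` are hypothetical, and the first four columns are assumed to already be in `(x1, y1, x2, y2)` format.

```python
import numpy as np


def threshold_predictions(preds, conf_threshold=0.25):
    """Keep predictions whose best class confidence reaches conf_threshold.

    Assumes preds has shape (N, 5 + C): 4 box values (taken here to be
    x1, y1, x2, y2), 1 objectness score, and C class scores.
    """
    preds = np.asarray(preds, dtype=float)
    # Combined confidence per class: objectness * class score.
    class_conf = preds[:, 4:5] * preds[:, 5:]
    class_ids = class_conf.argmax(axis=1)
    scores = class_conf.max(axis=1)
    mask = scores >= conf_threshold
    return preds[mask, :4], scores[mask], class_ids[mask]


# Toy input with C = 2 classes (made-up numbers).
preds = np.array([
    [10, 10, 50, 50, 0.9, 0.8, 0.1],      # combined confidence 0.72, kept
    [12, 12, 52, 52, 0.6, 0.7, 0.2],      # combined confidence 0.42, kept
    [100, 100, 150, 150, 0.2, 0.5, 0.4],  # combined confidence 0.10, dropped
])
boxes, scores, class_ids = threshold_predictions(preds, conf_threshold=0.25)
```

The returned `boxes` and `scores` can then be passed to `non_max_suppression(boxes, scores, threshold)`. NMS is often run separately per class, which this sketch does not do.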