$$
\newcommand{\mat}[1]{\boldsymbol {#1}}
\newcommand{\mattr}[1]{\boldsymbol {#1}^\top}
\newcommand{\matinv}[1]{\boldsymbol {#1}^{-1}}
\newcommand{\vec}[1]{\boldsymbol {#1}}
\newcommand{\vectr}[1]{\boldsymbol {#1}^\top}
\newcommand{\rvar}[1]{\mathrm {#1}}
\newcommand{\rvec}[1]{\boldsymbol{\mathrm{#1}}}
\newcommand{\diag}{\mathop{\mathrm {diag}}}
\newcommand{\set}[1]{\mathbb {#1}}
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\bb}[1]{\boldsymbol{#1}}
$$
# Part 6: YOLO - Object Detection
<a id=part6></a>

In this part we will use an object detection architecture called YOLO (You Only Look Once) to detect objects in images. We'll use pretrained model weights (v3), available here: https://github.com/ultralytics/yolov3

In [1]:
import torch
#import torchviz

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the YOLO model
model = torch.hub.load('ultralytics/yolov3', 'yolov3')
model.to(device)
# Images
img1 = 'imgs/DolphinsInTheSky.jpg'  
img2 = 'imgs/cat-shiba-inu-2.jpg' 

Using cache found in C:\Users\maymana/.cache\torch\hub\ultralytics_yolov3_master
YOLOv3  2022-12-15 torch 1.10.1 CPU

Fusing layers... 
Model Summary: 261 layers, 61922845 parameters, 0 gradients
Adding AutoShape... 


## Inference with YOLO
<a id=part6_1></a>

You are provided with 2 images (img1 and img2).
**TODO**:
1. Detect objects using the YOLOv3 model for these 2 images.
2. Print the inference output with bounding boxes.
3. Calculate the number of pixels within a bounding box and the number in the background.
4. Look at the inference results and answer the question below.


In [2]:
#Insert the inference code here.
import cv2

# Calculates the total number of pixels covered by the bounding boxes
def calculate_t_bb_n_pixels(xyxy):
    pixels = set()  # a set counts pixels shared by overlapping boxes only once
    for i in range(xyxy.shape[0]):
        # Round the box corners to integer pixel coordinates
        xmin = round(xyxy[i][0].item())
        ymin = round(xyxy[i][1].item())
        xmax = round(xyxy[i][2].item())
        ymax = round(xyxy[i][3].item())
        # Add every pixel in the box (both ends inclusive)
        pixels.update((x, y) for y in range(ymin, ymax + 1) for x in range(xmin, xmax + 1))
    return len(pixels)

def inference(img_path):
    result = model(img_path)
    result.print()
    result.show()
    img = cv2.imread(img_path)
    height, width, _ = img.shape
    p_all_img = height*width
    p_within_bb = calculate_t_bb_n_pixels(result.xyxy[0])
    p_back = p_all_img-p_within_bb
    print(f"The number of pixels within a bounding box is {p_within_bb}")
    print(f"The number of pixels in the background is {p_back}")
    print(f"==>{(100*p_within_bb/p_all_img):.2f}% of the pixels are covered in the bounding boxes")

# img1
print("*****Image 1 Inference*****")
inference(img1)

# img2
print("*****Image 2 Inference*****")
inference(img2)



*****Image 1 Inference*****


image 1/1: 183x275 1 person, 2 birds
Speed: 15.6ms pre-process, 1053.1ms inference, 0.0ms NMS per image at shape (1, 3, 448, 640)


The number of pixels within a bounding box is 12255
The number of pixels in the background is 38070
==>24.35% of the pixels are covered in the bounding boxes
*****Image 2 Inference*****


image 1/1: 750x750 1 cat, 2 dogs
Speed: 15.6ms pre-process, 1520.0ms inference, 0.0ms NMS per image at shape (1, 3, 640, 640)


The number of pixels within a bounding box is 399019
The number of pixels in the background is 163481
==>70.94% of the pixels are covered in the bounding boxes
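
As a cross-check on the pixel counts above, here is a minimal sketch of the same idea that also clamps each box to the image bounds. Because `calculate_t_bb_n_pixels` rounds the corners and treats both ends as inclusive, a box that touches the image border may contribute a few pixels that lie outside the image. The helper name `count_covered_pixels` and the sample boxes are hypothetical and are not taken from the model's output.

```python
def count_covered_pixels(boxes, width, height):
    """Count distinct pixels covered by at least one box, clamped to the image.

    boxes: iterable of (xmin, ymin, xmax, ymax) in pixel coordinates (floats allowed).
    """
    covered = [[False] * width for _ in range(height)]
    for xmin, ymin, xmax, ymax in boxes:
        # Round to integer pixel indices and clamp to the valid image range.
        x0 = max(0, round(xmin))
        y0 = max(0, round(ymin))
        x1 = min(width - 1, round(xmax))
        y1 = min(height - 1, round(ymax))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                covered[y][x] = True  # overlapping boxes mark a pixel only once
    return sum(row.count(True) for row in covered)


# Hypothetical boxes on a 10x8 image; the second box extends past the right edge.
boxes = [(1.2, 1.0, 4.0, 3.0), (3.0, 2.0, 12.0, 5.0)]
inside = count_covered_pixels(boxes, width=10, height=8)
background = 10 * 8 - inside
print(f"{inside} pixels inside boxes, {background} in the background")
```

With the notebook's data, `boxes` could be built from the rows of `result.xyxy[0]` and `width`, `height` from the image shape. That wiring is left out here to keep the sketch dependency-free.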


### Question 1

Analyze the inference results of the 2 images.
1. How well did the model detect the objects in the pictures?
2. What could be the reasons for the model's failures? Suggest methods to resolve these issues.

In [3]:
from cs236781.answers import display_answer
import hw2.answers

display_answer(hw2.answers.part6_q1)


**Your answer:**

1. The model performed very poorly on the dolphin image: it did not detect any dolphin and instead detected two birds and one person.
The detection on the second image is also not good enough. The image contains three dogs and one cat, but the model detected only two of the dogs correctly, labeled the third dog as a cat, and did not detect the actual cat.

2. For the dolphin image, the failure stems mainly from the fact that the model was trained on the COCO dataset, which does not include a dolphin class. Because the model is supervised and has no such class to predict, it cannot classify the dolphins correctly, even if the image were clearer. Moreover, even a model trained on a dataset that includes this class might still fail on this image because of the image conditions. First, the two dolphins on the right overlap, which makes them harder to detect. Second, because of the illumination, the dolphins have lost their colors in the image, so detection can rely only on their shapes. Third, the picture was taken while the dolphins were jumping out of the sea, so they appear entirely against the sky; this may have caused the model to classify two of them as birds. The third may have been classified as a person because the overlap makes the dolphins' lower parts look like a person's legs.
These issues might be avoided mainly by training the model on a larger and more general dataset, and by adding images like this one to the training set. Applying some preprocessing to the image, to address the illumination problem and restore some of the dolphins' colors, might also help.

For the cat and dogs image, the left dog and the cat were classified as one object. This may be because they are very close to each other, so the bounding boxes predicted by their grid cells may overlap. This issue might be mitigated by using smaller bounding boxes.
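
The overlap argument can be made more concrete with intersection over union (IoU). Detectors such as YOLO typically apply non-maximum suppression (NMS), which discards a box when its IoU with a higher-scoring box exceeds a threshold, so two nearby objects with heavily overlapping boxes can collapse into a single detection. The sketch below is a simplified, class-agnostic greedy NMS, which may differ from what the model actually does. The boxes, scores, and thresholds are hypothetical and are not taken from the inference output.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # width of the intersection
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # height of the intersection
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold):
    """Greedy NMS over (box, score) pairs, visiting the highest scores first."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep the box only if it does not overlap too much with any kept box.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections for two animals standing close together.
dog = ((100.0, 100.0, 300.0, 400.0), 0.90)
cat = ((150.0, 100.0, 350.0, 400.0), 0.60)
print(f"IoU = {iou(dog[0], cat[0]):.2f}")
print(len(nms([dog, cat], iou_threshold=0.45)), "box kept with a strict threshold")
print(len(nms([dog, cat], iou_threshold=0.70)), "boxes kept with a looser threshold")
```

In this toy setup, raising the IoU threshold lets both boxes survive, which suggests one more knob to experiment with for crowded scenes, at the cost of possibly keeping duplicate boxes for a single object.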



### Question 2

**TODO**: Print the computational graph of the model using torchviz.


In [4]:
#Insert code here.



Look at the computational graph and describe the model.

In [5]:
display_answer(hw2.answers.part6_q2)



**Your answer:**


Write your answer using **markdown** and $\LaTeX$:
```python
# A code block
a = 2
```
An equation: $e^{i\pi} + 1 = 0$



## Creative Detection Failures

<a id=part6_2></a>

Object detection pitfalls could be, for example: **occlusion** - when the objects are partially occluded and are thus missing important features; **model bias** - when a model learns some bias about an object, it could recognize it as something else in a different setup; and many others, like **deformation**, **illumination conditions**, **cluttered** or **textured backgrounds**, and **blurring** due to moving objects.

**TODO**: Take pictures that demonstrate 3 of the above object detection pitfalls, run inference, and analyze the results.

In [6]:
#Insert the inference code here.

print("*****My Image 1 Inference*****")
my_img1 = 'imgs/my_img1.jpg' 
inference(my_img1)

print("*****My Image 2 Inference*****")
my_img2 = 'imgs/my_img2.jpg' 
inference(my_img2)

print("*****My Image 3 Inference*****")
my_img3 = 'imgs/my_img3.jpg' 
inference(my_img3)

*****My Image 1 Inference*****


image 1/1: 1024x768 1 cat, 1 dog
Speed: 15.6ms pre-process, 1135.2ms inference, 0.0ms NMS per image at shape (1, 3, 640, 480)


The number of pixels within a bounding box is 134244
The number of pixels in the background is 652188
==>17.07% of the pixels are covered in the bounding boxes
*****My Image 2 Inference*****


image 1/1: 1024x576 1 chair
Speed: 15.6ms pre-process, 934.8ms inference, 0.0ms NMS per image at shape (1, 3, 640, 384)


The number of pixels within a bounding box is 388280
The number of pixels in the background is 201544
==>65.83% of the pixels are covered in the bounding boxes
*****My Image 3 Inference*****


image 1/1: 1024x576 1 spoon, 2 bowls, 1 dining table
Speed: 15.6ms pre-process, 1067.7ms inference, 0.0ms NMS per image at shape (1, 3, 640, 384)


The number of pixels within a bounding box is 551612
The number of pixels in the background is 38212
==>93.52% of the pixels are covered in the bounding boxes


### Question 3

Analyze the results of the inference.
1. How well did the model detect the objects in the pictures? Explain.


In [7]:
display_answer(hw2.answers.part6_q3)



**Your answer:**

The model's detections are not good enough; it ran into several pitfalls when detecting the objects in the images:
1. In the first image (the 5 newborn cats), the model detected only one cat and a dog. Looking at the picture, the cats are partially occluded, which may be the main reason for this failure.
2. The second image (the two cats sitting on their chair) has illumination issues, which led the model to detect only the chair, without detecting the two cats sitting on it.
3. The third image (the two ice-cream bowls on a patterned table) has a textured background. Looking carefully at the model's result for this image, it did not detect the top spoon, which might be a result of the textured background.


## Bonus 
<a id=part6_3></a>

Try improving the model performance over poorly recognized images by changing them. 
Describe the manipulations you did to the pictures.
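
One manipulation that might be worth trying on poorly lit images is brightening them with gamma correction or stretching their contrast before running inference. The sketch below shows both operations on a tiny 8-bit grayscale image stored as nested lists, purely to stay dependency-free. The function names `adjust_gamma` and `stretch_contrast` and the sample pixel values are hypothetical, and whether such changes actually improve the detections would have to be checked by re-running inference on the modified images.

```python
def adjust_gamma(pixels, gamma):
    """Apply gamma correction to rows of 8-bit intensities; gamma < 1 brightens."""
    # Precompute the mapping for all 256 possible intensities.
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[v] for v in row] for row in pixels]


def stretch_contrast(pixels):
    """Linearly rescale so the darkest pixel maps to 0 and the brightest to 255."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in pixels]
    return [[round(255 * (v - lo) / (hi - lo)) for v in row] for row in pixels]


# Hypothetical dark image: all intensities sit in the lowest quarter of the range.
dark = [[10, 20, 30], [40, 50, 60]]
brighter = adjust_gamma(dark, gamma=0.5)
stretched = stretch_contrast(dark)
print("gamma 0.5:", brighter)
print("stretched:", stretched)
```

For a color image, the same mapping could be applied to each channel, or only to a luminance channel so that the hues are preserved.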

In [8]:
#insert bonus code here

In [9]:
display_answer(hw2.answers.part6_bonus)



**Your answer:**


Write your answer using **markdown** and $\LaTeX$:
```python
# A code block
a = 2
```
An equation: $e^{i\pi} + 1 = 0$

