### SAIC-Cambridge: Distributed AI Group
## ML Engineer - Coding Challenge
Thank you for taking the time to complete this programming assignment, and congratulations on reaching this stage of the interview process!


This assignment is based on `PyTorch` and consists of 5 tasks, all based on the same *object detection* model, provided below. 
The model is obtained from `torchvision`, and all the code provided is for reference only. You are free to change any part of the given code while implementing the requested features.

Keep in mind that:
- High-quality software architecture and coding practices are of high priority. Consider that the code would be used across projects in SAIC-C.
- Detection performance, end-to-end inference latency and efficiency are all of utmost importance across tasks.
- For each solution that you provide, document the assumptions and design choices that you made, and comment on its limitations.
- If you experiment with multiple solutions for a given task, feel free to discuss all the attempts and the underlying trade-offs in your report.

>The deliverable of this assignment is a .zip file that should include:
>- an interactive report based on a single Jupyter Notebook where all solutions/results will be clearly documented and can be reproduced; and
>- a directory with the source code that implements the assignment, following a structure of your choice.


#### Reference Model

In [None]:
import torch
import torchvision.transforms as T
from torchvision.models import detection
from torchvision.utils import draw_bounding_boxes
from torchvision.transforms.functional import to_pil_image
from torchvision.io.image import read_image

import cv2
import numpy as np
from PIL import Image 


In [None]:
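# Use the GPU when one is available, otherwise fall back to the CPU.
dev = "cuda" if torch.cuda.is_available() else "cpu"

# Note: torchvision >= 0.13 deprecates `pretrained=True` in favour of the `weights=` argument.
model = detection.fasterrcnn_resnet50_fpn(pretrained=True)
model = model.to(dev)
model.eval()


transforms = []
transforms.append(T.ToTensor())
# Normalisation is left disabled: torchvision detection models normalise their inputs internally.
#transforms.append(T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]))
transforms = T.Compose(transforms)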

In [None]:
import wget
wget.download("https://alk15.github.io/home/files/img1.jpg")

import matplotlib.pyplot as plt


x = Image.open('img1.jpg').convert("RGB")
x = transforms(x)
x = x.unsqueeze(0) 
x = x.to(dev)

# Run inference
with torch.no_grad():
    prediction = model(x)[0]

scores = prediction["scores"].cpu().numpy()
print('Scores:', scores)

img = read_image("img1.jpg")
box = draw_bounding_boxes(img, boxes=prediction["boxes"],
                          colors="red",
                          width=1)
im = to_pil_image(box.detach())
plt.imshow(im)


#### Task1:
For the first task, the provided object detection model should be applied *on all frames* of a video. The model’s predictions need to be *post-processed* to return the bounding box of the **“main” person** in each frame. 

The video file for this task can be downloaded from: 
>https://alk15.github.io/home/files/london-walk.mp4


For the target application of the whole assignment, the output resolution should be `(WxH)=427x240`.

Don’t forget that end-to-end latency plays an important role in this assignment.
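
As a starting point for the post-processing step, the sketch below picks one person box per frame and maps it to the target output resolution. It is only one possible reading of “main” person: it assumes the main person is the largest confident detection, that the inputs are plain Python lists (for example `prediction["boxes"].tolist()`), and that `min_score=0.5` is a reasonable threshold. The helper names `select_main_person` and `rescale_box` are hypothetical and not part of the provided code.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# "person" is index 1 in the COCO label map used by torchvision detection models.
PERSON_LABEL = 1


def select_main_person(
    boxes: Sequence[Box],
    labels: Sequence[int],
    scores: Sequence[float],
    min_score: float = 0.5,  # assumed threshold, tune it on the target video
) -> Optional[Box]:
    """Return the largest confident person box, or None if no person is found."""
    best_box, best_area = None, 0.0
    for box, label, score in zip(boxes, labels, scores):
        if label != PERSON_LABEL or score < min_score:
            continue
        x1, y1, x2, y2 = box
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        if area > best_area:
            best_box, best_area = tuple(box), area
    return best_box


def rescale_box(
    box: Box,
    src_wh: Tuple[int, int],
    dst_wh: Tuple[int, int] = (427, 240),  # output resolution required by the task
) -> Box:
    """Map a box from the inference resolution (W, H) to the output resolution."""
    sx = dst_wh[0] / src_wh[0]
    sy = dst_wh[1] / src_wh[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```

Other definitions of “main” (closest to the frame centre, most temporally stable, and so on) would only change the ranking criterion inside `select_main_person`.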

#### Task2:
For this task, you will need to quantise the above model to a precision that would maximise efficiency without significant accuracy degradation.

There is no expectation to perform any fine-tuning on the model. 
If you experiment with different quantisation schemes, feel free to report all of them.
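
To make the accuracy/efficiency trade-off concrete, the sketch below illustrates the arithmetic behind 8-bit affine quantisation on a plain list of floats. It is not PyTorch's quantisation API and not a prescribed solution; the function names and the signed `[-128, 127]` range are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def affine_qparams(
    x_min: float, x_max: float, qmin: int = -128, qmax: int = 127
) -> Tuple[float, int]:
    """Compute (scale, zero_point) mapping [x_min, x_max] onto [qmin, qmax]."""
    # Extend the range to include 0.0 so that zero is represented exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:  # all values are zero
        return 1.0, 0
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantise(
    values: Sequence[float], scale: float, zero_point: int,
    qmin: int = -128, qmax: int = 127,
) -> List[int]:
    """Round each value to the nearest integer level and clip to [qmin, qmax]."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantise(q_values: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """Map integer levels back to approximate floating-point values."""
    return [(q - zero_point) * scale for q in q_values]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = affine_qparams(min(weights), max(weights))
restored = dequantise(quantise(weights, scale, zero_point), scale, zero_point)
# The per-value reconstruction error is bounded by roughly one quantisation step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The reconstruction error grows with `scale`, which is why a few outlier weights or activations can noticeably degrade a quantised model's accuracy.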


#### Task3:
Here you should evaluate the impact of the quantisation scheme you applied in the previous task on every aspect of model deployment that it affects.
Your analysis can be reported in the presentation format of your choice.
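
Latency is one of the deployment aspects that can be compared before and after quantisation. The sketch below is a minimal timing harness, assuming the inference call is wrapped in a zero-argument callable; the name `benchmark` and the warm-up and iteration counts are hypothetical defaults.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 5, iters: int = 50) -> Dict[str, float]:
    """Call `fn` repeatedly and return latency statistics in milliseconds."""
    # Warm-up runs are excluded so one-off costs (caches, lazy init) do not skew results.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)

    samples.sort()
    p95_index = min(len(samples) - 1, int(round(0.95 * (len(samples) - 1))))
    return {
        "min_ms": samples[0],
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
    }
```

When timing a model on a GPU, the callable should end with `torch.cuda.synchronize()`, because CUDA kernels are launched asynchronously and the measured time would otherwise exclude part of the work. Model size on disk and agreement between the original and quantised detections are other aspects worth reporting alongside latency.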



#### Task4:
For this task, wrap the above model so it can read incoming frames from a *webcam stream* and broadcast the results using the *MQTT protocol*. The provided solution should be able to work out-of-the-box across different systems. Inference latency remains important.
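
MQTT payloads are opaque byte strings, so the detection result has to be serialised before it is published. The sketch below shows one possible compact JSON encoding; the message fields, the rounding precision, and the function names are assumptions, not a required schema.

```python
import json
import time
from typing import Optional, Sequence


def encode_detection(
    frame_id: int,
    box: Optional[Sequence[float]],
    score: Optional[float] = None,
    timestamp: Optional[float] = None,
) -> bytes:
    """Serialise one frame's result as compact UTF-8 JSON, suitable as an MQTT payload."""
    message = {
        "frame_id": frame_id,
        "timestamp": time.time() if timestamp is None else timestamp,
        # `box` is None when no person was detected in the frame.
        "box": None if box is None else [round(float(v), 1) for v in box],
        "score": None if score is None else round(float(score), 4),
    }
    return json.dumps(message, separators=(",", ":")).encode("utf-8")


def decode_detection(payload: bytes) -> dict:
    """Inverse of `encode_detection`, for use on the subscriber side."""
    return json.loads(payload.decode("utf-8"))
```

With a client library such as `paho-mqtt`, the resulting bytes could be passed to `client.publish(topic, payload)`, where the topic name (for example `detections/main_person`) is chosen by the implementer. Keeping the payload small helps keep the end-to-end latency low.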


#### Task5:
For the final task, develop a client-server distributed system, where the results of the detection model are transmitted to the server (residing on the same machine). On the server side, the object detection results should be visualised through a web interface of your choice. Fancy graphic design is not required. 
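
One way to prototype the client-server split is with the Python standard library only. The sketch below is a minimal stand-in under stated assumptions: the client sends each result as a JSON `POST`, the server keeps only the latest result in memory, and a `GET` returns it as JSON that a simple web page could poll. The names `DetectionHandler`, `start_server`, and `send_detection` are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

_latest = {"detection": None}  # most recent result received by the server
_lock = threading.Lock()


class DetectionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        """Store the JSON detection sent by the client."""
        length = int(self.headers.get("Content-Length", 0))
        try:
            detection = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        with _lock:
            _latest["detection"] = detection
        self.send_response(204)
        self.end_headers()

    def do_GET(self):
        """Return the latest detection as JSON (null if nothing was received yet)."""
        with _lock:
            body = json.dumps(_latest["detection"]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start the server in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), DetectionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_detection(url: str, detection: dict) -> int:
    """Client side: POST one detection and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(detection).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A web framework or a WebSocket connection would be a natural replacement once the page needs to draw boxes over live frames; this sketch only illustrates the data flow between the two sides.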
