# CSF AI Development Intern S2024

## 1. Processing Video Pipeline
Assume you have a working ML model that can process individual images and identify carrots. How would you adapt that model so that you could feed it live video inside a grocery store and have it create a record of any carrots it sees?

### Solution
In short, you need to feed frames from the live video to the trained image classification model at regular intervals during the broadcast. Each extracted frame is a static image, so the trained model can classify it as it arrives, which gives you classification of the live feed.

The interval at which frames are extracted and fed to the model determines how closely the output tracks the live feed. The smaller the interval (frames extracted very often, e.g. every <1s), the closer to real time the model identifies subjects (carrots). A smaller interval also means more frames to process, so the model's inference time per frame limits how small the interval can be.
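The interval-based sampling described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `sample_frames`, `fps`, and `interval_s` are hypothetical names, and the feed is stood in for by a plain iterable of dummy frames rather than a real camera.

```python
from typing import Any, Iterable, Iterator, Tuple


def sample_frames(
    frames: Iterable[Any], fps: float, interval_s: float
) -> Iterator[Tuple[int, float, Any]]:
    """Yield (frame_number, timestamp_s, frame) roughly every `interval_s` seconds."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    # Number of source frames between two sampled frames (at least 1).
    step = max(1, round(fps * interval_s))
    for frame_number, frame in enumerate(frames):
        if frame_number % step == 0:
            # Timestamp is derived from the frame index and the nominal frame rate.
            yield frame_number, frame_number / fps, frame


# Assumption: a 30 fps feed, represented here by ten seconds of dummy "frames".
dummy_feed = (f"frame-{i}" for i in range(300))
sampled = list(sample_frames(dummy_feed, fps=30.0, interval_s=0.5))
```

With these assumed values, every 15th frame is kept. In a real pipeline each sampled frame would be preprocessed and passed to the model instead of being collected in a list.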

Concretely, video capture can be performed using the OpenCV library (`cv2` in Python). Frames must be preprocessed in the same way the training set was preprocessed for the classification model (e.g. resizing, normalization, mean subtraction, conversion to tensor format). Each detection record should include metadata such as a timestamp, bounding box coordinates, confidence score, detection ID, and frame number. The records can be stored in JSON format.
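One possible shape for such a detection record, serialized as JSON, is sketched below. The class name `DetectionRecord`, the field names, and the sample values are assumptions made for illustration; the actual schema would depend on what the model outputs.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class DetectionRecord:
    detection_id: int
    frame_number: int
    timestamp_s: float  # seconds since the start of the feed
    label: str
    confidence: float
    bbox_xyxy: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def to_json_line(record: DetectionRecord) -> str:
    """Serialize one record as a single JSON line, e.g. for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical detection; the values are made up for the example.
record = DetectionRecord(
    detection_id=1,
    frame_number=15,
    timestamp_s=0.5,
    label="carrot",
    confidence=0.93,
    bbox_xyxy=(120, 80, 200, 260),
)
line = to_json_line(record)
```

Writing one JSON object per line keeps the log easy to append to during a live feed and easy to parse afterwards.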

If objects need to be 'remembered' across the live feed (i.e. we need to keep track of the number of distinct carrots), duplicate detections must be handled. Because we run detection on static images sampled at short intervals (say <1s), the same object can be detected in several images within a few seconds. The model then reports multiple carrots even though it is the same carrot appearing in multiple frames taken around the same point in time. This case is very likely, since consecutive frames will often contain the same carrot(s), so you need to maintain a cache of recent detections with their bounding boxes (positions) and timestamps. Each new detection is compared against the cache for spatial and temporal proximity; if both fall within user-determined bounds, it is treated as the already-detected object rather than a new one.
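A minimal sketch of such a cache is shown below. It assumes intersection over union (IoU) as the measure of spatial proximity and a simple maximum age as the measure of temporal proximity; both choices, the thresholds, and the names `DetectionCache` and `match_or_add` are illustrative assumptions rather than the only way to do this.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class CachedDetection:
    object_id: int
    bbox: Box
    last_seen_s: float


class DetectionCache:
    def __init__(self, iou_threshold: float = 0.5, max_age_s: float = 2.0) -> None:
        self.iou_threshold = iou_threshold  # assumed spatial bound
        self.max_age_s = max_age_s          # assumed temporal bound
        self._entries: List[CachedDetection] = []
        self._next_id = 1

    def match_or_add(self, bbox: Box, timestamp_s: float) -> Tuple[int, bool]:
        """Return (object_id, is_new) for a detection seen at `timestamp_s`."""
        # Temporal check: forget entries not seen within the time window.
        self._entries = [
            e for e in self._entries if timestamp_s - e.last_seen_s <= self.max_age_s
        ]
        # Spatial check: find the cached box that overlaps the new one the most.
        best = max(self._entries, key=lambda e: iou(e.bbox, bbox), default=None)
        if best is not None and iou(best.bbox, bbox) >= self.iou_threshold:
            # Same object: refresh its position and timestamp.
            best.bbox, best.last_seen_s = bbox, timestamp_s
            return best.object_id, False
        entry = CachedDetection(self._next_id, bbox, timestamp_s)
        self._next_id += 1
        self._entries.append(entry)
        return entry.object_id, True


cache = DetectionCache()
first = cache.match_or_add((100, 100, 200, 200), timestamp_s=0.0)
second = cache.match_or_add((105, 102, 205, 202), timestamp_s=0.5)   # slightly shifted box
third = cache.match_or_add((400, 400, 500, 500), timestamp_s=0.5)    # elsewhere in the frame
fourth = cache.match_or_add((100, 100, 200, 200), timestamp_s=10.0)  # long after the first
```

Here the second detection is matched to the first carrot, while the third (different location) and the fourth (outside the time window) are counted as new objects. The sketch assumes objects move little between sampled frames; a moving camera or fast-moving objects would likely need a proper tracking approach.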

## Demo
Write a toy implementation of whatever machine learning concept you would like in order to demonstrate your skills. This doesn't need to be in the notebook if you want to use something other than Python.

The problems we work on are wholly related to classification, so your toy implementation should show knowledge of the fundamentals of classification problems.

### Solution
I've actually already implemented an image classification model from scratch (i.e. self-implemented, without using ML libraries such as TensorFlow, PyTorch, or JAX) in this repository: https://github.com/M-Hardy/MLFS

The repo includes linear and logistic regression model implementations and an MLP classifier trained on the MNIST dataset that achieves >90% accuracy. It also features modules for data handling, model testing, model visualization, and IO operations.