# Object Detection for ML MOOC Challenge
## Object Detection using YOLOv8, Roboflow and Google Colab
#### Author: Julia Bunescu
---

**Table of contents**<a id='toc0_'></a>     
- [The Challenge](#toc1_2_)    
- [Model Selection](#toc1_3_)    
- [Testing the Pretrained Model](#toc1_4_)    
- [Custom Training dataset](#toc1_5_)    
  - [Preparing the dataset](#toc1_5_1_)    
  - [Importing the Dataset](#toc1_5_2_)    
  - [Training the model](#toc1_5_3_)    
  - [Validating the model](#toc1_5_4_)    
  - [Testing the model](#toc1_5_5_)    


## <a id='toc1_2_'></a>[The Challenge](#toc0_)

The task is to identify objects in a series of images. The target objects are:

- Phone
- Laptop
- USB stick
- Keyboard
- Router
- Keys
- Server rack
- Mouse

The goals are:
- [x] Primary objective: identify these items explicitly in any image.
- [x] Bonus objective: determine the number of each of these items in a picture.
- [ ] Super-bonus objective: identify other items outside of the list and report them. (*future work*)

## <a id='toc1_3_'></a>[Model Selection](#toc0_)

The model chosen for the implementation was YOLO. The latest version at the time of writing, [YOLOv8](https://github.com/ultralytics/ultralytics), was selected because it is fast, accurate, and easy to use from Python.

In [3]:
# checking and installing missing packages
%pip install ultralytics
%pip install IPython

from IPython import display
display.clear_output()

import ultralytics
ultralytics.checks()

Ultralytics YOLOv8.0.105  Python-3.10.4 torch-2.0.1+cpu CPU
Setup complete  (16 CPUs, 15.4 GB RAM, 452.9/934.7 GB disk)


## <a id='toc1_4_'></a>[Testing the Pretrained Model](#toc0_)

A pretrained YOLOv8 model was first used to check the performance on the test image. The model was trained on the [COCO](https://cocodataset.org/#home) dataset and can distinguish between 80 object categories.

In [4]:
from ultralytics import YOLO

# Load a model
model = YOLO("../models/yolov8n.pt")  # load a pretrained model (recommended for training)

# Use the model
results = model.predict(source="../test_images/Test Example.png", conf=0.6, save=True, retina_masks=True) 



image 1/1 C:\Users\iulia\Documents\Projects\machine-learning-small-projects\test_images\Test Example.png: 416x640 1 person, 1 apple, 3 mouses, 182.1ms
Speed: 7.4ms preprocess, 182.1ms inference, 12.1ms postprocess per image at shape (1, 3, 640, 640)
Results saved to runs\detect\predict3


**Results**

Using the pretrained model with a minimum confidence level of 60%, we are able to detect some of the target objects, but not all of them. We can see the results of the detection here:
<center><img src="../results_images/challenge_test_default_model.png" width="800" /></center>
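One plausible reason for the partial coverage is that only some of the target classes have a counterpart among the 80 COCO categories. The sketch below records that correspondence by hand. The dictionary and the helper `split_by_coverage` are illustrative assumptions written for this notebook, not something exported from the model.

```python
# Hand-written mapping from the challenge classes to the closest COCO label.
# None marks a class with no COCO counterpart (mapping assumed for illustration).
TARGET_TO_COCO = {
    "phone": "cell phone",
    "laptop": "laptop",
    "usb stick": None,
    "keyboard": "keyboard",
    "router": None,
    "keys": None,
    "server rack": None,
    "mouse": "mouse",
}


def split_by_coverage(mapping):
    """Return (covered, missing) target classes, each sorted alphabetically."""
    covered = sorted(k for k, v in mapping.items() if v is not None)
    missing = sorted(k for k, v in mapping.items() if v is None)
    return covered, missing


covered, missing = split_by_coverage(TARGET_TO_COCO)
print("Covered by COCO:", covered)
print("Not covered by COCO:", missing)
```

Classes in the second list cannot be predicted by the pretrained weights at all, which motivates the custom dataset in the next section.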

## <a id='toc1_5_'></a>[Custom Training dataset](#toc0_)

### <a id='toc1_5_1_'></a>[Preparing the dataset](#toc0_)

A custom dataset was created using [Roboflow](https://universe.roboflow.com/iulia-bunescu-vldcs/specific-electronics-challenge-v2).
The images for this dataset were gathered using the [Fatkun Batch Download Image](https://microsoftedge.microsoft.com/addons/detail/fatkun-batch-download-ima/dammmokdamnimedflemdaoamhldmldff?hl=en-US) Edge extension, with the .png and .jpg formats selected. The results were also sorted by size in descending order.

After several iterations, the final version of the dataset was preprocessed and augmented as follows: images were resized to fit within $1240$ x $720$ with white edges, a rotation between $-15^\circ$ and $15^\circ$ was applied, and $25\%$ of the training data was converted to grayscale.
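The "fit within" resize keeps the aspect ratio and fills the leftover area with padding. Below is a minimal sketch of that arithmetic, assuming the padding is split evenly between opposite edges. The helper `fit_within` is hypothetical, and Roboflow's actual implementation may differ in rounding details.

```python
def fit_within(width, height, target_w=1240, target_h=720):
    """Scale (width, height) to fit inside the target box, keeping aspect ratio.

    Returns the resized size and the padding (left, top, right, bottom)
    needed to fill the target canvas.
    """
    scale = min(target_w / width, target_h / height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    pad_x = target_w - new_w
    pad_y = target_h - new_h
    left, top = pad_x // 2, pad_y // 2
    return (new_w, new_h), (left, top, pad_x - left, pad_y - top)


# A square image is limited by the height and padded on the left and right.
print(fit_within(1000, 1000))
# An image with the target aspect ratio needs no padding.
print(fit_within(2480, 1440))
```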

The distribution of the labels on the final dataset can be seen below.

<center><img src="../results_images/label_distribution.jpg" width="400" /></center>
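A distribution like this can also be recomputed from the exported annotations. In the YOLO text format each label line is `class_id x_center y_center width height`, with coordinates normalized to the image size. The sketch below counts labels per class. The class ordering and the sample lines are made up for illustration and do not come from the actual dataset.

```python
from collections import Counter


def count_labels(label_lines, class_names):
    """Count annotations per class from YOLO-format label lines."""
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        counts[class_names[int(parts[0])]] += 1
    return counts


class_names = ["keyboard", "mouse", "phone"]  # hypothetical class ordering
sample_lines = [
    "0 0.50 0.50 0.20 0.10",
    "1 0.30 0.40 0.05 0.05",
    "1 0.70 0.40 0.05 0.05",
    "",
]
label_counts = count_labels(sample_lines, class_names)
print(label_counts)
```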

### <a id='toc1_5_2_'></a>[Importing the Dataset](#toc0_)

In [1]:
%pip install roboflow --quiet

Note: you may need to restart the kernel to use updated packages.


The API key below was removed for privacy reasons. To download the public dataset, generate your own key [here](https://universe.roboflow.com/iulia-bunescu-vldcs/specific-electronics-challenge-v2/dataset/7).

In [47]:
from roboflow import Roboflow

# download the custom dataset from Roboflow
rf = Roboflow(api_key="$$$$$$$$$$$$$$")
project = rf.workspace("iulia-bunescu-vldcs").project("specific-electronics-challenge-v2")
dataset = project.version(7).download("yolov8")

loading Roboflow workspace...
loading Roboflow project...
Dependency ultralytics<=8.0.20 is required but found version=8.0.105, to fix: `pip install ultralytics<=8.0.20`
Downloading Dataset Version Zip in Specific-Electronics-Challenge-v2-7 to yolov8: 100% [148027312 / 148027312] bytes


Extracting Dataset Version Zip to Specific-Electronics-Challenge-v2-7 in yolov8:: 100%|██████████| 4398/4398 [00:03<00:00, 1119.38it/s]
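To avoid pasting a key into the notebook at all, it can be read from an environment variable. This is a minimal sketch, not part of the original workflow. The variable name `ROBOFLOW_API_KEY` and the helper `get_roboflow_api_key` are assumptions chosen for this example.

```python
import os


def get_roboflow_api_key(env=None, var_name="ROBOFLOW_API_KEY"):
    """Read the API key from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(get_roboflow_api_key({"ROBOFLOW_API_KEY": "example-key"}))
```

In the download cell, the key would then be passed as `Roboflow(api_key=get_roboflow_api_key())`.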


### <a id='toc1_5_3_'></a>[Training the model](#toc0_)

Training was done using **Google Colab**, which was chosen because it provides GPU access.

The commands used for training are shown below. The model was trained for just *30 epochs*; training for longer may improve the results, but the resulting model already satisfies the requirements of this application, so training was stopped there.

In [None]:
from ultralytics import YOLO

# Load a model
model = YOLO("../models/yolov8n.pt")  # load a pretrained model (recommended for training)

# Train the model on the custom dataset
model.train(data='/content/electronics-part-2-7/data.yaml', epochs=30, save_period=1, imgsz=[720, 1240])
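The prediction logs elsewhere in this notebook report image shapes such as 416x640, 800x1248 and (1, 3, 1248, 1248) rather than the requested 720 or 1240. One likely explanation, assuming the usual maximum stride of 32 for YOLOv8 detection models, is that image sizes are rounded up to a multiple of the stride. A minimal sketch of that arithmetic follows. The helper `round_up_to_stride` is hypothetical and not part of Ultralytics.

```python
import math


def round_up_to_stride(size, stride=32):
    """Round an image dimension up to the nearest multiple of the stride."""
    return math.ceil(size / stride) * stride


print(round_up_to_stride(1240))  # 1248
print(round_up_to_stride(720))   # 736

# Every dimension reported in the logs is a multiple of 32.
logged_sizes = (416, 640, 704, 768, 800, 1248)
print(all(size % 32 == 0 for size in logged_sizes))  # True
```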

### <a id='toc1_5_4_'></a>[Validating the model](#toc0_)

Validation was done after training. Below you can find some of the predictions made in the process:

<center><img src="../results_images/val_batch0_pred.jpg" width="50%" /> <img src="../results_images/val_batch1_pred.jpg" width="50%" /></center>

The validation metrics are shown below. While the model performs well for most of the object classes, *phones, servers, and keys* have room for improvement.

<center><img src="../results_images/confusion_matrix_normalized.png" width="800"/></center>
<center><img src="../results_images/PR_curve.png" width="800"/></center>
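For reference, each point on a precision-recall curve is derived from counts of true positives, false positives and false negatives at a given confidence threshold. The sketch below shows the two formulas. The counts are made up for illustration and are not taken from the validation run.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for one class at one confidence threshold.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={p:.2f}, recall={r:.2f}")
```

A class with many missed objects (high `fn`) shows low recall, which is one way the weaker classes above would appear in the curves.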

### <a id='toc1_5_5_'></a>[Testing the model](#toc0_)

The model was tested on three images: first the test image from the challenge, then a custom image containing objects from classes that do not appear in the first, and lastly a more organic environment containing a laptop setup.

In [58]:
# Load the custom-trained model
model = YOLO("../models/dataset2_7.pt") 

In [61]:
# Challenge image
results = model.predict(source="../test_images/Test Example.png", conf=0.5, save=True, retina_masks=True) 


image 1/1 C:\Users\iulia\Documents\Projects\machine-learning-small-projects\test_images\Test Example.png: 800x1248 3 mouses, 4 servers, 1 usb stick, 295.1ms
Speed: 11.0ms preprocess, 295.1ms inference, 2.0ms postprocess per image at shape (1, 3, 1248, 1248)
Results saved to runs\detect\predict34


<center><img src="../results_images/challenge_test_trained_model.png" width="800" /></center>

In [66]:
# Custom made image
results = model.predict(source="../test_images/test_2.png", conf=0.5, save=True, retina_masks=True) 


image 1/1 C:\Users\iulia\Documents\Projects\machine-learning-small-projects\test_images\test_2.png: 704x1248 1 keyboard, 1 keys, 1 phone, 1 router, 263.2ms
Speed: 10.0ms preprocess, 263.2ms inference, 1.0ms postprocess per image at shape (1, 3, 1248, 1248)
Results saved to runs\detect\predict34


<center><img src="../results_images/test_2.png" width="800" /></center>

In [68]:
# Organic scenery image
results = model.predict(source="../test_images/Laptop-Desk-Setup.jpg", conf=0.5, save=True, retina_masks=True) 


image 1/1 C:\Users\iulia\Documents\Projects\machine-learning-small-projects\test_images\Laptop-Desk-Setup.jpg: 768x1248 1 keyboard, 1 laptop, 1 mouse, 285.6ms
Speed: 12.0ms preprocess, 285.6ms inference, 1.0ms postprocess per image at shape (1, 3, 1248, 1248)
Results saved to runs\detect\predict34


<center><img src="../results_images/Laptop-Desk-Setup.jpg" width="800" /></center>
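The bonus goal of counting each item can be addressed by tallying detections per class instead of reading the counts from the log line. This is a minimal sketch using plain Python lists so that it runs without a model. The helper `count_by_class`, the id-to-name map, and the sample values are hypothetical.

```python
from collections import Counter


def count_by_class(class_ids, confidences, names, min_conf=0.5):
    """Count detections per class name, keeping those at or above min_conf."""
    counts = Counter()
    for cls_id, conf in zip(class_ids, confidences):
        if conf >= min_conf:
            counts[names[int(cls_id)]] += 1
    return dict(counts)


# With Ultralytics results, the inputs would typically come from:
#   boxes = results[0].boxes
#   class_ids, confidences = boxes.cls.tolist(), boxes.conf.tolist()
#   names = results[0].names
names = {0: "keyboard", 1: "laptop", 2: "mouse"}  # hypothetical id -> name map
sample_counts = count_by_class([2.0, 2.0, 1.0, 0.0], [0.91, 0.42, 0.88, 0.77], names)
print(sample_counts)
```

The second mouse detection is dropped because its confidence of 0.42 falls below the 0.5 threshold, mirroring the `conf=0.5` argument used in the prediction calls above.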