<h1>YOLO V8 Model for Object Detection</h1>

- Official GitHub repository of Ultralytics: https://github.com/ultralytics/ultralytics?tab=readme-ov-file

<h2>What is YOLO?</h2>

- YOLO stands for <u><b>You Only Look Once</b></u>.
- It is a popular set of object detection models used for real-time object detection and classification in computer vision.
- It was originally developed by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.
- The model family belongs to one-stage object detection models that process an entire image in a single forward pass of a convolutional neural network (CNN).

![image.png](attachment:76818aff-306e-431a-b888-bd542d955db7.png)

<h2>Key Feature of YOLO</h2>

- The key feature of YOLO is its <b><u>single-stage detection approach</u></b>, which is designed to detect objects in real time and with high accuracy.
- Two-stage detection models, such as R-CNN, instead first propose regions of interest and then classify these regions.
- <b><u>YOLO processes the entire image in a single pass, making it faster and more efficient.</u></b>
- YOLOv8 uses <u><b>ANCHOR-FREE DETECTION</b></u>, in which the model directly predicts the center of an object instead of the offset from a known anchor box.
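
The difference between anchor-based and anchor-free prediction can be sketched in a few lines. This is a simplified illustration, not the Ultralytics implementation: the function names are hypothetical, the anchor-based formula follows the YOLOv2/v3-style parameterization, and the anchor-free version assumes the model outputs four distances (left, top, right, bottom) from a grid-cell center, measured in grid units.

```python
import math


def decode_anchor_based(cell_xy, anchor_wh, pred, stride):
    """Anchor-based decoding (YOLOv2/v3 style): the raw outputs are offsets
    relative to a grid cell and a predefined anchor box."""
    cx, cy = cell_xy
    aw, ah = anchor_wh
    tx, ty, tw, th = pred
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = (cx + sigmoid(tx)) * stride   # box center x in pixels
    by = (cy + sigmoid(ty)) * stride   # box center y in pixels
    bw = aw * math.exp(tw)             # width is a rescaled anchor width
    bh = ah * math.exp(th)             # height is a rescaled anchor height
    return (bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2)


def decode_anchor_free(cell_xy, distances, stride):
    """Anchor-free decoding (assumed form): the outputs are distances from the
    cell center to the four box edges, so no anchor box is needed."""
    cx, cy = cell_xy
    left, top, right, bottom = distances
    px, py = (cx + 0.5) * stride, (cy + 0.5) * stride  # cell center in pixels
    return (px - left * stride, py - top * stride,
            px + right * stride, py + bottom * stride)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
print(decode_anchor_based((10, 5), (32, 64), (0.0, 0.0, 0.0, 0.0), 8))
print(decode_anchor_free((10, 5), (1.0, 2.0, 3.0, 4.0), 8))
```

In the anchor-free case the box size comes entirely from the predicted distances, so no anchor dimensions have to be chosen for the dataset beforehand.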

<h2>YOLOV8 ARCHITECTURE AND DESIGN</h2>

```
NOTE: THERE IS NO OFFICIAL PAPER FOR YOLOV5 OR YOLOV8 TO DATE (6/05/2024).
```

The YOLO architecture primarily has 3 components:
- Head
- Neck
- Backbone

<h3>Backbone Network</h3>

- YOLOv8 employs a feature-rich backbone network as its foundation.
- The network serves to <b><u>extract hierarchical features from the input image</u></b>, providing a comprehensive representation of the visual information. 
- YOLOv8 utilizes a CSPDarknet-style backbone (commonly described as CSPDarknet53), a modified version of the Darknet architecture.
- This modification incorporates Cross Stage Partial networks, enhancing the learning capacity and efficiency.

<h3>Neck Architecture</h3>

- The architecture includes a neck structure, which is responsible for <b><u>feature fusion</u></b>.
- This is crucial for combining multi-scale information and improving the model’s ability to detect objects of varying sizes. 
- YOLOv8 uses a <b><u>PANet (Path Aggregation Network)</u></b>-style neck, a feature pyramid design that facilitates information flow across different scales.
- <u><b>PANet enhances the model’s ability to handle objects with diverse scales in a more effective manner.</b></u>

<h3>YOLO Head</h3>

- YOLOv8 retains the characteristic feature of the YOLO series – the YOLO head.
- This component generates predictions based on the features extracted by the backbone network and the neck architecture. 
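- In YOLOv8 the head is anchor-free and decoupled: separate branches predict the bounding box coordinates and the class probabilities for each grid location, and the separate objectness score used by earlier YOLO versions is dropped.
- Because it does not rely on predefined anchor boxes, the model can predict objects of different shapes and sizes without anchor dimensions tuned to the dataset.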
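
As rough arithmetic on how many candidate predictions the head produces, the sketch below assumes the three detection scales commonly reported for YOLOv8 (strides 8, 16 and 32) and one prediction per grid location. The function name is hypothetical and the numbers are an illustration, not output from the library.

```python
def prediction_grid(imgsz=640, strides=(8, 16, 32)):
    """Return the grid size at each assumed stride and the total number of
    grid locations for a square input of side `imgsz`."""
    grids = [(imgsz // s, imgsz // s) for s in strides]
    total = sum(h * w for h, w in grids)
    return grids, total


grids, total = prediction_grid()
print(grids)  # [(80, 80), (40, 40), (20, 20)]
print(total)  # 8400
```

Non-maximum suppression then reduces these candidates to the final detections, capped by `max_det`.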

<h2>YOLO Architecture</h2>

*As per code understanding of the open source community.

![image.png](attachment:7df6b101-bd6b-4d99-a3d9-9e331e2b3ff9.png)

The above layout was made by <a href = "https://github.com/ultralytics/ultralytics/issues/189">RangeKing on GitHub</a>

<h1>Working with YOLOV8 USING ULTRALYTICS AND ROBOFLOW</h1>

To run the code, first install `ultralytics`:

In [None]:
!pip install ultralytics


In [1]:
import ultralytics

In [2]:
ultralytics.checks()

Ultralytics YOLOv8.2.8  Python-3.10.0 torch-2.2.1+cpu CPU (Intel Core(TM) i5-10210U 1.60GHz)
Setup complete  (8 CPUs, 15.6 GB RAM, 51.8/281.6 GB disk)


In [3]:
from ultralytics import YOLO

# Load a model
# model = YOLO("yolov8n.yaml")  # build a new model from scratch
model = YOLO("yolov8m.pt")  # load a pretrained model (recommended for training)

# Use the model
model.train(data=r"D:\Innomatics Research Labs\Vaibhav Saran\RESOURCES\Notes\6) Deep Learning\DL Phase 2\Code\guns data\data.yaml", epochs=3)  # train the model

New https://pypi.org/project/ultralytics/8.2.35 available  Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.2.8  Python-3.10.0 torch-2.2.1+cpu CPU (Intel Core(TM) i5-10210U 1.60GHz)
[34m[1mengine\trainer: [0mtask=detect, mode=train, model=yolov8m.pt, data=D:\Innomatics Research Labs\Vaibhav Saran\RESOURCES\Notes\6) Deep Learning\DL Phase 2\Code\guns data\data.yaml, epochs=3, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train8, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False

[34m[1mtrain: [0mScanning D:\Innomatics Research Labs\Vaibhav Saran\RESOURCES\Notes\6) Deep Learning\DL Phase 2\Code\guns data\trai[0m
[34m[1mval: [0mScanning D:\Innomatics Research Labs\Vaibhav Saran\RESOURCES\Notes\6) Deep Learning\DL Phase 2\Code\guns data\valid\[0m


Plotting labels to D:\Innomatics Research Labs\Vaibhav Saran\runs\detect\train8\labels.jpg... 
[34m[1moptimizer:[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
[34m[1moptimizer:[0m AdamW(lr=0.001667, momentum=0.9) with parameter groups 77 weight(decay=0.0), 84 weight(decay=0.0005), 83 bias(decay=0.0)
[34m[1mTensorBoard: [0mmodel graph visualization added 
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to [1mD:\Innomatics Research Labs\Vaibhav Saran\runs\detect\train8[0m
Starting training for 3 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        1/3         0G      2.263      4.593      2.373         33        640: 100%|██████████| 1/1 [00:54<00:00, 54.81s/
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:07<00:

                   all          4          7       0.05      0.714      0.155     0.0885






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        2/3         0G      2.381      4.961      2.321         28        640: 100%|██████████| 1/1 [00:47<00:00, 47.90s/
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:04<00:

                   all          4          7     0.0676      0.714      0.191     0.0976






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        3/3         0G      1.923      4.096      1.894         32        640: 100%|██████████| 1/1 [00:46<00:00, 46.26s/
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:04<00:

                   all          4          7     0.0847      0.714      0.234      0.117






3 epochs completed in 0.052 hours.
Optimizer stripped from D:\Innomatics Research Labs\Vaibhav Saran\runs\detect\train8\weights\last.pt, 52.0MB
Optimizer stripped from D:\Innomatics Research Labs\Vaibhav Saran\runs\detect\train8\weights\best.pt, 52.0MB

Validating D:\Innomatics Research Labs\Vaibhav Saran\runs\detect\train8\weights\best.pt...
Ultralytics YOLOv8.2.8  Python-3.10.0 torch-2.2.1+cpu CPU (Intel Core(TM) i5-10210U 1.60GHz)
Model summary (fused): 218 layers, 25840918 parameters, 0 gradients, 78.7 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:05<00:


                   all          4          7     0.0862      0.714      0.234      0.117
               HandGun          4          7     0.0862      0.714      0.234      0.117
Speed: 2.7ms preprocess, 1236.9ms inference, 0.0ms loss, 162.3ms postprocess per image
Results saved to D:\Innomatics Research Labs\Vaibhav Saran\runs\detect\train8


ultralytics.utils.metrics.DetMetrics object with attributes:

ap_class_index: array([0])
box: ultralytics.utils.metrics.Metric object
confusion_matrix: <ultralytics.utils.metrics.ConfusionMatrix object at 0x00000202066FF880>
curves: ['Precision-Recall(B)', 'F1-Confidence(B)', 'Precision-Confidence(B)', 'Recall-Confidence(B)']
curves_results: [[array([          0,    0.001001,    0.002002,    0.003003,    0.004004,    0.005005,    0.006006,    0.007007,    0.008008,    0.009009,     0.01001,    0.011011,    0.012012,    0.013013,    0.014014,    0.015015,    0.016016,    0.017017,    0.018018,    0.019019,     0.02002,    0.021021,    0.022022,    0.023023,
          0.024024,    0.025025,    0.026026,    0.027027,    0.028028,    0.029029,     0.03003,    0.031031,    0.032032,    0.033033,    0.034034,    0.035035,    0.036036,    0.037037,    0.038038,    0.039039,     0.04004,    0.041041,    0.042042,    0.043043,    0.044044,    0.045045,    0.046046,    0.047047,
          0.0480

In [4]:
metrics = model.val()  # evaluate model performance on the validation set

Ultralytics YOLOv8.2.8  Python-3.10.0 torch-2.2.1+cpu CPU (Intel Core(TM) i5-10210U 1.60GHz)
Model summary (fused): 218 layers, 25840918 parameters, 0 gradients, 78.7 GFLOPs


[34m[1mval: [0mScanning D:\Innomatics Research Labs\Vaibhav Saran\RESOURCES\Notes\6) Deep Learning\DL Phase 2\Code\guns data\valid\[0m
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 [00:05<00:


                   all          4          7     0.0862      0.714      0.234      0.117
               HandGun          4          7     0.0862      0.714      0.234      0.117
Speed: 3.7ms preprocess, 1194.3ms inference, 0.0ms loss, 148.3ms postprocess per image
Results saved to D:\Innomatics Research Labs\Vaibhav Saran\runs\detect\train82


In [5]:
results = model("https://wallpapershome.com/images/pages/pic_h/1722.jpg")  # predict on an image
# path = model.export(format="onnx")  # export the model to ONNX format


Found https://wallpapershome.com/images/pages/pic_h/1722.jpg locally at 1722.jpg
image 1/1 D:\Innomatics Research Labs\Vaibhav Saran\RESOURCES\Notes\6) Deep Learning\DL Phase 2\Code\1722.jpg: 384x640 1 HandGun, 624.3ms
Speed: 9.0ms preprocess, 624.3ms inference, 7.0ms postprocess per image at shape (1, 3, 384, 640)


In [6]:
results[0].boxes

ultralytics.engine.results.Boxes object with attributes:

cls: tensor([0.])
conf: tensor([0.3774])
data: tensor([[4.0458e+02, 8.1714e+02, 1.3178e+03, 1.0579e+03, 3.7742e-01, 0.0000e+00]])
id: None
is_track: False
orig_shape: (1080, 1920)
shape: torch.Size([1, 6])
xywh: tensor([[861.2056, 937.5013, 913.2523, 240.7156]])
xywhn: tensor([[0.4485, 0.8681, 0.4757, 0.2229]])
xyxy: tensor([[ 404.5794,  817.1436, 1317.8317, 1057.8591]])
xyxyn: tensor([[0.2107, 0.7566, 0.6864, 0.9795]])
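
The `xywh`, `xyxy` and `xywhn` attributes above describe the same box in different formats. The sketch below reproduces the relationship by hand, using the values printed above; the helper names are hypothetical and are not part of the `ultralytics` API. It assumes `xywh` holds the box center plus width and height, and that `orig_shape` is `(height, width)`.

```python
def xywh_to_xyxy(xywh):
    """Convert (center_x, center_y, width, height) to corner format
    (x1, y1, x2, y2)."""
    cx, cy, w, h = xywh
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def normalize_xywh(xywh, orig_shape):
    """Scale a center-format box to the 0..1 range using the image size,
    where orig_shape is (height, width)."""
    img_h, img_w = orig_shape
    cx, cy, w, h = xywh
    return (cx / img_w, cy / img_h, w / img_w, h / img_h)


xywh = (861.2056, 937.5013, 913.2523, 240.7156)  # from the output above
orig_shape = (1080, 1920)                        # from the output above

# Should agree with the printed xyxy and xywhn tensors up to rounding.
print([round(v, 2) for v in xywh_to_xyxy(xywh)])
print([round(v, 4) for v in normalize_xywh(xywh, orig_shape)])
```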

In [7]:
import cv2

img = cv2.imread(r"1722.jpg")
# cv2.rectangle takes two opposite corners (x1, y1) and (x2, y2), i.e. the xyxy values above
cv2.rectangle(img, (404, 817), (1317, 1057), (255, 0, 0), 5)
cv2.imwrite("Gun.jpg",img)

True
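
Rather than typing the coordinates by hand, the corner points can be derived from the `xyxy` values of a detection. The helper below is a hypothetical convenience function, not part of `ultralytics` or OpenCV. It only rounds the four floats to integer pixel positions and groups them into the two corner points that `cv2.rectangle` expects.

```python
def xyxy_to_corners(xyxy):
    """Round an (x1, y1, x2, y2) box to integer pixels and return the
    top-left and bottom-right corner points."""
    x1, y1, x2, y2 = (int(round(v)) for v in xyxy)
    return (x1, y1), (x2, y2)


# Values taken from the xyxy tensor printed earlier.
pt1, pt2 = xyxy_to_corners((404.5794, 817.1436, 1317.8317, 1057.8591))
print(pt1, pt2)  # (405, 817) (1318, 1058)

# With an image loaded by cv2.imread, the box could then be drawn with:
# cv2.rectangle(img, pt1, pt2, (255, 0, 0), 5)
```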