## What is Object Detection?

**Object Detection** is a computer vision technique that identifies and locates objects within an image or video. It not only classifies objects (e.g., person, car, dog) but also determines their positions through bounding boxes.
<br>
<img src="od.png" alt="Object Detection Example" width="600">

## Transfer Learning in Object Detection

**Transfer Learning** is a technique where a model developed for a particular task is reused as the starting point for a model on a second task. In the context of object detection, transfer learning allows us to leverage pre-trained models, which have already been trained on large datasets, to detect objects in new datasets with fewer labeled examples.

### Pre-trained Models

**Pre-trained Models** are models that have already been trained on a large dataset, such as ImageNet for classification or COCO for detection, and can be fine-tuned for a specific task. These models have learned rich feature representations that can be adapted to new tasks, reducing the compute and training time required.

### Why Use Transfer Learning?

- **Efficiency**: Reduces the amount of data and time needed to train a model from scratch.
- **Performance**: Often leads to better performance, especially with limited data, as the model benefits from previously learned features.
- **Flexibility**: Can be applied to different tasks, such as classification, object detection, or segmentation, by fine-tuning the final layers of the model.

By using transfer learning with pre-trained models, we can build accurate and efficient object detection systems, even with limited data.
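As a rough illustration of the idea, here is a toy sketch in plain Python. It is not YOLO and not the Ultralytics API: `pretrained_features`, `fine_tune_head`, and the dataset are all hypothetical names and values invented for this example. The "pre-trained" part is a fixed feature function that is never updated, and only a small head on top of it is trained on the new task.

```python
# Toy sketch of transfer learning as feature extraction (hypothetical example).
# The feature extractor is frozen; only the head's weights and bias are trained.

def pretrained_features(x):
    """Stand-in for a frozen, pre-trained backbone: raw input -> feature vector."""
    return [x, x * x]


def predict(head, x):
    """Apply the trainable linear head on top of the frozen features."""
    feats = pretrained_features(x)
    return sum(w * f for w, f in zip(head["weights"], feats)) + head["bias"]


def mean_squared_error(head, data):
    return sum((predict(head, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune_head(data, epochs=2000, lr=0.01):
    """Train only the head with plain stochastic gradient descent."""
    head = {"weights": [0.0, 0.0], "bias": 0.0}
    for _ in range(epochs):
        for x, y in data:
            feats = pretrained_features(x)
            error = predict(head, x) - y
            head["weights"] = [
                w - lr * error * f for w, f in zip(head["weights"], feats)
            ]
            head["bias"] -= lr * error
    return head


# A small "new task" dataset: y = 3*x^2 + 2*x + 1 sampled on [-1, 1].
data = [(i / 4, 3 * (i / 4) ** 2 + 2 * (i / 4) + 1) for i in range(-4, 5)]

untrained_head = {"weights": [0.0, 0.0], "bias": 0.0}
loss_before = mean_squared_error(untrained_head, data)
head = fine_tune_head(data)
loss_after = mean_squared_error(head, data)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.6f}")
```

Because the features are reused as-is, only three numbers have to be learned here, which is the same reason fine-tuning a detector typically needs far less data than training it from scratch.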

<img src="transfer_learning.jpg" alt="Transfer Learning Illustration" width="600">


## The YOLO Family of Models

**YOLO (You Only Look Once)** is a family of real-time object detection models known for their speed and accuracy. YOLO models are designed to detect objects in images or videos in a single pass, making them highly efficient for tasks requiring fast inference.

### Tasks YOLO is Used For

- **Object Detection**: Identifying and locating objects within an image or video.
- **Object Tracking**: Continuously locating objects across multiple frames in a video.
- **Image Segmentation**: Classifying each pixel of an image into a particular object or background.

### Versions of YOLO

YOLO has evolved through multiple versions, each bringing improvements in speed, accuracy, and efficiency:

- **YOLOv1 to YOLOv3**: Early versions focused on improving detection speed and accuracy, introducing anchor boxes and multi-scale predictions.
- **YOLOv4**: Enhanced performance with better backbone networks, mosaic augmentation, and other tricks.
- **YOLOv5**: Introduced easier usage, faster inference, and smaller model sizes, gaining widespread adoption.
- **YOLOv6 and YOLOv7**: Further optimizations in speed and accuracy, with better handling of edge cases.
- **YOLOv8**: Features advanced architectures, better training pipelines, and a range of model sizes for different use cases. This is the version used in this tutorial.
- **YOLOv10**: A more recent release in the family than YOLOv8.

### YOLO Model Sizes

Recent YOLO versions ship in several sizes to balance the trade-off between speed and accuracy. For YOLOv8, these include:

- **YOLOv8n (Nano)**: Smallest and fastest, ideal for real-time applications on edge devices with limited computational power.
- **YOLOv8s (Small)**: A slightly larger model with improved accuracy, suitable for devices with moderate computational resources.
- **YOLOv8m (Medium)**: Balanced model size, offering a good mix of speed and accuracy for general-purpose tasks.
- **YOLOv8l (Large)**: A large model with high accuracy, best suited for applications where accuracy is prioritized over speed.

### YOLOv8n Model

In this tutorial, we will be using the **YOLOv8n (Nano)** model. This model is the smallest and fastest in the YOLOv8 family, making it ideal for real-time object detection on resource-constrained devices while maintaining reasonable accuracy.

By understanding the YOLO family and the different versions and sizes available, you can choose the right model for your specific task and computational environment.


<img src="yolov8.jpg" alt="YOLOv8 Illustration" width="600">



In [11]:
# Package Installation
# !pip install ultralytics
# !pip install opencv-python

In [5]:
from ultralytics import YOLO
import cv2
import math 

In [9]:
# Load the model; the pre-trained weights are downloaded automatically on first use
model = YOLO("yolov8n.pt")
# Classes this model is trained on
print(model.names)

{0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microw

In [8]:
# Class names indexed by class ID, matching the labels printed by model.names above
classNames = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat",
              "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
              "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
              "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
              "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
              "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli",
              "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant", "bed",
              "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
              "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
              "teddy bear", "hair drier", "toothbrush"
              ]

In [11]:
# start webcam
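cap = cv2.VideoCapture(0)  # 0 selects the default camera
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # same as property ID 3
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)  # same as property ID 4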


while True:
    success, img = cap.read()
    if not success:  # stop if the camera did not return a frame
        break
    results = model(img, stream=True)

    # coordinates
    for r in results:
        boxes = r.boxes

        for box in boxes:
            # bounding box
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2) # convert to int values

            # put box in cam
            cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)

            # confidence
            # confidence, rounded up to two decimals
            confidence = math.ceil(float(box.conf[0]) * 100) / 100
            print("Confidence --->", confidence)

            # class name
            cls = int(box.cls[0])
            print("Class name -->", classNames[cls])

            # object details
            org = [x1, y1]
            font = cv2.FONT_HERSHEY_SIMPLEX
            fontScale = 1
            color = (255, 0, 0)
            thickness = 2

            cv2.putText(img, classNames[cls], org, font, fontScale, color, thickness)

    cv2.imshow('Webcam', img)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


0: 480x640 1 person, 420.2ms
Confidence ---> 0.87
Class name --> person
Speed: 18.9ms preprocess, 420.2ms inference, 27.6ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 1 person, 52.2ms
Confidence ---> 0.84
Class name --> person
Speed: 26.1ms preprocess, 52.2ms inference, 4.0ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 1 person, 24.0ms
Confidence ---> 0.84
Class name --> person
Speed: 4.0ms preprocess, 24.0ms inference, 4.0ms postprocess per image at shape (1, 3, 480, 640)

...

0: 480x640 1 person, 1 remote, 13.0ms
Confidence ---> 0.85
Class name --> person
Confidence ---> 0.57
Class name --> remote
Speed: 3.0ms preprocess, 13.0ms inference, 1.0ms postprocess per image at shape (1, 3, 480, 640)

...

0: 480x640 1 person, 1 cup, 1 remote, 10.0ms
Confidence ---> 0.94
Class name --> person
Confidence ---> 0.56
Class name --> remote
Confidence ---> 0.3
Class name --> cup
Speed: 2.0ms preprocess, 10.0ms inference, 2.0ms postprocess per image at shape (1, 3, 480, 640)

...

0: 480x640 1 person, 2 donuts, 11.0ms
Confidence ---> 0.94
Class name --> person
Confidence ---> 0.48
Class name --> donut
Confidence ---> 0.39
Class name --> donut
Speed: 2.0ms preprocess, 11.0ms inference, 1.0ms postprocess per image at shape (1, 3, 480, 640)

...

0: 480x640 2 persons, 1 hot dog, 1 cell phone, 10.0ms
Confidence ---> 0.92
Class name --> person
Confidence ---> 0.61
Class name --> hot dog
Confidence ---> 0.38
Class name --> person
Confidence ---> 0.32
Class name --> cell phone
Speed: 2.0ms preprocess, 10.0ms inference, 1.0ms postprocess per image at shape (1, 3, 480, 640)

...
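The output includes several low-confidence detections (for example a cup at 0.3). One common way to deal with these is to drop anything below a threshold before drawing. The sketch below shows the idea on plain dictionaries; `filter_detections` and the dictionary layout are hypothetical helpers for illustration, and the sample values are copied from one frame of the output. Ultralytics also accepts a `conf` argument at prediction time, which may be the simpler option in practice.

```python
def filter_detections(detections, min_confidence=0.5):
    """Keep only detections whose confidence is at least min_confidence."""
    return [d for d in detections if d["confidence"] >= min_confidence]


# One frame from the webcam output: 1 person, 1 cup, 1 remote
frame = [
    {"label": "person", "confidence": 0.94},
    {"label": "remote", "confidence": 0.56},
    {"label": "cup", "confidence": 0.3},
]

kept = filter_detections(frame, min_confidence=0.5)
print([d["label"] for d in kept])  # ['person', 'remote']
```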

## Custom YOLOv8 Model: Fine-Tuning a Pre-trained Model

Before we start fine-tuning YOLOv8, we need the following prerequisites:

- **YOLOv8 Source Code/Architecture:** We need the ultralytics library installed, which has all the model architecture files and associated utilities &#x2714;
- **Pre-trained Weights:** We need the pre-trained YOLOv8 weights to initialize the model; the ultralytics library downloads them automatically when the model is loaded &#x2714;
- **Training Data:** A well-annotated dataset with images and corresponding bounding box annotations for the objects of interest.
- **Configuration Files:** YOLOv8 relies on configuration files (.yaml) to specify model and dataset settings. We need a dataset configuration file (`data.yaml`) that points to our training and validation images and lists our class names.
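For reference, a dataset configuration in the Ultralytics format typically contains a root `path`, `train` and `val` image folders, and a `names` mapping from class ID to class name. The sketch below generates such a file's text. The helper `build_data_yaml`, the folder layout (`train/images`, `valid/images`), and the gesture class names are assumptions made for illustration, not values taken from the actual dataset.

```python
def build_data_yaml(dataset_root, class_names):
    """Return the text of a minimal Ultralytics-style dataset YAML file."""
    lines = [
        f"path: {dataset_root}",   # dataset root directory
        "train: train/images",     # training images, relative to path (assumed layout)
        "val: valid/images",       # validation images, relative to path (assumed layout)
        "names:",
    ]
    # One "  id: name" entry per class, in class-ID order
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Hypothetical dataset location and class names
yaml_text = build_data_yaml(
    "datasets/Hand-Gesture-Recognition-6",
    ["fist", "palm", "thumbs_up"],
)
print(yaml_text)

# To save it next to the notebook:
# with open("data.yaml", "w") as f:
#     f.write(yaml_text)
```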

### Hand Gesture Detection Dataset
Download Link : onedrive-link
<br>
<img src="dataset_format.png" align="left" alt="Dataset Format" width="150">


How do we annotate an image?

## labelImg: An Image Annotation Tool

**labelImg** is a graphical image annotation tool used for labeling objects in images to create datasets for object detection. Developed in Python with the PyQt5 library, it provides an easy-to-use interface for annotating images with bounding boxes and object classes.

### Key Features

- **User-Friendly Interface**: Allows users to quickly draw bounding boxes around objects and assign class labels.
- **Supports Multiple Formats**: Can save annotations in popular formats such as Pascal VOC XML and YOLO TXT, which are compatible with various object detection frameworks.
- **Image Navigation**: Facilitates the browsing and annotation of large datasets efficiently.
- **Custom Labels**: Users can define and use their own class labels for different objects.

### How It Works

1. **Load Images**: Open your image dataset in labelImg.
2. **Draw Bounding Boxes**: Click and drag to create bounding boxes around objects.
3. **Assign Labels**: Assign a class label to each bounding box.
4. **Save Annotations**: Export annotations in your preferred format for use in training object detection models.
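When annotations are saved in YOLO TXT format, each image gets a text file with one line per object: the class ID followed by the box center, width, and height, all normalized to the range 0 to 1 by the image size. The sketch below converts a pixel-coordinate box into such a line; `to_yolo_line` is a hypothetical helper written for this example, not part of labelImg.

```python
def to_yolo_line(class_id, x1, y1, x2, y2, image_width, image_height):
    """Convert a pixel box (x1, y1, x2, y2) into a YOLO TXT annotation line."""
    x_center = (x1 + x2) / 2 / image_width
    y_center = (y1 + y2) / 2 / image_height
    width = (x2 - x1) / image_width
    height = (y2 - y1) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A box covering the central quarter of a 640x480 image, class ID 0
print(to_yolo_line(0, 160, 120, 480, 360, 640, 480))
# 0 0.500000 0.500000 0.500000 0.500000
```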


In [13]:
# !pip install labelImg
# DEMO

In [14]:
# Prepared data.yaml files 

In [3]:
from ultralytics import YOLO

# Load a model
model = YOLO('yolov8n.pt')  # Load a pretrained YOLOv8n model
yaml_file = "data.yaml"
# Train the model
results = model.train(
    data=yaml_file,  # path to the custom YAML file
    epochs=50,  # number of training epochs
    imgsz=640,  # image size (can be adjusted)
    batch=4,  # batch size
    name='custom_hand_gesture',  # name of the training run
    device=0  # set to 'cpu' or GPU index (e.g., 0)
)


Ultralytics YOLOv8.2.81  Python-3.9.13 torch-2.0.1 CUDA:0 (NVIDIA GeForce MX570 A, 2048MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=data.yaml, epochs=50, time=None, patience=100, batch=4, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=custom_hand_gesture, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=

train: Scanning C:\Users\Anshuman\OneDrive\Desktop\CodeProjects\Secondment\Secondment\datasets\Hand-Gesture-Recognition-6\train\labe
val: Scanning C:\Users\Anshuman\OneDrive\Desktop\CodeProjects\Secondment\Secondment\datasets\Hand-Gesture-Recognition-6\valid\labels

Plotting labels to runs\detect\custom_hand_gesture\labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.001111, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs\detect\custom_hand_gesture
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/50     0.973G      1.259      3.171      1.671          7        640: 100%|██████████| 147/147 [00:35<00:00,  4.19it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:04<00:00,  4.52
                   all        167        167      0.143      0.667      0.205       0.13

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       2/50     0.973G      1.309      2.735      1.686          4        640: 100%|██████████| 147/147 [00:29<00:00,  4.95it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:03<00:00,  5.70
                   all        167        167      0.181       0.52      0.244      0.155

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       3/50     0.971G      1.277      2.521      1.625         11        640: 100%|██████████| 147/147 [00:33<00:00,  4.40it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:04<00:00,  4.83
                   all        167        167      0.562       0.24      0.254      0.135

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       4/50     0.973G       1.32      2.472      1.657          5        640: 100%|██████████| 147/147 [00:32<00:00,  4.50it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:03<00:00,  5.67
                   all        167        167      0.242      0.469      0.308      0.191

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       5/50     0.973G      1.279      2.339      1.627          5        640: 100%|██████████| 147/147 [00:32<00:00,  4.47it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:04<00:00,  4.31
                   all        167        167      0.309      0.494       0.36      0.233

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       6/50     0.973G      1.245      2.189      1.605          8        640: 100%|██████████| 147/147 [00:33<00:00,  4.41it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:03<00:00,  5.32
                   all        167        167      0.378        0.5      0.417      0.276

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       7/50     0.973G      1.207      2.091      1.556          6        640: 100%|██████████| 147/147 [00:32<00:00,  4.54it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:03<00:00,  5.64
                   all        167        167      0.421      0.525      0.468      0.308

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       8/50     0.973G      1.248      2.054      1.585          8        640: 100%|██████████| 147/147 [00:30<00:00,  4.80it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:04<00:00,  5.12
                   all        167        167      0.321      0.621      0.429      0.253

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       9/50     0.973G      1.237      1.977      1.561          8        640: 100%|██████████| 147/147 [00:31<00:00,  4.63it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:03<00:00,  6.07
                   all        167        167      0.404      0.638      0.498      0.339

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      10/50     0.973G       1.18      1.989       1.55          5        640: 100%|██████████| 147/147 [00:32<00:00,  4.56it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:04<00:00,  5.19
                   all        167        167      0.415      0.655      0.523      0.363

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      11/50     0.971G      1.187       1.88      1.516          5        640: 100%|██████████| 147/147 [00:33<00:00,  4.39it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:04<00:00,  5.22
                   all        167        167      0.457       0.65      0.577      0.418

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      12/50     0.971G      1.152      1.775      1.489          3        640: 100%|██████████| 147/147 [00:34<00:00,  4.26it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 21/21 [00:04<00:00,  4.66

                   all        167        167      0.502      0.553      0.544      0.381






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      13/50     0.971G       1.17      1.842      1.534          7        640:  22%|██▏       | 33/147 [00:07<00:26,  4.27it/s]


KeyboardInterrupt: 

# Q&A

--- EXTRA


## What is YOLO?

**YOLO (You Only Look Once)** is an object detection model introduced in the research paper "You Only Look Once: Unified, Real-Time Object Detection" by Joseph Redmon et al. YOLO reframed object detection as a single regression problem that maps image pixels directly to bounding box coordinates and class probabilities, rather than a pipeline of separate region proposal and classification steps.

### Key Concepts from the YOLO Research Paper

- **Unified Architecture**: YOLO employs a single neural network that processes the entire image in one pass. This network divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.

- **Single-Stage Detection**: Unlike traditional object detectors that use a two-stage approach (region proposal followed by classification), YOLO is a single-stage detector. This means that it directly predicts the class probabilities and bounding boxes in one step, leading to faster inference times.

- **Real-Time Detection**: YOLO was designed with real-time applications in mind. Its architecture is optimized for speed, making it possible to process images at high frame rates, which is crucial for tasks like video surveillance and autonomous driving.

### YOLO Architecture Overview

1. **Input Image**: The input image is divided into an \( S \times S \) grid. Each grid cell is responsible for detecting objects whose center falls within it.

2. **Bounding Boxes**: For each grid cell, YOLO predicts a fixed number \( B \) of bounding boxes. Each box consists of center coordinates \( (x, y) \), a width, a height, and a confidence score that reflects both how likely the box is to contain an object and how well the box fits it.

3. **Class Prediction**: Along with bounding boxes, YOLO predicts one set of \( C \) class probabilities per grid cell, conditioned on the cell containing an object. The cell shares this one set across all the boxes it predicts.

4. **Loss Function**: YOLO is trained with a multi-part sum-squared error loss that penalizes localization, confidence, and classification errors. The loss weights box coordinate errors more heavily and down-weights confidence errors from cells that contain no object, so that the many empty cells do not dominate training.
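The grid layout described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any YOLO implementation: the function names `output_shape`, `responsible_cell`, and `encode_box` are hypothetical, and the sketch assumes the original paper's layout, where each cell predicts \( B \) boxes of five values each plus \( C \) class probabilities, box centers are expressed relative to their cell, and widths and heights are expressed relative to the whole image.

```python
from typing import Tuple


def output_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Shape of the prediction tensor: per cell, B boxes of 5 values plus C class scores."""
    return (S, S, B * 5 + C)


def responsible_cell(cx: float, cy: float, img_w: float, img_h: float, S: int) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the box center (pixel coordinates)."""
    col = min(int(cx / img_w * S), S - 1)  # clamp so a center on the right edge stays in the grid
    row = min(int(cy / img_h * S), S - 1)  # clamp so a center on the bottom edge stays in the grid
    return row, col


def encode_box(cx, cy, w, h, img_w, img_h, S):
    """Encode a pixel-space box as (row, col, x_offset, y_offset, rel_w, rel_h).

    Assumption: offsets are relative to the responsible cell,
    width and height are relative to the full image.
    """
    row, col = responsible_cell(cx, cy, img_w, img_h, S)
    x_offset = cx / img_w * S - col
    y_offset = cy / img_h * S - row
    return row, col, x_offset, y_offset, w / img_w, h / img_h


# Illustrative values: a 7x7 grid, 2 boxes per cell, 20 classes
print(output_shape(7, 2, 20))
# Illustrative box: center (320, 240), size 128x96, in a 640x480 image
print(encode_box(320, 240, 128, 96, 640, 480, 7))
```

With \( S = 7 \), \( B = 2 \), and \( C = 20 \), the prediction tensor has shape 7 × 7 × 30, which is why the number of boxes the model can output is fixed by the grid size rather than by the image content.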

### Advantages of YOLO

- **Speed**: YOLO's single-stage detection allows it to perform object detection much faster than two-stage detectors like R-CNN, making it suitable for real-time applications.
- **Global Context**: YOLO sees the entire image at once, so it can use global context when predicting and produces fewer background false positives.
- **Competitive Accuracy**: Despite its speed, YOLO stays competitive in accuracy. The original model trailed the most accurate two-stage detectors of its time, and later versions narrowed that gap.

### Limitations

- **Localization Errors**: YOLO tends to make more localization errors, especially with small objects, compared to region-based methods.
- **Struggles with Close Objects**: YOLO may struggle to detect objects that are very close to each other, because each grid cell predicts only a small, fixed number of bounding boxes and a single set of class probabilities.
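To see why nearby objects are a problem for a grid-based detector, the sketch below groups a few object centers by the grid cell they fall into. The helper names `cell_of` and `group_by_cell` and the example centers are made up for illustration; coordinates are assumed to be normalized to the range 0 to 1.

```python
from collections import defaultdict


def cell_of(cx: float, cy: float, S: int):
    """Return (row, col) of the grid cell containing a normalized center (cx, cy)."""
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    return row, col


def group_by_cell(centers: dict, S: int) -> dict:
    """Map each grid cell to the names of the objects whose centers fall inside it."""
    cells = defaultdict(list)
    for name, (cx, cy) in centers.items():
        cells[cell_of(cx, cy, S)].append(name)
    return dict(cells)


# Hypothetical object centers: two birds flying close together and one distant car
centers = {
    "bird_a": (0.50, 0.50),
    "bird_b": (0.55, 0.52),
    "car": (0.90, 0.10),
}

# Keep only the cells that have to account for more than one object
crowded = {cell: names for cell, names in group_by_cell(centers, S=7).items() if len(names) > 1}
print(crowded)
```

Both birds land in the same cell of a 7 × 7 grid, so they compete for that cell's few box predictions and its single set of class probabilities. A finer grid or multi-scale predictions, as used in later YOLO versions, reduces how often this happens.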

### Evolution of YOLO

Since the original paper, YOLO has gone through several iterations (YOLOv2, YOLOv3, etc.), each improving on the speed, accuracy, and robustness of the model. The architecture has evolved, but the core principle of single-stage detection remains central to its design.

By understanding the fundamental architecture and principles of YOLO, you can appreciate why it has become a cornerstone in the field of object detection.
