# YOLOv8 Object Detection – Complete Beginner Walkthrough 

This notebook explains and implements **YOLOv8 (You Only Look Once)** for object detection.  

We will cover:  

- What is YOLO? Why use it?  
- Difference between YOLO and traditional OpenCV Haar Cascades  
- Pretrained models & datasets (COCO)  
- Installing and setting up Ultralytics  
- Running detection on images and videos  
- Drawing bounding boxes with OpenCV  


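# What is YOLO (You Only Look Once)?

- **YOLO** is a family of real-time object detection models.  
- It divides the image into a grid and predicts bounding boxes and class probabilities for every cell in a **single forward pass**, instead of running a classifier over many candidate regions.  
- The single pass is what makes it **fast**, and training on large labelled datasets is what makes it **accurate**, so it is suitable for real-time detection.  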

> Use cases: self-driving cars, surveillance, medical imaging, quality inspection in factories, etc.
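
As a rough illustration of the single-pass idea, the sketch below counts how many grid locations a detection head scores at once. It assumes a square 640×640 input and the three output strides (8, 16, 32) used by the standard YOLOv8 detection models; the function name `grid_cells` is made up for this example.

```python
def grid_cells(img_size, strides=(8, 16, 32)):
    """Return the number of grid cells per stride for a square input."""
    return {s: (img_size // s) ** 2 for s in strides}


cells = grid_cells(640)
total = sum(cells.values())

print(cells)  # {8: 6400, 16: 1600, 32: 400}
print(total)  # 8400
```

All of these locations are scored in one forward pass. Non-square inputs, such as the 416x640 shape that appears in the prediction logs, give a different count.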


# YOLO vs Haar Cascade (OpenCV)

- **Haar Cascades**:  

  - An older OpenCV method for object detection (e.g., face detection).  
  - Works by scanning the image with sliding windows and handcrafted features.  
  - Limitations → works well only for simple tasks like faces; not robust for complex real-world detection.

- **YOLO (Deep Learning-based)**:  

  - Learns features automatically from large datasets (like COCO).  
  - Handles multiple objects, complex scenes, and variations in lighting, orientation, etc.  
  - Much more accurate and flexible than Haar Cascades.  
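
To make the sliding-window idea concrete, here is a minimal sketch that counts how many window positions a detector would have to examine at a single scale. The 24×24 window and 4-pixel step are assumed values for illustration, not OpenCV defaults, and `count_windows` is a hypothetical helper.

```python
def count_windows(img_w, img_h, win=24, step=4):
    """Count window positions for one scale of a sliding-window scan."""
    if img_w < win or img_h < win:
        return 0
    cols = (img_w - win) // step + 1
    rows = (img_h - win) // step + 1
    return cols * rows


print(count_windows(640, 480))  # 17825
```

A real cascade repeats this scan over many window sizes, so the number of regions to classify grows quickly, whereas YOLO scores the whole image in one pass.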

> Conclusion: For modern applications → **YOLO is far better**.



# Pretrained Models and Datasets

- Ultralytics provides YOLO models **pretrained** on the **COCO dataset** (80 object classes: person, car, dog, etc.).  
- YOLOv8 comes in several model sizes:  

  - `yolov8n.pt` → Nano (fastest, lightest)  
  - `yolov8s.pt` → Small  
  - `yolov8m.pt` → Medium  
  - `yolov8l.pt` → Large  
  - `yolov8x.pt` → Extra large (most accurate but heavy)  

You can start with pretrained weights and fine-tune them on your custom dataset later.



# Setup Environment
**After installation, `ultralytics.checks()` reports:**

- Python version  
- Torch (PyTorch) version  
- GPU/CPU availability  

In [None]:
# Install YOLO library
!pip install ultralytics

In [1]:
# Import and check environment
import ultralytics
ultralytics.checks()

Ultralytics 8.3.196  Python-3.13.5 torch-2.8.0+cpu CPU (Intel Core i3-5005U 2.00GHz)
Setup complete  (4 CPUs, 7.9 GB RAM, 139.2/237.8 GB disk)


# Load a Pretrained YOLO Model

In [2]:
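from ultralytics import YOLO

# Load a pretrained YOLOv8 Nano model (weights are downloaded on first use).
# The second positional argument of YOLO() is `task`, and "v8" is not a valid task, so it is dropped.
model = YOLO("yolov8n.pt")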

# Run Prediction on an Image

- `conf=0.25` → Minimum confidence threshold  
- `save=True` → Saves results with bounding boxes to `runs/detect` folder  
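
As a rough picture of what the confidence threshold does, the sketch below filters a list of made-up detections. The labels, scores, and the `filter_by_conf` helper are hypothetical; inside Ultralytics the filtering happens during post-processing, not in user code.

```python
# Hypothetical raw detections as (label, confidence) pairs; values are made up
raw = [("person", 0.91), ("car", 0.48), ("dog", 0.22), ("bus", 0.30)]


def filter_by_conf(detections, conf=0.25):
    """Keep only detections whose confidence exceeds the threshold."""
    return [d for d in detections if d[1] > conf]


kept = filter_by_conf(raw, conf=0.25)
print(kept)  # [('person', 0.91), ('car', 0.48), ('bus', 0.3)]
```

Raising the threshold (the video loop later uses `conf=0.45`) trades missed objects for fewer false positives.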


In [3]:
detection_output = model.predict(
    source=r"C:\Users\Lenovo\OneDrive\Desktop\Python Everyday work\Github work\Computer_Vision\YOLO\Test_Image_and_Video\Image.JPG",
    conf=0.25,
    save=True
)

# Print the list of Results objects (one per input image)
print(detection_output)

# Print the first result with its tensors moved to CPU and converted to NumPy arrays
print(detection_output[0].cpu().numpy())


image 1/1 C:\Users\Lenovo\OneDrive\Desktop\Python Everyday work\Github work\Computer_Vision\YOLO\Test_Image_and_Video\Image.JPG: 416x640 30 persons, 10 cars, 1 bus, 3 trucks, 4 traffic lights, 1061.3ms
Speed: 55.6ms preprocess, 1061.3ms inference, 26.7ms postprocess per image at shape (1, 3, 416, 640)
Results saved to C:\Users\Lenovo\OneDrive\Desktop\Python Everyday work\Github work\Computer_Vision\YOLO\runs\detect\predict
[ultralytics.engine.results.Results object with attributes:

boxes: ultralytics.engine.results.Boxes object
keypoints: None
masks: None
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee',

## YOLOv8 Detection Output (Image Detection)

This is the detection result for the test image:

![YOLO Image Result](Output/Image.jpg)
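
The per-class summary in the log line (for example "30 persons, 10 cars") can be rebuilt from the class IDs of the detected boxes. The sketch below shows the idea with made-up IDs and a subset of the `names` mapping printed above; `summarize` is a hypothetical helper, not part of Ultralytics.

```python
from collections import Counter


def summarize(cls_ids, names):
    """Count detections per class name."""
    return dict(Counter(names[i] for i in cls_ids))


# Subset of the COCO id -> name mapping shown in the output above
names = {0: "person", 2: "car", 5: "bus", 7: "truck", 9: "traffic light"}

# Hypothetical class IDs for one image
cls_ids = [0, 0, 2, 5, 0, 9, 2]

print(summarize(cls_ids, names))
# {'person': 3, 'car': 2, 'bus': 1, 'traffic light': 1}
```

With a real result, the IDs could be read with something like `detection_output[0].boxes.cls.int().tolist()` and the mapping from `detection_output[0].names`, assuming the prediction cell above has been run.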


# Load Class Labels and Colors

In [4]:
import random

# Load COCO class names (one per line; splitlines() avoids a trailing empty entry)
with open(r"C:\Users\Lenovo\OneDrive\Desktop\Python Everyday work\Github work\Computer_Vision\YOLO\coco.txt", "r", encoding="utf-8") as f:
    class_list = f.read().splitlines()

# Assign a random color to each class for its bounding boxes
detection_colors = [(random.randint(0,255), random.randint(0,255), random.randint(0,255)) for _ in class_list]

# Run YOLO detection on a different image
results = model.predict(
    source=r"C:\Users\Lenovo\OneDrive\Desktop\Python Everyday work\Github work\Computer_Vision\YOLO\Test_Image_and_Video\Image_1.JPG",
    conf=0.25,
    save=True
)



image 1/1 C:\Users\Lenovo\OneDrive\Desktop\Python Everyday work\Github work\Computer_Vision\YOLO\Test_Image_and_Video\Image_1.JPG: 448x640 1 cup, 1 tv, 1 mouse, 1 keyboard, 2 books, 681.0ms
Speed: 14.7ms preprocess, 681.0ms inference, 12.0ms postprocess per image at shape (1, 3, 448, 640)
Results saved to C:\Users\Lenovo\OneDrive\Desktop\Python Everyday work\Github work\Computer_Vision\YOLO\runs\detect\predict


## YOLOv8 Detection Output (Image Detection)

This is the detection result for the second test image (`Image_1.JPG`), labelled with COCO class names:

![YOLO Image Result](Output/Image_1.jpg)
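
Colors drawn with `random.randint` change on every run, which makes screenshots hard to compare. One possible alternative, sketched below, is to draw them from a seeded generator so each class keeps the same color. The `make_colors` helper and the seed value are assumptions for illustration.

```python
import random


def make_colors(n_classes, seed=42):
    """Return one (B, G, R) tuple per class, reproducible for a given seed."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 255) for _ in range(3)) for _ in range(n_classes)]


colors = make_colors(80)  # COCO has 80 classes
print(len(colors), colors[0])
```

Calling `make_colors(len(class_list))` would give a drop-in replacement for `detection_colors`. The colors are still random, so two classes can end up with similar shades.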


# Object Detection on Video

In [11]:
import cv2
import random
from ultralytics import YOLO

# Load YOLO model (the second positional argument is `task`; "v8" is not a valid task, so it is dropped)
model = YOLO("yolov8n.pt")

# Load COCO classes
with open(r"C:\Users\Lenovo\OneDrive\Desktop\Python Everyday work\Github work\Computer_Vision\YOLO\coco.txt", "r", encoding="utf-8") as f:
    class_list = f.read().splitlines()

# Assign a random color to each class (colors are not guaranteed to be unique)
detection_colors = [(random.randint(0,255), random.randint(0,255), random.randint(0,255)) for _ in class_list]

# Open a video file
cap = cv2.VideoCapture(r"C:\Users\Lenovo\OneDrive\Desktop\Python Everyday work\Github work\Computer_Vision\YOLO\Test_Image_and_Video\Video.mp4")

if not cap.isOpened():
    # exit() would shut down the notebook kernel; SystemExit only stops this cell
    raise SystemExit("Cannot open video")

# Create a resizable window with minimize/maximize/close buttons
cv2.namedWindow("YOLOv8 Object Detection", cv2.WINDOW_NORMAL)

# Turn off autosize so the window stays resizable (this does not maximize it)
cv2.setWindowProperty("YOLOv8 Object Detection", cv2.WND_PROP_AUTOSIZE, cv2.WINDOW_NORMAL)

while True:
    ret, frame = cap.read()
    if not ret:
        print("End of stream")
        break

    # Run YOLO prediction on the current frame
    results = model.predict(source=frame, conf=0.45, save=False)

    # Move boxes to CPU once so .numpy() also works when running on a GPU
    boxes = results[0].boxes.cpu()

    for i in range(len(boxes)):
        box = boxes[i]
        clsID = int(box.cls.numpy()[0])
        conf = float(box.conf.numpy()[0])
        bb = box.xyxy.numpy()[0]  # [x1, y1, x2, y2] in pixels

        # Different color for each object class
        color = detection_colors[clsID]

        # Draw bounding box
        cv2.rectangle(frame, (int(bb[0]), int(bb[1])), (int(bb[2]), int(bb[3])), color, 2)

        # Put class name + confidence just above the box
        cv2.putText(frame, f"{class_list[clsID]} {conf:.2f}",
                    (int(bb[0]), int(bb[1]) - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2, cv2.LINE_AA)

    # Show video in a resizable window (with minimize/maximize/close)
    cv2.imshow("YOLOv8 Object Detection", frame)

    # Exit on ESC key
    if cv2.waitKey(1) == 27:
        break

cap.release()
cv2.destroyAllWindows()



0: 384x640 5 persons, 2 chairs, 1 laptop, 588.6ms
Speed: 17.5ms preprocess, 588.6ms inference, 13.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 5 persons, 2 chairs, 1 laptop, 511.7ms
Speed: 15.8ms preprocess, 511.7ms inference, 13.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 persons, 2 chairs, 1 laptop, 535.4ms
Speed: 12.5ms preprocess, 535.4ms inference, 13.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 2 chairs, 1 laptop, 611.9ms
Speed: 17.4ms preprocess, 611.9ms inference, 11.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 2 chairs, 4 laptops, 585.3ms
Speed: 18.8ms preprocess, 585.3ms inference, 12.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 2 chairs, 1 laptop, 461.9ms
Speed: 16.3ms preprocess, 461.9ms inference, 11.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 2 chairs, 2 laptops, 526.2ms
Speed: 18.5ms preprocess, 526.2ms inferen

## YOLOv8 Detection Output (Video Detection)

This is the detection result for the video (screenshot saved as **Video.PNG**):

![YOLO Video Result](Output/Video.PNG)
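
The per-frame inference times in the log above give a rough idea of the achievable frame rate on this CPU. The sketch below averages the logged values; it ignores preprocessing, postprocessing, and display time, so the real frame rate would be somewhat lower.

```python
# Inference times in milliseconds, copied from the log above
inference_ms = [588.6, 511.7, 535.4, 611.9, 585.3, 461.9, 526.2]

mean_ms = sum(inference_ms) / len(inference_ms)
fps = 1000.0 / mean_ms

print(f"mean inference: {mean_ms:.1f} ms (~{fps:.1f} FPS)")
# mean inference: 545.9 ms (~1.8 FPS)
```

At under 2 frames per second the Nano model is not real-time on this machine.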


# Conclusion – Day82 YOLOv8 Project

In this notebook, we successfully:

- Installed and set up **Ultralytics YOLOv8**  
- Compared YOLO with Haar Cascades (OpenCV)  
- Used **pretrained models on the COCO dataset**  
- Detected objects in **images** and **videos**  
- Saved annotated images to `runs/detect` and displayed annotated video frames in an OpenCV window  

## Key Insights:

- YOLOv8 is **more accurate and flexible** than traditional methods like Haar Cascades, and the Nano model still runs on a CPU (roughly 0.5 s per frame in the logs above).  
- Pretrained models on COCO allow instant use for 80 classes.  
- With simple code, YOLOv8 can handle both **images** and **video streams**.  

## Next Steps:

- Try YOLOv8 with **live webcam detection**  
- Fine-tune YOLOv8 on a **custom dataset** for your own project  
- Explore advanced tasks like **segmentation** and **pose estimation**  

Thanks for following along! 
