
# Part 4 — Real‑Time Object Detection (YOLOv8, SSD, OpenCV DNN)

**Course:** CV Lab II — Machine Learning with OpenCV  
**Author:** Tsion Bizuayehu  
**Last updated:** 2025-09-07 17:48 UTC  

In this notebook you will implement and compare **three** real-time object detection pipelines:

1. **YOLOv8 (Ultralytics API)** — easiest to start, great accuracy/speed.  
2. **MobileNet‑SSD (Caffe) via OpenCV DNN** — lightweight baseline.  
3. **YOLOv8 (ONNX) via OpenCV DNN** — framework‑agnostic deployment path.

You will run them on still images, a webcam feed, and video files, then benchmark their FPS.

**What you’ll learn**
- Set up the environment, verify CUDA, and manage models
- Run YOLOv8 with Ultralytics in a few lines
- Run MobileNet‑SSD with OpenCV DNN
- Export YOLOv8 → ONNX and run with OpenCV DNN (post‑processing + NMS)
- Measure FPS and save annotated videos

_Tip:_ If you’re in Jupyter **Lab/Notebook**, windows opened by OpenCV (`cv2.imshow`) appear as separate OS windows, not inline in the notebook.
Press **q** while such a window has focus to quit a loop.



## 0. Environment Setup

Run the next cell to install the required packages.  
If you expect to work offline, install them with `pip` beforehand:

```txt
opencv-python
ultralytics
numpy
onnx
onnxruntime      # ONNX Runtime on CPU
onnxruntime-gpu  # (optional) ONNX Runtime on CUDA; install instead of onnxruntime, not alongside it
```


In [4]:

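# Install the required packages (internet required).
# Optionally upgrade pip first by uncommenting the next line.
# !pip install -U pip
!pip install opencv-python ultralytics numpy onnx onnxruntime
# For ONNX on CUDA, install onnxruntime-gpu instead of onnxruntime (not both in one environment).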


Defaulting to user installation because normal site-packages is not writeable


## 1. Imports & Version Check

In [6]:

import sys, time, os, math, json, pathlib
from pathlib import Path
import numpy as np
import cv2

try:
    import torch
    torch_version = torch.__version__
except Exception as e:
    torch, torch_version = None, None

try:
    from ultralytics import YOLO
    import ultralytics
    yolo_version = ultralytics.__version__
except Exception as e:
    YOLO, yolo_version = None, None

print("Python:", sys.version.split()[0])
print("OpenCV:", cv2.__version__)
print("PyTorch:", torch_version)
print("Ultralytics:", yolo_version)

if torch is not None:
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("CUDA device:", torch.cuda.get_device_name(0))


Python: 3.12.7
OpenCV: 4.11.0
PyTorch: 2.7.0+cpu
Ultralytics: 8.3.195
CUDA available: False


## 2. Utility Helpers

In [8]:

from typing import Tuple, List

def ensure_dir(p: str):
    Path(p).mkdir(parents=True, exist_ok=True)

def put_label(img, text, org, color=(0, 255, 0)):
    cv2.putText(img, text, org, cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2, cv2.LINE_AA)

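def fps_counter():
    # Generator yielding the instantaneous FPS between consecutive next() calls.
    # The first value only measures the time since the generator started, so ignore it.
    prev = time.perf_counter()  # monotonic, high-resolution timer
    while True:
        now = time.perf_counter()
        fps = 1.0 / (now - prev) if now > prev else 0.0
        prev = now
        yield fps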

# COCO 80 classes (YOLOv8 default)
COCO_NAMES = [
    "person","bicycle","car","motorcycle","airplane","bus","train","truck","boat","traffic light",
    "fire hydrant","stop sign","parking meter","bench","bird","cat","dog","horse","sheep","cow",
    "elephant","bear","zebra","giraffe","backpack","umbrella","handbag","tie","suitcase","frisbee",
    "skis","snowboard","sports ball","kite","baseball bat","baseball glove","skateboard","surfboard",
    "tennis racket","bottle","wine glass","cup","fork","knife","spoon","bowl","banana","apple",
    "sandwich","orange","broccoli","carrot","hot dog","pizza","donut","cake","chair","couch",
    "potted plant","bed","dining table","toilet","tv","laptop","mouse","remote","keyboard","cell phone",
    "microwave","oven","toaster","sink","refrigerator","book","clock","vase","scissors","teddy bear",
    "hair drier","toothbrush"
]

# Letterbox resize like YOLO for ONNX -> OpenCV pipeline
def letterbox(img, new_shape=(640, 640), color=(114, 114, 114), auto=False, scaleFill=False, scaleup=True):
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:
        r = min(r, 1.0)

    # Compute padding
    new_unpad = (int(round(shape[1] * r)), int(round(shape[0] * r)))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # width, height padding
    if auto:
        dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # 64-pt stride-multiple padding
    dw /= 2; dh /= 2

    # resize
    if shape[::-1] != new_unpad:
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh-0.1)), int(round(dh+0.1))
    left, right = int(round(dw-0.1)), int(round(dw+0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)
    return img, r, (dw, dh)
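
To make the geometry concrete, the sketch below repeats only the arithmetic of `letterbox` (no pixels are touched) for a 1280×720 frame, then maps a point from the 640×640 letterboxed space back to the original frame. The helper names `letterbox_params` and `unletterbox_point` are hypothetical and exist only for this illustration.

```python
def letterbox_params(h, w, new_h=640, new_w=640):
    """Return (ratio, (pad_x, pad_y)) as letterbox() would, with per-side padding."""
    r = min(new_h / h, new_w / w)                      # same scale on both axes
    new_unpad_w, new_unpad_h = int(round(w * r)), int(round(h * r))
    pad_x = (new_w - new_unpad_w) / 2                  # padding on the left (and right)
    pad_y = (new_h - new_unpad_h) / 2                  # padding on the top (and bottom)
    return r, (pad_x, pad_y)

def unletterbox_point(x, y, r, pad):
    """Map a point from letterboxed space back to original image coordinates."""
    pad_x, pad_y = pad
    return (x - pad_x) / r, (y - pad_y) / r

r, pad = letterbox_params(720, 1280)
print(r, pad)                               # 0.5 (0.0, 140.0)
print(unletterbox_point(320, 320, r, pad))  # (640.0, 360.0)
```

The padding is subtracted once because `letterbox` returns the per-side padding (`dw` and `dh` are already halved before they are returned).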



## 3. YOLOv8 — Ultralytics API

### 3.1 Inference on a single image
Put a test image at `data/images/test.jpg` (the later sections reuse it), or set `image_path` below to your own file.


In [26]:

ensure_dir("data/images"); ensure_dir("outputs")

image_path = "data/images/test.jpg"  # <-- change to your image (relative to the notebook)
model_name = "yolov8n.pt"           # 'n' is fast; try 's','m' for accuracy

if YOLO is None:
    raise RuntimeError("Ultralytics not installed. Run the pip cell above.")

model = YOLO(model_name)  # auto-downloads weights on first use
img = cv2.imread(image_path)
assert img is not None, f"Image not found at {image_path}. Place an image and re-run."

results = model(img, conf=0.5)
# Plot and save
for i, r in enumerate(results):
    plotted = r.plot()
    out_path = f"outputs/yolov8_image_{i}.jpg"
    cv2.imwrite(out_path, plotted)
    print("Saved:", out_path)


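_Note:_ A path with a leading `/`, such as `/images/faces/face_sample.jpg`, is resolved from the filesystem root rather than from the notebook folder, and fails with:

PermissionError: [Errno 13] Permission denied: '/images/faces/face_sample.jpg'

Use a path relative to the notebook instead.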

In [21]:
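import os

faces_dir = "images/faces/"  # relative to the notebook's working directory
print("faces directory exists?", os.path.isdir(faces_dir))
print("files inside faces/:" if os.path.isdir(faces_dir) else "no faces/ folder")

# List the same folder that was checked above
if os.path.isdir(faces_dir):
    print(os.listdir(faces_dir))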


faces directory exists? True
files inside faces/:
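
The difference between the two spellings of the path can be checked without touching the disk. This is a minimal sketch using `pathlib.PurePosixPath`, so it behaves the same on every OS; the two example paths are the ones used in the cells above.

```python
from pathlib import PurePosixPath

paths = ["/images/faces/face_sample.jpg", "images/faces/face_sample.jpg"]
for p in paths:
    # is_absolute() is True when the path starts at the filesystem root
    kind = "absolute (filesystem root)" if PurePosixPath(p).is_absolute() else "relative (working directory)"
    print(f"{p} -> {kind}")
```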



### 3.2 Webcam / Video inference (press **q** to quit)

- Set `source = 0` for default webcam, or provide a video path.
- Set `save_vid = True` to write an output video under `outputs/`.


In [None]:

source = 0  # 0 = default webcam; or e.g., "data/videos/input.mp4"
save_vid = True
out_path = "outputs/yolov8_ultralytics_out.mp4"

model = YOLO("yolov8n.pt")

# Probe the source once for its frame rate, then release it: Ultralytics opens
# the source itself, and most webcams cannot be opened twice at the same time.
cap = cv2.VideoCapture(source)
assert cap.isOpened(), f"Cannot open source: {source}"
fps_out = cap.get(cv2.CAP_PROP_FPS) or 30
cap.release()

writer = None
# stream=True yields results frame by frame instead of collecting them in memory.
# (Use model.track(...) instead of model.predict(...) if you also want track IDs.)
for r in model.predict(source=source, stream=True, conf=0.5, verbose=False):
    frame = r.plot()  # annotated BGR frame
    if save_vid and writer is None:
        # Create the writer lazily so its size matches the frames actually produced.
        ensure_dir("outputs")
        h, w = frame.shape[:2]
        writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps_out, (w, h))
    if writer is not None:
        writer.write(frame)
    cv2.imshow("YOLOv8 (Ultralytics)", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

if writer is not None:
    writer.release()
cv2.destroyAllWindows()
print("Done. Saved to:", out_path if save_vid else "(not saved)")



## 4. MobileNet‑SSD — OpenCV DNN

We'll use the classic Caffe MobileNet‑SSD pretrained on VOC.  
The next cell will **download** the model files (internet required). If you're offline, manually place them under `models/ssd/`:

- `MobileNetSSD_deploy.prototxt`
- `MobileNetSSD_deploy.caffemodel`


In [None]:

ensure_dir("models/ssd")
import urllib.request

files = {
    "models/ssd/MobileNetSSD_deploy.prototxt":
        "https://raw.githubusercontent.com/chuanqi305/MobileNet-SSD/master/MobileNetSSD_deploy.prototxt",
    "models/ssd/MobileNetSSD_deploy.caffemodel":
        "https://github.com/chuanqi305/MobileNet-SSD/raw/master/MobileNetSSD_deploy.caffemodel",
}

for dst, url in files.items():
    if not Path(dst).exists():
        try:
            print("Downloading", url, "->", dst)
            urllib.request.urlretrieve(url, dst)
        except Exception as e:
            print("Could not download:", url, "| Error:", e)
    else:
        print("Exists:", dst)

# VOC 20 classes used by the original MobileNet-SSD
VOC_CLASSES = ["background","aeroplane","bicycle","bird","boat","bottle","bus","car","cat","chair",
               "cow","diningtable","dog","horse","motorbike","person","pottedplant","sheep","sofa",
               "train","tvmonitor"]


### 4.1 SSD — Image inference

In [None]:

proto = "models/ssd/MobileNetSSD_deploy.prototxt"
caffe = "models/ssd/MobileNetSSD_deploy.caffemodel"
net = cv2.dnn.readNetFromCaffe(proto, caffe)

img_path = "data/images/test.jpg"  # reuse your image
img = cv2.imread(img_path); assert img is not None, "Place an image at data/images/test.jpg"

h, w = img.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 0.007843, (300, 300), 127.5)
net.setInput(blob)
dets = net.forward()

for i in range(dets.shape[2]):
    conf = float(dets[0, 0, i, 2])
    if conf >= 0.5:
        cls_id = int(dets[0, 0, i, 1])
        x1, y1, x2, y2 = (dets[0, 0, i, 3:7] * np.array([w, h, w, h])).astype(int)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        label = f"{VOC_CLASSES[cls_id]} {conf:.2f}"
        put_label(img, label, (x1, max(0, y1-6)))

out_path = "outputs/ssd_image.jpg"
ensure_dir("outputs")
cv2.imwrite(out_path, img)
print("Saved:", out_path)
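
The loop above indexes `dets[0, 0, i, :]` because the SSD detection output has shape `(1, 1, N, 7)`, where each row is `[image_id, class_id, confidence, x1, y1, x2, y2]` with corners normalised to `[0, 1]`. The sketch below decodes two made-up rows in plain Python; `decode_ssd_row` is a hypothetical helper, and the clamping step is an addition that the cell above does not perform.

```python
# Made-up detections for illustration: [image_id, class_id, confidence, x1, y1, x2, y2]
rows = [
    [0.0, 15.0, 0.92, 0.125, 0.25, 0.5, 0.875],   # class 15 = "person" in VOC_CLASSES
    [0.0,  7.0, 0.30, 0.5, 0.5, 0.75, 0.75],      # class 7 = "car", below the threshold
]

def decode_ssd_row(row, img_w, img_h, conf_thres=0.5):
    """Return (class_id, confidence, (x1, y1, x2, y2)) in pixels, or None if filtered out."""
    _, cls_id, conf, x1, y1, x2, y2 = row
    if conf < conf_thres:
        return None

    def clamp(v, hi):
        # SSD can predict corners slightly outside the image, so clamp to valid pixels.
        return max(0, min(int(v), hi - 1))

    box = (clamp(x1 * img_w, img_w), clamp(y1 * img_h, img_h),
           clamp(x2 * img_w, img_w), clamp(y2 * img_h, img_h))
    return int(cls_id), conf, box

for row in rows:
    print(decode_ssd_row(row, img_w=640, img_h=480))
# (15, 0.92, (80, 120, 320, 420))
# None
```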


### 4.2 SSD — Webcam/Video (press **q** to quit)

In [None]:

source = 0  # webcam or path to mp4
cap = cv2.VideoCapture(source)
assert cap.isOpened(), f"Cannot open source: {source}"

ensure_dir("outputs"); out_path = "outputs/ssd_out.mp4"  # VideoWriter fails silently if the folder is missing
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) or 640
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) or 480
fps_out = cap.get(cv2.CAP_PROP_FPS) or 30
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter(out_path, fourcc, fps_out, (w, h))

net = cv2.dnn.readNetFromCaffe("models/ssd/MobileNetSSD_deploy.prototxt",
                               "models/ssd/MobileNetSSD_deploy.caffemodel")

fps_gen = fps_counter()
while True:
    ret, frame = cap.read()
    if not ret: break

    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    dets = net.forward()

    for i in range(dets.shape[2]):
        conf = float(dets[0, 0, i, 2])
        if conf >= 0.5:
            cls_id = int(dets[0, 0, i, 1])
            hF, wF = frame.shape[:2]
            x1, y1, x2, y2 = (dets[0, 0, i, 3:7] * np.array([wF, hF, wF, hF])).astype(int)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            put_label(frame, f"{VOC_CLASSES[cls_id]}", (x1, max(0, y1-6)))

    fps = next(fps_gen)
    put_label(frame, f"FPS: {fps:.1f}", (10, 30), (0, 255, 255))

    writer.write(frame)
    cv2.imshow("SSD DNN", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release(); writer.release(); cv2.destroyAllWindows()
print("Saved:", out_path)
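
An instantaneous FPS value tends to jump from frame to frame, which makes the overlay hard to read. One option, sketched here under the assumption that a simple exponential moving average is good enough, is to smooth the reading. `SmoothedFPS` is a hypothetical helper, not part of OpenCV or of this notebook's utilities.

```python
import time

class SmoothedFPS:
    """Exponential moving average of the frame rate (hypothetical helper)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha   # weight given to the newest measurement
        self.prev = None     # timestamp of the previous frame
        self.fps = 0.0

    def tick(self, now=None):
        """Call once per frame; returns the smoothed FPS so far."""
        now = time.perf_counter() if now is None else now
        if self.prev is not None and now > self.prev:
            inst = 1.0 / (now - self.prev)
            # Seed with the first real measurement, then blend old and new.
            if self.fps == 0.0:
                self.fps = inst
            else:
                self.fps = (1 - self.alpha) * self.fps + self.alpha * inst
        self.prev = now
        return self.fps

meter = SmoothedFPS(alpha=0.5)
for t in (0.0, 0.1, 0.2, 0.25):   # synthetic timestamps in seconds
    fps = meter.tick(now=t)
print(round(fps, 1))  # 15.0
```

Inside a capture loop, `meter.tick()` would be called once per frame in place of `next(fps_gen)`.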



## 5. YOLOv8 → ONNX (export) and OpenCV DNN

### 5.1 Export weights to ONNX
Run this once to generate `yolov8n.onnx` in `models/yolo/`. You need Ultralytics installed.


In [None]:

ensure_dir("models/yolo")
if YOLO is None:
    raise RuntimeError("Ultralytics not installed. Run the pip cell above.")
model = YOLO("yolov8n.pt")
onnx_path = "models/yolo/yolov8n.onnx"
# export() writes the .onnx file next to the weights and returns its path,
# so move the result to the location the later cells expect.
exported = model.export(format="onnx", opset=12, simplify=True, imgsz=640, dynamic=False, half=False)
Path(exported).replace(onnx_path)
print("Exported:", onnx_path)
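
Before writing the post-processing, it helps to know what the exported network returns. Assuming the stock COCO model with the standard three detection scales (strides 8, 16 and 32), a 640×640 input should give an output of shape `(1, 84, 8400)`: 4 box values plus 80 class scores for each of 8400 grid cells. The function below is a hypothetical helper that only reproduces this arithmetic; it does not load the model.

```python
def yolov8_output_shape(imgsz=640, num_classes=80, strides=(8, 16, 32)):
    """Expected (batch, channels, predictions) for a YOLOv8 detection export."""
    num_preds = sum((imgsz // s) ** 2 for s in strides)  # one prediction per grid cell
    return (1, 4 + num_classes, num_preds)               # cx, cy, w, h + class scores

print(yolov8_output_shape())           # (1, 84, 8400)
print(yolov8_output_shape(imgsz=320))  # (1, 84, 2100)
```

Note that there is no separate objectness channel in this layout, and that the predictions run along the last axis, so the array usually needs a transpose before row-wise processing.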


### 5.2 OpenCV DNN inference (ONNX) — helpers

In [None]:

def yolo_postprocess(outputs: np.ndarray, img_shape, conf_thres=0.25, iou_thres=0.45, num_classes=80):
    # YOLOv8 ONNX exports return (1, 4 + num_classes, N), e.g. (1, 84, 8400) for COCO:
    # each prediction is cx, cy, w, h followed by class scores (no objectness term).
    # YOLOv5-style exports return (1, N, 5 + num_classes) with objectness at index 4.
    if outputs.ndim == 3:
        outputs = outputs[0]
    if outputs.shape[0] < outputs.shape[1]:
        outputs = outputs.T  # (4 + nc, N) -> (N, 4 + nc)

    if outputs.shape[1] == 5 + num_classes:
        class_scores = outputs[:, 5:] * outputs[:, 4:5]  # class score * objectness
    else:
        class_scores = outputs[:, 4:]

    # Best class per prediction, then confidence filtering
    class_ids = np.argmax(class_scores, axis=1)
    scores = class_scores[np.arange(len(class_ids)), class_ids]
    keep = scores >= conf_thres
    if not np.any(keep):
        return [], [], []
    boxes, scores, class_ids = outputs[keep, :4], scores[keep], class_ids[keep]

    # cx,cy,w,h -> x1,y1,x2,y2 (still in the 640x640 letterboxed space)
    cx, cy, bw, bh = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = cx - bw / 2; y1 = cy - bh / 2
    boxes_xyxy = np.stack([x1, y1, x1 + bw, y1 + bh], axis=1)

    # cv2.dnn.NMSBoxes expects boxes as [x, y, width, height], not corner pairs
    idxs = cv2.dnn.NMSBoxes(
        bboxes=np.stack([x1, y1, bw, bh], axis=1).tolist(),
        scores=scores.tolist(),
        score_threshold=conf_thres,
        nms_threshold=iou_thres
    )
    idxs = np.array(idxs).flatten().tolist() if len(idxs) > 0 else []
    return boxes_xyxy[idxs], scores[idxs].tolist(), class_ids[idxs].tolist()

def scale_coords(resized_shape, boxes_xyxy, original_shape, ratio_pad):
    # Map boxes from letterboxed image space back to original image space.
    # ratio_pad is the (dw, dh) returned by letterbox(): the padding added on EACH side,
    # so it is subtracted once (not doubled) before undoing the resize.
    dw, dh = ratio_pad
    gain = min(resized_shape[0] / original_shape[0], resized_shape[1] / original_shape[1])
    boxes = boxes_xyxy.copy()
    boxes[:, [0,2]] -= dw
    boxes[:, [1,3]] -= dh
    boxes[:, :4] /= gain
    # clip to the image bounds
    h, w = original_shape[:2]
    boxes[:, [0,2]] = boxes[:, [0,2]].clip(0, w-1)
    boxes[:, [1,3]] = boxes[:, [1,3]].clip(0, h-1)
    return boxes
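
For intuition about what non-maximum suppression does, here is a minimal pure-Python sketch of the greedy, class-agnostic variant. It is an illustration of the idea, not OpenCV's implementation, and `iou` and `greedy_nms` are hypothetical names. It works on corner boxes `(x1, y1, x2, y2)`, whereas `cv2.dnn.NMSBoxes` takes boxes as `[x, y, width, height]`.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_thres=0.45):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep

boxes = [(100, 100, 200, 200), (110, 110, 210, 210), (300, 300, 400, 400)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, above the 0.45 threshold, so it is suppressed; the third box does not overlap anything and is kept.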


### 5.3 ONNX — Image inference

In [None]:

onnx_path = "models/yolo/yolov8n.onnx"
net = cv2.dnn.readNetFromONNX(onnx_path)

img_path = "data/images/test.jpg"
img0 = cv2.imread(img_path); assert img0 is not None, "Place an image at data/images/test.jpg"
img, r, (dw, dh) = letterbox(img0, (640, 640))
blob = cv2.dnn.blobFromImage(img, 1/255.0, (640, 640), swapRB=True, crop=False)
net.setInput(blob)
out = net.forward()  # YOLOv8 at 640x640: shape (1, 84, 8400) = 4 box values + 80 class scores per prediction

boxes_xyxy, scores, class_ids = yolo_postprocess(out, img.shape, conf_thres=0.35, iou_thres=0.5)
if len(boxes_xyxy):
    # scale back to original image
    boxes_xyxy = scale_coords((640,640), boxes_xyxy, img0.shape, (dw, dh)).astype(int)
    for (x1,y1,x2,y2), s, cid in zip(boxes_xyxy, scores, class_ids):
        cv2.rectangle(img0, (x1,y1), (x2,y2), (0,255,0), 2)
        cls = COCO_NAMES[cid] if cid < len(COCO_NAMES) else str(cid)
        put_label(img0, f"{cls} {s:.2f}", (x1, max(0, y1-6)))

ensure_dir("outputs")
out_path = "outputs/yolov8_onnx_image.jpg"
cv2.imwrite(out_path, img0)
print("Saved:", out_path)


### 5.4 ONNX — Webcam/Video (press **q** to quit)

In [None]:

source = 0  # webcam or path to mp4
cap = cv2.VideoCapture(source)
assert cap.isOpened(), f"Cannot open source: {source}"
ensure_dir("outputs")  # VideoWriter fails silently if the folder is missing
out_path = "outputs/yolov8_onnx_out.mp4"

w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) or 640
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) or 480
fps_out = cap.get(cv2.CAP_PROP_FPS) or 30

net = cv2.dnn.readNetFromONNX("models/yolo/yolov8n.onnx")

writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps_out, (w, h))
fps_gen = fps_counter()

while True:
    ret, frame = cap.read()
    if not ret: break

    img, r, (dw, dh) = letterbox(frame, (640, 640))
    blob = cv2.dnn.blobFromImage(img, 1/255.0, (640, 640), swapRB=True, crop=False)
    net.setInput(blob)
    out = net.forward()

    boxes_xyxy, scores, class_ids = yolo_postprocess(out, img.shape, conf_thres=0.35, iou_thres=0.5)
    if len(boxes_xyxy):
        boxes_xyxy = scale_coords((640,640), boxes_xyxy, frame.shape, (dw, dh)).astype(int)
        for (x1,y1,x2,y2), s, cid in zip(boxes_xyxy, scores, class_ids):
            cv2.rectangle(frame, (x1,y1), (x2,y2), (0,255,0), 2)
            cls = COCO_NAMES[cid] if cid < len(COCO_NAMES) else str(cid)
            put_label(frame, f"{cls} {s:.2f}", (x1, max(0, y1-6)))

    fps = next(fps_gen)
    put_label(frame, f"FPS: {fps:.1f}", (10, 30), (0, 255, 255))

    writer.write(frame)
    cv2.imshow("YOLOv8 ONNX (OpenCV DNN)", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release(); writer.release(); cv2.destroyAllWindows()
print("Saved:", out_path)


## 6. Simple FPS Benchmark (Image batch)

In [None]:

import glob

# Put a few images under data/images/
imgs = [cv2.imread(p) for p in sorted(glob.glob("data/images/*"))[:8]]
imgs = [im for im in imgs if im is not None]
assert imgs, "Add some images into data/images/"

# YOLOv8 (Ultralytics) single-image timing
model = YOLO("yolov8n.pt")
t0 = time.time()
for im in imgs:
    _ = model.predict(source=im, conf=0.5, verbose=False)
t1 = time.time()
print(f"YOLOv8 Ultralytics: {(len(imgs)/(t1-t0)):.2f} FPS (images/sec)")

# SSD (OpenCV) timing
net = cv2.dnn.readNetFromCaffe("models/ssd/MobileNetSSD_deploy.prototxt",
                               "models/ssd/MobileNetSSD_deploy.caffemodel")
t0 = time.time()
for im in imgs:
    blob = cv2.dnn.blobFromImage(cv2.resize(im, (300, 300)), 0.007843, (300, 300), 127.5)
    net.setInput(blob); _ = net.forward()
t1 = time.time()
print(f"MobileNet-SSD DNN: {(len(imgs)/(t1-t0)):.2f} FPS (images/sec)")

# YOLOv8 ONNX (OpenCV DNN) timing
net = cv2.dnn.readNetFromONNX("models/yolo/yolov8n.onnx")
t0 = time.time()
for im in imgs:
    img, r, (dw, dh) = letterbox(im, (640, 640))
    blob = cv2.dnn.blobFromImage(img, 1/255.0, (640, 640), swapRB=True, crop=False)
    net.setInput(blob); _ = net.forward()
t1 = time.time()
print(f"YOLOv8 ONNX (DNN): {(len(imgs)/(t1-t0)):.2f} FPS (images/sec)")
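
The timings above include each model's first call, which is often slower than the rest because of lazy initialisation. A slightly more careful measurement, sketched here with a stand-in workload so that it runs without any model files, excludes a few warm-up calls and uses `time.perf_counter`. The `benchmark` and `fake_infer` names are hypothetical.

```python
import time

def benchmark(fn, inputs, warmup=2, repeats=3):
    """Return throughput in items/sec, excluding warm-up calls from the timing."""
    for x in inputs[:warmup]:
        fn(x)                        # warm-up: lazy initialisation, caches, etc.
    t0 = time.perf_counter()         # monotonic, high-resolution timer
    for _ in range(repeats):
        for x in inputs:
            fn(x)
    elapsed = time.perf_counter() - t0
    return repeats * len(inputs) / elapsed

def fake_infer(x):
    """Stand-in for a model call; replace with real inference."""
    return sum(i * i for i in range(10_000))

fps = benchmark(fake_infer, inputs=list(range(8)))
print(f"{fps:.1f} items/sec")
```

In this notebook, `fn` could be something like `lambda im: model.predict(source=im, conf=0.5, verbose=False)` with `inputs=imgs`.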



## 7. Wrap‑Up & Next Steps

- Try different YOLOv8 sizes: `yolov8s.pt`, `yolov8m.pt` for accuracy vs speed
- Replace COCO names with your dataset and fine‑tune a custom model (Ultralytics `model.train(...)`)
- Compare FPS on CPU vs GPU (Torch vs OpenCV DNN vs ONNX Runtime)

**Suggested Exercises**
1. Replace YOLOv8 with YOLOv5 or YOLOv7 and compare FPS/accuracy.
2. Benchmark SSD vs YOLOv8 on your machine (CPU/GPU) and record results in a table.
3. Train a YOLO model on a small custom dataset and test in real‑time.
4. Use OpenCV DNN to run a COCO-trained ONNX model **without** the Ultralytics API (done above!).

**Notes**
- This notebook follows the CV Lab manual’s Chapter 11 outline (Real-Time Object Detection).  
- If `cv2.imshow` windows do not appear in your environment, run the code as a local Python script, or make sure a GUI-enabled OpenCV build is installed (`opencv-python`, not `opencv-python-headless`).
