# Week 8: Computer Vision III ‚Äî Object Detection
## COMP 9130 ‚Äî Applied Artificial Intelligence
## **Student Activity Notebook**

---

**Business Context:** CityView Traffic Analytics ‚Äî a municipal transportation department building an automated traffic monitoring system to analyze intersection safety, count vehicles, and detect pedestrians.

**Tool:** YOLO26 via Ultralytics

**Task 1 Dataset:** Sample traffic scene images + pre-trained YOLO26 (COCO weights ‚Äî already detects cars, trucks, buses, pedestrians, traffic lights, stop signs)

**Task 2‚Äì3 Dataset:** Road sign detection dataset from Roboflow (~700 images, 4 classes: traffic light, stop, speed limit, crosswalk)

**‚è∞ TIMING GUIDE:**
| Time | Activity | Points |
|------|----------|--------|
| 0:00‚Äì0:20 | Quiz 7 (Object Detection Concepts from Prep) | 5 pts |
| 0:20‚Äì0:30 | Setup & Installation | ‚Äî |
| 0:30‚Äì1:20 | Task 1: Pre-trained YOLO for Traffic Monitoring | 5 pts |
| 1:20‚Äì1:30 | Break | ‚Äî |
| 1:30‚Äì2:20 | Task 2: Explore Object Detection Datasets & Annotations | 5 pts |
| 2:20‚Äì2:55 | Task 3: Fine-tune YOLO26 on Custom Data | 5 pts |
| 2:55‚Äì3:00 | Wrap-up & Mini Project 7 Introduction | ‚Äî |

**üéØ KEY LEARNING GOALS:**
1. Understand the difference between classification, detection, and segmentation
2. Use a pre-trained YOLO model for inference on real images
3. Understand YOLO annotation format (class, x_center, y_center, width, height)
4. Fine-tune YOLO26 on a custom dataset and evaluate with IoU and mAP

**‚ö†Ô∏è COMMON STUDENT STRUGGLES:**
1. Confusing classification (one label per image) vs. detection (multiple boxes per image)
2. YOLO annotation format ‚Äî normalized coordinates, center-based, one .txt per image
3. Understanding confidence thresholds ‚Äî too high misses objects, too low gives false positives
4. IoU concept ‚Äî students may confuse it with accuracy
5. mAP calculation ‚Äî it's averaged over IoU thresholds AND classes
6. Dataset directory structure ‚Äî YOLO expects images/ and labels/ folders in specific layout

**üìå KEY DIFFERENCE FROM WEEKS 6‚Äì7:**
- Weeks 6‚Äì7: Classification = one label per image, trained with Keras/TensorFlow
- Week 8: Detection = multiple bounding boxes per image, trained with Ultralytics/PyTorch
- Students will notice the API is completely different ‚Äî that's intentional!

---

## Setup & Installation

**‚ö†Ô∏è IMPORTANT:** If students see warnings about `albumentations` or `wandb`, those are optional and can be ignored.

In [None]:
# ============================================
# SETUP & INSTALLATION
# ============================================

# Install required packages
!pip install -q ultralytics roboflow

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
import os
import glob
import random
import yaml
from pathlib import Path
from collections import Counter

# YOLO imports
from ultralytics import YOLO

# Set random seeds for reproducibility
np.random.seed(42)
random.seed(42)

# Verify environment
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
else:
    print("‚ö†Ô∏è No GPU detected ‚Äî training will be slow!")
    print("Go to: Runtime ‚Üí Change runtime type ‚Üí T4 GPU")

---

# Task 1: Pre-trained YOLO for Traffic Monitoring

**Key concepts to emphasize:**
1. YOLO = "You Only Look Once" ‚Äî processes the entire image in one forward pass
2. Each detection has: bounding box (x, y, w, h), class label, confidence score
3. The model outputs MANY candidate boxes ‚Üí Non-Maximum Suppression (NMS) filters them
4. Confidence threshold controls the trade-off between precision and recall

---

## Part A: Load Pre-trained YOLO26

In [None]:
# ============================================
# LOAD PRE-TRAINED YOLO26 MODEL
# ============================================

# yolo26n (nano)    ‚Äî 2.4M params, fastest, least accurate
# yolo26s (small)   ‚Äî 9.5M params
# yolo26m (medium)  ‚Äî 20.4M params
# yolo26l (large)   ‚Äî 25.3M params
# yolo26x (xlarge)  ‚Äî 59.1M params, slowest, most accurate
# We use 'n' (nano) for speed in class.

model_pretrained = YOLO('yolo26n.pt')  # Downloads automatically

# Explore model info
print("üìã Model Summary:")
print(f"  Model type: YOLO26n (Nano)")
print(f"  Task: Object Detection")
print(f"  Number of classes: {len(model_pretrained.names)}")
print(f"\nüè∑Ô∏è COCO Classes (first 20):")
for i, name in list(model_pretrained.names.items())[:20]:
    print(f"  {i}: {name}")

# Show traffic-relevant classes
traffic_classes = ['car', 'truck', 'bus', 'motorcycle', 'bicycle',
                   'person', 'traffic light', 'stop sign']
print(f"\nüöó Traffic-relevant COCO classes:")
for name in traffic_classes:
    class_id = [k for k, v in model_pretrained.names.items() if v == name]
    if class_id:
        print(f"  Class {class_id[0]}: {name}")

## Part B: Download Sample Traffic Images



In [None]:
# ============================================
# DOWNLOAD SAMPLE TRAFFIC IMAGES
# ============================================

# These reliably download and contain traffic-relevant objects.

os.makedirs('sample_images', exist_ok=True)

sample_urls = [
    "https://ultralytics.com/images/bus.jpg",
    "https://ultralytics.com/images/zidane.jpg",
]

import urllib.request
downloaded_images = []
for url in sample_urls:
    filename = os.path.join('sample_images', os.path.basename(url))
    try:
        urllib.request.urlretrieve(url, filename)
        downloaded_images.append(filename)
        print(f"‚úÖ Downloaded: {filename}")
    except Exception as e:
        print(f"‚ö†Ô∏è Could not download {url}: {e}")

# Display the sample images
fig, axes = plt.subplots(1, len(downloaded_images),
                          figsize=(7 * len(downloaded_images), 7))
if len(downloaded_images) == 1:
    axes = [axes]
for ax, img_path in zip(axes, downloaded_images):
    img = Image.open(img_path)
    ax.imshow(img)
    ax.set_title(os.path.basename(img_path), fontsize=12)
    ax.axis('off')
plt.suptitle('Sample Images (Before Detection)', fontsize=14)
plt.tight_layout()
plt.show()

## Part C: Run Detection & Visualize Results



In [None]:
# ============================================
# RUN DETECTION ON SAMPLE IMAGES
# ============================================

# Each result contains boxes, confidence scores, and class IDs.
# conf=0.25 means only show detections with >25% confidence.

results = model_pretrained(downloaded_images, conf=0.25)

# Display results with detailed breakdown
for i, result in enumerate(results):
    print(f"\n{'=' * 60}")
    print(f"Image: {os.path.basename(downloaded_images[i])}")
    print(f"{'=' * 60}")

    boxes = result.boxes
    print(f"\nüì¶ Detections found: {len(boxes)}")
    print(f"{'‚îÄ' * 55}")
    print(f"{'Class':<20} {'Confidence':<12} {'Box (x1,y1,x2,y2)'}")
    print(f"{'‚îÄ' * 55}")

    for box in boxes:
        cls_id = int(box.cls[0])
        conf = float(box.conf[0])
        xyxy = box.xyxy[0].cpu().numpy()
        class_name = model_pretrained.names[cls_id]
        print(f"{class_name:<20} {conf:<12.3f} "
              f"[{xyxy[0]:.0f}, {xyxy[1]:.0f}, {xyxy[2]:.0f}, {xyxy[3]:.0f}]")

    # Plot with bounding boxes
    fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    annotated = result.plot()  # Returns BGR numpy array
    ax.imshow(annotated[..., ::-1])  # Convert BGR to RGB
    ax.set_title(f"Detections: {os.path.basename(downloaded_images[i])}",
                 fontsize=14)
    ax.axis('off')
    plt.tight_layout()
    plt.show()

## Part D: Explore Confidence Thresholds



In [None]:
# ============================================
# CONFIDENCE THRESHOLD COMPARISON
# ============================================

test_image = downloaded_images[0]  # Use the bus image
thresholds = [0.10, 0.25, 0.50, 0.75]

fig, axes = plt.subplots(1, 4, figsize=(24, 6))

for ax, thresh in zip(axes, thresholds):
    result = model_pretrained(test_image, conf=thresh, verbose=False)
    annotated = result[0].plot()
    ax.imshow(annotated[..., ::-1])
    n_detections = len(result[0].boxes)
    ax.set_title(f"Confidence ‚â• {thresh}\n({n_detections} detections)",
                 fontsize=12)
    ax.axis('off')

plt.suptitle('Effect of Confidence Threshold on Detection Count',
             fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("\nüí° Key Insight for CityView:")
print("  Low threshold (0.10): More detections, some false positives")
print("  High threshold (0.75): Fewer detections, but very confident")
print("  ‚Ä¢ Pedestrian safety ‚Üí lower threshold (don't miss anyone!)")
print("  ‚Ä¢ Vehicle counting ‚Üí higher threshold (accuracy matters more)")

## Part E: Filter for Traffic-Relevant Classes



In [None]:
# ============================================
# FILTER DETECTIONS BY CLASS
# ============================================

# COCO class IDs for traffic objects
traffic_class_ids = [0, 1, 2, 3, 5, 7, 9, 11]
# 0=person, 1=bicycle, 2=car, 3=motorcycle, 5=bus, 7=truck,
# 9=traffic light, 11=stop sign

# Compare filtered vs unfiltered
result_all = model_pretrained(test_image, conf=0.25, verbose=False)
result_traffic = model_pretrained(test_image, conf=0.25,
                                   classes=traffic_class_ids,
                                   verbose=False)

fig, axes = plt.subplots(1, 2, figsize=(18, 7))

axes[0].imshow(result_all[0].plot()[..., ::-1])
axes[0].set_title(f"All COCO classes ({len(result_all[0].boxes)} detections)",
                   fontsize=13)
axes[0].axis('off')

axes[1].imshow(result_traffic[0].plot()[..., ::-1])
axes[1].set_title(f"Traffic classes only ({len(result_traffic[0].boxes)} "
                   f"detections)", fontsize=13)
axes[1].axis('off')

plt.suptitle('CityView: Filtering for Traffic-Relevant Objects', fontsize=14)
plt.tight_layout()
plt.show()

# Traffic counting report
print("\nüìä CityView Traffic Report:")
print("=" * 40)
class_counts = Counter()
for box in result_traffic[0].boxes:
    cls_name = model_pretrained.names[int(box.cls[0])]
    class_counts[cls_name] += 1
for cls_name, count in class_counts.most_common():
    print(f"  {cls_name}: {count}")
print(f"  Total traffic objects: {sum(class_counts.values())}")

‚úÖ Task 1 Analysis
Answer these questions based on YOUR detection results:

Q1. How many objects did YOLO26 detect in the bus image at conf=0.25? List the classes and counts.

5 objects, 1 bus and 4 peoples

Q2. When you changed the confidence threshold from 0.10 to 0.75, how did the number of detections change? Relate this to the precision-recall trade-off from Week 3.

Low threshold (0.10): More detections, some false positives

High threshold (0.75): Fewer detections, but very confident

Q3. For CityView's pedestrian safety application, would you use a high or low confidence threshold? Why?

Pedestrian safety ‚Üí lower threshold (don't miss anyone!)

Q4. Name at least 3 differences between image classification (Weeks 6‚Äì7) and object detection (today).

output format - image classification outputs a single class probability distribution for the entire image. Ojbect detecction outpus
number of objects - lassification generally assigns one primary label to an image (or a set of labels to the whole scene). Detection identifies, separates, and counts multiple individual objects within the exact same image.
spatial information (localization) - Classification tells you what is in the image but not where it is. Object detection provides exact spatial localization, drawing a boundary around the exact pixels the object occupies.
Call instructor to verify completion!

---

# Task 2: Explore Object Detection Datasets & Annotations

**Key concepts to emphasize:**
1. YOLO format: each image has a matching .txt file with one line per object
2. Each line: `class_id x_center y_center width height` (all normalized 0‚Äì1)
3. Dataset must be split into train/val (and optionally test)
4. data.yaml tells YOLO where to find images/labels and what classes exist

---

## Part A: Download Road Sign Dataset from Roboflow

In [None]:
# ============================================
# DOWNLOAD ROAD SIGN DATASET
# ============================================

# road sign detection dataset in YOLO format.
#
# SETUP BEFORE CLASS:
# 1. Go to https://app.roboflow.com ‚Üí sign up (free)
# 2. Go to Settings ‚Üí API Keys ‚Üí copy your key
# 3. Replace the API_KEY below
# 4. Dataset: Road Signs from RF100 benchmark (public)

from roboflow import Roboflow

# ‚ö†Ô∏è INSTRUCTOR: Replace with your Roboflow API key
ROBOFLOW_API_KEY = "EPeijRTxqpNY9q8JNtH1"

rf = Roboflow(api_key=ROBOFLOW_API_KEY)
project = rf.workspace("roboflow-100").project("road-signs-6ih4y")
version = project.version(2)
dataset = version.download("yolov8")

DATASET_DIR = dataset.location
print(f"\n‚úÖ Dataset downloaded to: {DATASET_DIR}")

In [None]:
# ============================================
# FALLBACK: If Roboflow is unavailable
# ============================================

# This downloads from a backup URL or uses a local copy.

# import zipfile
# !wget -q "BACKUP_URL_HERE" -O road_signs.zip
# with zipfile.ZipFile('road_signs.zip', 'r') as z:
#     z.extractall('road_signs_dataset')
# DATASET_DIR = 'road_signs_dataset'
# print(f"‚úÖ Fallback dataset extracted to: {DATASET_DIR}")

## Part B: Explore Dataset Structure



In [None]:
# ============================================
# EXPLORE DATASET STRUCTURE
# ============================================

# Show directory tree
print("üìÇ Dataset Structure:")
print("=" * 50)
for root, dirs, files in os.walk(DATASET_DIR):
    level = root.replace(DATASET_DIR, '').count(os.sep)
    indent = '  ' * level
    folder_name = os.path.basename(root)
    if level <= 2:
        n_files = len(files)
        print(f"{indent}üìÅ {folder_name}/ ({n_files} files)")

# Count images per split
print("\nüìä Dataset Statistics:")
print("=" * 50)
for split in ['train', 'valid', 'test']:
    img_dir = os.path.join(DATASET_DIR, split, 'images')
    if os.path.exists(img_dir):
        n_images = len(glob.glob(os.path.join(img_dir, '*')))
        print(f"  {split:>8}: {n_images} images")

# Read and display data.yaml
yaml_path = os.path.join(DATASET_DIR, 'data.yaml')
with open(yaml_path, 'r') as f:
    data_config = yaml.safe_load(f)

print(f"\nüìã data.yaml Configuration:")
print("=" * 50)
for key, value in data_config.items():
    print(f"  {key}: {value}")

## Part C: Understand YOLO Annotation Format

```
class_id  x_center  y_center  width  height
```

All coordinates are **normalized** (0 to 1). Draw this on the whiteboard:
```
(0,0) ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ (1,0)
  |                   |
  |    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê      |
  |    ‚îÇ(cx,cy)‚îÇ      |
  |    ‚îÇ   ¬∑   ‚îÇ      |
  |    ‚îî‚îÄ‚îÄ‚îÄw‚îÄ‚îÄ‚îÄ‚îò      |
  |        h          |
(0,1) ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ (1,1)
```

In [None]:
# ============================================
# EXPLORE YOLO ANNOTATION FORMAT
# ============================================

# Get class names from data.yaml
class_names = data_config['names']
print(f"üè∑Ô∏è Classes in this dataset:")
for i, name in enumerate(class_names):
    print(f"  {i}: {name}")

# Pick sample images from training set
train_img_dir = os.path.join(DATASET_DIR, 'train', 'images')
train_lbl_dir = os.path.join(DATASET_DIR, 'train', 'labels')
sample_images = sorted(glob.glob(os.path.join(train_img_dir, '*')))[:5]

# Show raw annotation file contents
for img_path in sample_images[:2]:
    img_name = os.path.splitext(os.path.basename(img_path))[0]
    label_path = os.path.join(train_lbl_dir, img_name + '.txt')

    print(f"\nüìÑ Image: {os.path.basename(img_path)}")
    print(f"üìÑ Label: {os.path.basename(label_path)}")
    print("‚îÄ" * 65)

    if os.path.exists(label_path):
        with open(label_path, 'r') as f:
            lines = f.readlines()
        print(f"  Objects in this image: {len(lines)}")
        print(f"  {'class_id':<10} {'x_center':<10} {'y_center':<10} "
              f"{'width':<10} {'height':<10} {'class_name'}")
        print(f"  {'‚îÄ' * 60}")
        for line in lines:
            parts = line.strip().split()
            cls_id = int(parts[0])
            x_c, y_c, w, h = [float(p) for p in parts[1:]]
            name = class_names[cls_id] if cls_id < len(class_names) else 'unknown'
            print(f"  {cls_id:<10} {x_c:<10.4f} {y_c:<10.4f} "
                  f"{w:<10.4f} {h:<10.4f} {name}")
    else:
        print("  ‚ö†Ô∏è No label file found")

## Part D: Visualize Annotations on Images



In [None]:
# ============================================
# HELPER: Convert YOLO format to pixel coordinates
# ============================================

def yolo_to_pixel(x_center, y_center, width, height, img_w, img_h):
    """
    Convert YOLO normalized coordinates to pixel coordinates.
    YOLO: (x_center, y_center, width, height) ‚Äî all 0 to 1
    Pixel: (x1, y1, x2, y2) ‚Äî top-left and bottom-right corners
    """
    x1 = (x_center - width / 2) * img_w
    y1 = (y_center - height / 2) * img_h
    x2 = (x_center + width / 2) * img_w
    y2 = (y_center + height / 2) * img_h
    return x1, y1, x2, y2


def visualize_yolo_annotations(img_path, label_path, class_names, ax=None):
    """Draw YOLO bounding boxes on an image."""
    img = Image.open(img_path)
    img_w, img_h = img.size

    if ax is None:
        fig, ax = plt.subplots(1, 1, figsize=(10, 8))

    ax.imshow(img)
    colors = plt.cm.Set1(np.linspace(0, 1, max(len(class_names), 1)))

    if os.path.exists(label_path):
        with open(label_path, 'r') as f:
            for line in f.readlines():
                parts = line.strip().split()
                cls_id = int(parts[0])
                x_c, y_c, w, h = [float(p) for p in parts[1:]]
                x1, y1, x2, y2 = yolo_to_pixel(x_c, y_c, w, h, img_w, img_h)

                color = colors[cls_id % len(colors)]
                rect = patches.Rectangle(
                    (x1, y1), x2 - x1, y2 - y1,
                    linewidth=2, edgecolor=color, facecolor='none'
                )
                ax.add_patch(rect)

                name = class_names[cls_id] if cls_id < len(class_names) else f'cls_{cls_id}'
                ax.text(x1, y1 - 5, name, fontsize=10, fontweight='bold',
                       color='white',
                       bbox=dict(boxstyle='round,pad=0.2',
                                 facecolor=color, alpha=0.8))
    ax.axis('off')
    return ax

print("‚úÖ Helper functions defined.")

In [None]:
# ============================================
# VISUALIZE ANNOTATIONS ON SAMPLE IMAGES
# ============================================

# Find images that have annotations
annotated_samples = []
for img_path in sorted(glob.glob(os.path.join(train_img_dir, '*'))):
    img_name = os.path.splitext(os.path.basename(img_path))[0]
    label_path = os.path.join(train_lbl_dir, img_name + '.txt')
    if os.path.exists(label_path) and os.path.getsize(label_path) > 0:
        annotated_samples.append((img_path, label_path))
    if len(annotated_samples) >= 6:
        break

fig, axes = plt.subplots(2, 3, figsize=(18, 12))
for idx, (img_path, label_path) in enumerate(annotated_samples):
    ax = axes[idx // 3, idx % 3]
    visualize_yolo_annotations(img_path, label_path, class_names, ax=ax)
    ax.set_title(os.path.basename(img_path), fontsize=10)

plt.suptitle('Road Sign Dataset ‚Äî Ground Truth Annotations', fontsize=14)
plt.tight_layout()
plt.show()

## Part E: Dataset Statistics



In [None]:
# ============================================
# DATASET STATISTICS
# ============================================

# Count objects per class across training set
class_counter = {name: 0 for name in class_names}
total_objects = 0
objects_per_image = []

for label_path in glob.glob(os.path.join(train_lbl_dir, '*.txt')):
    with open(label_path, 'r') as f:
        lines = f.readlines()
    objects_per_image.append(len(lines))
    for line in lines:
        cls_id = int(line.strip().split()[0])
        if cls_id < len(class_names):
            class_counter[class_names[cls_id]] += 1
        total_objects += 1

print("üìä Training Set Statistics:")
print("=" * 50)
print(f"  Images with annotations: {len(objects_per_image)}")
print(f"  Total objects: {total_objects}")
print(f"  Avg objects/image: {np.mean(objects_per_image):.1f}")
print(f"  Max objects in one image: {max(objects_per_image) if objects_per_image else 0}")
print(f"\nüè∑Ô∏è Objects per Class:")
for name, count in sorted(class_counter.items(), key=lambda x: -x[1]):
    bar = '‚ñà' * (count // 5)
    print(f"  {name:<20} {count:>5}  {bar}")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

names_list = list(class_counter.keys())
counts_list = list(class_counter.values())
colors = plt.cm.Set2(np.linspace(0, 1, len(names_list)))
axes[0].barh(names_list, counts_list, color=colors)
axes[0].set_xlabel('Number of Annotations')
axes[0].set_title('Class Distribution (Training Set)')
for i, (n, c) in enumerate(zip(names_list, counts_list)):
    axes[0].text(c + 2, i, str(c), va='center', fontsize=10)

axes[1].hist(objects_per_image,
             bins=range(0, max(objects_per_image) + 2),
             edgecolor='black', alpha=0.7, color='steelblue')
axes[1].set_xlabel('Objects per Image')
axes[1].set_ylabel('Number of Images')
axes[1].set_title('Objects per Image Distribution')

plt.tight_layout()
plt.show()

‚úÖ Task 2 Analysis
Answer these questions based on YOUR dataset exploration:

Q1. What does "normalized coordinates" mean in YOLO format? Why is this useful?

Normalized coordinates means to map the range to [0, 1] for the x,y,w,h values by dividing by the image width and height. It is more efficient for the model to use smaller, normalized numbers for calculations and there isn't a need to train on images of different resolutions.

Q2. Look at the class distribution. Is the dataset balanced or imbalanced? How might this affect model performance?

The dataset is imbalanced as some classes have fewer (30) images compared to the rest (70+)

Q3. How many files does a YOLO dataset need for each image? What if an image has no objects?

YOLO dataset needs a .txt file describing the type and location of objects in the image. If the image has no objects, it needs an empty .txt file.

Q4. What is data.yaml and what key information does it contain?

The data.yaml file tells the YOLO algorithm where to find the data and how to interpret the numbers in the label files. It contains:

(names of classes in order):

names: ['bus_stop', 'do_not_enter', 'do_not_stop', 'do_not_turn_l', 'do_not_turn_r', 'do_not_u_turn', 'enter_left_lane', 'green_light', 'left_right_lane', 'no_parking', 'parking', 'ped_crossing', 'ped_zebra_cross', 'railway_crossing', 'red_light', 'stop', 't_intersection_l', 'traffic_light', 'u_turn', 'warning', 'yellow_light']

(number of classes)

nc: 21

(versioning information)

roboflow: {'license': 'CC BY 4.0', 'project': 'road-signs-6ih4y', 'url': 'https://universe.roboflow.com/roboflow-100/road-signs-6ih4y/dataset/2', 'version': 2, 'workspace': 'roboflow-100'}

(file paths)

test: ../test/images train: ../train/images val: ../valid/images

Call instructor to verify completion!

---

# Task 3: Fine-tune YOLO26 on Custom Data

**Key points:**
1. Start from COCO pre-trained weights (transfer learning!)
2. Train for 10 epochs in class (more for mini project)
3. IoU = Intersection over Union ‚Äî how well boxes match ground truth
4. mAP = Mean Average Precision ‚Äî the standard detection metric

**‚ö†Ô∏è TIME:** Training takes ~2‚Äì5 min for 10 epochs on T4 GPU. Reduce to 5 epochs if behind.

**Draw IoU on the whiteboard:**
```
IoU = Area of Overlap / Area of Union

   ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
   ‚îÇ Predicted ‚îÇ
   ‚îÇ    ‚îå‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îê
   ‚îÇ    ‚îÇINTER ‚îÇ      ‚îÇ
   ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îº‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò      ‚îÇ
        ‚îÇ Ground Truth ‚îÇ
        ‚îî‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îò

IoU = 0.0 ‚Üí no overlap (terrible)
IoU = 0.5 ‚Üí partial overlap (threshold for "correct")
IoU = 1.0 ‚Üí perfect overlap (ideal)
```

---

## Part A: Train the Model

In [None]:
# ============================================
# FINE-TUNE YOLO26 ON ROAD SIGNS
# ============================================

# but for detection. Key parameters to explain:
# - epochs: passes through training data
# - imgsz: YOLO resizes all images to this
# - batch: adjust if GPU memory errors (try 8)
# - patience: early stopping

model = YOLO('yolo26n.pt')  # Fresh pre-trained model

results = model.train(
    data=os.path.join(DATASET_DIR, 'data.yaml'),
    epochs=20,        # Low for class; use 50+ for real projects
    imgsz=640,        # Standard YOLO input size
    batch=16,         # Reduce to 8 if memory error
    patience=5,       # Early stopping
    lr0=0.005,         # Initial learning rate
    verbose=True,
    project='runs',
    name='road_signs',
    exist_ok=True,
)

# Capture the actual save directory for use in later cells
SAVE_DIR = str(results.save_dir)
print(f"\n‚úÖ Training complete! Results saved to: {SAVE_DIR}")

## Part B: Visualize Training Progress



In [None]:
# ============================================
# VISUALIZE TRAINING CURVES
# ============================================

import pandas as pd

results_csv = os.path.join(SAVE_DIR, 'results.csv')
if os.path.exists(results_csv):
    df = pd.read_csv(results_csv)
    df.columns = df.columns.str.strip()

    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # Box loss
    axes[0, 0].plot(df['epoch'], df['train/box_loss'],
                     label='Train', color='blue')
    axes[0, 0].plot(df['epoch'], df['val/box_loss'],
                     label='Validation', color='red')
    axes[0, 0].set_title('Box Loss (Lower is Better)')
    axes[0, 0].set_xlabel('Epoch')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)

    # Classification loss
    axes[0, 1].plot(df['epoch'], df['train/cls_loss'],
                     label='Train', color='blue')
    axes[0, 1].plot(df['epoch'], df['val/cls_loss'],
                     label='Validation', color='red')
    axes[0, 1].set_title('Classification Loss (Lower is Better)')
    axes[0, 1].set_xlabel('Epoch')
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)

    # mAP50
    axes[1, 0].plot(df['epoch'], df['metrics/mAP50(B)'],
                     color='green', linewidth=2)
    axes[1, 0].set_title('mAP@50 (Higher is Better)')
    axes[1, 0].set_xlabel('Epoch')
    axes[1, 0].set_ylim(0, 1)
    axes[1, 0].grid(True, alpha=0.3)

    # mAP50-95
    axes[1, 1].plot(df['epoch'], df['metrics/mAP50-95(B)'],
                     color='purple', linewidth=2)
    axes[1, 1].set_title('mAP@50-95 (Higher is Better, Stricter)')
    axes[1, 1].set_xlabel('Epoch')
    axes[1, 1].set_ylim(0, 1)
    axes[1, 1].grid(True, alpha=0.3)

    plt.suptitle('YOLO26 Training ‚Äî Road Sign Detection', fontsize=14)
    plt.tight_layout()
    plt.show()

    # Print final metrics
    last = df.iloc[-1]
    print("\nüìä Final Metrics:")
    print("=" * 40)
    print(f"  mAP@50:    {last['metrics/mAP50(B)']:.4f}")
    print(f"  mAP@50-95: {last['metrics/mAP50-95(B)']:.4f}")
    print(f"  Box loss:  {last['val/box_loss']:.4f}")
    print(f"  Cls loss:  {last['val/cls_loss']:.4f}")
else:
    print("‚ö†Ô∏è results.csv not found.")

## Part C: Evaluate on Validation Set

In [None]:
# ============================================
# EVALUATE ON VALIDATION SET
# ============================================

best_model = YOLO(os.path.join(SAVE_DIR, 'weights', 'best.pt'))

val_results = best_model.val(
    data=os.path.join(DATASET_DIR, 'data.yaml'),
    verbose=True
)

print("\nüìä Validation Results:")
print("=" * 50)
print(f"  mAP@50:    {val_results.box.map50:.4f}")
print(f"  mAP@50-95: {val_results.box.map:.4f}")
print(f"  Precision: {val_results.box.mp:.4f}")
print(f"  Recall:    {val_results.box.mr:.4f}")

print(f"\nüè∑Ô∏è Per-Class mAP@50:")
print(f"  {'Class':<20} {'mAP@50':<10}")
print(f"  {'‚îÄ' * 30}")
for i, name in enumerate(class_names):
    if i < len(val_results.box.ap50):
        print(f"  {name:<20} {val_results.box.ap50[i]:.4f}")

## Part D: Predictions on Validation Images

In [None]:
# ============================================
# PREDICTIONS ON VALIDATION IMAGES
# ============================================

val_img_dir = os.path.join(DATASET_DIR, 'valid', 'images')
val_images = sorted(glob.glob(os.path.join(val_img_dir, '*')))[:6]

pred_results = best_model(val_images, conf=0.25, verbose=False)

fig, axes = plt.subplots(2, 3, figsize=(18, 12))
for idx, (img_path, result) in enumerate(zip(val_images, pred_results)):
    ax = axes[idx // 3, idx % 3]
    annotated = result.plot()
    ax.imshow(annotated[..., ::-1])
    ax.set_title(f"{os.path.basename(img_path)} "
                 f"({len(result.boxes)} detections)", fontsize=10)
    ax.axis('off')

plt.suptitle('Fine-tuned YOLO26 ‚Äî Predictions on Validation Set',
             fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
# visualize predictions and bounding boxes from the model on given classes

import math
# Define the class IDs you want to filter for
# Based on your previous chart:
# Run this to see the mapping
print(best_model.names)

# 1. Map your desired names to IDs automatically
# This prevents errors if the class indices change
target_names = ['red_light', 'yellow_light', 'railway_crossing', 't_intersection_l'] # Change these to the signs you want
name_to_id = {v: k for k, v in best_model.names.items()}
desired_class_ids = [name_to_id[name] for name in target_names if name in name_to_id]

print(f"Filtering for IDs: {desired_class_ids} ({target_names})")

# 2. Run inference on your validation folder
# We use 'best_model' (your fine-tuned one) and the actual path variable 'val_img_dir'
results = best_model.predict(
    source=val_img_dir,
    classes=desired_class_ids,
    conf=0.5,
    # exist_ok=True
)

filtered_results = []
for r in results:
    detected_ids = r.boxes.cls.cpu().numpy().astype(int)
    if any(cls_id in desired_class_ids for cls_id in detected_ids):
        filtered_results.append(r)

# 4. Set up the 3-row grid display
num_images = len(filtered_results[:30]) # Limit to first 30 matches for the grid
cols = math.ceil(num_images / 3)
rows = 3

fig, axes = plt.subplots(rows, cols, figsize=(30, 15))
axes = axes.flatten() # Flatten to iterate easily

for i in range(num_images):
    res = filtered_results[i]
    ann_img = res.plot() # Annotate image with boxes

    # Convert BGR to RGB and show
    axes[i].imshow(ann_img[:, :, ::-1])
    axes[i].set_title(f"Image: {res.path.split('/')[-1]}")
    axes[i].axis('off')

# Hide any empty subplots if num_images is odd
for j in range(i + 1, len(axes)):
    axes[j].axis('off')

plt.tight_layout()
plt.show()


## Part E: Compare Pre-trained vs. Fine-tuned



In [None]:
# ============================================
# COMPARE: PRE-TRAINED COCO vs. FINE-TUNED
# ============================================

compare_images = val_images[:15]

fig, axes = plt.subplots(len(compare_images), 2,
                          figsize=(16, 6 * len(compare_images)))

for idx, img_path in enumerate(compare_images):
    # Pre-trained COCO
    res_coco = model_pretrained(img_path, conf=0.25, verbose=False)
    axes[idx, 0].imshow(res_coco[0].plot()[..., ::-1])
    axes[idx, 0].set_title(
        f"Pre-trained COCO ({len(res_coco[0].boxes)} det.)", fontsize=12)
    axes[idx, 0].axis('off')

    # Fine-tuned
    res_ft = best_model(img_path, conf=0.25, verbose=False)
    axes[idx, 1].imshow(res_ft[0].plot()[..., ::-1])
    axes[idx, 1].set_title(
        f"Fine-tuned Road Signs ({len(res_ft[0].boxes)} det.)", fontsize=12)
    axes[idx, 1].axis('off')

plt.suptitle('Pre-trained COCO vs. Fine-tuned Road Sign Model', fontsize=14)
plt.tight_layout()
plt.show()

print("üí° The COCO model knows generic 'stop sign' but not specific")
print("   road sign types. Fine-tuning teaches YOUR specific classes!")

‚úÖ Task 3 Analysis
Answer these questions based on YOUR training results:

Q1. What is your model's mAP@50? What would mAP=1.0 mean?

mAP%50 is 0.7429 and the IoU >= 50%; The mAP=1.0 means perfect object detection which implies that every single object in the dataset was detected, its bounding box perfectly matched the ground truth (IoU = 1.0), and it was classified 100% confidence with no false positives or false negatives.

Q2. Which road sign class does the model detect best? Which is hardest? Why?

Best Class: t_intersection-l with an incredible mAP@50 of 0.956 (95.6%), closely followed by warning (0.929).

Hardest Cass: red_light (0.365) and yellow_light (0.367) are the hardest. no_parking (0.433) also struggled significantly.

Why: Traffic lights are notoriously difficult for object detection. They are physically very small in the frame, often blend into complex backgrounds (like trees or city lights), and the model has to distinguish which tiny colored circle is illuminated. no_parking signs are likely difficult because they rely on reading fine text and smaller symbols, which get easily blurred at lower resolutions or greater distances.

Q3. Name 2 similarities and 2 differences between fine-tuning for detection vs. classification (Week 7).

Similarities:

Transfer learning base - both start with a model pre-trained on a massive dataset and update its weights to learn the specific features of our custom dataset.
training mechanices - both utilize a training loop with similar hyperparameters and evaluate performance on a validation set to monitor for overfitting.
Differences:

annotation/data format - image classification just needs one lable per image while object detection requires complex annotations for every image, detailing the exact spatial coordinates of every single object present.
model ouput & loss function - a classification model ouputs a single probability score for the shole image and calculates simple classification loss, while a detecion model outpus mltiple bounding boxes and class probabilities simultaneously, requiring a complex, multi-part loss function that calculates both localization loss and classification loss.
Q4. Looking at training curves, would more epochs help? What signs indicate overfitting?

The validation curves for box loss stays relatively the same but the training loss goes down which means it is a sign of overfitting. More epochs most likely won't help in this case. The classification loss curves also plateau so more epoches will also not help.

Call instructor to verify completion!

# Task
Please provide a list of target road sign class names that you want to visualize. For example, `['railway_crossing', 'ped_crossing']`.

Here are the available class names from the dataset: `['bus_stop', 'do_not_enter', 'do_not_stop', 'do_not_turn_l', 'do_not_turn_r', 'do_not_u_turn', 'enter_left_lane', 'green_light', 'left_right_lane', 'no_parking', 'parking', 'ped_crossing', 'ped_zebra_cross', 'railway_crossing', 'red_light', 'stop', 't_intersection_l', 'traffic_light', 'u_turn', 'warning', 'yellow_light']`

## Define Target Road Signs

### Subtask:
Define a list of specific road sign class names and their corresponding class IDs that you want to visualize predictions for.


**Reasoning**:
The subtask requires defining target road sign names and their corresponding class IDs for visualization. I will create a Python code block to define these lists, mapping the names to IDs using the previously defined `class_names`.



In [None]:
TARGET_ROAD_SIGNS = ['railway_crossing', 'ped_crossing', 'stop', 'traffic_light']

TARGET_CLASS_IDS = [class_names.index(name) for name in TARGET_ROAD_SIGNS if name in class_names]

print(f"Target Road Signs: {TARGET_ROAD_SIGNS}")
print(f"Corresponding Class IDs: {TARGET_CLASS_IDS}")

## Select Validation Images for Prediction

### Subtask:
Choose a set of validation images from the dataset to run inference on. These will be the images where we look for predictions of the target road signs.


**Reasoning**:
First, I'll define the directory containing the validation images. Then, I'll use `glob.glob` to retrieve all image paths within this directory and sort them to ensure consistent selection. Finally, I'll select a subset of these images for prediction and visualization as per the instructions.



In [None]:
val_img_dir = os.path.join(DATASET_DIR, 'valid', 'images')

# Get all image paths in the validation directory
all_val_images = sorted(glob.glob(os.path.join(val_img_dir, '*')))

# Select a subset of images (e.g., the first 9) for prediction
selected_val_images = all_val_images[:9]

print(f"Selected {len(selected_val_images)} validation images for prediction:")
for img_path in selected_val_images:
    print(f"- {os.path.basename(img_path)}")

**Reasoning**:
Now that the validation images are selected and target class IDs are defined, I will use the fine-tuned model (`best_model`) to make predictions on these images. I will filter the predictions to only show the `TARGET_CLASS_IDS` and set a confidence threshold for clearer visualization.



In [None]:
import matplotlib.pyplot as plt

# Run predictions on the selected validation images, filtering by target class IDs
# Use a reasonable confidence threshold, e.g., 0.5
pred_results_filtered = best_model(selected_val_images, conf=0.5, classes=TARGET_CLASS_IDS, verbose=False)

# Visualize the filtered predictions
fig, axes = plt.subplots(3, 3, figsize=(18, 18))
axes = axes.flatten() # Flatten the 2x3 array of axes for easy iteration

for idx, (img_path, result) in enumerate(zip(selected_val_images, pred_results_filtered)):
    ax = axes[idx]
    annotated = result.plot() # Ultralytics .plot() method draws boxes and labels
    ax.imshow(annotated[..., ::-1]) # Convert BGR to RGB for matplotlib
    ax.set_title(f"Predictions: {os.path.basename(img_path)} ({len(result.boxes)} detections)", fontsize=10)
    ax.axis('off')

plt.suptitle('Fine-tuned YOLO26 - Filtered Predictions on Validation Set', fontsize=16, y=1.02)
plt.tight_layout()
plt.show()

print(f"Displayed predictions for target classes: {TARGET_ROAD_SIGNS}")
print("Confidence threshold for display: 0.5")

## Summarize Detected Objects

### Subtask:
Provide a summary of the number of target road signs detected by the fine-tuned model across the visualized images.


## Summary:

### Q&A
The target road sign class names selected for visualization are 'railway\_crossing', 'ped\_crossing', 'stop', and 'traffic\_light'.

### Data Analysis Key Findings
*   **Target Road Signs and IDs:** The target road signs for visualization were set to \['railway\_crossing', 'ped\_crossing', 'stop', 'traffic\_light'\], corresponding to class IDs \[13, 11, 15, 17\].
*   **Validation Image Selection:** Nine validation images were selected from the `valid/images` directory to perform inference.
*   **Filtered Predictions:** Predictions were successfully run on the selected validation images, focusing only on the defined target road signs with a confidence threshold of 0.5.
*   **Prediction Visualization:** The filtered predictions were visualized, displaying annotated images with bounding boxes for the detected target road signs.

### Insights or Next Steps
*   The process successfully demonstrated the fine-tuned model's capability to identify and visualize specific road signs, indicating a functional object detection pipeline for the chosen classes.
*   To gain a deeper understanding of model performance, the next step should involve quantifying the total number of detections for each target road sign across all analyzed images.


In [None]:
import os
import glob
import matplotlib.pyplot as plt
from ultralytics import YOLO
import yaml
import math
import random  # Import random for sampling

# 1. Load your fine-tuned model weights
model = best_model

# 2. Get dataset configuration
with open(os.path.join(DATASET_DIR, 'data.yaml'), 'r') as f:
    data_config = yaml.safe_load(f)
class_names = data_config['names']

val_img_dir = os.path.join(DATASET_DIR, 'valid', 'images')
val_lbl_dir = os.path.join(DATASET_DIR, 'valid', 'labels')

# 3. Analyze ALL images for False Negatives
missed_samples = []
image_paths = sorted(glob.glob(os.path.join(val_img_dir, '*')))

for img_path in image_paths:
    img_name = os.path.splitext(os.path.basename(img_path))[0]
    label_path = os.path.join(val_lbl_dir, img_name + '.txt')

    if not os.path.exists(label_path):
        continue

    with open(label_path, 'r') as f:
        gt_classes = set(int(line.split()[0]) for line in f.readlines())

    results = model.predict(img_path, conf=0.5, verbose=False)
    pred_classes = set(results[0].boxes.cls.cpu().numpy().astype(int))

    missed_ids = gt_classes - pred_classes

    if missed_ids:
        missed_names = [class_names[cid] for cid in missed_ids]
        missed_samples.append((results[0], missed_names))

# 4. RANDOM SAMPLING
num_to_display = 10
if len(missed_samples) > 0:
    # Randomly pick images from the list of failures
    # If we found fewer failures than num_to_display, take all of them
    current_sample_size = min(len(missed_samples), num_to_display)
    random_missed_samples = random.sample(missed_samples, current_sample_size)

    # 5. Set up the 2-row grid display
    cols = math.ceil(current_sample_size / 2)
    rows = 2

    fig, axes = plt.subplots(rows, cols, figsize=(28, 14))
    axes = axes.flatten()

    for i in range(current_sample_size):
        res, missed_list = random_missed_samples[i]
        ann_img = res.plot()

        axes[i].imshow(ann_img[..., ::-1])
        axes[i].set_title(f"FILE: {os.path.basename(res.path)}\nMISSED: {', '.join(missed_list)}",
                          color='red', fontsize=12, fontweight='bold')
        axes[i].axis('off')

    for j in range(i + 1, len(axes)):
        axes[j].axis('off')

    plt.tight_layout(pad=3.0)
    plt.show()
else:
    print("Zero False Negatives found!")