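# 5.2.2 Real-time Object Detection

**WBS 5.2.2**: The YOLO Family and Real-time Detection Techniques

This module takes an in-depth look at real-time object detection, focusing on the YOLO (You Only Look Once) family of algorithms, from basic principles to production-grade deployment.

## Learning Objectives
- Understand the core requirements and challenges of real-time object detection
- Follow the architectural evolution of the YOLO family
- Implement a complete YOLOv3/v4 detection pipeline
- Learn performance optimization and multi-threading techniques
- Build a production-grade real-time video detection system

## Prerequisites
- OpenCV DNN module (WBS 5.2.1)
- Image preprocessing techniques (Stage 3)
- Python multi-threading basics
- Basic deep learning concepts

## Course Outline
1. Real-time Detection Basics - 5%
2. YOLO Series Introduction - 15%
3. YOLOv3/v4 Implementation - 20%
4. Performance Optimization - 15%
5. Multi-threading - 10%
6. Batch Inference - 10%
7. Real-time Video Detection - 15%
8. Object Tracking - 5%
9. Hands-on Exercises - 3%
10. Summary & Deployment - 2%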

In [None]:
# Import required libraries
import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import time
import urllib.request
from collections import defaultdict, deque
import threading
from queue import Queue

# Configure matplotlib for Chinese display
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS', 'SimHei', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['figure.figsize'] = (14, 8)

# Disable warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully")
print(f"OpenCV version: {cv2.__version__}")
print(f"NumPy version: {np.__version__}")

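# Check CUDA availability
# (OpenCV builds without CUDA either lack cv2.cuda or report 0 devices)
try:
    cuda_devices = cv2.cuda.getCudaEnabledDeviceCount()
except (AttributeError, cv2.error):
    cuda_devices = 0

if cuda_devices > 0:
    print(f"CUDA devices available: {cuda_devices}")
else:
    print("CUDA not available - will use CPU")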

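## 1. Real-time Detection Basics - 5%

### What Is Real-time Object Detection?

**Real-time object detection** means detecting and localizing multiple objects in a video stream at a sufficiently high frame rate (typically ≥30 FPS).

### Core Requirements

#### 1. Speed Requirements
- **Real-time**: ≥30 FPS (≤33 ms per frame)
- **Smooth**: ≥20 FPS (≤50 ms per frame)
- **Acceptable**: ≥10 FPS (≤100 ms per frame)
- **Choppy**: <10 FPS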
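
The tiers above follow from the relation "frame budget in ms = 1000 / FPS". A minimal sketch of that arithmetic is shown here; the function names `frame_budget_ms` and `classify_fps` are illustrative and not part of any library.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the per-frame time budget in milliseconds for a target FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def classify_fps(fps: float) -> str:
    """Map a measured FPS to the tiers listed above."""
    if fps >= 30:
        return "real-time"
    if fps >= 20:
        return "smooth"
    if fps >= 10:
        return "acceptable"
    return "choppy"


for target in (30, 20, 10):
    print(f"{target} FPS -> {frame_budget_ms(target):.1f} ms per frame")
print(classify_fps(24))
```

The budget covers everything done per frame (capture, preprocessing, inference, post-processing, drawing), not just the model forward pass.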

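#### 2. Accuracy Requirements
- **mAP (mean Average Precision)**: the standard evaluation metric
- **IoU Threshold**: typically 0.5 or 0.75
- **Recall**: the fraction of ground-truth objects that are detected
- **Precision**: the fraction of detections that are correct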
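
To make these metrics concrete, here is a simplified sketch that computes IoU for corner-format boxes and then derives precision and recall at a single IoU threshold. It is an illustration only: the helper names are hypothetical, class labels are ignored, and detections are matched in list order rather than by confidence. A full mAP evaluation additionally sorts detections by confidence and integrates the precision-recall curve.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of detections to ground-truth boxes."""
    matched = set()
    true_positives = 0
    for det in detections:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box may be matched only once
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
det_boxes = [(1, 1, 10, 10), (50, 50, 60, 60), (40, 0, 45, 5)]
print(iou(det_boxes[0], gt_boxes[0]))
print(precision_recall(det_boxes, gt_boxes))
```

In this toy example one of three detections matches one of two ground-truth boxes, so precision is 1/3 and recall is 1/2.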

#### 3. Latency Requirements
- **End-to-end latency**: from image input to result output
- **Processing latency**: model inference time
- **Communication latency**: data transfer time
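
One way to see where end-to-end latency goes is to time each pipeline stage separately. The sketch below uses a small hypothetical helper, `StageTimer`, built only on the standard library; the `time.sleep` calls are stand-ins for real frame transfer and model inference.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals_ms = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises
            self.totals_ms[name] += (time.perf_counter() - start) * 1000.0

    def end_to_end_ms(self):
        """Sum of all recorded stages."""
        return sum(self.totals_ms.values())


timer = StageTimer()
with timer.stage("transfer"):
    time.sleep(0.002)   # stand-in for reading a frame from a camera or network
with timer.stage("inference"):
    time.sleep(0.005)   # stand-in for the model forward pass

for name, ms in timer.totals_ms.items():
    print(f"{name}: {ms:.1f} ms")
print(f"end-to-end: {timer.end_to_end_ms():.1f} ms")
```

Measured values vary from run to run, so averaging over many frames gives a more reliable picture than a single sample.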

### Speed-Accuracy Trade-off

```
High Accuracy (Slower)           Real-time (Faster)
        ↓                              ↓
  Mask R-CNN  →  Faster R-CNN  →  SSD  →  YOLO
    5 FPS          7 FPS        25 FPS   45+ FPS
    mAP: 39%       mAP: 37%     mAP: 31%  mAP: 33%
```
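
The trade-off can be read as a selection rule: among the detectors fast enough for the application, pick the most accurate one. The sketch below applies that rule to the figures in the diagram above; `pick_detector` is an illustrative name, and the numbers are the diagram's rough values rather than benchmarks.

```python
# Figures copied from the diagram above: (name, FPS, mAP %)
DETECTORS = [
    ("Mask R-CNN", 5, 39),
    ("Faster R-CNN", 7, 37),
    ("SSD", 25, 31),
    ("YOLO", 45, 33),
]


def pick_detector(min_fps):
    """Return the most accurate detector that meets the FPS floor, or None."""
    candidates = [d for d in DETECTORS if d[1] >= min_fps]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d[2])


print(pick_detector(20))   # only SSD and YOLO are fast enough
print(pick_detector(5))    # every detector qualifies
print(pick_detector(60))   # nothing in the table is fast enough
```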

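### Challenges in Real-time Detection

1. **Limited compute resources**: CPU vs. GPU vs. edge devices
2. **Model size**: memory footprint and load time
3. **Multi-scale detection**: detecting small and large objects at the same time
4. **Occlusion and crowding**: separating overlapping objects
5. **Lighting and viewpoint changes**: robustness requirements

### Application Scenarios

✅ **Well suited to real-time detection**:
- Autonomous driving (vehicles, pedestrians, traffic signs)
- Smart surveillance (intrusion detection, anomalous behavior)
- Robot vision (navigation, grasping)
- AR/VR applications (object recognition and tracking)
- Industrial inspection (defect detection)

❌ **Less suited to real-time detection**:
- Medical image analysis (accuracy comes first)
- Satellite image analysis (offline processing)
- Artwork restoration (quality comes first)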

In [None]:
# Performance requirements comparison
performance_data = {
    "Application": [
        "Autonomous Driving",
        "Smart Surveillance",
        "Robot Navigation",
        "AR/VR",
        "Industrial Inspection",
        "Face Recognition"
    ],
    "Required FPS": [30, 20, 25, 60, 15, 30],
    "Accuracy Priority": ["High", "Medium", "Medium", "Medium", "Very High", "High"],
    "Latency Budget (ms)": [33, 50, 40, 16, 67, 33]
}

print("Real-time Detection Requirements by Application")
print("=" * 85)
print(f"{'Application':<25} {'Required FPS':<15} {'Accuracy':<20} {'Latency (ms)'}")
print("=" * 85)

for i in range(len(performance_data["Application"])):
    print(f"{performance_data['Application'][i]:<25} "
          f"{performance_data['Required FPS'][i]:<15} "
          f"{performance_data['Accuracy Priority'][i]:<20} "
          f"{performance_data['Latency Budget (ms)'][i]}")
print("=" * 85)

# Visualize FPS requirements
fig, ax = plt.subplots(figsize=(12, 6))
apps = performance_data["Application"]
fps = performance_data["Required FPS"]

colors = ['#FF6B6B' if f >= 30 else '#4ECDC4' if f >= 20 else '#95E1D3' for f in fps]
bars = ax.barh(apps, fps, color=colors, edgecolor='black', linewidth=1.5)

# Add FPS threshold lines
ax.axvline(x=30, color='red', linestyle='--', linewidth=2, label='Real-time (30 FPS)', alpha=0.7)
ax.axvline(x=20, color='orange', linestyle='--', linewidth=2, label='Smooth (20 FPS)', alpha=0.7)

# Add value labels
for i, (app, f) in enumerate(zip(apps, fps)):
    ax.text(f + 1, i, f'{f} FPS', va='center', fontsize=10, fontweight='bold')

ax.set_xlabel('Frames Per Second (FPS)', fontsize=12, fontweight='bold')
ax.set_title('FPS Requirements by Application Domain', fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='lower right', fontsize=10)
ax.grid(axis='x', alpha=0.3, linestyle=':')
ax.set_xlim(0, 70)

plt.tight_layout()
plt.show()

print("\n💡 Key Insight: Most real-time applications require ≥20 FPS for smooth operation")

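## 2. YOLO Series Introduction - 15%

### YOLO: You Only Look Once

**YOLO** is a groundbreaking single-stage object detection algorithm proposed by Joseph Redmon et al. in 2015. Its core idea is to treat object detection as a **regression problem**, predicting bounding boxes and class probabilities directly from image pixels.

### Core Innovations

#### 1. Single-Shot Detection
- **Traditional two-stage**: Region Proposal → Classification (R-CNN family)
- **YOLO single-stage**: detection completes in one forward pass
- **Advantages**: fast, trained end to end

#### 2. Global Context
- Looks at the whole image rather than local regions
- Fewer false positives on background
- Better understanding of relationships between objects

#### 3. Unified Architecture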
```
Input Image → CNN Backbone → Detection Head → Outputs
                                               ↓
                          [Bounding Boxes + Class Probabilities]
```

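### YOLO Version History

#### YOLOv1 (2015) - The Pioneer
- **Architecture**: 24 convolutional layers + 2 fully connected layers
- **Input**: 448×448
- **Speed**: 45 FPS (base), 155 FPS (fast)
- **mAP**: 63.4% (VOC 2007, base model)
- **Weaknesses**: poor small-object detection, low localization accuracy

#### YOLOv2 / YOLO9000 (2016) - The Refinement
- **Innovations**:
  - Batch Normalization
  - Anchor boxes (borrowed from Faster R-CNN)
  - Darknet-19 backbone (19 convolutional layers)
  - Multi-scale training
- **Speed**: 40-90 FPS
- **mAP**: 78.6% (VOC 2007)
- **Highlight**: YOLO9000 can detect 9000+ classes

#### YOLOv3 (2018) - Multi-scale Detection
- **Architecture**: Darknet-53 (53 convolutional layers)
- **Innovations**:
  - Feature maps at 3 scales (13×13, 26×26, 52×52 for a 416×416 input)
  - 3 anchor boxes per scale
  - Residual connections
  - Improved small-object detection
- **Speed**: 20-30 FPS (416×416)
- **mAP**: 57.9% mAP@0.5 (COCO), which corresponds to 33.0% on the stricter COCO AP@[0.5:0.95] metric
- **Status**: the focus of this module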
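
The three grid sizes come from YOLOv3's detection strides of 32, 16, and 8: each grid side is the input side divided by the stride. A small sketch of that arithmetic for a few common square input sizes follows; the function names are illustrative.

```python
YOLOV3_STRIDES = (32, 16, 8)   # downsampling factor of each detection scale
ANCHORS_PER_SCALE = 3


def yolov3_grid_sizes(input_size):
    """Grid side length at each detection scale for a square input."""
    if input_size % 32 != 0:
        raise ValueError("input size must be a multiple of 32")
    return [input_size // stride for stride in YOLOV3_STRIDES]


def total_predictions(input_size):
    """Number of candidate boxes produced before filtering and NMS."""
    return sum(g * g * ANCHORS_PER_SCALE for g in yolov3_grid_sizes(input_size))


for size in (320, 416, 608):
    print(size, yolov3_grid_sizes(size), total_predictions(size))
```

Larger inputs give finer grids and more candidate boxes, which tends to help small objects at the cost of speed.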

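#### YOLOv4 (2020) - Peak Performance
- **Architecture**: CSPDarknet53 + SPP + PANet
- **Innovations**:
  - CSPNet (Cross Stage Partial Network)
  - Mish activation
  - DropBlock regularization
  - Mosaic data augmentation
  - CIoU loss
- **Speed**: 65 FPS (Tesla V100)
- **mAP**: 43.5% AP (COCO)
- **Highlight**: the best speed/accuracy trade-off at the time of its release

#### YOLOv5 (2020) - PyTorch Rewrite
- **Highlight**: PyTorch implementation, easy to use
- **Variants**: YOLOv5s/m/l/x (different sizes)
- **Speed**: 140 FPS (YOLOv5s)
- **mAP**: 37.4-50.7% (COCO)
- **Ecosystem**: rich deployment tooling

#### YOLOv6-v8 (2022-2023) - Continued Evolution
- **YOLOv6**: optimized for industrial use (Meituan)
- **YOLOv7**: architectural innovations (E-ELAN, trainable bag-of-freebies)
- **YOLOv8**: Ultralytics' 2023 release
  - Supports detection, segmentation, classification, and pose estimation
  - Unified API design
  - mAP: 53.9% (YOLOv8x)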
### YOLO vs. Other Detectors

| Model | Type | FPS | mAP (COCO) | Year |
|-------|------|-----|------------|------|
| **R-CNN** | Two-stage | 0.05 | - | 2014 |
| **Fast R-CNN** | Two-stage | 0.5 | - | 2015 |
| **Faster R-CNN** | Two-stage | 7 | 42.7% (mAP@0.5) | 2015 |
| **YOLOv1** | One-stage | 45 | - | 2015 |
| **SSD300** | One-stage | 46 | 25.1% | 2016 |
| **YOLOv2** | One-stage | 67 | 21.6% | 2016 |
| **YOLOv3** | One-stage | 30 | 33.0% | 2018 |
| **YOLOv4** | One-stage | 65 | 43.5% | 2020 |
| **EfficientDet** | One-stage | 30 | 51.0% | 2020 |
| **YOLOv8** | One-stage | 80+ | 53.9% | 2023 |

In [None]:
# YOLO evolution visualization
yolo_evolution = {
    "Version": ["v1", "v2", "v3", "v4", "v5", "v8"],
    "Year": [2015, 2016, 2018, 2020, 2020, 2023],
    "FPS": [45, 67, 30, 65, 140, 80],
    "mAP": [0.0, 21.6, 33.0, 43.5, 37.4, 53.9],  # COCO dataset
    "Backbone": ["Custom", "Darknet-19", "Darknet-53", "CSPDarknet53", "CSPDarknet", "CSPDarknet"],
    "Key Innovation": [
        "Single-stage detection",
        "Anchor boxes",
        "Multi-scale FPN",
        "CSPNet + Bag of Freebies",
        "PyTorch implementation",
        "Unified architecture"
    ]
}

# Create comparison visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Speed evolution
versions = yolo_evolution["Version"]
fps = yolo_evolution["FPS"]
years = yolo_evolution["Year"]

colors_fps = plt.cm.viridis(np.linspace(0, 1, len(versions)))
ax1.plot(years, fps, marker='o', linewidth=3, markersize=12, color='blue', alpha=0.7)
for i, (year, f, ver) in enumerate(zip(years, fps, versions)):
    ax1.scatter(year, f, s=300, c=[colors_fps[i]], edgecolor='black', linewidth=2, zorder=5)
    ax1.text(year, f + 8, f'YOLO{ver}\n{f} FPS', ha='center', fontsize=9, fontweight='bold')

ax1.set_xlabel('Year', fontsize=12, fontweight='bold')
ax1.set_ylabel('Speed (FPS)', fontsize=12, fontweight='bold')
ax1.set_title('YOLO Speed Evolution', fontsize=14, fontweight='bold', pad=20)
ax1.grid(True, alpha=0.3, linestyle=':')
ax1.set_ylim(0, 160)

# Plot 2: Accuracy evolution
map_scores = yolo_evolution["mAP"]
# Filter out v1 (no COCO mAP)
valid_indices = [i for i, m in enumerate(map_scores) if m > 0]
valid_years = [years[i] for i in valid_indices]
valid_map = [map_scores[i] for i in valid_indices]
valid_versions = [versions[i] for i in valid_indices]

colors_map = plt.cm.plasma(np.linspace(0, 1, len(valid_versions)))
ax2.plot(valid_years, valid_map, marker='s', linewidth=3, markersize=12, color='red', alpha=0.7)
for i, (year, m, ver) in enumerate(zip(valid_years, valid_map, valid_versions)):
    ax2.scatter(year, m, s=300, c=[colors_map[i]], edgecolor='black', linewidth=2, zorder=5)
    ax2.text(year, m + 2.5, f'YOLO{ver}\n{m:.1f}%', ha='center', fontsize=9, fontweight='bold')

ax2.set_xlabel('Year', fontsize=12, fontweight='bold')
ax2.set_ylabel('mAP on COCO (%)', fontsize=12, fontweight='bold')
ax2.set_title('YOLO Accuracy Evolution', fontsize=14, fontweight='bold', pad=20)
ax2.grid(True, alpha=0.3, linestyle=':')
ax2.set_ylim(0, 60)

plt.tight_layout()
plt.show()

# Print detailed comparison
print("\nYOLO Version Comparison")
print("=" * 100)
print(f"{'Version':<10} {'Year':<8} {'FPS':<8} {'mAP (COCO)':<15} {'Key Innovation':<50}")
print("=" * 100)
for i in range(len(yolo_evolution["Version"])):
    ver = yolo_evolution["Version"][i]
    year = yolo_evolution["Year"][i]
    fps = yolo_evolution["FPS"][i]
    map_score = yolo_evolution["mAP"][i]
    innovation = yolo_evolution["Key Innovation"][i]
    
    map_str = f"{map_score:.1f}%" if map_score > 0 else "N/A"
    print(f"YOLO{ver:<5} {year:<8} {fps:<8} {map_str:<15} {innovation:<50}")
print("=" * 100)

print("\n💡 Trend Analysis:")
print("  - Speed: 3.1x improvement (v1 to v5)")
print("  - Accuracy: 2.5x improvement (v2 to v8 on COCO)")
print("  - Architecture: Increasingly complex backbones with better feature extraction")
print("  - Focus: Balancing speed and accuracy for practical deployment")

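### How YOLO Works

#### 1. Grid Division
The input image is divided into an S×S grid (e.g., 13×13).

#### 2. Anchor Boxes
Each grid cell predicts B bounding boxes (B=3 per scale in YOLOv3).

#### 3. Prediction Outputs
Each bounding box predicts:
- **Position**: (x, y, w, h) - center coordinates plus width and height
- **Confidence**: objectness score - the probability that the box contains an object
- **Class**: class probabilities - the probability of each class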
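
As an illustration of this layout, the sketch below splits one prediction vector into its three parts and picks the most likely class. `parse_prediction` is a hypothetical helper, and the toy vector uses 3 classes instead of COCO's 80.

```python
def parse_prediction(row, class_names):
    """Split one prediction vector [x, y, w, h, objectness, p_0, ..., p_{C-1}]."""
    expected = 5 + len(class_names)
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    x, y, w, h, objectness = row[:5]
    class_probs = row[5:]
    # Index of the highest class probability
    best = max(range(len(class_probs)), key=class_probs.__getitem__)
    return {
        "box": (x, y, w, h),
        "objectness": objectness,
        "class_name": class_names[best],
        "class_prob": class_probs[best],
    }


names = ["person", "bicycle", "car"]   # toy 3-class example
row = [0.5, 0.4, 0.2, 0.3, 0.9, 0.05, 0.10, 0.80]
print(parse_prediction(row, names))
```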

#### 4. Output Tensor
```
Shape: S × S × B × (5 + C)

  S × S : grid cells
  B     : anchor boxes per cell
  5     : x, y, w, h, objectness
  C     : class probabilities
```
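
A quick worked instance of the shape formula, using the YOLOv3 values for COCO (B = 3 anchors, C = 80 classes); the function names are illustrative.

```python
def yolo_output_shape(grid_size, num_anchors, num_classes):
    """Shape of one detection scale: S x S x B x (5 + C)."""
    return (grid_size, grid_size, num_anchors, 5 + num_classes)


def channels_per_cell(num_anchors, num_classes):
    """Values predicted per grid cell: B * (5 + C)."""
    return num_anchors * (5 + num_classes)


print(yolo_output_shape(13, 3, 80))   # (13, 13, 3, 85)
print(channels_per_cell(3, 80))       # 255
```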

#### 5. Post-processing
- **Confidence Filtering**: discard low-confidence detections
- **NMS (Non-Maximum Suppression)**: remove duplicate detections
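
The two post-processing steps can be sketched in plain Python as follows. This is a simplified, class-agnostic greedy NMS written for illustration, not OpenCV's implementation; boxes use the (x, y, w, h) top-left format, and the helper names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes in (x, y, w, h) top-left format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Return indices of boxes kept after confidence filtering and greedy NMS."""
    # Step 1: confidence filtering
    order = [i for i, s in enumerate(scores) if s > score_threshold]
    # Step 2: visit survivors from highest to lowest score
    order.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(100, 100, 50, 50), (105, 105, 50, 50), (300, 300, 40, 40), (0, 0, 10, 10)]
scores = [0.90, 0.80, 0.70, 0.30]
print(filter_and_nms(boxes, scores))
```

Here the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the last box is dropped by the confidence filter.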

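### Why YOLOv3/v4?

✅ **YOLOv3 strengths**:
- Well supported by OpenCV DNN
- Pretrained models widely available (COCO, 80 classes)
- Good balance of speed and accuracy
- Mature documentation and community support
- Suitable for production deployment

✅ **YOLOv4 strengths**:
- Substantially better accuracy (about +10 AP points on COCO over YOLOv3)
- Still runs in real time
- Better small-object detection
- Advanced training techniques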

In [None]:
# YOLO architecture visualization
print("YOLOv3 Architecture Overview")
print("=" * 80)
print("\nInput: 416×416×3 RGB Image")
print("  ↓")
print("Darknet-53 Backbone (53 convolutional layers)")
print("  ↓")
print("Feature Pyramid Network (FPN)")
print("  ├─ Scale 1: 13×13 grid (large objects)")
print("  │   └─ 3 anchors × (5 + 80 classes) = 255 channels")
print("  ├─ Scale 2: 26×26 grid (medium objects)")
print("  │   └─ 3 anchors × (5 + 80 classes) = 255 channels")
print("  └─ Scale 3: 52×52 grid (small objects)")
print("      └─ 3 anchors × (5 + 80 classes) = 255 channels")
print("  ↓")
print("Post-processing")
print("  ├─ Confidence filtering (threshold: 0.5)")
print("  └─ Non-Maximum Suppression (NMS threshold: 0.4)")
print("  ↓")
print("Output: Detected objects with [class, confidence, x, y, w, h]")
print("=" * 80)

# Calculate total predictions
scales = [(13, 13), (26, 26), (52, 52)]
anchors_per_scale = 3
total_predictions = sum(w * h * anchors_per_scale for w, h in scales)

print(f"\nTotal anchor boxes per image: {total_predictions:,}")
print(f"  - 13×13 scale: {13*13*3:,} predictions")
print(f"  - 26×26 scale: {26*26*3:,} predictions")
print(f"  - 52×52 scale: {52*52*3:,} predictions")
print(f"\nAfter NMS: Typically 1-100 final detections per image")

# Visualize multi-scale detection
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
scales_info = [(13, "Large Objects"), (26, "Medium Objects"), (52, "Small Objects")]

for ax, (grid_size, obj_type) in zip(axes, scales_info):
    # Create grid visualization
    img = np.ones((grid_size, grid_size, 3), dtype=np.uint8) * 255
    
    # Draw grid
    cell_size = 10
    img_large = cv2.resize(img, (grid_size * cell_size, grid_size * cell_size), 
                          interpolation=cv2.INTER_NEAREST)
    
    # Draw grid lines
    for i in range(grid_size + 1):
        cv2.line(img_large, (0, i * cell_size), (grid_size * cell_size, i * cell_size), 
                (200, 200, 200), 1)
        cv2.line(img_large, (i * cell_size, 0), (i * cell_size, grid_size * cell_size), 
                (200, 200, 200), 1)
    
    # Highlight a few cells
    sample_cells = np.random.choice(range(1, grid_size-1), size=min(5, grid_size-2), replace=False)
    for cell in sample_cells:
        cv2.rectangle(img_large, 
                     (cell * cell_size, cell * cell_size),
                     ((cell + 1) * cell_size, (cell + 1) * cell_size),
                     (100, 149, 237), -1)
    
    ax.imshow(img_large)
    ax.set_title(f'{grid_size}×{grid_size} Grid\n{obj_type}\n'
                f'{grid_size*grid_size*3:,} predictions',
                fontsize=11, fontweight='bold')
    ax.axis('off')

plt.suptitle('YOLOv3 Multi-Scale Detection', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

print("\n💡 Multi-scale detection allows YOLO to detect objects of various sizes effectively")

## 3. YOLOv3/v4 Implementation - 20%

### Preparing the YOLO Model Files

Running YOLOv3 requires the following files:

1. **yolov3.cfg** - network architecture configuration
2. **yolov3.weights** - pretrained weights (~240 MB)
3. **coco.names** - COCO dataset class names
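
If the files are fetched programmatically, a small helper along these lines may be convenient. It is a sketch built on the standard library's `urllib.request.urlretrieve`; the name `download_if_missing` and the `.part` temporary suffix are assumptions made for illustration.

```python
import urllib.request
from pathlib import Path


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists; return True if downloaded."""
    dest = Path(dest)
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, str(tmp))   # write to a temporary name first
    tmp.replace(dest)                           # rename only after a complete download
    return True
```

Writing to a temporary name first means an interrupted download does not leave a truncated `.weights` file that later looks valid. A call might look like `download_if_missing("https://pjreddie.com/media/files/yolov3.weights", "../assets/models/yolo/yolov3.weights")`.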

### The 80 COCO Classes

COCO (Common Objects in Context) contains 80 common object categories.

In [None]:
# Setup YOLO model directory
YOLO_MODEL_DIR = Path('../assets/models/yolo')
YOLO_MODEL_DIR.mkdir(parents=True, exist_ok=True)

# COCO class names (80 classes)
COCO_CLASSES = [
    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat',
    'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat',
    'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack',
    'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
    'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair',
    'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse',
    'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator',
    'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

# Save class names to file
coco_names_path = YOLO_MODEL_DIR / 'coco.names'
with open(coco_names_path, 'w') as f:
    f.write('\n'.join(COCO_CLASSES))

print(f"✅ Saved COCO class names to {coco_names_path}")
print(f"\nCOCO Dataset: {len(COCO_CLASSES)} classes")
print("\nCategories:")
# Group the 80 classes by index range (the groups cover every class exactly once)
categories = {
    "People & Animals": COCO_CLASSES[0:1] + COCO_CLASSES[14:24],
    "Vehicles": COCO_CLASSES[1:9],
    "Outdoor Objects": COCO_CLASSES[9:14],
    "Accessories": COCO_CLASSES[24:29],      # backpack ... suitcase
    "Sports": COCO_CLASSES[29:39],           # frisbee ... tennis racket
    "Kitchen & Food": COCO_CLASSES[39:56],   # bottle ... cake
    "Furniture": COCO_CLASSES[56:62],        # chair ... toilet
    "Electronics": COCO_CLASSES[62:68],      # tv ... cell phone
    "Appliances": COCO_CLASSES[68:73],       # microwave ... refrigerator
    "Indoor Objects": COCO_CLASSES[73:80]    # book ... toothbrush
}

for category, items in categories.items():
    print(f"  {category}: {len(items)} classes")
    print(f"    {', '.join(items[:5])}{'...' if len(items) > 5 else ''}")

# Generate colors for each class
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(COCO_CLASSES), 3), dtype=np.uint8)

print(f"\n✅ Generated {len(COLORS)} unique colors for visualization")

In [None]:
# Check for YOLOv3 model files
yolov3_cfg = YOLO_MODEL_DIR / 'yolov3.cfg'
yolov3_weights = YOLO_MODEL_DIR / 'yolov3.weights'

# Alternative: YOLOv3-tiny (faster but less accurate)
yolov3_tiny_cfg = YOLO_MODEL_DIR / 'yolov3-tiny.cfg'
yolov3_tiny_weights = YOLO_MODEL_DIR / 'yolov3-tiny.weights'

# Alternative: YOLOv4
yolov4_cfg = YOLO_MODEL_DIR / 'yolov4.cfg'
yolov4_weights = YOLO_MODEL_DIR / 'yolov4.weights'

print("Checking for YOLO model files...\n")

models_status = [
    ("YOLOv3", yolov3_cfg, yolov3_weights, "237 MB"),
    ("YOLOv3-tiny", yolov3_tiny_cfg, yolov3_tiny_weights, "33 MB"),
    ("YOLOv4", yolov4_cfg, yolov4_weights, "245 MB")
]

available_models = []

for model_name, cfg, weights, size in models_status:
    cfg_exists = cfg.exists()
    weights_exists = weights.exists()
    
    status = "✅" if (cfg_exists and weights_exists) else "⚠️"
    print(f"{status} {model_name}:")
    print(f"   Config (.cfg):   {cfg.name:<25} {'Found' if cfg_exists else 'Not found'}")
    print(f"   Weights (.weights): {weights.name:<22} {'Found' if weights_exists else 'Not found'} ({size})")
    print()
    
    if cfg_exists and weights_exists:
        available_models.append((model_name, cfg, weights))

if len(available_models) == 0:
    print("\n⚠️  No YOLO models found. Please download:")
    print("\n📥 Download YOLOv3 (recommended):")
    print("   Config:  https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg")
    print("   Weights: https://pjreddie.com/media/files/yolov3.weights")
    print("\n📥 Download YOLOv3-tiny (faster, for testing):")
    print("   Config:  https://github.com/pjreddie/darknet/blob/master/cfg/yolov3-tiny.cfg")
    print("   Weights: https://pjreddie.com/media/files/yolov3-tiny.weights")
    print("\n📥 Download YOLOv4 (best performance):")
    print("   Config:  https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4.cfg")
    print("   Weights: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights")
    print(f"\nSave files to: {YOLO_MODEL_DIR}/")
    YOLO_MODEL_AVAILABLE = False
else:
    print(f"\n✅ Found {len(available_models)} YOLO model(s) ready to use")
    YOLO_MODEL_AVAILABLE = True
    
    # Use the first available model
    DEFAULT_MODEL_NAME, DEFAULT_CFG, DEFAULT_WEIGHTS = available_models[0]
    print(f"\n🎯 Will use {DEFAULT_MODEL_NAME} for demonstrations")

### Loading the YOLO Model

In [None]:
def load_yolo_model(cfg_path, weights_path, backend=cv2.dnn.DNN_BACKEND_OPENCV, 
                   target=cv2.dnn.DNN_TARGET_CPU):
    """
    Load YOLO model using OpenCV DNN
    
    Parameters:
    -----------
    cfg_path : str or Path
        Path to .cfg file
    weights_path : str or Path
        Path to .weights file
    backend : int
        DNN backend (default: OpenCV)
    target : int
        DNN target device (default: CPU)
        
    Returns:
    --------
    net : cv2.dnn_Net
        Loaded YOLO network
    output_layers : list
        Names of output layers
    """
    print(f"Loading YOLO model...")
    print(f"  Config: {Path(cfg_path).name}")
    print(f"  Weights: {Path(weights_path).name}")
    
    start_time = time.time()
    
    # Load network
    net = cv2.dnn.readNetFromDarknet(str(cfg_path), str(weights_path))
    
    # Set backend and target
    net.setPreferableBackend(backend)
    net.setPreferableTarget(target)
    
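    # Get output layer names
    # (getUnconnectedOutLayersNames() avoids the index-shape differences of
    #  getUnconnectedOutLayers() between OpenCV versions)
    layer_names = net.getLayerNames()
    output_layers = list(net.getUnconnectedOutLayersNames())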
    
    load_time = time.time() - start_time
    
    print(f"✅ Model loaded in {load_time:.2f}s")
    print(f"   Total layers: {len(layer_names)}")
    print(f"   Output layers: {output_layers}")
    
    return net, output_layers


# Load model if available
if YOLO_MODEL_AVAILABLE:
    yolo_net, yolo_output_layers = load_yolo_model(DEFAULT_CFG, DEFAULT_WEIGHTS)
    print(f"\n🎯 {DEFAULT_MODEL_NAME} ready for inference")
else:
    print("\n⚠️  Skipping model loading - no YOLO models available")
    yolo_net = None
    yolo_output_layers = None

### YOLO Detection Function

The complete YOLO detection pipeline:
1. Image preprocessing (`blobFromImage`)
2. Forward pass
3. Output parsing
4. Confidence filtering
5. Non-Maximum Suppression (NMS)
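
The output-parsing step hinges on one coordinate conversion: the network reports each box as a normalized center point plus width and height, while drawing and NMS work with pixel top-left boxes. A standalone sketch of that conversion follows. The optional `clip_box` helper is an addition for illustration and is not part of the pipeline described here.

```python
def to_pixel_box(cx, cy, w, h, image_width, image_height):
    """Convert a normalized center-format box to pixel (x, y, w, h) top-left format."""
    box_w = int(w * image_width)
    box_h = int(h * image_height)
    x = int(cx * image_width - box_w / 2)
    y = int(cy * image_height - box_h / 2)
    return x, y, box_w, box_h


def clip_box(x, y, w, h, image_width, image_height):
    """Clamp a pixel box so that it lies inside the image."""
    x1, y1 = max(0, x), max(0, y)
    x2, y2 = min(image_width, x + w), min(image_height, y + h)
    return x1, y1, max(0, x2 - x1), max(0, y2 - y1)


print(to_pixel_box(0.5, 0.5, 0.25, 0.5, 640, 480))   # (240, 120, 160, 240)
print(clip_box(-20, 400, 100, 120, 640, 480))        # (0, 400, 80, 80)
```

Boxes near the image border can extend past it after scaling, which is why clipping is sometimes applied before cropping a detected region.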

In [None]:
def detect_objects_yolo(image, net, output_layers, classes=COCO_CLASSES,
                       confidence_threshold=0.5, nms_threshold=0.4, 
                       input_size=(416, 416)):
    """
    Detect objects using YOLO
    
    Parameters:
    -----------
    image : np.ndarray
        Input image (BGR)
    net : cv2.dnn_Net
        YOLO network
    output_layers : list
        Output layer names
    classes : list
        Class names
    confidence_threshold : float
        Minimum confidence (0.0-1.0)
    nms_threshold : float
        NMS IoU threshold (0.0-1.0)
    input_size : tuple
        Network input size (width, height)
        Common sizes: (320, 320), (416, 416), (608, 608)
        
    Returns:
    --------
    detections : list
        List of (class_id, class_name, confidence, x, y, w, h) tuples
    inference_time : float
        Inference time in seconds
    """
    height, width = image.shape[:2]
    
    # Prepare blob from image
    # YOLO expects normalized input (1/255 scaling)
    blob = cv2.dnn.blobFromImage(
        image,
        scalefactor=1/255.0,
        size=input_size,
        mean=(0, 0, 0),
        swapRB=True,  # BGR to RGB
        crop=False
    )
    
    # Forward pass
    net.setInput(blob)
    start_time = time.time()
    layer_outputs = net.forward(output_layers)
    inference_time = time.time() - start_time
    
    # Parse detections
    boxes = []
    confidences = []
    class_ids = []
    
    for output in layer_outputs:
        for detection in output:
            # Detection format: [x, y, w, h, objectness, class1_prob, class2_prob, ...]
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            
            if confidence > confidence_threshold:
                # YOLO returns center coordinates and dimensions
                # Scale to image size
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                
                # Convert to top-left corner coordinates
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    
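    # Apply Non-Maximum Suppression
    indices = cv2.dnn.NMSBoxes(boxes, confidences, confidence_threshold, nms_threshold)
    
    # Prepare final detections
    detections = []
    
    # NMSBoxes returns an Nx1 array, a flat array, or an empty tuple depending on
    # the OpenCV version, so normalize the result before iterating
    if len(indices) > 0:
        for i in np.array(indices).flatten():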
            x, y, w, h = boxes[i]
            class_id = class_ids[i]
            confidence = confidences[i]
            class_name = classes[class_id] if class_id < len(classes) else "unknown"
            
            detections.append((class_id, class_name, confidence, x, y, w, h))
    
    return detections, inference_time


def draw_yolo_detections(image, detections, colors=COLORS, thickness=2, font_scale=0.5):
    """
    Draw YOLO detection results
    
    Parameters:
    -----------
    image : np.ndarray
        Input image
    detections : list
        Detection results from detect_objects_yolo()
    colors : np.ndarray
        Color array for each class
    thickness : int
        Box line thickness
    font_scale : float
        Label font scale
        
    Returns:
    --------
    output : np.ndarray
        Image with drawn detections
    """
    output = image.copy()
    
    for (class_id, class_name, confidence, x, y, w, h) in detections:
        # Get color for this class
        color = colors[class_id].tolist() if class_id < len(colors) else [0, 255, 0]
        
        # Draw bounding box
        cv2.rectangle(output, (x, y), (x + w, y + h), color, thickness)
        
        # Prepare label
        label = f"{class_name}: {confidence:.2f}"
        
        # Get label size
        (label_w, label_h), baseline = cv2.getTextSize(
            label, cv2.FONT_HERSHEY_SIMPLEX, font_scale, 1
        )
        
        # Draw label background
        cv2.rectangle(output,
                     (x, y - label_h - baseline - 5),
                     (x + label_w, y),
                     color, -1)
        
        # Draw label text
        cv2.putText(output, label,
                   (x, y - baseline - 2),
                   cv2.FONT_HERSHEY_SIMPLEX,
                   font_scale, (255, 255, 255), 1, cv2.LINE_AA)
    
    return output


print("✅ YOLO detection functions defined")
print("\nFunctions:")
print("  - detect_objects_yolo(): Complete detection pipeline")
print("  - draw_yolo_detections(): Visualize results with bounding boxes")

### Testing YOLO Detection

In [None]:
if YOLO_MODEL_AVAILABLE and yolo_net is not None:
    # Load test image
    test_image_paths = [
        '../assets/images/basic/assassin.jpg',
        '../assets/images/basic/1.jpg',
        '../assets/images/basic/2.jpg',
        '../assets/images/objects/street.jpg'
    ]
    
    test_img = None
    for path in test_image_paths:
        if Path(path).exists():
            test_img = cv2.imread(path)
            print(f"📷 Loaded test image: {path}")
            print(f"   Size: {test_img.shape[1]}×{test_img.shape[0]}")
            break
    
    if test_img is None:
        # Create demo image
        test_img = np.ones((480, 640, 3), dtype=np.uint8) * 200
        cv2.putText(test_img, "No test image found", (150, 240),
                   cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 2)
        print("⚠️  Created placeholder image")
    
    # Detect objects
    print("\n🔍 Running YOLO detection...")
    detections, inf_time = detect_objects_yolo(
        test_img, yolo_net, yolo_output_layers,
        confidence_threshold=0.5,
        nms_threshold=0.4,
        input_size=(416, 416)
    )
    
    # Draw results
    result_img = draw_yolo_detections(test_img, detections)
    
    # Display
    fig, axes = plt.subplots(1, 2, figsize=(16, 7))
    
    axes[0].imshow(cv2.cvtColor(test_img, cv2.COLOR_BGR2RGB))
    axes[0].set_title('Original Image', fontsize=14, fontweight='bold')
    axes[0].axis('off')
    
    axes[1].imshow(cv2.cvtColor(result_img, cv2.COLOR_BGR2RGB))
    fps = 1.0 / inf_time if inf_time > 0 else 0
    axes[1].set_title(f'{DEFAULT_MODEL_NAME} Detection\n'
                     f'Objects: {len(detections)} | '
                     f'Time: {inf_time*1000:.1f}ms | '
                     f'FPS: {fps:.1f}',
                     fontsize=14, fontweight='bold')
    axes[1].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    # Print detection details
    print(f"\n📊 Detection Results:")
    print(f"   Inference time: {inf_time*1000:.2f}ms")
    print(f"   Throughput: {fps:.2f} FPS")
    print(f"   Objects detected: {len(detections)}\n")
    
    if len(detections) > 0:
        print("   Detected objects:")
        # Group by class
        class_counts = defaultdict(list)
        for (cid, cname, conf, x, y, w, h) in detections:
            class_counts[cname].append(conf)
        
        for cname, confs in sorted(class_counts.items()):
            avg_conf = np.mean(confs)
            print(f"     - {cname}: {len(confs)} instance(s), avg conf: {avg_conf:.3f}")
    else:
        print("   No objects detected. Try:")
        print("     - Lowering confidence_threshold (e.g., 0.3)")
        print("     - Using a different test image")
        
else:
    print("⚠️  Skipping YOLO detection demo - model not available")
    print("    Please download YOLO model files first")