# 🏗️ Task 7: YOLO Architecture Deep Dive

## 🎯 Objective
Understand the complete YOLOv11 architecture - from input image to detection output.

---

## 📚 YOLO Evolution

| Version | Year | Key Innovation |
|---------|------|----------------|
| YOLOv1 | 2016 | Single-shot detection |
| YOLOv3 | 2018 | Multi-scale predictions |
| YOLOv5 | 2020 | PyTorch, easy training |
| YOLOv8 | 2023 | Anchor-free, decoupled head |
| **YOLOv11** | 2024 | C2PSA, improved efficiency |

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

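PROJECT_ROOT = Path(r"D:\het\SELF\RP\YOLO-V11-PRO")
# Create the figure output directory up front so later plt.savefig calls do not fail.
(PROJECT_ROOT / 'docs' / 'assets').mkdir(parents=True, exist_ok=True)
print("✅ Libraries imported!")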

---

# Part 1: YOLO High-Level Architecture

## 🏗️ Three Main Components

```
┌─────────────────────────────────────────────────────────────┐
│                        YOLOv11                              │
├────────────────┬────────────────┬───────────────────────────┤
│    BACKBONE    │      NECK      │          HEAD             │
│   (CSPDarknet) │  (PANet+FPN)   │   (Decoupled Head)        │
│                │                │                           │
│  Feature       │  Feature       │  Classification +         │
│  Extraction    │  Fusion        │  Box Regression           │
└────────────────┴────────────────┴───────────────────────────┘
        ↓                ↓                      ↓
   Low→High          Multi-scale         Final Predictions
   Features          Features            [class, x, y, w, h]
```

### 1. Backbone: Feature Extraction
- Extracts hierarchical features from the input image
- Uses a CSPDarknet-style network (CSP = Cross Stage Partial)
- Output: feature maps at several scales, which the neck consumes

### 2. Neck: Feature Fusion
- Combines features from different scales
- PANet (Path Aggregation Network) + FPN (Feature Pyramid)
- Enables detecting objects of various sizes

### 3. Head: Detection
- Decoupled head (separate cls/reg branches)
- Anchor-free design
- Outputs: Class probabilities + Bounding box coordinates
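
As a rough sketch of the data flow through these three components, the snippet below computes the grid size that each detection scale would see for a square input. The strides 8, 16, and 32 (commonly labeled P3, P4, P5) and the helper name `feature_map_sizes` are assumptions made for illustration; they are not read from a model file.

```python
def feature_map_sizes(img_size=640, strides=(8, 16, 32)):
    """Return {level: (grid_h, grid_w)} for a square input image.

    Assumes the usual three detection levels P3/P4/P5 (illustrative names).
    """
    sizes = {}
    for level, stride in zip(("P3", "P4", "P5"), strides):
        grid = img_size // stride  # each cell covers stride x stride pixels
        sizes[level] = (grid, grid)
    return sizes


sizes = feature_map_sizes(640)
for level, (h, w) in sizes.items():
    print(f"{level}: {h}x{w} grid")
```

Under these assumptions, the highest-resolution level (P3) has the most cells and is the one best suited to small objects, while the coarsest level (P5) covers large objects.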

---

# Part 2: Backbone - CSPDarknet

## 📐 Key Concepts

### Cross Stage Partial (CSP) Network
Splits the feature map into two parts along the channel dimension:
- One part goes through the dense (convolutional) layers
- The other part skips them entirely
- Both parts are concatenated at the end

**Benefits:**
- Reduces computation
- Reduces memory usage
- Maintains accuracy

```
Input Feature Map
       │
   ┌───┴───┐
   │ Split │
   └───┬───┘
   ┌───┴───┐
   ↓       ↓
Part 1   Part 2
   │       │
Dense    Skip
Layers   Connection
   │       │
   └───┬───┘
       │
   Concatenate
       │
   Output
```
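
The split, transform, and concatenate flow in the diagram can be sketched in a few lines of NumPy. This is a toy illustration only: `csp_block` is a hypothetical helper, the `transform` callable stands in for the real convolutional layers, and the 1×1 convolutions that real CSP blocks place around the split are omitted.

```python
import numpy as np


def csp_block(x, transform):
    """Toy CSP block on a (C, H, W) array.

    Splits the channels in half, applies `transform` to the first half only,
    and concatenates both halves again.
    """
    c = x.shape[0]
    part1, part2 = x[: c // 2], x[c // 2:]   # channel-wise split
    part1 = transform(part1)                  # "dense layers" path (stand-in)
    return np.concatenate([part1, part2], axis=0)  # merge with the skip path


x = np.ones((64, 80, 80), dtype=np.float32)
y = csp_block(x, transform=lambda t: t * 2.0)
print(y.shape)                  # same shape as the input
print(y[0, 0, 0], y[-1, 0, 0])  # transformed half vs. untouched half
```

Because only half of the channels pass through the expensive path, the computation and activation memory of that path are roughly halved, which is where the listed benefits come from.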

In [None]:
# ============================================================
# VISUALIZE BACKBONE FEATURE EXTRACTION
# ============================================================

def visualize_backbone_concept():
    """Visualize how backbone extracts multi-scale features."""
    
    fig, axes = plt.subplots(1, 5, figsize=(16, 4))
    fig.suptitle('üîç Backbone: Multi-Scale Feature Extraction', fontsize=14, fontweight='bold')
    
    # Simulate feature map sizes at different stages
    stages = [
        ('Input\n640×640×3', 640, 'lightblue'),
        ('Stage 1\n320×320×64', 320, 'lightgreen'),
        ('Stage 2\n160×160×128', 160, 'yellow'),
        ('Stage 3\n80×80×256', 80, 'orange'),
        ('Stage 4\n40×40×512', 40, 'red'),
    ]
    
    for ax, (label, size, color) in zip(axes, stages):
        # Draw rectangle representing feature map
        rect = plt.Rectangle((0.1, 0.1), 0.8, 0.8, fill=True, color=color, alpha=0.6)
        ax.add_patch(rect)
        ax.text(0.5, 0.5, label, ha='center', va='center', fontsize=10, fontweight='bold')
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)
        ax.set_aspect('equal')
        ax.axis('off')
    
    # Draw arrows
    for i in range(4):
        plt.annotate('', xy=(0.22 + i*0.2, 0.5), xytext=(0.18 + i*0.2, 0.5),
                    xycoords='figure fraction',
                    arrowprops=dict(arrowstyle='->', color='black', lw=2))
    
    plt.tight_layout()
    plt.savefig(PROJECT_ROOT / 'docs' / 'assets' / 'backbone_stages.png', dpi=150)
    plt.show()

visualize_backbone_concept()

---

# Part 3: Neck - Feature Pyramid + PANet

## 📐 Feature Pyramid Network (FPN)

Combines features from different scales:

```
Backbone Output:          FPN:

P5 (20×20)  ─────────►  P5 ───┐
                              ↓ upsample
P4 (40×40)  ─────────►  P4 ───┤ + lateral
                              ↓ upsample
P3 (80×80)  ─────────►  P3 ───┘ + lateral
```

## Path Aggregation Network (PANet)

Adds a bottom-up path after the FPN:

```
FPN Output:            PANet:

P3 (high-res) ───────► N3 ───┐
                             ↓ downsample
P4 (mid-res)  ───────► N4 ───┤ + lateral
                             ↓ downsample
P5 (low-res)  ───────► N5 ───┘ + lateral
```

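**Why both directions?**
- Top-down (FPN): semantic information from the deep, low-resolution maps flows down to the high-resolution maps
- Bottom-up (PANet): localization detail from the shallow, high-resolution maps flows back up to the low-resolution maps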
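
The two passes can be traced with array shapes alone. The sketch below is a simplified assumption rather than the actual neck: nearest-neighbour repetition stands in for upsampling, strided slicing stands in for a stride-2 convolution, and the convolution blocks that normally follow each concatenation (and shrink the channel count) are left out, so channels simply accumulate. The channel counts and helper names are illustrative.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x):
    """Stride-2 subsampling, standing in for a stride-2 convolution."""
    return x[:, ::2, ::2]


def fuse(a, b):
    """Concatenate along channels (a real neck would apply a conv block next)."""
    return np.concatenate([a, b], axis=0)


# Hypothetical backbone outputs for a 640x640 input.
p3 = np.zeros((64, 80, 80), dtype=np.float32)
p4 = np.zeros((128, 40, 40), dtype=np.float32)
p5 = np.zeros((256, 20, 20), dtype=np.float32)

# Top-down pass (FPN): deep semantics reach the higher-resolution maps.
f5 = p5
f4 = fuse(upsample2x(f5), p4)
f3 = fuse(upsample2x(f4), p3)

# Bottom-up pass (PANet): fine detail returns to the lower-resolution maps.
n3 = f3
n4 = fuse(downsample2x(n3), f4)
n5 = fuse(downsample2x(n4), f5)

for name, t in [("N3", n3), ("N4", n4), ("N5", n5)]:
    print(name, t.shape)
```

After both passes, every output level has received information from every input level, which is the point of running the pyramid in both directions.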

In [None]:
# Visualize FPN + PANet
def visualize_neck():
    """Visualize the Neck architecture (FPN + PANet)."""
    
    fig, ax = plt.subplots(figsize=(12, 8))
    ax.set_xlim(0, 12)
    ax.set_ylim(0, 10)
    
    # Backbone outputs
    backbone_boxes = [
        (1, 7, 1.5, 1.5, 'P5\n20×20', 'lightcoral'),
        (1, 4.5, 2, 2, 'P4\n40×40', 'lightyellow'),
        (1, 1.5, 2.5, 2.5, 'P3\n80×80', 'lightgreen'),
    ]
    
    # FPN outputs
    fpn_boxes = [
        (5, 7, 1.5, 1.5, 'F5', 'coral'),
        (5, 4.5, 2, 2, 'F4', 'yellow'),
        (5, 1.5, 2.5, 2.5, 'F3', 'green'),
    ]
    
    # PANet outputs
    panet_boxes = [
        (9, 7, 1.5, 1.5, 'N5', 'darkred'),
        (9, 4.5, 2, 2, 'N4', 'goldenrod'),
        (9, 1.5, 2.5, 2.5, 'N3', 'darkgreen'),
    ]
    
    # Draw boxes
    for boxes, label in [(backbone_boxes, 'Backbone'), (fpn_boxes, 'FPN'), (panet_boxes, 'PANet')]:
        for x, y, w, h, text, color in boxes:
            # Use facecolor (not color) so the black edgecolor is not overridden.
            rect = plt.Rectangle((x, y), w, h, fill=True, facecolor=color, alpha=0.7, edgecolor='black')
            ax.add_patch(rect)
            ax.text(x + w/2, y + h/2, text, ha='center', va='center', fontsize=10, fontweight='bold')
    
    # Labels
    ax.text(2, 9.5, 'Backbone', ha='center', fontsize=12, fontweight='bold')
    ax.text(6, 9.5, 'FPN (Top-Down)', ha='center', fontsize=12, fontweight='bold')
    ax.text(10, 9.5, 'PANet (Bottom-Up)', ha='center', fontsize=12, fontweight='bold')
    
    # Arrows
    # Backbone to FPN
    for y in [7.75, 5.5, 2.75]:
        ax.annotate('', xy=(4.9, y), xytext=(3.6, y),
                   arrowprops=dict(arrowstyle='->', color='black', lw=1.5))
    
    # FPN vertical (top-down)
    ax.annotate('', xy=(6, 6.4), xytext=(6, 6.9), arrowprops=dict(arrowstyle='->', color='blue', lw=2))
    ax.annotate('', xy=(6, 3.9), xytext=(6, 4.4), arrowprops=dict(arrowstyle='->', color='blue', lw=2))
    
    # FPN to PANet
    for y in [7.75, 5.5, 2.75]:
        ax.annotate('', xy=(8.9, y), xytext=(7.1, y),
                   arrowprops=dict(arrowstyle='->', color='black', lw=1.5))
    
    # PANet vertical (bottom-up)
    ax.annotate('', xy=(10, 4.4), xytext=(10, 4.1), arrowprops=dict(arrowstyle='->', color='red', lw=2))
    ax.annotate('', xy=(10, 6.9), xytext=(10, 6.6), arrowprops=dict(arrowstyle='->', color='red', lw=2))
    
    ax.set_title('üîÑ Neck: FPN + PANet Feature Fusion', fontsize=14, fontweight='bold')
    ax.axis('off')
    
    plt.tight_layout()
    plt.savefig(PROJECT_ROOT / 'docs' / 'assets' / 'neck_architecture.png', dpi=150)
    plt.show()

visualize_neck()

---

# Part 4: Head - Decoupled Detection Head

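## 📐 Anchor-Free Design

YOLOv11 uses **anchor-free** detection:
- No predefined anchor boxes
- Each grid cell predicts its box directly, relative to the cell center
- Fewer hyperparameters to tune (no anchor sizes or aspect ratios), which simplifies training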

## Decoupled Head

Separates classification and regression:

```
Feature Map
     │
     ├──────────────────┐
     │                  │
     ↓                  ↓
Classification      Box Regression
   Branch              Branch
     │                  │
     ↓                  ↓
Class scores     DFL (distribution
                 over side distances)
                        │
                        ↓
                   Box (x, y, w, h)
```

## Output Format

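For each grid cell, the head outputs:
- **Class scores**: [num_classes] probabilities, one independent sigmoid per class
- **Box coordinates**: four side distances decoded to [x, y, w, h] (center format)
- **Confidence**: there is no separate objectness branch; the highest class score serves as the detection confidence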
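
To make the per-cell output concrete, here is a minimal decoding sketch. It assumes a YOLOv8-style anchor-free head in which each box side is predicted as a distribution over `reg_max` distance bins (the DFL idea) and the top class score is used as the confidence. The function `decode_cell`, its argument layout, and the example numbers are hypothetical and do not come from the Ultralytics code base.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def decode_cell(box_logits, cls_logits, cx, cy, stride):
    """Decode one grid cell into (box_xywh_pixels, class_id, confidence).

    box_logits: (4, reg_max) logits over distance bins for left, top, right, bottom.
    cls_logits: (num_classes,) raw class logits.
    cx, cy:     cell-center position in grid units (e.g. column + 0.5).
    """
    bins = np.arange(box_logits.shape[1])
    # Expected bin index under the softmax distribution = side distance in grid units.
    l, t, r, b = (softmax(box_logits, axis=1) * bins).sum(axis=1)
    x1, y1, x2, y2 = cx - l, cy - t, cx + r, cy + b
    # Convert corners to center format, then scale from grid units to pixels.
    box = np.array([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]) * stride
    scores = 1.0 / (1.0 + np.exp(-cls_logits))  # independent sigmoid per class
    return box, int(scores.argmax()), float(scores.max())


# Example: every side distribution peaks sharply at bin 2 (distance of 2 cells).
box_logits = np.full((4, 16), -50.0)
box_logits[:, 2] = 50.0
cls_logits = np.array([-2.0, 3.0, 0.0])

box, cls_id, conf = decode_cell(box_logits, cls_logits, cx=10.5, cy=5.5, stride=8)
print(box, cls_id, round(conf, 4))
```

Predicting a distribution instead of a single number lets the network express uncertainty about where a box edge lies; the expectation then turns that distribution back into one distance per side.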

In [None]:
# ============================================================
# YOLO OUTPUT VISUALIZATION
# ============================================================

def visualize_yolo_grid():
    """Visualize how YOLO divides image into grid."""
    
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    fig.suptitle('üéØ YOLO Detection at Multiple Scales', fontsize=14, fontweight='bold')
    
    scales = [
        ('Small Objects\n80×80 grid', 8),
        ('Medium Objects\n40×40 grid', 16),
        ('Large Objects\n20×20 grid', 32),
    ]
    
    for ax, (title, grid_size) in zip(axes, scales):
        # Draw grid
        ax.set_xlim(0, 640)
        ax.set_ylim(640, 0)
        
        # Draw grid lines
        for i in range(0, 641, grid_size):
            ax.axhline(y=i, color='blue', alpha=0.3, linewidth=0.5)
            ax.axvline(x=i, color='blue', alpha=0.3, linewidth=0.5)
        
        # Highlight some cells
        np.random.seed(42)
        for _ in range(5):
            x = np.random.randint(0, 640 // grid_size) * grid_size
            y = np.random.randint(0, 640 // grid_size) * grid_size
            rect = plt.Rectangle((x, y), grid_size, grid_size, fill=True, 
                                 color='green', alpha=0.5)
            ax.add_patch(rect)
        
        ax.set_title(title)
        ax.set_xlabel('640 pixels')
        ax.set_aspect('equal')
    
    plt.tight_layout()
    plt.savefig(PROJECT_ROOT / 'docs' / 'assets' / 'yolo_grid.png', dpi=150)
    plt.show()

visualize_yolo_grid()

---

# Part 5: YOLOv11 Specific Features

## C2PSA (Cross Stage Partial with Spatial Attention)

New in YOLOv11:
- Combines a CSP-style channel split with spatial attention on one branch
- Lets the network weight informative image regions more strongly
- Intended to improve small object detection
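
The idea of pairing a CSP split with attention can be illustrated with a toy NumPy block. This is not the Ultralytics C2PSA implementation: it uses single-head self-attention with no learned weights, and the names `spatial_self_attention` and `csp_attention_block` are invented for this sketch. It only shows the structure, in which one half of the channels is refined by attention over spatial positions while the other half passes through unchanged.

```python
import numpy as np


def spatial_self_attention(x):
    """Single-head self-attention over the H*W positions of a (C, H, W) array.

    The features themselves act as queries, keys, and values (no learned weights).
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T              # (N, C), one token per position
    logits = tokens @ tokens.T / np.sqrt(c)     # (N, N) pairwise similarity
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    out = weights @ tokens                      # each position mixes all others
    return out.T.reshape(c, h, w)


def csp_attention_block(x):
    """Split channels, apply residual attention to one half, concatenate."""
    c = x.shape[0]
    a, b = x[: c // 2], x[c // 2:]
    b = b + spatial_self_attention(b)           # attention branch with residual
    return np.concatenate([a, b], axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 20, 20)).astype(np.float32)
y = csp_attention_block(x)
print(y.shape)
```

Restricting attention to half of the channels keeps the quadratic cost of attention manageable, which fits the efficiency goal stated for this version.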

## Model Variants

| Model | Params | mAP50-95 (COCO val) | Speed |
|-------|--------|---------------------|-------|
| yolo11n | 2.6M | 39.5 | Fastest |
| yolo11s | 9.4M | 47.0 | Fast |
| yolo11m | 20.1M | 51.5 | Medium |
| yolo11l | 25.3M | 53.4 | Slow |
| yolo11x | 56.9M | 54.7 | Slowest |

For this project, we'll use **yolo11n** (nano) for faster training.
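
One way to read the table is to ask how much accuracy each step up in size buys. The sketch below takes the parameter counts and mAP values straight from the table and computes the mAP gained per additional million parameters between consecutive variants; `marginal_gains` is a hypothetical helper written for this comparison.

```python
# (model, parameters in millions, mAP) copied from the table above.
variants = [
    ("yolo11n", 2.6, 39.5),
    ("yolo11s", 9.4, 47.0),
    ("yolo11m", 20.1, 51.5),
    ("yolo11l", 25.3, 53.4),
    ("yolo11x", 56.9, 54.7),
]


def marginal_gains(rows):
    """mAP gained per additional million parameters between consecutive rows."""
    gains = []
    for (name_a, p_a, m_a), (name_b, p_b, m_b) in zip(rows, rows[1:]):
        gains.append((f"{name_a}->{name_b}", (m_b - m_a) / (p_b - p_a)))
    return gains


gains = marginal_gains(variants)
for step, gain in gains:
    print(f"{step}: {gain:.2f} mAP points per extra million parameters")
```

The gain per parameter shrinks at every step, so the smaller variants offer the best accuracy for their size, which supports starting with the nano model when training time matters.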

In [None]:
# Model comparison visualization
def visualize_model_comparison():
    """Compare YOLOv11 model variants."""
    
    models = ['yolo11n', 'yolo11s', 'yolo11m', 'yolo11l', 'yolo11x']
    params = [2.6, 9.4, 20.1, 25.3, 56.9]  # Millions
    mAP = [39.5, 47.0, 51.5, 53.4, 54.7]
    
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    
    # Parameters
    colors = plt.cm.viridis(np.linspace(0, 1, len(models)))
    axes[0].bar(models, params, color=colors)
    axes[0].set_ylabel('Parameters (M)')
    axes[0].set_title('Model Size')
    
    # mAP
    axes[1].bar(models, mAP, color=colors)
    axes[1].set_ylabel('mAP@50-95')
    axes[1].set_title('Performance')
    
    plt.suptitle('📊 YOLOv11 Model Variants', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.savefig(PROJECT_ROOT / 'docs' / 'assets' / 'model_comparison.png', dpi=150)
    plt.show()

visualize_model_comparison()

## 📝 Summary

### YOLOv11 Architecture:

| Component | Purpose | Key Feature |
|-----------|---------|-------------|
| **Backbone** | Feature extraction | CSPDarknet + C2PSA |
| **Neck** | Feature fusion | FPN + PANet |
| **Head** | Detection | Decoupled, Anchor-free |

### Key Concepts:
1. Multi-scale detection (small, medium, large objects)
2. Feature pyramid for semantic + localization info
3. Anchor-free for simplified training
4. Decoupled head for better convergence

### Next: Task 8 - Loss Function Mathematics

In [None]:
print("\n" + "="*60)
print("✅ TASK 7 COMPLETE: YOLO Architecture Deep Dive")
print("="*60)
print("\n📋 Topics covered:")
print("   ✓ YOLO evolution (v1 to v11)")
print("   ✓ Backbone: CSPDarknet + C2PSA")
print("   ✓ Neck: FPN + PANet")
print("   ✓ Head: Decoupled, Anchor-free")
print("   ✓ Multi-scale detection")
print("   ✓ Model variants comparison")
print("\n➡️ Ready for Task 8: Loss Functions")