# 🧠 Task 5: CNN Fundamentals with NumPy

## 🎯 Objective
Implement core CNN components from scratch using NumPy to understand the mathematical foundations before using YOLOv11.

---

## 📚 Why This Matters

Before using complex frameworks, understanding the math helps you:
1. **Debug effectively** - Know what's happening inside
2. **Optimize performance** - Understand bottlenecks
3. **Customize architectures** - Modify with confidence

### ML Rules Applied:
- **Rule #14**: Starting with an interpretable model makes debugging easier

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from pathlib import Path

np.random.seed(42)
plt.style.use('seaborn-v0_8-whitegrid')

PROJECT_ROOT = Path(r"D:\het\SELF\RP\YOLO-V11-PRO")
print("✅ Libraries imported (NumPy only - no PyTorch!)")

---

# Part 1: Convolution Operation

## 📐 Mathematical Definition

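**2D Discrete Convolution** (written without flipping the kernel, i.e. as cross-correlation, which is the usual convention in CNNs):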
$$
(I * K)[i,j] = \sum_{m=0}^{k_h-1} \sum_{n=0}^{k_w-1} I[i+m, j+n] \cdot K[m,n]
$$

Where:
- **I** = Input image (H × W)
- **K** = Kernel/Filter (k_h × k_w)
- **\*** = Convolution operation
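
To make the double sum concrete, here is a minimal sketch that evaluates it by hand on a 3×3 input with a 2×2 kernel (stride 1, no padding). The arrays and the helper name `sliding_window_sum` are illustrative and not part of the notebook's own code. Flipping the kernel along both axes gives the textbook flipped-kernel convolution, which generally differs from the un-flipped sum in the formula unless the kernel is symmetric.

```python
import numpy as np

def sliding_window_sum(I, K):
    """Evaluate sum_m sum_n I[i+m, j+n] * K[m, n] (stride 1, no padding)."""
    k_h, k_w = K.shape
    out_h = I.shape[0] - k_h + 1
    out_w = I.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Window of the input aligned with output position (i, j)
            out[i, j] = np.sum(I[i:i + k_h, j:j + k_w] * K)
    return out

I = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=float)
K = np.array([[1, 0],
              [0, -1]], dtype=float)

unflipped = sliding_window_sum(I, K)            # the formula as written
flipped = sliding_window_sum(I, K[::-1, ::-1])  # kernel flipped in both axes

print(unflipped)  # every entry is -4, e.g. 1*1 + 5*(-1) at position (0, 0)
print(flipped)    # every entry is 4, e.g. 1*(-1) + 5*1 at position (0, 0)
```

For the symmetric blur kernels used later the two variants coincide; for Sobel kernels they differ only in sign.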

**Output Size:**
$$
O_h = \left\lfloor \frac{H - k_h + 2P}{S} \right\rfloor + 1 \quad\quad O_w = \left\lfloor \frac{W - k_w + 2P}{S} \right\rfloor + 1
$$

Where: P = padding, S = stride
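
As a quick check of the output-size formula, the sketch below computes it along one axis for a few common settings. The helper name `conv_output_size` is hypothetical; the division is rounded down, matching the integer division a sliding-window implementation performs when the stride does not divide evenly.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output size along one axis: floor((size - kernel + 2*padding) / stride) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# 3x3 kernel with padding 1 and stride 1 preserves the spatial size
print(conv_output_size(64, 3, padding=1, stride=1))   # 64
# Without padding, a 3x3 kernel trims one pixel from each border
print(conv_output_size(64, 3, padding=0, stride=1))   # 62
# Stride 2 roughly halves the size
print(conv_output_size(64, 3, padding=1, stride=2))   # 32
# The same formula covers 2x2 pooling with stride 2
print(conv_output_size(128, 2, padding=0, stride=2))  # 64
```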

In [None]:
# ============================================================
# 2D CONVOLUTION - NumPy Implementation
# ============================================================

def conv2d(image, kernel, stride=1, padding=0):
    """
    2D Convolution operation (NumPy only).
    
    Args:
        image: Input image (H, W) or (H, W, C)
        kernel: Convolution kernel (k_h, k_w)
        stride: Step size for sliding window
        padding: Zero-padding around image
    
    Returns:
        Convolved output
    """
    # Handle grayscale vs RGB
    if len(image.shape) == 2:
        image = image[:, :, np.newaxis]
    
    H, W, C = image.shape
    k_h, k_w = kernel.shape
    
    # Apply padding
    if padding > 0:
        image = np.pad(image, ((padding, padding), (padding, padding), (0, 0)), mode='constant')
        H, W, C = image.shape
    
    # Calculate output dimensions
    out_h = (H - k_h) // stride + 1
    out_w = (W - k_w) // stride + 1
    
    # Initialize output
    output = np.zeros((out_h, out_w, C))
    
    # Perform convolution
    for c in range(C):
        for i in range(out_h):
            for j in range(out_w):
                # Extract region
                region = image[i*stride:i*stride+k_h, j*stride:j*stride+k_w, c]
                # Element-wise multiply and sum
                output[i, j, c] = np.sum(region * kernel)
    
    return output[:, :, 0] if C == 1 else output  # Remove channel dim if grayscale (keeps 1-pixel spatial dims)

print("✅ conv2d() function defined")

In [None]:
# ============================================================
# COMMON KERNELS
# ============================================================

# Edge Detection Kernels
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)

SOBEL_Y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=np.float32)

LAPLACIAN = np.array([[ 0, -1,  0],
                      [-1,  4, -1],
                      [ 0, -1,  0]], dtype=np.float32)

# Blur Kernels
BOX_BLUR = np.ones((3, 3), dtype=np.float32) / 9

GAUSSIAN_3x3 = np.array([[1, 2, 1],
                         [2, 4, 2],
                         [1, 2, 1]], dtype=np.float32) / 16

# Sharpen Kernel
SHARPEN = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=np.float32)

print("✅ Kernels defined")
print(f"\nSobel X (vertical edges):\n{SOBEL_X}")
print(f"\nGaussian Blur:\n{GAUSSIAN_3x3}")

In [None]:
# Visualize convolution effects
def visualize_convolutions(image_path):
    """Apply different kernels and visualize results."""
    
    img = np.array(Image.open(image_path).convert('L'))  # Grayscale
    
    kernels = [
        ('Original', None),
        ('Sobel X (Vertical Edges)', SOBEL_X),
        ('Sobel Y (Horizontal Edges)', SOBEL_Y),
        ('Laplacian (All Edges)', LAPLACIAN),
        ('Box Blur', BOX_BLUR),
        ('Sharpen', SHARPEN)
    ]
    
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    fig.suptitle('üî¨ Convolution Effects (NumPy Implementation)', fontsize=14, fontweight='bold')
    
    for ax, (name, kernel) in zip(axes.flat, kernels):
        if kernel is None:
            result = img
        else:
            result = conv2d(img.astype(np.float32), kernel, padding=1)
        
        ax.imshow(result, cmap='gray')
        ax.set_title(name)
        ax.axis('off')
    
    plt.tight_layout()
    plt.savefig(PROJECT_ROOT / 'docs' / 'assets' / 'convolution_effects.png', dpi=150)
    plt.show()

# Find sample image
sample_dir = PROJECT_ROOT / "data" / "processed" / "images" / "train"
samples = list(sample_dir.glob("*.jpg"))[:1]
if samples:
    visualize_convolutions(samples[0])

---

# Part 2: Activation Functions

## 📐 Why Non-Linearity?

Without activation functions, a stack of linear layers collapses into a single linear transformation, no matter how many layers it has:
$$
y = W_n \cdot (W_{n-1} \cdots (W_1 \cdot x)) = W_{combined} \cdot x
$$

Activation functions introduce **non-linearity**, enabling networks to learn complex patterns.
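
The collapse can be checked numerically. The sketch below uses small illustrative matrices (all names are hypothetical, not part of the notebook): first it confirms that two stacked linear layers equal one combined layer, then it shows a pair of weights whose linear stack maps every input to 0, while the same weights with a ReLU in between compute the absolute value of the input.

```python
import numpy as np

# Part 1: two linear layers equal one combined linear layer
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

stacked = W2 @ (W1 @ x)
W_combined = W2 @ W1
collapsed = W_combined @ x
print(np.allclose(stacked, collapsed))  # True

# Part 2: the same weights with and without a ReLU in between
A1 = np.array([[1.0], [-1.0]])  # maps v to (v, -v)
A2 = np.array([[1.0, 1.0]])     # sums the two hidden units

def linear_net(v):
    return A2 @ (A1 @ v)                # v + (-v): always zero

def relu_net(v):
    return A2 @ np.maximum(0, A1 @ v)   # max(0, v) + max(0, -v) = |v|

for value in (-2.0, 0.0, 3.0):
    v = np.array([value])
    print(value, linear_net(v)[0], relu_net(v)[0])
```

The absolute-value function is not linear, so no single matrix could reproduce what `relu_net` computes.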

In [None]:
# ============================================================
# ACTIVATION FUNCTIONS - NumPy Implementation
# ============================================================

def sigmoid(x):
    """Sigmoid: σ(x) = 1 / (1 + e^(-x))"""
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

def sigmoid_derivative(x):
    """dσ/dx = σ(x) × (1 - σ(x))"""
    s = sigmoid(x)
    return s * (1 - s)

def tanh(x):
    """tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))"""
    return np.tanh(x)

def tanh_derivative(x):
    """d(tanh)/dx = 1 - tanh²(x)"""
    return 1 - np.tanh(x)**2

def relu(x):
    """ReLU: max(0, x)"""
    return np.maximum(0, x)

def relu_derivative(x):
    """dReLU/dx = 1 if x > 0 else 0"""
    return (x > 0).astype(np.float32)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: max(αx, x)"""
    return np.where(x > 0, x, alpha * x)

def silu(x):
    """SiLU/Swish (used in YOLOv11): x × σ(x)"""
    return x * sigmoid(x)

def silu_derivative(x):
    """d(SiLU)/dx = σ(x) + x × σ(x) × (1 - σ(x))"""
    s = sigmoid(x)
    return s + x * s * (1 - s)

def softmax(x):
    """Softmax: exp(x_i) / Σexp(x_j)"""
    exp_x = np.exp(x - np.max(x))  # Numerical stability
    return exp_x / np.sum(exp_x)

print("✅ Activation functions defined")

In [None]:
# Visualize activation functions
x = np.linspace(-5, 5, 200)

fig, axes = plt.subplots(2, 3, figsize=(15, 8))
fig.suptitle('üìà Activation Functions (NumPy Implementation)', fontsize=14, fontweight='bold')

activations = [
    ('Sigmoid', sigmoid, sigmoid_derivative),
    ('Tanh', tanh, tanh_derivative),
    ('ReLU', relu, relu_derivative),
    ('Leaky ReLU', leaky_relu, lambda x: np.where(x > 0, 1, 0.01)),
    ('SiLU/Swish (YOLOv11)', silu, silu_derivative),
    ('Softmax Output', lambda x: np.array([softmax(np.array([xi, 0, -xi])) for xi in x]), None)
]

for ax, (name, func, deriv) in zip(axes.flat, activations):
    if 'Softmax' not in name:
        ax.plot(x, func(x), 'b-', linewidth=2, label='f(x)')
        if deriv:
            ax.plot(x, deriv(x), 'r--', linewidth=1.5, label="f'(x)")
        ax.axhline(y=0, color='k', linewidth=0.5)
        ax.axvline(x=0, color='k', linewidth=0.5)
        ax.legend()
    else:
        y = func(x)
        ax.plot(x, y[:, 0], label='Class 0')
        ax.plot(x, y[:, 1], label='Class 1')
        ax.plot(x, y[:, 2], label='Class 2')
        ax.legend()
    ax.set_title(name)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(PROJECT_ROOT / 'docs' / 'assets' / 'activation_functions.png', dpi=150)
plt.show()

---

# Part 3: Pooling Operations

## 📐 Mathematical Definition

**Max Pooling:**
$$
y[i,j] = \max_{(m,n) \in R_{ij}} x[m,n]
$$

**Average Pooling:**
$$
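y[i,j] = \frac{1}{|R_{ij}|} \sum_{(m,n) \in R_{ij}} x[m,n]
$$

Where R_ij is the pooling region (window) of the input that produces output element (i, j), and |R_ij| is the number of elements in it.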
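
A small worked example may help here. The sketch below applies 2×2 pooling with stride 2 to an illustrative 4×4 array by reshaping it into non-overlapping windows; this reshape trick is a shortcut that only works when the windows do not overlap and tile the input exactly, and it is not how the loop-based functions in this notebook are written.

```python
import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 2, 4, 6]], dtype=float)

# Axes after reshape: (row block, row within block, col block, col within block)
windows = x.reshape(2, 2, 2, 2)

# Reduce over the two "within block" axes to get one value per 2x2 region
max_pooled = windows.max(axis=(1, 3))
avg_pooled = windows.mean(axis=(1, 3))

print(max_pooled)  # [[4. 5.] [3. 7.]]
print(avg_pooled)  # [[2.5 2.] [1.5 4.75]]
```

For the top-left region {1, 3, 4, 2}, the maximum is 4 and the mean is 10 / 4 = 2.5.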

In [None]:
# ============================================================
# POOLING OPERATIONS - NumPy Implementation
# ============================================================

def max_pool2d(image, pool_size=2, stride=2):
    """2D Max Pooling."""
    if len(image.shape) == 2:
        image = image[:, :, np.newaxis]
    
    H, W, C = image.shape
    out_h = (H - pool_size) // stride + 1
    out_w = (W - pool_size) // stride + 1
    
    output = np.zeros((out_h, out_w, C))
    
    for c in range(C):
        for i in range(out_h):
            for j in range(out_w):
                region = image[i*stride:i*stride+pool_size, 
                              j*stride:j*stride+pool_size, c]
                output[i, j, c] = np.max(region)
    
    return output[:, :, 0] if C == 1 else output  # Drop the channel axis only for single-channel input

def avg_pool2d(image, pool_size=2, stride=2):
    """2D Average Pooling."""
    if len(image.shape) == 2:
        image = image[:, :, np.newaxis]
    
    H, W, C = image.shape
    out_h = (H - pool_size) // stride + 1
    out_w = (W - pool_size) // stride + 1
    
    output = np.zeros((out_h, out_w, C))
    
    for c in range(C):
        for i in range(out_h):
            for j in range(out_w):
                region = image[i*stride:i*stride+pool_size,
                              j*stride:j*stride+pool_size, c]
                output[i, j, c] = np.mean(region)
    
    return output[:, :, 0] if C == 1 else output  # Drop the channel axis only for single-channel input

print("✅ Pooling functions defined")

In [None]:
# Visualize pooling
def visualize_pooling(image_path):
    """Show pooling effects."""
    img = np.array(Image.open(image_path).convert('L').resize((128, 128)))
    
    max_pooled = max_pool2d(img.astype(np.float32))
    avg_pooled = avg_pool2d(img.astype(np.float32))
    
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    fig.suptitle('üî≤ Pooling Operations (2√ó2, stride=2)', fontsize=14, fontweight='bold')
    
    axes[0].imshow(img, cmap='gray')
    axes[0].set_title(f'Original ({img.shape[0]}×{img.shape[1]})')
    
    axes[1].imshow(max_pooled, cmap='gray')
    axes[1].set_title(f'Max Pool ({max_pooled.shape[0]}×{max_pooled.shape[1]})')
    
    axes[2].imshow(avg_pooled, cmap='gray')
    axes[2].set_title(f'Avg Pool ({avg_pooled.shape[0]}×{avg_pooled.shape[1]})')
    
    for ax in axes:
        ax.axis('off')
    
    plt.tight_layout()
    plt.savefig(PROJECT_ROOT / 'docs' / 'assets' / 'pooling_demo.png', dpi=150)
    plt.show()

if samples:
    visualize_pooling(samples[0])

---

# Part 4: Batch Normalization

## 📐 Mathematical Definition

$$
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
$$
$$
y_i = \gamma \hat{x}_i + \beta
$$

Where:
- μ_B = batch mean
- σ²_B = batch variance
- γ, β = learnable parameters
- ε = small constant for stability
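
To see the two equations at work, here is a minimal sketch on an illustrative batch of 4 samples with 2 features on very different scales. The values of `gamma` and `beta` are made up for the example (in a real network they are learned). After normalization each feature has mean close to 0 and standard deviation close to 1; the scale and shift then set them to roughly `beta` and `gamma`.

```python
import numpy as np

x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])  # shape (N=4 samples, 2 features)

eps = 1e-5
mu = x.mean(axis=0)   # per-feature batch mean: [2.5, 25.0]
var = x.var(axis=0)   # per-feature batch variance: [1.25, 125.0]

# Normalize: both features now share the same scale
x_hat = (x - mu) / np.sqrt(var + eps)

# Scale and shift with illustrative (not learned) parameters
gamma = np.array([2.0, 1.0])
beta = np.array([0.5, 0.0])
y = gamma * x_hat + beta

print(x_hat.mean(axis=0), x_hat.std(axis=0))  # approximately [0, 0] and [1, 1]
print(y.mean(axis=0), y.std(axis=0))          # approximately beta and gamma
```

The standard deviation comes out slightly below 1 because ε is added to the variance before taking the square root.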

In [None]:
# ============================================================
# BATCH NORMALIZATION - NumPy Implementation
# ============================================================

def batch_norm(x, gamma=1.0, beta=0.0, epsilon=1e-5):
    """
    Batch Normalization (NumPy).
    
    Args:
        x: Input array (N, ...) - first dim is batch
        gamma: Scale parameter
        beta: Shift parameter
        epsilon: Numerical stability
    """
    # Compute batch statistics
    mu = np.mean(x, axis=0)
    var = np.var(x, axis=0)
    
    # Normalize
    x_norm = (x - mu) / np.sqrt(var + epsilon)
    
    # Scale and shift
    out = gamma * x_norm + beta
    
    return out, mu, var

# Test
test_batch = np.random.randn(32, 64, 64)  # Batch of 32, 64x64
normalized, mu, var = batch_norm(test_batch)

print(f"✅ BatchNorm test:")
print(f"   Input mean: {test_batch.mean():.4f}, std: {test_batch.std():.4f}")
print(f"   Output mean: {normalized.mean():.4f}, std: {normalized.std():.4f}")

---

# Part 5: Simple Forward Pass

Let's combine everything into a simple CNN layer!
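
Before running the layer, it can help to predict the shapes by hand. The sketch below traces a 3-channel 64×64 input through a 3×3 convolution with 16 filters (padding 1, stride 1), a ReLU, and a 2×2 max pool with stride 2, and counts the layer's parameters. The helper `output_size` is hypothetical and simply restates the output-size formula from Part 1.

```python
def output_size(size, kernel, padding, stride):
    """Output size along one axis: floor((size - kernel + 2*padding) / stride) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

in_channels, height, width = 3, 64, 64
out_channels, kernel_size = 16, 3

# Convolution: 3x3 kernel, padding 1, stride 1 keeps the spatial size
conv_h = output_size(height, kernel_size, padding=1, stride=1)
conv_w = output_size(width, kernel_size, padding=1, stride=1)

# ReLU is element-wise, so the shape is unchanged

# Max pooling: 2x2 window, stride 2 halves each spatial dimension
pool_h = output_size(conv_h, 2, padding=0, stride=2)
pool_w = output_size(conv_w, 2, padding=0, stride=2)

# One (in_channels x k x k) filter per output channel, plus one bias each
n_params = out_channels * in_channels * kernel_size ** 2 + out_channels

print((out_channels, conv_h, conv_w))  # (16, 64, 64)
print((out_channels, pool_h, pool_w))  # (16, 32, 32)
print(n_params)                        # 448
```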

In [None]:
# ============================================================
# SIMPLE CNN LAYER - NumPy Implementation
# ============================================================

class ConvLayer:
    """Basic Convolutional Layer (NumPy only)."""
    
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        self.stride = stride
        self.padding = padding
        
        # Initialize weights (He initialization)
        scale = np.sqrt(2.0 / (in_channels * kernel_size * kernel_size))
        self.weights = np.random.randn(out_channels, in_channels, kernel_size, kernel_size) * scale
        self.bias = np.zeros(out_channels)
    
    def forward(self, x):
        """Forward pass."""
        # x shape: (C, H, W)
        C, H, W = x.shape
        out_channels = self.weights.shape[0]
        k = self.weights.shape[2]
        
        # Pad
        if self.padding > 0:
            x = np.pad(x, ((0, 0), (self.padding, self.padding), (self.padding, self.padding)))
        
        _, H_pad, W_pad = x.shape
        out_h = (H_pad - k) // self.stride + 1
        out_w = (W_pad - k) // self.stride + 1
        
        output = np.zeros((out_channels, out_h, out_w))
        
        for oc in range(out_channels):
            for i in range(out_h):
                for j in range(out_w):
                    region = x[:, i*self.stride:i*self.stride+k, j*self.stride:j*self.stride+k]
                    output[oc, i, j] = np.sum(region * self.weights[oc]) + self.bias[oc]
        
        return output

# Test
conv = ConvLayer(3, 16, kernel_size=3)
test_input = np.random.randn(3, 64, 64)  # RGB 64x64
output = conv.forward(test_input)
output = relu(output)  # Apply activation
output = max_pool2d(output.transpose(1, 2, 0)).transpose(2, 0, 1)  # Pool

print(f"✅ Conv Layer Test:")
print(f"   Input shape: {test_input.shape}")
print(f"   Output shape (after pool): {output.shape}")

## 📝 Summary

### Implemented from Scratch (NumPy only):

| Component | Formula | Function |
|-----------|---------|----------|
| **Convolution** | Σ I[i+m,j+n] × K[m,n] | `conv2d()` |
| **ReLU** | max(0, x) | `relu()` |
| **Sigmoid** | 1/(1+e^-x) | `sigmoid()` |
| **SiLU** | x × σ(x) | `silu()` |
| **Max Pool** | max(region) | `max_pool2d()` |
| **Avg Pool** | mean(region) | `avg_pool2d()` |
| **BatchNorm** | (x-μ)/√(σ²+ε) | `batch_norm()` |

### Next: Task 6 - Object Detection Metrics

In [None]:
print("\n" + "="*60)
print("✅ TASK 5 COMPLETE: CNN Fundamentals with NumPy")
print("="*60)
print("\n📋 Implemented (NumPy only, no PyTorch!):")
print("   ✓ 2D Convolution operation")
print("   ✓ Edge detection kernels (Sobel, Laplacian)")
print("   ✓ Activation functions (ReLU, Sigmoid, Tanh, SiLU)")
print("   ✓ Pooling operations (Max, Average)")
print("   ✓ Batch Normalization")
print("   ✓ Simple ConvLayer class")
print("\n➡️ Ready for Task 6: Object Detection Metrics")