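# 04 Conv2D Convolutional Layer

## Learning Objectives

1. Understand the forward pass of multi-channel convolution
2. **Derive in detail** the backward pass of a convolutional layer (this is the hardest part!)
3. Implement a complete Conv2D class with forward and backward
4. Speed up convolution with the im2col trick (optional)
5. Verify the gradients rigorously with numerical gradient checking

## Review: Multi-Channel Convolution

In Module 1 we implemented 2D convolution. Now we handle the **batched, multi-channel** case:

- **Input** $X$: shape $(N, C_{in}, H, W)$
- **Kernel** $W$: shape $(C_{out}, C_{in}, k_H, k_W)$
- **Bias** $b$: shape $(C_{out},)$
- **Output** $Y$: shape $(N, C_{out}, H', W')$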

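The output size, for stride $S$ and padding $P$, is given below (in these two formulas $W$ denotes the input width, not the kernel):
$$H' = \left\lfloor \frac{H + 2P - k_H}{S} \right\rfloor + 1$$
$$W' = \left\lfloor \frac{W + 2P - k_W}{S} \right\rfloor + 1$$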
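
The output-size formula can be sketched as a small helper. The name `conv_output_size` is an assumption made for illustration and does not appear elsewhere in this notebook; the integer division matches the `//` used in the implementations below.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output size along one spatial axis: floor((size + 2P - k) / S) + 1."""
    if size + 2 * padding < kernel:
        raise ValueError("kernel does not fit inside the padded input")
    return (size + 2 * padding - kernel) // stride + 1

print(conv_output_size(5, 3))                       # 3
print(conv_output_size(8, 3, stride=1, padding=1))  # 8
print(conv_output_size(7, 3, stride=2))             # 3
```

With a 3x3 kernel and stride 1, `padding=1` keeps the spatial size unchanged, which is the setting used later when testing the Conv2D class.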

In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
print("Conv2D module loaded!")

## Part 1: Forward Pass

### Formula (Index Expansion)

Each element of the output is:

$$Y[n, c_{out}, i, j] = \sum_{c_{in}=0}^{C_{in}-1} \sum_{p=0}^{k_H-1} \sum_{q=0}^{k_W-1} X[n, c_{in}, i \cdot S + p, j \cdot S + q] \cdot W[c_{out}, c_{in}, p, q] + b[c_{out}]$$

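This looks complicated, so let's take it step by step:
1. For each output position $(i, j)$
2. Sum over all input channels $c_{in}$
3. Take the dot product over every kernel position $(p, q)$
4. Add the bias of that output channel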
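
As a cross-check on the index formula, the steps above can be sketched in vectorized form. This is a sketch under assumptions rather than the notebook's implementation: `conv2d_forward_einsum` is a hypothetical helper name, and it assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_forward_einsum(X, W, b, stride=1, padding=0):
    """Vectorized forward pass (hypothetical helper, not part of the notebook)."""
    kH, kW = W.shape[2:]
    X_pad = np.pad(X, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    # windows[n, c, i, j, p, q] == X_pad[n, c, i + p, j + q]
    windows = sliding_window_view(X_pad, (kH, kW), axis=(2, 3))
    # Keep only the window origins that are multiples of the stride
    windows = windows[:, :, ::stride, ::stride]
    # Sum over c_in (c), p and q, exactly as in the index formula
    return np.einsum('ncijpq,ocpq->noij', windows, W) + b[None, :, None, None]

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3, 5, 5))
W = rng.standard_normal((4, 3, 3, 3))
b = rng.standard_normal(4)
Y = conv2d_forward_einsum(X, W, b, stride=2, padding=1)
print(Y.shape)  # (2, 4, 3, 3)
```

The einsum subscripts mirror the formula directly: `c`, `p`, `q` are summed, while `n`, `o`, `i`, `j` index the output.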

In [None]:
def conv2d_forward_naive(X, W, b, stride=1, padding=0):
    """
    Convolution forward pass (naive version, for understanding)
    
    Parameters
    ----------
    X : np.ndarray, shape (N, C_in, H, W)
        Input
    W : np.ndarray, shape (C_out, C_in, kH, kW)
        Kernel
    b : np.ndarray, shape (C_out,)
        Bias
    stride : int
        Stride
    padding : int
        Zero padding
    
    Returns
    -------
    Y : np.ndarray, shape (N, C_out, H', W')
    """
    N, C_in, H, W_in = X.shape
    C_out, _, kH, kW = W.shape
    
    # Zero padding
    if padding > 0:
        X_pad = np.pad(X, ((0, 0), (0, 0), (padding, padding), (padding, padding)), mode='constant')
    else:
        X_pad = X
    
    _, _, H_pad, W_pad = X_pad.shape
    
    # Output size
    H_out = (H_pad - kH) // stride + 1
    W_out = (W_pad - kW) // stride + 1
    
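    # Initialize the output
    Y = np.zeros((N, C_out, H_out, W_out))
    
    # Four explicit loops plus a vectorized sum over (C_in, kH, kW): slow, but easy to follow
    for n in range(N):                    # each sample
        for c_out in range(C_out):        # each output channel
            for i in range(H_out):        # each output row
                for j in range(W_out):    # each output column
                    # Extract the receptive field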
                    h_start = i * stride
                    h_end = h_start + kH
                    w_start = j * stride
                    w_end = w_start + kW
                    
                    receptive_field = X_pad[n, :, h_start:h_end, w_start:w_end]
                    
                    # Dot product (over all input channels)
                    Y[n, c_out, i, j] = np.sum(receptive_field * W[c_out]) + b[c_out]
    
    return Y

# Test
N, C_in, H, W = 2, 3, 5, 5
C_out, kH, kW = 4, 3, 3

X = np.random.randn(N, C_in, H, W)
W_conv = np.random.randn(C_out, C_in, kH, kW)
b_conv = np.random.randn(C_out)

Y = conv2d_forward_naive(X, W_conv, b_conv, stride=1, padding=0)

print(f"Input shape: {X.shape}")
print(f"Kernel shape: {W_conv.shape}")
print(f"Bias shape: {b_conv.shape}")
print(f"Output shape: {Y.shape}")
print(f"Expected output shape: ({N}, {C_out}, {H-kH+1}, {W-kW+1})")

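## Part 2: Backward Pass (Detailed Derivation)

This is the hardest part of a CNN to understand. We need to compute:
- $\frac{\partial L}{\partial X}$ (gradient with respect to the input)
- $\frac{\partial L}{\partial W}$ (gradient with respect to the kernel)
- $\frac{\partial L}{\partial b}$ (gradient with respect to the bias)

Assume $\frac{\partial L}{\partial Y}$ (denoted $dY$) is already known.

### 2.1 Gradient with Respect to the Bias $b$

The bias $b[c_{out}]$ is added to every element of $Y[:, c_{out}, :, :]$, so $\frac{\partial Y[n, c_{out}, i, j]}{\partial b[c_{out}]} = 1$ and the chain rule reduces to a plain sum: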

$$\frac{\partial L}{\partial b[c_{out}]} = \sum_n \sum_i \sum_j dY[n, c_{out}, i, j]$$

Vectorized form: sum $dY$ over axis=(0, 2, 3).
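
A minimal sketch comparing the explicit triple sum with the axis-sum form, using made-up shapes chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
dY = rng.standard_normal((2, 4, 3, 3))  # (N, C_out, H_out, W_out)

# Explicit sum over n, i, j for each output channel
db_loop = np.zeros(4)
for c_out in range(4):
    for n in range(2):
        for i in range(3):
            for j in range(3):
                db_loop[c_out] += dY[n, c_out, i, j]

# Vectorized form: collapse every axis except the channel axis
db_vec = dY.sum(axis=(0, 2, 3))
print(np.allclose(db_loop, db_vec))  # True
```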

### 2.2 Gradient with Respect to the Kernel $W$

Consider how $W[c_{out}, c_{in}, p, q]$ affects $Y$:

$$Y[n, c_{out}, i, j] = \ldots + X[n, c_{in}, i \cdot S + p, j \cdot S + q] \cdot W[c_{out}, c_{in}, p, q] + \ldots$$

Therefore:

$$\frac{\partial Y[n, c_{out}, i, j]}{\partial W[c_{out}, c_{in}, p, q]} = X[n, c_{in}, i \cdot S + p, j \cdot S + q]$$

Applying the chain rule:

$$\frac{\partial L}{\partial W[c_{out}, c_{in}, p, q]} = \sum_n \sum_i \sum_j dY[n, c_{out}, i, j] \cdot X[n, c_{in}, i \cdot S + p, j \cdot S + q]$$

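**This is exactly a cross-correlation of $X$ with $dY$ acting as the kernel!** (For $S > 1$, $dY$ is effectively dilated by the stride.)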
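
The sum for $dW$ can be sketched in vectorized form and spot-checked against the explicit triple sum. The shapes, the stride `S = 2`, and the use of `sliding_window_view` (NumPy 1.20 or later) are assumptions for this illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
S = 2
kH = kW = 3
X = rng.standard_normal((2, 3, 7, 7))   # (N, C_in, H, W), no padding
dY = rng.standard_normal((2, 4, 3, 3))  # (N, C_out, H_out, W_out)

# windows[n, c, i, j, p, q] == X[n, c, i*S + p, j*S + q]
windows = sliding_window_view(X, (kH, kW), axis=(2, 3))[:, :, ::S, ::S]
dW = np.einsum('noij,ncijpq->ocpq', dY, windows)

# Spot-check one entry against the explicit sum over n, i, j
o, c, p, q = 1, 2, 0, 2
ref = sum(dY[n, o, i, j] * X[n, c, i * S + p, j * S + q]
          for n in range(2) for i in range(3) for j in range(3))
print(dW.shape, np.isclose(dW[o, c, p, q], ref))  # (4, 3, 3, 3) True
```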

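### 2.3 Gradient with Respect to the Input $X$ (the Hardest Part!)

Consider how $X[n, c_{in}, h, w]$ affects $Y$.

Key observation: $X[n, c_{in}, h, w]$ affects **multiple** output positions!

Specifically, whenever $h = i \cdot S + p$ and $w = j \cdot S + q$ for some valid kernel offset $(p, q)$, $X[n, c_{in}, h, w]$ takes part in computing $Y[n, :, i, j]$. Summing the contributions from all such positions: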

$$\frac{\partial L}{\partial X[n, c_{in}, h, w]} = \sum_{c_{out}} \sum_{i, j \text{ s.t. } (h,w) \in \text{receptive field of } (i,j)} dY[n, c_{out}, i, j] \cdot W[c_{out}, c_{in}, h - i \cdot S, w - j \cdot S]$$

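**For stride $S = 1$, this is a "full convolution" of $dY$ with the flipped $W$, summed over $c_{out}$!**

$$\frac{\partial L}{\partial X} = dY \ast \text{flip}(W)$$

Here flip rotates $W$ by 180 degrees (a flip along both the kH and kW axes). For $S > 1$ the sum above still holds, but $dY$ must first be dilated by inserting $S - 1$ zeros between its elements.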
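
To see where the flip comes from, here is a short derivation sketch for $S = 1$, treating out-of-range entries of $dY$ as zero. Substituting $i = h - p$ and $j = w - q$ into the sum over receptive fields gives:

$$\frac{\partial L}{\partial X[n, c_{in}, h, w]} = \sum_{c_{out}} \sum_{p=0}^{k_H-1} \sum_{q=0}^{k_W-1} dY[n, c_{out}, h - p, w - q] \cdot W[c_{out}, c_{in}, p, q]$$

The index into $dY$ decreases as the kernel index increases, which is the signature of a true convolution. Rewritten as a sliding dot product over $dY$ zero-padded by $k_H - 1$ and $k_W - 1$, the kernel entry at offset $(a, b)$ is $W[c_{out}, c_{in}, k_H - 1 - a, k_W - 1 - b]$, that is, $W$ rotated by 180 degrees.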

In [None]:
def conv2d_backward_naive(dY, X, W, stride=1, padding=0):
    """
    Convolution backward pass (naive version)
    
    Parameters
    ----------
    dY : np.ndarray, shape (N, C_out, H_out, W_out)
        Gradient with respect to the output
    X : np.ndarray, shape (N, C_in, H, W)
        Input from the forward pass
    W : np.ndarray, shape (C_out, C_in, kH, kW)
        Kernel
    stride : int
    padding : int
    
    Returns
    -------
    dX : np.ndarray, shape (N, C_in, H, W)
    dW : np.ndarray, shape (C_out, C_in, kH, kW)
    db : np.ndarray, shape (C_out,)
    """
    N, C_in, H, W_in = X.shape
    C_out, _, kH, kW = W.shape
    _, _, H_out, W_out = dY.shape
    
    # Zero-pad the input
    if padding > 0:
        X_pad = np.pad(X, ((0, 0), (0, 0), (padding, padding), (padding, padding)), mode='constant')
    else:
        X_pad = X
    
    # Initialize the gradients
    dX_pad = np.zeros_like(X_pad)
    dW = np.zeros_like(W)
    
    # db: sum dY over axis=(0, 2, 3)
    db = np.sum(dY, axis=(0, 2, 3))
    
    # Accumulate dW and dX
    for n in range(N):
        for c_out in range(C_out):
            for i in range(H_out):
                for j in range(W_out):
                    h_start = i * stride
                    h_end = h_start + kH
                    w_start = j * stride
                    w_end = w_start + kW
                    
                    # dW: accumulate dY * X
                    dW[c_out] += dY[n, c_out, i, j] * X_pad[n, :, h_start:h_end, w_start:w_end]
                    
                    # dX: accumulate dY * W
                    dX_pad[n, :, h_start:h_end, w_start:w_end] += dY[n, c_out, i, j] * W[c_out]
    
    # Remove the padding
    if padding > 0:
        dX = dX_pad[:, :, padding:-padding, padding:-padding]
    else:
        dX = dX_pad
    
    return dX, dW, db

# Test
dY = np.random.randn(*Y.shape)
dX, dW, db = conv2d_backward_naive(dY, X, W_conv, stride=1, padding=0)

print(f"dY shape: {dY.shape}")
print(f"dX shape: {dX.shape} (should match X: {X.shape})")
print(f"dW shape: {dW.shape} (should match W: {W_conv.shape})")
print(f"db shape: {db.shape} (should match b: {b_conv.shape})")

## Part 3: Gradient Check
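
Gradient checking compares an analytic gradient with the central finite difference $(L(\theta + \epsilon) - L(\theta - \epsilon)) / (2\epsilon)$. Below is a minimal sketch on a function whose gradient is known in closed form; `numerical_grad` is a hypothetical helper name introduced only for this illustration.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at x (hypothetical helper)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + eps
        f_plus = f(x)
        x[idx] = old - eps
        f_minus = f(x)
        x[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

x = np.array([[1.0, -2.0], [0.5, 3.0]])
g_num = numerical_grad(lambda v: np.sum(v ** 2), x)
g_ana = 2 * x  # analytic gradient of sum(x^2)
rel_err = np.max(np.abs(g_num - g_ana) / (np.abs(g_num) + np.abs(g_ana) + 1e-8))
print(rel_err < 1e-6)  # True
```

The relative-error form used here is the same one applied to the Conv2D gradients in this part.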

In [None]:
def gradient_check_conv2d(X, W, b, stride=1, padding=0, eps=1e-5):
    """
    Gradient check for Conv2D
    
    Uses loss = sum(Y^2) as the test loss
    """
    # Forward pass
    Y = conv2d_forward_naive(X, W, b, stride, padding)
    
    # dL/dY = 2Y
    dY = 2 * Y
    
    # Analytic gradients
    dX, dW, db = conv2d_backward_naive(dY, X, W, stride, padding)
    
    all_passed = True
    
    # === Check dW ===
    print("=== Check dW ===")
    dW_numerical = np.zeros_like(W)
    
    # Spot-check a few random positions (checking all of them is too slow)
    num_checks = min(20, W.size)
    indices = np.random.choice(W.size, num_checks, replace=False)
    
    for idx in indices:
        multi_idx = np.unravel_index(idx, W.shape)
        old_val = W[multi_idx]
        
        W[multi_idx] = old_val + eps
        Y_plus = conv2d_forward_naive(X, W, b, stride, padding)
        loss_plus = np.sum(Y_plus ** 2)
        
        W[multi_idx] = old_val - eps
        Y_minus = conv2d_forward_naive(X, W, b, stride, padding)
        loss_minus = np.sum(Y_minus ** 2)
        
        W[multi_idx] = old_val
        
        dW_numerical[multi_idx] = (loss_plus - loss_minus) / (2 * eps)
    
    # Compare only the positions that were checked
    for idx in indices:
        multi_idx = np.unravel_index(idx, W.shape)
        analytic = dW[multi_idx]
        numerical = dW_numerical[multi_idx]
        rel_error = abs(analytic - numerical) / (abs(analytic) + abs(numerical) + 1e-8)
        if rel_error > 1e-4:
            print(f"  index {multi_idx}: analytic={analytic:.6f}, numerical={numerical:.6f}, error={rel_error:.2e} ❌")
            all_passed = False
    
    if all_passed:
        print(f"  All {num_checks} spot checks passed ✓")
    
    # === Check db ===
    print("\n=== Check db ===")
    db_numerical = np.zeros_like(b)
    
    for c in range(len(b)):
        old_val = b[c]
        
        b[c] = old_val + eps
        Y_plus = conv2d_forward_naive(X, W, b, stride, padding)
        loss_plus = np.sum(Y_plus ** 2)
        
        b[c] = old_val - eps
        Y_minus = conv2d_forward_naive(X, W, b, stride, padding)
        loss_minus = np.sum(Y_minus ** 2)
        
        b[c] = old_val
        
        db_numerical[c] = (loss_plus - loss_minus) / (2 * eps)
    
    rel_error = np.max(np.abs(db - db_numerical) / (np.abs(db) + np.abs(db_numerical) + 1e-8))
    print(f"  Max relative error: {rel_error:.2e}")
    print(f"  Passed: {rel_error < 1e-4}")
    if rel_error > 1e-4:
        all_passed = False
    
    # === Check dX ===
    print("\n=== Check dX ===")
    dX_numerical = np.zeros_like(X)
    
    num_checks = min(20, X.size)
    indices = np.random.choice(X.size, num_checks, replace=False)
    X_test = X.copy()
    
    for idx in indices:
        multi_idx = np.unravel_index(idx, X.shape)
        old_val = X_test[multi_idx]
        
        X_test[multi_idx] = old_val + eps
        Y_plus = conv2d_forward_naive(X_test, W, b, stride, padding)
        loss_plus = np.sum(Y_plus ** 2)
        
        X_test[multi_idx] = old_val - eps
        Y_minus = conv2d_forward_naive(X_test, W, b, stride, padding)
        loss_minus = np.sum(Y_minus ** 2)
        
        X_test[multi_idx] = old_val
        
        dX_numerical[multi_idx] = (loss_plus - loss_minus) / (2 * eps)
    
    dX_ok = True  # tracked separately so an earlier dW/db failure does not hide this result
    for idx in indices:
        multi_idx = np.unravel_index(idx, X.shape)
        analytic = dX[multi_idx]
        numerical = dX_numerical[multi_idx]
        rel_error = abs(analytic - numerical) / (abs(analytic) + abs(numerical) + 1e-8)
        if rel_error > 1e-4:
            print(f"  index {multi_idx}: analytic={analytic:.6f}, numerical={numerical:.6f}, error={rel_error:.2e} ❌")
            dX_ok = False
            all_passed = False
    
    if dX_ok:
        print(f"  All {num_checks} spot checks passed ✓")
    
    return all_passed

# Test with small data
np.random.seed(42)
X_small = np.random.randn(2, 2, 4, 4)
W_small = np.random.randn(3, 2, 2, 2)
b_small = np.random.randn(3)

passed = gradient_check_conv2d(X_small, W_small, b_small, stride=1, padding=0)
print(f"\nOverall result: {'all passed ✓' if passed else 'errors found ✗'}")

## Part 4: The Complete Conv2D Class

In [None]:
class Conv2D:
    """
    2D convolutional layer
    
    Parameters
    ----------
    in_channels : int
        Number of input channels
    out_channels : int
        Number of output channels
    kernel_size : int or tuple
        Kernel size
    stride : int
        Stride
    padding : int
        Zero padding
    """
    
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        self.in_channels = in_channels
        self.out_channels = out_channels
        
        if isinstance(kernel_size, int):
            self.kernel_size = (kernel_size, kernel_size)
        else:
            self.kernel_size = kernel_size
        
        self.stride = stride
        self.padding = padding
        
        # He initialization
        kH, kW = self.kernel_size
        std = np.sqrt(2.0 / (in_channels * kH * kW))
        self.W = np.random.randn(out_channels, in_channels, kH, kW) * std
        self.b = np.zeros(out_channels)
        
        # Gradients
        self.dW = None
        self.db = None
        
        # Cache
        self.cache = None
    
    def forward(self, X):
        """
        Forward pass
        
        Parameters
        ----------
        X : np.ndarray, shape (N, C_in, H, W)
        
        Returns
        -------
        Y : np.ndarray, shape (N, C_out, H', W')
        """
        N, C_in, H, W_in = X.shape
        C_out = self.out_channels
        kH, kW = self.kernel_size
        S = self.stride
        P = self.padding
        
        # Zero padding
        if P > 0:
            X_pad = np.pad(X, ((0, 0), (0, 0), (P, P), (P, P)), mode='constant')
        else:
            X_pad = X
        
        _, _, H_pad, W_pad = X_pad.shape
        
        H_out = (H_pad - kH) // S + 1
        W_out = (W_pad - kW) // S + 1
        
        Y = np.zeros((N, C_out, H_out, W_out))
        
        for n in range(N):
            for c_out in range(C_out):
                for i in range(H_out):
                    for j in range(W_out):
                        h_start = i * S
                        w_start = j * S
                        receptive = X_pad[n, :, h_start:h_start+kH, w_start:w_start+kW]
                        Y[n, c_out, i, j] = np.sum(receptive * self.W[c_out]) + self.b[c_out]
        
        # Save the cache for the backward pass
        self.cache = (X, X_pad)
        
        return Y
    
    def backward(self, dY):
        """
        Backward pass
        
        Parameters
        ----------
        dY : np.ndarray, shape (N, C_out, H_out, W_out)
        
        Returns
        -------
        dX : np.ndarray, shape (N, C_in, H, W)
        """
        X, X_pad = self.cache
        N, C_in, H, W_in = X.shape
        C_out = self.out_channels
        kH, kW = self.kernel_size
        S = self.stride
        P = self.padding
        _, _, H_out, W_out = dY.shape
        
        dX_pad = np.zeros_like(X_pad)
        self.dW = np.zeros_like(self.W)
        self.db = np.sum(dY, axis=(0, 2, 3))
        
        for n in range(N):
            for c_out in range(C_out):
                for i in range(H_out):
                    for j in range(W_out):
                        h_start = i * S
                        w_start = j * S
                        
                        self.dW[c_out] += dY[n, c_out, i, j] * X_pad[n, :, h_start:h_start+kH, w_start:w_start+kW]
                        dX_pad[n, :, h_start:h_start+kH, w_start:w_start+kW] += dY[n, c_out, i, j] * self.W[c_out]
        
        # Remove the padding
        if P > 0:
            dX = dX_pad[:, :, P:-P, P:-P]
        else:
            dX = dX_pad
        
        return dX
    
    def __repr__(self):
        return f"Conv2D({self.in_channels}, {self.out_channels}, kernel_size={self.kernel_size}, stride={self.stride}, padding={self.padding})"

# Test
conv = Conv2D(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)
print(conv)

X_test = np.random.randn(2, 3, 8, 8)
Y_test = conv.forward(X_test)
print(f"\nInput shape: {X_test.shape}")
print(f"Output shape: {Y_test.shape}")
print("(padding=1 keeps the output the same size as the input)")

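## Part 5: The im2col Speed-Up Trick (Optional)

im2col (image to column) turns convolution into a matrix multiplication, which can then run on a highly optimized BLAS library.

### Basic Idea

1. "Unroll" each receptive field into a row
2. "Unroll" each kernel into a column
3. The convolution becomes a single matrix multiplication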
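
The three steps above can be sketched on a tiny example: a single 3x3 image, a 2x2 kernel of ones, stride 1, no padding. This is an illustration only; it builds the matrix with a list comprehension rather than with the `im2col` function defined in the next cell.

```python
import numpy as np

X = np.arange(9, dtype=float).reshape(3, 3)  # single image, single channel
k = 2                                         # 2x2 kernel, stride 1, no padding
H_out = W_out = 3 - k + 1

# Step 1: one row per receptive field, scanned left to right, top to bottom
col = np.array([X[i:i + k, j:j + k].ravel()
                for i in range(H_out) for j in range(W_out)])
print(col.tolist())
# [[0.0, 1.0, 3.0, 4.0], [1.0, 2.0, 4.0, 5.0], [3.0, 4.0, 6.0, 7.0], [4.0, 5.0, 7.0, 8.0]]

# Steps 2 and 3: flatten the kernel, then a single matrix-vector product
kernel = np.ones((k, k))
Y = (col @ kernel.ravel()).reshape(H_out, W_out)
print(Y.tolist())  # [[8.0, 12.0], [20.0, 24.0]]
```

Neighbouring rows of `col` share entries because receptive fields overlap, which is why the matrix is larger than the input.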

In [None]:
def im2col(X, kernel_size, stride=1, padding=0):
    """
    Unroll the input into a matrix to speed up convolution
    
    Parameters
    ----------
    X : np.ndarray, shape (N, C, H, W)
    kernel_size : int or tuple (kH, kW)
    stride : int
    padding : int
    
    Returns
    -------
    col : np.ndarray, shape (N * H_out * W_out, C * kH * kW)
    """
    N, C, H, W = X.shape
    
    if isinstance(kernel_size, int):
        kH, kW = kernel_size, kernel_size
    else:
        kH, kW = kernel_size
    
    # Zero padding
    if padding > 0:
        X_pad = np.pad(X, ((0, 0), (0, 0), (padding, padding), (padding, padding)), mode='constant')
    else:
        X_pad = X
    
    _, _, H_pad, W_pad = X_pad.shape
    
    H_out = (H_pad - kH) // stride + 1
    W_out = (W_pad - kW) // stride + 1
    
    # Unroll
    col = np.zeros((N, C, kH, kW, H_out, W_out))
    
    for p in range(kH):
        p_max = p + stride * H_out
        for q in range(kW):
            q_max = q + stride * W_out
            col[:, :, p, q, :, :] = X_pad[:, :, p:p_max:stride, q:q_max:stride]
    
    # Reorder axes: (N, C, kH, kW, H_out, W_out) -> (N * H_out * W_out, C * kH * kW)
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N * H_out * W_out, -1)
    
    return col

def col2im(col, X_shape, kernel_size, stride=1, padding=0):
    """
    Reverse of im2col: folds columns back into an image, summing overlapping positions
    
    Parameters
    ----------
    col : np.ndarray, shape (N * H_out * W_out, C * kH * kW)
    X_shape : tuple (N, C, H, W)
    kernel_size : int or tuple
    stride : int
    padding : int
    
    Returns
    -------
    X : np.ndarray, shape (N, C, H, W)
    """
    N, C, H, W = X_shape
    
    if isinstance(kernel_size, int):
        kH, kW = kernel_size, kernel_size
    else:
        kH, kW = kernel_size
    
    H_pad = H + 2 * padding
    W_pad = W + 2 * padding
    H_out = (H_pad - kH) // stride + 1
    W_out = (W_pad - kW) // stride + 1
    
    # Reorder axes
    col = col.reshape(N, H_out, W_out, C, kH, kW).transpose(0, 3, 4, 5, 1, 2)
    
    X_pad = np.zeros((N, C, H_pad, W_pad))
    
    for p in range(kH):
        p_max = p + stride * H_out
        for q in range(kW):
            q_max = q + stride * W_out
            X_pad[:, :, p:p_max:stride, q:q_max:stride] += col[:, :, p, q, :, :]
    
    # Remove the padding
    if padding > 0:
        X = X_pad[:, :, padding:-padding, padding:-padding]
    else:
        X = X_pad
    
    return X

# Test im2col
X_test = np.random.randn(2, 3, 4, 4)
col = im2col(X_test, kernel_size=2, stride=1, padding=0)

print(f"Input shape: {X_test.shape}")
print(f"Shape after im2col: {col.shape}")
print("Expected shape: (N * H_out * W_out, C * kH * kW) = (2 * 3 * 3, 3 * 2 * 2) = (18, 12)")

In [None]:
class Conv2DFast:
    """
    Conv2D accelerated with im2col
    """
    
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        self.in_channels = in_channels
        self.out_channels = out_channels
        
        if isinstance(kernel_size, int):
            self.kernel_size = (kernel_size, kernel_size)
        else:
            self.kernel_size = kernel_size
        
        self.stride = stride
        self.padding = padding
        
        kH, kW = self.kernel_size
        std = np.sqrt(2.0 / (in_channels * kH * kW))
        self.W = np.random.randn(out_channels, in_channels, kH, kW) * std
        self.b = np.zeros(out_channels)
        
        self.dW = None
        self.db = None
        self.cache = None
    
    def forward(self, X):
        N, C_in, H, W_in = X.shape
        C_out = self.out_channels
        kH, kW = self.kernel_size
        S = self.stride
        P = self.padding
        
        H_out = (H + 2 * P - kH) // S + 1
        W_out = (W_in + 2 * P - kW) // S + 1
        
        # im2col
        col = im2col(X, self.kernel_size, S, P)
        
        # Flatten the kernel: (C_out, C_in, kH, kW) -> (C_out, C_in * kH * kW)
        W_row = self.W.reshape(C_out, -1)
        
        # Matrix multiply: (N*H_out*W_out, C_in*kH*kW) @ (C_in*kH*kW, C_out) -> (N*H_out*W_out, C_out)
        out = col @ W_row.T + self.b
        
        # Reshape back to (N, C_out, H_out, W_out)
        Y = out.reshape(N, H_out, W_out, C_out).transpose(0, 3, 1, 2)
        
        self.cache = (X, col)
        
        return Y
    
    def backward(self, dY):
        X, col = self.cache
        N, C_out, H_out, W_out = dY.shape
        C_in = self.in_channels
        kH, kW = self.kernel_size
        
        # Flatten dY: (N, C_out, H_out, W_out) -> (N*H_out*W_out, C_out)
        dY_col = dY.transpose(0, 2, 3, 1).reshape(-1, C_out)
        
        # db
        self.db = np.sum(dY_col, axis=0)
        
        # dW: col.T @ dY_col
        # col: (N*H_out*W_out, C_in*kH*kW)
        # dY_col: (N*H_out*W_out, C_out)
        # dW: (C_in*kH*kW, C_out) -> transpose -> (C_out, C_in*kH*kW) -> reshape
        dW_flat = col.T @ dY_col  # (C_in*kH*kW, C_out)
        self.dW = dW_flat.T.reshape(C_out, C_in, kH, kW)
        
        # dX: dY_col @ W
        W_row = self.W.reshape(C_out, -1)
        dcol = dY_col @ W_row  # (N*H_out*W_out, C_in*kH*kW)
        
        # col2im
        dX = col2im(dcol, X.shape, self.kernel_size, self.stride, self.padding)
        
        return dX

# Compare the naive and the im2col versions
import time

np.random.seed(42)
X_bench = np.random.randn(4, 3, 16, 16)

conv_naive = Conv2D(3, 8, 3, stride=1, padding=1)
conv_fast = Conv2DFast(3, 8, 3, stride=1, padding=1)
conv_fast.W = conv_naive.W.copy()
conv_fast.b = conv_naive.b.copy()

# Forward-pass comparison (perf_counter is a monotonic, high-resolution timer)
start = time.perf_counter()
Y_naive = conv_naive.forward(X_bench)
time_naive = time.perf_counter() - start

start = time.perf_counter()
Y_fast = conv_fast.forward(X_bench)
time_fast = time.perf_counter() - start

print(f"Naive version time: {time_naive*1000:.2f} ms")
print(f"im2col version time: {time_fast*1000:.2f} ms")
print(f"Speed-up: {time_naive / time_fast:.2f}x")
print(f"Max output difference: {np.max(np.abs(Y_naive - Y_fast)):.2e}")

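## Exercises

### Exercise 1: Implement Dilated Convolution

Dilation "spreads out" the kernel, enlarging the receptive field without adding parameters.

For dilation=d, the kernel behaves as if d-1 zeros were inserted between adjacent kernel elements.

**Hint**: keep h_start and w_start as they are and change the offset inside the window, so that kernel element $(p, q)$ reads the input at (h_start + p * d, w_start + q * d).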
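
Before writing the loops, it can help to work out the effective kernel size and the resulting output size. The helper below is a sketch; the name `dilated_output_size` is an assumption made for illustration.

```python
def dilated_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output size along one axis for a dilated convolution (hypothetical helper)."""
    # Inserting (dilation - 1) gaps between the kernel's (kernel - 1) adjacent pairs
    k_eff = kernel + (kernel - 1) * (dilation - 1)
    return (size + 2 * padding - k_eff) // stride + 1

for d in (1, 2, 3):
    print(d, dilated_output_size(7, 3, dilation=d))
# 1 5
# 2 3
# 3 1
```

For a 7x7 input and a 3x3 kernel, dilation 3 gives an effective 7x7 kernel, so exactly one output position remains.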

In [None]:
def conv2d_forward_dilated(X, W, b, stride=1, padding=0, dilation=1):
    """
    Convolution forward pass with dilation
    
    dilation=1 is an ordinary convolution
    dilation=2 leaves a gap of 1 pixel between kernel elements
    """
    N, C_in, H, W_in = X.shape
    C_out, _, kH, kW = W.shape
    
    # Effective kernel size
    kH_eff = kH + (kH - 1) * (dilation - 1)
    kW_eff = kW + (kW - 1) * (dilation - 1)
    
    # Zero padding
    if padding > 0:
        X_pad = np.pad(X, ((0, 0), (0, 0), (padding, padding), (padding, padding)), mode='constant')
    else:
        X_pad = X
    
    _, _, H_pad, W_pad = X_pad.shape
    
    # Output size
    H_out = (H_pad - kH_eff) // stride + 1
    W_out = (W_pad - kW_eff) // stride + 1
    
    Y = np.zeros((N, C_out, H_out, W_out))
    
    # Solution: modified convolution logic
    for n in range(N):
        for c_out in range(C_out):
            for i in range(H_out):
                for j in range(W_out):
                    # Origin of the window
                    h_start = i * stride
                    w_start = j * stride
                    
                    # For each kernel position
                    total = 0
                    for p in range(kH):
                        for q in range(kW):
                            # Offset scaled by the dilation
                            h_idx = h_start + p * dilation
                            w_idx = w_start + q * dilation
                            
                            for c_in in range(C_in):
                                total += X_pad[n, c_in, h_idx, w_idx] * W[c_out, c_in, p, q]
                    
                    Y[n, c_out, i, j] = total + b[c_out]
    
    return Y

# Test
X_test = np.random.randn(1, 1, 7, 7)
W_test = np.random.randn(1, 1, 3, 3)
b_test = np.zeros(1)

# Ordinary convolution
Y_d1 = conv2d_forward_dilated(X_test, W_test, b_test, dilation=1)
print(f"dilation=1: input {X_test.shape} -> output {Y_d1.shape}")

# Dilated convolution
Y_d2 = conv2d_forward_dilated(X_test, W_test, b_test, dilation=2)
print(f"dilation=2: input {X_test.shape} -> output {Y_d2.shape}")

Y_d3 = conv2d_forward_dilated(X_test, W_test, b_test, dilation=3)
print(f"dilation=3: input {X_test.shape} -> output {Y_d3.shape}")

In [None]:
# Visualize the effect of dilation
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Build a test kernel
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

dilations = [1, 2, 3]

for idx, d in enumerate(dilations):
    ax = axes[idx]
    
    # Build the dilated kernel (for visualization only)
    kH, kW = kernel.shape
    kH_eff = kH + (kH - 1) * (d - 1)
    kW_eff = kW + (kW - 1) * (d - 1)
    
    kernel_dilated = np.zeros((kH_eff, kW_eff))
    for i in range(kH):
        for j in range(kW):
            kernel_dilated[i * d, j * d] = kernel[i, j]
    
    ax.imshow(kernel_dilated, cmap='Blues', vmin=0, vmax=1)
    ax.set_title(f'dilation={d} (effective size: {kH_eff}x{kW_eff})')
    ax.set_xticks([])
    ax.set_yticks([])
    
    # Label the positions of the original kernel elements
    for i in range(kH):
        for j in range(kW):
            if kernel[i, j] > 0:
                ax.text(j * d, i * d, '1', ha='center', va='center', fontsize=12)

plt.suptitle('Dilated Convolution Kernels', fontsize=14)
plt.tight_layout()
plt.show()

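### Exercise 2: Verify the "Full Convolution" Interpretation of the Backward Pass

For stride 1, the input gradient $dX$ can be read as a full convolution of $dY$ with the flipped $W$.

**Task**: compute dX directly with a full convolution and compare it with the result of our backward function.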

In [None]:
def full_conv2d(X, W):
    """
    Full convolution (the output is larger than the input)
    
    Equivalent to a stride-1 convolution with padding = kernel_size - 1
    """
    N, C_in, H, W_in = X.shape
    C_out, _, kH, kW = W.shape
    
    # Padding for a full convolution
    pad_h = kH - 1
    pad_w = kW - 1
    
    X_pad = np.pad(X, ((0, 0), (0, 0), (pad_h, pad_h), (pad_w, pad_w)), mode='constant')
    
    H_out = H + kH - 1
    W_out = W_in + kW - 1
    
    Y = np.zeros((N, C_out, H_out, W_out))
    
    for n in range(N):
        for c_out in range(C_out):
            for i in range(H_out):
                for j in range(W_out):
                    receptive = X_pad[n, :, i:i+kH, j:j+kW]
                    Y[n, c_out, i, j] = np.sum(receptive * W[c_out])
    
    return Y

# Verify the full-convolution interpretation of dX
np.random.seed(42)
N, C_in, H, W_in = 1, 1, 4, 4
C_out, kH, kW = 1, 2, 2

X = np.random.randn(N, C_in, H, W_in)
W_conv = np.random.randn(C_out, C_in, kH, kW)
b_conv = np.zeros(C_out)

# Forward pass
Y = conv2d_forward_naive(X, W_conv, b_conv)
dY = np.random.randn(*Y.shape)

# dX from the backward function
dX_backward, _, _ = conv2d_backward_naive(dY, X, W_conv)

# dX from a full convolution: dX = dY * flip(W), valid for stride 1.
# Flip the two spatial axes, then swap the channel axes so the kernel maps the
# C_out channels of dY back to the C_in channels of dX. The swap is a no-op in
# this single-channel example but is needed in the multi-channel case.
W_flipped = np.flip(W_conv, axis=(2, 3)).transpose(1, 0, 2, 3)

dX_full = full_conv2d(dY, W_flipped)

print(f"dX from backward: shape = {dX_backward.shape}")
print(dX_backward[0, 0])

print(f"\ndX from full conv: shape = {dX_full.shape}")
print(dX_full[0, 0])

print(f"\nMax difference: {np.max(np.abs(dX_backward - dX_full)):.2e}")

## Summary

In this notebook we took a close look at implementing the Conv2D layer:

### Forward Pass
$$Y[n, c_{out}, i, j] = \sum_{c_{in}} \sum_{p, q} X[n, c_{in}, i \cdot S + p, j \cdot S + q] \cdot W[c_{out}, c_{in}, p, q] + b[c_{out}]$$

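### Backward Pass

| Gradient | Formula | Interpretation |
|------|------|------|
| $db$ | $\sum_{n,i,j} dY[n,:,i,j]$ | Sum over samples and spatial positions |
| $dW$ | $dY \star X$ | Cross-correlation of $X$ with $dY$ |
| $dX$ | $dY \ast \text{flip}(W)$ | Full convolution of $dY$ with the flipped $W$ (stride 1) |

### im2col Speed-Up
- Turns convolution into a matrix multiplication
- Can use a highly optimized BLAS library
- Trades memory for speed (the col matrix needs extra memory)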
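
To put a number on the memory trade-off, the sketch below counts elements in the input and in its im2col matrix. The helper name `im2col_expansion` is an assumption made for illustration; the sizes are those of the benchmark input used earlier.

```python
def im2col_expansion(N, C, H, W, k, stride=1, padding=0):
    """Element counts of the input and of its im2col matrix (hypothetical helper)."""
    H_out = (H + 2 * padding - k) // stride + 1
    W_out = (W + 2 * padding - k) // stride + 1
    n_input = N * C * H * W
    n_col = (N * H_out * W_out) * (C * k * k)  # rows times columns of col
    return n_input, n_col

n_input, n_col = im2col_expansion(4, 3, 16, 16, k=3, stride=1, padding=1)
print(n_input, n_col, n_col / n_input)  # 3072 27648 9.0
```

With stride 1 and "same" padding, every input element is copied once per kernel position, so the expansion factor is $k_H \cdot k_W$.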

### Next Steps

Next we will implement the **Pooling layer**, another important building block of CNNs.