# Convolutional Neural Networks

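- Translation invariance: wherever the object to be detected appears in the image, the early layers of the network should respond similarly to the same image patch.

- Locality: the early layers should focus only on local regions of the input image, without much regard for relationships between distant regions. These local features can eventually be aggregated to make predictions at the level of the whole image.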
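
As a rough illustration of the first principle, the sketch below applies one fixed kernel to an image containing a small pattern and to a shifted copy of that image. The image, kernel values, and shift are arbitrary choices made for this illustration; `torch.nn.functional.conv2d` is used because it computes a cross-correlation.

```python
import torch
import torch.nn.functional as F

# A 6x6 image with a 2x2 bright patch at rows 1-2, cols 1-2 (arbitrary choice).
img = torch.zeros(6, 6)
img[1:3, 1:3] = 1.0

# The same patch moved 2 rows down and 2 columns right.
shifted = torch.roll(img, shifts=(2, 2), dims=(0, 1))

# One fixed 2x2 kernel shared across all positions (values are arbitrary).
kernel = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

def cross_correlate(x, k):
    # F.conv2d expects (batch, channels, H, W) input and (out, in, h, w) weights.
    return F.conv2d(x[None, None], k[None, None])[0, 0]

out = cross_correlate(img, kernel)
out_shifted = cross_correlate(shifted, kernel)

# The response pattern is identical, only displaced by the same (2, 2) offset.
same_response = torch.equal(out[0:3, 0:3], out_shifted[2:5, 2:5])
print(same_response)
```

Because the kernel weights are shared across positions, the response to the patch depends only on the patch itself, not on where it sits in the image.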

In [1]:
import torch
from torch import nn
from d2l import torch as d2l

In [2]:
def corr2d(X, K):
    """
    Compute 2D cross-correlation.
    X: input tensor of shape (H, W)
    K: kernel tensor of shape (h, w)
    Returns: output tensor of shape (H - h + 1, W - w + 1)
    """
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()

    return Y

Suppose the input has shape $n_h\times n_w$ and the kernel has shape $k_h\times k_w$. The output then has shape $(n_h - k_h + 1)\times (n_w - k_w + 1)$.

In [3]:
X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
corr2d(X, K)

tensor([[19., 25.],
        [37., 43.]])
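
Each entry of this output can be checked by hand. Sliding the $2\times 2$ kernel over the $3\times 3$ input gives $(3 - 2 + 1)\times(3 - 2 + 1) = 2\times 2$ window positions, and each entry is the sum of elementwise products over one window:

$$
\begin{aligned}
Y[0, 0] &= 0\times 0 + 1\times 1 + 3\times 2 + 4\times 3 = 19,\\
Y[0, 1] &= 1\times 0 + 2\times 1 + 4\times 2 + 5\times 3 = 25,\\
Y[1, 0] &= 3\times 0 + 4\times 1 + 6\times 2 + 7\times 3 = 37,\\
Y[1, 1] &= 4\times 0 + 5\times 1 + 7\times 2 + 8\times 3 = 43.
\end{aligned}
$$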

In [4]:
X = torch.ones((6, 8))
X[:, 2:6] = 0
X

tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])

In [5]:
K = torch.tensor([[1.0, -1.0]])
K

tensor([[ 1., -1.]])

In [6]:
Y = corr2d(X, K)
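
The cell above stores the result without displaying it. The sketch below recomputes it with a local copy of `corr2d` so that it runs on its own. It also applies the same kernel to the transposed image, a check that is not part of the cells above, to suggest that the kernel `[1, -1]` responds only to vertical edges.

```python
import torch

def corr2d(X, K):
    """2D cross-correlation, a copy of the corr2d defined earlier."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = torch.ones((6, 8))
X[:, 2:6] = 0                      # band of zeros in the middle columns
K = torch.tensor([[1.0, -1.0]])    # difference between horizontal neighbours

Y = corr2d(X, K)
# All rows of Y are identical: 1 marks a 1-to-0 edge, -1 a 0-to-1 edge,
# and 0 means the two neighbouring pixels are equal.
print(Y[0])

# Transposing the image turns its vertical edges into horizontal ones,
# which a 1x2 kernel cannot see, so the result is all zeros.
Y_t = corr2d(X.t(), K)
print(Y_t.abs().sum())
```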

In [7]:
conv2d = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)

X = X.reshape((1, 1, 6, 8))
Y = Y.reshape((1, 1, 6, 7))
lr = 1e-3

for i in range(1000):
    Y_hat = conv2d(X)
    l = (Y_hat - Y) ** 2
    conv2d.zero_grad()
    l.sum().backward()

    conv2d.weight.data[:] -= lr * conv2d.weight.grad
    if (i + 1) % 100 == 0:
        print(f'epoch {i+1}, loss {l.sum():.3f}, weights {conv2d.weight.data.reshape((1, 2))}')

epoch 100, loss 1.522, weights tensor([[ 0.6482, -0.6480]])
epoch 200, loss 0.136, weights tensor([[ 0.8948, -0.8948]])
epoch 300, loss 0.012, weights tensor([[ 0.9685, -0.9685]])
epoch 400, loss 0.001, weights tensor([[ 0.9906, -0.9906]])
epoch 500, loss 0.000, weights tensor([[ 0.9972, -0.9972]])
epoch 600, loss 0.000, weights tensor([[ 0.9992, -0.9992]])
epoch 700, loss 0.000, weights tensor([[ 0.9997, -0.9997]])
epoch 800, loss 0.000, weights tensor([[ 0.9999, -0.9999]])
epoch 900, loss 0.000, weights tensor([[ 1.0000, -1.0000]])
epoch 1000, loss 0.000, weights tensor([[ 1.0000, -1.0000]])


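Suppose the input has shape $n_h\times n_w$ and the kernel has shape $k_h\times k_w$. If a total of $p_h$ rows of padding (split between top and bottom) and $p_w$ columns of padding (split between left and right) are added, the output shape is $(n_h - k_h + p_h + 1)\times (n_w - k_w + p_w + 1)$.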

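In general, we set $p_h = k_h - 1$ and $p_w = k_w - 1$ so that the output has the same height and width as the input. In CNNs, kernel heights and widths are usually odd, so the padding splits evenly: $p_h/2$ rows on the top and on the bottom, and $p_w/2$ columns on the left and on the right. With such a kernel and padding, the output $Y[i, j]$ is the cross-correlation of the kernel with the input window centered at $X[i, j]$.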

In [None]:
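def comp_conv2d(conv2d, X):
    # Prepend batch and channel dimensions: (H, W) -> (1, 1, H, W)
    X = X.reshape((1, 1) + X.shape)
    Y = conv2d(X)
    # Drop the batch and channel dimensions again
    return Y.reshape(Y.shape[2:])

X = torch.rand(size=(8, 8))
X.shape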

torch.Size([8, 8])

In [10]:
conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1)
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])

In [9]:
conv2d = nn.Conv2d(1, 1, kernel_size=(5, 3), padding=(2, 1))
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])
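
One detail worth spelling out: $p_h$ and $p_w$ in the formulas count the total rows and columns added, whereas the `padding` argument of `nn.Conv2d` gives the amount added on each side, so `padding=1` corresponds to $p_h = p_w = 2$. A minimal sketch of that bookkeeping follows. The helper `same_padding` is a hypothetical name introduced here, and it assumes stride 1 and odd kernel sizes.

```python
import torch
from torch import nn

def same_padding(kernel_size):
    """Per-side padding that keeps the output size equal to the input size.

    Hypothetical helper: assumes stride 1 and odd kernel sizes, so the total
    padding k - 1 splits evenly into (k - 1) // 2 on each side.
    """
    return tuple((k - 1) // 2 for k in kernel_size)

X = torch.rand(1, 1, 8, 8)  # (batch, channels, height, width)

for kernel_size in [(3, 3), (5, 3), (7, 5)]:
    pad = same_padding(kernel_size)
    conv2d = nn.Conv2d(1, 1, kernel_size=kernel_size, padding=pad)
    # Print the kernel size, the per-side padding, and the output (H, W)
    print(kernel_size, pad, tuple(conv2d(X).shape[2:]))
```

For an even kernel size the total padding $k - 1$ is odd and cannot be split evenly, which is one reason odd kernel sizes are the usual choice.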

With vertical stride $s_h$ and horizontal stride $s_w$, the output shape is $\left\lfloor\dfrac{n_h - k_h + p_h + s_h}{s_h}\right\rfloor \times \left\lfloor\dfrac{n_w - k_w + p_w + s_w}{s_w}\right\rfloor$.

In [13]:
conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)
comp_conv2d(conv2d, X).shape

torch.Size([4, 4])
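
To check the stride formula beyond this one case, the sketch below compares it with the shapes `nn.Conv2d` actually produces. `conv_out_shape` is a hypothetical helper written for this check. It takes the total padding `p` as in the formulas here, so the per-side value passed to `nn.Conv2d` is doubled. The second configuration is an arbitrary one in which the division is not exact, so the result is rounded down.

```python
import torch
from torch import nn

def conv_out_shape(n, k, p, s):
    """Output size along one axis: floor((n - k + p + s) / s).

    n: input size, k: kernel size, p: TOTAL padding (both sides), s: stride.
    Hypothetical helper written for this check.
    """
    return (n - k + p + s) // s

X = torch.rand(1, 1, 8, 8)  # (batch, channels, height, width)

configs = [
    # (kernel_size, per-side padding, stride)
    ((3, 3), (1, 1), (2, 2)),
    ((3, 5), (0, 1), (3, 4)),
]
results = []
for k, pad, s in configs:
    conv2d = nn.Conv2d(1, 1, kernel_size=k, padding=pad, stride=s)
    actual = tuple(conv2d(X).shape[2:])
    predicted = tuple(
        conv_out_shape(8, k[d], 2 * pad[d], s[d]) for d in range(2)
    )
    results.append((actual, predicted))
    print(k, pad, s, actual, predicted)
```

In both configurations the predicted and actual shapes agree: $4\times 4$ for the first and $2\times 2$ for the second.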