# From Fully-Connected Layers to Convolutions

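- To start off, consider an MLP that takes two-dimensional images $X$ as inputs and produces immediate hidden representations $H$ of the same shape, represented as matrices in mathematics and as two-dimensional tensors in code. Both the inputs and the hidden representations are now treated as having spatial structure. For every hidden unit to receive input from every pixel, the weights become a fourth-order tensor $\mathbf{W}$ and the biases a matrix $\mathbf{U}$. Re-indexing with $k = i + a$ and $l = j + b$, so that $[\mathbf{V}]_{i, j, a, b} = [\mathbf{W}]_{i, j, i+a, j+b}$, gives the second line: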

$$
\begin{aligned}
{[\mathbf{H}]_{i, j} } &=[\mathbf{U}]_{i, j}+\sum_{k} \sum_{l}[\mathbf{W}]_{i, j, k, l}[\mathbf{X}]_{k, l} \\
&=[\mathbf{U}]_{i, j}+\sum_{a} \sum_{b}[\mathbf{V}]_{i, j, a, b}[\mathbf{X}]_{i+a, j+b}
\end{aligned}
$$
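The two lines of the equation above can be checked numerically. The following is a minimal sketch, assuming an arbitrary 4 x 5 image and random weights (both chosen only for illustration); it computes the fully connected form with `torch.einsum` and the re-indexed form with explicit loops over the offsets $(a, b)$.

```python
import torch

torch.manual_seed(0)             # arbitrary seed, only for repeatability
n, m = 4, 5                      # illustrative image size (assumption)
X = torch.randn(n, m)
W = torch.randn(n, m, n, m)      # fourth-order weights [W]_{i,j,k,l}
U = torch.randn(n, m)            # biases [U]_{i,j}

# First line: H[i, j] = U[i, j] + sum_{k,l} W[i, j, k, l] * X[k, l]
H = U + torch.einsum('ijkl,kl->ij', W, X)

# Second line: same sum with k = i + a, l = j + b,
# so that V[i, j, a, b] = W[i, j, i + a, j + b]
H_reindexed = U.clone()
for i in range(n):
    for j in range(m):
        for a in range(-i, n - i):        # offsets that keep i + a inside the image
            for b in range(-j, m - j):    # offsets that keep j + b inside the image
                H_reindexed[i, j] += W[i, j, i + a, j + b] * X[i + a, j + b]

print(torch.allclose(H, H_reindexed, atol=1e-4))
print(W.numel())  # n * m * n * m weights
```

Even for this tiny image the weight tensor already holds $(4 \cdot 5)^2$ entries, which is the cost the two principles below are meant to remove.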

- Now invoke the first principle established above: **translation invariance**. A shift in the input $X$ should simply lead to a shift in the hidden representation $H$.
- This is only possible if $V$ and $U$ do not actually depend on $(i, j)$, that is, $[\mathbf{V}]_{i, j, a, b}=[\mathbf{V}]_{a, b}$ and $\mathbf{U}$ is a constant $u$:

$$
[\mathbf{H}]_{i, j}=u+\sum_{a} \sum_{b}[\mathbf{V}]_{a, b}[\mathbf{X}]_{i+a, j+b}
$$
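As a rough illustration of why sharing $\mathbf{V}$ across positions gives translation invariance, the sketch below applies one shared kernel to an image and to a shifted copy of it. The helper name `shared_kernel_layer`, the sizes, and the restriction to non-negative offsets are assumptions made to keep the example short.

```python
import torch

def shared_kernel_layer(X, V, u=0.0):
    """H[i, j] = u + sum_{a, b} V[a, b] * X[i + a, j + b], with offsets a, b >= 0."""
    h, w = V.shape
    H = torch.zeros(X.shape[0] - h + 1, X.shape[1] - w + 1)
    for i in range(H.shape[0]):
        for j in range(H.shape[1]):
            H[i, j] = u + (X[i:i + h, j:j + w] * V).sum()
    return H

torch.manual_seed(0)  # arbitrary seed, only for repeatability
X = torch.randn(6, 7)
V = torch.randn(2, 3)

# Shift the image down by 1 row and right by 2 columns, filling with zeros
X_shifted = torch.zeros_like(X)
X_shifted[1:, 2:] = X[:-1, :-2]

H = shared_kernel_layer(X, V)
H_shifted = shared_kernel_layer(X_shifted, V)

# Away from the border, the hidden representation moves by the same (1, 2) shift
print(torch.allclose(H_shifted[1:, 2:], H[:-1, :-2]))
```

If the weights depended on $(i, j)$, the shifted input would meet different weights at each position and this check would generally fail.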

- Now invoke the second principle: **locality**. As motivated above, we should not have to look very far from location $(i, j)$ to glean the information needed to assess what is going on at $[\mathbf{H}]_{i, j}$. This means that outside some range $|a|>\Delta$ or $|b|>\Delta$, we set $[\mathbf{V}]_{a, b}=0$:

$$
[\mathbf{H}]_{i, j}=u+\sum_{a=-\Delta}^{\Delta} \sum_{b=-\Delta}^{\Delta}[\mathbf{V}]_{a, b}[\mathbf{X}]_{i+a, j+b}
$$
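The local formula above can be written out directly. This is a minimal sketch, assuming zero padding at the border so that $H$ keeps the shape of $X$, and storing $[\mathbf{V}]_{a, b}$ at index `V[a + delta, b + delta]`; the helper name `local_layer` is hypothetical. It is compared against `torch.nn.functional.conv2d`, which computes a cross-correlation.

```python
import torch
import torch.nn.functional as F

def local_layer(X, V, u, delta):
    """H[i, j] = u + sum over a, b in [-delta, delta] of V[a, b] * X[i + a, j + b]."""
    n, m = X.shape
    X_pad = F.pad(X, (delta, delta, delta, delta))  # zeros on all four sides
    H = torch.zeros(n, m)
    for i in range(n):
        for j in range(m):
            # X_pad[i + delta + a, j + delta + b] holds X[i + a, j + b]
            window = X_pad[i:i + 2 * delta + 1, j:j + 2 * delta + 1]
            H[i, j] = u + (window * V).sum()
    return H

torch.manual_seed(0)  # arbitrary seed, only for repeatability
delta, u = 1, 0.5
X = torch.randn(6, 6)
V = torch.randn(2 * delta + 1, 2 * delta + 1)

H = local_layer(X, V, u, delta)
H_builtin = F.conv2d(X[None, None], V[None, None], padding=delta)[0, 0] + u
print(torch.allclose(H, H_builtin, atol=1e-5))

# Locality: changing a far-away pixel leaves H[0, 0] untouched
X_far = X.clone()
X_far[5, 5] += 10.0
H_far = local_layer(X_far, V, u, delta)
print((H_far[0, 0] == H[0, 0]).item())
```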

- Fully connected layer + translation invariance + locality = convolutional layer
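To make the saving concrete, the parameter counts implied by the first and last equations can be compared. The sketch below assumes a 28 x 28 single-channel image and $\Delta = 1$ purely as example values; the function names are hypothetical.

```python
def fully_connected_params(height, width):
    """Weights W[i, j, k, l] plus biases U[i, j] for same-shaped input and hidden maps."""
    return (height * width) ** 2 + height * width

def conv_params(delta):
    """Weights V[a, b] for a, b in [-delta, delta] plus one shared bias u."""
    return (2 * delta + 1) ** 2 + 1

print(fully_connected_params(28, 28))  # 615440
print(conv_params(1))                  # 10
```

The convolutional count does not depend on the image size at all, only on the window radius $\Delta$.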


# Convolutions

![convolution](./images/convolution.png)

- Two-dimensional cross-correlation

$$
y_{i, j}=\sum_{a=1}^{h} \sum_{b=1}^{w} w_{a, b} x_{i+a, j+b}
$$

- Two-dimensional convolution (the kernel is indexed with negated offsets, i.e. flipped along both axes)

$$
y_{i, j}=\sum_{a=1}^{h} \sum_{b=1}^{w} w_{-a, -b} x_{i+a, j+b}
$$
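The relationship between the two operations can be sketched in a few lines: a strict convolution gives the same result as a cross-correlation with the kernel flipped along both axes. The example below assumes a small 4 x 4 input and uses `torch.nn.functional.conv2d`, which computes a cross-correlation; the helper name `cross_correlate` is hypothetical.

```python
import torch
import torch.nn.functional as F

X = torch.arange(16.0).reshape(4, 4)
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

def cross_correlate(X, K):
    # F.conv2d computes cross-correlation and expects (batch, channel, height, width)
    return F.conv2d(X[None, None], K[None, None])[0, 0]

# Strict convolution: cross-correlation with the kernel flipped along both axes
K_flipped = torch.flip(K, dims=(0, 1))
Y_corr = cross_correlate(X, K)
Y_conv = cross_correlate(X, K_flipped)

print(Y_corr)
print(Y_conv)
```

Because the kernel is learned from data, a layer trained with either operation ends up with equivalent behavior: one simply learns the flipped version of the other's kernel.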


In [1]:
import torch
from torch import nn
from d2l import torch as d2l

def corr2d(X, K):
    """Compute 2D cross-correlation."""
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

In [2]:
X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
corr2d(X, K)

tensor([[19., 25.],
        [37., 43.]])
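The double loop in `corr2d` is easy to read but slow for large inputs. One possible vectorized variant, sketched here under the hypothetical name `corr2d_unfold`, uses `Tensor.unfold` to gather all sliding windows at once; it is an alternative formulation, not the implementation used in the rest of these notes.

```python
import torch

def corr2d_unfold(X, K):
    """2D cross-correlation using sliding windows instead of Python loops."""
    h, w = K.shape
    # patches[i, j, a, b] == X[i + a, j + b]
    patches = X.unfold(0, h, 1).unfold(1, w, 1)
    # Weight each window by the kernel and sum over the window dimensions
    return (patches * K).sum(dim=(-2, -1))

X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
print(corr2d_unfold(X, K))
```

On this input it reproduces the values shown above, for example $0 \cdot 0 + 1 \cdot 1 + 3 \cdot 2 + 4 \cdot 3 = 19$ for the top-left entry.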

In [3]:
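class Conv2D(nn.Module):
    """Convolutional layer built on corr2d, with a learnable kernel and scalar bias."""
    def __init__(self, kernel_size):
        super().__init__()
        # kernel_size is a (height, width) tuple; the kernel starts at random values
        self.weight = nn.Parameter(torch.rand(kernel_size))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Cross-correlate the input with the kernel, then add the broadcast bias
        return corr2d(x, self.weight) + self.bias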
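The class above mirrors, in simplified form, what the built-in `nn.Conv2d` layer stores. As a quick sketch of the correspondence, the following inspects the parameter and output shapes of a built-in layer; the input size of 6 x 8 and the kernel shape (1, 2) are example values.

```python
import torch
from torch import nn

# Built-in layer: 1 input channel, 1 output channel, kernel of shape (1, 2)
layer = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(1, 2))

print(layer.weight.shape)  # (out_channels, in_channels, kernel_h, kernel_w)
print(layer.bias.shape)    # one bias per output channel

# Input layout is (batch, channel, height, width)
X = torch.ones((1, 1, 6, 8))
Y = layer(X)
print(Y.shape)  # height 6 - 1 + 1 = 6, width 8 - 2 + 1 = 7
```

Unlike the two-dimensional `Conv2D` above, the built-in layer carries explicit batch and channel dimensions, which is why its weight is four-dimensional.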

In [4]:
# A 6 x 8 image: the middle four columns are black (0), the rest are white (1)
X = torch.ones((6, 8))
X[:, 2:6] = 0
print(X)

tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])


In [5]:
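# Kernel of height 1 and width 2: the output is 0 wherever horizontally adjacent
# elements are equal, 1 at a white-to-black edge, and -1 at a black-to-white edge
K = torch.tensor([[1.0, -1.0]])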
Y = corr2d(X, K)
print(Y)

tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])


In [6]:
# On the transposed image the edges are horizontal, so this kernel detects nothing
corr2d(X.t(), K)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
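The all-zero result above shows that the kernel `[[1.0, -1.0]]` only responds to vertical edges. As a sketch of the complementary case, transposing the kernel as well should recover the edges in the transposed image. The helper `cross_correlate` is a hypothetical stand-in built on `torch.nn.functional.conv2d` so that the example runs on its own.

```python
import torch
import torch.nn.functional as F

def cross_correlate(X, K):
    # F.conv2d computes cross-correlation and expects (batch, channel, height, width)
    return F.conv2d(X[None, None], K[None, None])[0, 0]

X = torch.ones((6, 8))
X[:, 2:6] = 0
K = torch.tensor([[1.0, -1.0]])

# Kernel of shape (2, 1): compares vertically adjacent elements
Y_t = cross_correlate(X.t(), K.t())
print(Y_t)

# The result is the transpose of the original edge map
print(torch.equal(Y_t, cross_correlate(X, K).t()))
```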

In [7]:
# Construct a two-dimensional convolutional layer with 1 output channel and a
# kernel of shape (1, 2). For the sake of simplicity, we ignore the bias here
conv2d = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)

# The two-dimensional convolutional layer uses four-dimensional input and
# output in the format of (example, channel, height, width), where the batch
# size (number of examples in the batch) and the number of channels are both 1
X = X.reshape((1, 1, 6, 8))
Y = Y.reshape((1, 1, 6, 7))
lr = 3e-2  # Learning rate

for i in range(10):
    Y_hat = conv2d(X)
    l = (Y_hat - Y) ** 2
    conv2d.zero_grad()
    l.sum().backward()
    # Update the kernel
    conv2d.weight.data[:] -= lr * conv2d.weight.grad
    print(f'epoch {i + 1}, loss {l.sum():.3f}')

epoch 1, loss 4.642
epoch 2, loss 2.316
epoch 3, loss 1.214
epoch 4, loss 0.667
epoch 5, loss 0.382
epoch 6, loss 0.226
epoch 7, loss 0.137
epoch 8, loss 0.085
epoch 9, loss 0.053
epoch 10, loss 0.033


In [8]:
conv2d.weight.data.reshape((1, 2))

tensor([[ 1.0130, -0.9758]])
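The learned kernel is close to the `[[1.0, -1.0]]` kernel that generated `Y`. The manual weight update in the loop above can also be written with a standard optimizer. The following is a sketch of that variant, assuming `torch.optim.SGD` with the same learning rate and an arbitrary seed; it is a swapped-in formulation of the same gradient step, and the exact numbers depend on the random initialization.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Same data as above: vertical-edge image and its edge map under K = [[1, -1]]
X = torch.ones((1, 1, 6, 8))
X[:, :, :, 2:6] = 0
Y = X[:, :, :, :-1] - X[:, :, :, 1:]   # equals cross-correlation with [[1, -1]]

conv2d = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
optimizer = torch.optim.SGD(conv2d.parameters(), lr=3e-2)

losses = []
for epoch in range(10):
    optimizer.zero_grad()
    loss = ((conv2d(X) - Y) ** 2).sum()
    loss.backward()
    optimizer.step()           # same update as weight -= lr * grad
    losses.append(loss.item())

print(losses[0], losses[-1])
print(conv2d.weight.data.reshape((1, 2)))
```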