## Chapter 7: Convolutional Neural Networks (CNN)
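1. **Invariance**: translation equivariance and locality. The earliest layers should respond similarly to the same patch wherever it appears in the image (translation equivariance), and each output should depend only on a small local region of the input (locality).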
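2. **Convolution**: mathematically, $(f * g)(i, j) = \sum_a \sum_b f(a, b) g(i - a, j - b)$, whereas **cross-correlation** (what deep learning libraries such as PyTorch actually compute under the name "convolution") is $(f \star g)(i, j) = \sum_a \sum_b f(a, b) g(i + a, j + b)$.
   - The distinction does not matter in practice because the kernel is learned: a kernel learned with true convolution is the cross-correlation kernel rotated by 180° (flipped along both axes), i.e. `k_conv_learned = k_corr_learned.flip(0, 1)`, so `conv(X, k_conv_learned) = corr(X, k_corr_learned)`.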
3. **Receptive Field**: for any element $x$ of a convolutional layer's output, its receptive field is the set of all elements (from all previous layers) that may affect the calculation of $x$ during forward propagation.
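4. **Padding, Stride**: with total padding $p_h$ (rows) and $p_w$ (columns) and strides $s_h$, $s_w$, the output shape is $\lfloor (n_h - k_h + p_h + s_h) / s_h \rfloor \times \lfloor (n_w - k_w + p_w + s_w) / s_w \rfloor$. Often `p_h = k_h - 1` (and likewise `p_w = k_w - 1`), so that with stride 1 the output keeps the input's height and width. Here `p_h = p_h_upper + p_h_lower` is the total padding split between top and bottom; PyTorch's `padding=p` argument pads $p$ on each side, so $p_h = 2p$.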
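5. **Channels**:
   - multiple input channels $c_i$: the kernel must have the same number of channels ($c_i \times k_h \times k_w$); cross-correlate each input channel with its kernel slice, then sum the results over channels.
   - multiple output channels $c_o$: use a kernel of shape $c_o \times c_i \times k_h \times k_w$; each of its $c_o$ slices produces one output channel.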
6. Use `torch.stack` to stack tensors along a new dimension (in contrast, `torch.cat` concatenates along an existing dimension).
7. **Pooling**: serves the dual purpose of mitigating the sensitivity of convolutional layers to location and of spatially downsampling representations.
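One way to sanity-check the padding/stride formula in item 4 is to compare it against the shapes PyTorch produces. In this sketch, `conv_out_size` and `pair` are hypothetical helpers written for illustration (not library functions), and the three layer configurations are arbitrary choices. Because `nn.Conv2d(padding=p)` pads $p$ on each side, the formula's total padding is $2p$.

```python
import torch
from torch import nn

def conv_out_size(n, k, p_total, s):
    """Hypothetical helper: floor((n - k + p_total + s) / s) for one spatial axis."""
    return (n - k + p_total + s) // s

def pair(v):
    """Expand an int argument to an (h, w) pair, as nn.Conv2d does."""
    return v if isinstance(v, tuple) else (v, v)

X_demo = torch.randn(1, 1, 8, 8)  # (batch, channels, height, width)
cases = [
    dict(kernel_size=3, padding=1, stride=1),  # total padding k - 1 = 2 keeps 8x8
    dict(kernel_size=3, padding=1, stride=2),
    dict(kernel_size=(3, 5), padding=(0, 1), stride=(3, 4)),
]
results = []
for cfg in cases:
    conv = nn.Conv2d(1, 1, **cfg)
    k_h, k_w = pair(cfg["kernel_size"])
    p_h, p_w = pair(cfg["padding"])
    s_h, s_w = pair(cfg["stride"])
    # padding=p pads p on each side, so the formula's total padding is 2p
    expected = (conv_out_size(8, k_h, 2 * p_h, s_h), conv_out_size(8, k_w, 2 * p_w, s_w))
    actual = tuple(conv(X_demo).shape[-2:])
    results.append((expected, actual))
    print(cfg, "formula:", expected, "PyTorch:", actual)
```

For these cases the formula and PyTorch agree on output sizes of 8×8, 4×4 and 2×2.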

In [1]:
import torch
from torch import nn

In [63]:
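reduce_sum = lambda x, *args, **kwargs: x.sum(*args, **kwargs)
def corr2d(X, K):
    """2D cross-correlation of X with kernel K (no padding, stride 1)."""
    h, w = K.shape
    # Output shape: (n_h - k_h + 1) x (n_w - k_w + 1)
    Y = torch.zeros(X.shape[0] - h + 1, X.shape[1] - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Multiply the current window elementwise with K and sum
            Y[i, j] = reduce_sum(X[i:i+h, j:j+w] * K)
    return Y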
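`corr2d` computes cross-correlation, not convolution in the strict sense of item 2. The sketch below implements the textbook (flipped) convolution straight from its definition, restricted to the "valid" region, and checks that it equals cross-correlation with the kernel rotated by 180°. `conv2d_true` is a hypothetical name for illustration; `torch.nn.functional.conv2d` serves as the cross-correlation, since PyTorch documents that operator as cross-correlation.

```python
import torch
import torch.nn.functional as F

def conv2d_true(X, K):
    """Textbook 2D convolution (kernel flipped), 'valid' region only."""
    k_h, k_w = K.shape
    Y = torch.zeros(X.shape[0] - k_h + 1, X.shape[1] - k_w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            for a in range(k_h):
                for b in range(k_w):
                    # sum_a sum_b K[a, b] * X[n_i - a, n_j - b], where
                    # (n_i, n_j) = (i + k_h - 1, j + k_w - 1) is the window's
                    # bottom-right corner, so every index stays inside X
                    Y[i, j] += K[a, b] * X[i + k_h - 1 - a, j + k_w - 1 - b]
    return Y

img = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
kern = torch.tensor([[0.0, 1.0], [2.0, 3.0]])

conv_out = conv2d_true(img, kern)
# F.conv2d wants (batch, channels, H, W) input and (out_ch, in_ch, kH, kW) weight
corr_flipped = F.conv2d(img[None, None], kern.flip(0, 1)[None, None])[0, 0]
print(conv_out)                                # values [[5, 11], [23, 29]]
print(torch.allclose(conv_out, corr_flipped))  # True
```

With the same `X` and `K` as the next cell, true convolution gives `[[5, 11], [23, 29]]` rather than the cross-correlation result `[[19, 25], [37, 43]]`.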

In [64]:
X = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
print(X, K)
corr2d(X, K)

tensor([[0., 1., 2.],
        [3., 4., 5.],
        [6., 7., 8.]]) tensor([[0., 1.],
        [2., 3.]])


tensor([[19., 25.],
        [37., 43.]])
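Item 1's translation equivariance can be checked numerically: shifting the input shifts the cross-correlation output by the same amount. This is a small sketch under illustrative assumptions (a 6×8 image with a bright rectangle, a random 2×2 kernel, a one-column shift via `torch.roll`); it compares only output columns whose windows never touch the wrapped-around input column.

```python
import torch
import torch.nn.functional as F

def corr(img, kernel):
    """2D cross-correlation of one (H, W) image with one (k_h, k_w) kernel."""
    # F.conv2d expects (batch, channel, H, W) input and (out, in, k_h, k_w) weight
    return F.conv2d(img[None, None], kernel[None, None])[0, 0]

torch.manual_seed(0)
img = torch.zeros(6, 8)
img[2:4, 2:5] = 1.0          # a small bright rectangle
kernel = torch.randn(2, 2)   # an arbitrary (random) kernel

img_shifted = torch.roll(img, shifts=1, dims=1)  # move everything one column right
y, y_shifted = corr(img, kernel), corr(img_shifted, kernel)

# Column j of y reappears as column j + 1 of y_shifted
print(torch.allclose(y_shifted[:, 1:], y[:, :-1]))  # True
```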

In [65]:
def corr2d_multi_in(X, K):
    """Cross-correlate each input channel with its kernel slice, then sum over channels."""
    return sum(corr2d(x, k) for x, k in zip(X, K))

In [66]:
new_X = torch.tensor([[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
               [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]])
new_K = torch.tensor([[[0.0, 1.0], [2.0, 3.0]], [[1.0, 2.0], [3.0, 4.0]]])

print(new_X.shape, new_K.shape)
corr2d_multi_in(new_X, new_K)

torch.Size([2, 3, 3]) torch.Size([2, 2, 2])


tensor([[ 56.,  72.],
        [104., 120.]])

In [67]:
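def corr2d_multi_in_out(X, K):
    # K has shape (c_o, c_i, k_h, k_w): run corr2d_multi_in once per output
    # channel and stack the c_o results along a new dimension 0
    return torch.stack([corr2d_multi_in(X, k) for k in K], 0)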

In [69]:
multi_channel_k = torch.stack([new_K, new_K+1, new_K+2], 0)
print(multi_channel_k.shape)

torch.Size([3, 2, 2, 2])


In [70]:
corr2d_multi_in_out(new_X, multi_channel_k)

tensor([[[ 56.,  72.],
         [104., 120.]],

        [[ 76., 100.],
         [148., 172.]],

        [[ 96., 128.],
         [192., 224.]]])
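The loop-based `corr2d_multi_in` and `corr2d_multi_in_out` compute what a single `torch.nn.functional.conv2d` call computes when the weight uses the $c_o \times c_i \times k_h \times k_w$ layout from item 5. This sketch rebuilds the same inputs under new names so it runs on its own; the only extra assumption is the leading batch dimension that `F.conv2d` requires.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
                  [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]])  # (c_i=2, 3, 3)
k = torch.tensor([[[0.0, 1.0], [2.0, 3.0]], [[1.0, 2.0], [3.0, 4.0]]])  # (c_i=2, 2, 2)
w = torch.stack([k, k + 1, k + 2], 0)                                   # (c_o=3, c_i=2, 2, 2)

# Multiple input channels, one output channel: weight shape (1, c_i, k_h, k_w)
y_single = F.conv2d(x[None], k[None])[0, 0]
# Multiple output channels: weight shape (c_o, c_i, k_h, k_w), result (c_o, 2, 2)
y_multi = F.conv2d(x[None], w)[0]

print(y_single)       # values [[56, 72], [104, 120]]
print(y_multi.shape)  # torch.Size([3, 2, 2])
```

Both results match the outputs of `corr2d_multi_in` and `corr2d_multi_in_out` above.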

In [91]:
edge_k = torch.tensor([1.0, -1.0]).reshape(1, 2)
edge_x = torch.ones((6, 8))
edge_x[:, 2:6] = 0
print(edge_x)

tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])


In [92]:
corr2d(edge_x, edge_k)

tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])
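Item 2 says the kernel is learned rather than designed. As an illustrative sketch, we can recover the `[1, -1]` edge kernel from the image and its edge map alone. The hyperparameters are assumptions chosen here: a bias-free `nn.Conv2d` with a 1×2 kernel, summed squared error, learning rate 0.02 and 50 full-batch gradient-descent steps.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
img = torch.ones(6, 8)
img[:, 2:6] = 0
true_k = torch.tensor([[1.0, -1.0]])
# Target: the edge map produced by the known kernel (F.conv2d is cross-correlation)
target = F.conv2d(img[None, None], true_k[None, None])

# One input channel, one output channel, 1x2 kernel, no bias
conv = nn.Conv2d(1, 1, kernel_size=(1, 2), bias=False)
lr = 0.02
for step in range(50):
    loss = ((conv(img[None, None]) - target) ** 2).sum()
    conv.zero_grad()
    loss.backward()
    with torch.no_grad():
        conv.weight -= lr * conv.weight.grad  # plain gradient-descent update
print(conv.weight.detach().reshape(1, 2))     # close to [[1, -1]]
```

The loss is a convex quadratic in the two kernel weights, so with this small a learning rate plain gradient descent converges to the kernel that generated the target.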

In [93]:
def pool2d(X, pool_size, pool_method="max"):
    """2D max or average pooling with stride 1 and no padding."""
    if pool_method not in ("max", "avg"):
        raise ValueError(f"unknown pool_method: {pool_method!r}")
    p_h, p_w = pool_size
    x_h, x_w = X.shape
    Y = torch.zeros(x_h - p_h + 1, x_w - p_w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            window = X[i:i+p_h, j:j+p_w]
            Y[i, j] = window.max() if pool_method == "max" else window.mean()
    return Y

In [94]:
print(pool2d(edge_x, (2,2)))

tensor([[1., 1., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 1., 1.]])
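The hand-written `pool2d` always uses stride 1 and no padding. PyTorch's `nn.MaxPool2d` and `nn.AvgPool2d` default the stride to the window size and accept `padding`/`stride` like convolutions, so the output-shape formula from item 4 applies to them too. A small sketch on a 4×4 input (the input values are an arbitrary choice for illustration):

```python
import torch
from torch import nn

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)  # (batch, channel, H, W)

default = nn.MaxPool2d(3)(x)                      # stride defaults to 3: 1x1 output
stride1 = nn.MaxPool2d(3, stride=1)(x)            # like pool2d: 2x2 output
padded = nn.MaxPool2d(3, padding=1, stride=2)(x)  # padded with -inf: 2x2 output
avg = nn.AvgPool2d(2)(x)                          # non-overlapping 2x2 means

print(default)  # values [[10.]]
print(stride1)  # values [[10., 11.], [14., 15.]]
print(padded)   # values [[5., 7.], [13., 15.]]
print(avg)      # values [[2.5, 4.5], [10.5, 12.5]]
```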


In [95]:
print(pool2d(corr2d(edge_x, edge_k), (2,2)))

tensor([[1., 1., 0., 0., 0., 0.],
        [1., 1., 0., 0., 0., 0.],
        [1., 1., 0., 0., 0., 0.],
        [1., 1., 0., 0., 0., 0.],
        [1., 1., 0., 0., 0., 0.]])


In [96]:
edge_x[:, 2] = 1
edge_x[:, 4] = 1
print(edge_x)

tensor([[1., 1., 1., 0., 1., 0., 1., 1.],
        [1., 1., 1., 0., 1., 0., 1., 1.],
        [1., 1., 1., 0., 1., 0., 1., 1.],
        [1., 1., 1., 0., 1., 0., 1., 1.],
        [1., 1., 1., 0., 1., 0., 1., 1.],
        [1., 1., 1., 0., 1., 0., 1., 1.]])


In [97]:
print(corr2d(edge_x, edge_k))

tensor([[ 0.,  0.,  1., -1.,  1., -1.,  0.],
        [ 0.,  0.,  1., -1.,  1., -1.,  0.],
        [ 0.,  0.,  1., -1.,  1., -1.,  0.],
        [ 0.,  0.,  1., -1.,  1., -1.,  0.],
        [ 0.,  0.,  1., -1.,  1., -1.,  0.],
        [ 0.,  0.,  1., -1.,  1., -1.,  0.]])


In [98]:
print(pool2d(corr2d(edge_x, edge_k), (2,2)))

tensor([[0., 1., 1., 1., 1., 0.],
        [0., 1., 1., 1., 1., 0.],
        [0., 1., 1., 1., 1., 0.],
        [0., 1., 1., 1., 1., 0.],
        [0., 1., 1., 1., 1., 0.]])


In [101]:
print(X.shape)
print(torch.cat((X,X), dim=0).shape)
print(torch.stack((X,X), dim=0).shape)
print(torch.stack((X,X), dim=1).shape)

torch.Size([3, 3])
torch.Size([6, 3])
torch.Size([2, 3, 3])
torch.Size([3, 2, 3])


In [102]:
net = nn.Sequential(
            nn.LazyConv2d(6, kernel_size=5, padding=2), nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.LazyConv2d(16, kernel_size=5), nn.Sigmoid(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Flatten(),
            nn.LazyLinear(120), nn.Sigmoid(),
            nn.LazyLinear(84), nn.Sigmoid(),
            nn.LazyLinear(10))

In [106]:
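def layer_summary(net, X_shape):
    """Feed a random tensor of shape X_shape through net and print each layer's output shape."""
    X = torch.randn(*X_shape)
    for layer in net:
        X = layer(X)
        # Lazy layers infer their input sizes here and report as their concrete class
        print(layer.__class__.__name__, 'output shape:\t', X.shape)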

In [107]:
layer_summary(net, (256, 3, 28, 28))

Conv2d output shape:	 torch.Size([256, 6, 28, 28])
Sigmoid output shape:	 torch.Size([256, 6, 28, 28])
AvgPool2d output shape:	 torch.Size([256, 6, 14, 14])
Conv2d output shape:	 torch.Size([256, 16, 10, 10])
Sigmoid output shape:	 torch.Size([256, 16, 10, 10])
AvgPool2d output shape:	 torch.Size([256, 16, 5, 5])
Flatten output shape:	 torch.Size([256, 400])
Linear output shape:	 torch.Size([256, 120])
Sigmoid output shape:	 torch.Size([256, 120])
Linear output shape:	 torch.Size([256, 84])
Sigmoid output shape:	 torch.Size([256, 84])
Linear output shape:	 torch.Size([256, 10])
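Each spatial size in the summary follows from the formula in item 4, written per axis as $\lfloor (n - k + p + s)/s \rfloor$ with $p$ the total padding (so the first layer's `padding=2` gives $p = 4$). The channel count of the input does not affect these sizes:

$$
\begin{aligned}
\text{conv1 } (k=5,\ p=4,\ s=1):&\quad \lfloor (28 - 5 + 4 + 1)/1 \rfloor = 28 \\
\text{pool1 } (k=2,\ p=0,\ s=2):&\quad \lfloor (28 - 2 + 0 + 2)/2 \rfloor = 14 \\
\text{conv2 } (k=5,\ p=0,\ s=1):&\quad \lfloor (14 - 5 + 0 + 1)/1 \rfloor = 10 \\
\text{pool2 } (k=2,\ p=0,\ s=2):&\quad \lfloor (10 - 2 + 0 + 2)/2 \rfloor = 5 \\
\text{flatten}:&\quad 16 \times 5 \times 5 = 400
\end{aligned}
$$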
