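### Padding
1. The output shape of a convolutional layer is determined by the input shape and the kernel window shape. With no padding and stride 1, an $n_h \times n_w$ input and a $k_h \times k_w$ kernel give an output of shape $(n_h - k_h + 1) \times (n_w - k_w + 1)$.
2. For an odd kernel size, padding `(k_h - 1) / 2` rows on each of the top and bottom (and `(k_w - 1) / 2` columns on each of the left and right) makes the output the same height and width as the input. For an even kernel size the total padding $k_h - 1$ is odd, so it is split unevenly: the top takes the ceiling of half and the bottom takes the floor.
3. The input passed to `conv2d` has shape `[batch_size, channels, height, width]`.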

In [2]:
import torch
import torch.nn as nn
import sys
sys.path.append("..") 
import d2lzh_pytorch as d2l

In [3]:
X = torch.rand(8, 8)
# 1 input channel, 1 output channel (a single filter), kernel size 3, padding 1 per side
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

def comp(conv2d, X):
    x = X.view((1, 1) + X.shape)  # reshape to (batch, channel, h, w)
    Y = conv2d(x)
    return Y.view(Y.shape[2:])

z = comp(conv2d, X)
print(z.shape)

torch.Size([8, 8])


In [4]:
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(5, 3), padding=(2, 1))
z = comp(conv2d, X)
print(z.shape)


torch.Size([8, 8])
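For an even kernel size the padding cannot be split evenly, and `nn.Conv2d` only takes a symmetric per-side `padding`. The following is a minimal sketch, assuming the input is padded by hand with `torch.nn.functional.pad` before an unpadded convolution. The kernel size 4 and the names `X_demo`, `conv_even`, and `Y_even` are illustrative and not part of the original notebook.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

X_demo = torch.rand(1, 1, 8, 8)  # (batch, channel, height, width)
k = 4                            # even kernel size, chosen for illustration
total = k - 1                    # total padding needed to keep the size
before, after = math.ceil(total / 2), total // 2  # 2 on top/left, 1 on bottom/right

# F.pad lists padding from the last dimension backwards: (left, right, top, bottom)
X_padded = F.pad(X_demo, (before, after, before, after))
conv_even = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=k, padding=0)
Y_even = conv_even(X_padded)
print(Y_even.shape)  # torch.Size([1, 1, 8, 8])
```

The padded input is 11 x 11, and 11 - 4 + 1 = 8, so the height and width are preserved.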


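### Stride
$$\lfloor(n_h-k_h+p_h+s_h)/s_h\rfloor \times \lfloor(n_w-k_w+p_w+s_w)/s_w\rfloor.$$
<br/>Just remember this general formula; it covers any combination of padding and stride. Here $n$ is the input size, $k$ the kernel size, $s$ the stride, and $p$ the total padding over both sides, which is twice the per-side `padding` argument of `nn.Conv2d`.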

In [5]:
conv2d = nn.Conv2d(1, 1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))
# height: floor((8 - 3 + 0 + 3) / 3) = 2; width: floor((8 - 5 + 2 + 4) / 4) = 2 (total padding = 2 * per-side padding)
print(comp(conv2d, X).shape)

torch.Size([2, 2])
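The stride formula can be checked with a small helper. This is a sketch, not part of the original notebook: `conv_out_size` is a hypothetical name, and it takes the per-side `padding` used by `nn.Conv2d`, so the total padding in the formula is `2 * padding`.

```python
def conv_out_size(n, k, padding=0, stride=1):
    """Output length along one dimension; `padding` is per side, as in nn.Conv2d."""
    # floor((n - k + 2p + s) / s) is the same as floor((n - k + 2p) / s) + 1
    return (n + 2 * padding - k) // stride + 1


# Same setting as the cell above: 8x8 input, kernel (3, 5), padding (0, 1), stride (3, 4)
h = conv_out_size(8, 3, padding=0, stride=3)
w = conv_out_size(8, 5, padding=1, stride=4)
print(h, w)  # 2 2

# The "same" padding cases from the padding section
print(conv_out_size(8, 3, padding=1), conv_out_size(8, 5, padding=2))  # 8 8
```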


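### Channels
1. When the input has more than one channel, the kernel becomes a 3-D array with one 2-D slice per input channel. The per-channel cross-correlation results are summed over the channels, so the output is still a 2-D array, not a 3-D one.
2. Let the numbers of input and output channels of the kernel be $c_i$ and $c_o$, and let its height and width be $k_h$ and $k_w$. To get an output with multiple channels, we create one kernel array of shape $c_i\times k_h\times k_w$ for each output channel. Concatenating them along the output-channel dimension gives a kernel of shape $c_o\times c_i\times k_h\times k_w$.
3. A $1\times 1$ convolutional layer produces an output with the same height and width as its input. It acts as a fully connected layer applied across channels at every pixel position, and is typically used to adjust the number of channels between network layers.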
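The fully connected view of a $1\times 1$ convolution can be checked numerically. The following is a minimal sketch with illustrative shapes ($c_i = 3$, $c_o = 2$, a 4 x 4 input); the names `X_1x1`, `K_1x1`, `Y_conv`, and `Y_fc` are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X_1x1 = torch.rand(3, 4, 4)     # c_i = 3 input channels, 4x4 spatial size
K_1x1 = torch.rand(2, 3, 1, 1)  # c_o = 2, c_i = 3, 1x1 window

# Convolution route: add a batch dimension, convolve, drop it again
Y_conv = F.conv2d(X_1x1.unsqueeze(0), K_1x1).squeeze(0)

# Fully connected route: every pixel's channel vector is multiplied by a (c_o, c_i) matrix
Y_fc = (K_1x1.view(2, 3) @ X_1x1.view(3, -1)).view(2, 4, 4)

print(Y_conv.shape)  # torch.Size([2, 4, 4])
print(torch.allclose(Y_conv, Y_fc, atol=1e-6))
```

Both routes should agree up to floating-point rounding, which is why the comparison uses `torch.allclose` rather than exact equality.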

In [7]:
x = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]], [[1, 2, 3], [4, 5, 6], [7, 8, 9]]])
k = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]])
print(x.shape, k.shape)

torch.Size([2, 3, 3]) torch.Size([2, 2, 2])


In [8]:
def comp2(x, k):
    # Cross-correlate the first channel, then accumulate the remaining channels
    res = d2l.corr2d(x[0, :, :], k[0, :, :])
    for i in range(1, x.shape[0]):
        res += d2l.corr2d(x[i, :, :], k[i, :, :])
    return res

y = comp2(x, k)
print(y)

tensor([[ 56.,  72.],
        [104., 120.]])


In [9]:
def corr2d_multi_in_out(X, K):
    # Iterate over dimension 0 of K (the output channels), cross-correlate each kernel with the input X, and stack all the results
    return torch.stack([comp2(X, k) for k in K])

In [10]:
t = torch.stack([k, k + 1, k + 2])
print(t.shape)

torch.Size([3, 2, 2, 2])


In [13]:
# The kernel's shape has changed, so pass t instead of k!
print(corr2d_multi_in_out(x, t))

tensor([[[ 56.,  72.],
         [104., 120.]],

        [[ 76., 100.],
         [148., 172.]],

        [[ 96., 128.],
         [192., 224.]]])
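As a cross-check, the same multi-input, multi-output result can be reproduced with PyTorch's built-in `torch.nn.functional.conv2d`, which also computes a cross-correlation. This sketch redefines the data under new names (`x_in`, `k_in`, `t_in`, `y_ref`) so that it does not depend on `d2lzh_pytorch` or overwrite the notebook's variables.

```python
import torch
import torch.nn.functional as F

x_in = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                     [[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=torch.float)
k_in = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]], dtype=torch.float)
t_in = torch.stack([k_in, k_in + 1, k_in + 2])  # shape (c_o=3, c_i=2, 2, 2)

# conv2d expects a batch dimension on the input: (1, c_i, h, w)
y_ref = F.conv2d(x_in.unsqueeze(0), t_in).squeeze(0)
print(y_ref.shape)  # torch.Size([3, 2, 2])
print(y_ref[0])     # first output channel, values [[56, 72], [104, 120]]
```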


---------
<br/>
### Pooling
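1. There are two common kinds of pooling: max pooling and average pooling.
2. By default, the stride of a `MaxPool2d` instance equals the pooling window shape.
3. A pooling layer pools each input channel separately, so it has the same number of output channels as input channels.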

In [14]:
def pool2d(X, pool_size, mode='max'):
    X = X.float()
    p_h, p_w = pool_size
    Y = torch.zeros(X.shape[0] - p_h + 1, X.shape[1] - p_w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            if mode == 'max':
                Y[i, j] = X[i: i + p_h, j: j + p_w].max()
            elif mode == 'avg':
                Y[i, j] = X[i: i + p_h, j: j + p_w].mean()       
    return Y

In [15]:
X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
pool2d(X, (2, 2))

tensor([[4., 5.],
        [7., 8.]])

In [16]:
pool2d(X, (2, 2), 'avg')

tensor([[2., 3.],
        [5., 6.]])
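The hand-written `pool2d` slides its window one step at a time, so it corresponds to a stride of 1 rather than the `nn.MaxPool2d` default. A minimal sketch of the equivalent built-in calls, assuming `torch.nn.functional.max_pool2d` and `avg_pool2d` with an explicit `stride=1`; the names `X_small`, `X_4d`, `Y_max`, and `Y_avg` are illustrative.

```python
import torch
import torch.nn.functional as F

X_small = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype=torch.float)
X_4d = X_small.view(1, 1, 3, 3)  # (batch, channel, height, width)

# stride=1 reproduces the sliding-window loop; the default would be stride=kernel_size
Y_max = F.max_pool2d(X_4d, kernel_size=2, stride=1)
Y_avg = F.avg_pool2d(X_4d, kernel_size=2, stride=1)
print(Y_max.view(2, 2))  # values [[4, 5], [7, 8]]
print(Y_avg.view(2, 2))  # values [[2, 3], [5, 6]]
```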

In [17]:
X = torch.arange(16, dtype=torch.float).view((1, 1, 4, 4))
X

tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]]]])

In [18]:
pool2d = nn.MaxPool2d(3)
pool2d(X)

tensor([[[[10.]]]])

In [19]:
pool2d = nn.MaxPool2d(3, padding=1, stride=2)
pool2d(X)

tensor([[[[ 5.,  7.],
          [13., 15.]]]])

In [20]:
pool2d = nn.MaxPool2d((2, 4), padding=(1, 2), stride=(2, 3))
pool2d(X)

tensor([[[[ 1.,  3.],
          [ 9., 11.],
          [13., 15.]]]])

In [21]:
X = torch.cat((X, X + 1), dim=1)
X

tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]],

         [[ 1.,  2.,  3.,  4.],
          [ 5.,  6.,  7.,  8.],
          [ 9., 10., 11., 12.],
          [13., 14., 15., 16.]]]])

In [22]:
pool2d = nn.MaxPool2d(3, padding=1, stride=2)
pool2d(X)

tensor([[[[ 5.,  7.],
          [13., 15.]],

         [[ 6.,  8.],
          [14., 16.]]]])
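The pooling output shapes above follow the same size formula as convolution. The sketch below checks one of them; `pool_out_size` is a hypothetical helper, and it assumes the default `ceil_mode=False` and per-side padding as in `nn.MaxPool2d`.

```python
import torch
import torch.nn as nn


def pool_out_size(n, k, padding=0, stride=None):
    """Output length along one dimension; stride defaults to the window size."""
    stride = k if stride is None else stride
    return (n + 2 * padding - k) // stride + 1


X_pool = torch.arange(16, dtype=torch.float).view(1, 1, 4, 4)
pool = nn.MaxPool2d((2, 4), padding=(1, 2), stride=(2, 3))

expected = (pool_out_size(4, 2, padding=1, stride=2),
            pool_out_size(4, 4, padding=2, stride=3))
actual = tuple(pool(X_pool).shape[2:])
print(expected)  # (3, 2)
print(actual)    # (3, 2)
```

With the default stride, `pool_out_size(4, 3)` gives 1, which matches the single value returned by `nn.MaxPool2d(3)` on the 4 x 4 input.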