# CNNs


## Simple Convolutional Layer from scratch


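- The convolutional layer in a CNN actually computes cross-correlation. A true convolution kernel is the cross-correlation kernel flipped vertically and horizontally. The distinction does not matter in practice: the kernel parameters are learned, so whichever operation is used, the layer ends up producing the same output, and the two learned kernels are simply flipped versions of each other.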

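- Feature map and receptive field: for a given filter, the output of the convolutional layer is that filter's feature map. The receptive field of an output element is the region of the input that determines its value. For a single convolutional layer its size is filter.shape, and the same weights are shared across every such region.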

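- The output of a convolutional layer has a channel dimension in addition to h and w; the number of output channels equals the number of filters.
  - With 3 input channels and 1 output channel, a single kernel is applied to the three input feature maps, and the single output feature map is the sum of the three per-channel results.
    - One kernel has shape (input channels\*height\*width), so in this example it is a 3-D block.
  - With 3 input channels and 4 output channels, 4 kernels are applied to the 3 input feature maps. In the layout used by the code in these notes, the kernel tensor has shape (output channels (number of kernels), input channels, kernel height, kernel width): 4 kernels in total, each of shape (3\*h\*w).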

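- **Output size formula**: $H_{out}=\left\lfloor\frac{H_{in}+2*padding-(dilation*(kernel-1)+1)}{stride}\right\rfloor+1$
    - The basic case $\left\lfloor\frac{H_{in}-kernel}{stride}\right\rfloor+1$ follows from counting the terms of an arithmetic sequence. Another way to read it: the +1 accounts for the starting position, which is always valid; the remaining start positions are spread over the remaining $H_{in}-kernel$ rows, one every stride rows. The division is rounded down because the kernel must fit entirely inside the input, i.e. position + kernel ≤ H_in.
    - For the numerator: with dilation, the effective kernel size becomes dilation\*(kernel-1)+1 (kernel-1 gaps, each spanning dilation, plus 1 for the last row/column), and H_in grows by 2\*padding, which gives the formula above.
    - Note on rounding: convolution rounds down. PyTorch pooling layers also round down by default (ceil_mode=False) and round up only when ceil_mode=True. A pooling formula written with a ceiling and the +1 inside the fraction gives the same value as this floor form.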

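- 1\*1 convolutional layer: the output height and width are unchanged; only the number of channels changes. It can be viewed as dimensionality reduction along the feature (channel) dimension.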

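- Some details:
  - sum in NumPy
    - sum(): adds up all elements
    - sum(axis=0): sums each column, returning one value per column
    - sum(axis=1): sums each row, returning one value per row
  - torch.sum(input, dim, keepdim=False): dim is "the dimension or dimensions to reduce". With dim=1 the column dimension is eliminated: for each row, all columns are added together. The result has shape (x,), or (x, 1) with keepdim=True. This matches the axis argument of NumPy's sum.
  - torch.argmax() follows the same convention.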
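The `dim` semantics described above can be checked on a small tensor. This is a minimal sketch; the tensor `t` and the variable names are illustrative and not part of the original notes.

```python
import torch

t = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])            # shape (2, 3)

total = t.sum()                          # reduce every dimension -> scalar
col_sums = t.sum(dim=0)                  # collapse the row dimension    -> shape (3,)
row_sums = t.sum(dim=1)                  # collapse the column dimension -> shape (2,)
row_sums_kept = t.sum(dim=1, keepdim=True)   # keep the reduced dim -> shape (2, 1)
row_argmax = t.argmax(dim=1)             # index of the largest entry in each row

print(total, col_sums, row_sums, row_sums_kept.shape, row_argmax)
```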
  
    

### Convolution layer

In [1]:
import torch
from torch import nn as nn


In [2]:
# The calculation of convolution
def conv(X,Kernel):
    h,w = Kernel.shape
    Y = torch.zeros(X.shape[0] - h + 1, X.shape[1] - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i,j] = (X[i:i+h,j:j+w] * Kernel).sum()
    return Y

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = torch.tensor([[0, 1], [2, 3]])
conv(X, K)

tensor([[19., 25.],
        [37., 43.]])
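The function above computes cross-correlation, not a true convolution. The sketch below illustrates the difference under a few assumptions: `corr2d` is a renamed copy of `conv` so that the block is self-contained, and `torch.nn.functional.conv2d` is used only as a reference to confirm that PyTorch's built-in operation is also cross-correlation.

```python
import torch
import torch.nn.functional as F

def corr2d(X, K):
    """Plain 2-D cross-correlation (same computation as conv above)."""
    h, w = K.shape
    Y = torch.zeros(X.shape[0] - h + 1, X.shape[1] - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = torch.arange(9, dtype=torch.float32).reshape(3, 3)
K = torch.tensor([[0., 1.], [2., 3.]])

cross = corr2d(X, K)
# A true convolution flips the kernel along both axes before sliding it.
true_conv = corr2d(X, torch.flip(K, dims=[0, 1]))
# F.conv2d expects (batch, channel, H, W) input and (out, in, kH, kW) weight.
builtin = F.conv2d(X[None, None], K[None, None])[0, 0]

print(cross)
print(true_conv)
print(torch.allclose(cross, builtin))
```

If the kernel were learned rather than fixed, training with either operation would simply yield kernels that are flipped versions of each other.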

In [3]:
# The layer of convolution
class conv_layer(nn.Module):
    def __init__(self, kernel_size):
        super(conv_layer,self).__init__()
        self.kernel = nn.Parameter(torch.randn(kernel_size))
        self.bias = nn.Parameter(torch.randn(1))
        
    def forward(self, X):
        Y = conv(X, self.kernel) + self.bias
        return Y

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = [2,2]  

net = conv_layer(K)
print(net)
print(net(X))



conv_layer()
tensor([[-3.0843, -1.9405],
        [ 0.3472,  1.4910]], grad_fn=<AddBackward0>)


In [4]:
# An example to perform convolution
X = torch.ones(6, 8)
X[:, 2:6] = 0
print('X:',X)

K = torch.tensor([[1, -1]])
Y = conv(X, K)
print('Y',Y)

X: tensor([[1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.],
        [1., 1., 0., 0., 0., 0., 1., 1.]])
Y tensor([[ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.],
        [ 0.,  1.,  0.,  0.,  0., -1.,  0.]])


In [12]:
# Learning process for kernel parameters
net = conv_layer(kernel_size=(1,2))

step = 20
lr = 0.01
for i in range(step):
    Y_hat = net(X)
    loss = ((Y_hat-Y)**2).sum()
    loss.backward()
    net.kernel.data -= lr*net.kernel.grad  # use -= to keep the change
    net.bias.data -= lr*net.bias.grad
    
    #net.kernel.grad.fill_(0)
    #net.bias.grad.fill_(0)
    net.kernel.grad.zero_()
    net.bias.grad.zero_()
    if (i+1) % 5 ==0:
        print('step and loss',i+1,loss.item())
    
    
print('kernel:', net.kernel.data)
print('bias:', net.bias.data)
    

step and loss 5 18.151744842529297
step and loss 10 2.075840473175049
step and loss 15 0.24572691321372986
step and loss 20 0.03135181963443756
kernel: tensor([[ 1.0017, -0.9571]])
bias: tensor([-0.0250])
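For comparison, the same fit can be sketched with PyTorch's built-in `nn.Conv2d` and `torch.optim.SGD` instead of the hand-written layer and manual updates. This is an illustrative variant, not the method used above; the seed and the names `conv2d` and `losses` are assumptions, and the exact numbers will differ from the run shown earlier because the initialization is different.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Same edge-detection data as above.
X = torch.ones(6, 8)
X[:, 2:6] = 0
K = torch.tensor([[1., -1.]])
# Target: cross-correlate X with K. The built-in op works on 4-D tensors
# laid out as (batch, channel, height, width).
Y = nn.functional.conv2d(X[None, None], K[None, None])

conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(1, 2))
optimizer = torch.optim.SGD(conv2d.parameters(), lr=0.01)

losses = []
for step in range(20):
    optimizer.zero_grad()                       # clear gradients from the previous step
    loss = ((conv2d(X[None, None]) - Y) ** 2).sum()
    loss.backward()
    optimizer.step()                            # plain gradient descent update
    losses.append(loss.item())

print('first loss:', losses[0], 'last loss:', losses[-1])
print('kernel:', conv2d.weight.data.reshape(1, 2))
print('bias:', conv2d.bias.data)
```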


### Multi-channel input/output convolutional layer from scratch

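- Some details:
  - torch.cat vs. torch.stack
    - torch.cat(tensors,dim=0,out=None)→ Tensor
      - cat concatenates the tensors along an existing dimension. The result has the same number of dimensions as the inputs, and the inputs must have the same shape except along the concatenation dimension.
    - torch.stack(tensors,dim=0,out=None)→ Tensor
      - stack inserts a new dimension at the given position and joins the tensors along it; all inputs must have the same shape.
  - torch.rand(*sizes, out=None) → Tensor
  - torch.tensor vs. torch.Tensor
    - Tensor: an alias for the default tensor type, torch.FloatTensor()
      - can be converted with methods such as .float() and .long()
      - a = torch.Tensor() creates an empty tensor
      - Tensor(*size), e.g. Tensor(2,3)
    - tensor: tensor(data) copies data and infers the dtype from it
  - A view in PyTorch does not change the memory address: the result of view shares storage with the original tensor, so a tensor viewed as x',y' can always be viewed back as x,y.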
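The points in this list can be illustrated with a short sketch. The tensors `a`, `b`, `x`, and `v` are made up for illustration, and the dtype check assumes PyTorch's default dtype has not been changed from float32.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

cat0 = torch.cat([a, b], dim=0)        # joins along an existing dim -> (4, 3)
cat1 = torch.cat([a, b], dim=1)        # -> (2, 6)
stacked = torch.stack([a, b], dim=0)   # inserts a new leading dim -> (2, 2, 3)
print(cat0.shape, cat1.shape, stacked.shape)

# A view shares storage with its source tensor.
x = torch.arange(6)
v = x.view(2, 3)
v[0, 0] = 100                          # writing through the view...
print(x[0].item())                     # ...is visible in the original tensor
print(v.data_ptr() == x.data_ptr())    # both start at the same memory address

# torch.tensor infers the dtype from the data; torch.Tensor uses the default float type.
int_dtype = torch.tensor([2, 3]).dtype
float_dtype = torch.Tensor([2, 3]).dtype
print(int_dtype, float_dtype)
```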

In [29]:
a = torch.LongTensor(2,3)
print(a)
print(a.size())

b = torch.tensor([2,3])
print(b)
print(b.size())

tensor([[0, 0, 0],
        [0, 0, 0]])
torch.Size([2, 3])
tensor([2, 3])
torch.Size([2])


In [21]:
# the size of X: (channel, height, width) and K: (out_channel, in_channel, height, width)

def conv_multi_in(X,K):
    res = conv(X[0,:,:],K[0,:,:])
    for i in range(1,X.shape[0]):
        res += conv(X[i,:,:],K[i,:,:])
    return res

def conv_multi_in_out(X,K):
    return torch.stack([conv_multi_in(X,k) for k in K])

X = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
              [[1, 2, 3], [4, 5, 6], [7, 8, 9]]])
K = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]])

print('the shape of K and X is:',K.shape,X.shape)
print('-------the result of multi-in conv:')
print(conv_multi_in(X, K))
print(conv_multi_in(X, K).shape)

K_multi = torch.stack([K,K+1,K+2])
print('the shape of K:',K_multi.shape)

print('-------the result of multi_in and multi_out:')

print(conv_multi_in_out(X,K_multi))
print(conv_multi_in_out(X,K_multi).shape)


the shape of K and X is: torch.Size([2, 2, 2]) torch.Size([2, 3, 3])
-------the result of multi-in conv:
tensor([[ 56.,  72.],
        [104., 120.]])
torch.Size([2, 2])
the shape of K: torch.Size([3, 2, 2, 2])
-------the result of multi_in and multi_out:
tensor([[[ 56.,  72.],
         [104., 120.]],

        [[ 76., 100.],
         [148., 172.]],

        [[ 96., 128.],
         [192., 224.]]])
torch.Size([3, 2, 2])


In [44]:
a = torch.tensor([[[1,2,3,4],[5,6,7,8]]])
print(a.size())
b = a.view((1,1,8))
c = b.clone().view(2,4,1)
print(b)
print(c)

torch.Size([1, 2, 4])
tensor([[[1, 2, 3, 4, 5, 6, 7, 8]]])
tensor([[[1],
         [2],
         [3],
         [4]],

        [[5],
         [6],
         [7],
         [8]]])


In [49]:
# 1*1 convolution
def conv_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = X.view(c_i, h * w)
    K = K.view(c_o, c_i)
    Y = torch.mm(K, X)  # matrix multiplication, as in a fully connected layer
    return Y.view(c_o, h, w)
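One way to check the 1\*1 implementation is to compare it with PyTorch's built-in convolution on random data. This is a sketch under some assumptions: the function is copied here (with `reshape` in place of `view`) so the block runs on its own, the seed and tensor sizes are arbitrary, and `torch.nn.functional.conv2d` serves as the reference.

```python
import torch
import torch.nn.functional as F

def conv_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = X.reshape(c_i, h * w)          # every pixel becomes a column of c_i features
    K = K.reshape(c_o, c_i)            # the 1x1 kernel is a (c_o, c_i) weight matrix
    Y = torch.mm(K, X)
    return Y.reshape(c_o, h, w)

torch.manual_seed(0)
X = torch.rand(3, 3, 3)                # (c_i, h, w)
K = torch.rand(2, 3, 1, 1)             # (c_o, c_i, 1, 1)

Y_mm = conv_multi_in_out_1x1(X, K)
Y_ref = F.conv2d(X[None], K)[0]        # add, then remove, a batch dimension

print(Y_mm.shape)
print(torch.allclose(Y_mm, Y_ref, atol=1e-6))
```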


### Pooling layer

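- When stride is used in a pooling layer, the output size is $h_{out}=\left\lceil\frac{h_{in}+2*padding-kernel\_size+1}{stride}\right\rceil$, where $\lceil\cdot\rceil$ denotes rounding up. This equals $\left\lfloor\frac{h_{in}+2*padding-kernel\_size}{stride}\right\rfloor+1$.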

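- Some details
  - Multi-channel pooling uses torch.cat: unlike convolution, pooling does not sum the per-input-channel products and then arrange the kernel_num results along a new dimension. Pooling leaves the number of channels unchanged and operates on each feature map independently, so the per-channel results only need to be joined. Note that pool2d returns a 2-D tensor, so cat with dim=0 joins the per-channel results row-wise; torch.stack would keep a separate channel dimension instead.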
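The pooling output-size formula can be checked against `nn.MaxPool2d`. The helper names `pool_out_size` and `pool_out_size_floor` are hypothetical, and the test cases are arbitrary; the comparison assumes dilation 1 and PyTorch's default `ceil_mode=False`.

```python
import math
import torch
from torch import nn

def pool_out_size(h_in, kernel_size, stride, padding=0):
    """Ceiling form of the pooling output size given in the note above."""
    return math.ceil((h_in + 2 * padding - kernel_size + 1) / stride)

def pool_out_size_floor(h_in, kernel_size, stride, padding=0):
    """Equivalent floor form."""
    return (h_in + 2 * padding - kernel_size) // stride + 1

results = []
for h_in, k, s, p in [(28, 2, 2, 0), (7, 3, 2, 1), (10, 3, 3, 0)]:
    pool = nn.MaxPool2d(kernel_size=k, stride=s, padding=p)
    actual = pool(torch.zeros(1, 1, h_in, h_in)).shape[-1]
    results.append((actual, pool_out_size(h_in, k, s, p), pool_out_size_floor(h_in, k, s, p)))
    print(h_in, k, s, p, '->', results[-1])
```

Each printed triple holds the size produced by the layer, the ceiling form, and the floor form; the three values should agree.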

In [62]:
def pool2d(X, pool_size, mode='max'):
    X = X.float()
    h, w = X.shape
    p_h, p_w = pool_size
    y_h, y_w = h-p_h+1, w-p_w+1
    y = torch.zeros(y_h,y_w)
    for i in range(0,y_h):
        for j in range(0,y_w):
            if mode=='max':
                y[i][j] = X[i:i+p_h,j:j+p_w].max()
            elif mode=='mean':
                y[i][j] = X[i:i+p_h,j:j+p_w].mean()  # average pooling
    return y
        
def pool2d_multi(X,pool_size,mode='max'):
    return torch.cat(([pool2d(x,pool_size,mode) for x in X]),dim=0)

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(pool2d(X,(2,2),'max'))


tensor([[4., 5.],
        [7., 8.]])


In [63]:
X = torch.arange(16, dtype=torch.float).view((2, 4, 2))
print(X)
print(pool2d_multi(X,(2,2),'max'))

tensor([[[ 0.,  1.],
         [ 2.,  3.],
         [ 4.,  5.],
         [ 6.,  7.]],

        [[ 8.,  9.],
         [10., 11.],
         [12., 13.],
         [14., 15.]]])
tensor([[ 3.],
        [ 5.],
        [ 7.],
        [11.],
        [13.],
        [15.]])


### extra: stride and padding


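- stride and padding are used in both pooling and convolutional layers; here they are implemented for the convolutional case.
- In PyTorch's pooling layers, stride defaults to kernel_size and padding defaults to 0 (none).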
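The pooling defaults mentioned above can be seen directly on an `nn.MaxPool2d` instance. This is a small sketch; the 4\*4 input is made up for illustration.

```python
import torch
from torch import nn

pool = nn.MaxPool2d(kernel_size=2)     # stride and padding left at their defaults
print(pool.stride, pool.padding)       # stride falls back to kernel_size

X = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
Y = pool(X)                            # non-overlapping 2x2 windows
print(Y)
```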

In [70]:
def conv_with_stride(X, Kernel, stride):
    h,w = Kernel.shape
    Y = torch.zeros((X.shape[0] - h) // stride + 1, (X.shape[1] - w) // stride + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i,j] = (X[i*stride:i*stride+h,j*stride:j*stride+w] * Kernel).sum()
    return Y

X = torch.tensor([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]])
K = torch.tensor([[0, 1], [2, 3]])
conv_with_stride(X, K, 2)

tensor([[24., 36.],
        [72., 84.]])

In [83]:
def conv_with_padding(X_old, Kernel, padding):
    h,w = Kernel.shape
    X = torch.zeros(X_old.shape[0]+padding*2, X_old.shape[1]+2*padding)
    X[padding:padding+X_old.shape[0], padding:padding+X_old.shape[1]] += X_old
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i,j] = (X[i:i+h,j:j+w] * Kernel).sum()
    return Y

X = torch.tensor([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]])
K = torch.tensor([[0, 1], [2, 3]])
conv_with_padding(X, K, 1)

tensor([[ 0.,  3.,  8., 13.,  6.],
        [12., 24., 30., 36., 14.],
        [28., 48., 54., 60., 22.],
        [44., 72., 78., 84., 30.],
        [12., 13., 14., 15.,  0.]])
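Stride and padding can also be combined in one function and checked against both the output-size formula from earlier in these notes and PyTorch's built-in operation. This is a minimal sketch: `conv_out_size` and `conv_stride_padding` are hypothetical names, `F.pad` replaces the manual zero-filled buffer, and `F.conv2d` is used only as a reference.

```python
import torch
import torch.nn.functional as F

def conv_out_size(h_in, kernel, stride=1, padding=0, dilation=1):
    """Output size formula with explicit floor division."""
    return (h_in + 2 * padding - (dilation * (kernel - 1) + 1)) // stride + 1

def conv_stride_padding(X, K, stride=1, padding=0):
    """Cross-correlation of a 2-D input with zero padding and stride."""
    h, w = K.shape
    X = F.pad(X, (padding, padding, padding, padding))  # (left, right, top, bottom)
    out_h = (X.shape[0] - h) // stride + 1
    out_w = (X.shape[1] - w) // stride + 1
    Y = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride                # top-left corner of the window
            Y[i, j] = (X[r:r + h, c:c + w] * K).sum()
    return Y

X = torch.arange(16, dtype=torch.float32).reshape(4, 4)
K = torch.tensor([[0., 1.], [2., 3.]])

Y = conv_stride_padding(X, K, stride=2, padding=1)
Y_ref = F.conv2d(X[None, None], K[None, None], stride=2, padding=1)[0, 0]
expected_size = conv_out_size(4, 2, stride=2, padding=1)

print(Y)
print(expected_size, torch.allclose(Y, Y_ref))
```

With stride 2 and padding 1, the result keeps every second row and column of the padded output shown above.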

## CNNs (existing networks)

### LeNet


In [84]:
# Import packages

import time
import torch 
from torch import nn, optim
import d2lzh_pytorch as d2dl


In [86]:
# Hyperparameters
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch_size = 256
lr, num_epochs = 0.001, 5

# Load Data
train_iterator, test_iterator = d2dl.load_data_fashion_mnist(batch_size,root='/Users/yanzheyuan/coding/dataset_pytorch/')

# Define Model
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet,self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=1,out_channels=6,kernel_size=5),
            nn.Sigmoid(),
            nn.MaxPool2d(kernel_size=2,stride=2),
            nn.Conv2d(6,16,5),
            nn.Sigmoid(),
            nn.MaxPool2d(2,2)
        )
        self.fc = nn.Sequential(
            nn.Linear(16*4*4,120),
            nn.Sigmoid(),
            nn.Linear(120,84),
            nn.Sigmoid(),
            nn.Linear(84,10)
        )
    def forward(self,img):
        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0],-1))
        return output

net = LeNet()
print(net)
  
optimizer = torch.optim.Adam(net.parameters(), lr=lr)

# Train Model
# This function is saved in the d2lzh_pytorch package for later use. It will be improved step by step.
def evaluate_accuracy(data_iter, net, device=None):
    if device is None and isinstance(net, torch.nn.Module):
        # if no device is specified, use the device of net's parameters
        device = list(net.parameters())[0].device
    acc_sum, n = 0.0, 0
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(net, torch.nn.Module):
                net.eval() # evaluation mode, which disables dropout
                acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
                net.train() # switch back to training mode
            else: # custom model, not used after section 3.13; GPU is not considered
                if('is_training' in net.__code__.co_varnames): # if the function has an is_training parameter
                    # set is_training to False
                    acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item() 
                else:
                    acc_sum += (net(X).argmax(dim=1) == y).float().sum().item() 
            n += y.shape[0]
    return acc_sum / n

# This function is saved in the d2lzh_pytorch package for later use
def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
    net = net.to(device)
    print("training on ", device)
    loss = torch.nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, batch_count, start = 0.0, 0.0, 0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))


train_ch5(net, train_iterator, test_iterator, batch_size, optimizer, device, num_epochs)

LeNet(
  (conv): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): Sigmoid()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (4): Sigmoid()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=256, out_features=120, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=120, out_features=84, bias=True)
    (3): Sigmoid()
    (4): Linear(in_features=84, out_features=10, bias=True)
  )
)
training on  cpu
epoch 1, loss 1.8338, train acc 0.323, test acc 0.565, time 8.3 sec
epoch 2, loss 0.9761, train acc 0.615, test acc 0.662, time 8.2 sec
epoch 3, loss 0.8065, train acc 0.700, test acc 0.714, time 8.1 sec
epoch 4, loss 0.7136, train acc 0.734, test acc 0.741, time 8.2 sec
epoch 5, loss 0.6564, train acc 0.749, test acc 0.755, time 8.2 sec


In [87]:
for X,y in train_iterator:
    print(X.size())
    break

torch.Size([256, 1, 28, 28])
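Given the 28\*28 single-channel input shown above, the `16*4*4` input size of the first `nn.Linear` layer can be traced by passing a dummy image through the convolutional part of LeNet. This is a sketch that rebuilds only the `conv` block with the same layer settings; the dummy tensor and the names `shapes` and `flat_features` are illustrative.

```python
import torch
from torch import nn

conv = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Sigmoid(), nn.MaxPool2d(2, 2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(), nn.MaxPool2d(2, 2),
)

x = torch.zeros(1, 1, 28, 28)          # one dummy Fashion-MNIST-sized image
shapes = []
for layer in conv:
    x = layer(x)
    shapes.append((layer.__class__.__name__, tuple(x.shape)))
    print(layer.__class__.__name__, tuple(x.shape))

# Flatten everything except the batch dimension, as forward() does with view.
flat_features = x.flatten(start_dim=1).shape[1]
print(flat_features)
```

Each 5\*5 convolution without padding shrinks the height and width by 4, and each 2\*2 pooling with stride 2 halves them: 28 → 24 → 12 → 8 → 4.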
