## 实验过程

1. 初步修改网络结构

首先对网络结构进行修改，使用的第一个CNN模型为两个卷积层加两个全连接层，两个卷积层的输出维度为32和64，第二个卷积层后加入一个MaxPooling，第一个FC层后加入p=0.5的Dropout层，没有添加BN层，训练后的分类结果为95.1

2. 使用GPU进行训练

在试跑了第一个CNN模型后，单单训练2个epoch都需要接近1分钟，于是转为使用GPU训练。使用以下代码将模型和数据载入到GPU中，这样就可以使用GPU进行训练了。考虑到GPU内存有限，训练数据是按照每个batch载入的，测试数据只有2000张图片，所以选择一开始就载入GPU。使用GPU训练20epoch的耗时为30秒左右。需要注意的是，CNN网络的参数量主要来源于全连接层，所以在把CNN提取的特征输入到全连接层前，需要尽量缩小feature_map的尺寸，否则内存很容易不足。

```python
device = torch.device('cuda')
net = Net().to(device)
# move data into GPU
test_x = Variable(test_x).cuda()   
test_y = Variable(test_y).cuda() 
# move data into GPU
b_x, b_y = Variable(b_x).cuda(), Variable(b_y).cuda()
```

3. 网络结构的改进

使用3个卷积层+1个FC层。MNIST数据集的图片尺寸只有28x28，不需要过多的卷积层即可提取到足够的信息，所以我只使用了3个卷积层，使用一个FC层的分类效果和使用2个FC层没啥区别，所以我减少了一个FC层，大大缩减了参数量。在模型结构的选择时，考虑到不同尺寸的卷积核对应的感受野不同，所以我同时使用了5x5和3x3的卷积核，第一个卷积层使用5x5，第二、三个卷积层使用3x3。为了避免过快的丢失细节信息，前两个卷积层的stride都设置为1，padding
也为1，这种做法可以保证经过前两个卷积层都feature_map的尺寸仍然与原图一致。最后一个卷积层的stride为3，提取特征的同时缩小feature_map的尺寸，最后再用一个max_pooling进一步提炼信息。每个卷积层后都使用了一个BN层，激活函数使用ReLU。具体的网络结构如下,最终分类精度为98.8.
```python
self.conv1 = torch.nn.Sequential(torch.nn.Conv2d(1,32,5,padding = 2,stride = 1),
                      torch.nn.BatchNorm2d(32),
                      torch.nn.ReLU(),
                      torch.nn.Conv2d(32,64,3,padding = 1,stride = 1),
                      torch.nn.BatchNorm2d(64),
                      torch.nn.ReLU(),
                      torch.nn.Conv2d(64,128,3,padding =1,stride = 3),
                      torch.nn.BatchNorm2d(128),
                      torch.nn.ReLU(),
                      torch.nn.MaxPool2d(2,2))                                    
self.dense = torch.nn.Sequential(torch.nn.Linear(5*5*128, 10))
```


4. 超参数的设置

为了充分训练模型，Epoch数设置为20，基本训练到20个Epoch时loss已经接近0了，所以没有再增加的必要。Batchsize我选择了128，训练效果比64的要好一些，由于GPU内存限制没有尝试256的Batchsize，学习率使用0.001.


In [1]:
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
import torch.nn.functional as F
import numpy as np
from torch.autograd import Variable

# torch.manual_seed(1)

EPOCH = 4
LR = 0.00001
DOWNLOAD_MNIST = False

train_data = torchvision.datasets.MNIST(root='./mnist/', train=True, transform=torchvision.transforms.ToTensor(),
                                        download=DOWNLOAD_MNIST, )
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
print(train_data.train_data.shape)
print(test_data.test_data.shape)

train_x = torch.unsqueeze(train_data.train_data, dim=1).type(torch.FloatTensor) / 255.
train_y = train_data.train_labels
print(train_x.shape)

test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:10000] / 255.  # Tensor on GPU
test_y = test_data.test_labels[:10000]
print(test_x.shape)

torch.Size([60000, 28, 28])
torch.Size([10000, 28, 28])
torch.Size([60000, 1, 28, 28])
torch.Size([10000, 1, 28, 28])




In [3]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(torch.nn.Conv2d(1,32,5,padding =2,stride=1),
                                         torch.nn.BatchNorm2d(32),
                                         torch.nn.ReLU(),
                                         torch.nn.Conv2d(32,64,3,padding =1,stride=3),
                                         torch.nn.BatchNorm2d(64),
                                         torch.nn.ReLU(),
                                         torch.nn.MaxPool2d(2,2))
                                        
        self.dense = torch.nn.Sequential(torch.nn.Linear(5*5*64, 10))
                                                                                   
    def forward(self, x):
        x = self.conv1(x)
        x = x.view(-1, 5*5*64)
        x = self.dense(x)
        
        output = x
        return output

    
device = torch.device('cuda:0')
net = Net().to(device)

print(net)
for parameters in net.parameters():
    print(parameters.size())

optimizer = torch.optim.Adam(net.parameters(), lr=LR)
# loss_func = nn.MSELoss()
loss_func = nn.CrossEntropyLoss()

data_size = 50000
batch_size = 100
LR  = 0.0001
EPOCH = 20

for epoch in range(EPOCH):
    random_indx = np.random.permutation(data_size)
    for batch_i in range(data_size // batch_size):
        indx = random_indx[batch_i * batch_size:(batch_i + 1) * batch_size]

        b_x = train_x[indx, :].to(device)
        b_y = train_y[indx].to(device)

        output = net(b_x)
        loss = loss_func(output, b_y)

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch_i % 100 == 0:
            test_output = net(test_x.to(device))
            pred_y = torch.max(test_output, 1)[1].data.squeeze()
            # pred_y = torch.max(test_output, 1)[1].data.squeeze()
            accuracy = torch.sum(pred_y == test_y.to(device)).type(torch.FloatTensor) / test_y.size(0)
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.cpu().numpy(), '| test accuracy: %.3f' % accuracy)

        
test_output = net(test_x[:10])
pred_y = torch.max(test_output, 1)[1].data.squeeze()  # move the computation in GPU

print(pred_y, 'prediction number')
print(test_y[:10], 'real number')

Net(
  (conv1): Sequential(
    (0): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv2d(32, 64, kernel_size=(3, 3), stride=(3, 3), padding=(1, 1))
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense): Sequential(
    (0): Linear(in_features=1600, out_features=10, bias=True)
  )
)
10
Epoch:  0 | train loss: 2.6210 | test accuracy: 0.051


RuntimeError: CUDA out of memory. Tried to allocate 958.00 MiB (GPU 0; 8.00 GiB total capacity; 5.63 GiB already allocated; 190.04 MiB free; 5.64 GiB reserved in total by PyTorch)

In [None]:
test_output = net(test_x[:1])
pred_y = torch.max(test_output, 1)[1].data.squeeze()  # move the computation in GPU

print(pred_y, 'prediction number')
print(test_y[:1], 'real number')

In [None]:
test_output

In [None]:
test_x[:1].shape