# PyTorch and MNIST

## Introduction

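In the previous section we built the network the conventional way, defining each layer and the forward pass by hand.

This approach helps you understand the network's structure and how data flows through it, and it gives you more flexibility in controlling the parameters.

Compared with frameworks such as Keras, however, this way of building a network is still fairly low-level. Often we don't need that much flexibility and just want to put a network together quickly.

This section builds the network with `torch.nn.Sequential()`, which, as you will see, is quicker to write.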

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

## Defining the network structure

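We won't go over the MNIST dataset again; let's go straight to building the network. See the previous section for details of the architecture.

For easy comparison, both approaches are shown here.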

In [2]:
class Net_normal(nn.Module):
    def __init__(self):
        super(Net_normal, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5, stride=1)
        self.conv2 = nn.Conv2d(in_channels=20, out_channels=50, kernel_size=5, stride=1)
        self.fc1 = nn.Linear(in_features=4*4*50, out_features=500)
        self.fc2 = nn.Linear(in_features=500, out_features=10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
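Where does `4*4*50` come from? A 5×5 convolution with stride 1 and no padding shrinks each side by 4, and each 2×2 max pool halves it: 28 → 24 → 12 → 8 → 4. After `conv2` there are 50 channels, so each image flattens to 4·4·50 = 800 features, which is `fc1`'s `in_features`. The sketch below checks this by pushing a dummy 1×28×28 image through layers configured like those in `Net_normal`; the random input is only a placeholder and the helper `trace_shapes` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Layers configured like the convolutional part of Net_normal
conv1 = nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5, stride=1)
conv2 = nn.Conv2d(in_channels=20, out_channels=50, kernel_size=5, stride=1)

def trace_shapes(x):
    """Hypothetical helper: return the tensor shape after each step of the conv stack."""
    shapes = [tuple(x.shape)]
    x = F.relu(conv1(x))                          # 5x5 conv, no padding: 28 -> 24
    shapes.append(tuple(x.shape))
    x = F.max_pool2d(x, kernel_size=2, stride=2)  # 2x2 pool: 24 -> 12
    shapes.append(tuple(x.shape))
    x = F.relu(conv2(x))                          # 5x5 conv: 12 -> 8
    shapes.append(tuple(x.shape))
    x = F.max_pool2d(x, kernel_size=2, stride=2)  # 2x2 pool: 8 -> 4
    shapes.append(tuple(x.shape))
    return shapes

shapes = trace_shapes(torch.randn(1, 1, 28, 28))  # dummy batch with one 28x28 image
for s in shapes:
    print(s)

c, h, w = shapes[-1][1:]
print(c * h * w)  # 800, the in_features of fc1
```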

In [20]:
class Net_easy(nn.Module):
    def __init__(self):
        super(Net_easy, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 20, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(20, 50, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2))
        self.dense = nn.Sequential(
            nn.Linear(4*4*50, 500),
            nn.Linear(500, 10))
    def forward(self, x):
        conv_out = self.conv(x)
        res = conv_out.view(-1, 4*4*50)
        out = self.dense(res)
        return F.log_softmax(out, dim=1)
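Going one step further, the flattening step can also live inside the container via `nn.Flatten` (available in newer PyTorch releases), so the whole model becomes a single `nn.Sequential`. The sketch below is a hypothetical variant named `net_seq`, not the model trained in this section. It places an `nn.ReLU()` between the two linear layers to mirror `F.relu` after `fc1` in `Net_normal`, and uses `nn.LogSoftmax(dim=1)` in place of the final `F.log_softmax`.

```python
import torch
import torch.nn as nn

# Hypothetical all-in-one variant (not the model trained in this section)
net_seq = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(20, 50, 5),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Flatten(),               # (N, 50, 4, 4) -> (N, 800), replaces .view(-1, 4*4*50)
    nn.Linear(4 * 4 * 50, 500),
    nn.ReLU(),                  # mirrors F.relu after fc1 in Net_normal
    nn.Linear(500, 10),
    nn.LogSoftmax(dim=1),       # same as F.log_softmax(out, dim=1)
)

x = torch.randn(8, 1, 28, 28)   # dummy batch of 8 images
out = net_seq(x)
print(out.shape)  # torch.Size([8, 10])
```

Because `forward` is generated by `nn.Sequential`, there is no need to write a class at all; the trade-off is that any step that is not a module (for example, an arbitrary reshape) has to be wrapped in one.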

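### Note: the difference between the two approaches

The two classes above define the same kind of network in two ways. In both, the convolutional and fully connected layers are modules from `nn` (`nn.Conv2d`, `nn.Linear`).

The differences are:
- In the first approach, ReLU and max pooling come from `torch.nn.functional` (`F.relu`, `F.max_pool2d`) and are called as plain functions inside `forward`.
- In the second approach they come from `nn` (`nn.ReLU`, `nn.MaxPool2d`). `nn.Sequential` is a container that only accepts `nn.Module` instances, so activations and pooling must be added to it as module objects rather than as functions.

Note also that `Net_easy` is not exactly identical to `Net_normal`: its `dense` block has no `nn.ReLU()` between the two `nn.Linear` layers, whereas `Net_normal` applies `F.relu` after `fc1`. Adding `nn.ReLU()` between them would make the two networks equivalent.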
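A small sketch of the distinction: `F.relu` and `nn.ReLU()` compute the same thing, but only the module form can be placed in `nn.Sequential`. Passing the bare function makes the container raise a `TypeError`. The tensor sizes here are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 10)

# Functional form: a plain function call, as used inside Net_normal.forward()
y_func = F.relu(x)

# Module form: an object, so it can be stored in nn.Sequential
y_mod = nn.ReLU()(x)
print(torch.equal(y_func, y_mod))  # True

# nn.Sequential only accepts nn.Module instances; a bare function is rejected
try:
    nn.Sequential(nn.Linear(10, 5), F.relu)
    rejected = False
except TypeError as err:
    rejected = True
    print("TypeError:", err)
```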

## Loading the MNIST dataset

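The loading code below has three steps:
1. Define the data transforms
2. Read the MNIST dataset
3. Load the data in batches

This is the same as in Section 2.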

In [5]:
data_tf = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,))])

mnist_trainset = datasets.MNIST(root='./MNIST_data/MNIST', train=True, download=True, transform=data_tf)
mnist_testset = datasets.MNIST(root='./MNIST_data/MNIST', train=False, download=True, transform=data_tf)

train_loader = DataLoader(mnist_trainset, batch_size=1000, shuffle=True)
test_loader = DataLoader(mnist_testset, batch_size=1000, shuffle=False)  # no need to shuffle the test set
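Two quick checks on the settings above. `transforms.Normalize((0.1307,), (0.3081,))` computes `(x - 0.1307) / 0.3081` per channel, where 0.1307 and 0.3081 are the commonly used mean and standard deviation of MNIST pixel values after `ToTensor()` scales them to [0, 1]. And with 60,000 training images and `batch_size=1000`, one epoch is 60 batches (indices 0 to 59), which is why the training log shows `batch_ndx` values up to 50 when printing every 10th batch. The sketch reproduces both with plain tensors instead of downloading MNIST; the all-zeros `TensorDataset` is only a stand-in.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

MEAN, STD = 0.1307, 0.3081

# The arithmetic transforms.Normalize performs on a single-channel image
pixels = torch.tensor([0.0, MEAN, 1.0])  # black, "average" and white pixels after ToTensor()
normalized = (pixels - MEAN) / STD
print(normalized)  # roughly tensor([-0.4242, 0.0000, 2.8215])

# Stand-in dataset with as many samples as the MNIST training set
fake_train = TensorDataset(torch.zeros(60000, 1), torch.zeros(60000, dtype=torch.long))
loader = DataLoader(fake_train, batch_size=1000, shuffle=True)
print(len(loader))  # 60 batches per epoch
```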

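## Training the neural network

### Setting up the network

The network is initialized as follows, with SGD as the optimizer and cross-entropy as the loss function.

Here we use `Net_easy` as our network.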
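One subtlety: `Net_easy` already ends in `F.log_softmax`, while `nn.CrossEntropyLoss` applies `log_softmax` to its input itself and then takes the negative log-likelihood. The idiomatic loss for log-probability outputs is `nn.NLLLoss`. Using `CrossEntropyLoss` here still gives the same value, because applying `log_softmax` to log-probabilities leaves them unchanged (their exponentials already sum to 1). The sketch checks this on random logits; the batch size, seed, and values are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(16, 10)              # raw scores: batch of 16, 10 classes
target = torch.randint(0, 10, (16,))      # random class labels
log_probs = F.log_softmax(logits, dim=1)  # what Net_easy.forward returns

ce_on_logits = nn.CrossEntropyLoss()(logits, target)        # standard usage
ce_on_log_probs = nn.CrossEntropyLoss()(log_probs, target)  # as in this section
nll_on_log_probs = nn.NLLLoss()(log_probs, target)          # idiomatic pairing

print(ce_on_logits.item(), ce_on_log_probs.item(), nll_on_log_probs.item())
```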

In [21]:
net = Net_easy()
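if torch.cuda.is_available():  # if a GPU is available
    # moves the model to GPU index 1; on a single-GPU machine,
    # use index 0 here and in the training loop instead
    net = net.cuda(1)
    print("CUDA is available.")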
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_func = torch.nn.CrossEntropyLoss()

print(net)

CUDA is available.
Net_easy(
  (conv): Sequential(
    (0): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense): Sequential(
    (0): Linear(in_features=800, out_features=500, bias=True)
    (1): Linear(in_features=500, out_features=10, bias=True)
  )
)
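The printed structure also makes the parameter count easy to work out by hand: a `Conv2d` has `out·in·k·k` weights plus `out` biases, and a `Linear` has `out·in` weights plus `out` biases. That gives 520 + 25,050 + 400,500 + 5,010 = 431,080 parameters, most of them in the first linear layer. The sketch rebuilds layers with the same sizes as `Net_easy` and counts them with `numel()`; the dictionary keys are illustrative names only.

```python
import torch.nn as nn

# Layers with the same sizes as Net_easy (ReLU and MaxPool2d have no parameters)
layers = {
    "conv1": nn.Conv2d(1, 20, 5),       # 20*1*5*5 + 20  = 520
    "conv2": nn.Conv2d(20, 50, 5),      # 50*20*5*5 + 50 = 25050
    "fc1": nn.Linear(4 * 4 * 50, 500),  # 800*500 + 500  = 400500
    "fc2": nn.Linear(500, 10),          # 500*10 + 10    = 5010
}

counts = {name: sum(p.numel() for p in layer.parameters()) for name, layer in layers.items()}
total = sum(counts.values())
print(counts)
print(total)  # 431080
```

On the model itself, `sum(p.numel() for p in net.parameters())` counts the same layers and should give the same total.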


In [22]:
net.train()  # switch to training mode
for epoch in range(5):
    for batch_ndx, (img, label) in enumerate(train_loader):  # each batch holds 1000 samples
        # move the batch to the same device as the network
        # (GPU index 1 here, to match net.cuda(1) above)
        if torch.cuda.is_available():
            img = img.cuda(1)
            label = label.cuda(1)

        # forward pass
        output = net(img)
        loss = loss_func(output, label)

        # backward pass
        optimizer.zero_grad()  # reset gradients to zero
        loss.backward()  # backpropagate the loss
        optimizer.step()  # update the parameters

        if batch_ndx % 10 == 0:
            print('epoch: {}, batch_ndx: {}, loss: {:.4}'.format(epoch, batch_ndx, loss.item()))

epoch: 0, batch_ndx: 0, loss: 2.304
epoch: 0, batch_ndx: 10, loss: 2.193
epoch: 0, batch_ndx: 20, loss: 2.051
epoch: 0, batch_ndx: 30, loss: 1.902
epoch: 0, batch_ndx: 40, loss: 1.698
epoch: 0, batch_ndx: 50, loss: 1.481
epoch: 1, batch_ndx: 0, loss: 1.217
epoch: 1, batch_ndx: 10, loss: 0.9898
epoch: 1, batch_ndx: 20, loss: 0.8328
epoch: 1, batch_ndx: 30, loss: 0.7654
epoch: 1, batch_ndx: 40, loss: 0.6457
epoch: 1, batch_ndx: 50, loss: 0.5622
epoch: 2, batch_ndx: 0, loss: 0.5362
epoch: 2, batch_ndx: 10, loss: 0.4983
epoch: 2, batch_ndx: 20, loss: 0.4756
epoch: 2, batch_ndx: 30, loss: 0.5093
epoch: 2, batch_ndx: 40, loss: 0.4057
epoch: 2, batch_ndx: 50, loss: 0.4103
epoch: 3, batch_ndx: 0, loss: 0.344
epoch: 3, batch_ndx: 10, loss: 0.3819
epoch: 3, batch_ndx: 20, loss: 0.4078
epoch: 3, batch_ndx: 30, loss: 0.3618
epoch: 3, batch_ndx: 40, loss: 0.3438
epoch: 3, batch_ndx: 50, loss: 0.3601
epoch: 4, batch_ndx: 0, loss: 0.3713
epoch: 4, batch_ndx: 10, loss: 0.3291
epoch: 4, batch_ndx: 20, 