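Goals of this section
- Understand how a multilayer perceptron works and implement one from scratch
- Implement a multilayer perceptron with PyTorch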

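The linear regression and softmax regression models used in the previous two sections are both single-layer neural networks: they have only one computational layer, the output layer.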

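A multilayer perceptron (MLP) places one or more hidden layers between the input layer and the output layer. The MLP in this section has a single hidden layer, and adjacent layers are still fully connected.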

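Each layer still computes a linear (affine) mapping. The difference is that the hidden layer's output passes through an activation function before it is handed to the output layer.
- Detailed explanation of multilayer perceptrons and activation functions: http://tangshusen.me/Dive-into-DL-PyTorch/#/chapter03_DL-basics/3.8_mlp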
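
To see why the activation function matters, the following minimal sketch stacks two affine layers with and without ReLU. The shapes and the random tensors are illustrative assumptions, not values from this notebook. Without an activation, the two layers collapse into a single affine layer with weight `W1 @ W2` and bias `b1 @ W2 + b2`.

```python
import torch

torch.manual_seed(0)

# Hypothetical small shapes: 4 samples, 5 inputs, 3 hidden units, 2 outputs
X = torch.randn(4, 5)
W1, b1 = torch.randn(5, 3), torch.randn(3)
W2, b2 = torch.randn(3, 2), torch.randn(2)

# Two affine layers with no activation in between
two_layers = (X @ W1 + b1) @ W2 + b2

# A single affine layer built from the same parameters
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b

# The two computations agree up to floating-point error
collapsed = torch.allclose(two_layers, one_layer, atol=1e-5)
print(collapsed)

# Inserting ReLU between the layers breaks this equivalence in general
with_relu = torch.relu(X @ W1 + b1) @ W2 + b2
print(with_relu.shape)
```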

Commonly used activation functions include ReLU, sigmoid, and tanh.
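
As a quick illustration, the three functions can be evaluated on a handful of sample values with PyTorch's built-in versions. The input tensor is an arbitrary example chosen for this sketch.

```python
import torch

# Sample inputs covering negative, zero, and positive values
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

relu_out = torch.relu(x)        # max(x, 0), applied elementwise
sigmoid_out = torch.sigmoid(x)  # 1 / (1 + exp(-x)), values in (0, 1)
tanh_out = torch.tanh(x)        # values in (-1, 1)

print(relu_out)
print(sigmoid_out)
print(tanh_out)
```

ReLU zeroes out negative inputs and leaves positive ones unchanged, while sigmoid and tanh squash their inputs into bounded ranges.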

### Implementing a multilayer perceptron from scratch

![Diagram of a multilayer perceptron](http://tangshusen.me/Dive-into-DL-PyTorch/img/chapter03/3.8_mlp.svg)

We continue to use the Fashion-MNIST dataset and classify its images with a multilayer perceptron.

In [8]:
import torch
import numpy as np
import sys
import torchvision
import torchvision.transforms as transforms

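In [21]:
# Assumption: mnist_train / mnist_test are loaded with torchvision as below; the root directory is a placeholder
mnist_train = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=True, download=True, transform=transforms.ToTensor())
mnist_test = torchvision.datasets.FashionMNIST(root='~/Datasets/FashionMNIST', train=False, download=True, transform=transforms.ToTensor())
batch_size = 256
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False)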

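Define the model parameters

The number of inputs is 784 and the number of outputs is 10. In this experiment we set the number of hidden units, a hyperparameter, to 256.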

In [11]:
num_inputs, num_outputs, num_hiddens = 784, 10, 256

W1 = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float)
b1 = torch.zeros(num_hiddens, dtype=torch.float)
W2 = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)
b2 = torch.zeros(num_outputs, dtype=torch.float)
# W1, b1: hidden-layer weight (784 x 256 matrix) and bias
# W2, b2: output-layer weight (256 x 10 matrix) and bias
params = [W1, b1, W2, b2]
for param in params:
    param.requires_grad_(requires_grad=True)

Define the activation function using max

In [12]:
def relu(X):  # returns x where x > 0, and 0 otherwise
    return torch.max(input=X, other=torch.tensor(0.0))

Define the model

In [30]:
def net(X):
    X = X.view((-1, num_inputs))
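    H = relu(torch.matmul(X, W1) + b1)  # hidden-layer computation
    return torch.matmul(H, W2) + b2     # output layer
# torch.mm and torch.matmul both compute matrix products (matmul also handles batched inputs);
# torch.mul is elementwise multiplication, not a matrix product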
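
The difference between these three functions can be checked on a small example. This is a minimal sketch with made-up 2 x 2 matrices: `torch.mm` is a matrix product restricted to 2-D tensors, `torch.matmul` is a matrix product that also accepts batched inputs, and `torch.mul` multiplies elementwise.

```python
import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

mm_out = torch.mm(A, B)          # matrix product, 2-D tensors only
matmul_out = torch.matmul(A, B)  # matrix product, same result for 2-D inputs
mul_out = torch.mul(A, B)        # elementwise product, same as A * B

print(mm_out)   # [[19, 22], [43, 50]]
print(mul_out)  # [[5, 12], [21, 32]]

# matmul treats leading dimensions as a batch; mm rejects 3-D input
batch = torch.randn(3, 2, 2)
batched_out = torch.matmul(batch, B)  # shape (3, 2, 2)
try:
    torch.mm(batch, B)
    mm_accepts_batch = True
except RuntimeError:
    mm_accepts_batch = False
print(batched_out.shape, mm_accepts_batch)
```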

Loss function: we directly use the function provided by PyTorch that combines the softmax operation with the cross-entropy loss.

In [14]:
loss = torch.nn.CrossEntropyLoss()
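
To make explicit what this loss computes, the sketch below compares `torch.nn.CrossEntropyLoss` against a manual log-softmax followed by the negative log-likelihood of the true class. The logits and labels are hypothetical values made up for the check. Note that the default reduction averages over the batch.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # hypothetical raw model outputs for 4 samples
labels = torch.tensor([1, 0, 9, 3])  # hypothetical class labels

# Built-in loss: takes raw logits, not probabilities
builtin = torch.nn.CrossEntropyLoss()(logits, labels)

# Manual version: log-softmax, pick the log-probability of the true class, average
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), labels].mean()

print(builtin.item(), manual.item())
```

Because the softmax is already built in, the model's output layer should return raw scores without applying softmax itself.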

Train the model

In [24]:
def sgd(params, lr, batch_size):
    for param in params:
        # .data returns a tensor sharing param's data; updates to it are not recorded in the computation history
        param.data -= lr * param.grad / batch_size

In [26]:
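# This function is saved in the d2lzh_pytorch package for later use. It will be improved step by step;
# its complete implementation is described in the "Image Augmentation" section.
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        # net(X).argmax(dim=1) returns the index of the largest element in each row of the output
        # comparing with y gives 1 where the prediction is correct and 0 otherwise
        n += y.shape[0]
    # the fraction of correct predictions is the accuracy
    return acc_sum / n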
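
The argmax comparison used above can be traced on a tiny example. The scores and labels here are invented for illustration only.

```python
import torch

# Hypothetical scores for 3 samples over 3 classes, and their true labels
y_hat = torch.tensor([[0.1, 0.7, 0.2],
                      [0.8, 0.1, 0.1],
                      [0.3, 0.3, 0.4]])
y = torch.tensor([1, 0, 1])

pred = y_hat.argmax(dim=1)                  # predicted class per row: [1, 0, 2]
correct = (pred == y).float().sum().item()  # number of matches: 2.0
accuracy = correct / y.shape[0]             # 2 of 3 predictions are right

print(pred, correct, accuracy)
```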

In [16]:
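# CrossEntropyLoss already averages over the batch and sgd divides by batch_size again,
# so lr is set much larger than usual to keep the effective step size reasonable
num_epochs, lr = 5, 100.0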

def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        # train_l_sum: accumulated training loss; train_acc_sum: number of correctly classified training samples
        for X, y in train_iter:
            y_hat = net(X)             # forward pass
            l = loss(y_hat, y).sum()   # loss

            # reset gradients to zero
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()

            l.backward()               # compute gradients
            if optimizer is None:
                sgd(params, lr, batch_size)  # parameter update
            else:
                optimizer.step()  # used in the "Concise Implementation of Softmax Regression" section

            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))


In [31]:
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)

epoch 1, loss 0.0011, train acc 0.891, test acc 0.861
epoch 2, loss 0.0011, train acc 0.895, test acc 0.877
epoch 3, loss 0.0011, train acc 0.897, test acc 0.862
epoch 4, loss 0.0011, train acc 0.900, test acc 0.865
epoch 5, loss 0.0010, train acc 0.901, test acc 0.870


### Concise implementation of the multilayer perceptron

In [32]:
import torch
from torch import nn
from torch.nn import init
import numpy as np
import sys

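Define the model

The only difference from softmax regression is that we add one more fully connected layer as the hidden layer. It has 256 hidden units and uses ReLU as its activation function.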

In [33]:
class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()
    def forward(self, x): # x shape: (batch, *, *, ...)
        return x.view(x.shape[0], -1)
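
Recent PyTorch releases also ship a built-in `nn.Flatten` module. The sketch below, which uses a random batch with an assumed Fashion-MNIST image shape, suggests that it produces the same result as the custom layer when used with its default arguments.

```python
import torch
from torch import nn

class FlattenLayer(nn.Module):
    """Same layer as above, repeated so that this sketch is self-contained."""
    def forward(self, x):  # x shape: (batch, *, *, ...)
        return x.view(x.shape[0], -1)

# Random batch of 2 single-channel 28 x 28 images (assumed input shape)
x = torch.randn(2, 1, 28, 28)

custom = FlattenLayer()(x)
builtin = nn.Flatten()(x)  # by default flattens every dimension except the first

print(custom.shape, torch.equal(custom, builtin))
```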

In [34]:
num_inputs, num_outputs, num_hiddens = 784, 10, 256

net = nn.Sequential(
        FlattenLayer(),                       # reshape to (batch, 784)
        nn.Linear(num_inputs, num_hiddens),   # hidden layer
        nn.ReLU(),                            # activation function
        nn.Linear(num_hiddens, num_outputs),  # output layer
        )
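
A quick sanity check of such a network is to push a random batch through it and count its parameters. This sketch rebuilds the same architecture with `nn.Flatten` in place of `FlattenLayer` so that it runs on its own; the input batch is random data, not Fashion-MNIST.

```python
import torch
from torch import nn

# Same layer sizes as above: 784 inputs, 256 hidden units, 10 outputs
check_net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

X = torch.randn(2, 1, 28, 28)  # random stand-in for a batch of 2 images
out = check_net(X)             # one row of 10 class scores per image

# 784*256 + 256 parameters in the hidden layer, 256*10 + 10 in the output layer
num_params = sum(p.numel() for p in check_net.parameters())
print(out.shape, num_params)
```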

Load the data

In [35]:
batch_size = 256
train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True )
test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False)

Initialize the parameters

In [36]:
for params in net.parameters():
    init.normal_(params, mean=0, std=0.01)

Define the loss function and the optimizer

In [37]:
loss = torch.nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(net.parameters(), lr=0.5)
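
The from-scratch version used `lr = 100.0`, while the optimizer here uses `lr=0.5`. One way to reconcile the two: the hand-written `sgd` divides the gradient by `batch_size`, so its effective step size is `lr / batch_size`, roughly 100.0 / 256 ≈ 0.39. The sketch below checks this equivalence on a toy problem. The data, shapes, and learning rate are made up for illustration.

```python
import torch

torch.manual_seed(0)
batch_size = 4
X = torch.randn(batch_size, 3)  # toy inputs
y = torch.tensor([0, 1, 1, 0])  # toy labels
loss = torch.nn.CrossEntropyLoss()

def sgd(params, lr, batch_size):
    # Same update rule as the from-scratch sgd above
    for param in params:
        param.data -= lr * param.grad / batch_size

# Two identical weight matrices, updated in two different ways
w_manual = torch.zeros(3, 2, requires_grad=True)
w_optim = torch.zeros(3, 2, requires_grad=True)

manual_lr = 2.0
loss(X @ w_manual, y).backward()
sgd([w_manual], manual_lr, batch_size)

# torch.optim.SGD with the learning rate scaled down by batch_size
optimizer = torch.optim.SGD([w_optim], lr=manual_lr / batch_size)
loss(X @ w_optim, y).backward()
optimizer.step()

print(torch.allclose(w_manual, w_optim))
```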

Train the model

In [38]:
num_epochs = 5
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)

epoch 1, loss 0.0031, train acc 0.698, test acc 0.740
epoch 2, loss 0.0019, train acc 0.821, test acc 0.820
epoch 3, loss 0.0017, train acc 0.842, test acc 0.847
epoch 4, loss 0.0015, train acc 0.857, test acc 0.822
epoch 5, loss 0.0014, train acc 0.863, test acc 0.835
