## Chapter 4: Feedforward Neural Networks

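### 1. In class we studied the forward and backward computation of the Logistic activation function. Now implement a LeakyReLU activation operator that supports both the forward computation and the backward gradient computation, and compare its results with the Paddle API implementation to verify its correctness. <span style="color:red">(Required)</span>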

In [1]:
import paddle
from nndl.op import Op

paddle.seed(10)

class myLeakyReLU(Op):
    def __init__(self, alpha=0.01):
        super().__init__()
        self.alpha = alpha
        self.inputs = None
        self.outputs = None

    def __call__(self, X):
        return self.forward(X)

    def forward(self, inputs):
        # Cache the inputs: the gradient depends on the sign of the input.
        self.inputs = inputs
        a1 = paddle.cast(inputs > 0, dtype='float32') * inputs
        a2 = paddle.cast(inputs <= 0, dtype='float32') * (self.alpha * inputs)
        self.outputs = a1 + a2
        return self.outputs

    def backward(self, grads):
        # d(output)/d(input) is 1 where the input is positive and alpha elsewhere.
        part_1 = paddle.cast(self.inputs > 0, dtype='float32')
        part_2 = paddle.cast(self.inputs <= 0, dtype='float32') * self.alpha
        outputs_grad_inputs = part_1 + part_2
        return paddle.multiply(grads, outputs_grad_inputs)

In [2]:
x = paddle.rand([5,4])

my_leakyrelu = myLeakyReLU(alpha=0.5)
cal_y_1 = my_leakyrelu(x)
print(cal_y_1)

W0726 20:25:47.325665 12459 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0726 20:25:47.330221 12459 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.


Tensor(shape=[5, 4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[0.00022478, 0.85032451, 0.01352605, 0.91611278],
        [0.67955703, 0.89689773, 0.14975823, 0.98008388],
        [0.62906164, 0.43512452, 0.89547771, 0.60389930],
        [0.82229692, 0.09566654, 0.91918784, 0.11712553],
        [0.76633900, 0.95084548, 0.26211733, 0.66611362]])


In [3]:
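paddle_leakyrelu = paddle.nn.LeakyReLU(negative_slope=0.5)
cal_y_2 = paddle_leakyrelu(x)
print(cal_y_2)
# backward() returns None, so this line prints None. x has stop_gradient=True,
# so no gradient with respect to x is recorded here.
print(cal_y_2.backward())
print(cal_y_2)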

Tensor(shape=[5, 4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[0.00022478, 0.85032451, 0.01352605, 0.91611278],
        [0.67955703, 0.89689773, 0.14975823, 0.98008388],
        [0.62906164, 0.43512452, 0.89547771, 0.60389930],
        [0.82229692, 0.09566654, 0.91918784, 0.11712553],
        [0.76633900, 0.95084548, 0.26211733, 0.66611362]])
None
Tensor(shape=[5, 4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[0.00022478, 0.85032451, 0.01352605, 0.91611278],
        [0.67955703, 0.89689773, 0.14975823, 0.98008388],
        [0.62906164, 0.43512452, 0.89547771, 0.60389930],
        [0.82229692, 0.09566654, 0.91918784, 0.11712553],
        [0.76633900, 0.95084548, 0.26211733, 0.66611362]])


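The two outputs are identical, so the forward pass matches the Paddle API. Because `paddle.rand` samples from [0, 1), this comparison covers only positive inputs, and the backward pass is not compared.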
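
To also exercise negative inputs and the gradient, one option is a framework-independent finite-difference check. The following is a minimal sketch in plain Python, not the notebook's Paddle operator; the helper names `leaky_relu`, `leaky_relu_grad`, and `numeric_grad` are hypothetical and exist only for this illustration.

```python
def leaky_relu(x, alpha=0.01):
    """LeakyReLU forward: x if x > 0, else alpha * x."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.01):
    """Analytic derivative: 1 for x > 0, alpha otherwise."""
    return 1.0 if x > 0 else alpha


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


alpha = 0.5
# Include negative values so the alpha branch is exercised; avoid x == 0,
# where the function has a kink and the finite difference is not meaningful.
samples = [-2.0, -0.3, 0.7, 1.5]
max_err = max(
    abs(leaky_relu_grad(x, alpha) - numeric_grad(lambda v: leaky_relu(v, alpha), x))
    for x in samples
)
print(max_err)
```

A small `max_err` suggests that the analytic derivative used in `backward` agrees with the slope of the forward function on both branches.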

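### 2. In the course we used a feedforward neural network to solve the iris classification task. Now, using the MNIST handwritten digit dataset, design a suitable feedforward neural network, reach an accuracy above 95%, and report the parameter count and the amount of computation. <span style="color:red">(Optional, bonus)</span>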

In [4]:
from paddle.vision.transforms import Compose, Normalize

transform = Compose([Normalize(mean=[127.5],
                               std=[127.5],
                               data_format='CHW')])
# Normalize the dataset with the transform defined above
print('download training data and load training data')
train_dataset = paddle.vision.datasets.MNIST(mode='train', transform=transform)
test_dataset = paddle.vision.datasets.MNIST(mode='test', transform=transform)
print('load finished')

download training data and load training data
load finished


In [5]:
import paddle
import paddle.nn.functional as F
class LinearNet(paddle.nn.Layer):
    def __init__(self):
        super(LinearNet, self).__init__()
        self.linear1 = paddle.nn.Linear(in_features=28*28, out_features=1500)
        self.linear2 = paddle.nn.Linear(in_features=1500, out_features=500)
        self.linear3 = paddle.nn.Linear(in_features=500, out_features=10)

    def forward(self, x):
        batch_size = x.shape[0]
        x = x.reshape([batch_size, -1])
        x = self.linear1(x)
        x = F.leaky_relu(x)
        x = self.linear2(x)
        x = F.leaky_relu(x)
        x = self.linear3(x)
        return x

In [6]:
from paddle.metric import Accuracy
model = paddle.Model(LinearNet())   # Wrap the network with paddle.Model
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())

# Configure the optimizer, loss, and metric
model.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
    )

In [7]:
# Train the model
model.fit(train_dataset,
        epochs=3,
        batch_size=64,
        verbose=1
        )

The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/3
Epoch 2/3
Epoch 3/3


In [8]:
model.evaluate(test_dataset, batch_size=64, verbose=1)

Eval begin...
Eval samples: 10000


{'loss': [0.0007729138], 'acc': 0.968}

In [9]:
model.summary()

---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
   Linear-1         [[64, 784]]           [64, 1500]         1,177,500   
   Linear-2         [[64, 1500]]          [64, 500]           750,500    
   Linear-3         [[64, 500]]            [64, 10]            5,010     
Total params: 1,933,010
Trainable params: 1,933,010
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.19
Forward/backward pass size (MB): 0.98
Params size (MB): 7.37
Estimated Total Size (MB): 8.55
---------------------------------------------------------------------------



{'total_params': 1933010, 'trainable_params': 1933010}

The model reaches 96.8% accuracy on the test set and has 1,933,010 parameters.
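
The parameter count reported by `model.summary()` can be reproduced by hand, and the same loop gives a rough estimate of the amount of computation. This is a sketch under stated assumptions: the layer sizes are copied from `LinearNet`, one multiply-accumulate (MAC) is counted per weight, 1 MAC is taken as 2 FLOPs, and bias additions and activations are ignored. The variable names are hypothetical.

```python
# Layer sizes taken from LinearNet: 784 -> 1500 -> 500 -> 10.
layer_sizes = [28 * 28, 1500, 500, 10]

total_params = 0
total_macs = 0
for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = fan_in * fan_out + fan_out   # weight matrix plus bias vector
    macs = fan_in * fan_out               # one multiply-accumulate per weight
    total_params += params
    total_macs += macs
    print(f"{fan_in:>5} -> {fan_out:<5} params={params:,} MACs={macs:,}")

# Assumed convention: 1 MAC = 2 FLOPs; bias adds and activations are ignored.
total_flops = 2 * total_macs
print(f"total params: {total_params:,}")
print(f"MACs per sample: {total_macs:,} (about {total_flops:,} FLOPs)")
```

Under these assumptions, a forward pass costs 1,931,000 MACs per sample, and the parameter total agrees with the 1,933,010 shown in the summary table.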

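### 3. Xiao Ming built a 10-layer feedforward neural network with ReLU activations and a learning rate of 2.0, but the model does not converge. What might be the cause? Please analyze. <span style="color:red">(Optional, short answer, bonus)</span>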

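+ A likely cause is the **dying ReLU problem**: with a learning rate this large, a single update can change the weights so much that a neuron's pre-activation stays below 0 for every input. ReLU then outputs 0 with zero gradient, so the neuron is never updated again.

+ The network may also be too deep for its parameters to be trained adequately.

+ Remedies: adjust the network structure, lower the learning rate or change the optimizer, or switch to an improved activation function such as LeakyReLU or PReLU.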
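
The dying ReLU effect described above can be illustrated on a single neuron. This is a toy sketch, not Xiao Ming's 10-layer network: the training sample, the initial parameters, and the function name `train_neuron` are all hypothetical, and the loss is assumed to be `0.5 * (y - target)**2`.

```python
def train_neuron(lr, alpha=0.0, steps=20):
    """Fit y = act(w*x + b) to one sample with plain gradient descent.

    alpha=0.0 gives ReLU; alpha>0 gives LeakyReLU. Returns (w, b, last_grad_w).
    """
    x, target = 2.0, 1.0      # hypothetical single training sample
    w, b = 1.0, 0.0           # hypothetical initial parameters
    grad_w = 0.0
    for _ in range(steps):
        z = w * x + b
        y = z if z > 0 else alpha * z
        dy_dz = 1.0 if z > 0 else alpha
        dloss_dz = (y - target) * dy_dz   # loss = 0.5 * (y - target)**2
        grad_w, grad_b = dloss_dz * x, dloss_dz
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, grad_w


print(train_neuron(lr=2.0))              # ReLU, large step
print(train_neuron(lr=0.1))              # ReLU, small step
print(train_neuron(lr=2.0, alpha=0.01))  # LeakyReLU, large step
```

With ReLU and a learning rate of 2.0, the first update drives the pre-activation negative and every later gradient is zero, so the parameters stop changing. With a learning rate of 0.1 the same neuron fits the sample, and with LeakyReLU the gradient stays nonzero after the overshoot, so the neuron can keep learning.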
