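## The concept of dropout
In modern neural networks, dropout usually means applying the following operations to the input layer or to a hidden layer:

* randomly select a portion of that layer's outputs as the elements to drop;
* multiply the dropped elements by 0;
* scale up (stretch) the remaining elements by a factor of 1/(1 - p), where p is the drop probability.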
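
A common way to write these three steps is shown here. It assumes that each element $h_i$ of the layer output is dropped independently with probability $p < 1$. This is the so-called "inverted dropout" form, and the implementation in the next section follows it:

$$
\xi_i \sim \mathrm{Bernoulli}(1-p), \qquad h_i' = \frac{\xi_i}{1-p}\, h_i ,
$$

Here $\xi_i = 0$ marks a dropped element. Since $\mathbb{E}[\xi_i] = 1-p$,

$$
\mathbb{E}[h_i'] = \frac{\mathbb{E}[\xi_i]}{1-p}\, h_i = \frac{1-p}{1-p}\, h_i = h_i ,
$$

so stretching the kept elements by $1/(1-p)$ leaves the expected value of every output unchanged.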

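## Implementing dropout
Dropout is easy to implement. One way is shown below. Here the scalar <font color=red > drop_probability </font> is the probability that any single element of <font color=red> X (an NDArray) </font> is dropped.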

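```python
from mxnet import ndarray as nd

def dropout(X, drop_probability):
    keep_probability = 1 - drop_probability
    assert 0 <= keep_probability <= 1
    # Every element is dropped: return an all-zero array of the same shape
    if keep_probability == 0:
        return nd.zeros_like(X)
    # Keep each element independently with probability keep_probability
    mask = nd.random.uniform(0, 1.0, X.shape, ctx=X.context) < keep_probability
    # Rescale so that E[dropout(X)] == X (the expectation is unchanged)
    scale = 1 / keep_probability
    return mask * X * scale
```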

```python
A = nd.arange(20).reshape((5,4))
dropout(A, 0.0)
```

```
[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]
 [ 16.  17.  18.  19.]]
<NDArray 5x4 @cpu(0)>
```

```python
dropout(A, 0.5)
```

```
[[  0.   0.   4.   6.]
 [  8.   0.  12.   0.]
 [ 16.  18.  20.  22.]
 [  0.   0.   0.   0.]
 [ 32.   0.   0.   0.]]
<NDArray 5x4 @cpu(0)>
```
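
In the second output every surviving element is doubled, because `scale = 1 / 0.5 = 2`. The sketch below checks the expectation claim from the code comment empirically. It uses a NumPy re-implementation, `dropout_np`, a hypothetical stand-in for the MXNet `dropout` above so that it runs without MXNet. The sketch averages many dropout samples of the same matrix.

```python
import numpy as np

def dropout_np(X, drop_probability, rng):
    """Inverted dropout on a NumPy array (mirrors the MXNet dropout above)."""
    keep_probability = 1.0 - drop_probability
    assert 0.0 <= keep_probability <= 1.0
    if keep_probability == 0.0:
        return np.zeros_like(X)
    # Keep each element independently with probability keep_probability
    mask = rng.uniform(0.0, 1.0, size=X.shape) < keep_probability
    # Rescale the kept elements so that the expectation is unchanged
    return mask * X / keep_probability

rng = np.random.default_rng(42)
A = np.arange(20, dtype=float).reshape(5, 4)

# Average 20000 independent dropout samples of A with drop probability 0.5
samples = np.stack([dropout_np(A, 0.5, rng) for _ in range(20000)])
mean = samples.mean(axis=0)

print(np.round(mean, 1))  # close to A, up to sampling noise
```

With 20000 samples, the standard error of each averaged entry is at most about 19 / sqrt(20000) ≈ 0.13, so the average should match `A` to within a few tenths.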

## The nature of dropout
Now that you know what dropout is and how to implement it, you may be curious about what it is really doing.

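If you are familiar with ensemble learning, you probably know how effective it is at improving the accuracy of weak classifiers. A typical approach samples the training set with replacement several times and trains a separate classifier on each sample. At test time, the predictions of these classifiers are combined into the final classification.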
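
The pure-Python sketch below is a toy illustration of this procedure, often called bagging: it draws bootstrap samples with replacement, trains one classifier per sample, and combines them with a majority vote at test time. The one-dimensional threshold classifier `train_stump` and the toy data are hypothetical choices that keep the example short. They are not part of the original tutorial.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) examples from data uniformly with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

def train_stump(sample):
    """Hypothetical weak classifier on 1-D inputs: threshold halfway between class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        only = 0 if xs0 else 1  # sample contains one class only: always predict it
        return lambda x: only
    m0, m1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    threshold = (m0 + m1) / 2
    if m1 >= m0:
        return lambda x: int(x > threshold)
    return lambda x: int(x < threshold)

def majority_vote(classifiers, x):
    """Ensemble prediction: the label predicted by the most classifiers."""
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]

rng = random.Random(0)
# Toy data set: class 0 centered at x = 0, class 1 centered at x = 3
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(50)]
        + [(rng.gauss(3.0, 1.0), 1) for _ in range(50)])

# Train 11 weak classifiers, each on its own bootstrap sample
ensemble = [train_stump(bootstrap_sample(data, rng)) for _ in range(11)]
print(majority_vote(ensemble, -1.0), majority_vote(ensemble, 4.0))
```

Each classifier here has its own independent parameters (its own threshold). That independence is exactly what dropout gives up.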

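In fact, dropout mimics ensemble learning. On any given forward pass, a multilayer neural network with dropout is a subnetwork of the original network: it uses only a subset of the original nodes and edges.
We introduced stochastic gradient descent in an earlier chapter: when training a neural network, we usually sample a random minibatch of training data at each step. In effect, dropout trains a different subnetwork of the original network as a classifier on each such minibatch. Unlike ordinary ensemble learning, however, <font color=red>all of these subnetwork classifiers share the same set of parameters</font>. That is why dropout only mimics ensemble learning rather than being ensemble learning.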
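
The NumPy sketch below makes the "subnetwork with shared parameters" view concrete. The tiny 4-6-3 network and the `forward_*` function names are illustrative assumptions. The 1/(1-p) rescaling is omitted because it only multiplies the output by a constant. The sketch checks that multiplying the hidden units by a 0/1 mask gives exactly the same output as deleting the dropped units and their edges while reusing the same weight matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))  # shared input -> hidden weights
W2 = rng.normal(size=(6, 3))  # shared hidden -> output weights
x = rng.normal(size=(1, 4))   # one input example

def forward_with_mask(x, mask):
    """Full network; hidden units are multiplied by a 0/1 dropout mask."""
    h = np.maximum(x @ W1, 0) * mask
    return h @ W2

def forward_subnetwork(x, mask):
    """Explicit subnetwork: only the kept hidden units and their edges, same weights."""
    keep = mask.astype(bool)
    h = np.maximum(x @ W1[:, keep], 0)
    return h @ W2[keep, :]

# Each minibatch draws a fresh mask, i.e. trains a different subnetwork
mask_a = (rng.uniform(size=6) < 0.5).astype(float)
mask_b = (rng.uniform(size=6) < 0.5).astype(float)
for mask in (mask_a, mask_b):
    same = np.allclose(forward_with_mask(x, mask), forward_subnetwork(x, mask))
    print(mask, same)

# A hidden layer with 6 units has 2**6 possible subnetworks, all sharing W1 and W2
print(2 ** W1.shape[1])  # 64
```

Because every subnetwork reads from the same `W1` and `W2`, a gradient step taken for one mask also changes all the other subnetworks.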

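As we just emphasized, the subnetwork classifiers are trained on different minibatches and share one set of parameters. Therefore, <font color =red>a network trained with dropout effectively regularizes the parameters of its input and hidden layers: the learned parameters must make the many different subnetworks of the original network all perform as well as possible on the training data</font>.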

Next, let's implement a multilayer neural network with dropout layers ourselves.

```python
"""
1. Load the data
"""
import sys
sys.path.append('..')
import utils  # local helper module in the parent directory

batch_size = 256
train_data, test_data = utils.load_data_fashion_mnist(batch_size)
```

```python
"""
2. Define the multilayer perceptron
"""
num_inputs = 28 * 28
num_outputs = 10

num_hidden1 = 256
num_hidden2 = 256
weight_scale = .01

# Weights drawn from a normal distribution with std weight_scale; biases start at zero
w1 = nd.random_normal(shape=(num_inputs, num_hidden1), scale=weight_scale)
b1 = nd.zeros(num_hidden1)

w2 = nd.random_normal(shape=(num_hidden1, num_hidden2), scale=weight_scale)
b2 = nd.zeros(num_hidden2)

w3 = nd.random_normal(shape=(num_hidden2, num_outputs), scale=weight_scale)
b3 = nd.zeros(num_outputs)

params = [w1, b1, w2, b2, w3, b3]
# Allocate gradient buffers so autograd can compute a gradient for each parameter
for param in params:
    param.attach_grad()
```

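```python
"""
3. Define the model with dropout layers
The model chains fully connected layers and ReLU activations, and adds a dropout
layer after each activation. The drop probability of each dropout layer can be set
separately. In general, we recommend a smaller drop probability for layers closer
to the input. In this experiment, the drop probability after the first fully
connected layer is set to 0.4, and after the second fully connected layer to 0.8.
"""
from mxnet import autograd

drop_prob1 = 0.4
drop_prob2 = 0.8

def net(X):
    X = X.reshape((-1, num_inputs))
    # First fully connected layer + ReLU
    h1 = nd.relu(nd.dot(X, w1) + b1)
    # Dropout after the first layer, applied only during training
    if autograd.is_training():
        h1 = dropout(h1, drop_prob1)
    # Second fully connected layer + ReLU
    h2 = nd.relu(nd.dot(h1, w2) + b2)
    # Dropout after the second layer, applied only during training
    if autograd.is_training():
        h2 = dropout(h2, drop_prob2)
    # Output layer (no activation; the softmax is applied inside the loss)
    return nd.dot(h2, w3) + b3
```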
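
The `net` function applies dropout only while training and uses the full network at test time. The NumPy sketch below mirrors that control flow. An explicit `training` flag stands in for `autograd.is_training()`; the flag, the layer sizes, and the random weights are illustrative assumptions. The sketch shows the consequence: evaluation is deterministic, while two training-mode passes on the same input generally differ. Note that `autograd.is_training` must be called. A bare function object is always truthy in Python, so a check like `if autograd.is_training:` would also apply dropout at test time.

```python
import numpy as np

def dropout_np(X, drop_probability, rng):
    """Inverted dropout on a NumPy array."""
    keep_probability = 1.0 - drop_probability
    if keep_probability == 0.0:
        return np.zeros_like(X)
    mask = rng.uniform(0.0, 1.0, size=X.shape) < keep_probability
    return mask * X / keep_probability

rng = np.random.default_rng(0)
num_inputs, num_hidden, num_outputs = 8, 16, 3  # illustrative sizes
w1 = rng.normal(scale=0.1, size=(num_inputs, num_hidden))
b1 = np.zeros(num_hidden)
w2 = rng.normal(scale=0.1, size=(num_hidden, num_outputs))
b2 = np.zeros(num_outputs)

def net_np(X, drop_prob, training):
    """One hidden layer with ReLU; dropout only when training is True."""
    h = np.maximum(X @ w1 + b1, 0)
    if training:  # stands in for autograd.is_training() in the MXNet code
        h = dropout_np(h, drop_prob, rng)
    return h @ w2 + b2

X = rng.normal(size=(2, num_inputs))
eval_1 = net_np(X, 0.5, training=False)
eval_2 = net_np(X, 0.5, training=False)
train_1 = net_np(X, 0.5, training=True)
train_2 = net_np(X, 0.5, training=True)
print(np.array_equal(eval_1, eval_2))    # evaluation is deterministic
print(np.array_equal(train_1, train_2))  # training passes use different masks

# A function object is always truthy, so the mode check must call the function
print(bool(net_np))  # True
```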

```python
"""
4. Training
"""
from time import time
from mxnet import autograd
from mxnet import gluon

softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
learning_rate = .5
begin = time()
for epoch in range(50):
    train_acc = 0.
    train_loss = 0.
    for data, label in train_data:
        # Inside autograd.record(), autograd.is_training() is True, so dropout is active
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        # Backpropagating a vector loss sums the gradients over the batch
        loss.backward()
        # Hence the learning rate is divided by batch_size
        utils.SGD(params, learning_rate/batch_size)

        train_loss += nd.mean(loss).asscalar()
        train_acc += utils.accuracy(output, label)
    # Evaluate on the test set; outside autograd.record(), dropout is skipped
    test_acc = utils.evaluate_accuracy(test_data, net)
    print("Epoch %d. Loss: %f, Train acc %f, Test acc %f" % (
        epoch, train_loss/len(train_data),
        train_acc/len(train_data), test_acc))
end = time()
print("Total training time: %f s" % (end - begin))
```

```
Epoch 0. Loss: 0.233198, Train acc 0.915398, Test acc 0.875100
Epoch 1. Loss: 0.237649, Train acc 0.913929, Test acc 0.863381
Epoch 2. Loss: 0.237582, Train acc 0.914830, Test acc 0.875100
Epoch 3. Loss: 0.228922, Train acc 0.916249, Test acc 0.872196
Epoch 4. Loss: 0.231357, Train acc 0.916116, Test acc 0.871895
Epoch 5. Loss: 0.229443, Train acc 0.916950, Test acc 0.872095
Epoch 6. Loss: 0.235752, Train acc 0.915281, Test acc 0.875100
Epoch 7. Loss: 0.230363, Train acc 0.916283, Test acc 0.874900
Epoch 8. Loss: 0.230902, Train acc 0.919788, Test acc 0.866987
Epoch 9. Loss: 0.230377, Train acc 0.917134, Test acc 0.878005
Epoch 10. Loss: 0.233277, Train acc 0.916834, Test acc 0.878005
Epoch 11. Loss: 0.231997, Train acc 0.917768, Test acc 0.874099
Epoch 12. Loss: 0.227141, Train acc 0.918787, Test acc 0.875401
Epoch 13. Loss: 0.227402, Train acc 0.918069, Test acc 0.874800
Epoch 14. Loss: 0.225577, Train acc 0.918286, Test acc 0.873097
Epoch 15. Loss: 0.230774, Train acc 0.917551, Test
```

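### Results of training for 50 epochs without dropout
* Training breaks down at epoch 34. The loss explodes to about 9.1e25 and the test accuracy drops to 0.10. From epoch 35 on, the loss is NaN and both train_acc and test_acc stay at about 0.10, which is chance level for the 10 classes.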


```
Epoch 26. Loss: 0.225771, Train acc 0.914413, Test acc 0.892127
Epoch 27. Loss: 0.214575, Train acc 0.918570, Test acc 0.893930
Epoch 28. Loss: 0.214624, Train acc 0.919655, Test acc 0.885116
Epoch 29. Loss: 0.216177, Train acc 0.918636, Test acc 0.891426
Epoch 30. Loss: 0.204961, Train acc 0.921875, Test acc 0.888321
Epoch 31. Loss: 0.199285, Train acc 0.924663, Test acc 0.895733
Epoch 32. Loss: 0.197781, Train acc 0.925047, Test acc 0.893229
Epoch 33. Loss: 0.192677, Train acc 0.927367, Test acc 0.892728
Epoch 34. Loss: 91424674556056240308355072.000000, Train acc 0.869541, Test acc 0.100060
Epoch 35. Loss: nan, Train acc 0.100461, Test acc 0.099760
Epoch 36. Loss: nan, Train acc 0.100077, Test acc 0.099760
Epoch 37. Loss: nan, Train acc 0.099977, Test acc 0.099760
Epoch 38. Loss: nan, Train acc 0.099960, Test acc 0.099760
Epoch 48. Loss: nan, Train acc 0.100027, Test acc 0.099760
Epoch 49. Loss: nan, Train acc 0.099993, Test acc 0.099760
```

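### Results of training for 150 epochs with dropout
* Training accuracy keeps rising (to about 0.95), but test accuracy stays roughly flat at about 0.89.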
```
Epoch 30. Loss: 0.138990, Train acc 0.946698, Test acc 0.891226
Epoch 36. Loss: 0.134528, Train acc 0.948584, Test acc 0.890224
Epoch 41. Loss: 0.132773, Train acc 0.950087, Test acc 0.890625
Epoch 46. Loss: 0.131675, Train acc 0.949770, Test acc 0.888922
Epoch 47. Loss: 0.128813, Train acc 0.951372, Test acc 0.895032
Epoch 48. Loss: 0.127396, Train acc 0.951973, Test acc 0.894431
Epoch 49. Loss: 0.130470, Train acc 0.951038, Test acc 0.892127
```