# Dropout with ``gluon``

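Dropout is a special kind of layer because it behaves differently during training and testing. ``gluon`` keeps track of when the computation graph is being recorded and of whether the network is running in training or prediction mode.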
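To make the difference between the two modes concrete, here is a minimal pure-Python sketch of "inverted" dropout. The function name ``dropout`` and its ``training`` and ``rng`` parameters are assumptions made for illustration; this is a toy on plain lists, not Gluon's implementation.

```python
import random

def dropout(values, drop_prob, training, rng=random):
    """Toy inverted dropout on a list of floats (illustrative only)."""
    if not training or drop_prob == 0.0:
        # Prediction mode: the layer acts as the identity.
        return list(values)
    if drop_prob >= 1.0:
        # Everything is dropped.
        return [0.0 for _ in values]
    keep_prob = 1.0 - drop_prob
    # Keep each value with probability keep_prob and rescale the survivors
    # so that the expected value of every entry is unchanged.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, 0.5, training=False))  # always the input itself
print(dropout(activations, 0.5, training=True))   # random: entries are 0 or doubled
```

In prediction mode the output is deterministic, while in training mode repeated calls generally give different results.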

In [1]:
import mxnet as mx
import numpy as np

from mxnet import nd
from mxnet import autograd
from mxnet import gluon

import sys
sys.path.append('..')
import utils

In [2]:
ctx = mx.cpu()

## load dataset

In [3]:
batch_size = 128
train_data, test_data = utils.load_dataset(batch_size, data_type='mnist')

## define the model

In [4]:
num_outputs = 10
num_inputs = 784
num_hidden = 256
num_examples = 60000

drop_prob1 = 0.2
drop_prob2 = 0.5

net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(num_hidden, activation='relu'))
    # Dropout after the first hidden layer
    net.add(gluon.nn.Dropout(drop_prob1))
    net.add(gluon.nn.Dense(num_hidden, activation='relu'))
    # Dropout after the second hidden layer
    net.add(gluon.nn.Dropout(drop_prob2))
    net.add(gluon.nn.Dense(num_outputs))

net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)

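## Using ``train_mode`` and ``predict_mode``

Below, passing the same input through the network twice produces exactly the same output. This is because MXNet is in ``predict_mode`` by default, so the ``Dropout`` layers are inactive and nothing is dropped.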

In [5]:
for x, _ in train_data:
    x = x.as_in_context(ctx)
    print(x.shape)
    break
print(net(x[0:1]))
print(net(x[0:1]))

(128, 1, 28, 28)

[[ 0.17661531 -0.04132318  0.09641108  0.00084138  0.01041453  0.28043959
  -0.24985664 -0.01252007 -0.27903724 -0.16061771]]
<NDArray 1x10 @cpu(0)>

[[ 0.17661531 -0.04132318  0.09641108  0.00084138  0.01041453  0.28043959
  -0.24985664 -0.01252007 -0.27903724 -0.16061771]]
<NDArray 1x10 @cpu(0)>


We can also specify ``predict_mode`` explicitly, so that MXNet knows which mode we intend to run in.

In [6]:
with autograd.predict_mode():
    print(net(x[:1]))
    print(net(x[:1]))


[[ 0.17661531 -0.04132318  0.09641108  0.00084138  0.01041453  0.28043959
  -0.24985664 -0.01252007 -0.27903724 -0.16061771]]
<NDArray 1x10 @cpu(0)>

[[ 0.17661531 -0.04132318  0.09641108  0.00084138  0.01041453  0.28043959
  -0.24985664 -0.01252007 -0.27903724 -0.16061771]]
<NDArray 1x10 @cpu(0)>


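Unless something has gone horribly wrong, you should see the same result as before. We can also run the code in ``train_mode``. This tells MXNet to run our ``Block`` as it would during training, with dropout active, so the results change from one forward pass to the next.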

In [7]:
with autograd.train_mode():
    print(net(x[:1]))
    print(net(x[:1]))


[[ 0.01276705 -0.26997188  0.06107473  0.32555851 -0.20866846  0.1562108
  -0.13003586 -0.01198454 -0.35130394 -0.1841678 ]]
<NDArray 1x10 @cpu(0)>

[[ 0.19918826 -0.05950467 -0.07602523  0.03314148  0.20677546  0.48302743
  -0.3097747   0.16385019 -0.18783174 -0.20046568]]
<NDArray 1x10 @cpu(0)>


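We can call ``autograd.is_training()`` to check whether we are in ``predict_mode`` or ``train_mode``; ``autograd.is_recording()`` reports separately whether the computation graph is being recorded. By default MXNet is in ``predict_mode``, because we do not always want to train a model; we may just want to look at its predictions.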

In [8]:
with autograd.predict_mode():
    print(autograd.is_training())
    print(autograd.is_recording())
    
with autograd.train_mode():
    print(autograd.is_training())
    print(autograd.is_recording())

False
False
True
False


## ``autograd.record()``

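Whenever we train a neural network, we need to ``record`` the computations of our ``Block``. The purpose of ``record()`` is to build the computation graph, while the purpose of ``train_mode()`` is to signal that we are training the network. The two do not conflict; they are closely related but should not be confused. For example, when we generate adversarial examples (covered later alongside GANs), we want to ``record`` while the model behaves as in ``predict_mode``. Conversely, even when we are not recording, we may still want to evaluate the model's ``train_mode`` behavior.

So, given that ``record()`` and ``train_mode()`` are distinct, how do we avoid having to declare two scopes every time we train the model?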

In [9]:
with autograd.record():
    with autograd.train_mode():
        yhat = net(x)

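MXNet's design is to let ``record()`` take an argument ``train_mode`` that defaults to ``True``. In other words, ``autograd.record()`` is equivalent to ``autograd.record(train_mode=True)``. We can write ``autograd.record(train_mode=False)`` to change this behavior (for example, when working with GANs).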
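One way to picture this design is as two independent flags, where the recording scope simply sets the training flag as well unless told otherwise. The sketch below is a hypothetical pure-Python model of that idea; the ``_state`` dictionary and the ``_scope`` and ``flags`` helpers are invented for illustration and say nothing about how MXNet implements its scopes internally.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the two independent flags (not MXNet internals).
_state = {"recording": False, "training": False}

@contextmanager
def _scope(**updates):
    """Temporarily override flags, restoring the previous values on exit."""
    previous = dict(_state)
    _state.update(updates)
    try:
        yield
    finally:
        _state.clear()
        _state.update(previous)

def record(train_mode=True):
    # Recording turns on training behavior by default, but can opt out.
    return _scope(recording=True, training=train_mode)

def train_mode():
    return _scope(training=True)

def predict_mode():
    return _scope(training=False)

def flags():
    """Return (is_recording, is_training) for the toy state."""
    return (_state["recording"], _state["training"])

with record():
    print(flags())   # (True, True)
with record(train_mode=False):
    print(flags())   # (True, False)
with train_mode():
    print(flags())   # (False, True)
print(flags())       # (False, False)
```

Because the default covers the common case, a training loop needs only one ``with`` block, while less common combinations remain expressible.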

In [10]:
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()

In [None]:
learning_rate = .1
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': learning_rate})

In [None]:
epochs = 10

niter = 0
moving_loss = .0
smoothing_constant = 0.1

for epoch in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)
        with autograd.record(train_mode=True):
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()
        trainer.step(batch_size)
        
        niter += 1
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss
        estimate_loss = moving_loss / (1 - (1-smoothing_constant)**niter)
        
    train_acc = utils.evaluate_accuracy_gluon(train_data, net, ctx)
    test_acc = utils.evaluate_accuracy_gluon(test_data, net, ctx)
    print("Epoch %s, Moving Avg Train loss %s, Train acc %s, Test acc %s." 
          % (epoch, estimate_loss, train_acc, test_acc))

Epoch 0, Moving Avg Train loss 0.342608848283, Train acc 0.9263, Test acc 0.9289.
Epoch 1, Moving Avg Train loss 0.240437685753, Train acc 0.949483333333, Test acc 0.9485.
Epoch 2, Moving Avg Train loss 0.180699554759, Train acc 0.960066666667, Test acc 0.9583.
Epoch 3, Moving Avg Train loss 0.192404750316, Train acc 0.9681, Test acc 0.9655.
Epoch 4, Moving Avg Train loss 0.130126285545, Train acc 0.972416666667, Test acc 0.9683.
Epoch 5, Moving Avg Train loss 0.115063072135, Train acc 0.97595, Test acc 0.9709.
Epoch 6, Moving Avg Train loss 0.0967161048028, Train acc 0.979, Test acc 0.9732.
Epoch 7, Moving Avg Train loss 0.0963559117516, Train acc 0.981433333333, Test acc 0.9739.
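
The training loop above reports ``estimate_loss``, an exponential moving average of the batch loss divided by ``1 - (1 - smoothing_constant)**niter``. The sketch below isolates that computation in a hypothetical helper, ``bias_corrected_average`` (a name introduced here, not part of ``utils`` or MXNet), to show why the division is there: ``moving_loss`` starts at zero, so without the correction the early estimates would be biased toward zero.

```python
def bias_corrected_average(losses, smoothing_constant=0.1):
    """Return the bias-corrected moving average after each loss value."""
    moving_loss = 0.0
    estimates = []
    for niter, curr_loss in enumerate(losses, start=1):
        # Same update rule as in the training loop.
        moving_loss = ((1 - smoothing_constant) * moving_loss
                       + smoothing_constant * curr_loss)
        # Divide out the weight still assigned to the zero initial value.
        estimates.append(moving_loss / (1 - (1 - smoothing_constant) ** niter))
    return estimates

# With a constant loss, every corrected estimate equals that constant
# (up to floating-point rounding), even on the very first iteration.
print(bias_corrected_average([2.0] * 5))
```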
