# Dropout
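Earlier we used an L2 penalty (`l2_penalty`) to regularize statistical models. When there are many features but too few samples, a linear model tends to overfit; given more samples, it typically does not. This basic tension between generalization and flexibility is known as the bias-variance tradeoff. Linear models have high bias, since they can only represent a small class of functions, but low variance. Neural networks sit at the other end: low bias, high variance.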
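
For squared-error loss, the tradeoff has a standard closed form. The sketch below rests on assumptions that are not in the original text: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat f$ is the model fitted on a randomly drawn training set (the expectation is taken over that draw and over the noise).

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{Var}\big[\hat f(x)\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A rigid model keeps the variance term small at the cost of a large bias term; a flexible model does the opposite.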

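## What makes a good model?
A good model is one that performs well on unseen data. Classical generalization theory holds that, to narrow the gap between training and test performance, we should aim for a simple model.

Another notion of simplicity is smoothness: the function should not be sensitive to small changes in its input.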

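## Robustness to perturbations
During training, noise is injected into each layer of the network before the next layer is computed.

This idea is called **dropout**. During forward propagation, dropout injects noise while each internal layer is being computed, and it is a commonly used technique for training neural networks. The name comes from the fact that, on the surface, we are randomly dropping out some neurons during training.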
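
The noise used by dropout can be written down explicitly. As a sketch, with symbols introduced here for illustration ($h$ for an activation, $h'$ for its value after dropout, $p$ for the drop probability), each activation is either zeroed or rescaled:

```latex
h' =
\begin{cases}
0 & \text{with probability } p, \\[4pt]
\dfrac{h}{1-p} & \text{with probability } 1-p,
\end{cases}
\qquad
\mathbb{E}[h'] = p \cdot 0 + (1-p)\cdot\frac{h}{1-p} = h .
```

The division by $1-p$ is what keeps the expected activation unchanged, so the next layer sees inputs on the same scale with or without dropout.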

In [11]:
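# Imports
import torch as t
import torch.nn as nn
from pltutils import *  # local helper module; assumed to provide load_data_fashion_mnist
from torch.utils import data  # makes data.DataLoader (used in train) explicit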
DEVICE = t.device("cuda:0" if t.cuda.is_available() else "cpu")


In [12]:
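def dropout_layer(X: t.Tensor, dropout: float) -> t.Tensor:
    # dropout is the probability of zeroing each element
    assert 0 <= dropout <= 1
    if dropout == 1:
        return t.zeros_like(X)  # drop everything
    if dropout == 0:
        return X  # keep everything
    # Sample the mask on X's own device, so this also works when X is not on DEVICE
    mask = (t.rand(X.shape, device=X.device) > dropout).float()
    # Rescale the kept elements by 1/(1-dropout) so the expected value is unchanged
    return mask * X / (1.0 - dropout)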

In [13]:
# Quick check of dropout_layer with dropout = 0, 0.5, and 1

X=t.arange(16,dtype=t.float32,device=DEVICE)
print(X)
print(dropout_layer(X,0))
print(dropout_layer(X, .5))

print(dropout_layer(X, 1))


tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
        14., 15.], device='cuda:0')
tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
        14., 15.], device='cuda:0')
tensor([ 0.,  0.,  0.,  0.,  8.,  0., 12., 14., 16.,  0., 20., 22., 24.,  0.,
        28.,  0.], device='cuda:0')
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       device='cuda:0')
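
The output above shows that surviving elements are doubled when the drop probability is 0.5. A larger sample makes the point of that rescaling visible: about half the elements survive, and the mean stays close to the original value. This is a minimal sketch that runs on the CPU; `inverted_dropout` is a hypothetical stand-in written for this example so that the block is self-contained, not the notebook's own function.

```python
import torch


def inverted_dropout(x: torch.Tensor, p: float) -> torch.Tensor:
    """Zero each element with probability p and rescale survivors by 1/(1-p)."""
    if p >= 1.0:
        return torch.zeros_like(x)
    mask = (torch.rand_like(x) > p).float()
    return mask * x / (1.0 - p)


torch.manual_seed(0)          # fixed seed so repeated runs give the same mask
x = torch.ones(100_000)       # every input element equals 1
p = 0.5                       # drop probability

y = inverted_dropout(x, p)

kept_fraction = (y != 0).float().mean().item()  # should be close to 1 - p
mean_after = y.mean().item()                    # should be close to x.mean() == 1

print(f"kept fraction:      {kept_fraction:.3f}")
print(f"mean after dropout: {mean_after:.3f}")
```

Without the division by `1 - p`, the mean would shrink to roughly `1 - p` of its original value, and the next layer would see differently scaled inputs at training and test time.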


## Defining the model

In [14]:
class NET(nn.Module):
    def __init__(self, n_features, n_outputs, training=True) -> None:
        super().__init__()
        self.n_features = n_features
        self.fc1 = nn.Linear(n_features, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, n_outputs)
        self.relu = nn.ReLU()
        # nn.Module already tracks self.training; train(mode) sets it here,
        # and net.train() / net.eval() toggle it later
        self.train(training)

    def forward(self, X: t.Tensor) -> t.Tensor:
        H1 = self.relu(self.fc1(X.reshape((-1, self.n_features))))
        if self.training:  # apply dropout only in training mode
            H1 = dropout_layer(H1, 0.2)
        H2 = self.relu(self.fc2(H1))
        if self.training:
            H2 = dropout_layer(H2, 0.5)
        out = self.fc3(H2)  # no dropout on the output layer
        return out


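def accuracy(y_hat: t.Tensor, y: t.Tensor) -> float:
    # Returns the number of correct predictions in the batch (a count, not a ratio)
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(dim=1)  # logits -> predicted class index
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())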

def train(net, train_iter: data.DataLoader, loss, updater, n_epochs=10):
    if isinstance(net, t.nn.Module):
        net.train()  # training mode: dropout is active
    for i in range(n_epochs):
        for x, y in train_iter:
            x = x.to(DEVICE)
            y = y.to(DEVICE)
            y_hat = net(x)
            l = loss(y_hat, y)

            if isinstance(updater, t.optim.Optimizer):
                # Built-in PyTorch optimizer
                updater.zero_grad()
                l.sum().backward()
                updater.step()
            else:
                # Custom updater: called with the batch size
                l.sum().backward()
                updater(x.shape[0])
            # Log the accuracy and mean loss of every mini-batch
            print("ep:{},accuracy:{},loss:{}".format(
                i, accuracy(y_hat, y)/y.shape[0], l.mean().item()))

# Instantiate the model, loss, data iterators, and optimizer, then train
net = NET(784,10,True)
net = net.to(DEVICE)
NUM_EPOCHS,LR,BATCH_SIZE=10,0.5,256
loss_func = nn.CrossEntropyLoss()
train_iter,test_iter=load_data_fashion_mnist(BATCH_SIZE,data_root="./dataset")
optimizer= t.optim.SGD(net.parameters(),lr=LR)
train(net,train_iter,loss_func,optimizer,NUM_EPOCHS)




ep:0,accuracy:0.109375,loss:2.3036935329437256
ep:0,accuracy:0.16015625,loss:2.2522637844085693
ep:0,accuracy:0.11328125,loss:2.214073419570923
ep:0,accuracy:0.2265625,loss:2.154238224029541
ep:0,accuracy:0.25390625,loss:2.0781025886535645
ep:0,accuracy:0.296875,loss:1.9953665733337402
ep:0,accuracy:0.3046875,loss:1.8810014724731445
ep:0,accuracy:0.4140625,loss:1.760034441947937
ep:0,accuracy:0.34375,loss:1.8152987957000732
ep:0,accuracy:0.3125,loss:1.957353949546814
ep:0,accuracy:0.32421875,loss:1.6923741102218628
ep:0,accuracy:0.3671875,loss:1.564056396484375
ep:0,accuracy:0.48828125,loss:1.4083831310272217
ep:0,accuracy:0.328125,loss:1.4944429397583008
ep:0,accuracy:0.22265625,loss:2.2791907787323
ep:0,accuracy:0.25390625,loss:2.4022557735443115
ep:0,accuracy:0.35546875,loss:1.982432246208191
ep:0,accuracy:0.3671875,loss:1.7815890312194824
ep:0,accuracy:0.38671875,loss:1.682921290397644
ep:0,accuracy:0.58203125,loss:1.446714162826538
ep:0,accuracy:0.5078125,loss:1.312359094619751
...

In [15]:

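def predict(net, test_iter):
    if isinstance(net, t.nn.Module):
        net.eval()  # evaluation mode: dropout is disabled
    correct, total = 0., 0
    with t.no_grad():  # no gradients are needed for evaluation
        for x, y in test_iter:
            x = x.to(DEVICE)
            y = y.to(DEVICE)
            y_hat = net(x)
            correct += accuracy(y_hat, y)
            total += y.shape[0]
    correct = correct / total  # total is 10000, the size of the test set
    print("test ACC:{}".format(correct))
    return correct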


predict(net, test_iter)


test ACC:0.8482


0.8482
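
The same architecture can also be written with PyTorch's built-in `nn.Dropout`, which is switched on and off by `train()` and `eval()`. The block below is a minimal sketch under stated assumptions, not the notebook's own model: it rebuilds the 784-256-256-10 layout with `nn.Sequential`, runs on the CPU, and uses a random tensor shaped like a batch of 28x28 images in place of real data.

```python
import torch
from torch import nn

# Same layer sizes and drop probabilities as the hand-written model,
# but with the built-in nn.Dropout modules.
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(256, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 10),
)

x = torch.randn(4, 1, 28, 28)  # fake batch: 4 single-channel 28x28 images

net.eval()                     # evaluation mode: nn.Dropout acts as the identity
with torch.no_grad():
    out_a = net(x)
    out_b = net(x)
eval_is_deterministic = torch.equal(out_a, out_b)

net.train()                    # training mode: a fresh mask is sampled per call
with torch.no_grad():
    out_c = net(x)
    out_d = net(x)
train_is_stochastic = not torch.equal(out_c, out_d)

print(out_a.shape, eval_is_deterministic, train_is_stochastic)
```

Two forward passes agree in evaluation mode and differ in training mode, which is why the network should be put in evaluation mode before test accuracy is measured.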