<a href="https://colab.research.google.com/github/Mrcold2002/colab_code/blob/main/%E5%A4%9A%E5%B1%82%E6%84%9F%E7%9F%A5%E5%99%A8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multilayer Perceptron

## Hidden layer

$H=XW_h+b_h$

$O=HW_o+b_o$

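Combining the two equations gives $O=XW_hW_o+b_hW_o+b_o$, which is equivalent to a single-layer neural network with weight $W_hW_o$ and bias $b_hW_o+b_o$. Even if more hidden layers are added, such a design remains equivalent to a single-layer neural network that has only an output layer.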

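The root of this problem is that a fully connected layer applies only an affine transformation to the data, and a composition of affine transformations is still an affine transformation. One way to solve it is to introduce a nonlinear transformation: apply an elementwise nonlinear function to the hidden variables and pass the result on as the input to the next fully connected layer. This nonlinear function is called an activation function. Several commonly used activation functions are introduced below.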
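
The collapse described above can be checked numerically. The following is a minimal pure-Python sketch; the helper names (`matmul`, `add_bias`) and all matrix values are illustrative assumptions, not part of the original notebook.

```python
# Sketch: two stacked affine layers (no activation) equal one affine layer.
# All values and helper names here are illustrative assumptions.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add_bias(M, b):
    """Add the row vector b to every row of M."""
    return [[m + bj for m, bj in zip(row, b)] for row in M]

X = [[1.0, 2.0], [3.0, 4.0]]               # 2 examples, 2 features
W_h = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # 2 inputs -> 3 hidden units
b_h = [1.0, 0.0, 2.0]
W_o = [[1.0], [2.0], [3.0]]                # 3 hidden units -> 1 output
b_o = [1.0]

# Two-layer network without an activation function.
H = add_bias(matmul(X, W_h), b_h)
O_two_layer = add_bias(matmul(H, W_o), b_o)

# Equivalent single layer: weight W_h W_o, bias b_h W_o + b_o.
W = matmul(W_h, W_o)
b = add_bias(matmul([b_h], W_o), b_o)[0]
O_single = add_bias(matmul(X, W), b)

print(O_two_layer)
print(O_single)
```

Both printed results are identical, which is why a nonlinearity between the layers is needed.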

## Activation functions

1. ReLU function
$\mathrm{ReLU}(x)=\max(x,0)$
2. Sigmoid function
$\mathrm{sigmoid}(x)=\frac{1}{1+\exp(-x)}$
3. Tanh function
$\tanh(x)=\frac{1-\exp(-2x)}{1+\exp(-2x)}$
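
The three formulas above translate directly into scalar Python functions. This is a minimal sketch using only the standard library; the function names are chosen here for illustration, and the direct use of `math.exp` can overflow for inputs of very large magnitude.

```python
import math

def relu(x):
    """ReLU(x) = max(x, 0)."""
    return max(x, 0.0)

def sigmoid(x):
    """sigmoid(x) = 1 / (1 + exp(-x)); maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """tanh(x) = (1 - exp(-2x)) / (1 + exp(-2x)); maps any real number into (-1, 1)."""
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), sigmoid(x), tanh(x))
```

Comparing the two definitions shows that $\tanh(x)=2\,\mathrm{sigmoid}(2x)-1$, so tanh is a rescaled and shifted sigmoid.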

## Multilayer perceptron

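A multilayer perceptron is a neural network built from fully connected layers that contains at least one hidden layer, where the output of each hidden layer is transformed by an activation function. The number of layers and the number of hidden units in each hidden layer are hyperparameters.
With a single hidden layer, the multilayer perceptron computes its output as $$H=\phi(XW_h+b_h)$$$$O=HW_o+b_o$$
where $\phi$ denotes the activation function.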
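
To make the two equations concrete, here is a small pure-Python sketch with $\phi=\mathrm{ReLU}$. The weights are hand-picked for illustration (an assumption, not taken from the notebook) so that a network with two hidden units computes $|x|$, a function that no single affine layer can represent.

```python
# Sketch: H = relu(X W_h + b_h), O = H W_o + b_o with hand-picked weights.
# The weights below are illustrative; they make the network compute |x|.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add_bias(M, b):
    """Add the row vector b to every row of M."""
    return [[m + bj for m, bj in zip(row, b)] for row in M]

def relu(M):
    """Apply ReLU elementwise to a matrix."""
    return [[max(v, 0.0) for v in row] for row in M]

X = [[-3.0], [0.0], [2.0]]   # 3 examples, 1 feature
W_h = [[1.0, -1.0]]          # hidden units see x and -x
b_h = [0.0, 0.0]
W_o = [[1.0], [1.0]]         # output sums relu(x) + relu(-x)
b_o = [0.0]

H = relu(add_bias(matmul(X, W_h), b_h))
O = add_bias(matmul(H, W_o), b_o)
print(O)  # [[3.0], [0.0], [2.0]]
```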

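For classification problems, we can apply the softmax operation to the output $O$ and use the cross-entropy loss from softmax regression.
For regression problems, we set the number of outputs of the output layer to 1 and feed the output $O$ directly into the squared loss used in linear regression.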
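
The two loss choices can be sketched for a single example as follows. This is a simplified standard-library illustration, not the Gluon implementation; the function names and the factor $\frac{1}{2}$ in the squared loss are conventions assumed here.

```python
import math

def softmax(o):
    """Softmax of one output row; subtracting the max avoids overflow."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(o, label):
    """Classification: negative log of the probability assigned to the true class."""
    return -math.log(softmax(o)[label])

def squared_loss(o, y):
    """Regression: squared error with the conventional factor 1/2 (an assumption)."""
    return 0.5 * (o - y) ** 2

# Classification with 3 classes: raw outputs O for one example, true class 0.
print(cross_entropy([2.0, 1.0, 0.1], 0))
# Regression with a single output value.
print(squared_loss(3.0, 1.0))
```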


## Implementing a multilayer perceptron from scratch

In [4]:
# 0 Import packages and modules
%matplotlib inline
import d2lzh as d2l
from mxnet import nd
from mxnet.gluon import loss as gloss,nn

In [5]:
# 1 Load the dataset
batch_size=256
train_iter,test_iter=d2l.load_data_fashion_mnist(batch_size)


In [14]:
# 2 Define the model parameters
num_inputs,num_outputs,num_hiddens=784,10,512  # 28x28 pixels, 10 classes, 512 hidden units
W1=nd.random.normal(scale=0.01,shape=(num_inputs,num_hiddens))
b1=nd.zeros(num_hiddens)
W2=nd.random.normal(scale=0.01,shape=(num_hiddens,num_outputs))
b2=nd.zeros(num_outputs)
params=[W1,b1,W2,b2]

for param in params:
  param.attach_grad()  # allocate gradient storage so gradients can be computed

In [7]:
# 3 Define the activation function
def relu(X):
  return nd.maximum(X,0)

In [9]:
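# 4 Define the model
def net(X):
  X=X.reshape((-1,num_inputs))  # flatten each image into a vector of length num_inputs
  H=relu(nd.dot(X,W1)+b1)  # hidden layer followed by ReLU
  return nd.dot(H,W2)+b2  # output layer; softmax is applied inside the loss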

In [10]:
# 5 Define the loss function
loss=gloss.SoftmaxCrossEntropyLoss()

In [15]:
# 6 Train the model
num_epochs,lr=5,0.5
d2l.train_ch3(net,train_iter,test_iter,loss,num_epochs,batch_size,params,lr)

epoch 1, loss 0.8164, train acc 0.699, test acc 0.812
epoch 2, loss 0.4903, train acc 0.821, test acc 0.845
epoch 3, loss 0.4295, train acc 0.841, test acc 0.858
epoch 4, loss 0.3967, train acc 0.854, test acc 0.867
epoch 5, loss 0.3682, train acc 0.864, test acc 0.869


Comparison of two hidden-layer sizes:

With `num_hiddens=256`:

| epoch | loss | train acc | test acc |
|---|---|---|---|
| 1 | 0.8165 | 0.690 | 0.827 |
| 2 | 0.4939 | 0.816 | 0.841 |
| 3 | 0.4284 | 0.841 | 0.852 |
| 4 | 0.3961 | 0.854 | 0.862 |
| 5 | 0.3725 | 0.862 | 0.875 |

With `num_hiddens=512`:

| epoch | loss | train acc | test acc |
|---|---|---|---|
| 1 | 0.8164 | 0.699 | 0.812 |
| 2 | 0.4903 | 0.821 | 0.845 |
| 3 | 0.4295 | 0.841 | 0.858 |
| 4 | 0.3967 | 0.854 | 0.867 |
| 5 | 0.3682 | 0.864 | 0.869 |

The two settings show no large difference: after 5 epochs the test accuracy is 0.875 with 256 hidden units and 0.869 with 512.
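
For context on the two hidden-layer sizes, the number of trainable parameters can be counted from the layer shapes. The helper below is a hypothetical illustration (its name is not from the notebook); it counts one weight matrix and one bias vector per layer for a single-hidden-layer network.

```python
def mlp_param_count(num_inputs, num_hiddens, num_outputs):
    """Count weights and biases of a one-hidden-layer MLP (hypothetical helper)."""
    hidden = num_inputs * num_hiddens + num_hiddens    # W1 and b1
    output = num_hiddens * num_outputs + num_outputs   # W2 and b2
    return hidden + output

for h in (256, 512):
    print(h, mlp_param_count(784, h, 10))
# 256 203530
# 512 407050
```

Doubling the hidden units roughly doubles the parameter count, yet the accuracy after 5 epochs is nearly the same in these runs.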

# Concise implementation of a multilayer perceptron

In [17]:
import d2lzh as d2l
from mxnet import gluon,init
from mxnet.gluon import loss as gloss,nn

In [18]:
# 1 Define the model
net=nn.Sequential()
net.add(nn.Dense(256,activation='relu'),nn.Dense(10))
net.initialize(init.Normal(sigma=0.01))

In [20]:
# 2 Train the model
batch_size=256
train_iter,test_iter=d2l.load_data_fashion_mnist(batch_size)
loss=gloss.SoftmaxCrossEntropyLoss()
trainer=gluon.Trainer(net.collect_params(),'sgd',{'learning_rate':0.5})
num_epochs=5
d2l.train_ch3(net,train_iter,test_iter,loss,num_epochs,batch_size,None,None,trainer)

epoch 1, loss 0.3533, train acc 0.869, test acc 0.856
epoch 2, loss 0.3390, train acc 0.875, test acc 0.881
epoch 3, loss 0.3270, train acc 0.880, test acc 0.883
epoch 4, loss 0.3156, train acc 0.884, test acc 0.887
epoch 5, loss 0.3063, train acc 0.886, test acc 0.884
