## Multilayer Perceptron: From Scratch
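Earlier we introduced several models, including linear regression and multiclass logistic regression. What they have in common is that each consists of only an input layer and an output layer. In this section we introduce multilayer neural networks, that is, networks with at least one hidden layer.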

### Data Loading
We continue to use the FashionMNIST dataset.

In [1]:
import sys

# Make the shared helper module in ../utils importable
sys.path.append('../utils')
import utils

data_dir = '../data/fashion_mnist'
# Load FashionMNIST with one-hot encoded labels
train_images, train_labels, test_images, test_labels = utils.load_data_fashion_mnist(data_dir, one_hot=True)
print(train_images.shape)
print(train_labels.shape)

# TensorFlow 1.x helper that serves mini-batches via next_batch()
from tensorflow.contrib.learn.python.learn.datasets.mnist import DataSet
train_dataset = DataSet(train_images, train_labels, one_hot=True)

Extracting ../data/fashion_mnist/train-images-idx3-ubyte.gz
Extracting ../data/fashion_mnist/train-labels-idx1-ubyte.gz
Extracting ../data/fashion_mnist/t10k-images-idx3-ubyte.gz
Extracting ../data/fashion_mnist/t10k-labels-idx1-ubyte.gz
(60000, 28, 28, 1)
(60000, 10)
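
The label shape `(60000, 10)` printed above reflects one-hot encoding: each label becomes a length-10 vector with a single 1 at the class index. A minimal NumPy sketch of that encoding follows; `one_hot` is a hypothetical helper for illustration, not the code inside `utils.load_data_fashion_mnist`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer class ids into one-hot rows."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Put a 1 in the column selected by each label.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

example = one_hot([0, 3, 9], num_classes=10)
print(example.shape)           # (3, 10)
print(example.argmax(axis=1))  # [0 3 9]
```

Taking `argmax` along the class axis recovers the original integer labels.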


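## Multilayer Perceptron
A multilayer perceptron is very similar to the multiclass logistic regression introduced earlier. The main difference is that one or more hidden layers are inserted between the input layer and the output layer.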

![image.png](http://zh.gluon.ai/_images/multilayer-perceptron.png)
Here we define a model with a single hidden layer that has 256 output units.

In [2]:
import tensorflow as tf

# Each input image is 28*28 pixels, flattened into a vector
num_inputs = 28*28
# One output per class
num_outputs = 10

# Number of units in the hidden layer
num_hidden = 256

with tf.name_scope('multi_layer_perceptron'):
    # Hidden layer: weights drawn from N(0, 1), biases set to 0.1
    W1 = tf.Variable(tf.random_normal([num_inputs, num_hidden], mean=0.0, stddev=1.0, seed=None, dtype=tf.float32), name='weights_hidden')
    b1 = tf.Variable(tf.constant(0.1, shape=[num_hidden]), name='bias_hidden')

    # Output layer: same initialization scheme
    W2 = tf.Variable(tf.random_normal([num_hidden, num_outputs], mean=0.0, stddev=1.0, seed=None, dtype=tf.float32), name='weights_output')
    b2 = tf.Variable(tf.constant(0.1, shape=[num_outputs]), name='bias_output')

params = [W1, b1, W2, b2]
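
To get a feel for the size of this model, the parameter count follows directly from the shapes above. A short sketch in plain Python; the helper name `dense_param_count` is made up for illustration.

```python
def dense_param_count(fan_in, fan_out):
    """Weights plus biases of one fully connected layer."""
    return fan_in * fan_out + fan_out

num_inputs, num_hidden, num_outputs = 28 * 28, 256, 10
hidden_params = dense_param_count(num_inputs, num_hidden)   # W1 and b1
output_params = dense_param_count(num_hidden, num_outputs)  # W2 and b2
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)  # 200960 2570 203530
```

Almost all of the parameters sit in the first weight matrix, because it connects the 784 input pixels to every hidden unit.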

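### Activation Function
If we build a multilayer neural network using only linear operators, the whole model is still just a linear function. Ignoring the bias terms for brevity, this is because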

$\hat{y} = X \cdot W_1 \cdot W_2 = X \cdot W_3$

where $W_3 = W_1 \cdot W_2$. To let the model fit nonlinear functions, we need to insert a nonlinear activation function between the layers. Here we use ReLU:

$\mathrm{relu}(x)=\max(x, 0)$

In [3]:
def relu(X):
    # Element-wise max(x, 0): negative entries become 0
    return tf.maximum(X, 0)
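
The collapse argument above can be checked numerically. The following NumPy sketch uses tiny hand-picked matrices (chosen only for illustration) to show that two stacked linear layers equal a single layer with $W_3 = W_1 \cdot W_2$, and that putting ReLU between them breaks this equivalence.

```python
import numpy as np

X = np.array([[1.0, -1.0],
              [2.0, 0.5]])
W1 = np.array([[1.0, -1.0],
               [0.0, 1.0]])
W2 = np.array([[1.0],
               [2.0]])

# Two stacked linear layers versus one layer with the merged matrix W3.
W3 = W1.dot(W2)
two_layers = X.dot(W1).dot(W2)
one_layer = X.dot(W3)
print(two_layers.ravel(), one_layer.ravel())  # [-3. -1.] [-3. -1.]

# With ReLU in between, the merged matrix no longer reproduces the output.
with_relu = np.maximum(X.dot(W1), 0).dot(W2)
print(with_relu.ravel())  # [1. 2.]
```

ReLU zeroes the negative hidden activations, so the second layer sees a different input than in the purely linear case.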

### Defining the Model
Our model simply chains the fully connected layers and the ReLU activation function together:

In [10]:
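def net(X, params):
    # Flatten each image into a vector of length num_inputs
    X = tf.reshape(X, (-1, num_inputs))
    # Hidden layer: fully connected, followed by ReLU
    h1 = relu(tf.matmul(X, params[0]) + params[1])
    # Output layer: raw logits, softmax is applied inside the loss
    output = tf.matmul(h1, params[2]) + params[3]
    return output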
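
To make the shape flow through `net` concrete, here is a NumPy mirror of the same forward pass. It is a sketch under assumptions: `net_numpy` and the random stand-in parameters are illustrative and not part of the notebook.

```python
import numpy as np

num_inputs, num_hidden, num_outputs = 28 * 28, 256, 10
rng = np.random.RandomState(0)

# Stand-ins for W1, b1, W2, b2; the values are random, only the shapes matter here.
params = [
    rng.randn(num_inputs, num_hidden), np.full(num_hidden, 0.1),
    rng.randn(num_hidden, num_outputs), np.full(num_outputs, 0.1),
]

def net_numpy(X, params):
    X = X.reshape(-1, num_inputs)                     # (batch, 28, 28, 1) -> (batch, 784)
    h1 = np.maximum(X.dot(params[0]) + params[1], 0)  # (batch, 256) after ReLU
    return h1.dot(params[2]) + params[3]              # (batch, 10) logits

batch = rng.rand(32, 28, 28, 1)
logits = net_numpy(batch, params)
print(logits.shape)  # (32, 10)
```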

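### Softmax and Cross-Entropy Loss
In the multiclass logistic regression section we mentioned that implementing softmax and the cross-entropy loss separately can lead to numerical instability. Here we directly use the function TensorFlow provides, `tf.losses.softmax_cross_entropy`.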
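
To illustrate the instability, the sketch below compares a naive softmax with a cross-entropy computed directly from logits using the standard log-sum-exp trick. This is an illustrative NumPy version under that assumption, not TensorFlow's actual implementation; the function name `softmax_cross_entropy` here is our own.

```python
import numpy as np

def softmax_cross_entropy(logits, onehot_labels):
    """Mean cross-entropy computed from raw logits via the log-sum-exp trick."""
    # Subtracting the row-wise max leaves softmax unchanged but keeps exp() from overflowing.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(onehot_labels * log_probs).sum(axis=1).mean()

logits = np.array([[1000.0, 0.0, -1000.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# Naive softmax: exp(1000) overflows to inf, and inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(np.isnan(naive).any())  # True

# Fused version stays finite: (0 + log 3) / 2
loss = softmax_cross_entropy(logits, labels)
print(np.isfinite(loss))  # True
```

Working in log space also avoids taking the log of a probability that has underflowed to zero.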

### Training
Training works the same way as before.
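
For readers who want to see what a gradient-descent step does without the framework, here is a minimal NumPy sketch that trains a one-hidden-layer ReLU network with hand-derived gradients. It is an illustration under assumptions: the data is synthetic, and the network size, initialization scale, and learning rate are arbitrary choices that differ from the TensorFlow cell in this section, where `tf.train.GradientDescentOptimizer` derives the gradients automatically.

```python
import numpy as np

rng = np.random.RandomState(0)
n, d, h, k = 64, 8, 16, 4            # samples, inputs, hidden units, classes
X = rng.randn(n, d)                  # synthetic inputs
Y = np.eye(k)[rng.randint(0, k, size=n)]  # synthetic one-hot labels

W1, b1 = 0.1 * rng.randn(d, h), np.zeros(h)
W2, b2 = 0.1 * rng.randn(h, k), np.zeros(k)
learning_rate = 0.1

def forward(X):
    z1 = X.dot(W1) + b1
    h1 = np.maximum(z1, 0)           # ReLU
    logits = h1.dot(W2) + b2
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return z1, h1, log_probs

losses = []
for step in range(200):
    z1, h1, log_probs = forward(X)
    losses.append(-(Y * log_probs).sum(axis=1).mean())

    # Backward pass for the mean cross-entropy loss.
    d_logits = (np.exp(log_probs) - Y) / n       # softmax(logits) - labels
    dW2, db2 = h1.T.dot(d_logits), d_logits.sum(axis=0)
    d_z1 = d_logits.dot(W2.T) * (z1 > 0)         # ReLU passes gradient only where z1 > 0
    dW1, db1 = X.T.dot(d_z1), d_z1.sum(axis=0)

    # Plain gradient-descent update.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(losses[0], losses[-1])  # the loss should go down over the 200 steps
```

Each update moves every parameter a small step against its gradient, which is the same rule the optimizer applies to `W1`, `b1`, `W2`, and `b2` in the TensorFlow graph.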


In [14]:
import numpy as np

learning_rate = 1e0
max_steps = 10000
batch_size = 256

# Placeholders for a batch of flattened images and their one-hot labels
input_placeholder = tf.placeholder(tf.float32, [None, num_inputs])
gt_placeholder = tf.placeholder(tf.int64, [None, num_outputs])
logits = net(input_placeholder, params)
# Softmax and cross-entropy fused in one numerically stable op
loss = tf.losses.softmax_cross_entropy(logits=logits, onehot_labels=gt_placeholder)
acc = utils.accuracy(logits, gt_placeholder)
# Flatten the test images to (num_examples, num_inputs)
test_images_reshape = np.reshape(np.squeeze(test_images), (test_images.shape[0], num_inputs))

train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)

for step in range(max_steps):
    # Fetch the next mini-batch and run one gradient-descent step
    data, label = train_dataset.next_batch(batch_size)
    data = np.reshape(data, (batch_size, num_inputs))
    feed_dict = {input_placeholder: data, gt_placeholder: label}
    loss_, acc_, _ = sess.run([loss, acc, train_op], feed_dict=feed_dict)
    if step % 100 == 0:
        print("Batch %d, Loss: %f, Train acc %f " % (step, loss_, acc_))

# Evaluate on the full test set, scaling pixel values to [0, 1]
test_loss_, test_acc_ = sess.run([loss, acc], feed_dict={input_placeholder: test_images_reshape / 255.0, gt_placeholder: test_labels})
print("Test Loss: %f, Test acc %f " % (test_loss_, test_acc_))



Batch 0, Loss: 231.563156, Train acc 0.074219 
Batch 100, Loss: 1.231710, Train acc 0.613281 
Batch 200, Loss: 1.022023, Train acc 0.640625 
Batch 300, Loss: 0.935736, Train acc 0.613281 
Batch 400, Loss: 0.715150, Train acc 0.683594 
Batch 500, Loss: 1.021528, Train acc 0.683594 
Batch 600, Loss: 0.701485, Train acc 0.753906 
Batch 700, Loss: 0.724109, Train acc 0.703125 
Batch 800, Loss: 0.581032, Train acc 0.765625 
Batch 900, Loss: 0.539920, Train acc 0.792969 
Batch 1000, Loss: 0.602233, Train acc 0.781250 
Batch 1100, Loss: 0.618764, Train acc 0.781250 
Batch 1200, Loss: 0.724624, Train acc 0.785156 
Batch 1300, Loss: 0.522713, Train acc 0.855469 
Batch 1400, Loss: 0.537519, Train acc 0.781250 
Batch 1500, Loss: 0.547305, Train acc 0.796875 
Batch 1600, Loss: 0.590248, Train acc 0.757812 
Batch 1700, Loss: 0.574816, Train acc 0.777344 
Batch 1800, Loss: 0.491907, Train acc 0.789062 
Batch 1900, Loss: 0.605241, Train acc 0.835938 
Batch 2000, Loss: 0.487542, Train acc 0.796875 
Ba