# Recognizing Handwritten Digits with TensorFlow

* Presenter: Pan Weizhou (潘伟洲)
* References:
  - [MNIST for ML Beginners](https://www.tensorflow.org/get_started/mnist/beginners)
  - 《TensorFlow实战》 by 黄文坚 and 唐源

## The Handwriting Recognition Problem

<center>
<img src="http://onmw7y6f4.bkt.clouddn.com/mnistdigits.gif"/>
</center>

## Machine Learning and "Alchemy"

<div style="float:right">
<img src="http://onmw7y6f4.bkt.clouddn.com/muli.jpeg"/>
<center>Mu Li (李沐), author of MXNet</center>
</div>

* Ingredients: the training data;
* Recipe: the model to train;
* Fire: the hardware;
* Furnace: the framework;
* Refining: the training process.

## Ingredients: MNIST

<div style="float:right">
<img src="http://onmw7y6f4.bkt.clouddn.com/lecun.jpg"/>
<center>Yann LeCun (燕乐存)</center>
</div>
* Motivation: automatic recognition of bank checks
* Modified National Institute of Standards and Technology database
* http://yann.lecun.com/exdb/mnist/
* Four files:
  - train-images-idx3-ubyte.gz:  training set images (9912422 bytes) 
  - train-labels-idx1-ubyte.gz:  training set labels (28881 bytes) 
  - t10k-images-idx3-ubyte.gz:   test set images (1648877 bytes) 
  - t10k-labels-idx1-ubyte.gz:   test set labels (4542 bytes)

## Getting to Know MNIST

<center><img src="http://onmw7y6f4.bkt.clouddn.com/MNIST.png"/>Sample MNIST handwritten digit images</center>

In [160]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

print(mnist.train.images.shape, mnist.train.labels.shape)
print(mnist.test.images.shape, mnist.test.labels.shape)
print(mnist.validation.images.shape, mnist.validation.labels.shape)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
((55000, 784), (55000, 10))
((10000, 784), (10000, 10))
((5000, 784), (5000, 10))


<center><img src="http://onmw7y6f4.bkt.clouddn.com/mnist-train-xs.png"/>Features of the MNIST training data</center>

<br/>

<center><img src="http://onmw7y6f4.bkt.clouddn.com/mnist-train-ys.png"/>Labels of the MNIST training data</center>
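
The shapes printed above follow from two conventions: each 28×28 image is flattened into a 784-element vector, and each label is one-hot encoded into a 10-element vector. A minimal NumPy sketch of both conventions, using a made-up image and label rather than real MNIST data:

```python
import numpy as np

# Hypothetical 28x28 grayscale image with values in [0, 1] (not real MNIST data).
image = np.zeros((28, 28), dtype=np.float32)
image[10:18, 13:15] = 1.0  # a crude vertical stroke

# Flatten row by row into a 784-element feature vector.
features = image.reshape(-1)

# One-hot encode a hypothetical label: position `label` is 1, all others are 0.
label = 1
one_hot = np.zeros(10, dtype=np.float32)
one_hot[label] = 1.0

print(features.shape)      # (784,)
print(one_hot)
print(np.argmax(one_hot))  # recovers the integer label
```

Flattening discards the 2-D layout of the pixels, which is acceptable for the simple models used in this talk.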

In [161]:
print(mnist.train.images[0])

[ 0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.

<center><img src="http://onmw7y6f4.bkt.clouddn.com/MNIST-Matrix.png"/>Example of the grayscale values of a handwritten digit</center>

## Recipe: Which Machine Learning Model Should We Choose?

* Classification?
* Clustering?

<center><img src="http://onmw7y6f4.bkt.clouddn.com/models.png"/></center>

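## A Multiclass Model: Softmax Regression

* one vs. all
* Add up the evidence that an input belongs to a given class, then convert that evidence into the probability of the class
* Evidence: a weighted sum of the grayscale values of all pixels, with a separate set of weights for each class.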

<center><img src="http://onmw7y6f4.bkt.clouddn.com/softmax-weights.png"/>Feature weights that each digit may end up with</center>


### Expressing the Evidence Mathematically

$$evidence_i=\sum_{j}W_{i,j}x_{j}+b_{i}$$
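* $i$: the $i$-th class;
* $j$: the $j$-th pixel of an image;
* $b_i$: the bias of class $i$;
* $W_{i,j}$: the weight of pixel $j$ for class $i$.

Softmax: exponentiate the evidence accumulated over all pixels for each class, then normalize: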

$$softmax(x) = normalize(exp(x))$$

The probability that the input belongs to class $i$:

$$y_i = softmax(x)_{i}=\frac{exp(x_{i})}{\sum_{j}exp(x_j)}$$
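
The softmax formula above can be checked numerically. The following is a minimal NumPy sketch, not TensorFlow's implementation; the evidence values are made up, and subtracting the maximum is a common stability trick that is not part of the formula itself:

```python
import numpy as np

def softmax(x):
    """Exponentiate, then normalize so the outputs sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

evidence = np.array([1.0, 2.0, 3.0])  # made-up evidence for three classes
probs = softmax(evidence)
print(probs)        # the largest evidence gets the largest probability
print(probs.sum())
```

Because of the exponential, a difference of 1 in evidence becomes a factor of about 2.7 in probability.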

### Softmax Regression Flow Diagram

<center><img src="http://onmw7y6f4.bkt.clouddn.com/softmax-regression-scalargraph.png"/>The flow of Softmax Regression</center>

### Softmax Regression as a Matrix Multiplication

<center><img src="http://onmw7y6f4.bkt.clouddn.com/softmax-regression-vectorequation.png"/>Softmax Regression written as a matrix multiplication</center>

Written as a formula:

$$y=softmax(Wx+b)$$
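
The formula treats $x$ as a single column vector. The TensorFlow code later in this talk stores one image per row and computes `tf.matmul(x, W) + b` instead, so `W` has shape `[784, 10]`. A NumPy sketch of that batched form, with random stand-in values rather than trained weights:

```python
import numpy as np

rng = np.random.RandomState(0)

n_samples, n_pixels, n_classes = 4, 784, 10
x = rng.rand(n_samples, n_pixels)            # stand-in for a batch of images
W = rng.randn(n_pixels, n_classes) * 0.01    # one column of weights per class
b = np.zeros(n_classes)                      # one bias per class

logits = x.dot(W) + b                        # shape (4, 10); b is broadcast over rows
logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
exps = np.exp(logits)
y = exps / exps.sum(axis=1, keepdims=True)   # softmax applied to each row

print(y.shape)
print(y.sum(axis=1))                         # each row sums to 1
```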

## Furnace: TensorFlow

<center><img src="http://onmw7y6f4.bkt.clouddn.com/tensorflow.png"/></center>

## Recap: the TensorFlow Linear Regression Example

Linear regression: $y = W*x + b$

* x：[1, 2, 3, 4]
* y：[0, -1, -2, -3]

In [167]:
# import numpy and tensorflow
import numpy as np
import tensorflow as tf

# define session
sess = tf.Session()

# define tensors and flow
# (pass the dtype by keyword; the second positional argument of tf.Variable is `trainable`)
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)

init = tf.global_variables_initializer()
sess.run(init)

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

model = W * x + b

# define loss
loss = tf.reduce_sum(tf.square(y - model))

# define training task
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# training
for i in range(1000):
    sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]})

# print result
print(sess.run(loss, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))
print(sess.run((W, b)))

sess.close()

5.69997e-11
(array([-0.9999969], dtype=float32), array([ 0.99999082], dtype=float32))
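
The four data points lie exactly on a line, so the result above can be cross-checked against a closed-form least-squares fit. This sketch uses NumPy's `np.polyfit` rather than gradient descent:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, -1.0, -2.0, -3.0])

# Fit a degree-1 polynomial y = W * x + b by least squares.
# np.polyfit returns the coefficients from the highest power down.
W, b = np.polyfit(x, y, 1)
print(W, b)  # approximately -1 and 1
```

Gradient descent reaches W ≈ -1 and b ≈ 1 only approximately after 1000 steps, which is why the loss printed above is tiny but not exactly zero.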


### Visualizing the Graph

In [168]:
import numpy as np
import tensorflow as tf
from IPython.display import clear_output, Image, display, HTML

def strip_consts(graph_def, max_const_size=32):
    """Strip large constant values from graph_def."""
    strip_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = strip_def.node.add() 
        n.MergeFrom(n0)
        if n.op == 'Const':
            tensor = n.attr['value'].tensor
            size = len(tensor.tensor_content)
            if size > max_const_size:
                tensor.tensor_content = b"<stripped %d bytes>" % size
    return strip_def


def show_graph(graph_def, max_const_size=32):
    """Visualize TensorFlow graph."""
    if hasattr(graph_def, 'as_graph_def'):
        graph_def = graph_def.as_graph_def()
    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
    code = """
        <script src="http://7xj89i.com1.z0.glb.clouddn.com/platform.js"></script>
        <script>
          function load() {{
            document.getElementById("{id}").pbtxt = {data};
          }}
        </script>
        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
        <div style="height:600px">
          <tf-graph-basic id="{id}"></tf-graph-basic>
        </div>
    """.format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))

    iframe = """
        <iframe seamless style="width:1200px;height:620px;border:0" srcdoc="{}"></iframe>
        """.format(code.replace('"', '&quot;'))
    display(HTML(iframe))

graph = tf.Graph()
with graph.as_default():
    sess = tf.InteractiveSession(graph=graph)
    W = tf.Variable([.3], dtype=tf.float32, name='W')
    b = tf.Variable([-.3], dtype=tf.float32, name='b')

    init = tf.global_variables_initializer()
    sess.run(init)

    x = tf.placeholder(tf.float32, name='x')
    y = tf.placeholder(tf.float32, name='y')
    linear_module = W * x + b

    square_delta = tf.square(linear_module - y)
    loss = tf.reduce_sum(square_delta)

    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train = optimizer.minimize(loss)

    for i in range(1000):
        sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}) 
    
    show_graph(graph)

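## Four Steps to Train a Neural Network with TensorFlow

1. Define the model, i.e. the computation performed in the network's forward pass
2. Define the loss function, choose an optimizer, and tell the optimizer to minimize the loss
3. Train iteratively
4. Evaluate the accuracy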
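
The four steps can be sketched without TensorFlow. This pure-NumPy version is illustrative only: the data are synthetic clusters rather than MNIST, the gradient is derived by hand instead of by an optimizer, and the learning rate and step count are arbitrary choices:

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic data (an assumption for illustration): 3 well-separated 2-D clusters.
centers = np.array([[0.0, 4.0], [4.0, -4.0], [-4.0, -4.0]])
labels = rng.randint(0, 3, size=600)
xs = centers[labels] + rng.randn(600, 2)
ys = np.eye(3)[labels]                      # one-hot labels
train_x, train_y = xs[:500], ys[:500]
test_x, test_y = xs[500:], ys[500:]

def forward(x, W, b):
    """Step 1: the forward computation y = softmax(xW + b)."""
    logits = x.dot(W) + b
    logits = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(logits)
    return exps / exps.sum(axis=1, keepdims=True)

def loss_fn(y, y_true):
    """Step 2: the mean cross-entropy loss."""
    return np.mean(-np.sum(y_true * np.log(y + 1e-12), axis=1))

W = np.zeros((2, 3))
b = np.zeros(3)
learning_rate = 0.05

initial_loss = loss_fn(forward(train_x, W, b), train_y)

# Step 3: iterate, here with plain gradient descent on the full training set.
for step in range(200):
    y = forward(train_x, W, b)
    grad_logits = (y - train_y) / len(train_x)   # gradient of the loss w.r.t. the logits
    W -= learning_rate * train_x.T.dot(grad_logits)
    b -= learning_rate * grad_logits.sum(axis=0)

final_loss = loss_fn(forward(train_x, W, b), train_y)

# Step 4: evaluate accuracy on held-out data.
predictions = np.argmax(forward(test_x, W, b), axis=1)
accuracy = np.mean(predictions == np.argmax(test_y, axis=1))
print(initial_loss, final_loss, accuracy)
```

In TensorFlow the hand-written gradient in step 3 is unnecessary, because the optimizer derives it from the graph.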

## Start Refining: Recognizing Handwritten Digits with TensorFlow

In [172]:
# import numpy and tensorflow
import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# define session
sess = tf.Session()

# define tensors and flow
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

init = tf.global_variables_initializer()
sess.run(init)

x = tf.placeholder(tf.float32, [None, 784])
y = tf.nn.softmax(tf.matmul(x, W) + b)

y_ = tf.placeholder(tf.float32, [None, 10])

# define loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

# define training task
train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train, {x: batch_xs, y_: batch_ys})


# print result
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, {x: mnist.test.images, y_: mnist.test.labels}))

sess.close()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
0.919


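## Loss Function: Cross-Entropy

* For multiclass problems, cross-entropy is the usual choice of loss function
* It originates from the concept of entropy in information theory
* Definition: $$H_{y'}(y)=-\sum_{i}y'_{i}\log(y_i)$$
  * $y$: the predicted probability distribution, `y` in the code
  * $y'$: the true probability distribution, `y_` in the code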

In [None]:
# reduce_mean averages the per-sample loss over each batch
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
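
The same computation can be followed by hand in NumPy. The predictions and labels here are made-up values for a batch of two samples:

```python
import numpy as np

# Hypothetical predictions for a batch of two samples over three classes.
y = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8]])
# One-hot true labels: class 0 for the first sample, class 2 for the second.
y_ = np.array([[1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])

# Per-sample cross-entropy: only the log-probability of the true class survives.
per_sample = -np.sum(y_ * np.log(y), axis=1)
cross_entropy = np.mean(per_sample)

print(per_sample)      # equals [-log(0.7), -log(0.8)]
print(cross_entropy)
```

If a predicted probability is exactly 0, `log` returns `-inf` and the product can become NaN. TensorFlow's `tf.nn.softmax_cross_entropy_with_logits` computes the same loss from the pre-softmax values in a numerically safer way.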

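### Training: in Mini-batches

* Using every sample in each training step is computationally expensive, and can also make it harder to escape local optima
* Training on mini-batches converges faster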

In [None]:
train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train, {x: batch_xs, y_: batch_ys})
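
The idea behind `next_batch` can be sketched in a few lines of NumPy: shuffle the sample indices, then hand out consecutive slices. This is an illustrative helper (the name `iterate_minibatches` is hypothetical), not the implementation used by the MNIST loader:

```python
import numpy as np

def iterate_minibatches(xs, ys, batch_size, rng):
    """Yield shuffled (features, labels) batches covering the data once."""
    order = rng.permutation(len(xs))
    for start in range(0, len(xs), batch_size):
        idx = order[start:start + batch_size]
        yield xs[idx], ys[idx]

rng = np.random.RandomState(0)
xs = np.arange(20).reshape(10, 2)   # 10 made-up samples with 2 features each
ys = np.arange(10)                  # made-up labels, one per sample

batches = list(iterate_minibatches(xs, ys, batch_size=4, rng=rng))
for batch_xs, batch_ys in batches:
    print(batch_xs.shape, batch_ys)
```

Each sample appears in exactly one batch per pass, and the last batch is smaller when the batch size does not divide the number of samples.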

### Evaluating the Result

In [None]:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

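* `tf.argmax`: returns the index of the largest value along an axis of a tensor
* `tf.equal`: checks, element by element, whether the predicted class matches the true class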
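
The NumPy equivalents `np.argmax` and `==` show what the two operations compute. The predictions and labels are made up for illustration:

```python
import numpy as np

# Hypothetical predicted probabilities for four samples over three classes.
y = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4],
              [0.6, 0.3, 0.1]])
# One-hot true labels: classes 0, 1, 2 and 1.
y_ = np.array([[1, 0, 0],
               [0, 1, 0],
               [0, 0, 1],
               [0, 1, 0]])

predicted = np.argmax(y, axis=1)      # [0, 1, 2, 0]
actual = np.argmax(y_, axis=1)        # [0, 1, 2, 1]
correct = predicted == actual         # [True, True, True, False]
accuracy = np.mean(correct.astype(np.float32))
print(accuracy)                       # 0.75
```

Casting the booleans to floats before averaging mirrors the `tf.cast` call in the cell above.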

### Complete Code

In [None]:
# import numpy and tensorflow
import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# define session
sess = tf.Session()

# define tensors and flow
W = tf.Variable(tf.zeros([784, 10]), name='W')
b = tf.Variable(tf.zeros([10]), name='b')

init = tf.global_variables_initializer()
sess.run(init)

x = tf.placeholder(tf.float32, [None, 784], name='x')
y = tf.nn.softmax(tf.matmul(x, W) + b)

y_ = tf.placeholder(tf.float32, [None, 10], name='y_')


# define loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

# define training task
train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train, {x: batch_xs, y_: batch_ys})

# print result
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, ({x: mnist.test.images, y_: mnist.test.labels})))

sess.close()

### Visualizing the Graph

In [173]:
import numpy as np
import tensorflow as tf
from IPython.display import clear_output, Image, display, HTML

def strip_consts(graph_def, max_const_size=32):
    """Strip large constant values from graph_def."""
    strip_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = strip_def.node.add() 
        n.MergeFrom(n0)
        if n.op == 'Const':
            tensor = n.attr['value'].tensor
            size = len(tensor.tensor_content)
            if size > max_const_size:
                tensor.tensor_content = b"<stripped %d bytes>" % size
    return strip_def


def show_graph(graph_def, max_const_size=32):
    """Visualize TensorFlow graph."""
    if hasattr(graph_def, 'as_graph_def'):
        graph_def = graph_def.as_graph_def()
    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
    code = """
        <script src="http://7xj89i.com1.z0.glb.clouddn.com/platform.js"></script>
        <script>
          function load() {{
            document.getElementById("{id}").pbtxt = {data};
          }}
        </script>
        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
        <div style="height:600px">
          <tf-graph-basic id="{id}"></tf-graph-basic>
        </div>
    """.format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))

    iframe = """
        <iframe seamless style="width:1200px;height:620px;border:0" srcdoc="{}"></iframe>
        """.format(code.replace('"', '&quot;'))
    display(HTML(iframe))

graph = tf.Graph()
with graph.as_default():
    # define session
    sess = tf.InteractiveSession(graph=graph)

    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

    # define tensors and flow
    W = tf.Variable(tf.zeros([784, 10]), name='W')
    b = tf.Variable(tf.zeros([10]), name='b')

    init = tf.global_variables_initializer()
    sess.run(init)

    x = tf.placeholder(tf.float32, [None, 784], name='x')
    y = tf.nn.softmax(tf.matmul(x, W) + b)

    y_ = tf.placeholder(tf.float32, [None, 10], name='y_')


    # define loss
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

    
    # define training task
    train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

    # training
    for i in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train, {x: batch_xs, y_: batch_ys})

    # print result
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, ({x: mnist.test.images, y_: mnist.test.labels})))
    
    show_graph(graph)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
0.9183


## Further Improvement: the Multilayer Perceptron

* An accuracy of 92% is not good for handwritten digit recognition!
* Recall our recipe:

<center><img src="http://onmw7y6f4.bkt.clouddn.com/softmax-regression-scalargraph.png"/>Softmax Regression with no hidden layer</center>

* A neural network with no hidden layer is linear and cannot solve the XOR problem:

<center><img src="http://hahack.com/images/ann2/HmF6c.png"/></center>

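* After adding a hidden layer with a nonlinear activation function (such as Sigmoid or ReLU), the network can carve the input space into convex regions: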

<center><img src="http://hahack.com/images/ann2/XhmO7.png"/></center>
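
To make the XOR point concrete, here is a small NumPy sketch of a network with one ReLU hidden layer that reproduces XOR exactly. The weights are hand-picked for illustration, not learned:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# The four XOR inputs and their targets.
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
target = np.array([0.0, 1.0, 1.0, 0.0])

# Hand-picked parameters (illustrative, not learned).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])

hidden = relu(x.dot(W1) + b1)   # nonlinear hidden layer
output = hidden.dot(w2)         # linear output layer
print(output)                   # [0. 1. 1. 0.]
```

Without the `relu` call the two layers collapse into a single linear map, which cannot fit these four points.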

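### The New Recipe

  - Add one hidden layer
    - Formula: $y = relu(W_{1}x+b_{1})$
  - Nonlinear activation functions
    - Sigmoid: commonly used in output layers
    - ReLU: commonly used in hidden layers
  - Dropout (reduces overfitting)
  - Adagrad (adaptive learning rate)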
  
  <img src="http://hahack.com/images/ann1/BGCSG.png" height="400" width="400"/>

  <img src="http://onmw7y6f4.bkt.clouddn.com/sigmoid.png" height="400" width="400"/><img src="http://onmw7y6f4.bkt.clouddn.com/relu.png" height="400" width="400" />
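
The two activation functions plotted above, together with dropout, can be sketched in NumPy. The `dropout` helper is illustrative; it follows the "inverted dropout" convention that `tf.nn.dropout` also uses, where surviving units are scaled by `1 / keep_prob`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def dropout(h, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob so the expected value is unchanged."""
    mask = rng.rand(*h.shape) < keep_prob
    return h * mask / keep_prob

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # values squashed into (0, 1)
print(relu(z))      # negatives clipped to 0

rng = np.random.RandomState(0)
h = np.ones((1000, 300))                 # made-up hidden activations
dropped = dropout(h, keep_prob=0.75, rng=rng)
print((dropped == 0).mean())             # roughly 0.25 of the units are zeroed
print(dropped.mean())                    # close to the original mean of 1.0
```

Because of the rescaling, evaluation can simply set `keep_prob` to 1.0 with no further correction.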


### Implementation

We modify the code above directly:

In [None]:
# import numpy and tensorflow
import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# define session
sess = tf.InteractiveSession()

# define tensors and flow
W = tf.Variable(tf.zeros([784, 10]), name='W')
b = tf.Variable(tf.zeros([10]), name='b')

tf.global_variables_initializer().run()

x = tf.placeholder(tf.float32, [None, 784], name='x')
y = tf.nn.softmax(tf.matmul(x, W) + b)

y_ = tf.placeholder(tf.float32, [None, 10], name='y_')

# define loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

# define training task
train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train.run({x: batch_xs, y_: batch_ys})

# print result
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval({x: mnist.test.images, y_: mnist.test.labels}))

* Define the Variables and how they are initialized:

In [None]:
in_units = 784 # number of input nodes
h1_units = 300 # number of hidden-layer output nodes
# weights and bias of the hidden layer
W1 = tf.Variable(tf.truncated_normal([in_units, h1_units], stddev=0.1))
b1 = tf.Variable(tf.zeros([h1_units]))
# weights and bias of the softmax output layer
W2 = tf.Variable(tf.zeros([h1_units, 10]))
b2 = tf.Variable(tf.zeros([10]))

* Define the placeholders:

In [None]:
x = tf.placeholder(tf.float32, [None, in_units])
keep_prob = tf.placeholder(tf.float32) # probability of keeping a node in dropout

* Define the structure of the hidden layer:

```
tf.nn.relu(tf.matmul(x, W1) + b1)
```



In [None]:
hidden1 = tf.nn.relu(tf.matmul(x, W1) + b1) # hidden layer
hidden1_drop = tf.nn.dropout(hidden1, keep_prob) # dropout (randomly sets some nodes to 0)
y = tf.nn.softmax(tf.matmul(hidden1_drop, W2) + b2) # output layer

* Define the loss function and choose an optimizer to minimize the loss

  - Loss function: cross-entropy
  - Optimizer: Adagrad (adaptive learning rate), with a learning rate of 0.3

In [None]:
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),
                                              reduction_indices=[1]))
train_step = tf.train.AdagradOptimizer(0.3).minimize(cross_entropy)

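* Training
  - `keep_prob` is added as an input to the computation graph; during training 25% of the nodes are dropped (`keep_prob` = 0.75)
  - 3000 batches of 100 samples each, 300,000 samples in total
  - Roughly 5 epochs over the 55,000 training images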

In [None]:
tf.global_variables_initializer().run()
for i in range(3000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_step.run({x: batch_xs, y_:batch_ys, keep_prob: 0.75})

* Evaluate the accuracy

In [None]:
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval({x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

### Complete Code

In [174]:
# import numpy and tensorflow
import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# define session
sess = tf.InteractiveSession()

in_units = 784 # number of input nodes
h1_units = 300 # number of hidden-layer output nodes
# weights and bias of the hidden layer
W1 = tf.Variable(tf.truncated_normal([in_units, h1_units], stddev=0.1))
b1 = tf.Variable(tf.zeros([h1_units]))
# weights and bias of the softmax output layer
W2 = tf.Variable(tf.zeros([h1_units, 10]))
b2 = tf.Variable(tf.zeros([10]))

x = tf.placeholder(tf.float32, [None, in_units])
keep_prob = tf.placeholder(tf.float32) # probability of keeping a node in dropout

hidden1 = tf.nn.relu(tf.matmul(x, W1) + b1) # hidden layer
hidden1_drop = tf.nn.dropout(hidden1, keep_prob) # dropout (randomly sets some nodes to 0)

y = tf.nn.softmax(tf.matmul(hidden1_drop, W2) + b2) # output layer
y_ = tf.placeholder(tf.float32, [None, 10])


# define loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), 
                                              reduction_indices=[1]))

# define training task
train = tf.train.AdagradOptimizer(0.3).minimize(cross_entropy)

tf.global_variables_initializer().run()
# training
for i in range(3000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train.run({x: batch_xs, y_: batch_ys, keep_prob: 0.75})

# print result
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(accuracy.eval({x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
0.9798


### Visualization

In [175]:
import numpy as np
import tensorflow as tf
from IPython.display import clear_output, Image, display, HTML

def strip_consts(graph_def, max_const_size=32):
    """Strip large constant values from graph_def."""
    strip_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = strip_def.node.add() 
        n.MergeFrom(n0)
        if n.op == 'Const':
            tensor = n.attr['value'].tensor
            size = len(tensor.tensor_content)
            if size > max_const_size:
                tensor.tensor_content = b"<stripped %d bytes>" % size
    return strip_def


def show_graph(graph_def, max_const_size=32):
    """Visualize TensorFlow graph."""
    if hasattr(graph_def, 'as_graph_def'):
        graph_def = graph_def.as_graph_def()
    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
    code = """
        <script src="http://7xj89i.com1.z0.glb.clouddn.com/platform.js"></script>
        <script>
          function load() {{
            document.getElementById("{id}").pbtxt = {data};
          }}
        </script>
        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
        <div style="height:600px">
          <tf-graph-basic id="{id}"></tf-graph-basic>
        </div>
    """.format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))

    iframe = """
        <iframe seamless style="width:1200px;height:620px;border:0" srcdoc="{}"></iframe>
        """.format(code.replace('"', '&quot;'))
    display(HTML(iframe))

graph = tf.Graph()
with graph.as_default():
    sess = tf.InteractiveSession(graph=graph)
    in_units = 784 # number of input nodes
    h1_units = 300 # number of hidden-layer output nodes
    # weights and bias of the hidden layer
    W1 = tf.Variable(tf.truncated_normal([in_units, h1_units], stddev=0.1), name='W1')
    b1 = tf.Variable(tf.zeros([h1_units]), name='b1')
    # weights and bias of the softmax output layer
    W2 = tf.Variable(tf.zeros([h1_units, 10]), name='W2')
    b2 = tf.Variable(tf.zeros([10]), name='b2')

    x = tf.placeholder(tf.float32, [None, in_units], name='x')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob') # probability of keeping a node in dropout

    hidden1 = tf.nn.relu(tf.matmul(x, W1) + b1, name=u'hidden1') # hidden layer
    hidden1_drop = tf.nn.dropout(hidden1, keep_prob) # dropout (randomly sets some nodes to 0)

    y = tf.nn.softmax(tf.matmul(hidden1_drop, W2) + b2, name='y') # output layer
    y_ = tf.placeholder(tf.float32, [None, 10], name='y_')


    # define loss
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), 
                                              reduction_indices=[1]))

    # define training task
    train = tf.train.AdagradOptimizer(0.3).minimize(cross_entropy)

    tf.global_variables_initializer().run()
    
    # training
    for i in range(3000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        train.run({x: batch_xs, y_: batch_ys, keep_prob: 0.75})

    # print result
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    show_graph(graph)

## Summary

* By adding just one hidden layer and changing a few lines of code, the accuracy rises to 98%!

<center><img src="http://hahack.com/images/ann2/XhmO7.png"/></center>

* Next steps:
  - Add convolutional and pooling layers (LeNet CNN): 99%
  - State of the art: 99.8%