# Recognizing Handwritten Digits with TensorFlow

Author: 潘伟洲

## The Handwriting Recognition Problem

<center>
![](./images/mnistdigits.gif)
</center>

## Machine Learning and "Alchemy"

<div style="float:right">
![](images/muli.jpeg)
<center>Mu Li (李沐), author of MXNet</center>
</div>

* Ingredients: the training data;
* Recipe: the model to train;
* Fire: the hardware;
* Furnace: the framework;
* Refining: the training process.

## Ingredients: MNIST

* Modified National Institute of Standards and Technology database
* http://yann.lecun.com/exdb/mnist/
* Four files are available on this site:
  - train-images-idx3-ubyte.gz:  training set images (9912422 bytes) 
  - train-labels-idx1-ubyte.gz:  training set labels (28881 bytes) 
  - t10k-images-idx3-ubyte.gz:   test set images (1648877 bytes) 
  - t10k-labels-idx1-ubyte.gz:   test set labels (4542 bytes)

## Getting to Know MNIST

<center>![Sample MNIST handwritten digit images](images/MNIST.png)Sample MNIST handwritten digit images</center>

In [None]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

print(mnist.train.images.shape, mnist.train.labels.shape)
print(mnist.test.images.shape, mnist.test.labels.shape)
print(mnist.validation.images.shape, mnist.validation.labels.shape)

<center>![Features of the MNIST training data](images/mnist-train-xs.png)Features of the MNIST training data</center>

<br/>

<center>![Labels of the MNIST training data](images/mnist-train-ys.png)Labels of the MNIST training data</center>
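
Because the data is loaded with `one_hot=True`, each label is a 10-element vector rather than a single digit. The sketch below shows that encoding in plain Python. The helper names `one_hot` and `from_one_hot` are made up for this illustration and are not part of TensorFlow.

```python
def one_hot(label, num_classes=10):
    """Return a vector with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def from_one_hot(vector):
    """Recover the digit: the index of the largest entry."""
    return max(range(len(vector)), key=lambda i: vector[i])


encoded = one_hot(3)
print(encoded)                # a 10-element list with a single 1.0 at index 3
print(from_one_hot(encoded))  # the original digit
```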

In [None]:
print(mnist.train.images[0])

<center>![Example of the grayscale values of a handwritten digit](images/MNIST-Matrix.png)Example of the grayscale values of a handwritten digit</center>
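
Each image printed above is a flat vector of 784 grayscale values, which corresponds to a 28 x 28 grid. As a rough sketch, the flat vector can be cut back into rows and drawn as text. The synthetic `pixels` list below stands in for `mnist.train.images[0]`, and the helpers `to_rows` and `render` are hypothetical names used only here.

```python
def to_rows(pixels, width=28):
    """Split a flat list of pixel values into rows of `width` values."""
    if len(pixels) % width != 0:
        raise ValueError("pixel count must be a multiple of width")
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def render(pixels, width=28, threshold=0.5):
    """Draw the image as text: '#' for bright pixels, '.' for dark ones."""
    return "\n".join(
        "".join("#" if value > threshold else "." for value in row)
        for row in to_rows(pixels, width)
    )


# Synthetic stand-in for one MNIST image: a vertical stroke in column 14.
pixels = [1.0 if i % 28 == 14 else 0.0 for i in range(784)]
print(render(pixels))
```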

## Recipe: Which Machine Learning Model to Choose?

* Classification?
* Clustering?

<center><img src="images/models.png"/></center>

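## A Multiclass Model: Softmax Regression

* one vs all
* Add up the evidence that an image belongs to a given class, then convert that evidence into the probability of the class
* Evidence: a weighted sum of the grayscale values of all pixels, with a separate set of weights for each class.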

<center>![](images/softmax-weights.png)</center>

### Expressing the Evidence Mathematically

$$\text{evidence}_i=\sum_{j}W_{i,j}x_{j}+b_{i}$$

* $i$: the $i$-th class;
* $j$: the $j$-th pixel of an image;
* $b_i$: the bias of class $i$;
* $W_{i,j}$: the weight that links pixel $j$ to class $i$.
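
To make the sum concrete, here is a minimal sketch of the evidence formula in plain Python. The toy sizes (2 classes, 3 pixels) and all the numbers are invented for illustration; for MNIST there would be 10 classes and 784 pixels.

```python
def evidence(W, x, b):
    """Compute evidence_i = sum_j W[i][j] * x[j] + b[i] for every class i."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(W_i, x)) + b_i
        for W_i, b_i in zip(W, b)
    ]


# Toy values: W[i][j] is the weight of pixel j for class i.
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
b = [0.1, -0.2]
x = [1.0, 2.0, 3.0]

print(evidence(W, x, b))  # one evidence score per class
```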

Computing softmax:

$$\text{softmax}(x) = \text{normalize}(\exp(x))$$
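
The formula above can be sketched in a few lines of plain Python: exponentiate every score, then divide by the total so the results sum to 1. Subtracting the maximum before exponentiating is an addition of this sketch, not something the formula states; it leaves the result unchanged mathematically and only avoids overflow for large scores.

```python
import math


def softmax(values):
    """Exponentiate each value and normalize so the outputs sum to 1."""
    peak = max(values)                            # shift for numerical safety
    exps = [math.exp(v - peak) for v in values]   # exp(x), up to a common factor
    total = sum(exps)
    return [e / total for e in exps]              # normalize


probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities)
print(sum(probabilities))
```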

### Softmax Regression Flow Diagram

<center>![Softmax regression flow](images/softmax-regression-scalargraph.png)Softmax regression flow</center>

### Softmax Regression Written Out Element by Element

<center>![](images/softmax-regression-vectorequation.png)Softmax regression written out element by element</center>

Written as a formula:

$$y=\text{softmax}(Wx+b)$$
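
One detail worth checking by hand: the formula multiplies $W$ by a column vector $x$, while the TensorFlow code in this notebook computes `tf.matmul(x, W)` with one image per row and `W` of shape `[784, 10]`. The two are the same computation with `W` transposed. The sketch below checks this on a toy example; the helper names are hypothetical.

```python
def matvec(W, x):
    """Column convention: W is [classes][pixels]; returns W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rowvec_matmul(x, Wt):
    """Row convention: Wt is [pixels][classes]; returns x Wt."""
    return [
        sum(x[j] * Wt[j][i] for j in range(len(x)))
        for i in range(len(Wt[0]))
    ]


def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]      # 2 classes, 3 pixels
x = [1.0, 0.0, -1.0]

print(matvec(W, x))
print(rowvec_matmul(x, transpose(W)))  # same scores, row-vector layout
```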

## Furnace: TensorFlow

<center>![](images/tensorflow.png)</center>

## Review: The TensorFlow Linear Regression Example

In [None]:
# import numpy and tensorflow
import numpy as np
import tensorflow as tf

# define session
sess = tf.Session()

# define tensors and flow
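# pass the dtype by keyword: the second positional argument of tf.Variable is `trainable`
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)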

init = tf.global_variables_initializer()
sess.run(init)

x = tf.placeholder(tf.float32, name='x')
linear_model = W * x + b
y = tf.placeholder(tf.float32, name='y')

# define loss
loss = tf.reduce_sum(tf.square(linear_model - y))

# define training task
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# training
for i in range(1000):
    sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]})

# print result
print(sess.run(loss, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))

sess.close()
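
To see what the optimizer is doing, the same fit can be reproduced without TensorFlow. This is a hand-written sketch of plain gradient descent on the same data, starting values and learning rate; the gradients are derived by hand from the sum-of-squares loss, and the code is an illustration rather than what `GradientDescentOptimizer` runs internally.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.0, -1.0, -2.0, -3.0]

W, b = 0.3, -0.3          # same starting values as the TensorFlow cell
learning_rate = 0.01


def loss(W, b):
    """Sum of squared errors of the line W * x + b on the data."""
    return sum((W * x + b - y) ** 2 for x, y in zip(xs, ys))


for _ in range(1000):
    errors = [W * x + b - y for x, y in zip(xs, ys)]
    grad_W = sum(2 * e * x for e, x in zip(errors, xs))  # d(loss)/dW
    grad_b = sum(2 * e for e in errors)                  # d(loss)/db
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print(W, b, loss(W, b))   # W approaches -1, b approaches 1, loss approaches 0
```

The data lies exactly on the line $y = -x + 1$, so the loss can be driven arbitrarily close to zero.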

## Start the Alchemy: Recognizing Handwriting with TensorFlow

In [None]:
# import numpy and tensorflow
import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# define session
sess = tf.Session()

# define tensors and flow
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

init = tf.global_variables_initializer()
sess.run(init)

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

model = tf.nn.softmax(tf.matmul(x, W) + b)

# define loss

# define training task

# training

# print result

sess.close()

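## Loss Function: Cross-Entropy

* For multiclass problems, cross-entropy is the usual choice of loss function
* It comes from the notion of entropy in information theory
* Definition: $$H_{y'}(y)=-\sum_{i}y'_{i}\log(y_i)$$
  * $y$: the predicted probability distribution, `model` in the code
  * $y'$: the true probability distribution, `y` in the code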
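
As a small worked example of the definition, the sketch below computes the cross-entropy for a single sample in plain Python. With a one-hot $y'$, only the term of the true class survives, so the loss reduces to minus the log of the probability assigned to the correct class. Skipping terms where $y'_i = 0$ is a choice made in this sketch (using the convention $0 \cdot \log 0 = 0$); the probabilities are invented.

```python
import math


def cross_entropy(y_true, y_pred):
    """H_{y'}(y) = -sum_i y'_i * log(y_i), skipping terms where y'_i is 0."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


y_true = [0.0, 0.0, 1.0]          # one-hot: the true class is index 2

good_guess = [0.1, 0.2, 0.7]      # most probability on the true class
bad_guess = [0.7, 0.2, 0.1]       # most probability on a wrong class

print(cross_entropy(y_true, good_guess))  # equals -log(0.7)
print(cross_entropy(y_true, bad_guess))   # equals -log(0.1), a larger loss
```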

In [None]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(model), axis=1))
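
A caveat on the expression above: taking the log of a softmax output directly can break down numerically when a predicted probability underflows to 0. TensorFlow 1.x offers `tf.nn.softmax_cross_entropy_with_logits`, which works on the pre-softmax scores instead. The idea behind that kind of formulation can be sketched in plain Python with the log-sum-exp trick; the helper names below are hypothetical and this is not TensorFlow's implementation.

```python
import math


def log_softmax(logits):
    """log(softmax(logits)) computed without forming tiny probabilities."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_total for v in logits]


def cross_entropy_from_logits(y_true, logits):
    """Cross-entropy taken from raw scores rather than from probabilities."""
    return -sum(t * lp for t, lp in zip(y_true, log_softmax(logits)))


# Extreme scores: a naive softmax followed by log would hit log(0) here.
logits = [1000.0, 0.0, -1000.0]
print(cross_entropy_from_logits([0.0, 1.0, 0.0], logits))  # finite: 1000.0
```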

In [51]:
# import numpy and tensorflow
import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# define session
sess = tf.Session()

# define tensors and flow
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

init = tf.global_variables_initializer()
sess.run(init)

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

model = tf.nn.softmax(tf.matmul(x, W) + b)

# define loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(model), axis=1))

# define training task

# training

# print result

sess.close()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


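### Training in Batches

* Using every sample in each training step is computationally expensive, and it can also make it harder to escape a local optimum
* Training on small batches converges faster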
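
The batching idea can be sketched independently of TensorFlow: shuffle the sample indices, then hand out consecutive slices. The generator `iterate_minibatches` below is a hypothetical helper written for this illustration; it is not how `mnist.train.next_batch` is implemented, and the toy data is invented.

```python
import random


def iterate_minibatches(features, labels, batch_size, rng):
    """Yield (features, labels) batches covering the data once, in random order."""
    indices = list(range(len(features)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chosen = indices[start:start + batch_size]
        yield [features[i] for i in chosen], [labels[i] for i in chosen]


# Toy data set: 10 samples with one feature each; the label is the parity.
features = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]

batches = list(iterate_minibatches(features, labels, batch_size=4,
                                   rng=random.Random(0)))
print([len(batch_xs) for batch_xs, _ in batches])  # [4, 4, 2]
```

Each training step then only touches `batch_size` samples, which is why a step is much cheaper than a pass over the full training set.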

In [None]:
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train, {x: batch_xs, y: batch_ys})

In [54]:
# import numpy and tensorflow
import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# define session
sess = tf.Session()

# define tensors and flow
W = tf.Variable(tf.zeros([784, 10]), name='W')
b = tf.Variable(tf.zeros([10]), name='b')

init = tf.global_variables_initializer()
sess.run(init)

x = tf.placeholder(tf.float32, [None, 784], name='x')
y = tf.placeholder(tf.float32, [None, 10], name='y')

model = tf.nn.softmax(tf.matmul(x, W) + b)

# define loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(model), axis=1))

# define training task
train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train, {x: batch_xs, y: batch_ys})

# print result


sess.close()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


### Evaluating the Results

In [None]:
correct_prediction = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

* `tf.argmax`: returns the index of the largest value along one axis of a tensor
* `tf.equal`: checks whether the predicted class matches the true class
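
The same accuracy computation can be followed step by step in plain Python: take the argmax of each prediction and each one-hot label, compare them, and average the matches. The predictions and labels below are invented for illustration, and the helper names are hypothetical.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of samples whose predicted class equals the true class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(correct) / len(correct)   # True counts as 1, False as 0


predictions = [[0.8, 0.1, 0.1],
               [0.2, 0.7, 0.1],
               [0.3, 0.4, 0.3],
               [0.1, 0.1, 0.8]]
labels = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [1.0, 0.0, 0.0],     # the third prediction picks class 1: a miss
          [0.0, 0.0, 1.0]]

print(accuracy(predictions, labels))  # 0.75
```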

In [60]:
# import numpy and tensorflow
import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# define session
sess = tf.Session()

# define tensors and flow
W = tf.Variable(tf.zeros([784, 10]), name='W')
b = tf.Variable(tf.zeros([10]), name='b')

init = tf.global_variables_initializer()
sess.run(init)

x = tf.placeholder(tf.float32, [None, 784], name='x')
y = tf.placeholder(tf.float32, [None, 10], name='y')

model = tf.nn.softmax(tf.matmul(x, W) + b)

# define loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(model), axis=1))

# define training task
train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train, {x: batch_xs, y: batch_ys})

# print result
correct_prediction = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, ({x: mnist.test.images, y: mnist.test.labels})))

sess.close()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
0.9174
