# *Digit Recognition*

#### What is Digit Recognition?

Digit recognition is the task of deciding which digit a handwritten image shows. There are 10 digits (0 to 9), so there are 10 classes to predict.  
A standard split of the dataset is used to evaluate and compare models: 60,000 images are used to train a model and a separate set of 10,000 images is used to test it.  

#### Describe Neural Network
Most introductory texts on Neural Networks bring up brain analogies when describing them. Without delving into brain analogies, I find it easier to simply describe a Neural Network as a mathematical function that maps a given input to a desired output.  

Neural Networks consist of the following components:  
* An input layer, x
* An arbitrary number of hidden layers
* An output layer, ŷ
* A set of weights and biases between each layer, W and b
* A choice of activation function for each hidden layer, σ. In this tutorial, we’ll use a Sigmoid activation function.  
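
The components listed above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the layer sizes (3 inputs, 4 hidden units, 1 output), the weight values, and the helper name `dense` are hypothetical, and the output layer is assumed to use a sigmoid as well.

```python
import math

def sigmoid(z):
    """Logistic sigmoid applied to a single number."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One layer: activation(W . inputs + b), with one row of W per output unit."""
    return [
        activation(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical sizes and values: 3 inputs -> 4 hidden units -> 1 output.
x = [0.5, -1.0, 2.0]
W1 = [[0.1, 0.2, -0.1], [0.0, -0.3, 0.2], [0.4, 0.1, 0.0], [-0.2, 0.2, 0.1]]
b1 = [0.0, 0.1, -0.1, 0.0]
W2 = [[0.3, -0.2, 0.5, 0.1]]
b2 = [0.05]

hidden = dense(x, W1, b1, sigmoid)      # hidden layer
y_hat = dense(hidden, W2, b2, sigmoid)  # output layer (sigmoid assumed here too)
print(y_hat)
```

Counting only `hidden` and `y_hat` as layers matches the convention of excluding the input layer from the layer count.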

The diagram below shows the architecture of a 2-layer Neural Network (note that the input layer is typically excluded when counting the number of layers in a Neural Network)

![image](https://github.com/Tianle97/Emerging-Thecnology-Project/blob/master/WechatIMG116.png?raw=true)  


#### About the MNIST Dataset



MNIST (Modified National Institute of Standards and Technology) is a large database of handwritten digits, built as a subset of a larger set available from NIST (National Institute of Standards and Technology). MNIST is used to train image processing systems and is often called the "hello world" of machine learning and computer vision.
MNIST contains 60,000 training images and 10,000 testing images. Training images are used to train a system, and testing images are used to test the trained system. The images show handwritten digits from 0 to 9. They are grayscale, 28x28 pixels, and centered, which reduces preprocessing and makes it quicker to get started.

![mnist_dataset](https://cdn-images-1.medium.com/max/1200/1*9Mjoc_J0JR294YwHGXwCeg.jpeg)


### Explain the script code

#### First, import the packages

In [2]:
# Import tensorflow
import tensorflow as tf
# Import mnist data from tensorflow
from tensorflow.examples.tutorials.mnist import input_data

# Suppress warnings so that the notebook output is easier to read.
# The sys and warnings libraries are imported only for this purpose.
import sys
import warnings

if not sys.warnoptions:
    warnings.simplefilter("ignore")

### Second, get the MNIST dataset

Store it in the current folder, in a directory named 'data'.

In [3]:
# A one-hot vector is a vector which is 0 in most dimensions, and 1 in a single dimension. 
# In this case, the nth digit will be represented as a vector which is 1 in the nth dimension.
# For example, 3 would be [0,0,0,1,0,0,0,0,0,0]
mnist = input_data.read_data_sets('data',one_hot=True)
# Set batch_size: feed 100 images to the model at each training step
batch_size = 100
# Calculate number of batches from data set
n_batch = mnist.train.num_examples // batch_size

Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz
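
The one-hot encoding and the batch count used in the cell above can be reproduced without TensorFlow. The sketch below is illustrative only: `one_hot` is a hypothetical helper, and `num_examples = 1050` is a made-up dataset size chosen to show that floor division drops an incomplete final batch.

```python
def one_hot(digit, num_classes=10):
    """Return a list that is 1 at index `digit` and 0 everywhere else."""
    vec = [0] * num_classes
    vec[digit] = 1
    return vec

print(one_hot(3))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical dataset size, used only to illustrate the floor division.
num_examples = 1050
batch_size = 100
n_batch = num_examples // batch_size
print(n_batch)  # 10 full batches; the remaining 50 examples do not form a batch
```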


### Define two placeholders

In [4]:
#images  784 = 28 * 28 pixels
x = tf.placeholder(tf.float32,[None,784]) 
#labels  10 stands for 0~9
y = tf.placeholder(tf.float32,[None,10])  
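
To see where the 784 in the placeholder shape comes from, here is a small sketch in plain Python. It assumes the image is flattened row by row, and the pixel values are made up for illustration.

```python
# A hypothetical 28x28 grayscale image: all background except one bright pixel.
image = [[0.0] * 28 for _ in range(28)]
image[5][7] = 1.0

# Flatten row by row into a single vector of 28 * 28 = 784 values.
flat = [pixel for row in image for pixel in row]
print(len(flat))  # 784

# A batch is a list of such vectors. The `None` in the placeholder shape
# means the number of rows (the batch size) is left open.
batch = [flat, flat, flat]
print(len(batch), len(batch[0]))  # 3 784
```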

## Define a neural network          
#### tf.zeros() method:
tf.zeros(  
      shape,  
      dtype=tf.float32,  
      name=None     
)

#### Args:
* shape: A list of integers, a tuple of integers, or a 1-D Tensor of type int32.
* dtype: The type of an element in the resulting Tensor.
* name: A name for the operation (optional).  
#### Example
tf.zeros([3, 4], tf.int32)  # [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

We pass the initial value when calling tf.Variable(). In this example, we initialize both W and b to zeros.  
W is a 784x10 matrix (because we have 784 features and 10 output values).  
b is a 10-dimensional vector (since we have 10 categories).

#### cross_entropy method:
![image](https://i.stack.imgur.com/GKdbq.png)
where $N$ is the number of samples, $k$ is the number of classes, $\log$ is the natural logarithm, $t_{i,j}$ is 1 if sample $i$ is in class $j$ and 0 otherwise, and $p_{i,j}$ is the predicted probability that sample $i$ is in class $j$. To avoid numerical issues with the logarithm, clip the predictions to the range $[10^{-12}, 1 - 10^{-12}]$.
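
As a rough sketch of the definition above, the function below averages $-t_{i,j} \log p_{i,j}$ over the samples and clips the predictions before taking the logarithm. The function name and the example numbers are assumptions chosen for illustration, and the formula is read from the symbol definitions in the text.

```python
import math

def cross_entropy(predictions, targets, eps=1e-12):
    """Mean over samples of -sum_j t[i][j] * log(p[i][j]), with clipped p."""
    n = len(predictions)
    total = 0.0
    for p_row, t_row in zip(predictions, targets):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to [eps, 1 - eps]
            total += t * math.log(p)
    return -total / n

# Two hypothetical samples over four classes; both belong to the last class.
predictions = [[0.25, 0.25, 0.25, 0.25],
               [0.01, 0.01, 0.01, 0.96]]
targets = [[0, 0, 0, 1],
           [0, 0, 0, 1]]
loss = cross_entropy(predictions, targets)
print(loss)
```

The confident, correct second prediction contributes much less to the loss than the uniform first one.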

In [5]:
# tf.zeros -> initial value is 0
# More details: https://www.tensorflow.org/api_docs/python/tf/zeros
# W has a shape of [784, 10] because we want to multiply the 784-dimensional image vectors by it 
# to produce 10-dimensional vectors of evidence for the different classes. 
W = tf.Variable(tf.zeros([784,10])) 
# b: bias, has a shape of [10] so we can add it to the output.
b = tf.Variable(tf.zeros([10])) 

### Category prediction and loss function

### Introduction to Softmax

Every image in the MNIST dataset represents a digit from 0 to 9.  
Given an image, we want the probability that it represents each digit. 

For example, our model might look at a picture of a 9 and estimate an 80% probability that it is a 9, but a 5% probability that it is an 8 (because both 8 and 9 have a small loop in the upper part), and smaller probabilities for the other digits.  

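This is a classic case for a softmax regression model. A softmax model can be used to assign probabilities to different objects. Even later, when we train more sophisticated models, the final step will still use softmax to assign probabilities.

Now we can implement our regression model. It takes only one line! We multiply the vectorized image x by the weight matrix W, add the bias b, and then compute the softmax probability for each class.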

We also need to add an extra bias, because the input tends to carry some irrelevant interference. So for a given input image $x$, the evidence that it represents the digit $i$ can be expressed as:

$$\text{evidence}_i = \sum_j W_{i,j} x_j + b_i$$

$W_{i,j}$ is the weight, $b_i$ is the bias for digit class $i$, and $j$ is the index over the pixels of the given image $x$, used for the summation. The softmax function then converts this evidence into a probability $y$:  

$$y = \text{softmax}(\text{evidence})$$  

The softmax here can be thought of as an activation function or a link function that converts the output of our linear function into the format we want, that is, a probability distribution over the 10 digit classes. Therefore, given an image, its fit for each digit can be converted to a probability value by the softmax function. The softmax function can be defined as:  
$$\text{softmax}(x) = \text{normalize}(\exp(x))$$  

Expanding the right-hand side gives:  
$$\text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$  

More often, however, the softmax function is thought of in the former form: the input values are treated as exponents.
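
The evidence and softmax formulas above can be checked with a small sketch in plain Python. The three-class weights, the two-pixel input, and the helper names are hypothetical. Subtracting the maximum score before exponentiating is a common numerical-stability trick that is not part of the formula itself and does not change the result.

```python
import math

def evidence(W, x, b):
    """evidence_i = sum_j W[i][j] * x[j] + b[i], one value per class i."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]

def softmax(scores):
    """exp(x_i) / sum_j exp(x_j), computed after shifting by the maximum score."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical tiny problem: 3 classes, 2 "pixels".
W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b = [0.0, 0.1, -0.2]
x = [0.8, 0.3]

probs = softmax(evidence(W, x, b))
print(probs, sum(probs))
```

Here `W` is indexed as `[class][pixel]` to follow the formula. The TensorFlow code in this notebook stores the transpose, with shape [784, 10], and computes `tf.matmul(x, W)`.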

In [6]:
prediction = tf.nn.softmax(tf.matmul(x,W)+b)

### tf.nn.softmax_cross_entropy_with_logits Details:

#### Args:
* _sentinel: Used to prevent positional parameters. Internal, do not use.
* labels: Each vector along the class dimension should hold a valid probability distribution e.g. for the case in which labels are of shape [batch_size, num_classes], each row of labels[i] must be a valid probability distribution.
* logits: Unscaled log probabilities.
* dim: The class dimension. Defaulted to -1 which is the last dimension.
* name: A name for the operation (optional).  

#### Returns:
A Tensor that contains the softmax cross entropy loss. Its type is the same as logits and its shape is the same as labels except that it does not have the last dimension of labels.
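
To make the meaning of "unscaled log probabilities" concrete, the sketch below computes the same quantity for a single example in plain Python: softmax is applied to the logits internally, and the cross-entropy against the labels is returned. It is an illustration of the idea, not TensorFlow's implementation, and the function names and numbers are hypothetical.

```python
import math

def softmax(scores):
    """Plain softmax, shifted by the maximum score for numerical stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(labels, logits):
    """-sum_j labels[j] * log(softmax(logits)[j]) for one example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_sum_exp) for t, z in zip(labels, logits))

labels = [1.0, 0.0, 0.0]
logits = [2.0, 1.0, 0.1]  # raw scores, before any softmax

loss_from_logits = softmax_cross_entropy(labels, logits)
# Feeding probabilities instead of raw scores applies softmax twice
# and gives a different (here larger) loss.
loss_from_probs = softmax_cross_entropy(labels, softmax(logits))
print(loss_from_logits, loss_from_probs)
```

One loss value is produced per example, which is why the returned tensor has the shape of the labels without the last (class) dimension.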


In [7]:
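# Calculate the cost (loss) that training will minimize.
# Option 1, quadratic cost (y - prediction)^2, better suited to linear outputs:
#     loss = tf.reduce_mean(tf.square(y-prediction))
# Option 2 (used here), cross-entropy, better suited to S-shaped outputs:
#     tf.nn.softmax_cross_entropy_with_logits
# More details: https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits
# Note: this function applies softmax internally and expects unscaled logits, but
# `prediction` is already a softmax output, so softmax is applied twice here.
# Passing tf.matmul(x,W)+b as logits would match the documented usage.
# The results below were recorded with the line as written.
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=prediction))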



### What is the Adam optimization algorithm?
Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on training data.

Adam was presented by Diederik Kingma from OpenAI and Jimmy Ba from the University of Toronto in their 2015 ICLR paper (poster) titled “Adam: A Method for Stochastic Optimization”.

The algorithm is called Adam. It is not an acronym and is not written as "ADAM".
#### Adam Configuration Parameters
* alpha. Also referred to as the learning rate or step size. The proportion by which weights are updated (e.g. 0.001). Larger values (e.g. 0.3) result in faster initial learning before the rate is updated. Smaller values (e.g. 1.0E-5) slow learning right down during training.
* beta1. The exponential decay rate for the first moment estimates (e.g. 0.9).
* beta2. The exponential decay rate for the second-moment estimates (e.g. 0.999). This value should be set close to 1.0 on problems with a sparse gradient (e.g. NLP and computer vision problems).
* epsilon. A very small number that prevents division by zero in the implementation (e.g. 10E-8).
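
The four parameters above appear in the Adam update rule as follows. This is a minimal sketch for a single scalar parameter, assuming the standard update from the Adam paper. The function name `adam_minimize`, the toy objective $(\theta - 3)^2$, and the step count are hypothetical, and this is not how `tf.train.AdamOptimizer` is implemented internally.

```python
import math

def adam_minimize(grad, theta, alpha=0.001, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=1000):
    """Run `steps` Adam updates on one scalar parameter and return it."""
    m = 0.0  # first moment estimate (running mean of gradients)
    v = 0.0  # second moment estimate (running mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        theta -= alpha * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta

# Toy objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
result = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0, alpha=0.1, steps=500)
print(result)  # close to the minimum at 3
```

Because the update is divided by the square root of the second moment, the early steps have a size of roughly `alpha` regardless of how large the raw gradient is.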

In [8]:
# use AdamOptimizer to train
# More details about AdamOptimizer: https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
train_step = tf.train.AdamOptimizer(1e-2).minimize(loss)

In [9]:
# Initialize global variables
init = tf.global_variables_initializer()

In [14]:
# Calculate accuracy.
# tf.equal returns True where the predicted class matches the label, otherwise False.
correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# Cast True -> 1.0 and False -> 0.0, then take the mean.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 

sum_acc = 0

# Training loop: 21 epochs, each running through every batch once.
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(21):
        for batch in range(n_batch):
            batch_xs,batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step,{x:batch_xs, y:batch_ys})

        # Evaluate on the test set after each epoch.
        acc = sess.run(accuracy,{x:mnist.test.images, y:mnist.test.labels})
        print("Epoch " + str(epoch) + ",Testing Accuracy " + str(acc))
        sum_acc += acc
# Mean test accuracy over the 21 epochs.
print("Accuracy: " + str(sum_acc / 21))

Epoch 0,Testing Accuracy 0.9209
Epoch 1,Testing Accuracy 0.9257
Epoch 2,Testing Accuracy 0.9253
Epoch 3,Testing Accuracy 0.9282
Epoch 4,Testing Accuracy 0.9286
Epoch 5,Testing Accuracy 0.9277
Epoch 6,Testing Accuracy 0.9304
Epoch 7,Testing Accuracy 0.9303
Epoch 8,Testing Accuracy 0.932
Epoch 9,Testing Accuracy 0.9309
Epoch 10,Testing Accuracy 0.9328
Epoch 11,Testing Accuracy 0.9305
Epoch 12,Testing Accuracy 0.9326
Epoch 13,Testing Accuracy 0.9336
Epoch 14,Testing Accuracy 0.9306
Epoch 15,Testing Accuracy 0.9316
Epoch 16,Testing Accuracy 0.9309
Epoch 17,Testing Accuracy 0.9315
Epoch 18,Testing Accuracy 0.9269
Epoch 19,Testing Accuracy 0.9316
Epoch 20,Testing Accuracy 0.9311
Accuracy: 0.9296999971071879
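
The accuracy computation in the cell above (argmax, equality test, cast, mean) can be mirrored in plain Python. The helper names and the four three-class samples below are hypothetical and serve only to illustrate the steps.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(labels, predictions):
    """Fraction of rows whose predicted class matches the one-hot label."""
    hits = [1.0 if argmax(t) == argmax(p) else 0.0
            for t, p in zip(labels, predictions)]
    return sum(hits) / len(hits)

# Four hypothetical samples over three classes; the last one is misclassified.
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
predictions = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.2, 0.2, 0.6],
               [0.5, 0.3, 0.2]]
print(accuracy(labels, predictions))  # 0.75
```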


## Conclusion 


***
## Resources:
* https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6  
* http://yann.lecun.com/exdb/mnist/  
* https://en.wikipedia.org/wiki/MNIST_database
* https://www.tensorflow.org/api_docs/python/tf/zeros
* https://stackoverflow.com/questions/47377222/cross-entropy-function-python
* https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/

# End