# Digit Recognition Notebook

___
## [Introduction](https://www.digitalocean.com/community/tutorials/how-to-build-a-neural-network-to-recognize-handwritten-digits-with-tensorflow)
### [Neural networks](http://neuralnetworksanddeeplearning.com)
Neural networks are the models on which deep learning is built, loosely inspired by how a brain functions. The simulated neurons are connected in layers, and the connections between them have weights that determine how they respond to the signals passing through the network.

Neural networks and deep learning can be used to recognise handwritten digits by learning from labelled example data such as the MNIST dataset.

___
## [Configuration](https://corochann.com/mnist-dataset-introduction-1138.html)

### Dependencies
These libraries are required for the handwritten digit recognition program to work.

1. Create a folder called "tensorflow-demo" and move into it
2. Install the following libraries with these specific versions
    * *Image library* - version 1.5.20
    * *NumPy library* - version 1.14.3
    * *TensorFlow library* - version 1.4.0

___
## [MNIST Dataset](https://corochann.com/mnist-dataset-introduction-1138.html)

### About the dataset
The MNIST (Modified National Institute of Standards and Technology) database is a large database of handwritten digits used for classification and image recognition tasks. It is also often used in research to compare the performance of algorithms.
The dataset is made up of images of handwritten digits from 0 to 9, each 28x28 pixels in size.
![image](https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png)

### Libraries
**The TensorFlow library and the MNIST dataset need to be imported into the program**
* ***Code***: 
    import tensorflow as tf 
* ***Code***: 
    from tensorflow.examples.tutorials.mnist import input_data

**The dataset is then stored in a variable (called mnist in this case) and saved in a folder (called MNIST_data; you can change the name of the folder if you wish)**
* ***Code***: 
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) 

### Reading the dataset
#### One-hot-encoding
One-hot encoding represents each label from the MNIST dataset as a vector. Because the labels are the digits 0 to 9, the vector that represents them contains 10 values, each of which is either 0 or 1.

### Recognising the digits
#### How is each digit represented by the vector?
Exactly one of the 10 values is 1, at the position corresponding to the handwritten digit, while all the other values are 0.
For example, the vector [0, 0, 0, 0, 0, 1, 0, 0, 0, 0] represents the number 5, because the 1 is at index 5 when the positions are numbered from 0 to 9.
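input_data already performs this encoding when one_hot=True is passed, so the NumPy sketch below is only for illustration. The helper one_hot is a hypothetical name written for this example.

```python
import numpy as np


def one_hot(labels, n_classes=10):
    """Return one row per label, with a single 1 at the label's index."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, n_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


vectors = one_hot([5, 0, 9])
print(vectors[0])              # → [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(vectors.argmax(axis=1))  # → [5 0 9]  (argmax recovers the original digits)
```

Taking the argmax of a one-hot vector gives back the digit, which is the same trick used later to compare predictions with labels.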

### Explaining the visual side of the images
In order to represent the images, the 28x28 pixels are flattened into a 1D vector of 784 pixels. Each of the 784 pixels is stored as a value between 0 and 255 that represents its grayscale intensity: the background is 0, the pen strokes of the digit are 255 and the grey shades at their edges lie in between. When the dataset is read with input_data, these values are rescaled to floats between 0 and 1.
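The flattening can be seen on a small example. The sketch below uses a made-up 28x28 NumPy array with a single vertical stroke instead of a real MNIST image, and shows where each pixel ends up in the 784-value vector and how the 0 to 255 values map to the 0 to 1 range.

```python
import numpy as np

# A made-up 28x28 grayscale image: background 0, a vertical stroke of 255.
image = np.zeros((28, 28), dtype=np.uint8)
image[4:24, 13:15] = 255

# Flatten row by row into a single vector of 28 * 28 = 784 pixels.
vector = image.reshape(-1)
print(vector.shape)  # → (784,)

# The pixel at (row, col) ends up at index row * 28 + col.
row, col = 10, 13
assert vector[row * 28 + col] == image[row, col]

# Rescale the 0..255 integers to floats between 0 and 1.
scaled = vector.astype(np.float32) / 255.0
print(scaled.min(), scaled.max())  # → 0.0 1.0
```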

### Subsets
#### The dataset has been split into three subsets
1. Training images - 55,000 images
    * ***Code***: 
        n_train = mnist.train.num_examples 
2. Validation images - 5,000 images
    * ***Code***: 
        n_validation = mnist.validation.num_examples
3. Testing images - 10,000 images
    * ***Code***: 
        n_test = mnist.test.num_examples

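We can find out the size of each subset from its num_examples attribute, as in the code above. Each subset has a different role: the training images are used to update the parameters, the validation images to check the model while tuning it, and the test images to measure its final accuracy on images it has not seen.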
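MNIST itself is distributed as 60,000 training images and 10,000 test images, so the validation set is carved out of the training images. The sketch below assumes, as in TensorFlow's MNIST helper, that read_data_sets takes the first 5,000 training images for validation (its validation_size argument defaults to 5000). It uses image indices as stand-ins for the real pixel data, so it only checks the sizes.

```python
import numpy as np

# Stand-ins for the original MNIST files: one index per image, no real pixels.
original_train = np.arange(60000)
original_test = np.arange(10000)

validation_size = 5000  # assumed default of read_data_sets

# The first validation_size images become the validation set; the rest stay for training.
validation_images = original_train[:validation_size]
train_images = original_train[validation_size:]

n_train = train_images.shape[0]
n_validation = validation_images.shape[0]
n_test = original_test.shape[0]
print(n_train, n_validation, n_test)  # → 55000 5000 10000
```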

___
## [Neural Network Architecture](https://www.dspguide.com/ch26/2.htm)
### Layers
This neural network is formed of three kinds of layer
1. Input layer - the image of the handwritten digit, flattened to 784 (28x28) pixel values
    * ***Code***: 
        n_input = 784
2. Hidden layers - the layers between the input and output; the first one has 512 units
    * ***Code***: 
        n_hidden1 = 512
3. Output layer - one unit for each digit from 0 to 9; the unit with the highest value gives the predicted digit
    * ***Code***: 
        n_output = 10
    
Each layer consists of one or more nodes, and the lines between the nodes show the flow of information. In our case the information only flows from input to output (a feedforward network), but in networks with feedback the information can flow both ways.
![image](https://www.dspguide.com/graphics/F_26_5.gif)
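To make the layer sizes concrete, here is a NumPy sketch of one forward pass with n_input = 784, a single hidden layer of n_hidden1 = 512 units and n_output = 10. The weights, biases and batch are random stand-ins, and the batch size of 32 is arbitrary. Like the layer code later in this notebook it applies no activation function between layers; practical networks usually insert a non-linear one such as ReLU.

```python
import numpy as np

n_input, n_hidden1, n_output = 784, 512, 10
rng = np.random.default_rng(0)

# Random parameters standing in for trained weights and biases.
w1 = rng.normal(0.0, 0.1, size=(n_input, n_hidden1))
b1 = np.full(n_hidden1, 0.1)
w_out = rng.normal(0.0, 0.1, size=(n_hidden1, n_output))
b_out = np.full(n_output, 0.1)

# A batch of 32 flattened images; any batch size works with the same weights.
batch = rng.random((32, n_input))

hidden = batch @ w1 + b1                  # shape (32, 512)
logits = hidden @ w_out + b_out           # shape (32, 10): one score per digit
predicted_digits = logits.argmax(axis=1)  # the highest score wins

print(hidden.shape, logits.shape, predicted_digits.shape)  # → (32, 512) (32, 10) (32,)
```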

### Hyperparameters
[Hyperparameters](https://www.quora.com/What-are-hyperparameters-in-machine-learning) are settings that are chosen before training and kept constant while it runs. <br>
They are not learned from the training process; instead they express "high-level" properties of the model, such as:
* how complex it is
* how fast it should learn

#### The hyperparameters are the following:
1. Learning rate - how much the parameters are adjusted at each step of the learning process
    * ***Code***: 
        learning_rate = 1e-4
2. Number of iterations - how many training steps are run (one mini-batch per step)
    * ***Code***: 
        n_iterations = 1000
3. Batch size - how many training images are used at each step
    * ***Code***: 
        batch_size = 128
4. Dropout variable - the probability that each unit of the dropout layer is kept during training; the other units are randomly eliminated
    * ***Code***: 
        dropout = 0.5
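The number of iterations and the batch size together decide how much of the training set the model sees. A quick calculation with the values above, assuming one mini-batch per iteration and the 55,000-image training set:

```python
n_train = 55000
batch_size = 128
n_iterations = 1000

images_seen = n_iterations * batch_size  # images processed in total
epochs = images_seen / n_train           # full passes over the training set
batches_per_epoch = n_train / batch_size

print(images_seen, round(epochs, 2), round(batches_per_epoch, 1))  # → 128000 2.33 429.7
```

So 1,000 iterations correspond to a little over two passes through the training images.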

___
## TensorFlow Graph
A TensorFlow graph has to be set up in order to build the network.  <br>
TensorFlow is a framework that uses tensors - a data structure similar to arrays or lists - to define and run computations.

### Placeholders
In this program we will define 3 tensors as placeholders:
* ***Code***: 
    X = tf.placeholder("float", [None, n_input])
* ***Code***: 
    Y = tf.placeholder("float", [None, n_output])
* ***Code***: 
    keep_prob = tf.placeholder(tf.float32) 
    
    The most important arguments are the shapes, which describe the size of the data. <br>
    Here **None** stands for any number of images, so a batch of any size can be fed in; each input image has **n_input** (784 = 28x28) values and each label has **n_output** (10) values. <br>
    The last placeholder, keep_prob, holds the probability of keeping each unit in the dropout layer, which lets us use the same graph to *train* (keep_prob = 0.5) and *test* (keep_prob = 1.0) the network.

### Weights
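The parameters that will be updated during training are the weight and bias values, so unlike placeholders they need initial values. They are where the network does its learning, as they represent the strength of the connections between units.

The weights are initialised with small random values close to 0, drawn from a truncated normal distribution with a standard deviation of 0.1. Because they are close to 0 they can adjust in either a positive or negative direction, and because they differ slightly from one another, each unit produces different errors and learns something different.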
* ***Code***: <br>
    weights = { <br>
    'w1': tf.Variable(tf.truncated_normal([n_input, n_hidden1], stddev=0.1)),<br>
    'w2': tf.Variable(tf.truncated_normal([n_hidden1, n_hidden2], stddev=0.1)),<br>
    ... <br>
    }

### Biases
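The biases are initialised to a small constant value (0.1) to make sure that the units activate in the initial stages and therefore contribute to the propagation of values through the network.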
* ***Code***: <br>
    biases = { <br>
    'b1': tf.Variable(tf.constant(0.1, shape=[n_hidden1])), <br>
    'b2': tf.Variable(tf.constant(0.1, shape=[n_hidden2])), <br>
    ... <br>
    }
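A NumPy sketch of both initialisation schemes, assuming tf.truncated_normal behaves as documented: values more than two standard deviations from the mean are discarded and redrawn. The helper truncated_normal below is hypothetical and only mimics that behaviour; it is not TensorFlow's implementation.

```python
import numpy as np


def truncated_normal(shape, stddev, rng):
    """Hypothetical helper: mean-0 normal samples, redrawing any beyond 2 * stddev."""
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=int(out_of_range.sum()))
        out_of_range = np.abs(values) > 2 * stddev
    return values


rng = np.random.default_rng(42)
w1 = truncated_normal((784, 512), stddev=0.1, rng=rng)  # like weights['w1']
b1 = np.full(512, 0.1)                                  # like biases['b1']

print(w1.shape, np.abs(w1).max() <= 0.2, b1[:3])  # → (784, 512) True [0.1 0.1 0.1]
```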

### Layers
These define how the tensors are transformed from one layer to the next.
1. Each hidden layer multiplies the previous layer's outputs by the current layer's weights and adds the current layer's biases
    * ***Code***: <br>
        layer_1 = tf.add(tf.matmul(X, weights['w1']), biases['b1']) <br>
        layer_2 ...
2. The last hidden layer (layer_3) then has a dropout operation applied to it, controlled by keep_prob (0.5 during training)
    * ***Code***: <br>
        layer_drop = tf.nn.dropout(layer_3, keep_prob)
3. The output layer multiplies the output of the dropout operation (layer_drop) by the output weights and adds the output biases
    * ***Code***: <br>
        output_layer = tf.matmul(layer_drop, weights['out']) + biases['out']
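As documented for tf.nn.dropout, each element is kept with probability keep_prob and scaled up by 1 / keep_prob, and otherwise set to 0, so the expected value of each output stays the same. A NumPy sketch of that behaviour; the function name dropout and the all-ones activations are made up for illustration.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob, scaling survivors by 1 / keep_prob."""
    if keep_prob == 1.0:
        return x  # testing: every unit stays active
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)
activations = np.ones((1000, 128))

train_out = dropout(activations, keep_prob=0.5, rng=rng)
test_out = dropout(activations, keep_prob=1.0, rng=rng)

print((train_out == 0).mean())  # about half of the units are dropped
print(train_out.mean())         # the mean stays close to 1.0
```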

### Loss function
The loss measures how far the predictions are from the labels: the lower the loss, the more accurate the predictions tend to be. <br>
[Cross-entropy](https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html) is a loss function that measures the performance of a classification model whose output is a probability value between 0 and 1.
![image](https://ml-cheatsheet.readthedocs.io/en/latest/_images/cross_entropy.png)
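tf.nn.softmax_cross_entropy_with_logits first turns the raw output scores (logits) into probabilities with the softmax function and then computes the cross-entropy with the one-hot label, loss = -sum(y * log(p)). A NumPy sketch of that calculation for a single image, using made-up scores:

```python
import numpy as np


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    exp = np.exp(logits - logits.max())  # subtracting the max avoids overflow
    return exp / exp.sum()


def cross_entropy(label, logits):
    """-sum(y * log(p)): the negative log-probability of the true class."""
    return -np.sum(label * np.log(softmax(logits)))


label = np.zeros(10)
label[5] = 1.0  # one-hot label for the digit 5

confident_right = np.array([0, 0, 0, 0, 0, 5.0, 0, 0, 0, 0])
confident_wrong = np.array([0, 0, 5.0, 0, 0, 0, 0, 0, 0, 0])

print(cross_entropy(label, confident_right))  # ≈ 0.06: low loss
print(cross_entropy(label, confident_wrong))  # ≈ 5.06: high loss
```

A confident correct prediction costs little, while a confident wrong one is penalised heavily.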

[Gradient descent](https://towardsdatascience.com/gradient-descent-in-a-nutshell-eaf8c18212f0) is an optimization algorithm that iteratively adjusts a function's parameters to find the values that minimize the loss function. At each step it computes the gradient, which measures how the loss changes when each parameter changes, and moves the parameters a small step in the opposite direction. The learning rate sets the size of these steps, and therefore how quickly and how reliably Gradient Descent approaches a local minimum.
![image](https://cdn-images-1.medium.com/max/1600/0*QwE8M4MupSdqA3M4.png)
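A minimal sketch of gradient descent, assuming a made-up one-parameter loss f(w) = (w - 3)^2, whose minimum is at w = 3 and whose gradient is 2(w - 3). It repeats the update w = w - learning_rate * gradient and shows the effect of the learning rate: too small and the parameter barely moves, too large and it overshoots.

```python
def gradient_descent(grad, w, learning_rate, n_steps):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    for _ in range(n_steps):
        w = w - learning_rate * grad(w)
    return w


def grad(w):
    # Derivative of the toy loss f(w) = (w - 3)^2, which is smallest at w = 3.
    return 2 * (w - 3)


print(gradient_descent(grad, w=0.0, learning_rate=0.1, n_steps=100))    # very close to 3
print(gradient_descent(grad, w=0.0, learning_rate=0.001, n_steps=100))  # about 0.54, still far from 3
print(gradient_descent(grad, w=0.0, learning_rate=1.1, n_steps=100))    # overshoots and diverges
```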

[Adam optimizer](https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/) is an extension of gradient descent that updates the network weights iteratively based on the training data, adapting the step size for each parameter.

* ***Code***: <br>
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=output_layer)) <br>
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
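For comparison with plain gradient descent, here is a NumPy sketch of the Adam update rule as described in Kingma and Ba's Adam paper, with its usual defaults beta1 = 0.9, beta2 = 0.999 and epsilon = 1e-8, applied to a made-up loss f(w) = (w - 3)^2. tf.train.AdamOptimizer applies this rule to every trainable variable, although its implementation may differ slightly in where epsilon is added. The learning rate of 0.1 is chosen only so the toy example converges quickly; the notebook itself uses 1e-4.

```python
import numpy as np


def adam(grad, w, learning_rate=0.1, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=500):
    """Minimise a function with the Adam update rule, given its gradient."""
    m = np.zeros_like(w)  # running average of the gradients (first moment)
    v = np.zeros_like(w)  # running average of the squared gradients (second moment)
    for t in range(1, n_steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # correct the bias towards 0 in early steps
        v_hat = v / (1 - beta2 ** t)
        w = w - learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return w


def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, which is minimised at w = 3.
    return 2 * (w - 3)


print(adam(grad, w=np.array([0.0])))  # close to [3.]
```

Dividing by the square root of v_hat gives each parameter its own effective step size, which is what distinguishes Adam from the fixed step of plain gradient descent.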

___
## Train
The MNIST training images are used to minimize the loss function. At each iteration the program takes a mini-batch of training images from the dataset and updates the parameters to reduce the loss, which in turn makes the digit predictions more accurate.

### Evaluate the accuracy
First and foremost we have to define our method of evaluating the accuracy. <br>
This lets us print the accuracy on mini-batches of data during training and check that it increases as the loss decreases. <br>

1. Compare the predictions (output_layer) with the labels (Y): tf.argmax picks the digit with the highest value in each row, and tf.equal outputs a list of booleans showing which images are predicted correctly
    * ***Code***:
        correct_pred = tf.equal(tf.argmax(output_layer, 1), tf.argmax(Y, 1))
2. Cast the booleans to floats (1.0 for correct, 0.0 for incorrect) and take their mean to obtain the accuracy
    * ***Code***: 
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
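The same accuracy calculation in NumPy, with made-up output values and one-hot labels for four images, shows what correct_pred and accuracy contain:

```python
import numpy as np

# Made-up output layer values (logits) for 4 images.
logits = np.array([
    [0.1, 2.0, 0.3, 0, 0, 0, 0, 0, 0, 0.0],  # predicts 1
    [3.0, 0.2, 0.1, 0, 0, 0, 0, 0, 0, 0.0],  # predicts 0
    [0.0, 0.1, 0.2, 0, 0, 0, 0, 0, 0, 1.5],  # predicts 9
    [0.5, 0.4, 0.3, 0, 0, 0, 0, 0, 0, 0.0],  # predicts 0
])
labels = np.eye(10)[[1, 0, 9, 7]]  # one-hot labels for the true digits 1, 0, 9, 7

correct_pred = logits.argmax(axis=1) == labels.argmax(axis=1)
accuracy = correct_pred.astype(np.float32).mean()

print(correct_pred)  # → [ True  True  True False]
print(accuracy)      # → 0.75
```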

### Initialize session
A session is needed to run the graph: after the variables are initialised, it is fed training examples to train the model and test examples to determine its accuracy.
* ***Code***: <br>
    init = tf.global_variables_initializer() <br>
    sess = tf.Session() <br>
    sess.run(init)

There are four main steps to go through in order to minimize the loss function:
1. Propagate values forward through the network
2. Compute the loss
3. Propagate values backward through the network
4. Update the parameters
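The four steps can be seen in a small, self-contained NumPy sketch. To keep it short, it trains a single softmax layer (not the full network above) on made-up data with three classes, uses plain mini-batch gradient descent and writes the backward pass by hand. In the notebook, TensorFlow computes the gradients and Adam performs the update. The data, learning rate, batch size and iteration count are illustrative values only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 600, 10, 3

# Made-up data: each class is a cloud of points around its own random centre.
centres = rng.normal(0.0, 1.0, size=(n_classes, n_features))
classes = rng.integers(0, n_classes, size=n_samples)
features = centres[classes] + rng.normal(0.0, 0.5, size=(n_samples, n_features))
one_hot_labels = np.eye(n_classes)[classes]

# Small random weights and small constant biases.
W = rng.normal(0.0, 0.1, size=(n_features, n_classes))
b = np.full(n_classes, 0.1)
learning_rate, batch_size, n_iterations = 0.1, 32, 300


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


losses = []
for _ in range(n_iterations):
    idx = rng.choice(n_samples, size=batch_size, replace=False)
    x_batch, y_batch = features[idx], one_hot_labels[idx]

    # 1. Propagate values forward: logits, then probabilities.
    probs = softmax(x_batch @ W + b)
    # 2. Compute the loss: mean cross-entropy over the mini-batch.
    loss = -np.mean(np.sum(y_batch * np.log(probs + 1e-12), axis=1))
    losses.append(loss)
    # 3. Propagate backward: gradient w.r.t. the logits, then w.r.t. W and b.
    d_logits = (probs - y_batch) / batch_size
    grad_W = x_batch.T @ d_logits
    grad_b = d_logits.sum(axis=0)
    # 4. Update the parameters a small step against the gradient.
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

accuracy = np.mean((features @ W + b).argmax(axis=1) == classes)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, accuracy {accuracy:.2f}")
```

The gradient of softmax cross-entropy with respect to the logits is simply the predicted probabilities minus the one-hot labels, which keeps the hand-written backward pass short.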

Now we can run the session and check the accuracy of our program on the test images, with keep_prob set to 1.0 so that all units are active.
* ***Code***: <br>
    test_accuracy = sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels, keep_prob:1.0}) <br>
    print("\nTest Accuracy:", test_accuracy)    

___
## Test
Once the model has been trained, we can run it on the test images and keep track of how many of them are predicted correctly. This count is used to calculate the accuracy of the program.

### Libraries
In order to test images of your own, you need to import the required libraries:
* ***Code***: 
    import numpy as np
* ***Code***:
    from PIL import Image

### Load test image
Now, to check if the prediction works, test it on an image of your own. <br>
**Note:** The image should be 28x28 pixels, with a white background and a black handwritten number.

* ***Code***:
    img = np.invert(Image.open("test_img.png").convert('L')).ravel() / 255.0

The Image.open function reads in the image in a different representation from the one used when reading in the MNIST dataset. <br>
This means that we need to convert the sample image to grayscale using the L parameter. np.invert then turns it into a NumPy array and inverts it, changing white background and black writing into black background and white writing, as in MNIST. After that we flatten the array with ravel and divide by 255 so that the pixel values lie between 0 and 1, like the MNIST images loaded by input_data.
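To see these steps without a PNG file, the sketch below uses a made-up 28x28 uint8 NumPy array as a stand-in for what Image.open(...).convert('L') would produce: a white background (255) with a black stroke (0). It then applies np.invert, ravel and the division by 255.

```python
import numpy as np

# Stand-in for a grayscale test image: white background (255), black stroke (0).
gray = np.full((28, 28), 255, dtype=np.uint8)
gray[5:23, 14] = 0

inverted = np.invert(gray)      # for uint8 this is 255 - value: white stroke on black
img = inverted.ravel() / 255.0  # 784 values between 0 and 1

print(img.shape, img.max(), img.min())  # → (784,) 1.0 0.0
```

np.invert is a bitwise NOT, which for 8-bit unsigned values is the same as 255 minus the value, so it swaps black and white exactly.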

### Prediction
Now that the model has been trained and tested, we can finally predict the digit in our own image
* ***Code***: <br>
    prediction = sess.run(tf.argmax(output_layer, 1), feed_dict={X: [img], keep_prob: 1.0}) <br>
    print("Prediction for test image:", np.squeeze(prediction))
    
The function np.squeeze removes the single-image batch dimension, so the prediction is printed as a single number rather than a one-element array.
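The shapes involved can be checked in NumPy with a made-up output for a batch containing one image:

```python
import numpy as np

# Made-up output layer values for a batch containing a single image.
output = np.array([[0.2, -1.0, 0.5, 0.1, 0.0, 3.2, 0.3, -0.4, 0.9, 0.0]])

prediction = np.argmax(output, axis=1)  # shape (1,): one prediction per image
digit = np.squeeze(prediction)          # shape (): the batch dimension is removed

print(prediction.shape, digit.shape, int(digit))  # → (1,) () 5
```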