# Introduction

Let's perform classification on the MNIST dataset with two neural networks:

 - Multi-Layer Perceptron
 - Convolutional Neural Network


# Data

## What is MNIST?

It's a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image.

### Import with TensorFlow

We get the dataset directly from the TensorFlow API, so it doesn't contain image files: each image has already been converted into an array of pixel values.

One-hot encoding will be used for the labels.

#### One-Hot
<pre>
Number representation:    5
Class index:             [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
One-hot vector:           0   0   0   0   0   1   0   0   0   0
</pre>
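As a quick illustration, the NumPy sketch below builds the same kind of one-hot vectors that `read_data_sets(..., one_hot=True)` returns for the labels. The helper `one_hot` is hypothetical and only for illustration; it is not part of the TensorFlow API.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is exactly the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([5, 0, 9])
print(encoded[0])              # the vector for digit 5: a 1 at index 5, zeros elsewhere
print(encoded.argmax(axis=1))  # decoding with argmax recovers the digits 5, 0, 9
```

Decoding with `argmax` is also how predictions and labels are compared when computing accuracy at the end of the notebook.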

In [1]:
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data',one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data\train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz


A quick overview of the data:

In [10]:
print('For example: The first "image" of the dataset:\n %s' %mnist.train.images[0])


For example: The first "image" of the dataset:
 [ 0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.

## Understanding the imported data

The imported data is divided as follows:

 - Training (mnist.train): inputs and their corresponding labels, used to train the network.
    - 55,000 observations
    - mnist.train.images for inputs
    - mnist.train.labels for outputs
 - Validation (mnist.validation): same structure as the training set; typically used to tune hyperparameters and check for overfitting.
    - 5,000 observations
    - mnist.validation.images for inputs
    - mnist.validation.labels for outputs
 - Test (mnist.test): the model does not see this information before the test phase. It is used to evaluate the performance (accuracy) of the model.
    - 10,000 observations
    - mnist.test.images for inputs
    - mnist.test.labels for outputs

# 1st: Multi-Layer Perceptron

The simplest type of neural network for classifying the MNIST digits. Note that the model built below has no hidden layer: it is a single-layer network that performs softmax regression.

### Interactive session

Instead of building the whole computation graph first and then running it in a session (the usual workflow), we use an interactive session. It installs itself as the default session, so we can write the code and evaluate tensors and run operations on the fly.

In [11]:
sess = tf.InteractiveSession()

### Creating Placeholders

Note: the 'shape' argument defines the size of the tensor along each of its dimensions.

**Placeholder 'X':** Represents the input images
 - Each image is 28 x 28 px -> 784 pixels
 - 1st dimension: Indicates the **batch size** => None (any size)
 - 2nd dimension: Indicates the **number of pixels (features in general)** on a single flattened MNIST image => 784 (28*28)
 
**Placeholder 'Y':** Represents the output labels
 - 10 possible classes (0,1,2,3,4,5,6,7,8,9)
 - 1st dimension: Indicates the **batch size** => None (any size)
 - 2nd dimension: Indicates the number of **possible classes** => 10
 
**dtype:** In general, use tf.float32. Some functions only accept float32 or float64 inputs.
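To see where the 784 and the `None` come from, here is a NumPy sketch that flattens a batch of 28 x 28 images into rows of 784 features. The random pixel values and the batch size of 32 are assumptions standing in for real MNIST data, which the TensorFlow loader already delivers in flattened form. The `_np` suffix keeps these names from overwriting the TensorFlow objects in the notebook.

```python
import numpy as np

# Hypothetical batch of 32 grayscale 28x28 images (random values stand in for real pixels)
rng = np.random.default_rng(0)
images_np = rng.random((32, 28, 28), dtype=np.float32)

# Flatten each image into a 784-element row; -1 lets NumPy infer the batch size,
# which plays the same role as the None batch dimension of the 'x' placeholder
flat_np = images_np.reshape(-1, 28 * 28)
print(flat_np.shape)  # → (32, 784)
```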

In [21]:
print('Training dataset shape:', mnist.train.images.shape)
numTrainInstances = mnist.train.images.shape[0]
numFeatures = mnist.train.images.shape[1]
print('Num instances: %d \nNum features: %d' %(numTrainInstances,numFeatures))
print('Training dataset labels shape:', mnist.train.labels.shape)
numLabels = mnist.train.labels.shape[1]
print('Num possible classes: %d' %(numLabels))

x = tf.placeholder(dtype=tf.float32,shape=[None,numFeatures],name='x_placeholder')
y_ = tf.placeholder(dtype=tf.float32,shape=[None,numLabels],name='y_gold_placeholder')

Training dataset shape: (55000, 784)
Num instances: 55000 
Num features: 784
Training dataset labels shape: (55000, 10)
Num possible classes: 10


### Weights and Bias

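This time, the weights and biases are initialized with zeros. This works here because the model has no hidden layer; in networks with hidden layers, weights are usually initialized randomly to break the symmetry between units.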

In [24]:
#Weight Tensor
W = tf.Variable(tf.zeros(dtype=tf.float32,shape=[numFeatures,numLabels]))
#Bias Tensor
b = tf.Variable(tf.zeros(dtype=tf.float32,shape=[numLabels]))
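Why zero initialization is acceptable for this single-layer model can be checked with a small NumPy sketch: at the start every class gets the same probability, but the gradient for each class's weights depends on its own labels, so the first update already treats the classes differently. The random inputs and the labels 3, 1, 7, 3 are assumptions for illustration; the `_np` names avoid clobbering the notebook's TensorFlow variables.

```python
import numpy as np

num_features_np, num_labels_np = 784, 10
W_np = np.zeros((num_features_np, num_labels_np), dtype=np.float32)
b_np = np.zeros(num_labels_np, dtype=np.float32)

# Random stand-in batch: 4 inputs with one-hot labels for the classes 3, 1, 7, 3
rng = np.random.default_rng(0)
x_np = rng.random((4, num_features_np), dtype=np.float32)
y_np = np.eye(num_labels_np, dtype=np.float32)[[3, 1, 7, 3]]

# With zero weights and biases every logit is 0, so softmax assigns 0.1 to each class
logits_np = x_np @ W_np + b_np
probs_np = np.exp(logits_np) / np.exp(logits_np).sum(axis=1, keepdims=True)
print(probs_np[0])

# Gradient of the mean cross-entropy w.r.t. W: x^T (probs - labels) / batch_size
grad_W_np = x_np.T @ (probs_np - y_np) / x_np.shape[0]
# The columns for class 3 and class 0 differ because the labels differ,
# so the very first update already moves each class's weights differently
print(np.allclose(grad_W_np[:, 3], grad_W_np[:, 0]))  # → False
```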

Initialize the variables. Because we are in an interactive session, `run()` uses it as the default session, so no session needs to be passed explicitly.

In [26]:
tf.global_variables_initializer().run()

### Adding Weights and Biases to input

The figure below represents the operation: a matrix multiplication between x (inputs) and W (weights), followed by the addition of the biases b.

<img src="https://ibm.box.com/shared/static/88ksiymk1xkb10rgk0jwr3jw814jbfxo.png" alt="HTML5 Icon" style="width:350px;height:306px;"> 

In [31]:
#Mathematical representation of image above
tf.matmul(x,W) + b;
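The shapes involved can be checked with an equivalent NumPy sketch. The batch size of 5 and the random values are placeholders for illustration only. The point is that `b`, with shape `(10,)`, is broadcast and added to every row of the `(batch, 10)` product.

```python
import numpy as np

rng = np.random.default_rng(1)
batch_np, num_features_np, num_labels_np = 5, 784, 10

x_np = rng.random((batch_np, num_features_np))        # stand-in inputs, one row per image
W_np = rng.normal(size=(num_features_np, num_labels_np))
b_np = rng.normal(size=num_labels_np)

# (5, 784) @ (784, 10) -> (5, 10); b_np of shape (10,) is added to every row
logits_np = x_np @ W_np + b_np
print(logits_np.shape)  # → (5, 10)
```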

### Softmax Regression

Softmax is an **activation function** that is normally used in classification problems. 

It "squashes" a K-dimensional vector of arbitrary real values into a K-dimensional vector of values in the range [0, 1] that add up to 1, so the output can be read as a probability distribution over the classes:

In [32]:
#SoftMax
y = tf.nn.softmax(tf.matmul(x,W) + b)
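For a vector of logits $z$ with $K$ entries, softmax is defined as $\mathrm{softmax}(z)_i = \dfrac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$. A minimal NumPy sketch of that formula follows. Subtracting the row maximum first is a standard trick (an addition here, not something the notebook does) that avoids overflow without changing the result.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the max avoids overflow and leaves the result unchanged."""
    z = np.asarray(z, dtype=np.float64)
    shifted = z - z.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

p = softmax([1.0, 2.0, 3.0])
print(p)  # approximately [0.090, 0.245, 0.665]; the entries sum to 1
```

The largest logit gets the largest probability, but the smaller logits still receive non-zero probability.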

The logistic (sigmoid) function is used for binary (0/1) classification.

The softmax function generalizes the logistic function: it outputs a categorical probability distribution over multiple classes.
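The "generalization" can be made concrete. With two classes whose logits are $[t, 0]$, the first softmax output is $\frac{e^t}{e^t + 1} = \frac{1}{1 + e^{-t}}$, which is exactly the logistic function. The NumPy sketch below checks this numerically for a few arbitrary values of `t`.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def softmax(z):
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()

# Two-class softmax on logits [t, 0] reproduces the logistic function
for t in [-3.0, 0.0, 2.5]:
    print(t, softmax([t, 0.0])[0], sigmoid(t))
```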

### Cost function

It measures the **difference between the correct answers and the outputs estimated** by the network; training adjusts the weights and biases to **minimize** it.

---
**But first, let's see what the following functions do.**

The following code shows an example of cross-entropy for a minibatch of size 2 with 3 possible classes.

In [16]:
import numpy as np

y_gold_test = [[1.0,0.0,0.0],[1.0,0.0,0.0]]


#Each row of output_test is a probability distribution over the 3 classes (it sums to 1)
output_test = [[0.9,0.05,0.05],[0.9,0.05,0.05]]
print('Some good predictions give us a cross-entropy of: %.3f' % np.mean(-np.sum(y_gold_test * np.log(output_test),1)))

output_test = [[0.5,0.3,0.2],[0.4,0.3,0.3]]
print('Some bad predictions give us a cross-entropy of: %.3f' % np.mean(-np.sum(y_gold_test * np.log(output_test),1)))

Some good predictions give us a cross-entropy of: 0.105
Some bad predictions give us a cross-entropy of: 0.805


**reduce_sum** computes the sum of the elements of **(y_ * tf.log(y))** across the second dimension (axis=1) of the tensor, and **reduce_mean** computes the mean of all elements in the tensor.

---

In [34]:
#Cost function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),axis=[1])) 

#axis=1 => The outputs, columns (axis=0 are the observations, rows)
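One caveat with `tf.log(y)`: if softmax outputs exactly 0 for some class, its log is `-inf`, and `0 * -inf` gives `nan`. The NumPy sketch below contrasts that naive formula with a numerically stable version based on $\log \mathrm{softmax}(z) = z - \log\sum_j e^{z_j}$. The logits are made-up values chosen to trigger the problem. TensorFlow's fused `tf.nn.softmax_cross_entropy_with_logits` op takes the logits directly for this reason; the sketch below illustrates the idea and is not that op's implementation.

```python
import numpy as np

def naive_cross_entropy(logits, y_true):
    """Mirrors mean(-reduce_sum(y_ * log(softmax(logits)), axis=1))."""
    e = np.exp(logits)
    probs = e / e.sum(axis=1, keepdims=True)
    return np.mean(-np.sum(y_true * np.log(probs), axis=1))

def stable_cross_entropy(logits, y_true):
    """Uses log_softmax(z) = z - logsumexp(z), computed with the max-shift trick."""
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return np.mean(-np.sum(y_true * log_probs, axis=1))

y_true = np.array([[1.0, 0.0, 0.0]])
mild = np.array([[2.0, 1.0, 0.1]])
extreme = np.array([[-800.0, 800.0, 0.0]])  # very confident and wrong

print(naive_cross_entropy(mild, y_true), stable_cross_entropy(mild, y_true))  # same value
with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
    naive_extreme = naive_cross_entropy(extreme, y_true)
print(naive_extreme)                          # nan: exp(800) overflows to inf
print(stable_cross_entropy(extreme, y_true))  # 1600.0
```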

### Type of optimization: Gradient Descent

Configure the optimizer. TensorFlow offers several optimizers; here we use Gradient Descent.

In [35]:
learning_rate = 0.5
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
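`minimize(cross_entropy)` computes the gradients and applies the update for us. To make the update rule visible, here is a NumPy sketch of plain gradient descent for softmax regression. It uses the fact that the gradient of the mean cross-entropy with respect to the logits is `(probs - labels) / n`. The synthetic 2-D clusters and the step size of 0.1 are assumptions chosen for this toy problem and are not the MNIST setup. The `_np` names avoid overwriting the notebook's TensorFlow objects.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mean_cross_entropy(probs, y_true):
    return np.mean(-np.sum(y_true * np.log(probs), axis=1))

# Synthetic stand-in data: 300 points from 3 well-separated 2-D clusters
rng = np.random.default_rng(0)
centers = np.array([[0.0, 3.0], [3.0, 0.0], [-3.0, -3.0]])
labels_np = rng.integers(0, 3, size=300)
x_np = centers[labels_np] + rng.normal(scale=0.5, size=(300, 2))
y_np = np.eye(3)[labels_np]

W_np = np.zeros((2, 3))
b_np = np.zeros(3)
lr_np = 0.1  # assumed step size for this toy problem

losses = []
for _ in range(100):
    probs = softmax(x_np @ W_np + b_np)
    losses.append(mean_cross_entropy(probs, y_np))
    # For softmax + cross-entropy, d(loss)/d(logits) = (probs - labels) / n
    grad_logits = (probs - y_np) / x_np.shape[0]
    W_np -= lr_np * (x_np.T @ grad_logits)
    b_np -= lr_np * grad_logits.sum(axis=0)

print('loss: %.3f -> %.3f' % (losses[0], losses[-1]))
```

The first loss is $\ln 3$, because zero weights give every class probability 1/3. After that, each step moves the weights against the gradient, so the loss goes down.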

### Training batches

Train using minibatch Gradient Descent.

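Full-batch Gradient Descent computes the gradient over all 55,000 training examples at every step, which is **computationally expensive**. Minibatch Gradient Descent instead estimates the gradient from a small sample of examples at each step. The updates are noisier but much cheaper. Because softmax regression has a convex cost function, gradient descent with a suitable learning rate can approach the **global minimum**.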
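The following NumPy sketch shows one common way to form minibatches: shuffle the example indices once per epoch and slice them into chunks. `iterate_minibatches` is a hypothetical helper for illustration and is not the code behind `mnist.train.next_batch`.

```python
import numpy as np

def iterate_minibatches(num_examples, batch_size, rng):
    """Yield index arrays that visit every example exactly once, in shuffled order."""
    order = rng.permutation(num_examples)
    for start in range(0, num_examples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
epoch_batches = list(iterate_minibatches(55000, 50, rng))
print(len(epoch_batches))  # → 1100
```

With a batch size of 50, one full pass (epoch) over the 55,000 training examples takes 1,100 steps.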

First, let's take a brief look at a batch.

In [62]:
batch = mnist.train.next_batch(50)
# next_batch returns a tuple: (inputs, labels)
batch_inputs = batch[0]
batch_labels = batch[1]

print('Batch inputs shape: ',batch_inputs.shape)
print('The first image on batch:\n %s' % batch_inputs[0])
print('Batch labels shape: ',batch_labels.shape)
print('The label of the first image on batch:\n %s' %batch_labels[0])

Batch inputs shape:  (50, 784)
The first image on batch:
 [ 0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.          0.          0.          0.          0.          0.
  0.          0.  

**Batch summary**

It's a tuple where:
 - batch[0] => Observations
    - batch[0][0] => First observation
 - batch[1] => Labels
    - batch[1][0] => First observation label

### Test

Let's train the model and then measure its accuracy on the test set.



In [64]:
#Load 50 training examples for each training step
batch_size = 50
num_steps = 1000  # 1000 steps x 50 examples = 50,000 examples, slightly less than one epoch (55,000)
for step in range(num_steps):
    #Get batch
    batch = mnist.train.next_batch(batch_size)
    
    #Prepare feed dict => We need to feed [x,y_]
    feed = {x : batch[0], y_ : batch[1]}
    
    #Run the optimizer
    train_step.run(feed_dict=feed)

In [66]:
correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(y_,1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

acc = accuracy.eval(feed_dict={x : mnist.test.images, y_ : mnist.test.labels}) * 100

print('The final accuracy for the simple ANN model is: %.2f' % acc)

The final accuracy for the simple ANN model is: 91.35
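The accuracy computation above (`argmax` on both tensors, compare, cast, average) can be reproduced in NumPy on a tiny hand-made example. The four probability rows and labels are invented for illustration: two predictions match their labels and two do not.

```python
import numpy as np

# Hypothetical predicted probabilities for 4 images and their one-hot gold labels
probs_np = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.3, 0.3, 0.4],
                     [0.6, 0.3, 0.1]])
gold_np = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 1, 0],
                    [0, 0, 1]])

# Same logic as tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) followed by reduce_mean
correct_np = np.argmax(probs_np, axis=1) == np.argmax(gold_np, axis=1)
accuracy_np = correct_np.astype(np.float32).mean() * 100
print('Accuracy: %.2f%%' % accuracy_np)  # → Accuracy: 50.00%
```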


In [68]:
#End session - Good bye
sess.close()