In [9]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# The MNIST data


The MNIST data consists of pictures that each represent a digit, together with the label associated with each picture. The data set is split into three parts: training (`mnist.train`), testing (`mnist.test`) and validation (`mnist.validation`) data.

This split is very important: in machine learning it is essential to keep some data that the model never learns from, so that we can check whether what it learned actually generalizes.
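
MNIST already comes split, so nothing below is needed to run this notebook. Purely as an illustration of the idea, here is a minimal sketch of how a dataset could be cut into three disjoint parts. The helper name `split_indices` and the sizes used are made up for the example.

```python
import random

def split_indices(n_examples, n_validation, n_test, seed=0):
    """Shuffle example indices and cut them into three disjoint groups."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    test = indices[:n_test]
    validation = indices[n_test:n_test + n_validation]
    train = indices[n_test + n_validation:]
    return train, validation, test

# Hypothetical sizes, chosen only to keep the example small
train_idx, validation_idx, test_idx = split_indices(1000, n_validation=100, n_test=200)
print(len(train_idx), len(validation_idx), len(test_idx))  # 700 100 200
```

Because every index lands in exactly one group, nothing the model is evaluated on has been seen during training.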

In [10]:
%%capture
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
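
The `one_hot=True` argument asks for each label as a vector with a single 1 at the position of the digit, instead of the digit itself. A rough sketch of that encoding in plain Python follows; `one_hot` is a hypothetical helper written for this example, not part of TensorFlow's loader.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1.0 at position `label`."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```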

# Create the model

Common numerical computing libraries in Python rely on external code implemented in other, more efficient languages. However, switching back to Python after every operation adds overhead. This overhead is especially costly if you want to run computations on GPUs or in a distributed manner, where transferring data is expensive.


TensorFlow also does its heavy lifting outside Python. However, instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that runs entirely outside Python.
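
To make the "describe first, run later" idea concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented; the `Node`, `Placeholder` and `run` names are invented for this example and only mimic the pattern of building a graph and evaluating it afterwards.

```python
class Node:
    """A deferred operation: it stores what to compute, not the result."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs

    def __add__(self, other):
        return Node(lambda a, b: a + b, (self, other))

    def __mul__(self, other):
        return Node(lambda a, b: a * b, (self, other))

class Placeholder(Node):
    """A named input whose value is supplied only when the graph is run."""
    def __init__(self, name):
        super().__init__(op=None)
        self.name = name

def run(node, feed):
    """Evaluate a node by recursively evaluating its inputs."""
    if isinstance(node, Placeholder):
        return feed[node.name]
    args = [run(inp, feed) for inp in node.inputs]
    return node.op(*args)

a = Placeholder("a")
b = Placeholder("b")
expr = a * b + a                     # builds the graph, computes nothing yet
print(run(expr, {"a": 3, "b": 4}))   # 15
```

The expression is only a description until `run` is called with concrete values, which loosely mirrors how placeholders and `feed_dict` are used in this notebook.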

#### What to do here?
For this step, you need to create a TensorFlow graph that represents a neural network with no hidden layers and an output layer of 10 nodes.

In [11]:
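# Input images: each row is a flattened 28x28 picture (784 pixels)
x = tf.placeholder(tf.float32, [None, 784])
# Expected labels: one one-hot vector of length 10 per image
y_ = tf.placeholder(tf.float32, [None, 10])
# Weights and biases of the output layer, initialized to zero
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# Unnormalized scores (logits) for the 10 classes
y = tf.matmul(x, W) + b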
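
The model above computes `xW + b` and leaves the softmax to the loss function. As a rough illustration of what that means numerically, the sketch below redoes the computation in plain Python on a tiny stand-in (4 inputs and 3 classes instead of 784 and 10). The helper names `linear` and `softmax` are hypothetical.

```python
import math

def linear(x, W, b):
    """Compute x.W + b for one input row x; W has one row per input."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def softmax(logits):
    """Turn logits into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

n_inputs, n_classes = 4, 3           # tiny stand-ins for 784 and 10
W = [[0.0] * n_classes for _ in range(n_inputs)]
b = [0.0] * n_classes
x = [0.2, 0.5, 0.1, 0.9]

logits = linear(x, W, b)
probs = softmax(logits)
print(logits)   # [0.0, 0.0, 0.0]
print(probs)    # every class gets the same probability, 1/3
```

With zero-initialized weights and biases the untrained model assigns equal probability to every class, whatever the input.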

# Define cost function and optimizer

In order to train the model, we need to define what it means to improve the results after each iteration. For that, we use a cost function and try to minimize it. The cost function represents how far we are from our desired outcome. Minimizing the error leads us to a better model.

A common cost function is called "cross-entropy". With cross-entropy, larger errors produce larger updates, which reduces the learning slowdown that traditional cost functions (e.g. the quadratic cost function) can cause. In summary, it will take less time to train a good model.
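
As a small numerical illustration, the sketch below computes the softmax cross-entropy for a single example in plain Python. The helper names and the example logits are made up for the illustration; in the notebook this work is done by TensorFlow.

```python
import math

def softmax(logits):
    """Turn logits into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(label, logits):
    """Softmax cross-entropy between a one-hot label and a list of logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(label, probs) if t > 0)

label = [0.0, 1.0, 0.0]                                # true class is 1
undecided = cross_entropy(label, [0.0, 0.0, 0.0])       # equals log(3)
confident_right = cross_entropy(label, [0.0, 5.0, 0.0])
confident_wrong = cross_entropy(label, [5.0, 0.0, 0.0])
print(confident_right < undecided < confident_wrong)    # True
```

A confident wrong answer costs far more than an undecided one, which is what drives large corrections when the model is badly wrong.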

#### What to do here?
Create a tensor to represent the cross-entropy function. After that, create an operation that uses a `GradientDescentOptimizer` to minimize the cross-entropy.

In [12]:
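# Softmax cross-entropy between labels and logits, averaged over the batch
cross_entropy = tf.reduce_mean(
  tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

# One gradient descent update with a learning rate of 0.5
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)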
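
To give a feel for what a gradient descent step does, here is a deliberately simplified sketch in plain Python. It assumes the logits themselves are the trainable parameters (in the notebook the parameters are `W` and `b`, and TensorFlow derives the gradients for us). It relies on the standard result that the gradient of the softmax cross-entropy with respect to the logits is `probs - label`.

```python
import math

def softmax(logits):
    """Turn logits into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(label, logits):
    """Softmax cross-entropy between a one-hot label and a list of logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(label, probs) if t > 0)

learning_rate = 0.5                  # same value as in the cell above
label = [0.0, 1.0, 0.0]
logits = [0.0, 0.0, 0.0]             # simplification: updated directly

losses = []
for _ in range(5):
    losses.append(cross_entropy(label, logits))
    probs = softmax(logits)
    grad = [p - t for p, t in zip(probs, label)]          # d(loss)/d(logits)
    logits = [z - learning_rate * g for z, g in zip(logits, grad)]

print(losses)   # the loss shrinks at every step
```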

# Create a TensorFlow session

We have now defined our whole model (that is, we have built a complete TensorFlow graph). Now we need to launch it. We create an interactive session and initialize all the variables defined before.

#### What to do here?
Create an `InteractiveSession` and run the global variables initializer.

In [13]:
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Train

MNIST is a large dataset. Training on the whole dataset at once (batch learning) would make each update take a long time. Therefore, we use a small batch of randomly chosen data at each step. This method is called stochastic training.

#### What to do here?
In a for loop, take 100 random samples from `mnist.train` and run the train step on the resulting batch. Repeat the process as many times as needed so that, in the end, the whole training dataset has been presented.

In [14]:
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
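
The sketch below shows one common way to draw mini-batches: shuffle the example indices, walk through them in chunks, and reshuffle after each full pass. It is an illustration in plain Python, not a description of how `next_batch` is implemented. The `minibatches` helper is hypothetical, and the figure of 55,000 training images assumes the default split returned by `read_data_sets`.

```python
import random

def minibatches(n_examples, batch_size, n_steps, seed=0):
    """Yield lists of example indices, reshuffling after every full pass."""
    rng = random.Random(seed)
    order = list(range(n_examples))
    rng.shuffle(order)
    start = 0
    for _ in range(n_steps):
        if start + batch_size > n_examples:   # current pass is used up
            rng.shuffle(order)
            start = 0
        yield order[start:start + batch_size]
        start += batch_size

n_train, batch_size, n_steps = 55000, 100, 1000   # n_train is an assumption
batches = list(minibatches(n_train, batch_size, n_steps))

passes = n_steps * batch_size / n_train
print(round(passes, 2))   # 1.82
```

Under these assumptions, 1000 steps of 100 samples amount to a little under two passes over the training data.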

# Evaluate our model

In order to figure out our model's accuracy, we need to compare its results with the expected output. To calculate the accuracy, divide the number of correct classifications by the size of the test dataset.

#### What to do here?
Create a tensor that compares the model's output with the expected output. Then, determine the fraction of predictions that are correct.

In [7]:
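# True where the predicted class (largest logit) matches the labeled class
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
# Cast booleans to 0.0/1.0 and average them to get the fraction correct
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))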

0.9207
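
The same accuracy computation can be sketched in plain Python on a handful of made-up predictions. The `argmax` and `accuracy` helpers and the example values are hypothetical and only mirror the `tf.argmax`, `tf.equal` and `tf.reduce_mean` steps above.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(logits_batch, labels_batch):
    """Fraction of rows where the predicted class matches the labeled class."""
    correct = [argmax(logits) == argmax(label)
               for logits, label in zip(logits_batch, labels_batch)]
    return sum(correct) / len(correct)

# Made-up logits and one-hot labels for four examples and three classes
logits_batch = [[2.0, 0.1, -1.0],
                [0.3, 0.2, 0.9],
                [0.0, 1.5, 0.4],
                [1.2, 1.1, 0.0]]
labels_batch = [[1, 0, 0],
                [0, 0, 1],
                [0, 0, 1],
                [0, 1, 0]]

print(accuracy(logits_batch, labels_batch))  # 0.5
```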


# Conclusion

During the exercise we learned how to create a neural network and train it. We tested the performance of our model, and we found that the definition of the network architecture is important to obtain better results.

We learned the importance of the initialization step (random numbers instead of zeros) and how it impacts performance. Finally, it was shown that activation functions impact performance too.