In [None]:
#!pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.9.0-cp27-none-linux_x86_64.whl

### Hello, Tensor World!
Let’s analyze the Hello World script you ran. For reference, I’ve added the code below.

In [1]:
import tensorflow as tf

# Create TensorFlow object called hello_constant
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)

Hello World!


### Tensor

In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated in an object called a tensor. In the case of ```hello_constant = tf.constant('Hello World!')```, ```hello_constant``` is a 0-dimensional string tensor, but tensors come in a variety of sizes, as shown below:

In [2]:
# A is a 0-dimensional int32 tensor
A = tf.constant(1234)
# B is a 1-dimensional int32 tensor
B = tf.constant([123,456,789])
# C is a 2-dimensional int32 tensor
C = tf.constant([ [123,456,789], [222,333,444] ])
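
The number of dimensions follows from how deeply the Python value passed to ```tf.constant``` is nested. The sketch below uses plain Python lists rather than TensorFlow, and the helpers ```rank``` and ```shape``` are hypothetical names introduced only for this illustration:

```python
def rank(value):
    """Count how many levels of list nesting wrap the innermost number."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]  # assumes non-empty, rectangular nesting
    return depth


def shape(value):
    """Return the length along each dimension, outermost first."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0]
    return dims


a = 1234
b = [123, 456, 789]
c = [[123, 456, 789], [222, 333, 444]]

for name, value in [("a", a), ("b", b), ("c", c)]:
    print(name, "rank:", rank(value), "shape:", shape(value))
```

Note that a value such as ```[[123, 456, 789]]``` has two levels of nesting, so it would be 2-dimensional with shape [1, 3], not 1-dimensional.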

### Session

TensorFlow’s API is built around the idea of a computational graph, a way of visualizing a mathematical process. Let’s take the TensorFlow code you ran and turn that into a graph:
<img src="https://d17h27t6h515a5.cloudfront.net/topher/2016/October/580feadb_session/session.png" alt="" class="index--image--1xyr4" style="height: 312.419px; width: 539px;">
A "TensorFlow Session", as shown above, is an environment for running a graph. The session is in charge of allocating the operations to GPU(s) and/or CPU(s), including remote machines. Let’s see how you use it.

In [3]:
with tf.Session() as sess:
    output = sess.run(hello_constant)

The code has already created the tensor, ```hello_constant```, from the previous lines. The next step is to evaluate the tensor in a session.

The code creates a session instance, ```sess```, using ```tf.Session```. The ```sess.run()``` function then evaluates the tensor and returns the results.
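
To make the split between building a graph and running it more concrete, here is a minimal plain-Python sketch of deferred evaluation. It is not how TensorFlow is implemented; the classes ```Constant``` and ```Add``` and the function ```run``` are hypothetical stand-ins invented for this example:

```python
class Constant:
    """A graph node that holds a fixed value."""

    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Add:
    """A graph node that adds the results of two other nodes."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        return self.left.evaluate() + self.right.evaluate()


def run(node):
    """Hypothetical stand-in for sess.run(): evaluate a node on demand."""
    return node.evaluate()


# Building the graph performs no arithmetic yet.
total = Add(Constant(2), Constant(3))

# Only running the graph produces a value.
result = run(total)
print(result)  # 5
```

The same pattern appears in the code above: ```hello_constant``` only describes a value, and nothing is computed until ```sess.run(hello_constant)``` is called.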

# MNIST For ML Beginners

This tutorial is intended for readers who are new to both machine learning and TensorFlow. If you already know what MNIST is, and what softmax (multinomial logistic) regression is, you might prefer this faster paced tutorial. Be sure to install TensorFlow before starting either tutorial.

When one learns how to program, there's a tradition that the first thing you do is print "Hello World." Just like programming has Hello World, machine learning has MNIST.

MNIST is a simple computer vision dataset. It consists of images of handwritten digits like these:
<img style="width:100%" src="https://www.tensorflow.org/images/MNIST.png">

It also includes labels for each image, telling us which digit it is. For example, the labels for the above images are 5, 0, 4, and 1.

In this tutorial, we're going to train a model to look at images and predict what digits they are. Our goal isn't to train a really elaborate model that achieves state-of-the-art performance -- although we'll give you code to do that later! -- but rather to dip a toe into using TensorFlow. As such, we're going to start with a very simple model, called a Softmax Regression.

The actual code for this tutorial is very short, and all the interesting stuff happens in just three lines. However, it is very important to understand the ideas behind it: both how TensorFlow works and the core machine learning concepts. Because of this, we are going to very carefully work through the code.

### About this tutorial

This tutorial is an explanation, line by line, of what is happening in the mnist_softmax.py code.

You can use this tutorial in a few different ways, including:

- Copy and paste each code snippet, line by line, into a Python environment as you read through the explanations of each line.
- Run the entire mnist_softmax.py Python file either before or after reading through the explanations, and use this tutorial to understand the lines of code that aren't clear to you.

What we will accomplish in this tutorial:

- Learn about the MNIST data and softmax regressions
- Create a function that is a model for recognizing digits, based on looking at every pixel in the image
- Use TensorFlow to train the model to recognize digits by having it "look" at thousands of examples (and run our first TensorFlow session to do so)
- Check the model's accuracy with our test data

In [None]:
#!pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.9.0-cp27-none-linux_x86_64.whl

The MNIST data is hosted on Yann LeCun's website. If you are copying and pasting in the code from this tutorial, start here with these two lines of code which will download and read in the data automatically:

In [None]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Import the TensorFlow library:

In [2]:
import tensorflow as tf


TensorFlow lets us describe a graph of interacting operations by manipulating symbolic variables. Let's create one:

In [3]:
x = tf.placeholder(tf.float32, [None, 784])


x isn't a specific value. It's a placeholder, a value that we'll input when we ask TensorFlow to run a computation. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point numbers, with a shape [None, 784]. (Here None means that a dimension can be of any length.)
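
As a rough illustration of what "flattened into a 784-dimensional vector" means, the plain-Python sketch below flattens one 28 x 28 image (the size of an MNIST image) row by row. The image contents and the batch of three are made up for the example:

```python
ROWS, COLS = 28, 28  # MNIST images are 28 x 28 pixels

# Hypothetical image: all zeros except one bright pixel.
image = [[0.0] * COLS for _ in range(ROWS)]
image[2][5] = 1.0

# Flatten row by row into one 784-element vector.
flat = [pixel for row in image for pixel in row]

# A batch is a list of such vectors; its length plays the role of None.
batch = [flat, flat, flat]

print(len(flat))                  # 784
print(len(batch), len(batch[0]))  # 3 784
print(flat.index(1.0))            # 61, i.e. 2 * 28 + 5
```

Here the batch has shape [3, 784]; feeding 100 images instead would give [100, 784], which is why the first dimension of x is left as None.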

We also need the weights and biases for our model. We could imagine treating these like additional inputs, but TensorFlow has an even better way to handle it: Variable. A Variable is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used and even modified by the computation. For machine learning applications, one generally has the model parameters be Variables.

In [4]:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

We create these Variables by giving tf.Variable the initial value of the Variable: in this case, we initialize both W and b as tensors full of zeros. Since we are going to learn W and b, it doesn't matter very much what they initially are.

Notice that W has a shape of [784, 10] because we want to multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence for the different classes. b has a shape of [10] so we can add it to the output.

We can now implement our model. It only takes one line to define it!

In [5]:
y = tf.nn.softmax(tf.matmul(x, W) + b)
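
To see what the softmax step does to the evidence, here is a small plain-Python sketch. The function below is an illustrative reimplementation, not TensorFlow's own code, and the evidence values are made up:

```python
import math


def softmax(evidence):
    """Exponentiate each score, then normalize so the results sum to 1."""
    shift = max(evidence)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(e - shift) for e in evidence]
    total = sum(exps)
    return [e / total for e in exps]


# With W and b initialized to zeros, the evidence for every class is 0,
# so the untrained model assigns the same probability to all 10 digits.
uniform = softmax([0.0] * 10)

# Unequal (made-up) evidence shifts probability toward the largest score.
probs = softmax([1.0, 2.0, 3.0])

print(uniform[0])  # 0.1
print(probs)
```

This also shows why the all-zero initialization is a harmless starting point: before training, the model predicts each digit with probability 0.1.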

### Training

In order to train our model, we need to define what it means for the model to be good. Well, actually, in machine learning we typically define what it means for a model to be bad. We call this the cost, or the loss, and it represents how far off our model is from our desired outcome. We try to minimize that error, and the smaller the error margin, the better our model is.

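One common loss function for classification is cross-entropy: for each example, it is the negative sum, over the classes, of each true label entry times the logarithm of the corresponding predicted probability. To implement cross-entropy, we first need to add a new placeholder to input the correct answers: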

In [5]:
y_ = tf.placeholder(tf.float32, [None, 10])

Then we can implement the cross-entropy function:

In [7]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
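
The same arithmetic can be sketched in plain Python for a couple of examples. The label and the two predicted distributions below are made up, and the function names are chosen only for this illustration:

```python
import math


def cross_entropy(label, prediction):
    """-sum(label_i * log(prediction_i)) for a single example.

    Assumes every predicted probability is strictly positive.
    """
    return -sum(t * math.log(p) for t, p in zip(label, prediction))


def mean_cross_entropy(labels, predictions):
    """Average the per-example losses over a batch."""
    losses = [cross_entropy(t, p) for t, p in zip(labels, predictions)]
    return sum(losses) / len(losses)


# One-hot label for the digit 3.
one_hot_three = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# An undecided prediction and a confident, correct one.
uniform = [0.1] * 10
confident = [0.01] * 10
confident[3] = 0.91

print(cross_entropy(one_hot_three, uniform))    # about 2.30 (ln 10)
print(cross_entropy(one_hot_three, confident))  # about 0.09
print(mean_cross_entropy([one_hot_three, one_hot_three], [uniform, confident]))
```

The inner sum corresponds to tf.reduce_sum over the class dimension and the average corresponds to tf.reduce_mean over the batch. Because the label is one-hot, only the probability assigned to the correct class contributes to the loss.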

Now that we know what we want our model to do, it's very easy to have TensorFlow train it to do so. Because TensorFlow knows the entire graph of your computations, it can automatically use the backpropagation algorithm to efficiently determine how your variables affect the loss you ask it to minimize. Then it can apply your choice of optimization algorithm to modify the variables and reduce the loss.

In [8]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

In [10]:
# Create an operation that initializes the Variables W and b defined above
init = tf.initialize_all_variables()


In the train_step line above, we ask TensorFlow to minimize cross_entropy using the gradient descent algorithm with a learning rate of 0.5. Gradient descent is a simple procedure in which TensorFlow shifts each variable a little bit in the direction that reduces the cost. But TensorFlow also provides many other optimization algorithms: using one is as simple as tweaking one line.

What TensorFlow actually does here, behind the scenes, is to add new operations to your graph which implement backpropagation and gradient descent. Then it gives you back a single operation which, when run, does a step of gradient descent training, slightly tweaking your variables to reduce the loss.
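
A minimal sketch of a gradient descent update, assuming a made-up one-variable loss rather than the MNIST model, might look like this. The loss function, its hand-written gradient, and the starting value are all assumptions for illustration; TensorFlow derives the gradients automatically from the graph:

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return 0.5 * (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return w - 3.0


learning_rate = 0.5
w = 0.0
history = [w]

for _ in range(10):
    # Shift w a little in the direction that reduces the loss.
    w = w - learning_rate * gradient(w)
    history.append(w)

print(history[:4])  # [0.0, 1.5, 2.25, 2.625]
print(loss(w))
```

Each run of train_step performs one such update for every entry of W and b at once, using the gradient of cross_entropy on the current batch.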

We can now launch the model in a Session and run the operation that initializes the variables:

In [11]:
sess = tf.Session()
sess.run(init)

Let's train -- we'll run the training step 1000 times!

In [12]:
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

### Evaluating Our Model

How well does our model do?

Well, first let's figure out where we predicted the correct label. tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the correct label. We can use tf.equal to check if our prediction matches the truth.

In [13]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

tf.equal gives us a tensor of booleans. To determine what fraction are correct, we cast them to floating-point numbers and then take the mean:

In [14]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

In [15]:
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

0.9176
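
The accuracy computation can be traced by hand on a tiny example. The sketch below uses plain Python with four made-up predictions over three classes; the helper argmax is a hypothetical stand-in for tf.argmax along axis 1:

```python
def argmax(values):
    """Index of the largest entry (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up model outputs, one row of class probabilities per example.
predictions = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]

# Made-up one-hot labels for the same four examples.
labels = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]

# Compare the predicted class with the true class for each example.
correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]

# Cast booleans to floats and average them.
accuracy = sum(1.0 if c else 0.0 for c in correct) / len(correct)

print(correct)   # [True, True, False, True]
print(accuracy)  # 0.75
```

The result above, 0.9176, is the same kind of fraction, computed over all the test images rather than four.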
