# TensorFlow Beginner's Tutorial
This is a beginner's tutorial on TensorFlow, a software toolkit for constructing and executing computational graphs. We will closely follow one of TensorFlow's tutorials, [available here](https://www.tensorflow.org/versions/r1.3/get_started/get_started). This notebook covers the basic ingredients you need to know to use TensorFlow. By the end, you will know how to build a simple computational graph for linear regression by fitting a straight line to data points.

### Preparation
You can run this notebook on [Google Colaboratory](https://colab.research.google.com), so you do not need to run it on your own machine (though you certainly can if you prefer!).

Let's start by importing tensorflow.

In [1]:
import tensorflow as tf

## Constants
TensorFlow builds a computational graph. Everything you define in TensorFlow can be considered an operation node in that graph. What the hell does that mean? Let's start with the simplest case, a constant number.

In [2]:
node0 = tf.constant(1.0, dtype=tf.float32)
node1 = tf.constant(2.0, dtype=tf.float32)

We just defined 2 nodes (`node0` and `node1`) for a very simple operation: _return a constant value_. Let's check their type and string description.

In [3]:
print('Type:', type(node0))
print('Descr:', node0)

Type: <class 'tensorflow.python.framework.ops.Tensor'>
Descr: Tensor("Const:0", shape=(), dtype=float32)


## Session
"Executing" those nodes simply means returning their constant values. In TensorFlow, a graph is executed using a _session_. Here's an example.

In [4]:
sess = tf.InteractiveSession()
sess.run(node0)

1.0

You can also evaluate multiple operations at once. 

In [5]:
sess.run([node0,node1])

[1.0, 2.0]

## Addition of constants: simplest graph
Let's build a simple computation graph: addition of two constants.

In [6]:
node_addition0 = tf.add(node0,node1)
node_addition1 = node0 + node1
sess.run([node_addition0, node_addition1])

[3.0, 3.0]

As you can see, we can use either the `tf.add` operation or simply `+`, which maps to the same operation (the `+` operator of `Tensor`, the type of `node0` and `node1`, is defined internally for convenience).
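To make the "build first, run later" idea concrete, here is a minimal plain-Python sketch of a computational graph. The classes `Node`, `Const`, `Add` and the function `run` are hypothetical names invented for this illustration; they are not TensorFlow's API and only loosely mimic how `tf.add`, `+`, and `sess.run` relate to each other.

```python
# Hypothetical mini "graph" classes for illustration only; not TensorFlow's API.

class Node(object):
    """Base class: a node describes a computation but does not perform it."""

    def __add__(self, other):
        # Overloading `+` builds a new Add node instead of adding numbers.
        return Add(self, other)


class Const(Node):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Add(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        # The actual arithmetic happens only when the graph is run.
        return self.left.evaluate() + self.right.evaluate()


def run(nodes):
    """Evaluate a list of nodes, loosely mimicking sess.run([...])."""
    return [node.evaluate() for node in nodes]


a = Const(1.0)
b = Const(2.0)
explicit_sum = Add(a, b)   # analogous to tf.add(node0, node1)
operator_sum = a + b       # analogous to node0 + node1

print(type(operator_sum).__name__)        # Add
print(run([explicit_sum, operator_sum]))  # [3.0, 3.0]
```

Note that `a + b` returns another node, not a number; the value 3.0 appears only once `run` is called.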

## Placeholder
This is something you will use a lot if you stick with TensorFlow. In the previous example, we added two constant values. However, it is more useful to have an abstract "addition" operation that works for any values, not just constants. A placeholder is a `Tensor` that defines only the data type (and optionally the shape) of a value, not the value itself.

In [7]:
node_placeholder0 = tf.placeholder(dtype=tf.float32)
node_placeholder1 = tf.placeholder(dtype=tf.float32)

We just defined placeholders, which do not specify the values they operate on. Unlike constants, therefore, executing these operations requires us to specify values. These are provided at _run time_ of a session through `feed_dict`.

In [8]:
print(sess.run([node_placeholder0], feed_dict={node_placeholder0: 0}))

[array(0., dtype=float32)]


Below is what a simple addition graph looks like:

In [9]:
node_addition2 = node_placeholder0 + node_placeholder1
print(sess.run(node_addition2, feed_dict={node_placeholder0: 1, node_placeholder1: 2}))

3.0
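The role of a placeholder and of the values fed at run time can be sketched in plain Python as well. The names `Placeholder`, `Add`, and `run` below are hypothetical and exist only for this illustration; the `feed` dictionary plays a role similar to `feed_dict`.

```python
# Hypothetical mini classes for illustration only; not TensorFlow's API.

class Placeholder(object):
    """A node with no value of its own; the value is supplied at run time."""

    def __add__(self, other):
        return Add(self, other)

    def evaluate(self, feed):
        if self not in feed:
            raise ValueError("no value was fed for this placeholder")
        return feed[self]


class Add(object):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)


def run(node, feed):
    """Evaluate `node`, looking up placeholder values in `feed`."""
    return node.evaluate(feed)


p0 = Placeholder()
p1 = Placeholder()
total = p0 + p1   # the graph is defined once...

# ...and can then be run with different inputs.
print(run(total, {p0: 1.0, p1: 2.0}))   # 3.0
print(run(total, {p0: 10.0, p1: 5.0}))  # 15.0

try:
    run(total, {p0: 1.0})  # p1 is missing from the feed
except ValueError as err:
    print("Error:", err)
```

The same graph is reused with different inputs, which is the point of separating the definition of an operation from the values it operates on.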


## Variables
`Variable` is yet another type of tensor. Unlike a constant, it holds a value that is allowed to change. You can specify its initial value directly, or specify an operation that produces the initial value.

In [10]:
slope = tf.Variable(0.3, dtype=tf.float32)
const = tf.Variable(-0.3, dtype=tf.float32)
x     = tf.placeholder(dtype=tf.float32)
prediction = slope * x + const

If we try to evaluate `prediction` now, we will get an error complaining about an uninitialized value. This is very important: variables carry _state_, and that state must be _initialized_ before it is used. Constants are initialized when they are defined. Variables, on the other hand, are all initialized together like this:

In [11]:
init_operation = tf.global_variables_initializer()
sess.run(init_operation)

Now we can evaluate `prediction`.

In [12]:
print(sess.run(prediction, feed_dict={x: [1.]}))

[0.]


We can also compute multiple predictions for multiple `x` values at once. TensorFlow is good at parallelizing computations, a benefit of the computational graph model.

In [13]:
print(sess.run(prediction, feed_dict={x: [0., 1., 2.]}))

[-0.3  0.   0.3]


One more thing about initialization. As mentioned above, a variable can be initialized with an operation instead of an actual value. This is what you typically do when you initialize the variables of a machine learning model before training. Such an operation could, for example, draw random numbers from a Gaussian distribution (e.g. `tf.random_normal`). Running `init_operation` then draws values from that distribution to fill in the initial values.
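The idea of initializing a variable from an operation rather than from a fixed value can be sketched in plain Python. The `Variable` class and the `gaussian` helper below are hypothetical names made up for this illustration and are not TensorFlow's API; the standard deviation of 0.1 is an arbitrary choice.

```python
import random

# Hypothetical mini class for illustration only; not TensorFlow's API.

class Variable(object):
    """Holds a mutable value that is filled in only when initialized."""

    def __init__(self, initializer):
        # `initializer` is a zero-argument callable, i.e. an operation
        # that produces the initial value when it is run.
        self.initializer = initializer
        self.value = None

    def initialize(self):
        self.value = self.initializer()

    def read(self):
        if self.value is None:
            raise RuntimeError("variable used before initialization")
        return self.value


def gaussian(n, mean=0.0, stddev=0.1):
    """Return an initializer that draws n values from a Gaussian."""
    return lambda: [random.gauss(mean, stddev) for _ in range(n)]


random.seed(0)  # fixed seed so repeated runs draw the same values
weights = Variable(gaussian(3))

try:
    weights.read()      # fails: nothing has been drawn yet
except RuntimeError as err:
    print("Error:", err)

weights.initialize()    # analogous to running init_operation
print(weights.read())   # three small random numbers
```

In TensorFlow 1.x, a comparable pattern would be something like `tf.Variable(tf.random_normal([3], stddev=0.1))` followed by running `tf.global_variables_initializer()`.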

## Linear regression: error calculation
Now that we have a linear model (`prediction`), we can move on to linear regression. Our approach is to define an error and then minimize it. Let's start by defining the error.

In [14]:
# Placeholder for the answer (true value)
y      = tf.placeholder(tf.float32)
# Squared difference between the prediction and the answer (y)
errors = tf.square(prediction - y)
# Sum of the errors over multiple data points
loss   = tf.reduce_sum(errors)

print(sess.run(loss, feed_dict={x: [1., 2.], y: [1., 2.]}))

3.89


Just to be clear, what we did is the same as the following lines:

In [15]:
import numpy as np
error0 = np.power((0.3 * 1. - 0.3) - 1., 2) # for data point (x,y) = (1., 1.)
error1 = np.power((0.3 * 2. - 0.3) - 2., 2) # for data point (x,y) = (2., 2.)
total_error = error0 + error1 # sum of the errors, matching tf.reduce_sum
print(total_error)

3.8899999999999997


## Linear regression: optimization
Now that we have defined an error, the next step is to learn how to minimize it. This process, usually called _training_ (or optimization), iterates over two actions: calculate the error, then change the parameters (variables) accordingly. A single iteration is often called a _step_ or <strong>_train step_</strong>. The component that decides how much each variable should change, and in which direction, is called an _optimizer_. TensorFlow provides several optimizers, but covering them is beyond the scope of this notebook. We will pick one of the most popular ones, <strong>GradientDescentOptimizer</strong>.

In [16]:
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

The argument 0.01 passed to GradientDescentOptimizer is called the <strong>_learning rate_</strong>; it scales how much the variables change in each step. `train` is an operation that changes the variables in the computation graph in order to reduce the loss. Let's run a single train step.
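For reference, plain gradient descent applied to the sum-of-squares loss defined earlier can be written as follows. The symbols are shorthand introduced here, not names from the notebook: $s$ stands for `slope`, $c$ for `const`, and $\eta$ for the learning rate.

```latex
L(s, c) = \sum_i \left( s\,x_i + c - y_i \right)^2

\frac{\partial L}{\partial s} = \sum_i 2 \left( s\,x_i + c - y_i \right) x_i,
\qquad
\frac{\partial L}{\partial c} = \sum_i 2 \left( s\,x_i + c - y_i \right)

s \leftarrow s - \eta \, \frac{\partial L}{\partial s},
\qquad
c \leftarrow c - \eta \, \frac{\partial L}{\partial c}
```

Each train step moves the variables a small distance against the gradient, so a larger $\eta$ means larger changes per step.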

In [17]:
# Run a train step
sess.run(train, feed_dict={x: [1., 2.], y: [1., 2.]})
# Check whether the loss has decreased with the updated variables
print('Loss:', sess.run(loss, feed_dict={x: [1., 2.], y: [1., 2.]}))

Loss: 2.8970642
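The loss value above can be checked by hand. The following plain-Python sketch repeats one step of ordinary gradient descent without TensorFlow, assuming the optimizer subtracts the learning rate times the gradient from each variable. The helper names `compute_loss` and `compute_gradients` are invented for this illustration.

```python
# Plain-Python re-computation of one gradient descent step (no TensorFlow).
xs = [1.0, 2.0]
ys = [1.0, 2.0]
slope, const = 0.3, -0.3   # initial values used in the notebook
learning_rate = 0.01


def compute_loss(slope, const):
    """Sum of squared errors over all data points."""
    return sum((slope * x + const - y) ** 2 for x, y in zip(xs, ys))


def compute_gradients(slope, const):
    """Partial derivatives of the loss with respect to slope and const."""
    grad_slope = sum(2.0 * (slope * x + const - y) * x for x, y in zip(xs, ys))
    grad_const = sum(2.0 * (slope * x + const - y) for x, y in zip(xs, ys))
    return grad_slope, grad_const


print("Loss before:", compute_loss(slope, const))   # about 3.89

grad_slope, grad_const = compute_gradients(slope, const)
slope -= learning_rate * grad_slope   # 0.3  + 0.088 -> about 0.388
const -= learning_rate * grad_const   # -0.3 + 0.054 -> about -0.246

print("Loss after:", compute_loss(slope, const))    # about 2.897
```

The result agrees with the TensorFlow output up to floating-point precision (TensorFlow computes in `float32` here, plain Python in double precision).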


Hooray! We see that the `train` operation changed the variables, and the loss is smaller. Let's run this for many more steps.

In [18]:
for _ in range(1000):
    sess.run(train, feed_dict={x: [1., 2.], y: [1., 2.]})
print('Loss:', sess.run(loss, feed_dict={x: [1., 2.], y: [1., 2.]}))

Loss: 5.346064e-06


Now the error is very small. We fit a line to two points: (1,1) and (2,2). So we expect (`slope`,`const`) to be (1.,0.). Let's take a look!

In [19]:
print(sess.run([slope, const]))

[0.9968176, 0.005149256]
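As a cross-check, the whole training run can be repeated in plain Python. This sketch assumes ordinary gradient descent with the same starting values, data, and learning rate as the notebook, and performs 1001 updates (the single train step plus the 1000 steps in the loop).

```python
# Plain-Python gradient descent loop (no TensorFlow), mirroring the notebook:
# 1 earlier train step plus 1000 more, i.e. 1001 updates in total.
xs = [1.0, 2.0]
ys = [1.0, 2.0]
slope, const = 0.3, -0.3
learning_rate = 0.01

for _ in range(1001):
    # Residuals (prediction - answer) for the current parameters.
    residuals = [slope * x + const - y for x, y in zip(xs, ys)]
    grad_slope = sum(2.0 * r * x for r, x in zip(residuals, xs))
    grad_const = sum(2.0 * r for r in residuals)
    slope -= learning_rate * grad_slope
    const -= learning_rate * grad_const

loss = sum((slope * x + const - y) ** 2 for x, y in zip(xs, ys))
print("slope:", slope)   # close to 1
print("const:", const)   # close to 0
print("loss:", loss)     # small compared with the starting loss of 3.89
```

The parameters approach (1, 0) but do not reach it exactly; more steps or a larger learning rate would bring them closer.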


In this notebook we learned basic ingredients of building a computational graph.
* Session
* Constant
* Variable
* Placeholder
* Optimizer

You are now ready to tackle the next-level example: MNIST.