# Overview

TensorFlow is a general-purpose system for graph-based computation. A typical use is machine learning. In this notebook, we'll introduce the basic concepts of TensorFlow using some simple examples.

TensorFlow gets its name from [tensors](https://en.wikipedia.org/wiki/Tensor), which are arrays of arbitrary dimensionality. A vector is a 1-d array and is known as a 1st-order tensor. A matrix is a 2-d array and a 2nd-order tensor. The "flow" part of the name refers to computation flowing through a graph. Training and inference in a neural network, for example, involve the propagation of matrix computations through many nodes in a computational graph.

When you think of doing things in TensorFlow, you might want to think of creating tensors (like matrices), adding operations (that output other tensors), and then executing the computation (running the computational graph). In particular, it's important to realize that when you add an operation on tensors, it doesn't execute immediately. Rather, TensorFlow waits for you to define all the operations you want to perform. Then, TensorFlow optimizes the computation graph, deciding how to execute the computation, before generating the data. Because of this, a tensor in TensorFlow isn't so much holding the data as a placeholder for holding the data, waiting for the data to arrive when a computation is executed.
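The split between defining operations and executing them can be illustrated with a toy graph in plain Python. This is only a sketch of the idea, not how TensorFlow is implemented; the `Node` class and the `constant`, `matmul`, and `run` helpers are hypothetical names invented for this illustration.

```python
# A toy deferred-execution graph. This is NOT TensorFlow's implementation;
# Node, constant, matmul and run are hypothetical names for illustration.

class Node(object):
    """A graph node: a function plus the nodes that feed it."""
    def __init__(self, fn, inputs=()):
        self.fn = fn
        self.inputs = inputs


def constant(value):
    # A source node: no inputs, always produces the same value.
    return Node(lambda: value)


def matmul(a, b):
    # Only builds a node; no multiplication happens here.
    def multiply(x, y):
        return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
                for row in x]
    return Node(multiply, (a, b))


def run(node):
    # Execution phase: evaluate the inputs first, then the node itself.
    return node.fn(*[run(i) for i in node.inputs])


matrix1 = constant([[3., 3.]])
matrix2 = constant([[2.], [2.]])
product = matmul(matrix1, matrix2)  # still just a description of the work
result = run(product)               # the computation happens here
print(result)
```

Until `run` is called, `product` is only a node that records which function to apply to which inputs.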

A TensorFlow graph is a description of computations. To compute anything, a graph must be launched in a Session. A Session places the graph ops onto Devices, such as CPUs or GPUs, and provides methods to execute them. These methods return tensors produced by ops as [numpy](http://www.numpy.org) ndarray objects in Python, and as tensorflow::Tensor instances in C and C++.

To use TensorFlow you need to understand how TensorFlow:

- Represents computations as graphs.
- Executes graphs in the context of Sessions.
- Represents data as tensors.
- Maintains state with Variables.
- Uses feeds and fetches to get data into and out of arbitrary operations.

# The computation graph

TensorFlow programs are usually structured into a construction phase, which assembles a graph, and an execution phase, which uses a session to execute ops in the graph.

For example, it is common to create a graph to represent and train a neural network in the construction phase, and then repeatedly execute a set of training ops in the graph in the execution phase.

TensorFlow can be used from C, C++, and Python programs. It is presently much easier to use the Python library to assemble graphs, as it provides a large set of helper functions not available in the C and C++ libraries.

The session libraries have equivalent functionalities for the three languages.

## Building the graph

To build a graph, start with ops that do not need any input (source ops), such as Constant, and pass their output to other ops that do computation.

The op constructors in the Python library return objects that stand for the output of the constructed ops. You can pass these to other op constructors to use as inputs.

The TensorFlow Python library has a default graph to which op constructors add nodes. The default graph is sufficient for many applications. See the [Graph class](https://www.tensorflow.org/versions/r0.10/api_docs/python/framework.html#Graph) documentation for how to explicitly manage multiple graphs.

In [None]:
import tensorflow as tf

# Create a Constant op that produces a 1x2 matrix.  The op is
# added as a node to the default graph.
#
# The value returned by the constructor represents the output
# of the Constant op.
matrix1 = tf.constant([[3., 3.]])

# Create another Constant that produces a 2x1 matrix.
matrix2 = tf.constant([[2.],[2.]])

# Create a Matmul op that takes 'matrix1' and 'matrix2' as inputs.
# The returned value, 'product', represents the result of the matrix
# multiplication.
product = tf.matmul(matrix1, matrix2)

The default graph now has three nodes: two constant() ops and one matmul() op. To actually multiply the matrices, and get the result of the multiplication, you must launch the graph in a session.

In [None]:
# Print the operations included in the default graph. Additional operations will be added to the same graph until we
# restart the kernel. 
print(tf.get_default_graph().as_graph_def())

## Launching the graph in a session

Launching follows construction. To launch a graph, create a Session object. Without arguments the session constructor launches the default graph.

See the [Session class](https://www.tensorflow.org/versions/r0.10/api_docs/python/client.html#session-management) for the complete session API.

In [None]:
# Launch the default graph.
sess = tf.Session()

# To run the matmul op we call the session 'run()' method, passing 'product'
# which represents the output of the matmul op.  This indicates to the call
# that we want to get the output of the matmul op back.
#
# All inputs needed by the op are run automatically by the session.  They
# typically are run in parallel.
#
# The call 'run(product)' thus causes the execution of three ops in the
# graph: the two constants and matmul.
#
# The output of the op is returned in 'result' as a numpy `ndarray` object.
result = sess.run(product)
print(result)

# Close the Session when we're done.
sess.close()

Sessions should be closed to release resources. You can also enter a Session with a "with" block. The Session closes automatically at the end of the with block.

In [None]:
with tf.Session() as sess:
  result = sess.run(product)
  print "run: ", result
    
# Equivalent version (in the case of one output tensor)
with tf.Session():
  result = product.eval()
  print "eval: ", result

The TensorFlow implementation translates the graph definition into executable operations distributed across available compute resources, such as the CPU or one of your computer's GPU cards. In general you do not have to specify CPUs or GPUs explicitly. TensorFlow uses your first GPU, if you have one, for as many operations as possible.

If you have more than one GPU available on your machine, to use a GPU beyond the first you must assign ops to it explicitly. Use with tf.device(...) statements to specify which CPU or GPU to use for operations:

In [None]:
import tensorflow as tf
tf.reset_default_graph()

with tf.Session():
  with tf.device("/cpu:0"):    
    matrix1 = tf.constant([[3., 3.]])
    matrix2 = tf.constant([[2.],[2.]])
    product = tf.matmul(matrix1, matrix2)
    print product.eval()

Devices are specified with strings. The currently supported devices are:

* "/cpu:0": The CPU of your machine.
* "/gpu:0": The GPU of your machine, if you have one.
* "/gpu:1": The second GPU of your machine, etc.

See [Using GPUs](https://www.tensorflow.org/versions/r0.10/how_tos/using_gpu/index.html) for more information about GPUs and TensorFlow.

## Fetches

To fetch the outputs of operations, execute the graph with a run() call on the Session object and pass in the tensors to retrieve. In the previous examples we fetched a single tensor, product, but you can also fetch multiple tensors:



In [None]:
import tensorflow as tf
tf.reset_default_graph()

input1 = tf.constant([3.0])
input2 = tf.constant([2.0])
input3 = tf.constant([5.0])
intermed = tf.add(input2, input3)
mul = tf.mul(input1, intermed)

with tf.Session() as sess:
  result = sess.run([mul, intermed])
  print(result)


## Feeds

The examples above introduce tensors into the computation graph by storing them in Constants. TensorFlow also provides a feed mechanism for patching a tensor directly into any operation in the graph.

A feed temporarily replaces the output of an operation with a tensor value. You supply feed data as an argument to a run() call. The feed is only used for the run call to which it is passed. The most common use case involves designating specific operations to be "feed" operations by using tf.placeholder() to create them:

In [None]:
# A placeholder is a tensor whose value must be supplied when we run the graph. It generates an error if it
# is evaluated and it's not in the feeds.
a = tf.constant([3.0])
b = tf.placeholder(tf.float32)
c = a + b  # This is the same as tf.add(a, b)
d = tf.abs(a)
e = tf.mul(c, d)  # Element-wise multiplication.
f = tf.square(c)

with tf.Session() as sess:
  print(sess.run([f], feed_dict={b:[7.]}))

# But we can feed to any node in the graph, e.g.:
with tf.Session() as sess:
  print(sess.run([e], feed_dict={c:[8.], d:[3.0]}))



When you fetch several tensors in a single run() call, all the ops needed to produce the values of the requested tensors are run once (not once per requested tensor).
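The two behaviours described in this section, feeds replacing an op's output and each op running once per run() call, can be mimicked with a small evaluator in plain Python. This is a sketch under simplifying assumptions, not TensorFlow's API: the `Op` class, the `unfed` function, and this `run` function are hypothetical, and scalars stand in for tensors.

```python
# Toy evaluator with a per-run cache and feed overrides.
# Op, unfed and run are hypothetical names; this is not TensorFlow code.

class Op(object):
    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = inputs
        self.calls = 0  # how many times this op has executed


def run(fetches, feed_dict=None):
    # Fed values are placed in the cache up front, so they replace
    # the output of the corresponding op for this run only.
    cache = dict(feed_dict or {})

    def evaluate(op):
        if op not in cache:
            args = [evaluate(i) for i in op.inputs]
            op.calls += 1
            cache[op] = op.fn(*args)
        return cache[op]

    return [evaluate(op) for op in fetches]


def unfed():
    # Stands in for a placeholder that was evaluated without being fed.
    raise ValueError("placeholder must be fed")


a = Op("a", lambda: 3.0)
b = Op("b", unfed)
c = Op("c", lambda x, y: x + y, (a, b))
d = Op("d", abs, (a,))
e = Op("e", lambda x, y: x * y, (c, d))
f = Op("f", lambda x: x * x, (c,))

both = run([e, f], feed_dict={b: 7.0})      # c is needed twice, runs once
fed = run([e], feed_dict={c: 8.0, d: 3.0})  # c and d are not executed
print(both, c.calls)
print(fed, c.calls)
```

After both calls `c.calls` is still 1: the first run computed `c` once for both fetches, and the second run used the fed value instead of executing it.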

## Tensors

TensorFlow programs use a tensor data structure to represent all data -- only tensors are passed between operations in the computation graph. You can think of a TensorFlow tensor as an n-dimensional array or list. A tensor has a static type, a rank, and a shape.

### Rank

In the TensorFlow system, tensors are described by a unit of dimensionality known as rank. Tensor rank is not the same as matrix rank. Tensor rank (sometimes referred to as order or degree or n-dimension) is the number of dimensions of the tensor. For example, the following tensor (defined as a Python list) has a rank of 2:

t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

A rank two tensor is what we typically think of as a matrix; a rank one tensor is a vector. For a rank two tensor you can access any element with the syntax t[i, j]. For a rank three tensor you would need to address an element with t[i, j, k].


| Rank	| Math entity	| Python example |
| ------------- |:-------------:| ---------------|
|1	| Vector (magnitude and direction)	| v = [1.1, 2.2, 3.3] |
|2	| Matrix (table of numbers)	| m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] |
|3	| 3-Tensor (cube of numbers) |	t = [[[2], [4], [6]], [[8], [10], [12]], [[14], [16], [18]]] |
|n	| n-Tensor (you get the idea) |	.... |

### Shape

The TensorFlow documentation uses three notational conventions to describe tensor dimensionality: rank, shape, and dimension number. The following table shows how these relate to one another:


| Rank	| Shape	| Dimension number	| Example |
| ------------- |:-------------:|:-------------:| ---------------|
|0	| []	| 0-D	| A 0-D tensor. A scalar. |
|1	| [D0]	| 1-D	| A 1-D tensor with shape [5]. |
|2	| [D0, D1]	| 2-D	| A 2-D tensor with shape [3, 4]. |
|3	| [D0, D1, D2]	| 3-D	| A 3-D tensor with shape [1, 4, 3]. |
|n	| [D0, D1, ... Dn-1]	| n-D 	| A tensor with shape [D0, D1, ... Dn-1]. |
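The relation between rank and shape can be checked on the Python lists from the rank table. The helper below is a plain-Python sketch, not a TensorFlow function; `shape_of` is a hypothetical name, and it assumes the nested list is regular (every sub-list at a given depth has the same length).

```python
def shape_of(value):
    """Return the shape of a regular nested list as a list of ints."""
    shape = []
    # Descend along the first element of each level, recording its length.
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


s = 483                                                   # a scalar
v = [1.1, 2.2, 3.3]
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
t = [[[2], [4], [6]], [[8], [10], [12]], [[14], [16], [18]]]

for x in (s, v, m, t):
    shape = shape_of(x)
    # The rank is the number of entries in the shape.
    print(len(shape), shape)
```

The rank is simply the length of the shape, so a scalar has shape `[]` and rank 0.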

### Data types

In addition to dimensionality, Tensors have a data type. You can assign any one of the following data types to a tensor:

| Data type	| Python type	| Description |
| ------------- |:-------------:| ---------------|
|DT_FLOAT	|tf.float32	|32 bits floating point.|
|DT_DOUBLE	|tf.float64	|64 bits floating point.|
|DT_INT8	|tf.int8	|8 bits signed integer.|
|DT_INT16	|tf.int16	|16 bits signed integer.|
|DT_INT32	|tf.int32	|32 bits signed integer.|
|DT_INT64	|tf.int64	|64 bits signed integer.|
|DT_UINT8	|tf.uint8	|8 bits unsigned integer.|
|DT_STRING	|tf.string	|Variable length byte arrays. Each element of a Tensor is a byte array.|
|DT_BOOL	|tf.bool	|Boolean.|
|DT_COMPLEX64	|tf.complex64	|Complex number made of two 32 bits floating points: real and imaginary parts.|
|DT_COMPLEX128	|tf.complex128	|Complex number made of two 64 bits floating points: real and imaginary parts.|
|DT_QINT8	|tf.qint8	|8 bits signed integer used in quantized Ops.|
|DT_QINT32	|tf.qint32	|32 bits signed integer used in quantized Ops.|
|DT_QUINT8	|tf.quint8	|8 bits unsigned integer used in quantized Ops.|


## Tensor Transformations

In the documentation a full list of [transformations](https://www.tensorflow.org/versions/r0.10/api_docs/python/array_ops.html#tensor-transformations) is available. Some useful operations are given here.



In [None]:
import tensorflow as tf
tf.reset_default_graph()

with tf.Session():   
    a = tf.constant(range(0, 12))
    print "a = ", a.eval()
    
    # tf.reshape(x, shape) reshapes x to a new shape
    b = tf.reshape(a, [4, 3])
    print "b = ", b.eval()
    c = tf.reshape(a, [2, 2, 3])
    print "c = ", c.eval()
    
    # tf.size(x) is the size of x, tf.shape(x) the shape
    size_c = tf.size(c)
    shape_c = tf.shape(c)
    print "size c = ", size_c.eval()
    print "shape c = ", shape_c.eval()
    
    # tf.slice(x, begin, size). E.g. take last 2 columns of b:
    d = tf.slice(b, [0, 1], [4, 2])
    print "d = ", d.eval()
    
    # tf.reduce_sum(x, reduction_indices)
    e = tf.reduce_sum(b, 0)  # sums over dimension 0 (the rows): one total per column
    print "e = ", e.eval()
    f = tf.reduce_sum(b, 1)  # sums over dimension 1 (the columns): one total per row
    print "f = ", f.eval()
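
For readers who want to predict the output of the cell above before running it, the same reshape, slice, and reductions can be written with plain Python lists. This is an illustrative sketch that mirrors the variable names b, d, e, and f from the cell; it does not use TensorFlow.

```python
a = list(range(12))

# Like tf.reshape(a, [4, 3]): four rows of three consecutive values.
b = [a[i:i + 3] for i in range(0, 12, 3)]

# Like tf.slice(b, [0, 1], [4, 2]): all 4 rows, 2 columns starting at column 1.
d = [row[1:3] for row in b]

# Like tf.reduce_sum(b, 0): add up the rows, leaving one total per column.
e = [sum(col) for col in zip(*b)]

# Like tf.reduce_sum(b, 1): add up the columns, leaving one total per row.
f = [sum(row) for row in b]

print(b)
print(d)
print(e)
print(f)
```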

## Exercise

Let's use what we have learned so far to complete the following example. We pretend that the input features and labels are defined as constants (we will see later how to load them from a file) and we try to solve a regression problem.

In [30]:
import tensorflow as tf
tf.reset_default_graph()

# input_features contains the 4 feature values for 3 samples. We use the first dimension to index the points and the
# second to index the features
input_features = tf.constant([[0.5, 0.0, 1, 0.7],
                              [0.2, 1.0, 0.8, 0.5],
                              [0.3, 1.0, 0.5, 0.4]])
# input_groundtruth contains the target values for the 3 points. We use a rank 1 tensor.
input_groundtruth = tf.constant([0.0, 1.0, 1.0])

# We transform every point x by applying a linear transformation, i.e.:
# y = sum w[i] * x[i] + bias
# This needs to be done for every x in input_features (the rows), and each result must be stored in one element of
# the prediction tensor.
w = tf.random_normal([4, 1], stddev=0.35)
bias = tf.random_uniform([1], minval=-1, maxval=1)

# YOUR CODE HERE. The answers are at the end of the notebook.
# prediction has shape [3]; every element is one of the values y defined above
prediction = tf.squeeze(tf.matmul(input_features, w)) + bias




# YOUR CODE HERE.
# Assume we want to solve a regression problem. We want to compare the values of prediction with input_groundtruth.
# quadratic_error is a rank 0 tensor (a scalar) containing sum (input_groundtruth[i] - prediction[i]) ^ 2
quadratic_error = tf.reduce_sum(tf.square(prediction - input_groundtruth))

with tf.Session() as sess:
    [w_eval, bias_eval, prediction_eval, quadratic_error_eval] = sess.run([w, bias, prediction, quadratic_error])
    print "w = ", w_eval
    print "bias = ", bias_eval
    print "prediction = ", prediction_eval
    print "quadratic_error_eval = ", quadratic_error_eval

w =  [[-0.01011819]
 [ 0.40888894]
 [-0.14174329]
 [-0.32189864]]
bias =  [-0.52110577]
prediction =  [-0.89323717 -0.38858443 -0.31488338]
quadratic_error_eval =  4.45496
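
The printed values can be checked by hand. The sketch below recomputes the prediction and the quadratic error in plain Python from the w and bias shown in the output above. The helper names `predict` and `quadratic_error_of` are made up for this illustration, and small differences in the last digits are expected because TensorFlow computes in 32-bit floats.

```python
# Inputs from the exercise cell; w and bias copied from its printed output.
features = [[0.5, 0.0, 1.0, 0.7],
            [0.2, 1.0, 0.8, 0.5],
            [0.3, 1.0, 0.5, 0.4]]
groundtruth = [0.0, 1.0, 1.0]
w = [-0.01011819, 0.40888894, -0.14174329, -0.32189864]
bias = -0.52110577


def predict(x, w, bias):
    # y = sum w[i] * x[i] + bias for a single sample x.
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def quadratic_error_of(predictions, targets):
    # Sum of squared differences over all samples.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


prediction = [predict(x, w, bias) for x in features]
error = quadratic_error_of(prediction, groundtruth)
print(prediction)
print(error)
```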


## Variables

Variables maintain state across executions of the graph. When you train a model, you use variables to hold and update parameters. Variables are in-memory buffers containing tensors. They must be explicitly initialized and can be saved to disk during and after training. You can later restore saved values to exercise or analyse the model.

See [Variables](https://www.tensorflow.org/versions/r0.10/how_tos/variables/index.html) for details about variables.
 
### Creation

When you create a Variable you pass a Tensor as its initial value to the Variable() constructor. TensorFlow provides a collection of ops that produce tensors often used for initialization from constants or random values.

Note that all these ops require you to specify the shape of the tensors. That shape automatically becomes the shape of the variable. Variables generally have a fixed shape, but TensorFlow provides advanced mechanisms to reshape variables.


In [31]:
import tensorflow as tf

# Reset graph.
tf.reset_default_graph()

# Same as above.
input_features = tf.constant([[0.5, 0.0, 1, 0.7],
                              [0.2, 1.0, 0.8, 0.5],
                              [0.3, 1.0, 0.5, 0.4]])
input_groundtruth = tf.constant([0.0, 1.0, 1.0])

# Create two variables.
w = tf.Variable(tf.random_normal([4, 1], stddev=0.35),
                name="weights")
bias = tf.Variable(tf.zeros([1]), name="bias")

# Same as above.
prediction = tf.squeeze(tf.matmul(input_features, w)) + bias
quadratic_error = tf.reduce_sum(tf.square(prediction - input_groundtruth))


### Initialization

Variable initializers must be run explicitly before other ops in your model can be run. The easiest way to do that is to add an op that runs all the variable initializers, and run that op before using the model.

You can alternatively restore variable values from a checkpoint file.

Use tf.initialize_all_variables() to add an op to run variable initializers. Only run that op after you have fully constructed your model and launched it in a session.

In [32]:
# Add an op to initialize the variables.
init_op = tf.initialize_all_variables()


### Update

Variables can be updated using the assign operation, e.g.:

In [33]:
# This is a list of update operations.
update = [w.assign(w + 0.01), bias.assign(bias - 0.1)]

In [34]:
with tf.Session() as sess:
    # Runs the init op. Note that run() also accepts operations.
    sess.run(init_op)
    for i in range(3):
        # We need this line only to get the values to print.
        [w_eval, bias_eval, prediction_eval, quadratic_error_eval] = sess.run([w, bias, prediction, quadratic_error])
        print "Iteration ", i
        print "w = ", w_eval
        print "bias = ", bias_eval
        print "prediction = ", prediction_eval
        print "quadratic_error_eval = ", quadratic_error_eval
        # Runs the op that updates 'w' and 'bias'.
        sess.run(update)
        

Iteration  0
w =  [[ 0.32333207]
 [-0.3069433 ]
 [-0.28330067]
 [ 0.06128601]]
bias =  [ 0.]
prediction =  [-0.07873443 -0.43827441 -0.32707959]
quadratic_error_eval =  3.83597
Iteration  1
w =  [[ 0.33333206]
 [-0.29694331]
 [-0.27330068]
 [ 0.07128601]]
bias =  [-0.1]
prediction =  [-0.15673444 -0.51327443 -0.4050796 ]
quadratic_error_eval =  4.28881
Iteration  2
w =  [[ 0.34333205]
 [-0.28694332]
 [-0.26330069]
 [ 0.08128601]]
bias =  [-0.2]
prediction =  [-0.23473446 -0.58827442 -0.48307967]
quadratic_error_eval =  4.77724


The assign() operation in this code is a part of the expression graph just like the matmul() operation, so it does not actually perform the assignment until run() executes the expression.

You typically represent the parameters of a statistical model as a set of Variables. For example, you would store the weights for a neural network as a tensor in a Variable. During training you update this tensor by running a training graph repeatedly.

## Optimization

You can already run the graph for multiple values of the parameters and find those that minimize the quadratic error. However, this is slow and inefficient. Instead, you can use one of the provided optimizers. You will see more about this in the next tutorials.

In [35]:
update = tf.train.GradientDescentOptimizer(0.01).minimize(quadratic_error)

with tf.Session() as sess:
    # Runs the init op. Note that run() also accepts operations.
    sess.run(init_op)
    for i in range(3):
        [w_eval, bias_eval, prediction_eval, quadratic_error_eval] = sess.run([w, bias, prediction, quadratic_error])
        print "Iteration ", i
        print "w = ", w_eval
        print "bias = ", bias_eval
        print "prediction = ", prediction_eval
        print "quadratic_error_eval = ", quadratic_error_eval
        # Runs the op that updates 'w' and 'bias'
        sess.run(update)
    [w_eval, bias_eval, prediction_eval, quadratic_error_eval] = sess.run([w, bias, prediction, quadratic_error]) 
    print "quadratic_error_eval = ", quadratic_error_eval

Iteration  0
w =  [[-0.26500654]
 [-0.15438341]
 [-0.18121405]
 [ 0.25496244]]
bias =  [ 0.]
prediction =  [-0.13524359 -0.22487473 -0.22250742]
quadratic_error_eval =  3.01313
Iteration  1
w =  [[-0.25141957]
 [-0.10543576]
 [-0.14668611]
 [ 0.27888465]]
bias =  [ 0.05165252]
prediction =  [-0.02552414 -0.08197372 -0.09099831]
quadratic_error_eval =  2.3616
Iteration  2
w =  [[-0.24029045]
 [-0.06197632]
 [-0.11795406]
 [ 0.29878971]]
bias =  [ 0.09562244]
prediction =  [ 0.06667595  0.04061963  0.02209783]
quadratic_error_eval =  1.88115
quadratic_error_eval =  1.52562
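
To see what a single optimizer step does, the first update above can be reproduced by hand. For the error E = sum_j (p_j - y_j)^2 with p_j = sum_i w_i * x_ji + bias, the gradients are dE/dw_i = 2 * sum_j (p_j - y_j) * x_ji and dE/dbias = 2 * sum_j (p_j - y_j), and plain gradient descent subtracts the learning rate times the gradient. The sketch below is plain Python and only mirrors the arithmetic, not the optimizer's internals. `gradient_step` is a hypothetical helper, and the starting values are copied from iteration 0 of the output.

```python
features = [[0.5, 0.0, 1.0, 0.7],
            [0.2, 1.0, 0.8, 0.5],
            [0.3, 1.0, 0.5, 0.4]]
groundtruth = [0.0, 1.0, 1.0]


def gradient_step(w, bias, learning_rate):
    # Residuals p_j - y_j for every sample.
    residuals = [sum(wi * xi for wi, xi in zip(w, x)) + bias - y
                 for x, y in zip(features, groundtruth)]
    # Gradients of sum_j (p_j - y_j)^2 with respect to each w_i and to bias.
    grad_w = [2.0 * sum(r * x[i] for r, x in zip(residuals, features))
              for i in range(len(w))]
    grad_bias = 2.0 * sum(residuals)
    # Move against the gradient, scaled by the learning rate.
    new_w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
    new_bias = bias - learning_rate * grad_bias
    return new_w, new_bias


# Iteration 0 values from the output above; 0.01 is the learning rate
# passed to GradientDescentOptimizer.
w0 = [-0.26500654, -0.15438341, -0.18121405, 0.25496244]
w1, bias1 = gradient_step(w0, 0.0, 0.01)
print(w1)
print(bias1)
```

The result should agree with the iteration 1 values printed above up to 32-bit floating point rounding.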


## Saver

The [Saver](https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#Saver) adds ops to save and restore variables to and from checkpoints. It also provides convenience methods to run these ops. Note that there is no need to save other tensors, since they can be recomputed from the graph.

Checkpoints are binary files in a proprietary format which map variable names to tensor values. The best way to examine the contents of a checkpoint is to load it using a Saver.

Savers can automatically number checkpoint filenames with a provided counter. This lets you keep multiple checkpoints at different steps while training a model. For example you can number the checkpoint filenames with the training step number. To avoid filling up disks, savers manage checkpoint files automatically. For example, they can keep only the N most recent files, or one checkpoint for every N hours of training.

You number checkpoint filenames by passing a value to the optional global_step argument to save():

In [28]:
saver = tf.train.Saver()  # Creates the saver

with tf.Session() as sess:
    # Runs the init op. Note that run() also accepts operations.
    sess.run(init_op)
    for i in range(123):
        sess.run(update)
    [w_eval, bias_eval] = sess.run([w, bias])
    print "Final parameters\nw = ", w_eval, "\nbias = ", bias_eval
    s = saver.save(sess, 'my-model', global_step=123)  # Saves to my-model-123
    print "Saved to ", s
    

Final parameters
w =  [[-0.55853534]
 [ 0.75005877]
 [ 0.14311016]
 [-0.2382962 ]] 
bias =  [ 0.3707647]
Saved to  my-model-123


The variables can be restored later from one of the checkpoints:

In [29]:
# Reset graph.
tf.reset_default_graph()

# To load the variables I need to recreate them. Normally we recreate the whole graph, but some parts may not be
# necessary.
w = tf.Variable(tf.random_normal([4, 1], stddev=0.35),
                name="weights")
bias = tf.Variable(tf.zeros([1]), name="bias")
saver = tf.train.Saver() 
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint("."))  # Same as "my-model-123".
    [w_eval, bias_eval] = sess.run([w, bias])
    print "Restored parameters\nw = ", w_eval, "\nbias = ", bias_eval

Restored parameters
w =  [[-0.55853534]
 [ 0.75005877]
 [ 0.14311016]
 [-0.2382962 ]] 
bias =  [ 0.3707647]


## Summaries

A summary collects information that is useful for analyzing the values flowing through the graph. Typically summaries are collected and saved for analysis with Tensorboard. There is support for multiple types of summaries, e.g. images, audio, and histograms. The [summary operation](https://www.tensorflow.org/versions/r0.10/api_docs/python/train.html#summary-operations) documentation describes this in detail. Let's see how summaries work by rewriting the training graph.

In [None]:
# Reset graph.
tf.reset_default_graph()

# Same as above.
input_features = tf.constant([[0.5, 0.0, 1, 0.7],
                              [0.2, 1.0, 0.8, 0.5],
                              [0.3, 1.0, 0.5, 0.4]])
input_groundtruth = tf.constant([0.0, 1.0, 1.0])
w = tf.Variable(tf.random_normal([4, 1], stddev=0.35),
                name="weights")
bias = tf.Variable(tf.zeros([1]), name="bias")
prediction = tf.squeeze(tf.matmul(input_features, w)) + bias
quadratic_error = tf.reduce_sum(tf.square(prediction - input_groundtruth))
init_op = tf.initialize_all_variables()
update = tf.train.GradientDescentOptimizer(0.01).minimize(quadratic_error)

# Adds a scalar_summary operation, which creates a Summary object.
tf.scalar_summary("quadratic_error", quadratic_error)
tf.histogram_summary("w values", w)
# Convenience function. It adds an operation that depends on each summary.
merged = tf.merge_all_summaries()

# Object that writes summaries to disk. The graph is passed explicitly because no session exists yet.
train_writer = tf.train.SummaryWriter('.', tf.get_default_graph())  # We write to the current dir.

with tf.Session() as sess:
    # Runs the init op. Note that run() also accepts operations.
    sess.run(init_op)
    for i in range(123):
        sess.run(update)
        # At every iteration, we need to collect the summaries.
        merged_eval = sess.run(merged)
        train_writer.add_summary(merged_eval, i)
    [w_eval, bias_eval, merged_eval] = sess.run([w, bias, merged])
    print "Final parameters\nw = ", w_eval, "\nbias = ", bias_eval
    
    
    

## Tensorboard

To run tensorboard, open a terminal and specify the directory used for logging with --logdir, e.g.:

$ tensorboard --port 8081 --logdir .       

You should be able to see the tensorboard output at http://localhost:8081

#### The answer to the exercise:

prediction = tf.squeeze(tf.matmul(input_features, w)) + bias
quadratic_error = tf.reduce_sum(tf.square(prediction - input_groundtruth))