I wrote this notebook after reading this very instructive article: https://jacobbuckman.com/post/tensorflow-the-confusing-parts-1/

Most of the code here is taken from that article; the comments are mine.

The article presents the main abstractions of TensorFlow clearly, with constant reference to the computation graph, which gives a good understanding of what happens under the hood for each instruction.

This notebook summarises the main points of the article and may be used for future reference.

In [1]:
import tensorflow as tf


# The computation graph

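The computation graph can be seen as a series of instructions describing the computations to perform. Building the graph does not run anything: printing a node shows a `Tensor` description rather than its value.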

In [2]:
two_node = tf.constant(2)
print(two_node)

Tensor("Const:0", shape=(), dtype=int32)


Every time `tf.constant` is called, it adds a new node to the graph, even if the Python name is reused. The following instructions create two distinct nodes:

In [3]:
two_node = tf.constant(2)
two_node = tf.constant(2)

On the other hand, assigning an existing node to a new Python variable just copies the pointer to that node:

In [4]:
two_node = tf.constant(2)  # create the node
another_pointer_at_two_node = two_node  # no node added to the graph, just copied the pointer

In [5]:
two_node = tf.constant(2)
another_pointer_at_two_node = two_node
two_node = None
print(two_node)
print(another_pointer_at_two_node)

None
Tensor("Const_4:0", shape=(), dtype=int32)
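
The node-versus-pointer distinction above can be mimicked with a toy model in plain Python. This is only a sketch, not how TensorFlow is implemented: `ToyGraph` and its `constant` method are hypothetical names invented for illustration. Each call to `constant` registers a new node, while a plain assignment only creates a second name for the same object.

```python
class ToyGraph:
    """A toy stand-in for a computation graph: just a list of nodes."""

    def __init__(self):
        self.nodes = []

    def constant(self, value):
        # Every call registers a brand-new node, even for an equal value.
        name = "Const" if not self.nodes else "Const_%d" % len(self.nodes)
        node = {"name": name, "value": value}
        self.nodes.append(node)
        return node


graph = ToyGraph()
two_node = graph.constant(2)   # first node
two_node = graph.constant(2)   # second node; the first one stays in the graph
alias = two_node               # no new node, just another name for the same object

print(len(graph.nodes))        # 2
print(alias is two_node)       # True
two_node = None                # rebinding the name does not remove the node
print(alias["name"])           # Const_1
```

Rebinding `two_node` to `None` leaves the node reachable through `alias`, which mirrors the output of the cell above.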


The `+` operator is overloaded in TensorFlow, so that it performs tensor addition (equivalent to calling `tf.add(tensor_1, tensor_2)`) and adds the corresponding node to the graph.

In [6]:
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
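
As a rough illustration of what operator overloading means here, the following plain-Python sketch defines a hypothetical `Node` class (not a TensorFlow class) whose `+` operator builds a new node instead of computing a number. The attribute names `op`, `inputs` and `value` are assumptions made for this example.

```python
class Node:
    """Toy graph node: stores an operation name and its input nodes."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value

    def __add__(self, other):
        # Overloading + builds a new "add" node; nothing is computed here.
        return Node("add", inputs=(self, other))


two_node = Node("const", value=2)
three_node = Node("const", value=3)
sum_node = two_node + three_node

print(sum_node.op)                      # add
print(sum_node.inputs[0] is two_node)   # True
print(sum_node.value)                   # None: no result until the graph is evaluated
```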

# The session

The session handles memory allocation and optimization to actually perform the operations described by the computation graph.

A graph and a session are the mandatory elements for performing computations in TensorFlow.

The session contains a pointer to the graph which contains pointers to all the nodes.

In [7]:
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
print(sess.run(sum_node))

5


Several nodes can be evaluated at once:

In [8]:
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
print(sess.run([two_node, sum_node]))

[2, 5]


`sess.run()` is usually a bottleneck in TensorFlow: whenever possible, it is better to return multiple values from a single call to `sess.run()` rather than calling it multiple times.
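
One reason a single call helps is that nodes shared by several fetches only need to be evaluated once per call. The plain-Python sketch below imitates that behaviour with a hypothetical `run` function and a dictionary standing in for the graph; it does not model the other overheads of a real session, and all names are invented for illustration.

```python
calls = {"count": 0}


def expensive():
    # Stands in for a costly sub-computation shared by several fetches.
    calls["count"] += 1
    return 2 + 3


# Toy graph: node name -> function of a `get` callback that evaluates inputs.
graph = {
    "sum": lambda get: expensive(),
    "double": lambda get: 2 * get("sum"),
    "square": lambda get: get("sum") ** 2,
}


def run(fetches):
    cache = {}  # results are shared only within this one call

    def get(name):
        if name not in cache:
            cache[name] = graph[name](get)
        return cache[name]

    return [get(name) for name in fetches]


single_call = run(["double", "square"])
calls_single = calls["count"]

calls["count"] = 0
two_calls = run(["double"]) + run(["square"])
calls_double = calls["count"]

print(single_call, calls_single)   # [10, 25] 1
print(two_calls, calls_double)     # [10, 25] 2
```

Both approaches return the same values, but the two separate calls evaluate the shared `"sum"` node twice.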

# Placeholders and feed_dict

A placeholder node is a node that is meant to receive external input. The external input is provided using the `feed_dict` argument of `sess.run()`.

In [9]:
input_placeholder = tf.placeholder(tf.int32)
sess = tf.Session()

print(sess.run(input_placeholder, feed_dict={input_placeholder: 2}))

2
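
The role of `feed_dict` can be sketched in plain Python as a lookup table consulted at evaluation time. The `Placeholder` class and `run` function below are hypothetical stand-ins written for this note, not TensorFlow code; the point is only that a placeholder has no value of its own and evaluation fails if nothing is fed.

```python
class Placeholder:
    """Toy node that has no value until one is fed in at run time."""


def run(node, feed_dict=None):
    feed_dict = feed_dict or {}
    if isinstance(node, Placeholder):
        if node not in feed_dict:
            raise ValueError("placeholder was not fed a value")
        return feed_dict[node]
    if isinstance(node, tuple) and node[0] == "add":
        # ("add", left, right): evaluate both inputs, then add them.
        return run(node[1], feed_dict) + run(node[2], feed_dict)
    return node  # plain Python numbers act as constants


input_placeholder = Placeholder()
plus_three = ("add", input_placeholder, 3)

print(run(plus_three, feed_dict={input_placeholder: 2}))   # 5
try:
    run(plus_three)  # nothing fed: evaluation cannot proceed
except ValueError as err:
    print("error:", err)
```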


# Variables

Variables are usually used to store model parameters: they change value during the training phase and are then held constant during the prediction phase. `tf.get_variable(name, shape)` creates a new variable. `name` should be a unique variable name for the whole graph; `shape` sets the size of the tensor, with `[]` corresponding to a scalar.

A variable can only be evaluated *after* storing a value into it, using either initializers or `tf.assign()`.

## Using `tf.assign()`

In [10]:
count_variable = tf.get_variable("count", [])
zero_node = tf.constant(0.)
assign_node = tf.assign(count_variable, zero_node)
sess = tf.Session()
sess.run(assign_node)
print(sess.run(count_variable))

0.0


`tf.assign(node, value)` returns `value`.

`tf.assign()` has a side effect: when computation runs through `assign_node`, it sets `count_variable` to the value of `zero_node`.



In [11]:
print(sess.run(assign_node))

0.0
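
The two properties described above (a variable cannot be read before it holds a value, and an assign node both returns a value and changes state) can be imitated in plain Python. `Variable` and `make_assign` below are hypothetical helpers written for this sketch; they are not TensorFlow's classes.

```python
class Variable:
    """Toy variable: state that persists between runs, initially unset."""

    _UNSET = object()

    def __init__(self, name):
        self.name = name
        self.value = Variable._UNSET

    def read(self):
        if self.value is Variable._UNSET:
            raise RuntimeError("variable %r used before initialization" % self.name)
        return self.value


def make_assign(variable, new_value):
    # Building the assign "node" changes nothing; only running it does.
    def assign_node():
        variable.value = new_value   # side effect
        return new_value             # output of the node
    return assign_node


count_variable = Variable("count")
assign_node = make_assign(count_variable, 0.0)

try:
    count_variable.read()            # fails: nothing has been stored yet
except RuntimeError as err:
    print("error:", err)

print(assign_node())                 # 0.0
print(count_variable.read())         # 0.0
```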


## Using initializers

In [12]:
const_init_node = tf.constant_initializer(0.)
count_variable = tf.get_variable("count2", [], initializer=const_init_node)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print(sess.run(count_variable))

0.0


`tf.global_variables_initializer()` is a node with side effects: it goes through all the initializers and initialises the variables.

# Optimizers

The example below uses linear regression to demonstrate the use of optimizers.

In [13]:
import random

### build the graph
## first set up the model parameters
m = tf.get_variable("m", [], initializer=tf.constant_initializer(0.))
b = tf.get_variable("b", [], initializer=tf.constant_initializer(0.))
init = tf.global_variables_initializer()  # add the initialization node to the graph

## then set up the computations
input_placeholder = tf.placeholder(tf.float32)
output_placeholder = tf.placeholder(tf.float32)

x = input_placeholder
y = output_placeholder
y_guess = m * x + b

loss = tf.square(y - y_guess)

## finally, set up the optimizer and minimization node
optimizer = tf.train.GradientDescentOptimizer(1e-3)  # this is NOT a node, just a python object
train_op = optimizer.minimize(loss)  # this is a node in the graph

### start the session
sess = tf.Session()
sess.run(init)

### perform the training loop


## set up problem
true_m = random.random()
true_b = random.random()

for update_i in range(100):
  ## (1) get the input and output
  input_data = random.random()
  output_data = true_m * input_data + true_b

  ## (2), (3), and (4) all take place within a single call to sess.run()!
  _loss, _ = sess.run([loss, train_op], feed_dict={input_placeholder: input_data, output_placeholder: output_data})
  print('Step %d, loss %.2f' % (update_i, _loss))

### finally, print out the values we learned for our two variables
print("True parameters:     m=%.4f, b=%.4f" % (true_m, true_b))
print("Learned parameters:  m=%.4f, b=%.4f" % tuple(sess.run([m, b])))

Step 0, loss 0.45
Step 1, loss 0.47
Step 2, loss 1.04
Step 3, loss 1.05
Step 4, loss 0.74
Step 5, loss 1.34
Step 6, loss 0.71
Step 7, loss 1.32
Step 8, loss 0.61
Step 9, loss 1.02
Step 10, loss 1.27
Step 11, loss 0.95
Step 12, loss 0.60
Step 13, loss 0.85
Step 14, loss 0.62
Step 15, loss 0.65
Step 16, loss 1.29
Step 17, loss 0.67
Step 18, loss 0.69
Step 19, loss 1.04
Step 20, loss 1.04
Step 21, loss 1.13
Step 22, loss 0.59
Step 23, loss 0.87
Step 24, loss 0.37
Step 25, loss 0.42
Step 26, loss 0.42
Step 27, loss 0.55
Step 28, loss 0.77
Step 29, loss 0.74
Step 30, loss 0.74
Step 31, loss 0.63
Step 32, loss 0.39
Step 33, loss 1.14
Step 34, loss 0.73
Step 35, loss 1.19
Step 36, loss 0.41
Step 37, loss 0.95
Step 38, loss 1.15
Step 39, loss 0.70
Step 40, loss 0.88
Step 41, loss 0.73
Step 42, loss 0.62
Step 43, loss 0.94
Step 44, loss 0.34
Step 45, loss 0.74
Step 46, loss 0.40
Step 47, loss 1.06
Step 48, loss 1.10
Step 49, loss 0.84
Step 50, loss 1.08
Step 51, loss 0.37
Step 52, loss 0.83
Ste

Notes on the code above:
* `optimizer` points to a Python object; it is not a node added to the graph.
* `train_op = ...`, on the other hand, adds a node to the graph and has side effects when evaluated: it performs one step of gradient descent by evaluating the gradient of the loss function `J` with respect to `m` and `b`, then updating `m = m - alpha * dJ/dm` and `b = b - alpha * dJ/db`, where `alpha` is the learning rate (`1e-3` here).
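
To make the update rule concrete, here is the same kind of training loop written by hand in plain Python, with the gradients of the squared-error loss worked out explicitly. This is a sketch of what one gradient descent step computes, not the optimizer's actual implementation. The fixed seed, the learning rate of `0.1` and the `5000` steps are choices made for this sketch, not values from the article.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (an assumption of this sketch)

true_m, true_b = random.random(), random.random()
m, b = 0.0, 0.0
alpha = 0.1      # larger than the 1e-3 used above, chosen so the sketch converges quickly
steps = 5000

for _ in range(steps):
    x = random.random()
    y = true_m * x + true_b
    error = y - (m * x + b)      # loss J = error ** 2
    grad_m = -2.0 * error * x    # dJ/dm
    grad_b = -2.0 * error        # dJ/db
    m -= alpha * grad_m          # m = m - alpha * dJ/dm
    b -= alpha * grad_b          # b = b - alpha * dJ/db

print("True parameters:     m=%.4f, b=%.4f" % (true_m, true_b))
print("Learned parameters:  m=%.4f, b=%.4f" % (m, b))
```

With only 100 steps at a learning rate of `1e-3`, as in the cell above, the parameters move only slightly from their initial value of zero, which is consistent with the printed loss staying roughly flat.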

# Debugging with `tf.Print()`

`tf.Print(node_to_copy, [nodes to print])` adds a node to the graph that has both an output and a side effect: its output is a copy of `node_to_copy`, and as a side effect it prints the values of the nodes listed in `[nodes to print]`.

In [15]:
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
print_sum_node = tf.Print(sum_node, [two_node, three_node])
sess = tf.Session()
print(sess.run(print_sum_node))

5


`tf.Print()` adds a node to the graph. For the side effect to occur, the node has to be on the computation path (otherwise nothing happens ;)).
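
The "on the computation path" condition can be illustrated with a plain-Python sketch in which nodes are zero-argument functions. `make_print_node` is a hypothetical helper written for this note (it records values in a list instead of printing them); it is not the `tf.Print` implementation.

```python
printed = []


def make_print_node(node_to_copy, nodes_to_print):
    # Returns a new node that outputs the same value as node_to_copy and,
    # when evaluated, records the values of nodes_to_print as a side effect.
    def print_node():
        printed.append([n() for n in nodes_to_print])
        return node_to_copy()
    return print_node


two_node = lambda: 2
three_node = lambda: 3
sum_node = lambda: two_node() + three_node()
print_sum_node = make_print_node(sum_node, [two_node, three_node])

print(sum_node(), printed)         # 5 []        (print node not on the path)
print(print_sum_node(), printed)   # 5 [[2, 3]]  (print node evaluated)
```

Evaluating `sum_node` directly never touches the print node, so no side effect occurs; only evaluating `print_sum_node` (or something that depends on it) triggers it.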

Additional resources for debugging: https://wookayin.github.io/tensorflow-talk-debugging/#1