# Visualizing Learning and Graph Visualization

This tutorial is taken from "Hello, Tensorflow" by Aaron Schumacher, 2016.

In the first tutorial, we built a neuron that takes the input 1.0 and returns 0.8.

In [7]:
import tensorflow as tf
x = tf.constant(1.0, name='input')
w = tf.Variable(0.8, name='weight')
y = tf.multiply(w, x, name='output')  # tf.mul was renamed tf.multiply in TensorFlow 1.0
sess = tf.Session()
sess.run(tf.global_variables_initializer())

Let's say the correct output should be zero. In short, we have a very simple training set of just one example with one feature (which has value one) and one label (zero). We want this neuron to learn the function taking one to zero.

The system is not working perfectly: given the input 1.0, it returns 0.8 instead of zero. We will use "loss" as a measure of how "wrong" the system is.

In your opinion, what are some ways to measure this "loss"?

answer:

One way to define "loss" is taking the square of the difference between the current output and the desired output.

In [8]:
y_ = tf.constant(0.0)
loss = (y - y_)**2
loss

<tf.Tensor 'pow:0' shape=() dtype=float32>

What is the value of "loss" in this example?

In [9]:
sess.run(loss)

0.64000005
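As a sanity check, the same squared-error loss can be computed by hand in plain Python, without TensorFlow. This is only a sketch of the arithmetic; the trailing digits differ from the session output because TensorFlow stores 0.8 as a 32-bit float, while Python uses 64-bit floats.

```python
# Plain-Python sketch of the squared-error loss (no TensorFlow involved).
x = 1.0       # input
w = 0.8       # initial weight
y = w * x     # neuron output
y_ = 0.0      # desired output (label)

loss = (y - y_) ** 2
print(loss)  # about 0.64
```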

To help the graph learn, we need an [optimizer](https://www.tensorflow.org/api_docs/python/train/optimizers). We'll use a [gradient descent optimizer](https://www.tensorflow.org/api_docs/python/train/optimizers#GradientDescentOptimizer), which updates the weight based on the derivative of the loss.
What is the derivative of the loss with respect to the weight?

answer: 

The optimizer takes a learning rate that scales the size of each update; we'll set it to 0.025.

In [18]:
optim = tf.train.GradientDescentOptimizer(learning_rate=0.025)
grads_and_vars = optim.compute_gradients(loss)
grads_and_vars

[(None, <tensorflow.python.ops.variables.Variable at 0xc622d297f0>),
 (None, <tensorflow.python.ops.variables.Variable at 0xc6264db898>),
 (<tf.Tensor 'gradients_7/output_2_grad/tuple/control_dependency:0' shape=() dtype=float32>,
  <tensorflow.python.ops.variables.Variable at 0xc61dbde4e0>)]

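`compute_gradients` carries out the backward step of learning: it computes the gradient of the loss with respect to every trainable variable in the graph. Variables the loss does not depend on (here, most likely extra weights created by re-running the first cell) get `None`, so the gradient for our `w` is the last entry in the list.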

In [19]:
sess.run(grads_and_vars[-1][0])

1.6

This value of the gradient should match your answer above: the derivative of the loss with respect to the weight, evaluated at w = 0.8.
Let's apply the gradient to complete this training step.
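For this one-weight neuron the derivative has a simple closed form: with L = (w*x - y_)^2, the chain rule gives dL/dw = 2*x*(w*x - y_), which is 2 * 1.0 * 0.8 = 1.6 at the starting weight. The plain-Python sketch below checks that value against a central finite-difference estimate; the helper names are illustrative and not part of TensorFlow.

```python
def loss_fn(w, x=1.0, y_=0.0):
    """Squared-error loss of the one-weight neuron y = w * x."""
    return (w * x - y_) ** 2

def analytic_grad(w, x=1.0, y_=0.0):
    """dL/dw = 2 * x * (w * x - y_), by the chain rule."""
    return 2.0 * x * (w * x - y_)

def numeric_grad(w, eps=1e-6):
    """Central finite-difference estimate of dL/dw."""
    return (loss_fn(w + eps) - loss_fn(w - eps)) / (2 * eps)

w0 = 0.8
print(analytic_grad(w0))  # 1.6
print(numeric_grad(w0))   # close to 1.6
```

Because the loss is quadratic in w, the central difference is exact apart from floating-point rounding, so the two numbers agree closely.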

In [20]:
sess.run(optim.apply_gradients(grads_and_vars))

We can now check the updated weight of this neuron.

In [21]:
sess.run(w)

0.75999999

Compared to the initial weight of 0.8, the updated weight is 0.04 lower. Does the decrease make sense? Why a decrease of 0.04?

answer: 
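Gradient descent moves the weight against the gradient, scaled by the learning rate: w_new = w - learning_rate * dL/dw. With a learning rate of 0.025 and a gradient of 1.6, the step is 0.025 * 1.6 = 0.04, which accounts for the drop from 0.8 to 0.76. A minimal plain-Python sketch of that single update:

```python
learning_rate = 0.025  # same value passed to GradientDescentOptimizer above
w = 0.8                # initial weight
x, y_ = 1.0, 0.0       # input and desired output

grad = 2.0 * x * (w * x - y_)     # dL/dw for L = (w*x - y_)**2, equals 1.6 here
w_new = w - learning_rate * grad  # one gradient-descent step

print(w - w_new)  # size of the decrease, about 0.04
print(w_new)      # about 0.76
```

The weight goes down because the gradient is positive: increasing w would increase the loss, so the optimizer moves w in the opposite direction.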

To make the system learn over many iterations, we can create a single operation, `train_step`, that both computes and applies the gradients, and then run it as many times as we want.

In [22]:
train_step = tf.train.GradientDescentOptimizer(0.025).minimize(loss)
for i in range(100):
    sess.run(train_step)

We just ran `train_step` 100 times. Let's check whether the current output is closer to the desired one.

In [26]:
sess.run(y) # current output

0.0044996012

Compared to the initial output of 0.8, the output after many training steps is much closer to zero, the desired output.

In [24]:
sess.run(w) # current weight

0.0044996012

What will happen to the output if we increase the number of `train_step` iterations?

answer:
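One way to reason about this: each step computes w_new = w - learning_rate * 2 * x * (w*x - y_). With x = 1, y_ = 0 and a learning rate of 0.025, this simplifies to w_new = (1 - 2 * 0.025) * w = 0.95 * w, so the output shrinks by 5% per step and, in exact arithmetic, approaches zero without ever reaching it. A plain-Python sketch of that closed form, assuming we start from the weight 0.76 left after the single `apply_gradients` step above (the function name is illustrative):

```python
def output_after(steps, w_start=0.76, learning_rate=0.025, x=1.0):
    """Closed-form output after `steps` gradient-descent updates.

    Assumes the desired output y_ is 0, so each update multiplies
    the weight by (1 - 2 * learning_rate * x**2).
    """
    factor = 1.0 - 2.0 * learning_rate * x ** 2  # 0.95 for these values
    return w_start * factor ** steps * x

for steps in (100, 200, 500):
    print(steps, output_after(steps))
```

For 100 steps this gives about 0.0045, in line with the session output above up to float32 rounding; more iterations keep shrinking the output toward zero.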

Let's add more code to print the output before each `train_step`, in the following format:

before step 0, y is 0.800000011921

before step 1, y is 0.759999990463

...

before step 98, y is 0.00524811353534

before step 99, y is 0.00498570781201
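In the notebook, the loop would re-run the variable initializer and then, on each iteration, print `sess.run(y)` before calling `sess.run(train_step)`. As a TensorFlow-free sketch of the same logic, the plain-Python loop below applies the gradient-descent update by hand and prints in the requested format. Its values closely match the session output; the trailing digits differ because TensorFlow computes in 32-bit floats.

```python
learning_rate = 0.025
x, y_ = 1.0, 0.0
w = 0.8  # weight after re-running the initializer

trajectory = []  # output recorded before each step
for i in range(100):
    y = w * x
    trajectory.append(y)
    print('before step {}, y is {}'.format(i, y))
    grad = 2.0 * x * (y - y_)     # dL/dw for L = (w*x - y_)**2
    w = w - learning_rate * grad  # mirrors what train_step does in the graph
```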

In [30]:
# answer

before step 0, y is 0.800000011920929
before step 1, y is 0.7599999904632568
before step 2, y is 0.722000002861023
before step 3, y is 0.6858999729156494
before step 4, y is 0.651604950428009
before step 5, y is 0.6190246939659119
before step 6, y is 0.5880734324455261
before step 7, y is 0.5586697459220886
before step 8, y is 0.5307362675666809
before step 9, y is 0.5041994452476501


It's hard to see how quickly the output decreases by looking at this long list. TensorBoard helps by plotting the values we log with a `FileWriter`.

In [33]:
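file_writer = tf.summary.FileWriter("log_simple_stats", sess.graph)  # also logs the graph for TensorBoard's Graphs tab
sess.run(tf.global_variables_initializer())  # reset the weight to 0.8
summary_y = tf.summary.scalar('output', y)   # summary op that records y as a scalar
for i in range(100):
    summary_str = sess.run(summary_y)
    file_writer.add_summary(summary_str, i)  # log y at step i
    sess.run(train_step)
file_writer.close()  # flush pending events to disk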

Start TensorBoard (for example with `tensorboard --logdir=log_simple_stats`) and we'll see the plot.
![log_simple_stats](https://github.com/chauvm/tensorflow_tutorials/raw/master/images/log_simple_stats.png "TensorBoard's diagram of our training progress")

More resources:

https://en.wikipedia.org/wiki/Activation_function

https://www.tensorflow.org/tutorials/

https://www.tensorflow.org/how_tos/