In [None]:
import numpy as np
import seaborn as sb
import pandas
import sys
import itertools
import matplotlib.pyplot as plt
import nltk
import csv
import datetime
import tensorflow as tf
%matplotlib notebook

# Introduction to tensorflow

Let's extend our Python journey into deep networks to one of the standard neural network packages.

First, installation:

## Installing tensorflow

You can follow the many examples given here:

https://www.tensorflow.org/install/

To reduce the hassle, I would recommend a **CPU-installation**, since GPU-installs are notoriously difficult and may result in you spending several hours tweaking drivers and installation files. 

I had good success installing the pip version on my computer. All I did was:

`pip3 install tensorflow` 

and it just rolled - after downloading for several minutes on a high-speed network. 

## tensorflow basics: the computational graph

In tensorflow (at least in the 1.x API used in this notebook, with `tf.Session` and `tf.placeholder`), everything is structured as a computational graph, which is just a fancy word for a flow-chart.

A computational graph is a series of operations arranged into a graph with nodes. You can visualize this graph as a flow-chart with tools shipped in tensorflow as well, which makes for nice debugging and a different intuition about the computations that are going on in your code.

### tensors

As the name says, tensorflow works on tensors, which are mathematical entities that basically generalize matrices. In Python terms that means roughly that they are like multi-dimensional arrays - a matrix has two indices `m[i][j]`. This matrix is actually simply a tensor of Rank 2.

Hence, a tensor of Rank 3 would be `t[i][j][k]`. Things get a little tricky with summations and multiplications of tensors, but in principle tensors are basically multi-dimensionally-indexed "matrices".
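As a rough illustration of rank, here is a small sketch that uses NumPy arrays as a stand-in for tensors. The array names are made up for this example, and NumPy calls the rank `ndim`:

```python
import numpy as np

scalar = np.array(5.0)               # rank 0: a single number, no index needed
vector = np.array([1.0, 2.0, 3.0])   # rank 1: one index, v[i]
matrix = np.ones((2, 3))             # rank 2: two indices, m[i][j]
cube = np.zeros((2, 3, 4))           # rank 3: three indices, t[i][j][k]

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("cube", cube)]:
    print(name, "rank =", arr.ndim, "shape =", arr.shape)
```

The number of indices you need to pick out a single entry is the rank, and the shape tells you how far each index can run.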

### nodes

Each node in tensorflow takes zero or more tensors as input, and produces a tensor as an output.

The most basic node is a "constant" node that takes zero inputs (since it is constant) and produces a tensor as output. For a single number, as in the cell below, this is a Rank 0 tensor (a scalar).

Let's create two of these very simple nodes:

In [None]:
nodeBoring1 = tf.constant(5.0)
nodeBoring2 = tf.constant(10.0)
print(nodeBoring1,nodeBoring2)

The important thing is that printing the nodes does not print their values. Instead a node is a structure in a computational graph that needs to be evaluated in order to produce output!

We evaluate the computational graph by running a session, like so:

In [None]:
sess = tf.Session()
print(sess.run([nodeBoring1,nodeBoring2]))
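To make the "build first, run later" idea concrete, here is a toy sketch in plain Python. It is not how tensorflow is implemented; the names `Node`, `constant`, `add` and `run` are invented purely for this illustration:

```python
class Node:
    """A toy graph node: it stores an operation and its inputs, not a result."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __repr__(self):
        return "Node(op=%r)" % self.op


def constant(value):
    """Create a node with no inputs that simply holds a value."""
    return Node("const", value=value)


def add(a, b):
    """Create a node that will add its two inputs once it is evaluated."""
    return Node("add", inputs=(a, b))


def run(node):
    """Evaluate a node by recursively evaluating its inputs first."""
    if node.op == "const":
        return node.value
    if node.op == "add":
        left, right = (run(n) for n in node.inputs)
        return left + right
    raise ValueError("unknown op: %s" % node.op)


a = constant(5.0)
b = constant(10.0)
total = add(a, b)

print(total)       # shows the node itself, not a number
print(run(total))  # only evaluating the graph produces the value
```

Building `total` does no arithmetic at all. The work happens inside `run`, which plays roughly the role that `sess.run` plays above.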

### simple computations with nodes

Let's multiply two nodes together:

In [None]:
nodeMult = tf.multiply(nodeBoring1,nodeBoring2)
print(nodeMult)
print(sess.run(nodeMult))

Ain't that awesome? We can use a multi-megabyte code base to multiply two numbers in about 4 lines of code...

Ok, sarcasm off. Let's try to visualize the computational graph. Since we are using jupyter, we first have to teach it how to display the output of tensorboard, which is tensorflow's official visualization tool for these graphs.

The following code should allow us to use its visualization in jupyter:

In [None]:
from IPython.display import clear_output, Image, display, HTML

def strip_consts(graph_def, max_const_size=32):
    """Strip large constant values from graph_def."""
    strip_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = strip_def.node.add() 
        n.MergeFrom(n0)
        if n.op == 'Const':
            tensor = n.attr['value'].tensor
            size = len(tensor.tensor_content)
            if size > max_const_size:
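                # tensor_content is a bytes field, so encode the placeholder text (needed on Python 3)
                tensor.tensor_content = ("<stripped %d bytes>" % size).encode("utf-8")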
    return strip_def

def show_graph(graph_def, max_const_size=32):
    """Visualize TensorFlow graph."""
    if hasattr(graph_def, 'as_graph_def'):
        graph_def = graph_def.as_graph_def()
    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
    code = """
        <script>
          function load() {{
            document.getElementById("{id}").pbtxt = {data};
          }}
        </script>
        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
        <div style="height:600px">
          <tf-graph-basic id="{id}"></tf-graph-basic>
        </div>
    """.format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))

    iframe = """
        <iframe seamless style="width:1200px;height:620px;border:0" srcdoc="{}"></iframe>
    """.format(code.replace('"', '&quot;'))
    display(HTML(iframe))

In [None]:
show_graph(tf.get_default_graph().as_graph_def())

Awesome again! We have two constant input nodes, and apparently they are combined in a multiply node! <"Totally faints"> 

### adding variable nodes

Ok, enough with the constants. Let's use variable nodes in our graph, so that we can go forward with something interesting.

Let's try to do a simple linear regression.

For this, we need: 

* two variables $w$ and $b$ that hold the slope and the intercept of the line

* a placeholder $x$ that will hold our input data

* a model $ym=wx+b$ that combines everything together

* a loss function $l=\sum(y-ym)^2$ that evaluates how the predictions fit the actual data $y$

Here's the full code in tensorflow:

In [None]:
# components of the model
w = tf.Variable([.5])
b = tf.Variable([-.5])

# input data - we need to tell tensorflow the datatype!
x = tf.placeholder(tf.float32)

# actual data
y = tf.placeholder(tf.float32)


# linear model that produces predictions
ym = w*x + b

##### IMPORTANT
# we need to initialize variables before use!!!
init = tf.global_variables_initializer()
sess.run(init)
#####

# let's see the output of the model with some input data
print("linear predictions",sess.run(ym, {x:[0,1,2,3,4,5]}))

# now how good are we?
l = tf.reduce_sum(tf.square(ym-y))
print("loss =",sess.run(l, {x:[0,1,2,3,4,5], y:[0,0.3,0.6,0.9,1.2,1.5]}))
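The numbers printed by this cell can be checked by hand. The sketch below redoes the same arithmetic in plain Python, with no tensorflow involved; the variable names are chosen just for this example:

```python
w0, b0 = 0.5, -0.5                   # the initial values given to w and b
xs = [0, 1, 2, 3, 4, 5]              # the values fed into x
ys = [0, 0.3, 0.6, 0.9, 1.2, 1.5]    # the values fed into y

# model predictions ym = w*x + b for each input
preds = [w0 * x + b0 for x in xs]

# sum of squared residuals, mirroring tf.reduce_sum(tf.square(ym - y))
loss = sum((p - t) ** 2 for p, t in zip(preds, ys))

print("predictions:", preds)
print("loss:", round(loss, 6))
```

The residuals are $\pm 0.5$, $\pm 0.3$ and $\pm 0.1$, so the loss comes out at about $0.25+0.09+0.01+0.01+0.09+0.25=0.7$.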

The predictions are off, since the model parameters are, of course, not ideal. Let's change them using the `tf.assign` function, which creates an operation that overwrites an already-initialized variable once it is run:

In [None]:
optw = tf.assign(w,[0.3])
optb = tf.assign(b,[0])
sess.run([optw,optb])
print("loss =",sess.run(l, {x:[0,1,2,3,4,5], y:[0,0.3,0.6,0.9,1.2,1.5]}))

## tensorflow basics: training something

Of course, we would like to get $w,b$ automatically from the input data and the actual data! For this, we need to use some sort of optimization scheme in tensorflow.

The simplest general-purpose optimization scheme is gradient descent, so let's use this:

In [None]:
# choose the optimizer and the learning rate
optimizer = tf.train.GradientDescentOptimizer(0.01)
# determine the loss function to optimize
train = optimizer.minimize(l)
# this will return our variables to the initial state!!
sess.run(init)

for i in range(1000):
    sess.run(train,{x:[0,1,2,3,4,5], y:[0,0.3,0.6,0.9,1.2,1.5]})

print("final parameters:",sess.run([w,b]))
print("final loss:",sess.run(l,{x:[0,1,2,3,4,5], y:[0,0.3,0.6,0.9,1.2,1.5]}))

That's better - after 1000 iterations, the optimizer has converged and we recover the optimal parameters, $w \approx 0.3$ and $b \approx 0$, with a loss close to zero.
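For intuition about what the optimizer does on each step, here is a hand-written gradient descent loop for the same loss in plain Python. It is a sketch of the textbook update rule, not tensorflow's actual implementation, and the variable names are made up for this example. The gradients follow from differentiating $l=\sum(wx+b-y)^2$, which gives $\partial l/\partial w = 2\sum(wx+b-y)\,x$ and $\partial l/\partial b = 2\sum(wx+b-y)$.

```python
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0.3, 0.6, 0.9, 1.2, 1.5]

w_hat, b_hat = 0.5, -0.5   # same starting values as the tensorflow variables
learning_rate = 0.01       # same step size as passed to the optimizer

for step in range(1000):
    # residuals ym - y at the current parameters
    residuals = [w_hat * x + b_hat - y for x, y in zip(xs, ys)]
    # gradients of the summed squared error with respect to w and b
    grad_w = 2 * sum(r * x for r, x in zip(residuals, xs))
    grad_b = 2 * sum(residuals)
    # move both parameters a small step against the gradient
    w_hat -= learning_rate * grad_w
    b_hat -= learning_rate * grad_b

final_loss = sum((w_hat * x + b_hat - y) ** 2 for x, y in zip(xs, ys))
print("w =", w_hat, "b =", b_hat, "loss =", final_loss)
```

The update itself is only two lines. Tensorflow derives the gradients automatically from the graph instead of requiring them to be written out by hand.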

The computational graph of our problem so far, however, now looks vastly more complicated due to the inclusion of the gradient descent optimizer:

In [None]:
show_graph(tf.get_default_graph().as_graph_def())