# TensorFlow Basics
TensorFlow is a deep learning library built around the concept of computational graphs! Every operation performed on a tensor (an n-dimensional array) is represented as a node of the graph!

Let's learn it by example. Say you want to perform the following operation on a matrix `X` to get a matrix `Y`:
$$Y = X\times2 + 3$$

Here,
* $\times$: represents element-wise multiplication and
* $+$: represents element-wise addition

Behind the scenes, TensorFlow represents this operation in the form of a computational graph like this:

![Computational Graph](../images/computational-graph.png)

Let's code this out:

In [1]:
import tensorflow as tf

X = tf.Variable([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]) # make a variable
const2 = tf.constant(2.)  # make a constant
const3 = tf.constant(3.)  # make another constant

operation1 = X * const2 # elementwise multiplication operation
y = operation1 + const3 # result

Let's check the result:

In [2]:
y

<tf.Tensor 'add:0' shape=(3, 3) dtype=float32>

What the...! Where's our result? TensorFlow works on what we call a static computational graph! This means that whenever we create a variable `X = tf.Variable(...)` or create a constant `const_x = tf.constant(...)` or any other operation, we are only building nodes of a graph and are **not executing them**.
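The idea of building nodes first and executing them later can be sketched in plain Python. This is only a toy illustration and not how TensorFlow is implemented: the `Node` class, its `run` method, and the `as_node` helper are hypothetical names invented for this example.

```python
import numpy as np

class Node:
    """Hypothetical graph node: stores an operation and its inputs, computes nothing yet."""
    def __init__(self, op, *parents):
        self.op = op
        self.parents = parents

    def __mul__(self, other):
        return Node(np.multiply, self, as_node(other))

    def __add__(self, other):
        return Node(np.add, self, as_node(other))

    def run(self):
        # Evaluate the parent nodes first, then apply this node's operation.
        return self.op(*(p.run() for p in self.parents))

def as_node(value):
    """Wrap a raw value in a constant node (a node with no parents)."""
    if isinstance(value, Node):
        return value
    data = np.asarray(value, dtype=np.float32)
    return Node(lambda: data)

X = as_node([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
y = X * 2. + 3.    # only builds nodes; nothing is computed here
result = y.run()   # evaluation happens only now
print(result)
```

Here `y` is just a `Node` object until `run()` is called, much like the `tf.Tensor` printed above carries a shape and dtype but no values.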

To execute a complete graph we need to make a `tf.Session()`. Consider this session as an environment which makes the computational graph go active!

Let's try it out!

In [3]:
sess = tf.Session() # make a session object

sess.run(tf.global_variables_initializer()) # initialize variables
sess.run(y) # run every node that y depends on

array([[ 5.,  7.,  9.],
       [11., 13., 15.],
       [17., 19., 21.]], dtype=float32)

But why are we running `tf.global_variables_initializer()`? Remember that when we make a variable using `tf.Variable(...)`, we only add a node that describes its initial value; the variable holds no data until we tell the session to initialize it!

In [4]:
sess.close()

This is all fine and hunky-dory, but there are situations where we don't want to write these extra session lines. And what's with all those constant declarations? Can't we just use values directly?

The answer to all this is yes, we can. To improve on such scenarios, TensorFlow provides us with `tf.enable_eager_execution()`. Eager execution allows us to run the nodes of the graph then and there in a more *pythonic* manner!

> Now remember that eager execution must be enabled at program startup! So let's restart our kernel and start again from the next cell!

In [1]:
import tensorflow as tf
tf.enable_eager_execution()

# contrib call will change to tf.Variable soon
X = tf.contrib.eager.Variable([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]) # make a variable
y = X * 2. + 3.

y, y.numpy()

(<tf.Tensor: id=13, shape=(3, 3), dtype=float32, numpy=
 array([[ 5.,  7.,  9.],
        [11., 13., 15.],
        [17., 19., 21.]], dtype=float32)>, array([[ 5.,  7.,  9.],
        [11., 13., 15.],
        [17., 19., 21.]], dtype=float32))

Let's go through some advantages and disadvantages of the two kinds of computational graph: static and dynamic.


|                 |Static Computational Graph|Dynamic Computational Graph|
|:----------------|:-------------------------|:--------------------------|
|**Advantages**   |1. We only have one computational graph that is called over and over on each call|1. Easy to debug|
|                 |2. Each backward pass doesn't require creation of new graph|2. Flexible in controlling flow of program|
|                 |3. Static graphs can be optimized up front, while dynamic graphs cannot.|3. Easier to understand than static graphs, especially for newcomers who are familiar with Python.|
|**Disadvantages**|1. Sometimes difficult to debug (we need to run the complete graph to find a logical error in the first step)|1. Each forward pass defines a new computational graph, which slows it down|
|                 |2. Control flow requires extra care. Example: creating nodes inside a loop can leave a large number of nodes in memory, as old nodes aren't deleted!|2. Each backward pass creates a new computational graph as well, which slows down training!|
|                 |3. Not very pythonic and difficult for newcomers to understand|3. Dynamic graphs cannot be optimized up front, while static graphs can!|

Enough with the theory; let's implement a linear regression model. First we will look at a dynamic computational graph example, and then at a static computational graph example!

## Linear Regression - Dynamic Computational Graph

First let's create a synthetic dataset to work on! We will be generating a dataset that defines this line:

$$y = 2x + 3$$

but we will add some noise to the mix.

In [2]:
import numpy as np

X = np.random.randn(2000, 1)
y = X * 2 + 3 + np.random.randn(*X.shape)

In [3]:
X.shape, y.shape

((2000, 1), (2000, 1))

Let's define prediction function and loss function.

In [4]:
def predict(inputs, weight, bias):
    return inputs * weight + bias

def loss(inputs, weight, bias, outputs):
    error = predict(inputs, weight, bias) - outputs
    return tf.reduce_mean(tf.square(error))
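
Written out, and assuming $N$ denotes the number of samples (a symbol introduced here for illustration), the two functions above compute the following prediction and mean squared error:

$$\hat{y}_i = w x_i + b, \qquad L(w, b) = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$

Differentiating with respect to each parameter gives the gradients that gradient descent needs:

$$\frac{\partial L}{\partial w} = \frac{2}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)x_i, \qquad \frac{\partial L}{\partial b} = \frac{2}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)$$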

Finally, we define our weight, the coefficient of X (the slope of the line), and our bias (the y-intercept)!

In [5]:
w = tf.contrib.eager.Variable(tf.random_normal([1]))
b = tf.contrib.eager.Variable(tf.zeros([1]))

w, b

(<tf.Variable 'Variable:0' shape=(1,) dtype=float32, numpy=array([-0.24387714], dtype=float32)>,
 <tf.Variable 'Variable:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>)

Let's train!!

You will notice from its cell number that the cell below was run multiple times! We run it again and again until we reach a satisfactory loss value.

But what's with `GradientTape`? TensorFlow's official documentation describes it in rather jargon-heavy terms:

> `tf.GradientTape` is an opt-in feature to provide maximal performance when not tracing. Since different operations can occur during each call, all forward-pass operations get recorded to a "tape". To compute the gradient, play the tape backwards and then discard. A particular `tf.GradientTape` can only compute one gradient; subsequent calls throw a runtime error.

Tl;dr: use `tf.GradientTape` to record the forward pass and get your gradients. Create a new tape for every gradient computation, since each tape can compute only one gradient.

In [11]:
epochs = 200
lr = 0.001

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        l = loss(X, w, b, y)
    dw, db = tape.gradient(l, [w, b])
    w.assign_sub(dw * lr)
    b.assign_sub(db * lr)
    if epoch % 20 == 0:
        print('Epoch: {}/{}, loss: {}'.format(epoch, epochs, loss(X, w, b, y)))

print("Final loss: {:.3f}".format(loss(X, w, b, y)))
print("W = {}, B = {}".format(w.numpy(), b.numpy()))

Epoch: 0/200, loss: 1.2570668458938599
Epoch: 20/200, loss: 1.2370402812957764
Epoch: 40/200, loss: 1.2185415029525757
Epoch: 60/200, loss: 1.2014540433883667
Epoch: 80/200, loss: 1.1856707334518433
Epoch: 100/200, loss: 1.1710915565490723
Epoch: 120/200, loss: 1.157624363899231
Epoch: 140/200, loss: 1.1451847553253174
Epoch: 160/200, loss: 1.1336944103240967
Epoch: 180/200, loss: 1.123080849647522
Final loss: 1.114
W = [1.799801], B = [2.7217016]
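
For intuition about what `tape.gradient` returns here, the same training loop can be sketched in plain NumPy, with the gradients of the mean squared error written by hand. This is only an illustrative sketch and not how TensorFlow computes gradients internally: the names `mse` and `grads`, the seed, and the learning rate of 0.1 are assumptions made for this example (the cell above uses 0.001, which is one reason its loss falls so slowly).

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
X = rng.standard_normal((2000, 1))
y = X * 2 + 3 + rng.standard_normal(X.shape)

def mse(w, b):
    """Mean squared error of the line w * x + b on the synthetic data."""
    return np.mean((X * w + b - y) ** 2)

def grads(w, b):
    """Hand-derived gradients of the mean squared error w.r.t. w and b."""
    error = X * w + b - y
    dw = 2.0 * np.mean(error * X)
    db = 2.0 * np.mean(error)
    return dw, db

w, b = 0.0, 0.0
lr = 0.1  # assumed value, larger than the 0.001 used in the cell above

for epoch in range(200):
    dw, db = grads(w, b)
    w -= lr * dw  # plays the role of w.assign_sub(dw * lr)
    b -= lr * db  # plays the role of b.assign_sub(db * lr)

print("loss = {:.3f}, w = {:.3f}, b = {:.3f}".format(mse(w, b), w, b))
```

With the larger learning rate, `w` and `b` should land close to the true slope 2 and intercept 3 in a single run, instead of needing the cell to be re-run several times.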


## Linear Regression - Static Computational Graph

We are currently executing graphs in dynamic mode. Let's change to static mode! To do so, we will need to restart the kernel. After restarting the kernel, let's start building our graph.

In [1]:
import tensorflow as tf

In [2]:
import numpy as np

X = np.random.randn(2000, 1)
y = X * 2 + 3 + np.random.randn(*X.shape)

In [3]:
X.shape, y.shape

((2000, 1), (2000, 1))

In [4]:
def predict(inputs, weight, bias):
    return inputs * weight + bias

def loss(inputs, weight, bias, outputs):
    error = predict(inputs, weight, bias) - outputs
    return tf.reduce_mean(tf.square(error))

In [5]:
w = tf.Variable(tf.random_normal([1]))
b = tf.Variable(tf.zeros([1]))

w, b

(<tf.Variable 'Variable:0' shape=(1,) dtype=float32_ref>,
 <tf.Variable 'Variable_1:0' shape=(1,) dtype=float32_ref>)

Computational graphs require input nodes to interact with incoming data. This data is fed into the graph using special nodes called placeholders. Each placeholder declares the shape and the data type of the input it expects.

Our graph requires three inputs:
* X
* y
* learning rate

In [6]:
inputs = tf.placeholder(shape=X.shape, dtype=tf.float32) # input node for X
outputs = tf.placeholder(shape=y.shape, dtype=tf.float32) # input node for y
lr = tf.placeholder(shape=[1], dtype=tf.float32) # input node for learning rate

## define loss graph.
# after calling this function we have a global graph that computes loss
# we can execute this global graph multiple times to get the loss depending on inputs and outputs
l = loss(inputs, w, b, outputs) 

## get gradients of loss function w.r.t w and b
dw, db = tf.gradients(l, [w, b])

## define update nodes for w and b
update_w = w.assign(w - lr * dw)
update_b = b.assign(b - lr * db)

## now define initialization node
init = tf.global_variables_initializer()

# define session and initialize variables
sess = tf.Session()
sess.run(init)

To feed values into the placeholders, use the `feed_dict` keyword argument of the `sess.run(...)` function.

`feed_dict` takes a dictionary with placeholders as keys and their corresponding inputs as values!

In [12]:
epochs = 200
learning_rate = np.array([0.001])

for epoch in range(epochs):
    sess.run([update_w, update_b], feed_dict={
        lr: learning_rate,
        inputs: X,
        outputs: y
    })
    if epoch % 20 == 0:
        print('Epoch: {}/{}, loss: {}'.format(
            epoch, epochs, sess.run(l, feed_dict={
                lr: learning_rate,
                inputs: X,
                outputs: y
            })))

w_final, b_final = sess.run([w, b])
print("Final loss: {:.3f}".format(sess.run(l, feed_dict={
    inputs: X,
    outputs: y
})))
print("W = {}, B = {}".format(w_final, b_final))

Epoch: 0/200, loss: 1.4100929498672485
Epoch: 20/200, loss: 1.3785699605941772
Epoch: 40/200, loss: 1.3495323657989502
Epoch: 60/200, loss: 1.322784662246704
Epoch: 80/200, loss: 1.2981462478637695
Epoch: 100/200, loss: 1.275450348854065
Epoch: 120/200, loss: 1.254543423652649
Epoch: 140/200, loss: 1.2352851629257202
Epoch: 160/200, loss: 1.2175456285476685
Epoch: 180/200, loss: 1.2012044191360474
Final loss: 1.187
W = [1.6865658], B = [2.750188]


In [13]:
sess.close()