# Chapter 9: Running TensorFlow

This Jupyter notebook contains sample code from the book. I retyped it here to become familiar with coding in TensorFlow.

## Create a graph and execute it in a session

In TensorFlow, the definition and the execution of a computation are separate. Using a computation graph takes three main steps: first define the computation, then initialize the variables, and finally execute the graph in a session.

In [3]:
import tensorflow as tf

# define a graph
x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2

# create a session
sess = tf.Session()
# initialization
sess.run(x.initializer)
sess.run(y.initializer)
# execute the graph
result = sess.run(f)
print(result)
sess.close()

42



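You can use a Python context manager (a ```with``` block) to create a session and close it automatically. Inside the block the session is also set as the default session, which is why ```x.initializer.run()``` and ```f.eval()``` work without naming it.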

In [4]:
with tf.Session() as sess:
    x.initializer.run()    # same as sess.run(x.initializer)
    y.initializer.run()
    result = f.eval()
    print(result)

42


If you do not want to initialize the variables one at a time, you can use ```tf.global_variables_initializer()```. Calling it does not initialize anything immediately; it creates a node that initializes all variables when it is run.

In [5]:
# use global initializer
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()
    print(result)

42


## Manage graphs

The nodes you create will be added to the default graph automatically.

In [6]:
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

True

If you want to manage multiple independent graphs, you can create a new graph and temporarily make it the default graph inside a ```with``` block.

In [7]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
    
print(x2.graph is graph)
print(x2.graph is tf.get_default_graph())

True
False


## Lifecycle of nodes

When you evaluate a node, TensorFlow automatically determines the set of nodes it depends on and evaluates those nodes first.

In [8]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

10
15


In the code above, when it reaches ```y.eval()```, TensorFlow detects that ```y``` depends on ```x```, and ```x``` depends on ```w```, so it computes ```w``` first, then ```x```, and finally ```y```. The same thing happens for ```z.eval()```. **Note that TensorFlow does not reuse node values between graph runs**, which means that ```x``` and ```w``` are computed **twice** here. To evaluate both nodes in a single graph run, fetch them together with ```sess.run([y, z])```.
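The recomputation described above can be illustrated with a small toy model in plain Python. This is not TensorFlow code and says nothing about how TensorFlow is implemented internally; the `graph` dictionary, the `run()` helper and the call counter are all hypothetical names used only to show why two separate runs compute the shared nodes twice, while one combined run computes them once.

```python
from collections import Counter

calls = Counter()  # how many times each node has been computed

# Toy graph: node name -> (function, names of the input nodes)
graph = {
    "w": (lambda: 3, []),
    "x": (lambda w: w + 2, ["w"]),
    "y": (lambda x: x + 5, ["x"]),
    "z": (lambda x: x * 3, ["x"]),
}


def run(targets):
    """Evaluate all targets in one run; values are shared only within the run."""
    cache = {}  # thrown away when the run ends

    def evaluate(name):
        if name not in cache:
            fn, inputs = graph[name]
            calls[name] += 1
            cache[name] = fn(*[evaluate(i) for i in inputs])
        return cache[name]

    return [evaluate(t) for t in targets]


# Two separate runs: w and x are computed once per run, so twice in total.
y_val = run(["y"])[0]
z_val = run(["z"])[0]
separate_calls = dict(calls)

# One combined run: w and x are computed only once.
calls.clear()
y_val2, z_val2 = run(["y", "z"])
combined_calls = dict(calls)

print(y_val, z_val, separate_calls)
print(y_val2, z_val2, combined_calls)
```

In this toy, `separate_calls["x"]` is 2 and `combined_calls["x"]` is 1, while the values of `y` and `z` are 10 and 15 in both cases.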

## Linear Regression in TensorFlow

To perform linear regression with TensorFlow, all you need to do is express the closed-form solution (the Normal Equation, which multiplies the pseudo-inverse of ```X``` by ```y```) as graph operations, and TensorFlow will do the rest.

In [10]:
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()
    print(theta_value)

[[-3.7225266e+01]
 [ 4.3568176e-01]
 [ 9.3872147e-03]
 [-1.0598953e-01]
 [ 6.3939309e-01]
 [-4.1104349e-06]
 [-3.7780963e-03]
 [-4.2437303e-01]
 [-4.3785891e-01]]
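For reference, the expression assigned to ```theta``` in the cell above corresponds to the Normal Equation, the closed-form least-squares solution. The symbols below follow the variable names in the code: $X$ is the $m \times (n+1)$ data matrix including the bias column, and $y$ is the $m \times 1$ target vector.

$$
\hat{\theta} = \left(X^{T} X\right)^{-1} X^{T} y
$$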


## Gradient Descent in TensorFlow

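Let's see how the gradient descent algorithm works in TensorFlow. Gradient descent converges much more slowly when the features have very different scales, so all the examples below first standardize the inputs with ```StandardScaler``` (zero mean and unit variance, rather than a 0-1 range). Note that the cells below scale the matrix that already contains the bias column; ```StandardScaler``` turns that constant column into zeros, so the model effectively has no intercept, which is why the MSE in the outputs plateaus near 4.8. Scaling ```housing.data``` first and adding the bias column afterwards avoids this.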

### Calculate gradient manually

The code below is mostly self-explanatory, but a few points are worth mentioning.

- ```tf.random_uniform()``` creates a node in the graph that generates a tensor of the given shape, filled with random values drawn uniformly from the given range. It is the graph counterpart of NumPy's ```rand()``` function.
- ```tf.assign()``` creates a node that assigns a new value to a variable. Here it implements the batch gradient descent step ```theta = theta - learning_rate * gradients```.
- The main loop runs the training op once per epoch and prints the current MSE (mean squared error) every 100 epochs. This value should decrease over time.

In [13]:
n_epochs = 1001
learning_rate = 0.01

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data_plus_bias = scaler.fit_transform(housing_data_plus_bias)

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

# Calculate gradients manually.
gradients = 2/m * tf.matmul(tf.transpose(X), error)

training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE=", mse.eval())
        
        sess.run(training_op)
        
    best_theta = theta.eval()
    print(best_theta)

Epoch 0 MSE= 9.59326
Epoch 100 MSE= 4.928685
Epoch 200 MSE= 4.818237
Epoch 300 MSE= 4.810825
Epoch 400 MSE= 4.8085265
Epoch 500 MSE= 4.8070526
Epoch 600 MSE= 4.8059998
Epoch 700 MSE= 4.8052406
Epoch 800 MSE= 4.8046923
Epoch 900 MSE= 4.8042955
Epoch 1000 MSE= 4.8040075
[[ 0.29979324]
 [ 0.81933147]
 [ 0.12745057]
 [-0.22621013]
 [ 0.26444152]
 [-0.00120573]
 [-0.03984256]
 [-0.84612375]
 [-0.8146199 ]]
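The manual gradient used above is the expression (2/m) Xᵀ(Xθ − y). As a sanity check on that formula, the sketch below compares it with a central finite-difference approximation. It is plain Python and independent of TensorFlow; the tiny dataset, the helper names and the step size `eps` are illustrative assumptions, not part of the notebook.

```python
# Tiny made-up dataset: each row is [bias, feature]; this is not the housing data.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
m = len(X)


def predict(theta, row):
    return sum(t * v for t, v in zip(theta, row))


def mse(theta):
    return sum((predict(theta, row) - target) ** 2
               for row, target in zip(X, y)) / m


def manual_gradient(theta):
    # (2/m) * X^T (X theta - y), written with explicit loops
    errors = [predict(theta, row) - target for row, target in zip(X, y)]
    return [2.0 / m * sum(row[j] * e for row, e in zip(X, errors))
            for j in range(len(theta))]


def numeric_gradient(theta, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grads = []
    for j in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[j] += eps
        minus[j] -= eps
        grads.append((mse(plus) - mse(minus)) / (2 * eps))
    return grads


theta = [0.5, -0.5]
analytic = manual_gradient(theta)
numeric = numeric_gradient(theta)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, numeric, max_diff)
```

For this dataset and starting point the analytic gradient is `[-8.5, -19.0]`, and the finite-difference estimate agrees with it up to rounding error.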


### Calculate gradient automatically

A better way to compute gradients is to use automatic differentiation (autodiff). The ```tf.gradients()``` function takes an op (here ```mse```) and a list of variables (here just ```theta```), and returns a list of ops, one per variable, that compute the gradient of the op with respect to each variable.

In [14]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data_plus_bias = scaler.fit_transform(housing_data_plus_bias)

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

# Calculate gradients automatically
gradients = tf.gradients(mse, [theta])[0]

training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE=", mse.eval())
        
        sess.run(training_op)
        
    best_theta = theta.eval()
    print(best_theta)

Epoch 0 MSE= 7.7393413
Epoch 100 MSE= 4.948269
Epoch 200 MSE= 4.898545
Epoch 300 MSE= 4.8719373
Epoch 400 MSE= 4.8528666
Epoch 500 MSE= 4.839103
Epoch 600 MSE= 4.8291645
Epoch 700 MSE= 4.8219857
Epoch 800 MSE= 4.8168015
Epoch 900 MSE= 4.813056
Epoch 1000 MSE= 4.810349
[[ 0.2556944 ]
 [ 0.7958407 ]
 [ 0.14514397]
 [-0.14049844]
 [ 0.17539024]
 [ 0.00553657]
 [-0.04083863]
 [-0.7392124 ]
 [-0.70297533]]
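To give some intuition for what reverse-mode autodiff does, here is a minimal scalar sketch in plain Python. It is a toy and not TensorFlow's implementation; the class name `Var` and its methods are hypothetical. It differentiates the function ```f = x*x*y + y + 2``` from the first example of this notebook.

```python
class Var:
    """A scalar graph node that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_node, derivative of this node w.r.t. that parent)
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Order the nodes so that every node comes after the nodes it depends on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local_derivative in node.parents:
                parent.grad += node.grad * local_derivative


x = Var(3.0)
y = Var(4.0)
f = x * x * y + y + 2
f.backward()
print(f.value, x.grad, y.grad)
```

At x = 3 and y = 4 the function value is 42, and the gradients match the hand-derived partial derivatives 2xy = 24 and x² + 1 = 10.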


## Using Optimizers

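Instead of writing the update rule yourself, you can use an optimizer, which computes the gradients and applies the parameter update for you. Here ```GradientDescentOptimizer``` reproduces plain gradient descent; other optimizers apply different update rules.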

In [15]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data_plus_bias = scaler.fit_transform(housing_data_plus_bias)

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

# Use optimizer to update theta
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE=", mse.eval())
        
        sess.run(training_op)
        
    best_theta = theta.eval()
    print(best_theta)

Epoch 0 MSE= 7.814382
Epoch 100 MSE= 5.0518966
Epoch 200 MSE= 4.9267826
Epoch 300 MSE= 4.890997
Epoch 400 MSE= 4.8677335
Epoch 500 MSE= 4.8509054
Epoch 600 MSE= 4.838587
Epoch 700 MSE= 4.829542
Epoch 800 MSE= 4.822883
Epoch 900 MSE= 4.81797
Epoch 1000 MSE= 4.8143315
[[ 0.2117393 ]
 [ 0.8445777 ]
 [ 0.15581304]
 [-0.22971392]
 [ 0.24790294]
 [ 0.00878442]
 [-0.04278013]
 [-0.615663  ]
 [-0.5849596 ]]
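As an illustration of what "a different update rule" can mean, the sketch below compares plain gradient descent with classical momentum on a one-dimensional toy loss. It is plain Python, not TensorFlow, and momentum is not used anywhere in this notebook; the function names, the toy loss and the hyperparameter values are assumptions chosen for the example. TensorFlow 1.x provides ```tf.train.MomentumOptimizer```, whose internal formulation may arrange the terms differently from the textbook form shown here.

```python
def gradient(theta):
    # Gradient of the toy loss f(theta) = theta ** 2
    return 2.0 * theta


def plain_gradient_descent(theta, learning_rate, n_steps):
    for _ in range(n_steps):
        theta -= learning_rate * gradient(theta)
    return theta


def momentum_gradient_descent(theta, learning_rate, momentum, n_steps):
    velocity = 0.0
    for _ in range(n_steps):
        # The velocity accumulates past gradients, decayed by `momentum`.
        velocity = momentum * velocity - learning_rate * gradient(theta)
        theta += velocity
    return theta


theta_plain = plain_gradient_descent(1.0, learning_rate=0.1, n_steps=20)
theta_momentum = momentum_gradient_descent(1.0, learning_rate=0.1,
                                           momentum=0.5, n_steps=20)
print(theta_plain, theta_momentum)
```

With these particular settings, both runs move towards the minimum at 0, and the momentum run ends closer to it after the same number of steps. This is a property of this toy problem and these hyperparameters, not a general guarantee.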


## Feeding algorithm with training data

Use a placeholder to feed data into the model at run time. A dimension declared as ```None``` can have any size, so the number of samples is not fixed, but the other dimensions must match the declared shape. Here is a simple example.

In [17]:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})
    print("B_val_1:")
    print(B_val_1)
    print("B_val_2:")
    print(B_val_2)

B_val_1:
[[6. 7. 8.]]
B_val_2:
[[ 9. 10. 11.]
 [12. 13. 14.]]
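One common reason to leave the first dimension as ```None``` is mini-batch training, where each step feeds a different slice of the data and the last slice may be smaller than the others. The sketch below is plain Python with made-up data; the helper name `iter_batches` is hypothetical, and the feeding step itself is only indicated in a comment.

```python
def iter_batches(features, targets, batch_size):
    """Yield consecutive (features, targets) slices of at most batch_size rows."""
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        yield features[start:end], targets[start:end]


# Made-up data: 5 samples with 3 features each
features = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
targets = [[0], [1], [0], [1], [0]]

batch_shapes = []
for X_batch, y_batch in iter_batches(features, targets, batch_size=2):
    # Each X_batch could be fed as the value of a placeholder of shape (None, 3).
    batch_shapes.append((len(X_batch), len(X_batch[0])))

print(batch_shapes)
```

The batches have 2, 2 and 1 rows, but always 3 columns, which is exactly the kind of input a placeholder of shape ```(None, 3)``` accepts.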


## Saving and Restoring Models

Once the model is trained, you can save it to disk and reuse it later without retraining. You may also want to save checkpoints during training so that you can resume from the last checkpoint if training is interrupted, rather than starting over.

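To save or restore a checkpoint, create a ```Saver``` node after all variable nodes have been created, then call its ```save()``` or ```restore()``` method inside a session.
```python
[...]
saver = tf.train.Saver()  # create it after all variables are defined

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0:  # save a checkpoint every 100 epochs
            save_path = saver.save(sess, '/tmp/my_model.ckpt')
        [...]

    save_path = saver.save(sess, '/tmp/my_final_model.ckpt')

# Later: restore the saved values instead of running the initializer
with tf.Session() as sess:
    saver.restore(sess, '/tmp/my_final_model.ckpt')
    [...]
```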

## Using TensorBoard to Visualize

So far we have used ```print``` to monitor the training process. TensorFlow also ships with a built-in tool, TensorBoard, that visualizes the graph and the training process of a model.

## Namespace

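As your graph grows, it helps to group related nodes in a name scope. The name of every node created inside the scope is prefixed with the scope name. If a scope name is already in use, TensorFlow appends a suffix such as ```_1``` to keep names unique, which is why the output below shows ```loss_1``` rather than ```loss```.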

In [22]:
with tf.name_scope("loss") as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name='mse')
    
    
print(error.op.name)

loss_1/sub


## Modularization

To reuse existing code and avoid cut-and-paste errors, modularize your code into functions that each build one part of the graph.

In [None]:
def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    z = tf.add(tf.matmul(X, w), b, name="z")
    return tf.maximum(z, 0, name='relu')

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

## Sharing Variables

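Sometimes you need to share a variable between several components of a graph, for instance a single threshold used by all the ReLUs. Here is a new version of the ```relu()``` function above: the variable is created once with ```get_variable()``` inside a ```variable_scope```, and each call to ```relu()``` fetches it with ```reuse=True```.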

In [None]:
def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold")
        [...]
        return tf.maximum(z, threshold, name="max")
    
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))

relus = [relu(X) for relu_index in range(5)]
output = tf.add_n(relus, name="output")
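The create-once, reuse-afterwards pattern above can be mimicked with a small registry in plain Python. This is a toy model, not TensorFlow code; the class `ToyVariableStore` and its method signature are hypothetical. It mirrors two behaviours of ```tf.get_variable()``` in TensorFlow 1.x: creating a variable that already exists without ```reuse=True``` raises an error, and so does reusing one that was never created.

```python
class ToyVariableStore:
    """Toy registry of variables looked up by scoped name (not TensorFlow)."""

    def __init__(self):
        self._variables = {}

    def get_variable(self, scope, name, initial_value=None, reuse=False):
        full_name = scope + "/" + name
        exists = full_name in self._variables
        if reuse:
            if not exists:
                raise ValueError(full_name + " does not exist, cannot reuse it")
            return self._variables[full_name]
        if exists:
            raise ValueError(full_name + " already exists, pass reuse=True")
        self._variables[full_name] = {"name": full_name, "value": initial_value}
        return self._variables[full_name]


store = ToyVariableStore()

# Create the shared threshold once
created = store.get_variable("relu", "threshold", initial_value=0.0)

# Five lookups with reuse=True all return the same object
shared = [store.get_variable("relu", "threshold", reuse=True) for _ in range(5)]
all_same = all(v is created for v in shared)

# Creating it a second time without reuse=True is rejected
try:
    store.get_variable("relu", "threshold", initial_value=1.0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(all_same, duplicate_rejected)
```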