**Up and running with TensorFlow**

_This notebook is based on Hands On chapter 9._

# Setup

First, let's make sure this notebook works well in both Python 2 and 3, import a few common modules, ensure Matplotlib plots figures inline and prepare a function to save the figures:

In [1]:
# To support both python 2 and python 3
from __future__ import division, print_function, unicode_literals

# Common imports
import numpy as np
import os
import tensorflow as tf  # needed by reset_graph() below

# to make this notebook's output stable across runs
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "tensorflow"

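def save_fig(fig_id, tight_layout=True):
    path = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID, fig_id + ".png")
    if not os.path.isdir(os.path.dirname(path)):
        os.makedirs(os.path.dirname(path))  # create the images folder on first use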
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format='png', dpi=300)

# Creating and running a graph

The following code creates the graph represented in Figure 9-1, which we saw in today's slides:

In [2]:
import tensorflow as tf

# defined above
reset_graph()

# create the graph
x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2



The most important thing to understand is that this code does
not actually perform any computation, even though it looks like it does (especially the last line). **It just creates a computation graph!**

In [3]:
# Let's look at the output
f

<tf.Tensor 'add_1:0' shape=() dtype=int32>

To evaluate this graph, you need to open a TensorFlow _session_ and use it
to initialize the variables and evaluate _f_. 

A TensorFlow _session_ takes care of placing the operations onto devices such as CPUs and GPUs and running them, and it holds
all the variable values. 

The following code creates a session, initializes the variables,
and evaluates _f_, then closes the session (which frees up resources):

In [4]:
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)

42


In [5]:
sess.close()

Having to repeat sess.run() all the time is a bit cumbersome, but fortunately there is
a better way:

In [6]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

In [7]:
result

42

Inside the with block, the session is set as the default session. 

Calling x.initializer.run() is equivalent to calling tf.get_default_session().run(x.initializer), and similarly
f.eval() is equivalent to calling tf.get_default_session().run(f). 
This makes the code easier to read. Moreover,
the session is automatically closed at the end of the block.

Instead of manually running the initializer for every single variable, you can use the
global_variables_initializer() function. Note that it does not actually perform
the initialization immediately, but rather creates a node in the graph that will initialize
all variables when it is run:

In [8]:
init = tf.global_variables_initializer() # prepare an init node

with tf.Session() as sess:
    init.run() # actually initialize all the variables
    result = f.eval()

In [9]:
result

42

Inside Jupyter or within a Python shell you may prefer to create an InteractiveSession. The only difference from a regular Session is that when an InteractiveSession is created it automatically sets itself as the default session, so you don’t need a with block (but you do need to close the session manually when you are done with it):

In [10]:
init = tf.global_variables_initializer()

In [11]:
sess = tf.InteractiveSession()
init.run()
result = f.eval()
print(result)

42


In [12]:
sess.close()

In [13]:
result

42

Now you have seen multiple ways to start a TF session, all of which give the same result in the end. Next, let's see how we can use TF for more complicated computations.

# Managing graphs

Let's take a look at how TF manages the computation graphs that we create for our problems.

Any node you create is automatically added to the default graph:

In [14]:
reset_graph()

x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

True

In most cases this is fine, but sometimes you may want to manage multiple independent graphs. You can do this by creating a new Graph and temporarily making it the default graph inside a with block, like so:

In [15]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
    # check whether x2 belongs to the current default graph
    print(x2.graph is tf.get_default_graph())
    

True


In [16]:
x2.graph is tf.get_default_graph()  # outside the with block, the original default graph is back

False

## Lifecycle of a Node Value

When you evaluate a node, TensorFlow automatically determines the set of nodes that it depends on and it evaluates these nodes first. For example, consider the following code:

In [17]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())  # 10
    print(z.eval())  # 15

10
15


First, this code defines a very simple graph. Then it starts a session and runs the graph to evaluate y: TensorFlow automatically detects that y depends on x, which depends on w, so it first evaluates w, then x, then y, and returns the value of y. Finally, the code runs the graph to evaluate z. Once again, TensorFlow detects that it must
first evaluate w and x. It is important to note that it will not reuse the result of the previous evaluation of w and x. In short, the preceding code evaluates w and x twice.

**All node values are dropped between graph runs**, except variable values, which are maintained by the session across graph runs. **A variable starts its life when its initializer is run,
and it ends when the session is closed.**

If you want to evaluate y and z efficiently, without evaluating w and x twice as in the previous code, you must ask TensorFlow to evaluate both y and z in just one graph run, as shown in the following code:

In [18]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)  # 10
    print(z_val)  # 15

10
15
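To make these evaluation rules concrete, here is a toy model in pure Python (hypothetical `ToyNode`, `toy_constant` and `toy_run`, not TensorFlow): building expressions only records dependencies, a run evaluates the requested nodes after their dependencies, and computed values are cached only for the duration of that run. Counting evaluations reproduces the behavior described above: two separate runs compute w and x twice, while one combined run computes them once.

```python
class ToyNode(object):
    """Hypothetical graph node: stores a function and its input nodes, computes nothing yet."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs
        self.evaluations = 0  # how many times this node has actually been computed

    def __add__(self, other):
        return ToyNode(lambda a, b: a + b, self, as_toy_node(other))

    def __mul__(self, other):
        return ToyNode(lambda a, b: a * b, self, as_toy_node(other))


def toy_constant(value):
    return ToyNode(lambda: value)


def as_toy_node(value):
    return value if isinstance(value, ToyNode) else toy_constant(value)


def toy_run(fetches):
    """One 'graph run': evaluate the fetched nodes and everything they depend on.

    The cache lives only inside this call, so node values are dropped between
    runs, but a node shared by several fetches is computed once per run."""
    cache = {}

    def evaluate(node):
        if node not in cache:
            args = [evaluate(inp) for inp in node.inputs]  # dependencies first
            node.evaluations += 1
            cache[node] = node.fn(*args)
        return cache[node]

    return [evaluate(node) for node in fetches]


w_node = toy_constant(3)
x_node = w_node + 2
y_node = x_node + 5
z_node = x_node * 3
print(w_node.evaluations)                      # 0: building the graph computed nothing

print(toy_run([y_node]), toy_run([z_node]))    # two separate runs: [10] [15]
print(w_node.evaluations, x_node.evaluations)  # 2 2: w and x were computed twice

w_node.evaluations = x_node.evaluations = 0
print(toy_run([y_node, z_node]))               # one run: [10, 15]
print(w_node.evaluations, x_node.evaluations)  # 1 1
```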


# Linear Regression

TensorFlow operations (also called ops for short) can take any number of inputs and produce any number of outputs. The inputs and outputs are multidimensional arrays, called **tensors** (hence the name “tensor flow”).

Let's look at an example to illustrate that we can perform computations on arrays of any shape. The following code manipulates 2D arrays to perform Linear Regression on the California housing dataset (introduced in Chapter 2).

## Using the Normal Equation
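For reference, the Normal Equation gives, in closed form, the parameter vector $\hat{\boldsymbol{\theta}}$ that minimizes the MSE. Here $\mathbf{X}$ is the $m \times (n+1)$ data matrix with a leading column of ones (the bias feature) and $\mathbf{y}$ is the column vector of targets; the `theta = ...` line in the TF cell and the NumPy cell that follows both compute this expression:

$$\hat{\boldsymbol{\theta}} = \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}$$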

### Using TF

In [19]:
import numpy as np
from sklearn.datasets import fetch_california_housing

reset_graph()

# fetch housing data and add extra bias input feature
housing = fetch_california_housing() 
m, n = housing.data.shape 
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

# create TF constant nodes to hold the data and targets
X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)

# Calc theta using the Normal Equation
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

# Evaluate theta
with tf.Session() as sess:
    theta_value = theta.eval()

In [20]:
theta_value

array([[-3.6786983e+01],
       [ 4.3703085e-01],
       [ 9.4685089e-03],
       [-1.0764918e-01],
       [ 6.4608419e-01],
       [-3.8850994e-06],
       [-3.7904985e-03],
       [-4.1972896e-01],
       [-4.3274990e-01]], dtype=float32)

### Compare with pure NumPy

In [21]:
# Set data and targets
X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)

# Apply the normal equation
theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(theta_numpy)

[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]
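Both cells above form an explicit inverse of $\mathbf{X}^{\top}\mathbf{X}$. A common alternative in numerical code is to solve the linear system directly, or to use a least-squares solver that never forms $\mathbf{X}^{\top}\mathbf{X}$, which is generally more numerically stable. The sketch below uses small synthetic data (an assumption for illustration, not the housing set) to check that the three NumPy approaches agree.

```python
import numpy as np

# Assumed synthetic data (not the housing set): y = 4 + 3*x1 - 2*x2 + noise
rng = np.random.RandomState(42)
m_demo = 200
X_demo = np.c_[np.ones((m_demo, 1)), rng.randn(m_demo, 2)]  # bias column + 2 features
true_theta = np.array([[4.0], [3.0], [-2.0]])
y_demo = X_demo.dot(true_theta) + 0.1 * rng.randn(m_demo, 1)

XtX = X_demo.T.dot(X_demo)
Xty = X_demo.T.dot(y_demo)

theta_inv = np.linalg.inv(XtX).dot(Xty)                        # explicit inverse
theta_solve = np.linalg.solve(XtX, Xty)                        # solve (X^T X) theta = X^T y
theta_lstsq = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]   # never forms X^T X

print(np.allclose(theta_inv, theta_solve), np.allclose(theta_inv, theta_lstsq))
print(theta_lstsq.ravel())  # close to the true parameters [4, 3, -2]
```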


### Compare with Scikit-Learn

In [22]:
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()

# use the data and targets to fit a linear regression model
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))

# use the model to get theta 
print(np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T])

[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]


**Question:** How does the scikit-learn code compare with the normal equation used in the other two coding examples? 

**Question:** Do scikit-learn, numpy, and TF all provide the same result for theta? 

## Using Batch Gradient Descent

Let's try using Batch Gradient Descent instead of the Normal Equation.

Gradient Descent converges much faster when all features have a similar scale, so we first scale the feature vectors. We could do this using TF, but let's just use Scikit-Learn for now.

In [23]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

In [24]:
print(scaled_housing_data_plus_bias.mean(axis=0))
print(scaled_housing_data_plus_bias.mean(axis=1))
print(scaled_housing_data_plus_bias.mean())
print(scaled_housing_data_plus_bias.shape)

[ 1.00000000e+00  6.60969987e-17  5.50808322e-18  6.60969987e-17
 -1.06030602e-16 -1.10161664e-17  3.44255201e-18 -1.07958431e-15
 -8.52651283e-15]
[ 0.38915536  0.36424355  0.5116157  ... -0.06612179 -0.06360587
  0.01359031]
0.11111111111111005
(20640, 9)
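With its default settings, StandardScaler subtracts each column's mean and divides by the column's (population) standard deviation. The hand-written version below, on a small made-up matrix, performs the same computation. It also explains the printout above: after prepending the bias column of ones, the bias column has mean 1 and every scaled column has mean close to 0, so the overall mean of the 9 columns is about 1/9 ≈ 0.111.

```python
import numpy as np

# Hypothetical small feature matrix with columns on very different scales
data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 900.0],
                 [4.0, 500.0]])

# Standardize each column: subtract its mean, divide by its population std (ddof=0)
scaled = (data - data.mean(axis=0)) / data.std(axis=0)

# Prepend the bias column of ones, as in the cell above
scaled_plus_bias = np.c_[np.ones((len(data), 1)), scaled]

print(scaled.mean(axis=0))      # each entry close to 0
print(scaled.std(axis=0))       # each entry close to 1
print(scaled_plus_bias.mean())  # close to 1/3: only the bias column contributes
```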


### Manually computing the gradients
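For reference, the next cell minimizes the MSE with Batch Gradient Descent. With $\mathbf{X}$ the scaled data matrix (bias column included), $\mathbf{y}$ the targets, $m$ the number of instances and $\eta$ the learning rate (`learning_rate`), the cost, its gradient and the update step are:

$$\mathrm{MSE}(\boldsymbol{\theta}) = \frac{1}{m}\sum_{i=1}^{m}\left(\boldsymbol{\theta}^{\top}\mathbf{x}^{(i)} - y^{(i)}\right)^{2}$$

$$\nabla_{\boldsymbol{\theta}}\,\mathrm{MSE}(\boldsymbol{\theta}) = \frac{2}{m}\,\mathbf{X}^{\top}\left(\mathbf{X}\boldsymbol{\theta} - \mathbf{y}\right)$$

$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta\,\nabla_{\boldsymbol{\theta}}\,\mathrm{MSE}(\boldsymbol{\theta})$$

In the code, `error` is $\mathbf{X}\boldsymbol{\theta} - \mathbf{y}$, `gradients` is $\frac{2}{m}\mathbf{X}^{\top}$ times `error`, and `training_op` performs the update.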

In [25]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

# setup data and targets
X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")

# create a node in the graph with random values for theta to start
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")

# predict the target
y_pred = tf.matmul(X, theta, name="predictions")

# calc error, mse, and gradient
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2/m * tf.matmul(tf.transpose(X), error)

# create a node that will assign a new value to a variable (the Batch GD step)
training_op = tf.assign(theta, theta - learning_rate * gradients)

# TF setup
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # run training step n_epochs times
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.71450055
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.55557173
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962904
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473


**Question:** What kind of trend do we see happening with the MSE during training? What does this mean about how our model is performing?

In [26]:
# best_theta holds the value of theta after the last training step
best_theta

array([[ 2.0685523 ],
       [ 0.8874027 ],
       [ 0.14401656],
       [-0.34770885],
       [ 0.36178368],
       [ 0.00393811],
       [-0.04269556],
       [-0.66145283],
       [-0.6375278 ]], dtype=float32)
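The same Batch Gradient Descent loop can be written in plain NumPy, which makes it clear that the TF graph above encodes only a few array operations. This is a sketch on assumed synthetic data (y = 4 + 3x plus noise, not the housing data), with a learning rate picked for that toy problem; after enough steps it should land on the least-squares solution.

```python
import numpy as np

# Assumed synthetic data (not the housing set): y = 4 + 3*x + small noise,
# with x drawn from a standard normal so it is already roughly standardized.
rng = np.random.RandomState(42)
m_demo = 100
X_demo = np.c_[np.ones((m_demo, 1)), rng.randn(m_demo, 1)]  # bias column + 1 feature
y_demo = 4 + 3 * X_demo[:, 1:] + 0.1 * rng.randn(m_demo, 1)

eta = 0.1        # learning rate chosen for this toy problem
n_steps = 1000
theta_np = rng.uniform(-1.0, 1.0, size=(2, 1))  # random start, like tf.random_uniform

for step in range(n_steps):
    error_np = X_demo.dot(theta_np) - y_demo               # X theta - y
    gradients_np = 2.0 / m_demo * X_demo.T.dot(error_np)   # (2/m) X^T (X theta - y)
    theta_np = theta_np - eta * gradients_np               # the Batch GD step

print(theta_np.ravel())  # should be close to [4, 3]
```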

### Using autodiff

TF's autodiff feature can automatically and
efficiently compute the gradients for you.
This comes in handy when you have complicated functions,
such as deep neural networks (which we will see next time).

The gradients() function takes an op (in this case mse) and a list of variables (in this case just theta), and it creates a list of ops (one per variable) to compute the gradients of the op with regard to each variable. So the gradients node will compute the gradient vector of the MSE with regard to theta.

Same as above except for the `gradients = ...` line:

In [27]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

In [28]:
gradients = tf.gradients(mse, [theta])[0]

In [29]:
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.7145004
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.55557173
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962904
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473
Best theta:
[[ 2.0685525 ]
 [ 0.8874027 ]
 [ 0.14401658]
 [-0.34770882]
 [ 0.36178368]
 [ 0.00393811]
 [-0.04269556]
 [-0.6614528 ]
 [-0.6375277 ]]


How could you find the partial derivatives of the following function with regard to `a` and `b`?

In [30]:
def my_func(a, b):
    z = 0
    for i in range(100):
        z = a * np.cos(z + i) + z * np.sin(b - i)
    return z

In [31]:
my_func(0.2, 0.3)

-0.21253923284754914

In [32]:
reset_graph()

a = tf.Variable(0.2, name="a")
b = tf.Variable(0.3, name="b")
z = tf.constant(0.0, name="z0")
for i in range(100):
    z = a * tf.cos(z + i) + z * tf.sin(b - i)

grads = tf.gradients(z, [a, b])
init = tf.global_variables_initializer()

Let's compute the function at $a=0.2$ and $b=0.3$, and the partial derivatives at that point with regard to $a$ and with regard to $b$:

In [33]:
with tf.Session() as sess:
    init.run()
    print(z.eval())
    print(sess.run(grads))

-0.21253741
[-1.1388494, 0.19671395]
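`tf.gradients()` uses reverse-mode autodiff. To show that such derivatives are exact and not numerical approximations, the sketch below swaps in a different technique that fits in a few lines of pure Python: forward-mode autodiff with dual numbers (the `Dual` class and helpers are hypothetical names). It differentiates the same loop as `my_func` and cross-checks the result with central finite differences. We would expect the derivatives to come out close to the TF values above, up to TF's float32 rounding.

```python
import math


class Dual(object):
    """A number value + deriv*eps with eps**2 == 0, used for forward-mode autodiff."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Plain numbers (like the loop index i) are constants: derivative 0
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = Dual.lift(other)
        # product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def dual_cos(d):
    # chain rule: (cos u)' = -sin(u) * u'
    return Dual(math.cos(d.value), -math.sin(d.value) * d.deriv)


def dual_sin(d):
    # chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)


def my_func_dual(a, b):
    """Same loop as my_func, written with Dual numbers."""
    z = Dual(0.0)
    for i in range(100):
        z = a * dual_cos(z + i) + z * dual_sin(b - i)
    return z


# Seed the input we differentiate with respect to with derivative 1
dz_da = my_func_dual(Dual(0.2, 1.0), Dual(0.3, 0.0)).deriv
dz_db = my_func_dual(Dual(0.2, 0.0), Dual(0.3, 1.0)).deriv
print(dz_da, dz_db)


def my_func_float(a, b):
    """Plain-float version of my_func, used only for the finite-difference check."""
    z = 0.0
    for i in range(100):
        z = a * math.cos(z + i) + z * math.sin(b - i)
    return z


h = 1e-6  # central finite differences should agree closely with the dual numbers
print((my_func_float(0.2 + h, 0.3) - my_func_float(0.2 - h, 0.3)) / (2 * h))
print((my_func_float(0.2, 0.3 + h) - my_func_float(0.2, 0.3 - h)) / (2 * h))
```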


### Using a `GradientDescentOptimizer`

TensorFlow computes the gradients for you. But it gets even easier: it also provides
a number of optimizers out of the box, including a Gradient Descent optimizer. You
can simply replace the preceding gradients = ... and training_op = ... lines
with the following code, and once again everything will just work fine:

In [34]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

In [35]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

In [36]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.7145004
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.55557173
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962904
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473
Best theta:
[[ 2.0685525 ]
 [ 0.8874027 ]
 [ 0.14401658]
 [-0.34770882]
 [ 0.36178368]
 [ 0.00393811]
 [-0.04269556]
 [-0.6614528 ]
 [-0.6375277 ]]


### Using a momentum optimizer

If you want to use a different type of optimizer, you just need to change one line. For
example, you can use a momentum optimizer:

In [37]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

In [38]:
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,
                                       momentum=0.9)

In [39]:
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [40]:
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.53056407
Epoch 200 MSE = 0.52501124
Epoch 300 MSE = 0.52441067
Epoch 400 MSE = 0.52433294
Epoch 500 MSE = 0.5243226
Epoch 600 MSE = 0.5243211
Epoch 700 MSE = 0.524321
Epoch 800 MSE = 0.524321
Epoch 900 MSE = 0.524321
Best theta:
[[ 2.068558  ]
 [ 0.8296286 ]
 [ 0.11875337]
 [-0.26554456]
 [ 0.3057109 ]
 [-0.00450251]
 [-0.03932662]
 [-0.89986444]
 [-0.87052065]]
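To see why momentum can help, the sketch below compares the two update rules in NumPy on an assumed toy objective: an elongated quadratic bowl that is steep in one direction and nearly flat in the other. The momentum update follows the form documented for `tf.train.MomentumOptimizer` (an accumulator of past gradients, decayed by `momentum`); the objective, learning rate and number of steps are illustrative choices, not values from the notebook. With the same learning rate, the momentum version travels much further along the flat direction in the same number of steps.

```python
import numpy as np

# Assumed toy objective: f(theta) = 0.5 * theta^T H theta, an elongated bowl that is
# steep along the first axis (curvature 1) and very flat along the second (0.01).
H = np.diag([1.0, 0.01])


def toy_loss(theta):
    return 0.5 * theta.dot(H).dot(theta)


def toy_grad(theta):
    return H.dot(theta)


lr_demo = 0.1     # same learning rate for both optimizers (illustrative value)
n_steps = 200
theta_start = np.array([1.0, 1.0])

# Plain Gradient Descent: theta <- theta - lr * gradient
theta_gd = theta_start.copy()
for _ in range(n_steps):
    theta_gd = theta_gd - lr_demo * toy_grad(theta_gd)

# Momentum, in the form documented for tf.train.MomentumOptimizer:
#   accumulation <- momentum * accumulation + gradient
#   theta        <- theta - lr * accumulation
momentum = 0.9
theta_mom = theta_start.copy()
accumulation = np.zeros_like(theta_start)
for _ in range(n_steps):
    accumulation = momentum * accumulation + toy_grad(theta_mom)
    theta_mom = theta_mom - lr_demo * accumulation

print("GD loss after {} steps:       {:.6f}".format(n_steps, toy_loss(theta_gd)))
print("Momentum loss after {} steps: {:.6f}".format(n_steps, toy_loss(theta_mom)))
```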


**Question:** Which optimizer converges faster, Gradient Descent or Momentum? How can you tell?

# Feeding data to the training algorithm

Let’s try to modify the previous code to implement Mini-batch Gradient Descent. For
this, we need a way to replace X and y at every iteration with the next mini-batch. The
simplest way to do this is to use _placeholder nodes_.

These nodes are special because
they don’t actually perform any computation, they just output the data you tell them
to output at runtime. They are typically used to pass the training data to TensorFlow
during training. If you don’t specify a value at runtime for a placeholder, you get an
exception.

To create a placeholder node, you must call the placeholder() function and specify
the output tensor’s data type. Optionally, you can also specify its shape, if you want to enforce it.

## Placeholder nodes

In [41]:
reset_graph()

# A must have three cols, but any number of rows:
#    Shape with None in a dimension means "any size"
A = tf.placeholder(tf.float32, shape=(None, 3)) 
B = A + 5

# evaluate B
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

print(B_val_1)

[[6. 7. 8.]]


In [42]:
print(B_val_2)

[[ 9. 10. 11.]
 [12. 13. 14.]]


**Question:** Why is it necessary to use feed_dict to pass a value for A when we evaluate B?

## Mini-batch Gradient Descent

In [43]:
n_epochs = 1000
learning_rate = 0.01

In [44]:
reset_graph()

# We have to change X and y to be placeholders instead of constants like before.
X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

In [45]:
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [46]:
n_epochs = 10

In [47]:
# We have to define the batch size and compute the total number of batches.
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

In [48]:
def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  
    indices = np.random.randint(m, size=batch_size)  
    X_batch = scaled_housing_data_plus_bias[indices] 
    y_batch = housing.target.reshape(-1, 1)[indices] 
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

In [49]:
best_theta

array([[ 2.0703337 ],
       [ 0.8637145 ],
       [ 0.12255151],
       [-0.31211874],
       [ 0.38510373],
       [ 0.00434168],
       [-0.01232954],
       [-0.83376896],
       [-0.8030471 ]], dtype=float32)

**Question:** What does the fetch_batch function do? 

**Question:** How does the sess.run line change when we are using the placeholders for our data instead of constants? 
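Note that fetch_batch draws indices with np.random.randint, i.e., with replacement, so within one epoch some instances can be picked several times and others not at all. If you would rather visit every instance exactly once per epoch, a common alternative is to shuffle the indices once per epoch and slice them into consecutive batches. A minimal sketch, assuming the same m = 20640 instances and batch size of 100 (the helper name `shuffled_batches` is hypothetical):

```python
import numpy as np


def shuffled_batches(n_instances, batch_size, rng):
    """Yield index arrays that cover every instance exactly once per epoch."""
    permutation = rng.permutation(n_instances)
    for start in range(0, n_instances, batch_size):
        yield permutation[start:start + batch_size]


rng = np.random.RandomState(42)
batches = list(shuffled_batches(20640, 100, rng))
print(len(batches))       # 207 batches, same as n_batches = ceil(20640 / 100)
print(len(batches[-1]))   # 40: the last batch holds the leftover instances
```

Each index array can then be used like `indices` in fetch_batch, e.g. `scaled_housing_data_plus_bias[idx]`.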

# Saving and restoring a model

Once you have trained your model, you should save its parameters to disk so you can come back to it whenever you want, use it in another program, compare it to other models, and so on. Moreover, you probably want to save checkpoints at regular intervals during training so that if your computer crashes during training you can continue from the last checkpoint rather than start over from scratch.


TensorFlow makes saving and restoring a model very easy. Just create a Saver node at the end of the construction phase (after all variable nodes are created); then, in the execution phase, just call its save() method whenever you want to save the model, passing it the session and path of the checkpoint file:

In [50]:
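#---------------------------
# Directory where the checkpoints will be saved; change it if you want them elsewhere.
# By default this is a tmp/ folder in your home directory, created if it does not exist yet.
temp_path = os.path.join(os.path.expanduser("~"), "tmp", "")
if not os.path.isdir(temp_path):
    os.makedirs(temp_path)
#---------------------------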


reset_graph()

n_epochs = 1000                                                                       # not shown in the book
learning_rate = 0.01                                                                  # not shown

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")            # not shown
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")            # not shown
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")                                      # not shown
error = y_pred - y                                                                    # not shown
mse = tf.reduce_mean(tf.square(error), name="mse")                                    # not shown
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)            # not shown
training_op = optimizer.minimize(mse)                                                 # not shown

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())                                # not shown
            save_path = saver.save(sess, temp_path + "my_model.ckpt")
        sess.run(training_op)
    
    best_theta = theta.eval()
    save_path = saver.save(sess, temp_path + "my_model_final.ckpt")

Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.7145004
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.55557173
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962904
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473


In [51]:
best_theta

array([[ 2.0685525 ],
       [ 0.8874027 ],
       [ 0.14401658],
       [-0.34770882],
       [ 0.36178368],
       [ 0.00393811],
       [-0.04269556],
       [-0.6614528 ],
       [-0.6375277 ]], dtype=float32)

In [52]:
# Restore the model saved in the final checkpoint
with tf.Session() as sess:
    saver.restore(sess, temp_path + "my_model_final.ckpt")
    best_theta_restored = theta.eval() 

INFO:tensorflow:Restoring parameters from /home/jblankenburg/tmp/my_model_final.ckpt


In [53]:
np.allclose(best_theta, best_theta_restored)

True

If you want to have a saver that loads and restores `theta` with a different name, such as `"weights"`:

In [54]:
saver = tf.train.Saver({"weights": theta})

By default the saver also saves the graph structure itself in a second file with the extension `.meta`. You can use the function `tf.train.import_meta_graph()` to restore the graph structure. This function loads the graph into the default graph and returns a `Saver` that can then be used to restore the graph state (i.e., the variable values):

In [55]:
reset_graph()
# notice that we start with an empty graph.

saver = tf.train.import_meta_graph(temp_path + "my_model_final.ckpt.meta")  # this loads the graph structure
theta = tf.get_default_graph().get_tensor_by_name("theta:0") # not shown in the book

with tf.Session() as sess:
    saver.restore(sess, temp_path + "my_model_final.ckpt")  # this restores the graph's state
    best_theta_restored = theta.eval() # not shown in the book

INFO:tensorflow:Restoring parameters from /home/jblankenburg/tmp/my_model_final.ckpt


In [56]:
np.allclose(best_theta, best_theta_restored)

True

This means that you can import a pretrained model without having to have the corresponding Python code to build the graph. This is very handy when you keep tweaking and saving your model: you can load a previously saved model without having to search for the version of the code that built it.

# Visualizing the graph

So far, we are still relying on the print() function to visualize progress during training. There is a better way: enter TensorBoard. If
you feed it some training stats, it will display nice interactive visualizations of these stats in your web browser (e.g., learning curves). You can also provide it the graph’s definition and it will give you a great interface to browse through it. This is very useful to identify errors in the graph, to find bottlenecks, and so on.


## Inside Jupyter

To visualize the graph within Jupyter, we will use a TensorBoard server available online at https://tensorboard.appspot.com/ (so this will not work if you do not have Internet access).  As far as I can tell, this code was originally written by Alex Mordvintsev in his [DeepDream tutorial](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb). Alternatively, you could use a tool like [tfgraphviz](https://github.com/akimach/tfgraphviz).

In [57]:
from tensorflow_graph_in_jupyter import show_graph

In [58]:
show_graph(tf.get_default_graph())

## Using TensorBoard

The first step is to tweak your program a bit so it writes the graph definition and some training stats—for example, the training error (MSE)—to a log directory that TensorBoard will read from. 

**You need to use a different log directory every time you
run your program, or else TensorBoard will merge stats from different runs, which will mess up the visualizations.** The simplest solution for this is to include a timestamp in the log directory name. Add the following code at the beginning of the program:

In [59]:
reset_graph()

from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

In [60]:
# same as before
n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

Next, we add the following code at the very end of the construction phase:

The first line creates a node in the graph that will evaluate the MSE value and write it
to a TensorBoard-compatible binary log string called a summary. The second line creates a FileWriter that you will use to write summaries to logfiles in the log directory.

In [61]:
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

In [62]:
# same as before
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

Next you need to update the execution phase to evaluate the mse_summary node regularly during training (e.g., every 10 mini-batches). This will output a summary that you can then write to the events file using the file_writer. Here is the updated code:

In [63]:
with tf.Session() as sess:                                                        # not shown in the book
    sess.run(init)                                                                # not shown

    for epoch in range(n_epochs):                                                 # not shown
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()                                                     # not shown

Finally, you want to close the FileWriter at the end of the program:

In [64]:
file_writer.close()

In [65]:
best_theta

array([[ 2.0703337 ],
       [ 0.8637145 ],
       [ 0.12255151],
       [-0.31211874],
       [ 0.38510373],
       [ 0.00434168],
       [-0.01232954],
       [-0.83376896],
       [-0.8030471 ]], dtype=float32)

**Question:** Open another shell, log in to banyan normally (without -L), and go to your working directory. Type `ls -l tf_logs/run*` to list the contents of the log directory. Run the above code a few more times and type the command again. What is listed in your directory?

### Using the TensorBoard Server

Now it’s time to fire up the TensorBoard server. We will use the same shell that you opened for the previous Question. 

Note that this part of the code might not work correctly, due to the port forwarding and such we are using. We will probably be debugging this connection in class, so let us know if you are having trouble here!

You need to activate your virtualenv environment, then start the server by running the tensorboard command, pointing it to the root log directory. This starts the TensorBoard web server, listening on port 6006 (which is “goog” written upside down):

Commands to run:

* `source env/bin/activate`
* `tensorboard --logdir tf_logs/`


# Name scopes

When dealing with more complex models such as neural networks, the graph can easily become cluttered with thousands of nodes. To avoid this, you can create name scopes to group related nodes. For example, let’s modify the previous code to define the error and mse ops within a name scope called "loss" :

In [66]:
with tf.name_scope("loss") as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")

In [67]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

In [68]:
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

file_writer.flush()
file_writer.close()
print("Best theta:")
print(best_theta)

Best theta:
[[ 2.0703337 ]
 [ 0.8637145 ]
 [ 0.12255151]
 [-0.31211874]
 [ 0.38510373]
 [ 0.00434168]
 [-0.01232954]
 [-0.83376896]
 [-0.8030471 ]]
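The `step` passed to `add_summary()` is a global batch counter, `epoch * n_batches + batch_index`, and a summary is only written every 10th batch. A minimal pure-Python sketch of which steps end up in the log, assuming a hypothetical `m = 2500` training instances (the real `m` is defined earlier in the notebook); `logged_steps()` is an illustrative helper, not part of the notebook:

```python
import math

def logged_steps(m, batch_size=100, n_epochs=10, log_every=10):
    """Return the global steps at which the training loop writes an MSE summary."""
    n_batches = int(math.ceil(m / batch_size))  # same formula as the notebook
    steps = []
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            if batch_index % log_every == 0:
                # Global step keeps counting across epochs.
                steps.append(epoch * n_batches + batch_index)
    return steps

steps = logged_steps(2500)  # hypothetical m: 25 batches per epoch
print(steps[:6])            # [0, 10, 20, 25, 35, 45]
```

Because the step keeps increasing across epochs, TensorBoard draws the MSE as one continuous curve instead of restarting at 0 every epoch.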


The name of each op defined within the scope is now prefixed with "loss/":

In [90]:
print(error.op.name)

loss/sub


In [91]:
print(mse.op.name)

loss/mse


In [71]:
reset_graph()

a1 = tf.Variable(0, name="a")      # name == "a"
a2 = tf.Variable(0, name="a")      # name == "a_1"

with tf.name_scope("param"):       # name == "param"
    a3 = tf.Variable(0, name="a")  # name == "param/a"

with tf.name_scope("param"):       # name == "param_1"
    a4 = tf.Variable(0, name="a")  # name == "param_1/a"

for node in (a1, a2, a3, a4):
    print(node.op.name)

a
a_1
param/a
param_1/a
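These naming rules can be modelled in a few lines of plain Python. This is a hypothetical sketch (`NameRegistry` is not a TensorFlow class) that ignores real-graph details: a name that is already taken gets `_1`, `_2`, ... appended, and a name scope is itself a uniquified name that prefixes everything created inside it.

```python
import contextlib

class NameRegistry(object):
    """Hypothetical toy model of how a TF1 graph makes op names unique."""

    def __init__(self):
        self.used = set()  # every full name handed out so far
        self.prefix = ""   # current name-scope prefix, e.g. "param_1/"

    def unique(self, name):
        # Prepend the current scope, then append _1, _2, ... until the name is unused.
        base = self.prefix + name
        candidate, i = base, 0
        while candidate in self.used:
            i += 1
            candidate = "%s_%d" % (base, i)
        self.used.add(candidate)
        return candidate

    @contextlib.contextmanager
    def name_scope(self, name):
        # The scope name is uniquified too, so a second "param" becomes "param_1".
        saved = self.prefix
        self.prefix = self.unique(name) + "/"
        try:
            yield
        finally:
            self.prefix = saved

g = NameRegistry()
names = [g.unique("a"), g.unique("a")]
with g.name_scope("param"):
    names.append(g.unique("a"))
with g.name_scope("param"):
    names.append(g.unique("a"))
print(names)  # ['a', 'a_1', 'param/a', 'param_1/a']
```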


**Question** What happens if you re-define a variable in the same scope with a different value?

# Modularity

Suppose you want to create a graph that adds the output of two rectified linear units
(ReLU). A ReLU computes a linear function of the inputs, and outputs the result if it
is positive, and 0 otherwise. Let's see how to do this:
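Without TensorFlow, a single ReLU unit and the sum of two of them look like this. The weights, biases and input below are made-up values, chosen so the arithmetic is exact:

```python
def relu_unit(x, w, b):
    """Linear function of the inputs, then keep it if positive, else output 0."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return max(z, 0.0)

x = [1.0, 2.0, 4.0]                              # one instance with 3 features
out1 = relu_unit(x, [0.5, -1.0, 0.25], -1.0)     # z = -1.5, so the unit outputs 0.0
out2 = relu_unit(x, [1.0, 0.5, 0.25], 0.5)       # z = 3.5, passed through unchanged
print(out1 + out2)                               # 3.5
```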

First, the ugly, flat version:

In [72]:
reset_graph()

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")

z1 = tf.add(tf.matmul(X, w1), b1, name="z1")
z2 = tf.add(tf.matmul(X, w2), b2, name="z2")

relu1 = tf.maximum(z1, 0., name="relu1")
relu2 = tf.maximum(z1, 0., name="relu2")  # Oops, cut&paste error! Did you spot it?

output = tf.add(relu1, relu2, name="output")

**Question:** Fix the line with the cut&paste error and run the above cell again. What did you change?

Such repetitive code is hard to maintain and error-prone (as we just saw). We can make this much easier by using a function to build the ReLUs:

In [73]:
reset_graph()

def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    z = tf.add(tf.matmul(X, w), b, name="z")
    return tf.maximum(z, 0., name="relu")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

In [74]:
file_writer = tf.summary.FileWriter("logs/relu1", tf.get_default_graph())
file_writer.close()  # flush the event file to disk so TensorBoard can read it

# Use this file to visualize the graph in TensorBoard!

Note that when you create a node, TensorFlow checks whether its name already exists, and if it does it appends an underscore followed by an index to make the name unique.

Using name scopes, you can make the graph much clearer. Simply move all the content of the relu() function inside a name scope.

In [75]:
reset_graph()

def relu(X):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)                          # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")    # not shown
        b = tf.Variable(0.0, name="bias")                             # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                      # not shown
        return tf.maximum(z, 0., name="max")                          # not shown

In [76]:
n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

file_writer = tf.summary.FileWriter("logs/relu2", tf.get_default_graph())
file_writer.close()

# Use this file to visualize the graph in TensorBoard!

## Sharing Variables

Sharing a `threshold` variable the classic way, by defining it outside of the `relu()` function then passing it as a parameter:

In [77]:
reset_graph()

def relu(X, threshold):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)                        # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, threshold, name="max")

threshold = tf.Variable(0.0, name="threshold")
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X, threshold) for i in range(5)]
output = tf.add_n(relus, name="output")

This works fine: now you can control the threshold for all ReLUs using the threshold
variable. However, if there are many shared parameters such as this one, it will be
painful to have to pass them around as parameters all the time.

One option is to set the shared variable as an attribute of the relu() function upon the first call, like so:

In [78]:
reset_graph()

def relu(X):
    with tf.name_scope("relu"):
        if not hasattr(relu, "threshold"):
            relu.threshold = tf.Variable(0.0, name="threshold")
        w_shape = int(X.get_shape()[1]), 1                          # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, relu.threshold, name="max")

In [79]:
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")
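The `hasattr` trick relies on ordinary Python behavior: a function is an object, so an attribute set on it during the first call is still there on every later call. A TensorFlow-free sketch, where the hypothetical `relu_like()` function and a plain `object()` stand in for `relu()` and the `tf.Variable`:

```python
def relu_like():
    # On the first call only, attach the shared parameter to the function object.
    if not hasattr(relu_like, "threshold"):
        relu_like.threshold = object()  # stand-in for tf.Variable(0.0, name="threshold")
    return relu_like.threshold

thresholds = [relu_like() for i in range(5)]
print(all(t is thresholds[0] for t in thresholds))  # True: all five calls share one object
```

Note that the attribute lives on the function object, not on the graph. The notebook redefines `relu()` after every `reset_graph()`, which is what keeps `relu.threshold` from referring to a variable in an old, discarded graph.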

TensorFlow offers another option, which may lead to slightly cleaner and more modular code than the previous solutions. This solution is a bit tricky to understand at first, but since it is used a lot in TensorFlow it is worth going into a bit of detail. The idea is to use the `get_variable()` function to create the shared variable if it does not exist yet, or reuse it if it already exists. The desired behavior (creating or reusing) is controlled by an attribute of the current `variable_scope()`. For example, the following code will create a variable named `"relu/threshold"` (as a scalar, since `shape=()`, and using 0.0 as the initial value):

In [80]:
reset_graph()

with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))

Note that if the variable has already been created by an earlier call to `get_variable()`, this code will raise an exception. This behavior prevents reusing variables by mistake. If you want to reuse a variable, you need to explicitly say so by setting the variable scope’s `reuse` attribute to `True` (in which case you don’t have to specify the shape or the initializer):

In [81]:
with tf.variable_scope("relu", reuse=True):
    threshold = tf.get_variable("threshold")

This code will fetch the existing `"relu/threshold"` variable, or raise an exception if it does not exist or if it was not created using `get_variable()`. Alternatively, you can set the `reuse` attribute to `True` inside the block by calling the scope’s `reuse_variables()` method:

In [82]:
with tf.variable_scope("relu") as scope:
    scope.reuse_variables()
    threshold = tf.get_variable("threshold")
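The create/reuse rules of `get_variable()` described above can be summarized with a toy variable store. This is a hypothetical model (`VariableStore` is not TensorFlow's class, and the error messages are made up); TensorFlow 1.x also raises a `ValueError` in both failure cases.

```python
class VariableStore(object):
    """Hypothetical toy model of the get_variable() create/reuse rules."""

    def __init__(self):
        self._vars = {}  # full name, e.g. "relu/threshold" -> variable

    def get_variable(self, scope, name, reuse=False, initial_value=None):
        full_name = scope + "/" + name
        if reuse:
            # Reuse mode: the variable must already exist (no shape/initializer needed).
            if full_name not in self._vars:
                raise ValueError("Variable %s does not exist" % full_name)
            return self._vars[full_name]
        # Creation mode: refuse to create the same variable twice.
        if full_name in self._vars:
            raise ValueError("Variable %s already exists" % full_name)
        self._vars[full_name] = {"name": full_name, "value": initial_value}
        return self._vars[full_name]


store = VariableStore()
t1 = store.get_variable("relu", "threshold", initial_value=0.0)  # creates relu/threshold
t2 = store.get_variable("relu", "threshold", reuse=True)         # fetches the same object
print(t1 is t2)  # True

try:
    store.get_variable("relu", "threshold", initial_value=0.0)   # created twice by mistake
except ValueError as err:
    print(err)  # Variable relu/threshold already exists

try:
    store.get_variable("relu", "bias", reuse=True)               # never created
except ValueError as err:
    print(err)  # Variable relu/bias does not exist
```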

Now you have all the pieces you need to make the relu() function access the threshold variable without having to pass it as a parameter:

In [83]:
reset_graph()

def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold")
        w_shape = int(X.get_shape()[1]), 1                          # not shown
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
relus = [relu(X) for relu_index in range(5)]
output = tf.add_n(relus, name="output")

In [84]:
file_writer = tf.summary.FileWriter("logs/relu6", tf.get_default_graph())
file_writer.close()

# Use this file to visualize the graph in TensorBoard!

This code first defines the `relu()` function, then creates the `relu/threshold` variable (as a scalar that will later be initialized to 0.0) and builds five ReLUs by calling the `relu()` function. The `relu()` function reuses the `relu/threshold` variable and creates the other ReLU nodes.

In [85]:
reset_graph()

def relu(X):
    with tf.variable_scope("relu"):
        threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("", default_name="") as scope:
    first_relu = relu(X)     # create the shared variable
    scope.reuse_variables()  # then reuse it
    relus = [first_relu] + [relu(X) for i in range(4)]
output = tf.add_n(relus, name="output")

file_writer = tf.summary.FileWriter("logs/relu8", tf.get_default_graph())
file_writer.close()

# Use this file to visualize the graph in TensorBoard!

It is somewhat unfortunate that, in the In [83] version, the threshold variable must be defined outside the `relu()` function, where all the rest of the ReLU code resides. To fix this, the following code creates the threshold variable within the `relu()` function upon the first call, then reuses it in subsequent calls. Now the `relu()` function does not have to worry about name scopes or variable sharing: it just calls `get_variable()`, which will create or reuse the threshold variable (it does not need to know which is the case). The rest of the code calls `relu()` five times, making sure to set `reuse=False` on the first call and `reuse=True` for the other calls.

In [86]:
reset_graph()

def relu(X):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
    w_shape = (int(X.get_shape()[1]), 1)                        # not shown in the book
    w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
    b = tf.Variable(0.0, name="bias")                           # not shown
    z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
    return tf.maximum(z, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = []
for relu_index in range(5):
    with tf.variable_scope("relu", reuse=(relu_index >= 1)) as scope:
        relus.append(relu(X))
output = tf.add_n(relus, name="output")

In [87]:
file_writer = tf.summary.FileWriter("logs/relu9", tf.get_default_graph())
file_writer.close()

# Use this file to visualize the graph in TensorBoard!

**Question:** How does the last graph differ from the one right before it?

# Extra material

If you have finished the material above and there is still time left in class, look through the code below and try to figure out what is going on.

In [88]:
reset_graph()

with tf.variable_scope("my_scope"):
    x0 = tf.get_variable("x", shape=(), initializer=tf.constant_initializer(0.))
    x1 = tf.Variable(0., name="x")
    x2 = tf.Variable(0., name="x")

with tf.variable_scope("my_scope", reuse=True):
    x3 = tf.get_variable("x")
    x4 = tf.Variable(0., name="x")

with tf.variable_scope("", default_name="", reuse=True):
    x5 = tf.get_variable("my_scope/x")

print("x0:", x0.op.name)
print("x1:", x1.op.name)
print("x2:", x2.op.name)
print("x3:", x3.op.name)
print("x4:", x4.op.name)
print("x5:", x5.op.name)
print(x0 is x3 and x3 is x5)

x0: my_scope/x
x1: my_scope/x_1
x2: my_scope/x_2
x3: my_scope/x
x4: my_scope_1/x
x5: my_scope/x
True


The first `variable_scope()` block first creates the shared variable `x0`, named `my_scope/x`. For all operations other than shared variables (including non-shared variables), the variable scope acts like a regular name scope, which is why the two variables `x1` and `x2` have a name with a prefix `my_scope/`. Note however that TensorFlow makes their names unique by adding an index: `my_scope/x_1` and `my_scope/x_2`.

The second `variable_scope()` block reuses the shared variables in scope `my_scope`, which is why `x0 is x3`. Once again, for all operations other than shared variables it acts as a name scope, and since it's a separate block from the first one, the name of the scope is made unique by TensorFlow (`my_scope_1`) and thus the variable `x4` is named `my_scope_1/x`.

The third block shows another way to get a handle on the shared variable `my_scope/x` by creating a `variable_scope()` at the root scope (whose name is an empty string), then calling `get_variable()` with the full name of the shared variable (i.e. `"my_scope/x"`).
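The three rules in this explanation (shared variables are named by the variable scope, everything else by a uniquified name scope, and reuse can look up a shared variable by its full name from the root scope) can be checked with a TensorFlow-free toy model. `ToyGraph`, `Var` and their methods are hypothetical stand-ins, not TensorFlow APIs, and they ignore many real details (dtypes, shapes, initializers):

```python
import contextlib

class Var(object):
    def __init__(self, name):
        self.name = name

class ToyGraph(object):
    """Hypothetical model of TF1 name scopes vs. variable scopes."""

    def __init__(self):
        self.used = set()      # op/scope names already taken
        self.name_prefix = ""  # name scope, uniquified (e.g. "my_scope_1/")
        self.var_prefix = ""   # variable scope, never uniquified (e.g. "my_scope/")
        self.reuse = False
        self.shared = {}       # variable-scope name -> shared Var

    def _unique(self, name):
        candidate, i = name, 0
        while candidate in self.used:
            i += 1
            candidate = "%s_%d" % (name, i)
        self.used.add(candidate)
        return candidate

    @contextlib.contextmanager
    def variable_scope(self, name, reuse=False):
        saved = (self.name_prefix, self.var_prefix, self.reuse)
        if name:  # "" stands for the root scope
            self.name_prefix = self._unique(self.name_prefix + name) + "/"
            self.var_prefix = self.var_prefix + name + "/"
        self.reuse = reuse or self.reuse  # reuse is inherited by inner scopes
        try:
            yield
        finally:
            self.name_prefix, self.var_prefix, self.reuse = saved

    def Variable(self, name):
        # Regular variables only see the (uniquified) name scope.
        return Var(self._unique(self.name_prefix + name))

    def get_variable(self, name):
        # Shared variables are keyed by the variable-scope name.
        full = self.var_prefix + name
        if self.reuse:
            if full not in self.shared:
                raise ValueError(full + " does not exist")
            return self.shared[full]
        if full in self.shared:
            raise ValueError(full + " already exists")
        self.shared[full] = Var(self._unique(full))
        return self.shared[full]

g = ToyGraph()
with g.variable_scope("my_scope"):
    x0 = g.get_variable("x")
    x1 = g.Variable("x")
    x2 = g.Variable("x")
with g.variable_scope("my_scope", reuse=True):
    x3 = g.get_variable("x")
    x4 = g.Variable("x")
with g.variable_scope("", reuse=True):
    x5 = g.get_variable("my_scope/x")

for label, v in zip(["x0", "x1", "x2", "x3", "x4", "x5"], [x0, x1, x2, x3, x4, x5]):
    print(label + ":", v.name)
print(x0 is x3 and x3 is x5)  # True
```

The printed names match the TensorFlow output above, which suggests the model captures the essential difference between the two kinds of scope.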

## Strings

In [89]:
reset_graph()

text = np.array("Do you want some café?".split())
text_tensor = tf.constant(text)

with tf.Session() as sess:
    print(text_tensor.eval())

[b'Do' b'you' b'want' b'some' b'caf\xc3\xa9?']
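TensorFlow string tensors hold raw bytes rather than Python `str` objects, so `eval()` returns `bytes` and the non-ASCII `é` shows up as its two-byte UTF-8 encoding `\xc3\xa9`. The same round trip in plain Python:

```python
word = "café?"
encoded = word.encode("utf-8")  # the bytes a string tensor stores
print(encoded)                  # b'caf\xc3\xa9?'
print(encoded.decode("utf-8"))  # café?
```

To get readable text back from the tensor, decode each element, e.g. `[w.decode("utf-8") for w in text_tensor.eval()]`.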
