# Up and Running with Tensorflow

"In this chapter, we will go through the basics of Tensorflow, from installation to creating, running, saving, and visualizing simple computational graphs."

In [1]:
import tensorflow as tf
tf.__version__

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])
  from ._conv import register_converters as _register_converters


'1.12.0'

In [3]:
x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y + y + 2
type(f)

tensorflow.python.framework.ops.Tensor

In [4]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
    print(result)

42


Instead of manually running the initializer for every single variable, you can use the `global_variables_initializer()` function. Note that it does not actually perform the initialization immediately, but rather creates a node in the graph that will initialize all variable when it is run:

In [6]:
init = tf.global_variables_initializer() # preparing an init node
with tf.Session() as sess:
    init.run() # initialize all variables
    result = f.eval()
    print(result)

42


### Managing Graphs

Any node you create is automatically added to the default graph.

In most cases this is fine, but sometimes you may want to manage multiple independent graphs. You can do this by creating a new `Graph` and temporarily making it the default graph inside a `with` block:

In [8]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)

In [10]:
x2.graph is graph

True

In [11]:
x2.graph is tf.get_default_graph()

False

### Lifecycle of a Node Value

Look at this bit of code:

In [12]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

Notice that the *y* AND *z* nodes relies on *x* and *w*.
TF will compute the value of x, then, w, then y, but then will have to recompute x and w before computing z.

"If you want to evaluate `y` and `z` efficiently, without evaluating `w` and `x` twice as in the previous code, you must ask TensorFlow to evaluate `y` and `z` in just one graph run, as shown in the following code:"

In [13]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y,z])
    print(y_val, z_val)

10 15


Here, we will demonstrate how to do linear regression (simply using the Normal equation) with TF:

In [2]:
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m,n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m,1)),housing.data]

In [None]:
x = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='x')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
xt = tf.transpose(x)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(xt,x)), xt), y)

In [16]:
with tf.Session() as sess:
    theta_value = theta.eval()
    print(theta_value)

[[-3.7185181e+01]
 [ 4.3633747e-01]
 [ 9.3952334e-03]
 [-1.0711310e-01]
 [ 6.4479220e-01]
 [-4.0338000e-06]
 [-3.7813708e-03]
 [-4.2348403e-01]
 [-4.3721911e-01]]


### Implementing Gradient Descent

"Let's try using Batch Gradient Descent instead of the Normal Equation."

#### Manually Computing the Gradients

In [3]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data_plus_bias = scaler.fit_transform(housing_data_plus_bias)

In [22]:
n_epochs = 1000
learning_rate = 0.01


x = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='x')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n+1,1],-1.0,1.0), name='theta')
y_pred = tf.matmul(x, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2/m * tf.matmul(tf.transpose(x), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

sess = tf.InteractiveSession()
init.run()
for epoch in range(n_epochs):
    if epoch % 100 == 0:
        print('Epoch', epoch, 'MSE =', mse.eval())
    sess.run(training_op)

best_theta = theta.eval()
sess.close()



Epoch 0 MSE = 8.940121
Epoch 100 MSE = 5.073209
Epoch 200 MSE = 4.964173
Epoch 300 MSE = 4.9186454
Epoch 400 MSE = 4.886637
Epoch 500 MSE = 4.863559
Epoch 600 MSE = 4.846886
Epoch 700 MSE = 4.834839
Epoch 800 MSE = 4.826128
Epoch 900 MSE = 4.8198304


#### Using autodiff

"The preceding code works fine, but it requires mathematically deriving the gradients from the cost function (MSE). In the case of Linear Regression, it is reasonably easy, but if you had to do this with deep neural networks you would get quite a headache... Fortunately, TensorFlow's autodiff feature comes to the rescue: it can automatically and efficiently compute the gradients for you. Simply replace the `gradients = ...` line in the Gradient Descent code in the previous section with the following line:"

`gradients = tf.gradients(mse, [theta])[0]`

In [24]:
n_epochs = 1000
learning_rate = 0.01


x = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='x')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n+1,1],-1.0,1.0), name='theta')
y_pred = tf.matmul(x, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

sess = tf.InteractiveSession()
init.run()
for epoch in range(n_epochs):
    if epoch % 100 == 0:
        print('Epoch', epoch, 'MSE =', mse.eval())
    sess.run(training_op)

best_theta = theta.eval()
sess.close()



Epoch 0 MSE = 10.676546
Epoch 100 MSE = 4.9156957
Epoch 200 MSE = 4.845221
Epoch 300 MSE = 4.8338785
Epoch 400 MSE = 4.826566
Epoch 500 MSE = 4.821135
Epoch 600 MSE = 4.8170424
Epoch 700 MSE = 4.813942
Epoch 800 MSE = 4.811581
Epoch 900 MSE = 4.809775


#### Using an Optimizer

Along with computing the gradient for you, "[tensorflow] also provides a number of optimizers out of the box... Simply replace the preceding `gradients = ...` and `training_op = ...` with:

In [26]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(mse)

### Feeding the Data to the Training Algorithm

"Let's try to modify the previous code to implement Mini-batch Gradient Descent. For this, we need a way to replace X and y at every iteration with the next mini-batch. The simplest way to do this is to use placeholder nodes. These nodes are special because they don't actually perform any computation, they just output the data you tell them to output at runtime."

"To create a placeholder node, you must call the `placeholder()` function and specify the output tensor's data type. Optionally, you can also specify its shape, if you want to enforce it."

In [28]:
A = tf.placeholder(tf.float32, shape=(None,3))
B = A + 5

sess = tf.InteractiveSession()
print( B.eval(feed_dict={A: [[1,2,3]]}) )
print( B.eval(feed_dict={A: [[4,5,6],[7,8,9]]}) )
sess.close()

[[6. 7. 8.]]
[[ 9. 10. 11.]
 [12. 13. 14.]]




Now when implementing mini-batches, we need x and y to be placeholders:

In [29]:
x = tf.placeholder(tf.float32, shape=(None, n+1), name='x')
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')

Computing the number of total batches we need:

In [30]:
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

"Finally, in the execution phase, fetch the mini-batches one by one, then provide the value of x and y via the `feed_dict` parameter when evaluating a node that depends on either of them."

In [None]:
# Don't actually run this, it's just psuedo code and won't work

def fetch_batch(epoch, batch_index, batch_size):
    #[...] load the data from the disk
    return x_batch, y_batch

sess = tf.InteractiveSession()
sess.run(init)
for epoch in range(n_epochs):
    for batch_index in range(n_batches):
        x_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
        sess.run(training_op, feed_dict={x: x_batch, y: y_batch})
best_theta = theta.eval()

### Saving and Restoring Models

"TensorFlow makes saving and restoring a model very easy. Just create a `Saver` node at the end of the construction phase (after all the variable nodes are created); then, in the execution phase, just call its `save()` method whenever you want to save the model, passing it the session and path of the checkpoint file"

In [None]:
# During construction
saver = tf.train.Saver()

# Example execution phase:
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0: #checkpoint every 100 epochs
            save_path = saver.save(sess, '/tmp/my_model.ckpt')
        sess.run(training_op)
    best_theta = theta.eval()
    save_path = saver.save(sess, '/tmp/my_model_final.ckpt')

"Restoring a model is just as easy: you create a `Saver` at the end of the construction phase just like before, but then at the beginning of the execution phase, instead of initializing the variables using the `init` node, you call the `restore()` method of the `Saver` object:"

In [None]:
with tf.Session() as sess:
    saver.restore(sess, '/tmp/my_model_final.ckpt')
    # [...]

"By default, a `Saver` saves and restores all variables under their own name, but if you need more control, you can specigy which variables to save or restore, and what names to use:"

In [31]:
saver = tf.train.Saver({'weights': theta})

"By default, the `save()` method also saves the structure of the graph in a second file with the same name plus a .meta extension." If you want to restore your graph (and not just constructing it again), you can use `import_meta_graph`.

In [None]:
saver = tf.train.import_meta_graph('/tmp/my_model_final.ckpt.meta')

with tf.Session() as sess:
    saver.restore(sess, '/tmp/my_model_final.ckpt')
    # [...]

### Visualizing the Graph and Training Curves Using Tensorboard

Let's add a bit to the start of our code so we can log the mse into Tensorboard.

In [32]:
from datetime import datetime

now = datetime.utcnow().strftime('%Y%m%d%H%M%S')
root_logdir = 'tf_logs'
logdir = '{}/run-{}/'.format(root_logdir, now)

In [33]:
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

"Next, you need to update the execution phase to evaluate the `mse_summary` node regularly during training (e.g., every 10 mini-batches). This will output a summary that you can then write to the events file using the `file_writer`:"

In [None]:
# [...]
for batch_index in range(n_batches):
    x_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
    if batch_index % 10 == 0:
        summary_str = mse_summary.eval(feed_dict={x: x_batch, y: y_batch})
        step = epoch * n_batches + batch_index
        file_writer.add_summary(summary_str, step)
    sess.run(training_op, feed_dict={x: x_batch, y: y_batch})
# [...]

In [34]:
file_writer.close() # Make sure to close file writer at end of program

In [4]:
# Here's the full predictor now:
from datetime import datetime

now = datetime.utcnow().strftime('%Y%m%d%H%M%S')
root_logdir = 'tf_logs'
logdir = '{}/run-{}/'.format(root_logdir, now)



n_epochs = 1000
learning_rate = 0.01


x = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='x')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n+1,1],-1.0,1.0), name='theta')
y_pred = tf.matmul(x, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = tf.gradients(mse, [theta])[0]
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
saver = tf.train.Saver()
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

init = tf.global_variables_initializer()

sess = tf.InteractiveSession()
init.run()
for epoch in range(n_epochs):
    if epoch % 100 == 0:
        saver.save(sess, '/logs/model.ckpt')
    if epoch % 50 == 0:
        summary_str = mse_summary.eval()
        step = epoch * n_epochs
        file_writer.add_summary(summary_str, step)
    sess.run(training_op)
saver.save(sess, '/logs/model_final.ckpt')
best_theta = theta.eval()
sess.close()

#### Name Scopes

"When dealing with more complex models such as neural networks, the graph can easily becme cluttered with thousands of nodes. To avoid this, you can create **name scopes** to group related nodes. For example, let's modify the previous code to define the `error` and `mse` ops within a name scope called `loss`:

In [5]:
with tf.name_scope('loss') as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name='mse')

In [6]:
error.op.name

'loss/sub'

### Modularity

"Suppose you want to create a graph that adds the output of two [relu] nodes... The following code does the job, but is quite repetitive:"

In [8]:
graph = tf.Graph()
with graph.as_default():
    n_features = 3
    X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')

    w1 = tf.Variable(tf.random_normal((n_features,1)), name='weights1')
    w2 = tf.Variable(tf.random_normal((n_features,1)), name='weights2')
    b1 = tf.Variable(0.0, name='bias1')
    b2 = tf.Variable(0.0, name='bias2')

    z1 = tf.add(tf.matmul(X,w1),b1, name='z1')
    z2 = tf.add(tf.matmul(X,w2),b2, name='z2')
    
    relu1 = tf.maximum(z1, 0.0, name='relu1')
    relu2 = tf.maximum(z2, 0.0, name='relu2')
    
    output = tf.add(relu1, relu2, name='output')

"Such repetitive code is hard to maintain and error-prone." Luckily, there is an easier solution. Just like how in normal python code you would create a function for something that you do multiple times, you can do the same in Tensorflow:

In [34]:
def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name='weights')
    b = tf.Variable(0.0, name='bias')
    z = tf.add(tf.matmul(X,w),b, name='z')
    return tf.maximum(z, 0.0, name='relu')

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    X = np.random.uniform(size=(5, n_features))
    output.eval()

InvalidArgumentError: You must feed a value for placeholder tensor 'X_26' with dtype float and shape [?,3]
	 [[node X_26 (defined at <ipython-input-34-02bc566abfb7>:9)  = Placeholder[dtype=DT_FLOAT, shape=[?,3], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'X_26', defined at:
  File "C:\Users\Tim\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Tim\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "C:\Users\Tim\Anaconda3\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
    app.start()
  File "C:\Users\Tim\Anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "C:\Users\Tim\Anaconda3\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "C:\Users\Tim\Anaconda3\lib\site-packages\tornado\ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "C:\Users\Tim\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2698, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2802, in run_ast_nodes
    if self.run_code(code, result):
  File "C:\Users\Tim\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-34-02bc566abfb7>", line 9, in <module>
    X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
  File "C:\Users\Tim\Anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1747, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 6252, in placeholder
    "Placeholder", dtype=dtype, shape=shape, name=name)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "C:\Users\Tim\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'X_26' with dtype float and shape [?,3]
	 [[node X_26 (defined at <ipython-input-34-02bc566abfb7>:9)  = Placeholder[dtype=DT_FLOAT, shape=[?,3], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]


### Sharing Variables

"If you want to share a variable between various components of your graph, one simple option is to create it first then pass it as a parameter to the functions that need it."

In [35]:
def relu(X, threshold):
    # ...
    return tf.maximum(z, threshold, name='max')

"This works fine: now you can control the threshold for all ReLUs using the `threshold` variable. However, if there are many shared parameters such as this one, it will be painful to have to pass them around as parameters all the time. Many people create a Python dictionary containing all the variables in their model, and pass it around to every function. Others create a class for each module (e.g. a ReLU class using class variables to handle the shared parameter). Yet another option is to set the shared variables as an attribute of the `relu()` function upon the first call, like so:"

In [37]:
def relu(X):
    with tf.name_scope('relu'):
        if not hasattr(relu, 'threshold'):
            relu.threshold = tf.Variable(0.0, name='threshold')
        # [...]

"Tensorflow offers another option... This solution is a bit tricky to understand at first, but since it is used a lot in TensorFlow, it is worth going into a bit of detail. The idea is to use the `get_variable()` function to create the shared variable if it does not exist yet, or reuse it if it already exists. The desired behavior (creating or reusing) is controlled by an attribute of the current `variable_scope()`. For example, the following code will create a variable named `relu/threshold` (as a scalar, since shape=(), and using 0,0 as the initial value):

In [38]:
with tf.variable_scope('relu'):
    threshold = tf.get_variable('threshold', shape=(), initializer=tf.constant_initializer(0.0))

"Note that if the variable has already been created by an earlier call to the `get_variable()`, this code will rase an exception. This behavior prevents reusing variables by mistake. If you want to reuse a variable, you need to explicity say so by setting the variable scope's `reuse` attribute to `True` (in which case, you don't have to specify the shape or the initializer):"

In [39]:
with tf.variable_scope('relu', reuse=True):
    threshold = tf.get_variable('threshold')