<h1>Chapter 09. Up and Running with TensorFlow</h1>

**TensorFlow** is an open-source machine learning framework developed by Google Brain for building and training various types of machine learning models, including deep learning neural networks. It provides a comprehensive ecosystem of tools, libraries, and resources for tasks such as data preprocessing, model building, training, deployment, and inference. TensorFlow supports both CPU and GPU computations, allowing for efficient processing of large-scale datasets and complex models. It's widely used in research, industry, and academia for a diverse range of machine learning and artificial intelligence applications.

<h2>Creating and Running a first Graph</h2>

In TensorFlow, a graph represents the computation workflow of a machine learning model. It consists of nodes that represent operations (such as mathematical computations or data transformations) and edges that represent the flow of data between these operations. The graph defines the dependencies between operations, allowing TensorFlow to efficiently execute them in the correct order and optimize the computation for performance. Graphs can be constructed using TensorFlow's high-level APIs or by directly manipulating TensorFlow's computational graph using low-level operations. They serve as a foundational structure for defining, training, and executing machine learning models in TensorFlow.

Creating a Graph

In [1]:
# Import TensorFlow version 1 compatibility module as tf
import tensorflow.compat.v1 as tf

# Disable eager execution
tf.disable_eager_execution()

# Define TensorFlow variables with initial values
x = tf.Variable(initial_value=3, name="x")
y = tf.Variable(initial_value=4, name="y")

# Define computational graph
f = x * x * y + y + 2

To run the graph, a TensorFlow session must be opened and used to initialize variables and evaluate `f`. The following code creates a session, initializes variables, computes `f`, and closes the session (freeing resources).

In [2]:
sess = tf.Session()  # create session

sess.run(x.initializer)  # initialize x
sess.run(y.initializer)  # initialize y

result = sess.run(f)  # execute the computation f
result

2024-04-12 11:43:26.018988: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled


42

In [3]:
sess.close()

Rather than individually executing the initializer for each variable, it is more efficiant to utilize `global_variables_initializer()`. Repeatedly calling `sess.run` for initialization can become cumbersome; however, there exists a more streamlined approach: 

In [4]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()

result

42

`InteractiveSession()` in TensorFlow allows you to work interactively, seamlessly combining graph and execution without needing a `with` block. It's convenient for interactive environments like Jupyter notebooks, but remember to close the session manually to free up resources. 

In [5]:
sess = tf.InteractiveSession()

init.run()
result = f.eval()
result

42

In [6]:
sess.close()

<h2>Managing Graphs</h2>

In [7]:
# Reset the default TensorFlow graph to clear
# any previously defined operations or variables
tf.reset_default_graph()

x1 = tf.Variable(1)

x1.graph is tf.get_default_graph()

True

In [8]:
graph = tf.Graph()

with graph.as_default():
    x2 = tf.Variable(2)

x2.graph is graph

True

In [9]:
x2.graph is tf.get_default_graph()

False

<h2>Lifecycle of a Node Value</h2>

In [10]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

10
15


In [11]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)
    print(z_val)

10
15


<h2>Linear Rergression with TensorFlow</h2>

The code below manipulates two-dimensional arrays to perform linear regression on a dataset containing California real estate prices. The code starts by retrieving a dataset. It then adds an additional input bias feature to all training samples using NumPy. The code then creates two TensorFlow constant nodes, `X` and `y`, to store this data and targets, and then uses a series of matrix operations provided by TensorFlow to determine theta. Such matrix functions are `transpose()`, `matmul()` and `matrix_inverse()` does not perform calculations immediately; instead, nodes are created in the graph that will be executed when the graph is run.

In [12]:
import numpy as np
from sklearn.datasets import fetch_california_housing

tf.reset_default_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")

XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

theta_value

array([[-3.7174683e+01],
       [ 4.3591911e-01],
       [ 9.3909204e-03],
       [-1.0637519e-01],
       [ 6.4145899e-01],
       [-4.1128196e-06],
       [-3.7799443e-03],
       [-4.2388692e-01],
       [-4.3728542e-01]], dtype=float32)

Compare with pure NumPy

In [13]:
X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)

theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
theta_numpy

array([[-3.69419202e+01],
       [ 4.36693293e-01],
       [ 9.43577803e-03],
       [-1.07322041e-01],
       [ 6.45065694e-01],
       [-3.97638942e-06],
       [-3.78654265e-03],
       [-4.21314378e-01],
       [-4.34513755e-01]])

Compare with Scikit-Learn

In [14]:
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))

np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T]

array([[-3.69419202e+01],
       [ 4.36693293e-01],
       [ 9.43577803e-03],
       [-1.07322041e-01],
       [ 6.45065694e-01],
       [-3.97638942e-06],
       [-3.78654265e-03],
       [-4.21314378e-01],
       [-4.34513755e-01]])

<h2>Implementation of Gradient Descent</h2>

Gradient Descent requires scaling the feature vectors first. This can be achieved using the `StandardScaler` from Scikit-Learn. The `StandardScaler` scales features by removing the mean and scaling to unit variance, which helps improve the performance of Gradient Descent algorithms by ensuring that all features have a similar scale.

In [15]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

In [16]:
# Compute the mean value of each feature (column)
scaled_housing_data_plus_bias.mean(axis=0)

array([ 1.00000000e+00,  6.60969987e-17,  5.50808322e-18,  6.60969987e-17,
       -1.06030602e-16, -1.10161664e-17,  3.44255201e-18, -1.07958431e-15,
       -8.52651283e-15])

In [17]:
# Compute the mean value of each instance (row)
scaled_housing_data_plus_bias.mean(axis=1)

array([ 0.38915536,  0.36424355,  0.5116157 , ..., -0.06612179,
       -0.06360587,  0.01359031])

In [18]:
# Compute the overall mean value of all elements
scaled_housing_data_plus_bias.mean()

0.11111111111111005

In [19]:
scaled_housing_data_plus_bias.shape

(20640, 9)

<h3>Manually computing the Gradients</h3>

In [20]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")

# random_uniform() creates a node in the graph that will generate a tensor
# containing random values with a given shape and range
theta = tf.Variable(
    tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name="theta"
)

y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2 / m * tf.matmul(tf.transpose(X), error)

# Create a node that will assign a new value to the variable
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # Loop repeatedly executes the learning step and
    # otputs the MSE each 100 iterations
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print(f"Epoch {epoch} MSE = {mse.eval()}")
        sess.run(training_op)

    best_theta = theta.eval()

Epoch 0 MSE = 2.7544267177581787
Epoch 100 MSE = 0.6322219371795654
Epoch 200 MSE = 0.5727803111076355
Epoch 300 MSE = 0.5585008263587952
Epoch 400 MSE = 0.5490700006484985
Epoch 500 MSE = 0.5422880053520203
Epoch 600 MSE = 0.5373790860176086
Epoch 700 MSE = 0.5338220000267029
Epoch 800 MSE = 0.5312425494194031
Epoch 900 MSE = 0.5293705463409424


In [21]:
best_theta

array([[ 2.06855226e+00],
       [ 7.74078071e-01],
       [ 1.31192386e-01],
       [-1.17845066e-01],
       [ 1.64778143e-01],
       [ 7.44081801e-04],
       [-3.91945131e-02],
       [-8.61356556e-01],
       [-8.23479712e-01]], dtype=float32)

<h3>Using autodiff</h3>

In [22]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")

theta = tf.Variable(
    tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name="theta"
)
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

The `tf.gradients()` function takes the `mse` operation and `theta` variable list and creates a list of operations (one per variable) to compute the gradients of the operation with respect to each variable. Thus, the gradients node will compute the vector gradient of the MSE with respect to theta.

In [23]:
gradients = tf.gradients(ys=mse, xs=[theta])[0]

In [24]:
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print(f"Epoch {epoch} MSE = {mse.eval()}")
        sess.run(training_op)

    best_theta = theta.eval()

print(f"\nBest theta:\n{best_theta}")

Epoch 0 MSE = 2.7544267177581787
Epoch 100 MSE = 0.6322218775749207
Epoch 200 MSE = 0.5727803111076355
Epoch 300 MSE = 0.5585008263587952
Epoch 400 MSE = 0.5490700006484985
Epoch 500 MSE = 0.5422880053520203
Epoch 600 MSE = 0.5373790860176086
Epoch 700 MSE = 0.5338219404220581
Epoch 800 MSE = 0.5312425494194031
Epoch 900 MSE = 0.5293704867362976

Best theta:
[[ 2.06855249e+00]
 [ 7.74078071e-01]
 [ 1.31192386e-01]
 [-1.17845066e-01]
 [ 1.64778143e-01]
 [ 7.44078017e-04]
 [-3.91945094e-02]
 [-8.61356676e-01]
 [-8.23479772e-01]]


<h3>Using a <code>GradientDescentOptimizer</code></h3>

The `GradientDescentOptimizer` is an optimization algorithm that iteratively adjusts the parameters (weights) of a model in the direction of the steepest descent of the loss function. It updates the parameters based on the gradients of the loss function with respect to the parameters, scaled by a learning rate. 

In [25]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")

theta = tf.Variable(
    tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name="theta"
)
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

Replace `gradients = ...` and `training_op = ...` with the following code:

In [26]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)  # minimize the mean squared error

In [27]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print(f"Epoch {epoch} MSE = {mse.eval()}")
        sess.run(training_op)

    best_theta = theta.eval()

print(f"\nBest theta:\n{best_theta}")

Epoch 0 MSE = 2.7544267177581787
Epoch 100 MSE = 0.6322218775749207
Epoch 200 MSE = 0.5727803111076355
Epoch 300 MSE = 0.5585008263587952
Epoch 400 MSE = 0.5490700006484985
Epoch 500 MSE = 0.5422880053520203
Epoch 600 MSE = 0.5373790860176086
Epoch 700 MSE = 0.5338219404220581
Epoch 800 MSE = 0.5312425494194031
Epoch 900 MSE = 0.5293704867362976

Best theta:
[[ 2.06855249e+00]
 [ 7.74078071e-01]
 [ 1.31192386e-01]
 [-1.17845066e-01]
 [ 1.64778143e-01]
 [ 7.44078017e-04]
 [-3.91945094e-02]
 [-8.61356676e-01]
 [-8.23479772e-01]]


<h3>Using a <code>MomentumOptimizer</code></h3>

The `MomentumOptimizer` is an optimization algorithm that enhances the basic gradient descent method by incorporating momentum. It accumulates a weighted average of past gradients and uses this information to update the parameters (weights) of a model. This helps to accelerate convergence, especially in the presence of high curvature or noisy gradients. The momentum parameter controls the influence of past gradients on the parameter updates.

In [28]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")

theta = tf.Variable(
    tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name="theta"
)
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

In [29]:
# Set momentum parameter to 0.9, which controls the influence
# of past gradients on the parameter updates
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
training_op = optimizer.minimize(mse)

In [30]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print(f"Epoch {epoch} MSE = {mse.eval()}")
        sess.run(training_op)

    best_theta = theta.eval()

print(f"\nBest theta:\n{best_theta}")

Epoch 0 MSE = 2.7544267177581787
Epoch 100 MSE = 0.5273160338401794
Epoch 200 MSE = 0.5244147777557373
Epoch 300 MSE = 0.5243281722068787
Epoch 400 MSE = 0.5243218541145325
Epoch 500 MSE = 0.5243210792541504
Epoch 600 MSE = 0.5243210196495056
Epoch 700 MSE = 0.5243209600448608
Epoch 800 MSE = 0.5243209600448608
Epoch 900 MSE = 0.5243209600448608

Best theta:
[[ 2.068558  ]
 [ 0.8296167 ]
 [ 0.11875112]
 [-0.26552212]
 [ 0.30569226]
 [-0.00450316]
 [-0.03932616]
 [-0.8998917 ]
 [-0.87054664]]


<h2>Feeding Data to the Training Algorithm</h2>

The `tf.placeholder()` function in TensorFlow is used to create a placeholder tensor. Placeholder tensors are used as input points for feeding actual data into a TensorFlow computational graph during the execution phase. They act as empty nodes that will be filled with actual data when a session runs the computational graph. Placeholders allow for dynamic data input, making TensorFlow models more flexible and adaptable to different datasets and scenarios.

In [31]:
tf.reset_default_graph()

A = tf.placeholder(dtype=tf.float32, shape=(None, 3))  # None for any size measurement
B = A + 5

with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

In [32]:
print(B_val_1)

[[6. 7. 8.]]


In [33]:
print(B_val_2)

[[ 9. 10. 11.]
 [12. 13. 14.]]


<h3>Mini-batch Gradient Descent</h3>

In [34]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

# Set X and y as placeholders
X = tf.placeholder(dtype=tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(dtype=tf.float32, shape=(None, 1), name="y")

theta = tf.Variable(
    tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name="theta"
)
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

# Define tha packet size and calculate the totla number of packets
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

Finally, at runtime, one extracts the mini-batches one by one and provides the value of X and y through the feed_dict parameter when evaluating a node that depends on either of them.

In [35]:
def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)
    indices = np.random.randint(m, size=batch_size)

    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]

    return X_batch, y_batch


with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

best_theta

array([[ 2.0714476 ],
       [ 0.8462012 ],
       [ 0.11558535],
       [-0.26835832],
       [ 0.32982782],
       [ 0.00608358],
       [ 0.07052915],
       [-0.87988573],
       [-0.8634251 ]], dtype=float32)

<h2>Saving and restoring a Model</h2>

`Saver()` in TensorFlow is a utility class used to save and restore TensorFlow models. It allows you to save the variables of a model to disk and later restore them during inference or further training.

In [36]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")

theta = tf.Variable(
    tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name="theta"
)
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

# Create a Saver to save TensorFlow model variables
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print(f"Epoch {epoch} MSE = {mse.eval()}")
            # Specify the path where the model variables will be saved and save them
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)

    best_theta = theta.eval()

    # Specify the path where the final model variables will be saved and save them
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")

Epoch 0 MSE = 2.7544267177581787
Epoch 100 MSE = 0.6322218775749207
Epoch 200 MSE = 0.5727803111076355
Epoch 300 MSE = 0.5585008263587952
Epoch 400 MSE = 0.5490700006484985
Epoch 500 MSE = 0.5422880053520203
Epoch 600 MSE = 0.5373790860176086
Epoch 700 MSE = 0.5338219404220581
Epoch 800 MSE = 0.5312425494194031
Epoch 900 MSE = 0.5293704867362976


In [37]:
best_theta

array([[ 2.06855249e+00],
       [ 7.74078071e-01],
       [ 1.31192386e-01],
       [-1.17845066e-01],
       [ 1.64778143e-01],
       [ 7.44078017e-04],
       [-3.91945094e-02],
       [-8.61356676e-01],
       [-8.23479772e-01]], dtype=float32)

Restore model: you create a `Saver` object at the end of the build stage as before, but then call the `restore()` method of the `Saver` object at the beginning of the runtime stage instead of initializing variables using an initializing node:

In [38]:
with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")
    best_theta_restored = theta.eval()

INFO:tensorflow:Restoring parameters from /tmp/my_model_final.ckpt


In [39]:
np.allclose(best_theta, best_theta_restored)

True

To utilizing a `saver` that loads and restores `theta` with a different identifier, such as 'weights':

In [40]:
saver = tf.train.Saver({"weights": theta})

By default, the `saver` also stores the graph structure itself in a secondary file with the extension `.meta`. The function `tf.train.import_meta_graph()` can be utilized to recover the graph structure. This function imports the graph into the default graph and provides a `Saver` object that can subsequently be used to restore the graph state (i.e., the variable values):

In [41]:
# Start with an empty graph
tf.reset_default_graph()

# Import the graph structure from the meta file
saver = tf.train.import_meta_graph("/tmp/my_model_final.ckpt.meta")

# Retrieve the variable named 'theta' from the imported graph
theta = tf.get_default_graph().get_tensor_by_name("theta:0")

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")  # restore the graph's state
    best_theta_restored = theta.eval()

INFO:tensorflow:Restoring parameters from /tmp/my_model_final.ckpt


  return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,


This functionality enables the importation of a pretrained model without necessitating the corresponding Python code to construct the graph. This feature proves advantageous when continuously modifying and saving a model: a previously saved model can be loaded without the need to locate the version of the code that generated it.

<h2>Visualization of the Graph and Learning Curves using TensorBoard</h2>

TensorBoard is a great tool for visualizing TensorFlow graphs, training curves, and more. TensorFlow code typically writes various files to a log directory, and the TensorBoard server regularly reads these files to generate interactive visualizations. It can plot graphs, learning curves (i.e., how the loss evaluated on the training set or test set evolves as a function of the epoch number), profiling data to identify performance bottlenecks, and more. In short, it helps users keep track of everything. Here's the overall process:

`TensorFlow writes logs to ===> log directory ===> TensorBoard reads data and displays visualizations`

To visualize different graphs or learning curves for different training runs, it's important to organize the log files properly. Each graph or run should have its own log subdirectory. It's common to use a root log directory called `tf_logs` and create a subdirectory called `run-` followed by the current timestamp (or any other name preferred in your own code):

It's necessary to use a distinct log directory for each program run; otherwise, TensorBoard will combine statistics from various runs, resulting in compromised visualizations. To address this, it's recommended to incorporate a timestamp into the log directory name:

In [42]:
from datetime import datetime

now = datetime.now().strftime("%Y%m%d_%H%M%S")
log_dir = f"tf_logs/run-{now}/"

log_dir

'tf_logs/run-20240412_114442/'

A function can be created to generate a subdirectory path whenever it's needed:

In [43]:
def make_log_subdir(run_id=None):
    if run_id is None:
        run_id = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"tf_logs/run-{run_id}/"

`tf.summary.FileWriter()` is a utility class used to write summary data for TensorBoard visualization. It allows you to write various types of summary data, such as scalar values, histograms, and images, to event files that can be consumed by TensorBoard.

In [44]:
file_writer = tf.summary.FileWriter(logdir=log_dir, graph=tf.get_default_graph())

Now the root log directory contains one subdirectory:

In [45]:
import os

os.listdir("tf_logs")

['run-20240412_114442']

And this subdirectory contains only one log file (called `tfevents` file) for the graph:

In [46]:
os.listdir(log_dir)

['events.out.tfevents.1712915082.Ivans-MBP.fritz.box']

In order to ensure that the actual graph data is properly written to disk, it's necessary to call `flush()` or `close()` on the `FileWriter`. This action is crucial as the data may still be in the operating system's file cache.

In [47]:
file_writer.close()

In order to start TensorBoard, it must be initiated as a web server in a separate process. One approach is to execute the `tensorboard` command in a terminal window. Alternatively, the `%tensorboard` Jupyter extension can be utilized, which not only launches TensorBoard but also facilitates viewing its user interface directly within Jupyter.

In [48]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

The `%tensorboard` extension to start the TensorBoard server. Must point to the root log directory:

In [49]:
%tensorboard --logdir tf_logs/

To simplify this process, a save_graph() function can be created. This function automatically generates a new log subdirectory and saves the specified graph (by default tf.get_default_graph()) to this directory.

In [50]:
def save_graph(graph=None, run_id=None):
    if graph is None:
        graph = tf.get_default_graph()
    log_dir = make_log_subdir(run_id=run_id)
    file_writer = tf.summary.FileWriter(logdir=log_dir, graph=graph)
    file_writer.close()
    return log_dir


save_graph()

'tf_logs/run-20240412_114453/'

<h3>Visualizing Learning Curves</h3>

In [51]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

log_dir = make_log_subdir()

mse_summary = tf.summary.scalar("MSE", mse)
file_writer = tf.summary.FileWriter(logdir=log_dir, graph=tf.get_default_graph())

In [52]:
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(
                epoch=epoch, batch_index=batch_index, batch_size=batch_size
            )
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

file_writer.close()

In [53]:
best_theta

array([[ 1.1198488 ],
       [ 0.40280202],
       [ 0.5302975 ],
       [ 0.50325125],
       [-0.06576284],
       [ 0.24595782],
       [-0.63472825],
       [-0.7215738 ],
       [-0.39073238]], dtype=float32)

Let's examine TensorBoard. One should attempt to navigate to the SCALARS tab.

In [54]:
%tensorboard --logdir tf_logs/

Reusing TensorBoard on port 6006 (pid 21949), started 0:00:01 ago. (Use '!kill 21949' to kill it.)

<h2>Name Scopes</h2>

Name scopes provide a way to organize operations within a computational graph. They group related operations together, enabling clearer organization, hierarchical structuring, and easier visualization and debugging of complex models. Scoped names automatically prefix operations, aiding in identification and management of the computational graph's complexity.

In [55]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name='X')
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')

theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')

Define the operations `error` and `mse` within the namespace called "loss"

In [56]:
with tf.name_scope('loss') as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name='mse')

In [57]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

log_dir = make_log_subdir()

mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir=log_dir, graph=tf.get_default_graph())

In [58]:
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))


with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

file_writer.flush()
file_writer.close()
print("Best theta:")
print(best_theta)

Best theta:
[[ 2.070016  ]
 [ 0.8204561 ]
 [ 0.1173173 ]
 [-0.22739051]
 [ 0.3113402 ]
 [ 0.00353193]
 [-0.01126994]
 [-0.91643935]
 [-0.8795008 ]]


The name of each operation defined within the space is now prefixed with `loss/`

In [59]:
print(error.op.name)

loss/sub


In [60]:
print(mse.op.name)

loss/mse


<h2>Modularity</h2>

In [61]:
tf.reset_default_graph()

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')

w1 = tf.Variable(tf.random_normal((n_features, 1)), name='weights1')
w2 = tf.Variable(tf.random_normal((n_features, 1)), name='weights2')

b1 = tf.Variable(0.0, name='bias1')
b2 = tf.Variable(0.0, name='bias2')

z1 = tf.add(tf.matmul(X, w1), b1, name='z1')
z2 = tf.add(tf.matmul(X, w2), b2, name='z2')

relu1 = tf.maximum(z1, 0.0, name='relu1')
relu2 = tf.maximum(z2, 0.0, name='relu2')

output = tf.add(relu1, relu2, name='output')

ReLU (Rectified Linear Unit) is an activation function commonly used for introducing non-linearity into the model. It computes the output as the maximum of 0 and the input value, effectively thresholding negative values to zero while leaving positive values unchanged. ReLU is simple and computationally efficient, and it helps prevent the vanishing gradient problem during training by allowing for faster convergence.

Using a function to build the ReLUs:

In [62]:
tf.reset_default_graph()


def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name='weights')
    b = tf.Variable(0.0, name='bias')
    z = tf.add(tf.matmul(X, w), b, name='z')
    return tf.maximum(z, 0.0, name='relu')


n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')

In [63]:
save_graph(run_id='relu1')

'tf_logs/run-relu1/'

In [64]:
%tensorboard --logdir 'tf_logs'

Using a function to build the ReLUs with a name scope:

In [65]:
tf.reset_default_graph()


def relu(X):
    with tf.name_scope('relu'):
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name='weights')
        b = tf.Variable(0.0, name='bias')
        z = tf.add(tf.matmul(X, w), b, name='z')
        return tf.maximum(z, 0.0, name='relu')


n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')

In [66]:
save_graph(run_id='relu2')

'tf_logs/run-relu2/'

In [67]:
%tensorboard --logdir 'tf_logs'

Reusing TensorBoard on port 6007 (pid 21952), started 0:00:00 ago. (Use '!kill 21952' to kill it.)

<h2>Sharing Variables</h2>

Sharing a `threshold` variable is a classic way, by defining it outside of the `relu()` function the passing it as a parameter

In [68]:
tf.reset_default_graph()


def relu(X, threshold):
    with tf.name_scope('relu'):
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name='weights')
        b = tf.Variable(0.0, name='bias')
        z = tf.add(tf.matmul(X, w), b, name='z')
        return tf.maximum(z, threshold, name='max')


threshold = tf.Variable(0.0, name='threshold')
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X, threshold) for i in range(5)]
output = tf.add_n(relus, name='output')

Shared variable as an attribute of the `relu()` function at the first call

In [69]:
tf.reset_default_graph()


def relu(X):
    with tf.name_scope('relu'):
        if not hasattr(relu, 'thershold'):
            relu.threshold = tf.Variable(0.0, name='threshold')
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name='weights')
        b = tf.Variable(0.0, name='bias')
        z = tf.add(tf.matmul(X, w), b, name='z')
        return tf.maximum(z, relu.threshold, name='max')


X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')

The TensorFlow library suggests using the `get_variable()` function to create a shared variable if it does not yet exist, or reuse it if it already exists. The behavior (create or reuse) is controlled by the `variable_scope()` attribute of the current variable space. 

In [70]:
tf.reset_default_graph()


with tf.variable_scope('relu'):
    threshold = tf.get_variable(
        name='threshold', shape=(), initializer=tf.constant_initializer(0.0)
    )

To reuse a variable, set the `reuse` attribute of the variable space to `True`

In [71]:
with tf.variable_scope('relu', reuse=True):
    threshold = tf.get_variable('threshold')

Alternatively, set the `reuse` attribute to `True` within a block by calling the `reuse_variables()` method of the variable space

In [72]:
with tf.variable_scope('relu') as scope:
    scope.reuse_variables()
    threshold = tf.get_variable('threshold')

The `relu()` function with the `threshold` variable without needing to pass it as a parameter.

In [73]:
tf.reset_default_graph()


def relu(X):
    with tf.variable_scope('relu', reuse=True):
        threshold = tf.get_variable('threshold')  # reuse an existing variable
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name='weights')
        b = tf.Variable(0.0, name='bias')
        z = tf.add(tf.matmul(X, w), b, name='z')
        return tf.maximum(z, threshold, name='max')


X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
with tf.variable_scope('relu'):  # create variable
    threshold = tf.get_variable(
        name='threshold', shape=(), initializer=tf.constant_initializer(0.0)
    )
relus = [relu(X) for relu_index in range(5)]
output = tf.add_n(relus, name='output')

In [74]:
save_graph(run_id='relu3')

'tf_logs/run-relu3/'

In [75]:
%tensorboard --logdir 'tf_logs'

Reusing TensorBoard on port 6007 (pid 21952), started 0:00:01 ago. (Use '!kill 21952' to kill it.)

The `threshold` variable must be defined outside the `relu()` function, where all other ReLU code resides. </br>
The following code creates the `threshold` variable inside the `relu()` function on its first call, and reuses it on subsequent calls. Now the `relu()` function doesn't have to worry about namespaces or variable separation: it simply calls the `get_variable()` method, which will create or reapply the threshold variable.

In [76]:
tf.reset_default_graph()


def relu(X):
    threshold = tf.get_variable(
        name='threshold', shape=(), initializer=tf.constant_initializer(0.0)
    )
    w_shape = int(X.get_shape()[1]), 1
    w = tf.Variable(tf.random_normal(w_shape), name='weights')
    b = tf.Variable(0.0, name='bias')
    z = tf.add(tf.matmul(X, w), b, name='z')
    return tf.maximum(z, threshold, name='max')


X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
# The `relu()` function is called five times, ensuring that `reuse=False`
# is set for the first call and `reuse=True` for all others
relus = []
for relu_index in range(5):
    with tf.variable_scope('relu', reuse=(relu_index >= 1)) as scope:
        relus.append(relu(X))
output = tf.add_n(relus, name='output')

In [77]:
save_graph(run_id='relu4')

'tf_logs/run-relu4/'

In [78]:
%tensorboard --logdir 'tf_logs'

Reusing TensorBoard on port 6007 (pid 21952), started 0:00:01 ago. (Use '!kill 21952' to kill it.)