<h1>Chapter 09. Up and Running with TensorFlow</h1>

**TensorFlow** is an open-source machine learning framework developed by Google Brain for building and training various types of machine learning models, including deep learning neural networks. It provides a comprehensive ecosystem of tools, libraries, and resources for tasks such as data preprocessing, model building, training, deployment, and inference. TensorFlow supports both CPU and GPU computations, allowing for efficient processing of large-scale datasets and complex models. It's widely used in research, industry, and academia for a diverse range of machine learning and artificial intelligence applications.

<h2>Creating and Running a First Graph</h2>

Creating a graph

In [1]:
# Import TensorFlow version 1 compatibility module as tf
import tensorflow.compat.v1 as tf


# Disable eager execution
tf.disable_eager_execution()

# Define TensorFlow variables with initial values
x = tf.Variable(initial_value=3, name='x')
y = tf.Variable(initial_value=4, name='y')

# Define computational graph
f = x * x * y + y + 2

To run the graph, a TensorFlow session must be opened and used to initialize variables and evaluate `f`. The following code creates a session, initializes variables, computes `f`, and closes the session (freeing resources).

In [2]:
sess = tf.Session()  # create session

sess.run(x.initializer)  # initialize x
sess.run(y.initializer)  # initialize y

result = sess.run(f)  # execute the computation f
result



42

In [3]:
sess.close()
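
The split between building a graph and running it can be illustrated without TensorFlow. The sketch below is a toy in plain Python, not TensorFlow's implementation: the hypothetical `Node` class only records an operation and its inputs, and nothing is computed until `run()` is called.

```python
import operator


class Node:
    """Toy graph node (hypothetical, for illustration only)."""

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op          # function applied at run time, None for a leaf
        self.inputs = inputs  # upstream nodes
        self.value = value    # stored value for leaf nodes

    @staticmethod
    def _wrap(other):
        # Turn plain numbers into leaf nodes
        return other if isinstance(other, Node) else Node(value=other)

    def __add__(self, other):
        return Node(operator.add, (self, Node._wrap(other)))

    def __mul__(self, other):
        return Node(operator.mul, (self, Node._wrap(other)))

    def run(self):
        # Evaluate the inputs first, then apply this node's operation
        if self.op is None:
            return self.value
        return self.op(*(node.run() for node in self.inputs))


x = Node(value=3)
y = Node(value=4)

f = x * x * y + y + 2   # only builds nodes, nothing is computed yet
result = f.run()        # the computation happens here
print(result)           # 42
```

As in the TensorFlow cells above, defining `f` is cheap and the arithmetic is deferred until the graph is explicitly run.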

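Rather than running the initializer of each variable individually, it is more convenient to use `global_variables_initializer()`, which creates a single node that initializes all variables when it is run. Inside a `with` block the session is set as the default session, so `init.run()` and `f.eval()` can replace the repeated `sess.run()` calls, and the session is closed automatically at the end of the block: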

In [4]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()

result

42

`InteractiveSession()` differs from a regular `Session` in that it automatically sets itself as the default session when it is created, so no `with` block is needed. This is convenient in interactive environments such as Jupyter notebooks, but the session must be closed manually to free its resources.

In [5]:
sess = tf.InteractiveSession()

init.run()
result = f.eval()
result

42

In [6]:
sess.close()

<h2>Managing Graphs</h2>

Every node that is created is automatically added to the default graph. To manage several independent graphs, create a new `Graph` and make it the default graph temporarily inside a `with` block.

In [7]:
# Reset the default TensorFlow graph to clear
# any previously defined operations or variables
tf.reset_default_graph()

x1 = tf.Variable(1)

x1.graph is tf.get_default_graph()

True

In [8]:
graph = tf.Graph()

with graph.as_default():
    x2 = tf.Variable(2)

x2.graph is graph

True

In [9]:
x2.graph is tf.get_default_graph()

False

<h2>Lifecycle of a Node Value</h2>

When a node is evaluated, TensorFlow determines which nodes it depends on and evaluates those first. Node values are not kept between graph runs (only variable values persist for the lifetime of a session), so evaluating `y` and `z` with separate `eval()` calls computes `w` and `x` twice, whereas a single `sess.run([y, z])` computes them once.

In [10]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

10
15


In [11]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)
    print(z_val)

10
15
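
The difference between the two cells above is how often the shared node `x` is computed. The toy below (plain Python with a hypothetical `CountingNode` class, not TensorFlow code) counts evaluations. A cache that lives for a single run plays the role of one `sess.run()` call, under the assumption that node values are discarded between runs.

```python
class CountingNode:
    """Toy node that counts how many times it is computed (illustrative)."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs
        self.calls = 0

    def compute(self, cache):
        # `cache` lives for one run only, like values within one sess.run()
        if self not in cache:
            args = [node.compute(cache) for node in self.inputs]
            self.calls += 1
            cache[self] = self.fn(*args)
        return cache[self]


def run(fetches):
    cache = {}  # fresh cache per run: nothing is reused across runs
    return [node.compute(cache) for node in fetches]


w = CountingNode(lambda: 3)
x = CountingNode(lambda a: a + 2, w)
y = CountingNode(lambda a: a + 5, x)
z = CountingNode(lambda a: a * 3, x)

separate = run([y]) + run([z])    # two runs: x is computed twice
calls_after_separate = x.calls

x.calls = 0
together = run([y, z])            # one run: x is computed once
calls_after_together = x.calls

print(separate, calls_after_separate)   # [10, 15] 2
print(together, calls_after_together)   # [10, 15] 1
```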


<h2>Linear Regression with TensorFlow</h2>

The code below manipulates two-dimensional arrays to perform linear regression on the California housing dataset. It first fetches the dataset and uses NumPy to add a bias input feature (a column of ones) to all training instances. It then creates two TensorFlow constant nodes, `X` and `y`, to hold the data and the targets, and uses TensorFlow matrix operations to compute `theta`. These matrix functions, `transpose()`, `matmul()`, and `matrix_inverse()`, do not perform any calculations immediately; instead, they create nodes in the graph that perform the calculations when the graph is run.
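
The matrix operations in the cell below implement the Normal Equation, the closed-form least-squares solution. In the usual notation (an assumption here, since the text does not define symbols), with $X$ the data matrix including the bias column and $y$ the column vector of targets:

$$
\hat{\theta} = (X^\top X)^{-1} X^\top y
$$

Each factor maps to one call: `tf.transpose(X)` for $X^\top$, `tf.matrix_inverse()` for the inverse, and `tf.matmul()` for the products.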

In [12]:
import numpy as np
from sklearn.datasets import fetch_california_housing


tf.reset_default_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(
    housing_data_plus_bias,
    dtype=tf.float32,
    name='X'
)
y = tf.constant(
    housing.target.reshape(-1, 1),
    dtype=tf.float32,
    name='y'
)

XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

theta_value

array([[-3.7174683e+01],
       [ 4.3591911e-01],
       [ 9.3909204e-03],
       [-1.0637519e-01],
       [ 6.4145899e-01],
       [-4.1128196e-06],
       [-3.7799443e-03],
       [-4.2388692e-01],
       [-4.3728542e-01]], dtype=float32)

Compare with pure NumPy

In [13]:
X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)

theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
theta_numpy

array([[-3.69419202e+01],
       [ 4.36693293e-01],
       [ 9.43577803e-03],
       [-1.07322041e-01],
       [ 6.45065694e-01],
       [-3.97638942e-06],
       [-3.78654265e-03],
       [-4.21314378e-01],
       [-4.34513755e-01]])

Compare with Scikit-Learn

In [14]:
from sklearn.linear_model import LinearRegression


lin_reg = LinearRegression()
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))

np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T]

array([[-3.69419202e+01],
       [ 4.36693293e-01],
       [ 9.43577803e-03],
       [-1.07322041e-01],
       [ 6.45065694e-01],
       [-3.97638942e-06],
       [-3.78654265e-03],
       [-4.21314378e-01],
       [-4.34513755e-01]])
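
The TensorFlow result above differs slightly from the NumPy and Scikit-Learn results, plausibly because of `float32` precision combined with the explicit matrix inverse. As a sketch on synthetic data (the data, seed, and coefficients below are made up for illustration), `np.linalg.lstsq` solves the same least-squares problem without forming the inverse explicitly, which is generally the more numerically robust choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: y = 4 + 3*x1 - 2*x2 + small noise
m_demo = 200
features = rng.normal(size=(m_demo, 2))
targets = 4 + features @ np.array([3.0, -2.0]) + 0.01 * rng.normal(size=m_demo)

X_demo = np.c_[np.ones((m_demo, 1)), features]   # add the bias column
y_demo = targets.reshape(-1, 1)

# Normal Equation with an explicit inverse, as in the cells above
theta_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Least-squares solver, no explicit inverse
theta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(theta_inv.ravel())
print(theta_lstsq.ravel())
```

On a small, well-conditioned problem in `float64` the two results agree closely; the difference matters more with lower precision or nearly collinear features.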

<h2>Implementation of Gradient Descent</h2>

Gradient Descent converges much more slowly when features are on very different scales, so the feature vectors should be scaled first. This can be done with the `StandardScaler` from Scikit-Learn, which standardizes each feature by removing its mean and scaling it to unit variance so that all features have a similar scale.

In [15]:
from sklearn.preprocessing import StandardScaler


scaler = StandardScaler()

scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]
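
What `StandardScaler` computes can be reproduced in a few lines of NumPy. This is a sketch on a small made-up array, assuming the default settings (centering on the mean and dividing by the population standard deviation, `ddof=0`):

```python
import numpy as np

# Small made-up feature matrix: 4 instances, 2 features on different scales
data = np.array([[1.0, 100.0],
                 [2.0, 300.0],
                 [3.0, 500.0],
                 [4.0, 700.0]])

mean = data.mean(axis=0)
std = data.std(axis=0)          # population standard deviation (ddof=0)
scaled = (data - mean) / std

print(scaled.mean(axis=0))      # approximately 0 for every column
print(scaled.std(axis=0))       # approximately 1 for every column
```

This is also why, in the cells that follow, every column mean is close to zero except the bias column, which is added after scaling and stays at 1.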

In [16]:
# Compute the mean value of each feature (column)
scaled_housing_data_plus_bias.mean(axis=0)

array([ 1.00000000e+00,  6.60969987e-17,  5.50808322e-18,  6.60969987e-17,
       -1.06030602e-16, -1.10161664e-17,  3.44255201e-18, -1.07958431e-15,
       -8.52651283e-15])

In [17]:
# Compute the mean value of each instance (row)
scaled_housing_data_plus_bias.mean(axis=1)

array([ 0.38915536,  0.36424355,  0.5116157 , ..., -0.06612179,
       -0.06360587,  0.01359031])

In [18]:
# Compute the overall mean value of all elements
scaled_housing_data_plus_bias.mean()

0.11111111111111005

In [19]:
scaled_housing_data_plus_bias.shape

(20640, 9)

<h3>Manually computing the Gradients</h3>

In [20]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(
    scaled_housing_data_plus_bias,
    dtype=tf.float32,
    name='X'
)
y = tf.constant(
    housing.target.reshape(-1, 1),
    dtype=tf.float32,
    name='y'
)

# random_uniform() creates a node in the graph that will generate a tensor
# containing random values with a given shape and range
theta = tf.Variable(tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name='theta')

y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2 / m * tf.matmul(tf.transpose(X), error)

# Create a node that will assign a new value to the variable
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # Run the training step repeatedly and
    # print the MSE every 100 epochs
    for epoch in range(n_epochs):
        
        if epoch % 100 == 0:
            print(f"Epoch {epoch} MSE = {mse.eval()}")
        
        sess.run(training_op)

    best_theta = theta.eval()

Epoch 0 MSE = 2.7544267177581787
Epoch 100 MSE = 0.6322219371795654
Epoch 200 MSE = 0.5727803111076355
Epoch 300 MSE = 0.5585008263587952
Epoch 400 MSE = 0.5490700006484985
Epoch 500 MSE = 0.5422880053520203
Epoch 600 MSE = 0.5373790860176086
Epoch 700 MSE = 0.5338220000267029
Epoch 800 MSE = 0.5312425494194031
Epoch 900 MSE = 0.5293705463409424


In [21]:
best_theta

array([[ 2.06855226e+00],
       [ 7.74078071e-01],
       [ 1.31192386e-01],
       [-1.17845066e-01],
       [ 1.64778143e-01],
       [ 7.44081801e-04],
       [-3.91945131e-02],
       [-8.61356556e-01],
       [-8.23479712e-01]], dtype=float32)
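
The `gradients` expression in the cell above is the analytical gradient of the MSE for a linear model. Written out, using $\eta$ for `learning_rate` (a symbol chosen here for illustration):

$$
\mathrm{MSE}(\theta) = \frac{1}{m} \lVert X\theta - y \rVert^2, \qquad
\nabla_\theta \mathrm{MSE}(\theta) = \frac{2}{m} X^\top (X\theta - y), \qquad
\theta \leftarrow \theta - \eta \, \nabla_\theta \mathrm{MSE}(\theta)
$$

The term $X\theta - y$ is the `error` node, and the last expression is the assignment performed by `training_op`.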

<h3>Using autodiff</h3>

In [22]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(
    scaled_housing_data_plus_bias,
    dtype=tf.float32,
    name='X'
)
y = tf.constant(
    housing.target.reshape(-1, 1),
    dtype=tf.float32,
    name='y'
)

theta = tf.Variable(tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')

The `tf.gradients()` function takes an op (here `mse`) and a list of variables (here just `theta`) and creates a list of ops, one per variable, that compute the gradients of the op with respect to each variable. The `gradients` node will therefore compute the gradient vector of the MSE with respect to `theta`.

In [23]:
gradients = tf.gradients(ys=mse, xs=[theta])[0]
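
`tf.gradients()` relies on reverse-mode automatic differentiation. The following toy is a sketch of that idea for scalars in plain Python; the `Var` class and its methods are hypothetical and far simpler than TensorFlow's implementation.

```python
class Var:
    """Scalar that records how it was computed (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Build a topological order, then apply the chain rule in reverse
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


# Squared error of a one-feature linear model on a single made-up instance
theta0, theta1 = Var(0.5), Var(-1.0)
x_in, y_target = Var(2.0), Var(3.0)

error = theta0 + theta1 * x_in - y_target
loss = error * error
loss.backward()

print(loss.value)                 # 20.25
print(theta0.grad, theta1.grad)   # -9.0 -18.0
```

The gradients match the hand-derived values $2 \cdot \text{error}$ and $2 \cdot \text{error} \cdot x$, obtained in a single backward pass over the recorded graph.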

In [24]:
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        
        if epoch % 100 == 0:
            print(f"Epoch {epoch} MSE = {mse.eval()}")
        
        sess.run(training_op)

    best_theta = theta.eval()

print(f"\nBest theta:\n{best_theta}")

Epoch 0 MSE = 2.7544267177581787
Epoch 100 MSE = 0.6322218775749207
Epoch 200 MSE = 0.5727803111076355
Epoch 300 MSE = 0.5585008263587952
Epoch 400 MSE = 0.5490700006484985
Epoch 500 MSE = 0.5422880053520203
Epoch 600 MSE = 0.5373790860176086
Epoch 700 MSE = 0.5338219404220581
Epoch 800 MSE = 0.5312425494194031
Epoch 900 MSE = 0.5293704867362976

Best theta:
[[ 2.06855249e+00]
 [ 7.74078071e-01]
 [ 1.31192386e-01]
 [-1.17845066e-01]
 [ 1.64778143e-01]
 [ 7.44078017e-04]
 [-3.91945094e-02]
 [-8.61356676e-01]
 [-8.23479772e-01]]


<h3>Using a <code>GradientDescentOptimizer</code></h3>

The `GradientDescentOptimizer` is an optimization algorithm that iteratively adjusts the parameters (weights) of a model in the direction of the steepest descent of the loss function. It updates the parameters based on the gradients of the loss function with respect to the parameters, scaled by a learning rate. 
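
The update rule described above can be sketched in plain Python on a one-dimensional problem. The loss function, starting point, and step count below are made up for illustration:

```python
def loss(theta):
    return (theta - 3.0) ** 2          # minimum at theta = 3


def gradient(theta):
    return 2.0 * (theta - 3.0)         # derivative of the loss


theta = 0.0
learning_rate = 0.1

for step in range(100):
    # Step against the gradient, scaled by the learning rate
    theta = theta - learning_rate * gradient(theta)

print(theta)   # close to 3
```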

In [25]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(
    scaled_housing_data_plus_bias,
    dtype=tf.float32,
    name='X'
)
y = tf.constant(
    housing.target.reshape(-1, 1),
    dtype=tf.float32,
    name='y'
)

theta = tf.Variable(tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')

Replace `gradients = ...` and `training_op = ...` with the following code:

In [26]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)  # minimize the mean squared error 

In [27]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        
        if epoch % 100 == 0:
            print(f"Epoch {epoch} MSE = {mse.eval()}")
        
        sess.run(training_op)

    best_theta = theta.eval()

print(f"\nBest theta:\n{best_theta}")

Epoch 0 MSE = 2.7544267177581787
Epoch 100 MSE = 0.6322218775749207
Epoch 200 MSE = 0.5727803111076355
Epoch 300 MSE = 0.5585008263587952
Epoch 400 MSE = 0.5490700006484985
Epoch 500 MSE = 0.5422880053520203
Epoch 600 MSE = 0.5373790860176086
Epoch 700 MSE = 0.5338219404220581
Epoch 800 MSE = 0.5312425494194031
Epoch 900 MSE = 0.5293704867362976

Best theta:
[[ 2.06855249e+00]
 [ 7.74078071e-01]
 [ 1.31192386e-01]
 [-1.17845066e-01]
 [ 1.64778143e-01]
 [ 7.44078017e-04]
 [-3.91945094e-02]
 [-8.61356676e-01]
 [-8.23479772e-01]]


<h3>Using a <code>MomentumOptimizer</code></h3>

The `MomentumOptimizer` is an optimization algorithm that enhances the basic gradient descent method by incorporating momentum. It accumulates an exponentially decaying sum of past gradients and uses this accumulator, rather than the current gradient alone, to update the parameters (weights) of a model. This helps to accelerate convergence, especially in the presence of high curvature or noisy gradients. The momentum parameter controls how strongly past gradients influence the parameter updates.
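
As a sketch, the momentum update can be written in a few lines of plain Python. It follows the accumulator form documented for TensorFlow's `MomentumOptimizer` (`accumulation = momentum * accumulation + gradient`, then `variable -= learning_rate * accumulation`); the elongated quadratic loss and the starting point are made up for illustration.

```python
def loss(a, b):
    # Elongated bowl: much steeper along b than along a
    return 0.5 * (a ** 2 + 25.0 * b ** 2)


def grad(a, b):
    return a, 25.0 * b


def minimize(momentum, learning_rate=0.01, n_steps=100):
    a, b = 1.0, 1.0
    acc_a, acc_b = 0.0, 0.0
    for _ in range(n_steps):
        g_a, g_b = grad(a, b)
        # Accumulate a decaying sum of past gradients
        acc_a = momentum * acc_a + g_a
        acc_b = momentum * acc_b + g_b
        a -= learning_rate * acc_a
        b -= learning_rate * acc_b
    return loss(a, b)


plain_loss = minimize(momentum=0.0)      # reduces to plain gradient descent
momentum_loss = minimize(momentum=0.9)
print(plain_loss, momentum_loss)
```

With the same learning rate and number of steps, the run with momentum ends at a lower loss on this toy problem, which mirrors the faster drop in MSE seen in the output of the momentum cells.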

In [28]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(
    scaled_housing_data_plus_bias,
    dtype=tf.float32,
    name='X'
)
y = tf.constant(
    housing.target.reshape(-1, 1),
    dtype=tf.float32,
    name='y'
)

theta = tf.Variable(tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')

In [29]:
# Set momentum parameter to 0.9, which controls the influence
# of past gradients on the parameter updates
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
training_op = optimizer.minimize(mse)

In [30]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        
        if epoch % 100 == 0:
            print(f"Epoch {epoch} MSE = {mse.eval()}")
        
        sess.run(training_op)

    best_theta = theta.eval()

print(f"\nBest theta:\n{best_theta}")

Epoch 0 MSE = 2.7544267177581787
Epoch 100 MSE = 0.5273160338401794
Epoch 200 MSE = 0.5244147777557373
Epoch 300 MSE = 0.5243281722068787
Epoch 400 MSE = 0.5243218541145325
Epoch 500 MSE = 0.5243210792541504
Epoch 600 MSE = 0.5243210196495056
Epoch 700 MSE = 0.5243209600448608
Epoch 800 MSE = 0.5243209600448608
Epoch 900 MSE = 0.5243209600448608

Best theta:
[[ 2.068558  ]
 [ 0.8296167 ]
 [ 0.11875112]
 [-0.26552212]
 [ 0.30569226]
 [-0.00450316]
 [-0.03932616]
 [-0.8998917 ]
 [-0.87054664]]


<h2>Feeding Data to the Training Algorithm</h2>

The `tf.placeholder()` function in TensorFlow is used to create a placeholder tensor. Placeholder tensors are used as input points for feeding actual data into a TensorFlow computational graph during the execution phase. They act as empty nodes that will be filled with actual data when a session runs the computational graph. Placeholders allow for dynamic data input, making TensorFlow models more flexible and adaptable to different datasets and scenarios.

In [31]:
tf.reset_default_graph()

A = tf.placeholder(dtype=tf.float32, shape=(None, 3))  # None means the dimension can have any size
B = A + 5

with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

In [32]:
print(B_val_1)

[[6. 7. 8.]]


In [33]:
print(B_val_2)

[[ 9. 10. 11.]
 [12. 13. 14.]]
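
In the placeholder above, `None` in the shape leaves the first dimension unconstrained, which is why both a 1×3 and a 2×3 input are accepted. The check can be sketched in plain Python; `shape_matches` is a hypothetical helper written for this illustration, not a TensorFlow function:

```python
def shape_matches(spec, shape):
    """Return True if `shape` fits `spec`, where None matches any size."""
    if len(spec) != len(shape):
        return False
    return all(s is None or s == d for s, d in zip(spec, shape))


spec = (None, 3)

print(shape_matches(spec, (1, 3)))   # True
print(shape_matches(spec, (2, 3)))   # True
print(shape_matches(spec, (2, 4)))   # False
```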


<h3>Mini-batch Gradient Descent</h3>

In [34]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

# Set X and y as placeholders
X = tf.placeholder(dtype=tf.float32, shape=(None, n + 1), name='X')
y = tf.placeholder(dtype=tf.float32, shape=(None, 1), name='y')

theta = tf.Variable(tf.random_uniform(shape=[n + 1, 1], minval=-1.0, maxval=1.0, seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

# Define the batch size and calculate the total number of batches
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

Finally, at runtime, the mini-batches are fetched one by one, and the values of `X` and `y` are supplied through the `feed_dict` parameter whenever a node that depends on either of them is evaluated.

In [35]:
def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)
    indices = np.random.randint(m, size=batch_size)

    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]

    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

best_theta

array([[ 2.0714476 ],
       [ 0.8462012 ],
       [ 0.11558535],
       [-0.26835832],
       [ 0.32982782],
       [ 0.00608358],
       [ 0.07052915],
       [-0.87988573],
       [-0.8634251 ]], dtype=float32)
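
Note that `fetch_batch()` above draws indices with `np.random.randint`, so an instance can appear several times in an epoch or not at all. A common alternative, sketched here with made-up sizes and a hypothetical `iterate_batches` helper, shuffles the indices once per epoch and splits them so that each instance is used exactly once:

```python
import numpy as np


def iterate_batches(n_instances, batch_size, rng):
    """Yield index arrays that together cover every instance exactly once."""
    indices = rng.permutation(n_instances)
    n_batches = int(np.ceil(n_instances / batch_size))
    # array_split gives roughly equal batches, none larger than batch_size
    for batch_indices in np.array_split(indices, n_batches):
        yield batch_indices


rng = np.random.default_rng(42)
batches = list(iterate_batches(n_instances=250, batch_size=100, rng=rng))

print(len(batches))                      # 3
print(sorted(len(b) for b in batches))   # [83, 83, 84]
```

Each yielded index array could then be used to slice the data and targets, in the same way `indices` is used inside `fetch_batch()`.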