# Running a Graph in a Session

The most convenient way to run a session is with a `with` statement. Inside the `with` block, the session is set as the default session, and it is closed automatically at the end of the block. It is also preferable to initialize all global variables with the `global_variables_initializer()` function. This function does not initialize the variables immediately; it creates a node that has to be run once the session is open.

In [1]:
import tensorflow as tf 

x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y + y + 2

init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()
    
print(result)

42


# Managing Graphs

Any node that is created is automatically added to the default graph. Sometimes it is necessary to manage multiple independent graphs; this is done by creating a new `Graph` and temporarily making it the default graph inside a `with` block.

It is common that while using the Jupyter Notebook or the Python shell some commands are run several times. That leads to having duplicate nodes in a graph. To reset the default graph, the command `tf.reset_default_graph()` can be used. 

In [2]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)

print(x2.graph is graph)
print(x2.graph is tf.get_default_graph())

True
False


# Lifecycle of a Node Value

When evaluating a node `x`, TensorFlow first determines the set of nodes that `x` depends on, and it evaluates those nodes first.

In the code below, evaluating node `y` makes TensorFlow evaluate nodes `w` and `x` first. The same process is repeated for node `z`. Note that `w` and `x` are evaluated twice, once for `y` and once for `z`, because node values are dropped between graph runs (only variable values are kept by the session).

```python
w = tf.constant(3)
x = w + 2
y = x + 5
z = x + 3


with tf.Session() as sess:
    print(y.eval())
    print(z.eval())
```

To evaluate `y` and `z` efficiently, without evaluating `w` and `x` twice, the code must ask TensorFlow to evaluate both in a single graph run.

```python
with tf.Session() as sess:
    y_val, z_val = sess.run([y,z])
    print(y_val)
    print(z_val)
```
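
The difference between two separate runs and a single run can be illustrated with a toy dependency graph in plain Python. This is only a sketch of the idea, not TensorFlow's implementation: the `Node` class, the `run()` helper, and the `eval_counts` dictionary are hypothetical names introduced for the illustration.

```python
# Toy dependency graph mirroring w, x, y and z above (not TensorFlow code).
class Node:
    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = tuple(inputs)

eval_counts = {}

def evaluate(node, cache):
    """Evaluate a node, reusing any value already computed in this run."""
    if node.name in cache:
        return cache[node.name]
    args = [evaluate(dep, cache) for dep in node.inputs]
    eval_counts[node.name] = eval_counts.get(node.name, 0) + 1
    cache[node.name] = node.fn(*args)
    return cache[node.name]

def run(fetches):
    """One run: a fresh cache shared by all fetched nodes, then discarded."""
    cache = {}
    return [evaluate(node, cache) for node in fetches]

w = Node('w', lambda: 3)
x = Node('x', lambda w_val: w_val + 2, [w])
y = Node('y', lambda x_val: x_val + 5, [x])
z = Node('z', lambda x_val: x_val + 3, [x])

# Two separate runs: w and x are computed once per run, so twice in total.
eval_counts.clear()
y_val = run([y])[0]
z_val = run([z])[0]
separate_counts = dict(eval_counts)

# A single run fetching both nodes: w and x are computed only once.
eval_counts.clear()
y_val, z_val = run([y, z])
joint_counts = dict(eval_counts)

print(y_val, z_val)       # 10 8
print(separate_counts)    # {'w': 2, 'x': 2, 'y': 1, 'z': 1}
print(joint_counts)       # {'w': 1, 'x': 1, 'y': 1, 'z': 1}
```

The cache lives only for the duration of one `run()` call, which mirrors the fact that intermediate node values are not kept between graph runs.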

When single-process TensorFlow is used, sessions do not share any state, even if they reuse the same graph: each session has its own copy of every variable. In distributed TensorFlow, the variable state is stored on the servers, so multiple sessions can share the same variables.

# Linear Regression with TensorFlow

The code below implements linear regression for the California housing dataset using TensorFlow. The first step is to add a bias term (a column of ones) to the data using NumPy. Then, two TensorFlow constant nodes, `X` and `y`, are created to hold the data and the targets. `theta` is calculated with the normal equation $\hat{\theta} = \left(\mathbf{X}^{T}\cdot\mathbf{X}\right)^{-1}\cdot \mathbf{X}^{T}\cdot \mathbf{y}$. All the matrix operations needed to compute `theta` (`transpose()`, `matmul()`, and `matrix_inverse()`) are provided by TensorFlow.






In [3]:
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m,n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m,1)), housing.data]

scaler = StandardScaler()
scaled_housing_data_plus_bias = scaler.fit_transform(housing_data_plus_bias)

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()
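
As a cross-check of the normal equation, the same computation can be sketched in plain NumPy. The snippet below uses small synthetic data rather than the housing dataset; the sizes, the `true_theta` values, and the `_demo` variable names are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
m_demo, n_demo = 200, 3                       # hypothetical dataset size
X_raw = rng.rand(m_demo, n_demo)
true_theta = np.array([[4.0], [3.0], [-2.0], [0.5]])   # bias term first

# Add the bias column of ones, as done for the housing data above.
X_b = np.c_[np.ones((m_demo, 1)), X_raw]
y_demo = X_b.dot(true_theta)                  # noiseless targets

# Normal equation: theta = (X^T X)^-1 X^T y
theta_inv = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_demo)

# Alternative that avoids forming an explicit inverse: a least-squares solver.
theta_lstsq = np.linalg.lstsq(X_b, y_demo, rcond=None)[0]

print(theta_inv.ravel())
print(theta_lstsq.ravel())
```

Because the targets are noiseless, both results should match `true_theta` up to floating-point error. On real data, the least-squares solver is usually the more numerically robust of the two.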

# Implementing Gradient Descent

## Manually Computing the Gradients

The code below implements gradient descent manually using TensorFlow. The gradient descent algorithm is straightforward to follow, but there are a few TensorFlow functions to keep in mind:

+ The `random_uniform()` function creates a node in the graph that generates a tensor of random values, given its shape and value range.
+ The `assign()` function creates a node that assigns a new value to a variable. In this case, it implements the gradient descent step: it updates `theta` to `theta - learning_rate * gradients`.

In [4]:
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1,1), dtype=tf.float32, name='y')

theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name='theta')
y_pred = tf.matmul(X, theta, name='prediction')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2/m*tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta-learning_rate*gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch', epoch, 'MSE', mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

Epoch 0 MSE 6.74645
Epoch 100 MSE 4.87894
Epoch 200 MSE 4.84628
Epoch 300 MSE 4.83791
Epoch 400 MSE 4.83146
Epoch 500 MSE 4.82628
Epoch 600 MSE 4.8221
Epoch 700 MSE 4.8187
Epoch 800 MSE 4.81595
Epoch 900 MSE 4.81371
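
The same update rule can be sketched in plain NumPy to make each step of the loop explicit. This is an illustrative sketch on synthetic data, not the housing dataset: the data, the `true_theta` values, and the `_demo` names are assumptions introduced here.

```python
import numpy as np

rng = np.random.RandomState(42)
m_demo = 500
X_raw = rng.randn(m_demo, 2)                  # features already roughly standardized
X_b = np.c_[np.ones((m_demo, 1)), X_raw]      # bias column of ones, added after scaling
true_theta = np.array([[1.0], [2.0], [-3.0]])
y_demo = X_b.dot(true_theta)                  # noiseless targets

n_epochs = 1000
learning_rate = 0.01
theta = rng.uniform(-1.0, 1.0, size=(3, 1))   # random start, like random_uniform()

mse_history = []
for epoch in range(n_epochs):
    error = X_b.dot(theta) - y_demo
    mse_history.append(float(np.mean(error ** 2)))
    # Gradient of the MSE with respect to theta: 2/m * X^T (X theta - y)
    gradients = 2.0 / m_demo * X_b.T.dot(error)
    # Same step that the assign() node performs in the TensorFlow version.
    theta = theta - learning_rate * gradients

print(mse_history[0], mse_history[-1])
print(theta.ravel())
```

With noiseless targets the MSE should shrink toward zero and `theta` should approach `true_theta`.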


## Gradient Descent Using autodiff

Calculating gradients manually for arbitrary functions is tedious and error-prone, and the resulting code may be inefficient. This is why it is better to use TensorFlow's autodiff feature, which computes the gradients automatically and efficiently.

To use autodiff in the gradient descent code for linear regression, it is enough to change the `gradients` definition to

```python
gradients = tf.gradients(mse, [theta])[0]
```
The `gradients()` function takes an operation (`mse`) and a list of variables (`[theta]`). It creates a list of operations (one per variable) that compute the gradients of the operation with respect to each variable. In this case, the `gradients` node computes the gradient vector of the MSE with respect to `theta`.
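
To give an intuition for how derivatives can be computed automatically, here is a minimal sketch of forward-mode autodiff with dual numbers in plain Python, applied to the function `f = x*x*y + y + 2` from the first example. Note the assumption: this is a different and simpler technique than the reverse-mode autodiff that TensorFlow uses, and the `Dual` class is a hypothetical helper written for this illustration.

```python
class Dual:
    """A value together with its derivative with respect to one chosen input."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Plain numbers are constants, so their derivative is zero.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def f(x, y):
    return x * x * y + y + 2

# Seed deriv=1.0 on the input to differentiate with respect to.
result_x = f(Dual(3.0, 1.0), Dual(4.0))
result_y = f(Dual(3.0), Dual(4.0, 1.0))

print(result_x.value)   # 42.0
print(result_x.deriv)   # 24.0, i.e. 2*x*y
print(result_y.deriv)   # 10.0, i.e. x*x + 1
```

Forward mode needs one pass per input, which is why reverse mode is preferred when there are many parameters and a single scalar loss.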

## Using an Optimizer

Finding the minimum with gradient descent can be made even easier by using one of the optimizers provided by TensorFlow. For example, to use the gradient descent optimizer, the lines `gradients = ...` and `training_op = ...` should be replaced with

```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
```

If a different optimizer is needed for the problem, it is enough to change the optimizer line. For example, the momentum optimizer can be used instead of the gradient descent optimizer.
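
To show why swapping the optimizer can matter, the sketch below compares plain gradient descent with a classical momentum update in NumPy. Everything here is an assumption made for illustration: the quadratic loss, its `curvature` values, the hyperparameters, and the exact form of the momentum rule (accumulate a velocity, then step along it), which is not claimed to match any particular TensorFlow optimizer line for line.

```python
import numpy as np

# Hypothetical ill-conditioned quadratic loss: 0.5 * sum(curvature * theta**2).
curvature = np.array([1.0, 25.0])

def loss(theta):
    return 0.5 * np.sum(curvature * theta ** 2)

def gradient(theta):
    return curvature * theta

learning_rate = 0.03
momentum = 0.9
n_steps = 200

# Plain gradient descent.
theta_gd = np.array([1.0, 1.0])
for _ in range(n_steps):
    theta_gd = theta_gd - learning_rate * gradient(theta_gd)

# Gradient descent with momentum: accumulate a velocity, then step along it.
theta_mom = np.array([1.0, 1.0])
velocity = np.zeros_like(theta_mom)
for _ in range(n_steps):
    velocity = momentum * velocity + gradient(theta_mom)
    theta_mom = theta_mom - learning_rate * velocity

loss_gd = loss(theta_gd)
loss_momentum = loss(theta_mom)
print(loss_gd, loss_momentum)
```

On this toy problem the momentum run ends with a lower loss than plain gradient descent after the same number of steps, because the velocity speeds up progress along the low-curvature direction.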

# Feeding Data to the Training Algorithm

The best way to implement mini-batch gradient descent is to use placeholder nodes. This type of node does not perform any computation; it only outputs the data that it is given at runtime. Placeholder nodes are typically used to pass training data to TensorFlow.

When a placeholder is created, it is necessary to specify the tensor's data type. Optionally, the shape of the tensor can be specified using the `shape` parameter; using `None` for a dimension means that the dimension can have any size. The example below creates a placeholder node `A` that will contain floats and that must be two-dimensional, with three columns and any number of rows. Then, the values for `A` are passed using `feed_dict`.

In [5]:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5

with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4,5,6],[7,8,9]]})
print(B_val_1)
print(B_val_2)

[[ 6.  7.  8.]]
[[  9.  10.  11.]
 [ 12.  13.  14.]]
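
The meaning of `None` in a shape can be sketched with a small plain-Python check. The `shape_matches()` function is a hypothetical helper written for this illustration; it only mimics the rule that `None` accepts any size along that dimension while the rank must still match.

```python
def shape_matches(spec, actual):
    """Return True if `actual` satisfies `spec`; None in `spec` means any size."""
    if len(spec) != len(actual):
        return False                  # the number of dimensions must match
    return all(s is None or s == a for s, a in zip(spec, actual))

spec = (None, 3)                      # same shape as the placeholder A above

print(shape_matches(spec, (1, 3)))    # True: one row of three values
print(shape_matches(spec, (2, 3)))    # True: two rows of three values
print(shape_matches(spec, (2, 4)))    # False: wrong number of columns
print(shape_matches(spec, (3,)))      # False: one-dimensional, not two
```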


To implement mini-batch gradient descent, the way `X` and `y` are specified has to be changed; both have to be converted to placeholder nodes. Also, a batch size has to be defined along with the total number of batches. Finally, the mini-batches have to be provided during the execution phase when evaluating each of the nodes that depend on `X` and `y`. 

```python

X = tf.placeholder(tf.float32, shape=(None, n+1), name='X')
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')

batch_size = 100
n_batches = int(np.ceil(m/batch_size))

def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  
    indices = np.random.randint(m, size=batch_size)  
    X_batch = scaled_housing_data_plus_bias[indices] 
    y_batch = housing.target.reshape(-1, 1)[indices] 
    return X_batch, y_batch
    
with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    best_theta = theta.eval()
```
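
Note that `fetch_batch()` above draws indices with `np.random.randint()`, that is, with replacement, so within one epoch some instances may be picked several times and others not at all. A possible alternative, sketched below in NumPy under assumed names (`m_demo` and `epoch_batches()` are hypothetical), shuffles the indices once per epoch so that every instance is visited exactly once.

```python
import numpy as np

m_demo = 1050             # hypothetical number of training instances
batch_size = 100
n_batches = int(np.ceil(m_demo / batch_size))   # 11: the last batch is smaller

def epoch_batches(epoch):
    """Yield index arrays that together cover every instance exactly once."""
    rng = np.random.RandomState(epoch)          # reproducible shuffle per epoch
    shuffled = rng.permutation(m_demo)
    for batch_index in range(n_batches):
        start = batch_index * batch_size
        yield shuffled[start:start + batch_size]

batches = list(epoch_batches(epoch=0))
seen = np.concatenate(batches)

print(n_batches)          # 11
print(len(batches[-1]))   # 50
print(len(seen))          # 1050
```

Each index array can then be used to slice the data and the targets before passing them through `feed_dict`.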

# Saving and Restoring Models

Once a model is trained, the parameters should be saved on disk so they can be used later. It is also useful to save checkpoints during training so if the computer crashes while training the model, the training can continue from the last checkpoint instead of starting from scratch.

Saving and restoring a model in TensorFlow is easy. It is enough to create a `Saver` node at the end of the construction phase, i.e., after all the variable nodes are created. Then, in the execution phase, the `save()` method of this node should be called, passing it the session and the path of the checkpoint file.

```python
...
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name='theta')
...
init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            save_path = saver.save(sess, 'tmp/my_model.ckpt')
            
        sess.run(training_op)
        
    best_theta = theta.eval()
    save_path = saver.save(sess, 'tmp/my_model_final.ckpt')
```

To restore a model saved in a checkpoint file the process is similar. A `Saver()` node has to be created at the end of the construction phase. Then, at the beginning of the execution phase, the method `restore()` is called instead of initializing the variables.

```python
with tf.Session() as sess:
    saver.restore(sess, 'tmp/my_model.ckpt')
    
```

The saver node lets you specify which variables to save and restore, and under what names; by default, all variables are saved and restored under their own names. For example, the following line creates a saver that saves and restores only the `theta` variable, under the name `weights`.

```python
saver = tf.train.Saver({'weights':theta})
```

By default, the saver also stores the graph structure in a secondary file with the extension `.meta`. This graph can be loaded using `tf.train.import_meta_graph()`. This function adds the graph in the file to the default graph and returns a `Saver` that can be used to restore the graph's state (the variable values).

```python
saver = tf.train.import_meta_graph('tmp/my_model_final.ckpt.meta')

with tf.Session() as sess:
    saver.restore(sess, 'tmp/my_model_final.ckpt')
```
Restoring the graph makes it possible to completely restore a saved model: both the graph structure and the variable values. This can be done without having the code that built the model.

# Visualizing the Graph and Training Curves Using TensorBoard

TensorBoard is TensorFlow's tool for visualizing progress during training. It displays interactive training stats in a web browser. Given a graph definition, TensorBoard also provides an interface to browse through the graph, which is a way to identify errors, find bottlenecks, etc.

The first step is to change the program that implements mini-batch gradient descent so that it writes the graph definition and training stats, such as the training error (MSE), to a log directory. It is important to use a different log directory every time the program runs, because otherwise TensorBoard would merge the stats from different runs. The easiest way to do this is to include a timestamp in the log directory name.

In [6]:
from datetime import datetime

now = datetime.utcnow().strftime('%Y%m%d%H%M%S')
root_logdir = 'tf_logs'
logdir = '{}/run-{}/'.format(root_logdir, now)
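
The same idea can be wrapped in a small helper so that each run gets its own directory name. This is a minimal sketch using only the standard library; the `make_logdir()` function and its `prefix` parameter are hypothetical names introduced here.

```python
import re
from datetime import datetime, timezone

def make_logdir(root_logdir='tf_logs', prefix='run'):
    """Build a per-run log directory path from the current UTC time."""
    now = datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S')
    return '{}/{}-{}/'.format(root_logdir, prefix, now)

logdir_demo = make_logdir()
print(logdir_demo)   # tf_logs/run-<14-digit timestamp>/
```

The timestamp has a resolution of one second, so two runs started within the same second would still share a directory.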

The next step is to add two lines at the end of the construction phase. The first line creates a node, `mse_summary`, that evaluates the MSE and writes it to a TensorBoard-compatible binary log string called a *summary*. The second line creates a `FileWriter` that writes summaries to log files in the log directory. Its first parameter specifies the path to the log directory, and the second parameter is the graph to be visualized. On creation, the `FileWriter` creates the log directory if it does not exist and writes the graph definition to a binary log file called an *events file*.

```python
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())
```

Once the previous lines are added, the execution phase has to be updated to evaluate the `mse_summary` node regularly during training. It is important to avoid logging training stats at every step, since that can significantly slow down training. Evaluating this node outputs a summary that is then written to the events file by the `file_writer`. The `file_writer` must be closed at the end of the execution.

When this program runs, an events file is created containing the graph definition and the MSE values. To access the TensorBoard visualization, the `tensorboard` command must be run, pointing it at the root log directory (for example, `tensorboard --logdir tf_logs`). Then, the address *http://0.0.0.0:6006/* must be opened in a web browser. The Events tab shows the values of the MSE, while the Graphs tab shows the graph.

To make the visualization clearer, nodes that have many connections to other nodes are moved to an auxiliary area on the right. Nodes can be moved back and forth between the main graph and the auxiliary area by right-clicking on them.

In [7]:
tf.reset_default_graph()

def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  
    indices = np.random.randint(m, size=batch_size)  
    X_batch = scaled_housing_data_plus_bias[indices] 
    y_batch = housing.target.reshape(-1, 1)[indices] 
    return X_batch, y_batch

learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:                                                        
    sess.run(init)                                                                

    for epoch in range(n_epochs):                                                 
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()       
    
file_writer.close()

# Name Scopes

When using TensorFlow with more complex models, the graph can become cluttered with nodes. Name scopes are used to group related nodes. For example, the `error` and `mse` operations can be grouped in a name scope called `loss`. When a node is created within a name scope, its name is prefixed with the name of the scope.

```python

with tf.name_scope('loss') as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name='mse')
```

In the graph displayed by TensorBoard, the `mse` and `error` nodes appear inside the `loss` namespace that appears collapsed by default.

In [8]:
tf.reset_default_graph()

def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  
    indices = np.random.randint(m, size=batch_size)  
    X_batch = scaled_housing_data_plus_bias[indices] 
    y_batch = housing.target.reshape(-1, 1)[indices] 
    return X_batch, y_batch

learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")

with tf.name_scope('loss') as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name='mse')


optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:                                                        
    sess.run(init)                                                                

    for epoch in range(n_epochs):                                                 
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()       
    
file_writer.close()

    
print(error.op.name)
print(mse.op.name)

loss/sub
loss/mse


# Modularity

Sometimes repetitive code has to be written to build a graph. For these cases, TensorFlow lets you factor the repetitive graph-building code into functions. The code below is a function that builds rectified linear units (ReLU).

The ReLU is defined as

$h_{\mathbf{w}, b}(\mathbf{X}) = \max(\mathbf{X} \cdot \mathbf{w} + b, 0)$

The function is defined like any other Python function. The difference is that every call creates nodes with the same requested names. When a name already exists, TensorFlow appends an underscore and an index to make each node name unique.

To make the graph clearer, a name scope can be created inside the function. TensorFlow also appends a unique index to each repeated name scope, so each name scope has a unique name.

In [11]:
def relu(X):
    with tf.name_scope('relu'):
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name='weights')
        b = tf.Variable(0.0, name='bias')
        z = tf.add(tf.matmul(X,w), b, name='z')
        return tf.maximum(z,0., name='relu')

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')
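
The naming rule can be imitated in a few lines of plain Python. This is only a toy model of the behavior described above, not TensorFlow's code; the `NameRegistry` class is a hypothetical helper written for this illustration.

```python
class NameRegistry:
    """Toy imitation of giving repeated names a numeric suffix."""
    def __init__(self):
        self._counts = {}

    def unique_name(self, name):
        count = self._counts.get(name, 0)
        self._counts[name] = count + 1
        # The first use keeps the plain name; later uses get _1, _2, ...
        return name if count == 0 else '{}_{}'.format(name, count)

registry = NameRegistry()
scopes = [registry.unique_name('relu') for _ in range(5)]
print(scopes)   # ['relu', 'relu_1', 'relu_2', 'relu_3', 'relu_4']
```

Under this rule, the five calls to `relu(X)` above would be grouped in five distinct name scopes.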

# Sharing Variables

TensorFlow has a particular option for sharing variables: the `get_variable()` function. It creates the shared variable if it does not exist yet, or reuses it if it already exists. Which of the two happens is controlled by the current `variable_scope()`.

To reuse a variable, the `reuse` attribute of the `variable_scope` has to be set to `True`.

The code below calls the `relu` function five times. It sets the `reuse` attribute to `False` for the first call and to `True` for the subsequent calls. That way, the threshold variable is created in the first call and reused in the subsequent calls.

In [13]:
def relu(X):
    with tf.name_scope('relu'):
        threshold = tf.get_variable('threshold', shape=(),
                                    initializer=tf.constant_initializer(0.0))
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name='weights')
        b = tf.Variable(0.0, name='bias')
        z = tf.add(tf.matmul(X,w), b, name='z')
        return tf.maximum(z,threshold, name='max')
    
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = []
for relu_index in range(5):
    with tf.variable_scope('relu',reuse=(relu_index >=1)) as scope:
        relus.append(relu(X))
output = tf.add_n(relus, name='output')
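
The create-or-reuse rule can be sketched as a small variable store in plain Python. This is a toy model under stated assumptions, not TensorFlow's implementation: the `VariableStore` class, its method signature, and the error messages are hypothetical.

```python
class VariableStore:
    """Toy model of the create-or-reuse rule for shared variables."""
    def __init__(self):
        self._vars = {}

    def get_variable(self, scope, name, initial_value, reuse):
        key = '{}/{}'.format(scope, name)
        if reuse:
            # Reusing requires that the variable was created earlier.
            if key not in self._vars:
                raise ValueError('{} does not exist, cannot reuse'.format(key))
            return self._vars[key]
        # Creating requires that the name is still free.
        if key in self._vars:
            raise ValueError('{} already exists'.format(key))
        self._vars[key] = [initial_value]   # one-element list as a mutable box
        return self._vars[key]

store = VariableStore()
thresholds = [store.get_variable('relu', 'threshold', 0.0, reuse=(i >= 1))
              for i in range(5)]

# Every call after the first returns the very same object.
all_shared = all(t is thresholds[0] for t in thresholds)
print(all_shared)   # True
```

The expression `reuse=(i >= 1)` plays the same role as `reuse=(relu_index >= 1)` in the code above: create on the first call, reuse afterwards.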