# Running a Graph in a Session

The best way to run a session is with a `with` statement: inside the `with` block, the session is set as the default session, and it is closed automatically when the block ends. It is also preferable to initialize all global variables using the `global_variables_initializer()` function. This function does not initialize the variables immediately; it creates a node that has to be run once the session is open.

```python
import tensorflow as tf

x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y + y + 2

init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()

print(result)
```

```
42
```
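To see what the `with` block saves, the same graph can be run with an explicitly managed session. This is only a sketch: it assumes a TensorFlow build that ships the `tensorflow.compat.v1` module (on TensorFlow 1.x, the plain `import tensorflow as tf` used in this document plays the same role), and the `try`/`finally` structure is an illustrative choice, not something taken from the original notebook.

```python
import tensorflow.compat.v1 as tf  # assumption: the compat.v1 API is available
tf.disable_v2_behavior()           # graph mode, as in TensorFlow 1.x
tf.reset_default_graph()

x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y + y + 2

init = tf.global_variables_initializer()  # only creates the init node

sess = tf.Session()
try:
    sess.run(init)         # the variables are actually initialized here
    result = sess.run(f)   # equivalent to f.eval(session=sess)
finally:
    sess.close()           # a `with` block would do this automatically

print(result)
```

Without a default session, `init.run()` and `f.eval()` have no session to use, which is why this version calls `sess.run()` explicitly.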


# Managing Graphs

Any node that is created is automatically added to the default graph. Sometimes it is necessary to manage multiple independent graphs. In that case, a new `Graph` has to be created and temporarily made the default graph inside a `with` statement.

It is common that while using the Jupyter Notebook or the Python shell some commands are run several times. That leads to having duplicate nodes in a graph. To reset the default graph, the command `tf.reset_default_graph()` can be used. 

```python
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)

print(x2.graph is graph)
print(x2.graph is tf.get_default_graph())
```

```
True
False
```
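The duplicate-node problem and the effect of `tf.reset_default_graph()` can be observed directly. The sketch below assumes the `tensorflow.compat.v1` module is available; the loop that creates the same constant three times is a hypothetical stand-in for re-running a notebook cell.

```python
import tensorflow.compat.v1 as tf  # assumption: the compat.v1 API is available
tf.disable_v2_behavior()
tf.reset_default_graph()           # start from an empty default graph

# Simulate running the same notebook cell three times
for _ in range(3):
    tf.constant(1, name='c')

# TensorFlow keeps every copy and makes the names unique with a suffix
names = [op.name for op in tf.get_default_graph().get_operations()]
print(names)

# Resetting replaces the default graph with a new, empty one
tf.reset_default_graph()
n_after = len(tf.get_default_graph().get_operations())
print(n_after)
```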


# Lifecycle of a Node Value

When evaluating a node `x`, TensorFlow first determines the set of nodes that `x` depends on, and evaluates those nodes first.

In the code below, the evaluation of node `y` makes TensorFlow evaluate nodes `w` and `x` first. The same process is carried out for node `z`. Note that node values are not reused between graph runs, so `w` and `x` are evaluated twice: once for `y` and once for `z`.

```python
w = tf.constant(3)
x = w + 2
y = x + 5
z = x + 3


with tf.Session() as sess:
    print(y.eval())
    print(z.eval())
```

To evaluate `x` and `w` only once, the code must ask TensorFlow to evaluate `y` and `z` in a single graph run.

```python
with tf.Session() as sess:
    y_val, z_val = sess.run([y,z])
    print(y_val)
    print(z_val)
```
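The difference between the two approaches can be illustrated with a toy model in plain Python. This is not how TensorFlow is implemented; it is a minimal sketch, with a hypothetical `run()` helper, that mirrors the `w`, `x`, `y`, `z` graph and counts how often each node is evaluated when values are cached only within a single run.

```python
from collections import Counter

# Toy dependency graph: node name -> (input names, function of the inputs)
graph = {
    'w': ([], lambda: 3),
    'x': (['w'], lambda w: w + 2),
    'y': (['x'], lambda x: x + 5),
    'z': (['x'], lambda x: x + 3),
}

def run(fetches, counts):
    """Evaluate the fetches in one run; node values are cached for this run only."""
    cache = {}

    def evaluate(name):
        if name not in cache:
            inputs, fn = graph[name]
            cache[name] = fn(*[evaluate(i) for i in inputs])
            counts[name] += 1
        return cache[name]

    return [evaluate(name) for name in fetches]

# Two separate runs: w and x are computed once per run
separate = Counter()
y_val, = run(['y'], separate)
z_val, = run(['z'], separate)

# One combined run: w and x are computed once in total
together = Counter()
y_val2, z_val2 = run(['y', 'z'], together)

print(y_val, z_val, dict(separate))
print(y_val2, z_val2, dict(together))
```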

In single-process TensorFlow, each session has its own copy of every variable, even if the sessions reuse the same graph. In distributed TensorFlow, the variable state is stored on the servers, so multiple sessions can share the same variables.

# Linear Regression with TensorFlow

The code below implements linear regression for the California housing dataset using TensorFlow. The first step is to add a bias term using NumPy. Then, two TensorFlow constant nodes are created to hold the data and the targets. `theta` is calculated using the normal equation $\hat{\theta} = \left(\mathbf{X}^{T}\cdot\mathbf{X}\right)^{-1}\cdot \mathbf{X}^{T}\cdot \mathbf{y}$. All the matrix operations needed to compute `theta` are provided by TensorFlow.






```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

scaler = StandardScaler()
scaled_housing_data_plus_bias = scaler.fit_transform(housing_data_plus_bias)

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()
```
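As a sanity check of the normal equation itself, the same formula can be applied with NumPy to a small synthetic problem whose parameters are known in advance. The data, the `true_theta` values, and the variable names below are made up for illustration and have nothing to do with the housing dataset.

```python
import numpy as np

rng = np.random.RandomState(0)
m = 200                                        # number of synthetic instances
features = rng.rand(m, 2)                      # two made-up features
true_theta = np.array([[4.0], [3.0], [-2.0]])  # bias term, then two weights

# Add the bias column of ones, as done for the housing data
X_b = np.c_[np.ones((m, 1)), features]
y_synthetic = X_b @ true_theta                 # noise-free targets

# Normal equation: theta = (X^T X)^-1 X^T y
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y_synthetic
print(theta_hat.ravel())
```

Because the targets are noise-free, the recovered `theta_hat` should match `true_theta` up to floating-point error.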

# Implementing Gradient Descent

## Manually Computing the Gradients

The code below implements gradient descent manually using TensorFlow. The gradient descent algorithm is straightforward to follow, but there are a few TensorFlow functions to keep in mind:

+ The `random_uniform()` function creates a node in the graph that generates a tensor of random values drawn uniformly from the range passed to the function.
+ The `assign()` function creates a node that assigns a new value to a variable. In this particular case, it updates the variable `theta` with the gradient descent step `theta - learning_rate*gradients`.

```python
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')

theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name='theta')
y_pred = tf.matmul(X, theta, name='prediction')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2/m*tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta-learning_rate*gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch', epoch, 'MSE', mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()
```

```
Epoch 0 MSE 7.83235
Epoch 100 MSE 5.10969
Epoch 200 MSE 4.98783
Epoch 300 MSE 4.93502
Epoch 400 MSE 4.89822
Epoch 500 MSE 4.87175
Epoch 600 MSE 4.85266
Epoch 700 MSE 4.83889
Epoch 800 MSE 4.82896
Epoch 900 MSE 4.8218
```
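The `gradients` expression is the derivative of the MSE with respect to `theta`, that is $\frac{2}{m}\mathbf{X}^{T}\cdot(\mathbf{X}\cdot\theta - \mathbf{y})$. A hand-derived formula like this can be checked numerically. The NumPy sketch below compares it with central finite differences on small random data; the sizes, the seed, and the helper name `mse_value` are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.RandomState(42)
m_small, n_small = 50, 3                       # arbitrary small problem size
X_small = np.c_[np.ones((m_small, 1)), rng.randn(m_small, n_small)]
y_small = rng.randn(m_small, 1)
theta0 = rng.uniform(-1.0, 1.0, size=(n_small + 1, 1))

def mse_value(theta):
    """Mean squared error of the linear model for a given theta."""
    error = X_small @ theta - y_small
    return float(np.mean(error ** 2))

# Hand-derived gradient, same formula as the `gradients` node
analytic = 2 / m_small * X_small.T @ (X_small @ theta0 - y_small)

# Central finite differences, one component of theta at a time
eps = 1e-6
numeric = np.zeros_like(theta0)
for i in range(theta0.shape[0]):
    step = np.zeros_like(theta0)
    step[i] = eps
    numeric[i] = (mse_value(theta0 + step) - mse_value(theta0 - step)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))
```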


## Gradient Descent Using autodiff

Calculating gradients manually for an arbitrary function can be tedious and error-prone, and the resulting code is not necessarily efficient. This is why it is better to use TensorFlow's autodiff feature, which computes gradients automatically and efficiently.

To use autodiff in gradient descent for linear regression, it is enough to replace the `gradients` definition with

```python
gradients = tf.gradients(mse, [theta])[0]
```
The `gradients()` function takes an operation (`mse`) and a list of variables (`[theta]`). It creates a list of operations (one per variable) that compute the gradients of the operation with respect to each variable. In this particular case, the `gradients` node computes the gradient vector of the MSE with respect to `theta`.
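On a tiny problem, the autodiff gradient can be compared with the manual formula from the previous section. The sketch below assumes the `tensorflow.compat.v1` module is available; the three data points and the zero starting value for `theta` are made up for illustration.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: the compat.v1 API is available
tf.disable_v2_behavior()
tf.reset_default_graph()

# Three made-up instances: a bias column plus one feature
X = tf.constant([[1., 0.], [1., 1.], [1., 2.]], name='X')
y = tf.constant([[1.], [3.], [5.]], name='y')
theta = tf.Variable([[0.], [0.]], name='theta')

error = tf.matmul(X, theta) - y
mse = tf.reduce_mean(tf.square(error), name='mse')

manual = 2 / 3 * tf.matmul(tf.transpose(X), error)  # hand-derived, m = 3
auto = tf.gradients(mse, [theta])[0]                # autodiff

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    manual_val, auto_val = sess.run([manual, auto])

print(manual_val.ravel(), auto_val.ravel())
```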

## Using an Optimizer

Finding the minimum with gradient descent can be made even easier by using one of the optimizers provided by TensorFlow. For example, to use the gradient descent optimizer, the lines `gradients = ...` and `training_op = ...` should be replaced with

```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
```

If a different optimizer is needed for the problem, it is enough to change the optimizer line. For example, the momentum optimizer can be used instead of the gradient descent optimizer.
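As an illustration of swapping the optimizer, the sketch below minimizes a one-variable quadratic with `tf.train.MomentumOptimizer`. It assumes the `tensorflow.compat.v1` module is available; the loss function, the starting value, the `momentum=0.9` setting, and the number of steps are arbitrary choices for this example, not values from the original notebook.

```python
import tensorflow.compat.v1 as tf  # assumption: the compat.v1 API is available
tf.disable_v2_behavior()
tf.reset_default_graph()

learning_rate = 0.01
theta = tf.Variable(5.0, name='theta')
loss = tf.square(theta - 2.0)      # toy loss with its minimum at theta = 2

# Only this line differs from the gradient descent version
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
training_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    initial_loss = sess.run(loss)
    for _ in range(200):
        sess.run(training_op)
    final_loss = sess.run(loss)

print(initial_loss, final_loss)
```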

# Feeding Data to the Training Algorithm

The best way to implement mini-batch gradient descent is to use placeholder nodes. This type of node does not perform any computation; it only outputs the data that it is given at runtime. Placeholder nodes are typically used to pass training data to TensorFlow.

When a placeholder is created, it is necessary to specify the tensor's data type. Optionally, the shape of the tensor can be specified using the `shape` parameter; a dimension given as `None` can have any size. The example below creates a placeholder node `A` that holds floats and must be two-dimensional, with three columns and any number of rows. Then, the values for `A` are passed using `feed_dict`, once with a single row and once with two rows.

```python
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5

with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})
print(B_val_1)
print(B_val_2)
```

```
[[ 6.  7.  8.]]
[[  9.  10.  11.]
 [ 12.  13.  14.]]
```
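The shape given to a placeholder is enforced when data is fed. The sketch below, which assumes the `tensorflow.compat.v1` module is available, feeds one value that matches `shape=(None, 3)` and one that does not; the `rejected` flag is a hypothetical name used only to record the outcome.

```python
import tensorflow.compat.v1 as tf  # assumption: the compat.v1 API is available
tf.disable_v2_behavior()
tf.reset_default_graph()

A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5

with tf.Session() as sess:
    # Any number of rows is accepted as long as each row has three values
    ok = B.eval(feed_dict={A: [[1, 2, 3], [4, 5, 6]]})

    # A row with two values does not match the declared shape
    try:
        B.eval(feed_dict={A: [[1, 2]]})
        rejected = False
    except ValueError:
        rejected = True

print(ok.shape, rejected)
```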


To implement mini-batch gradient descent, the way `X` and `y` are specified has to be changed; both have to be converted to placeholder nodes. Also, a batch size has to be defined along with the total number of batches. Finally, the mini-batches have to be provided during the execution phase when evaluating each of the nodes that depend on `X` and `y`. 

```python

X = tf.placeholder(tf.float32, shape=(None, n+1), name='X')
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')

batch_size = 100
n_batches = int(np.ceil(m/batch_size))

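def fetch_batch(epoch, batch_index, batch_size):
    # Stand-in for loading data from disk (an assumption, not the original
    # implementation): sample a random mini-batch from the in-memory arrays,
    # seeded per batch so that runs are reproducible
    rng = np.random.RandomState(epoch * n_batches + batch_index)
    indices = rng.randint(m, size=batch_size)
    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch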
    
with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    best_theta = theta.eval()
```
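The batch arithmetic can be checked in isolation with NumPy. In the sketch below, the dataset size of 1050 is a made-up number chosen so that the last batch is smaller than the others, and the sequential slicing is just one possible way to form batches.

```python
import numpy as np

m_demo = 1050                      # hypothetical number of instances
batch_size = 100
n_batches = int(np.ceil(m_demo / batch_size))

data = np.arange(m_demo)           # stand-in for the rows of the training set
sizes = []
for batch_index in range(n_batches):
    start = batch_index * batch_size
    batch = data[start:start + batch_size]  # slicing clips the last batch
    sizes.append(len(batch))

print(n_batches, sizes[-1])
```

Rounding up with `np.ceil` ensures that the leftover instances still form a final, smaller batch instead of being dropped.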

# Saving and Restoring Models

Once a model is trained, the parameters should be saved on disk so they can be used later. It is also useful to save checkpoints during training so if the computer crashes while training the model, the training can continue from the last checkpoint instead of starting from scratch.

Saving and restoring a model in TensorFlow is easy. It is enough to create a `Saver` node at the end of the construction phase, i.e., after all the variable nodes are created. Then, in the execution phase, the `save()` method of this node should be called, passing it the session and the path of the checkpoint file.

```python
...
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name='theta')
...
init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            save_path = saver.save(sess, 'tmp/my_model.ckpt')
            
        sess.run(training_op)
        
    best_theta = theta.eval()
    save_path = saver.save(sess, 'tmp/my_model_final.ckpt')
```

To restore a model saved in a checkpoint file the process is similar. A `Saver()` node has to be created at the end of the construction phase. Then, at the beginning of the execution phase, the method `restore()` is called instead of initializing the variables.

```python
with tf.Session() as sess:
    saver.restore(sess, 'tmp/my_model.ckpt')
    
```
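A full save and restore round trip can be tried on a single variable. The sketch below assumes the `tensorflow.compat.v1` module is available; the temporary directory, the `update` node, and the values stored in `theta` are illustrative choices only.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: the compat.v1 API is available
tf.disable_v2_behavior()
tf.reset_default_graph()

theta = tf.Variable([[1.0], [2.0]], name='theta')
update = tf.assign(theta, theta * 10)   # stands in for a training step
init = tf.global_variables_initializer()
saver = tf.train.Saver()                # created after all variables exist

# Hypothetical checkpoint location in a fresh temporary directory
checkpoint_path = os.path.join(tempfile.mkdtemp(), 'my_model.ckpt')

with tf.Session() as sess:
    sess.run(init)
    sess.run(update)
    saver.save(sess, checkpoint_path)

# A new session starts without variable values; restore() replaces init
with tf.Session() as sess:
    saver.restore(sess, checkpoint_path)
    restored = theta.eval()

print(restored.ravel())
```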

It is also possible to specify which variables the saver node saves and restores, and under which names; by default, all variables are saved and restored under their own names. For example, the following line creates a saver that saves and restores only the `theta` variable, under the name `weights`.

```python
saver = tf.train.Saver({'weights':theta})
```

By default, the saver node also saves the graph structure in a second file with the extension `.meta`. This graph can be loaded using `tf.train.import_meta_graph()`. This function adds the graph in the file to the default graph and returns a `Saver` that can be used to restore the graph's state (the variable values).

```python
saver = tf.train.import_meta_graph('tmp/my_model_final.ckpt.meta')

with tf.Session() as sess:
    saver.restore(sess, 'tmp/my_model_final.ckpt')
```
Restoring the graph makes it possible to restore a saved model completely: both the graph structure and the variable values. This can be done without having the code that built the model.