# TensorFlow

In this tutorial we will go through the basics of TensorFlow, from installation to creating, running, and saving simple computational graphs.

### Table of Contents
- 1 What is TensorFlow?
- 2 How does TensorFlow work?
- 3 Creating Your First Graph and Running It in a Session
- 4 Managing Graphs
- 5 Lifecycle of a Node Value
- 6 Linear Regression with TensorFlow
- 7 Implementing Gradient Descent
- 8 Feeding Data to the Training Algorithm
- 9 Saving and Restoring Models
- 10 TensorFlow in Use

### 1. What is TensorFlow?

TensorFlow is an open source library for distributed numerical computation using data flow graphs. It makes it possible to train and run very large neural networks efficiently by distributing the computations across potentially thousands of multi-GPU servers.

TensorFlow was created at Google and supports many of their large-scale Machine Learning applications. TensorFlow can train a network with millions of parameters on a training set composed of billions of instances with millions of features each. It was open-sourced in November 2015.

For more information, see https://www.tensorflow.org/

##### What are tensors?

TensorFlow, as the name indicates, is a framework to define and run computations involving tensors. A tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.

When writing a TensorFlow program, the main object you manipulate and pass around is the tf.Tensor. A tf.Tensor object represents a partially defined computation that will eventually produce a value. TensorFlow programs work by first building a graph of tf.Tensor objects, detailing how each tensor is computed based on the other available tensors and then by running parts of this graph to achieve the desired results.

A tf.Tensor has the following properties:

  - a data type (float32, int32, or string, for example)
  - a shape

Each element in the Tensor has the same data type, and the data type is always known. 
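Since tensors are represented as n-dimensional arrays, NumPy arrays are a convenient way to get a feel for data types and shapes. The following is only a sketch using NumPy as a stand-in (the variable names are made up for illustration), not TensorFlow code:

```python
import numpy as np

# NumPy arrays used as stand-ins for tensors of increasing rank.
scalar = np.array(3.0, dtype=np.float32)        # rank 0
vector = np.array([1, 2, 3], dtype=np.int32)    # rank 1
matrix = np.zeros((2, 3), dtype=np.float32)     # rank 2
cube = np.zeros((2, 3, 4), dtype=np.float32)    # rank 3

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # Every element of one array shares the same data type.
    print(name, "rank:", t.ndim, "shape:", t.shape, "dtype:", t.dtype)
```

The rank is simply the number of dimensions, and the shape lists the size along each of them.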

For more information on tensors, see https://www.tensorflow.org/programmers_guide/tensors

### 2. How does TensorFlow work?

The `TensorFlow` library is used for numerical computation and is fine-tuned for large-scale Machine Learning. We first define a graph of computations to perform, and then `TensorFlow` takes that graph and runs it using optimized C++ code.

A `TensorFlow` program consists of two parts:

1. The construction phase builds a computation graph representing the ML model and the computations required to train it.
2. The execution phase runs a loop that evaluates a training step repeatedly, gradually improving the model parameters.

For more on TensorFlow graphs, see https://www.tensorflow.org/programmers_guide/graphs

For more on TensorFlow operations, see https://www.tensorflow.org/api_docs/python/tf/Operation

##### Install Packages

In [9]:
%sh /databricks/python3/bin/pip3 install "tensorflow<2"  # this tutorial uses the TensorFlow 1.x API (tf.Session, tf.placeholder)

### 3. Creating Your First Graph and Running It in a Session

Create a computation graph.

In [12]:
import tensorflow as tf

x = tf.Variable(3, name = "x")
y = tf.Variable(4, name = "y")
f = x*x*y + y + 2

Important: this code does not actually perform any computation, even though it looks like it does (especially the last line). It just creates a computation graph. In fact, even the variables are not initialized yet. To evaluate this graph, you need to open a TensorFlow session and use it to initialize the variables and evaluate `f`. A TensorFlow session takes care of placing the operations onto devices such as CPUs and GPUs and running them, and it holds all the variable values.
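To make the split between building a graph and running it more concrete, here is a toy sketch in plain Python. The `Node` class and the `as_node()` helper are hypothetical and exist only for this illustration; they are not how TensorFlow is implemented.

```python
# Toy illustration only: this is NOT how TensorFlow is implemented.
class Node:
    """A deferred computation: stores an operation and its inputs, not a value."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def __add__(self, other):
        return Node(lambda a, b: a + b, self, as_node(other))

    def __mul__(self, other):
        return Node(lambda a, b: a * b, self, as_node(other))

    def run(self):
        # The arithmetic happens only here, when the graph is executed.
        return self.op(*(node.run() for node in self.inputs))


def as_node(value):
    """Wrap a plain Python number in a constant node."""
    return value if isinstance(value, Node) else Node(lambda: value)


x = as_node(3)
y = as_node(4)
f = x * x * y + y + 2    # builds a graph; no arithmetic is performed yet

print(type(f).__name__)  # Node
print(f.run())           # 42
```

The expression for `f` only wires nodes together. The value 42 appears when `run()` is called, which plays the role that `sess.run(f)` plays in TensorFlow.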

Create a session, initialize the variables, and evaluate `f`. Then close the session with `sess.close()` to free up resources.

In [15]:
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)
sess.close()

Inside a `with` block, the session is set as the default session. Calling `x.initializer.run()` is equivalent to `tf.get_default_session().run(x.initializer)`, and similarly `f.eval()` is equivalent to `tf.get_default_session().run(f)`. The session is automatically closed at the end of the block.

In [17]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

Instead of manually running the initializer of every single variable, you can use `tf.global_variables_initializer()`. Note that this function does not perform the initialization immediately. It creates a node in the graph that will initialize all the variables when it is run.

In [19]:
init = tf.global_variables_initializer()  # prepare an init node

with tf.Session() as sess:
    init.run()  # actually initialize all the variables
    result = f.eval()

Create an `InteractiveSession`. The only difference from a regular `Session` is that an `InteractiveSession` automatically sets itself as the default session when it is created, so there is no need for a `with` block. You do need to close the session manually when you are done with it.

In [21]:
sess = tf.InteractiveSession()
init.run()
result = f.eval()
print(result)
sess.close()

### 4. Managing Graphs

A node is a {matrix, tensor, vector, scalar} value. Any node created is automatically added to the default graph.

__What if we do not want this to happen?__
 
We can create a new graph and temporarily make it the default graph inside a `with` block, so that the nodes created in that block are added to the new graph.

Create a node.

In [25]:
x1 = tf.Variable(1)

Check if the node is added to the default graph.

In [27]:
x1.graph is tf.get_default_graph()

Create a new graph and temporarily make it the default graph inside a `with` block.

In [29]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)

More on `tf.Variable()` can be found at https://www.tensorflow.org/api_docs/python/tf/Variable

Check if the new node is added to the new graph.

In [32]:
x2.graph is graph

Check if the new node is added to the old graph.

In [34]:
x2.graph is tf.get_default_graph()

Reset the default graph.

In [36]:
tf.reset_default_graph()

### 5. Lifecycle of a Node Value

When we evaluate a node, `TensorFlow` automatically determines the set of nodes that it depends on and evaluates these nodes first.

Define a simple graph.

In [40]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

Start the session and run the graph to evaluate `y` and `z`.

In [42]:
with tf.Session() as sess:
    print(y.eval())  # 10
    print(z.eval())  # 15

The previous code evaluates `w` and `x` twice, once for `y` and once for `z`, because node values are not reused between graph runs. The following code produces the same results but is more efficient: it evaluates both `y` and `z` in a single graph run, so `w` and `x` are evaluated only once.

In [44]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)  # 10
    print(z_val)  # 15
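The difference between the two cells above can be sketched with a toy evaluator in plain Python that counts how often each node is computed. The `graph` dictionary and the `run()` helper are made up for this illustration and only mimic the behavior described here; they are not TensorFlow code.

```python
# Toy model of graph evaluation (not TensorFlow itself): count how often
# each node is computed under the two evaluation strategies.
from collections import Counter

graph = {                      # node name -> (function, input names)
    "w": (lambda: 3, ()),
    "x": (lambda w: w + 2, ("w",)),
    "y": (lambda x: x + 5, ("x",)),
    "z": (lambda x: x * 3, ("x",)),
}

def run(fetches, counts):
    """Evaluate the requested nodes, sharing intermediate values within one run."""
    cache = {}                 # dropped when the run ends, like node values

    def evaluate(name):
        if name not in cache:
            func, inputs = graph[name]
            cache[name] = func(*(evaluate(i) for i in inputs))
            counts[name] += 1
        return cache[name]

    return [evaluate(name) for name in fetches]

separate = Counter()
y_val, = run(["y"], separate)          # first run: computes w, x, y
z_val, = run(["z"], separate)          # second run: computes w and x again

together = Counter()
y_val2, z_val2 = run(["y", "z"], together)   # one run: w and x computed once

print(y_val, z_val)                    # 10 15
print(separate["w"], separate["x"])    # 2 2
print(together["w"], together["x"])    # 1 1
```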

### 6. Linear Regression with `TensorFlow`

`TensorFlow` operations (called ops) can take any number of inputs and produce any number of outputs. 
- For example, a multiplication op takes two inputs and produces one output.
- Source ops like constants and variables take no inputs. 

The inputs and outputs are multidimensional arrays, called tensors. Tensors have a type and a shape.

Fetch the California housing dataset and add an extra bias input feature (x0 = 1) to all training instances using `NumPy`. In the linear regression model, slope will become the weight and the constant will act as bias. The weights and biases are the parameters you are going to optimize in order to get a good and accurate model.

In [48]:
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

Note: `np.ones((2, 1))` returns a 2×1 array of ones:

    array([[ 1.],
           [ 1.]])
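As a small sketch of what the bias column looks like, the snippet below applies the same `np.c_` construction to a made-up 3×2 matrix that stands in for `housing.data` (the numbers are arbitrary):

```python
import numpy as np

# Hypothetical 3x2 feature matrix standing in for housing.data.
features = np.array([[10.0, 20.0],
                     [30.0, 40.0],
                     [50.0, 60.0]])
m = features.shape[0]

# np.c_ concatenates its arguments column-wise, so the column of ones
# becomes the first feature (x0 = 1) of every instance.
features_plus_bias = np.c_[np.ones((m, 1)), features]

print(features_plus_bias.shape)        # (3, 3)
print(features_plus_bias[0].tolist())  # [1.0, 10.0, 20.0]
```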

Print the shape of `housing` data.

In [51]:
print(m, n)

Create two `TensorFlow` constant nodes, `X` and `y`.

In [53]:
X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")

The matrix functions `transpose()`, `matmul()`, and `matrix_inverse()` do not perform any computations immediately. Instead, they create nodes in the graph that will perform them when the graph is run. The definition of `theta` corresponds to the Normal Equation $\hat{\theta} = (X^T X)^{-1} X^T y$. By convention, the Greek letter θ (theta) is frequently used to represent model parameters.

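Note:
- `XT` is the transpose of `X`.
- `matrix_inverse()` computes the matrix inverse; see http://mathworld.wolfram.com/MatrixInverse.html
- `matmul()` performs matrix multiplication, like `np.dot()` on two-dimensional arrays in NumPy.
- In `reshape(-1, 1)`, the value -1 marks an unspecified dimension that is inferred from the length of the array, so the result is a column vector.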
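The notes above can be tried out in NumPy on a tiny example. The arrays below are made up for illustration; only the shapes and the matrix product matter here.

```python
import numpy as np

target = np.array([4.5, 3.6, 2.1])   # made-up 1-D target values
column = target.reshape(-1, 1)       # -1 lets NumPy infer the number of rows
print(target.shape, column.shape)    # (3,) (3, 1)

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
XT = X.T                             # transpose: shape (2, 3)

# Matrix multiplication of a (2, 3) matrix by a (3, 2) matrix gives (2, 2).
print(XT.dot(X).tolist())            # [[35.0, 44.0], [44.0, 56.0]]
print(XT.dot(column).shape)          # (2, 1)
```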

In [55]:
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

Create a session and use it to evaluate `theta`.

In [57]:
with tf.Session() as sess:
    theta_value = theta.eval()

Print the `theta_value`.

In [59]:
theta_value
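As a sanity check on the Normal Equation itself, the same formula can be written in plain NumPy. The sketch below uses synthetic, noise-free data with made-up parameters (`true_theta` is an assumption for the example, not a value from the housing dataset), so the equation should recover those parameters up to floating-point error.

```python
import numpy as np

rng = np.random.RandomState(0)
X_raw = rng.rand(100, 2)                        # 100 synthetic instances, 2 features
X_b = np.c_[np.ones((100, 1)), X_raw]           # add the bias feature x0 = 1
true_theta = np.array([[4.0], [3.0], [-2.0]])   # made-up bias and weights
y = X_b.dot(true_theta)                         # noise-free targets

# Normal Equation: theta = (X^T X)^-1 X^T y
theta_hat = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

print(np.allclose(theta_hat, true_theta))       # True
```

On real data with noise, the result is the least-squares fit rather than an exact recovery of any underlying parameters.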

### 7. Implementing Gradient Descent

Gradient Descent is a very generic optimization algorithm capable of finding optimal solutions to a wide range of problems. The general idea of Gradient Descent is to tweak parameters iteratively in order to minimize a cost function. There are optimizers that can speed up training large models tremendously compared to plain Gradient Descent.

More info on Gradient Descent: https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer

In this section we will use Batch Gradient Descent instead of the Normal Equation. First we will manually compute the gradients, then we will use `TensorFlow`'s autodiff feature to let TensorFlow compute the gradients automatically. At the end, we will use a couple of TensorFlow's out-of-the-box optimizers.

##### Manually Computing the Gradients

When using Gradient Descent, remember that it is important to first normalize the input feature vectors, or else training may be much slower. You can do this using TensorFlow, NumPy, Scikit-Learn's StandardScaler, or any other solution you prefer.

Scale the feature vectors in order to use Gradient Descent.

In [66]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

Build the graph for Batch Gradient Descent. Two functions are new here:

- The `random_uniform()` function creates a node in the graph that will generate a tensor containing random values, given its shape and value range, much like NumPy's `rand()` function.
- The `assign()` function creates a node that will assign a new value to a variable. In this case, it implements the Batch Gradient Descent step $\theta^{(\text{next step})} = \theta - \eta \nabla_{\theta} \text{MSE}(\theta)$.

In [68]:
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

This loop executes the training step over and over again (`n_epochs` times), and every 100 epochs it prints the current Mean Squared Error (MSE). You should see the MSE go down from one printout to the next.

In [70]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()

Print `best_theta`.

In [72]:
best_theta
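For comparison, the same Batch Gradient Descent loop can be sketched in plain NumPy. The data below is synthetic and noise-free, with made-up sizes and parameters (`true_theta` is an assumption for the example), so we can check that the loop moves `theta` toward the values that generated the targets.

```python
import numpy as np

rng = np.random.RandomState(42)
m, n = 200, 2                                  # hypothetical dataset size
X = np.c_[np.ones((m, 1)), rng.randn(m, n)]    # bias column + roughly standardized features
true_theta = np.array([[2.0], [-1.0], [0.5]])  # made-up parameters
y = X.dot(true_theta)                          # noise-free targets

n_epochs = 1000
learning_rate = 0.01
theta = rng.uniform(-1.0, 1.0, size=(n + 1, 1))

def mse(theta):
    """Mean Squared Error of the linear model on the synthetic data."""
    return float(np.mean((X.dot(theta) - y) ** 2))

initial_mse = mse(theta)
for epoch in range(n_epochs):
    error = X.dot(theta) - y
    gradients = 2 / m * X.T.dot(error)             # gradient of the MSE w.r.t. theta
    theta = theta - learning_rate * gradients      # gradient descent step

final_mse = mse(theta)
print(final_mse < initial_mse)                     # True
print(np.allclose(theta, true_theta, atol=1e-2))   # True
```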

##### Using Autodiff

This code is the same as the two cells above, except that the `gradients = ...` line now uses the `gradients()` function to compute the gradients automatically.

The `gradients()` function takes an op (in this case `mse`) and a list of variables (in this case just `theta`), and creates a list of ops (one per variable) to compute the gradients of the op with regard to each variable. The `gradients` node will therefore compute the gradient vector of the MSE with regard to `theta`.

In [75]:
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

gradients = tf.gradients(mse, [theta])[0] #notice the gradient from the previous code is changed 

training_op = tf.assign(theta, theta - learning_rate * gradients)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init) 

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()

Print `best_theta`.

In [77]:
best_theta
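One way to convince ourselves that the manual formula and an automatically computed gradient should agree is a numerical check. The sketch below is not autodiff: it uses central finite differences in NumPy, on synthetic data with made-up sizes, to confirm that `2/m * X^T (X theta - y)` is indeed the gradient of the MSE.

```python
import numpy as np

rng = np.random.RandomState(0)
m, n = 50, 3                                   # hypothetical dataset size
X = np.c_[np.ones((m, 1)), rng.randn(m, n)]
y = rng.randn(m, 1)
theta = rng.uniform(-1.0, 1.0, size=(n + 1, 1))

def mse(theta):
    """Mean Squared Error of the linear model on the synthetic data."""
    return float(np.mean((X.dot(theta) - y) ** 2))

# Gradient from the manual formula used earlier in this section.
analytic = 2 / m * X.T.dot(X.dot(theta) - y)

# Central finite differences: perturb one parameter at a time.
eps = 1e-6
numeric = np.zeros_like(theta)
for i in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[i, 0] = eps
    numeric[i, 0] = (mse(theta + step) - mse(theta - step)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))   # True
```

Finite differences need two evaluations of the cost per parameter, which is why they are useful for checking gradients but impractical for training large models.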

##### Using an Optimizer

This code is the same as the cell above, except that the `training_op = ...` line is replaced with code that uses an optimizer.

In [80]:
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = tf.gradients(mse, [theta])[0] 

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate) #notice codeline added
training_op = optimizer.minimize(mse) #notice the training_op from the previous code is changed 

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()

Print `best_theta`.

In [82]:
best_theta

To use a different type of optimizer, change the line that defines the optimizer. For example, this code defines a momentum optimizer.

In [84]:
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,
                                       momentum=0.9)
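To give an idea of what a momentum optimizer does, here is a plain-Python sketch of the basic momentum update (accumulate the gradients, then step along the accumulation) on a made-up one-dimensional cost, (theta - 3)^2. The `minimize()` helper is hypothetical and only illustrates the update rule. It is not the TensorFlow implementation.

```python
def minimize(momentum, n_steps, learning_rate=0.01):
    """Minimize the toy cost (theta - 3)^2 with a basic momentum update."""
    theta = 0.0
    accumulation = 0.0
    for _ in range(n_steps):
        gradient = 2.0 * (theta - 3.0)                     # derivative of the toy cost
        accumulation = momentum * accumulation + gradient  # remember past gradients
        theta -= learning_rate * accumulation              # step along the accumulation
    return theta

plain = minimize(momentum=0.0, n_steps=100)          # momentum=0 is plain gradient descent
with_momentum = minimize(momentum=0.9, n_steps=100)

# After the same number of steps, the momentum run is closer to the optimum (3.0).
print(abs(with_momentum - 3.0) < abs(plain - 3.0))   # True
print(abs(minimize(0.9, 1000) - 3.0) < 1e-6)         # True
```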

### 8. Feeding Data to the Training Algorithm

In this section we will modify the previous code to implement Mini-batch Gradient Descent. In order to do this, we need to replace X and y at every iteration with the next mini-batch. The simplest way to do this is to use placeholder nodes. These nodes just output the data we tell them to output at runtime.

Create a placeholder node `A` and a node `B = A + 5`. When we evaluate `B`, we pass a `feed_dict` to the `eval()` method that specifies the value of `A`. `A` must be two-dimensional with exactly three columns. Because the first dimension is `None`, it can have any number of rows.

In [88]:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

Print `B_val_1`. Notice that `A` passed to `eval()` is two dimensional, three columns and one row.

In [90]:
print(B_val_1)

Print `B_val_2`. Notice that `A` passed to `eval()` is two dimensional, three columns and two rows.

In [92]:
print(B_val_2)
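The shape rule for the placeholder can be sketched in NumPy. The `matches()` helper below is hypothetical and written only for this illustration. It treats `None` as "any size", which is how the shape `(None, 3)` is described above.

```python
import numpy as np

def matches(shape, array):
    """Return True if array fits a placeholder-style shape where None means any size."""
    array = np.asarray(array)
    return array.ndim == len(shape) and all(
        want is None or want == got for want, got in zip(shape, array.shape))

print(matches((None, 3), [[1, 2, 3]]))              # True: one row, three columns
print(matches((None, 3), [[4, 5, 6], [7, 8, 9]]))   # True: two rows, three columns
print(matches((None, 3), [1, 2, 3]))                # False: only one dimension
print(matches((None, 3), [[1, 2]]))                 # False: two columns

# The arithmetic of B = A + 5, done directly in NumPy.
print((np.asarray([[1, 2, 3]]) + 5).tolist())       # [[6, 7, 8]]
```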

In order to implement Mini-batch Gradient Descent we change the definition of X and y in the construction phase to make them placeholder nodes.

In [94]:
X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

Define the batch size and compute the total number of batches.

In [96]:
batch_size = 100
n_batches = int(np.ceil(m / batch_size))
n_batches

#### Note:
`np.ceil()` rounds each element up to the nearest integer. For example, `np.ceil([-1.5, 0.2])` returns `[-1., 1.]`.
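A quick sketch of the batch-count arithmetic. The value of `m` below is an assumption (the number of instances we expect `fetch_california_housing()` to return); use the `m` printed earlier if it differs.

```python
import numpy as np

print(np.ceil([-1.5, 0.2]).tolist())       # [-1.0, 1.0]

m = 20640                                  # assumed number of training instances
batch_size = 100

# Rounding up makes sure the last, partially filled batch is counted too.
n_batches = int(np.ceil(m / batch_size))
print(n_batches)                           # 207

# Instances left over for the last batch if the data were split sequentially.
last_batch_size = m - (n_batches - 1) * batch_size
print(last_batch_size)                     # 40
```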

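Rebuild the rest of the graph so that it depends on the placeholder nodes. Then fetch the mini-batches one by one and provide them to the training op through the `feed_dict` parameter.

In [99]:
# rebuild the nodes that depend on X and y, which are now placeholders
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
init = tf.global_variables_initializer()

def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  # seed so that each batch is reproducible
    # draw batch_size random row indices in [0, m), e.g. 1, 7, 8, 56, 23, 14, 10 for batch_size=7 and m=60
    indices = np.random.randint(m, size=batch_size)
    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()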
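Note that `np.random.randint()` samples indices with replacement, so within one epoch some instances may be drawn several times and others not at all. If you prefer each instance to be visited exactly once per epoch, one possible alternative is to shuffle the indices and slice them. The `epoch_batches()` helper below is a hypothetical sketch of that idea with made-up sizes, not part of the original code.

```python
import numpy as np

def epoch_batches(m, batch_size, seed):
    """Split a shuffled range of m indices into consecutive mini-batches."""
    rng = np.random.RandomState(seed)
    shuffled = rng.permutation(m)              # every index appears exactly once
    return [shuffled[start:start + batch_size]
            for start in range(0, m, batch_size)]

batches = epoch_batches(m=1050, batch_size=100, seed=0)

print(len(batches))                            # 11
print(len(batches[0]), len(batches[-1]))       # 100 50

# Together the batches cover every index exactly once.
all_indices = np.sort(np.concatenate(batches))
print(np.array_equal(all_indices, np.arange(1050)))   # True
```

Each index array can then be used to select rows, in the same way `indices` is used inside `fetch_batch()`.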

### 9. Saving and Restoring Models

To save models with `TensorFlow`:

1. Create a `saver` node at the end of the construction phase, after all variable nodes are created.
2. Call the `save()` method in the execution phase, passing it the session and the path of the checkpoint file. Call it whenever you want to save the current state of the model.

In [102]:
from tensorflow.python.framework import ops                      # library imported to reset the graph
ops.reset_default_graph()                                        # notice that we start with an empty graph

n_epochs = 1000                                                                      
learning_rate = 0.01                                                            

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")            
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")            
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")                                      
error = y_pred - y                                                                    
mse = tf.reduce_mean(tf.square(error), name="mse")                                    
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)            
training_op = optimizer.minimize(mse)                                                

init = tf.global_variables_initializer()
saver = tf.train.Saver()                                        # create the saver node after all variable nodes are created

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:                                    # checkpoint every 100 epochs
            print("Epoch", epoch, "MSE =", mse.eval())
            save_path = saver.save(sess, "/tmp/my_model.ckpt")  # save() returns the path of the checkpoint file
        sess.run(training_op)

    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")    # save the final model

Print the `best_theta`.

In [104]:
best_theta

__To restore models with `TensorFlow`:__

1. Create a `saver` node at the end of the construction phase.
2. Instead of initializing the variables using the `init` node, call the `restore()` method of the `saver` object. For example, use `saver.restore(sess, "/tmp/my_model_final.ckpt")` instead of `sess.run(init)`.

By default a `Saver` saves and restores all variables under their own name, but we can also specify which variables to save or restore, and under which names to save. For example, `saver = tf.train.Saver({"w": theta})` will save or restore only the `theta` variable under the name `'w'`.

The `save()` method saves the structure of the graph in a second file with the same name plus a `.meta` extension. Use `tf.train.import_meta_graph()` to load this graph structure. This adds the graph to the default graph, and returns a `saver` instance. The instance can be used to restore the graph’s state.

In [108]:
ops.reset_default_graph() # notice that we start with an empty graph

saver = tf.train.import_meta_graph("/tmp/my_model_final.ckpt.meta")  # this loads the graph structure
theta = tf.get_default_graph().get_tensor_by_name("theta:0") 

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")  # this restores the graph's state
    best_theta_restored = theta.eval() 

Print `best_theta_restored`.

In [110]:
best_theta_restored

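Compare the restored values with the original ones. `np.allclose()` returns `True` if the two arrays are equal element-wise within a tolerance. A result of `True` confirms that the restored parameters match the ones we saved. Because the graph structure was loaded with `import_meta_graph()`, we restored a previously saved model without needing the code that built it.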

In [112]:
np.allclose(best_theta, best_theta_restored) 
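A short sketch of how the tolerance in `np.allclose()` behaves, using made-up arrays and the default tolerances:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9                      # differs by a tiny amount

print(np.array_equal(a, b))       # False: the values are not exactly identical
print(np.allclose(a, b))          # True: the difference is within the default tolerance
print(np.allclose(a, a + 1e-3))   # False: the difference is too large
```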

### 10. TensorFlow in Use

Some of the current uses of TensorFlow are:

- ### 1. Deep Speech

  - Organization: Mozilla
  - Domain: Speech Recognition
  - Description: A TensorFlow implementation motivated by Baidu's Deep Speech architecture.

- ### 2. RankBrain

  - Organization: Google
  - Domain: Information Retrieval
  - Description: A large-scale deployment of deep neural nets for search ranking on www.google.com
  - More info: "Google Turning Over Its Lucrative Search to AI Machines"

- ### 3. Inception Image Classification Model

  - Organization: Google
  - Description: Baseline model and follow-on research into highly accurate computer vision models, starting with the model that won the 2014 ImageNet image classification challenge

- ### 4. SmartReply

  - Organization: Google
  - Description: Deep LSTM model to automatically generate email responses

- ### 5. Massively Multitask Networks for Drug Discovery

  - Organization: Google and Stanford University
  - Domain: Drug discovery
  - Description: A deep neural network model for identifying promising drug candidates
  

_Note: For more details check https://www.tensorflow.org/about/uses_