# Introduction to TensorFlow
This notebook introduces the basics of TensorFlow and walks through a linear regression example, a concept you should already be familiar with.

In [11]:
# Import required libraries
import tensorflow as tf
import numpy as np
from sklearn.preprocessing import StandardScaler
from datetime import datetime

from sklearn.datasets import fetch_california_housing

## TensorFlow sessions

Every TensorFlow session takes care of placing operations onto devices such as CPUs and GPUs and running them. There are two types of TensorFlow sessions:
1. Regular sessions: require you to start and close the session, and to make it the default session inside a with block if you want to call eval() and run() on nodes directly
2. Interactive sessions: still require you to start and close the session, but set themselves as the default session on creation, so no with block is needed

When you evaluate a node, TensorFlow automatically evaluates the nodes it depends on first. For example, evaluating y below first evaluates w and then x:<br>

w = tf.constant(3) <br>
x = w + 2 <br>
y = x + 5 <br>
z = x + 3 <br>

#### Sample regular session

In [2]:
# Set up constants and variables
x = tf.Variable(3, name = "x")
y = tf.Variable(4, name = "y")
f = x*x*y + y + 2

# Evaluate the variables in a regular session
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)
sess.close()

# If you don't want to call sess.run() for every evaluation, use a with block, which makes sess the default session
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
    print(result)
sess.close()

42
42


#### Sample interactive session

In [3]:
# Set up constants and variables
x = tf.Variable(3, name = "x")
y = tf.Variable(4, name = "y")
f = x*x*y + y + 2

# Evaluate the variables in an interactive session
sess = tf.InteractiveSession()
x.initializer.run()
y.initializer.run()
result = f.eval()
print(result)
sess.close()

42


You can also shorten your code by initializing all your variables at once, calling tf.global_variables_initializer().run() while a default session is active.

## TensorFlow graphs
TensorFlow operations are defined as nodes in a graph. You rarely need to create a graph yourself, because any node you create is automatically added to the default graph.

In [4]:
# Add a node to the default graph
x1 = tf.Variable(1)
print(x1.graph is tf.get_default_graph())

# Create a separate graph, make it the default graph inside the with block, and add two nodes to it
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
    x3 = tf.Variable(3)
    
# You can test which graph a node belongs to
print(x2.graph is graph)

# You can also reset your default graph when you've run the same commands multiple times,
# meaning you've added many duplicate nodes to the same graph
tf.reset_default_graph()

True
True


Before we talk about linear regression, it is important to talk about some of the vocabulary in TensorFlow:
* Ops: TensorFlow operations, such as multiplication and addition, which take any number of inputs and produce any number of outputs
* Tensors: the inputs and outputs of ops, which are multidimensional arrays with a shape and a data type
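
To make the vocabulary concrete, here is a small sketch that uses NumPy arrays as a stand-in for tensors (an assumption made purely for illustration; TensorFlow tensors are their own type). It shows an array's shape and data type, and matrix multiplication playing the role of an op with two inputs and one output.

```python
import numpy as np

# A 2 x 3 array of 32-bit floats, analogous to a tensor of shape (2, 3) and type float32
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]], dtype=np.float32)

# A 3 x 1 array of ones, analogous to a column-vector tensor
b = np.ones((3, 1), dtype=np.float32)

# Matrix multiplication plays the role of an op: two inputs, one output
c = np.matmul(a, b)

print(a.shape, a.dtype)  # shape and data type of the first input
print(c.shape)           # a (2, 3) array times a (3, 1) array gives a (2, 1) array
print(c)                 # each entry is the sum of one row of a
```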

Deep learning relies on linear algebra (working with arrays) to make large calculations faster than, for instance, looping over elements one at a time. Also, TensorFlow does not perform any of the calculations added to a graph until the graph is run in a session, which gives it the chance to optimize run times.
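
The idea of building a graph first and computing only when it is run can be sketched in plain Python. The Node class below is hypothetical and is not part of TensorFlow; it only mimics the behavior described above, using the w, x, y, z example from the sessions section.

```python
class Node:
    """A hypothetical stand-in for a graph node; not a TensorFlow class."""

    def __init__(self, fn, *inputs):
        self.fn = fn          # function that computes this node's value
        self.inputs = inputs  # nodes this node depends on
        self.calls = 0        # how many times this node has been computed

    def run(self):
        # Evaluate the dependencies first, then this node
        self.calls += 1
        return self.fn(*(node.run() for node in self.inputs))


# Build the graph: this only records the structure, nothing is computed yet
w = Node(lambda: 3)
x = Node(lambda a: a + 2, w)
y = Node(lambda a: a + 5, x)
z = Node(lambda a: a + 3, x)
calls_before_run = x.calls

# Running y evaluates w, then x, then y; running z does the same for z
y_value = y.run()
z_value = z.run()

print(calls_before_run)  # x had not been computed when the graph was built
print(y_value, z_value)
print(x.calls)           # in this sketch, x is recomputed for each run
```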

## Linear Regression

#### Deterministic solution
Linear regression has a deterministic (closed-form) solution using linear algebra. Let's use it to fit a linear regression to a sample data set. Recall that the least squares estimator for the coefficients is:
$$\hat{\theta} = (X^TX)^{-1} X^T y$$
<br>Before we write the deterministic solution for linear regression, we need to learn a few functions in TensorFlow:
* matmul(A,B): matrix multiplication
* matrix_inverse(A): inverse of a matrix
* transpose(A): transpose of a matrix
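
Before applying these functions in TensorFlow, it can help to check the least squares estimator $(X^TX)^{-1}X^Ty$ on a problem small enough to verify by hand. The sketch below uses NumPy and made-up data that follows y = 1 + 2x exactly (both are assumptions for illustration), so the recovered coefficients should be the intercept 1 and the slope 2.

```python
import numpy as np

# Made-up data that follows y = 1 + 2x exactly
x = np.array([0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y = 1.0 + 2.0 * x

# Add a column of ones so that the first coefficient acts as the intercept
X = np.c_[np.ones((x.shape[0], 1)), x]

# Least squares estimator: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y

print(theta)  # first row is the intercept, second row is the slope
```

The same three building blocks appear here as in the list above: a transpose, a matrix inverse, and matrix multiplications.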

In [5]:
# Let's pull one of the datasets from sklearn and understand the format and data type for housing
housing = fetch_california_housing()
housing

{'data': array([[   8.3252    ,   41.        ,    6.98412698, ...,    2.55555556,
           37.88      , -122.23      ],
        [   8.3014    ,   21.        ,    6.23813708, ...,    2.10984183,
           37.86      , -122.22      ],
        [   7.2574    ,   52.        ,    8.28813559, ...,    2.80225989,
           37.85      , -122.24      ],
        ...,
        [   1.7       ,   17.        ,    5.20554273, ...,    2.3256351 ,
           39.43      , -121.22      ],
        [   1.8672    ,   18.        ,    5.32951289, ...,    2.12320917,
           39.43      , -121.32      ],
        [   2.3886    ,   16.        ,    5.25471698, ...,    2.61698113,
           39.37      , -121.24      ]]),
 'target': array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894]),
 'feature_names': ['MedInc',
  'HouseAge',
  'AveRooms',
  'AveBedrms',
  'Population',
  'AveOccup',
  'Latitude',
  'Longitude'],
 'DESCR': 'California housing dataset.\n\nThe original database is available from StatLib\n\n 

In [6]:
# The data comes as a sklearn Bunch object
print(type(housing))

# A sklearn Bunch behaves like a Python dictionary (similar to JSON data). You can look up values by key:
print(housing['DESCR'])

# The feature names are the column names for the data array, and the target variable is described in the 'DESCR' value
print(len(housing['feature_names']))

<class 'sklearn.utils.Bunch'>
California housing dataset.

The original database is available from StatLib

    http://lib.stat.cmu.edu/datasets/

The data contains 20,640 observations on 9 variables.

This dataset contains the average house value as target variable
and the following input variables (features): average income,
housing average age, average rooms, average bedrooms, population,
average occupation, latitude, and longitude in that order.

References
----------

Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,
Statistics and Probability Letters, 33 (1997) 291-297.


8


In [7]:
# We can use numpy arrays to work in TensorFlow. Recall the deterministic linear regression solution from above.
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m,1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name = "X")
y = tf.constant(housing.target.reshape(-1,1), dtype = tf.float32, name = "y")
XT = tf.transpose(X)
coefficients = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT),y)

with tf.Session() as sess:
    theta_value = coefficients.eval()
    print(theta_value)
    print(len(theta_value))
sess.close()
print(n)
print(housing_data_plus_bias.shape, housing.data.shape)

[[-3.7185181e+01]
 [ 4.3633747e-01]
 [ 9.3952334e-03]
 [-1.0711310e-01]
 [ 6.4479220e-01]
 [-4.0338000e-06]
 [-3.7813708e-03]
 [-4.2348403e-01]
 [-4.3721911e-01]]
9
8
(20640, 9) (20640, 8)


_Checks for understanding:_
* Why do we have to run tf.Session() as sess? Is it a regular session or an interactive session?
* What happens when we run coefficients.eval()? Describe the order in which the variables are evaluated.
* How do we interpret theta_value?
* What is a limitation of running linear regression in this way?
* Does the linear regression result contain a constant (y = aX + b) or not (y = aX)? Explain how.

#### Gradient descent

We can also solve linear regression manually with gradient descent, whose calculations are optimized for speed when using TensorFlow. Before we begin linear regression with gradient descent, there are a few functions in TensorFlow to learn:
* random_uniform(): creates a node in the default graph with random values in a matrix, given a shape and value range
* assign(): assigns a new value to a variable, i.e. updating a variable versus declaring the value of a new variable node

Recall the pseudocode describing what gradient descent does.

$coefficients = random\_uniform()\\
for \ n \ iterations:\\
 \  \  \ current\_cost = mse(coefficients)\\
 \  \  \ if \ current\_cost < tolerance:\\
 \  \  \  \  \  \ break\\
 \  \  \ coefficients = coefficients - learning\_rate*gradient\\
return \ coefficients$
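
The pseudocode can be sketched directly in NumPy before moving to TensorFlow. Everything specific in this sketch is an assumption chosen for illustration: the made-up data following y = 1 + 2x, the seed, the learning rate of 0.1, and the tolerance of 1e-10.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Made-up data that follows y = 1 + 2x exactly, with a bias column of ones
m = 50
x = np.linspace(-1.0, 1.0, m).reshape(-1, 1)
X = np.c_[np.ones((m, 1)), x]
y = 1.0 + 2.0 * x

# Illustrative settings, not tuned values
n_iterations = 1000
learning_rate = 0.1
tolerance = 1e-10

# Start from random coefficients drawn uniformly from [-1, 1)
coefficients = rng.uniform(-1.0, 1.0, size=(2, 1))

for iteration in range(n_iterations):
    error = X @ coefficients - y
    current_cost = np.mean(error ** 2)  # mean squared error
    if current_cost < tolerance:
        break                           # close enough, stop early
    gradient = 2.0 / m * X.T @ error    # gradient of the MSE
    coefficients = coefficients - learning_rate * gradient

print(iteration, current_cost)
print(coefficients)  # should be close to the intercept 1 and the slope 2
```

The TensorFlow version below follows the same steps, except that it runs for a fixed number of epochs rather than stopping at a tolerance.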

In [8]:
# Set some constants for the gradient descent algorithm
n_epochs = 1000
learning_rate = 0.01

# Scale your data for gradient descent. We use the StandardScaler library
m, n = housing.data.shape
scaler = StandardScaler()
scaler.fit(housing.data)
scaled_data = scaler.transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m,1)), scaled_data]

# Set your input data as data matrices
X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name = "X")
y = tf.constant(housing.target.reshape(-1,1), dtype = tf.float32, name = "y")

# Set the answer to a random matrix of coefficients using random_uniform
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name = "theta")

# Calculate the current_cost function value
y_pred = tf.matmul(X, theta, name = "predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name = "mse")

# Calculate the gradient (a multi-variable derivative) and the new theta value
# See https://spin.atomicobject.com/wp-content/uploads/linear_regression_gradient1.png to understand the math behind the gradient
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate*gradients)

# This is a shortcut in TensorFlow to initialize all variables at once
init = tf.global_variables_initializer()

# Evaluate the session
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Iteration:", epoch, "MSE = ", mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()
    print(best_theta)
sess.close()

Iteration: 0 MSE =  7.469492
Iteration: 100 MSE =  1.0072225
Iteration: 200 MSE =  0.82173306
Iteration: 300 MSE =  0.7433307
Iteration: 400 MSE =  0.68697
Iteration: 500 MSE =  0.64556336
Iteration: 600 MSE =  0.61504304
Iteration: 700 MSE =  0.59248376
Iteration: 800 MSE =  0.57575727
Iteration: 900 MSE =  0.56331325
[[ 2.0685523 ]
 [ 0.87307703]
 [ 0.17952992]
 [-0.24919613]
 [ 0.2489019 ]
 [ 0.01704344]
 [-0.04539434]
 [-0.41602504]
 [-0.38697636]]


_Checks for understanding:_
* Why is it important to scale your data when using gradient descent? What will happen if you do not do so?
* How do we interpret the coefficients in this instance, and why are they different from before?
* Why is theta.eval() written after the for loop for calculating training_op?
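
As a starting point for the first question, here is a sketch of what standardization does to the data. It uses plain NumPy on made-up numbers (an assumption for illustration) and reproduces by hand what StandardScaler does: subtract each column's mean, then divide by that column's standard deviation.

```python
import numpy as np

# Made-up data: two features on very different scales
data = np.array([[1.0, 1000.0],
                 [2.0, 3000.0],
                 [3.0, 5000.0]])

# Standardize each column: subtract its mean, then divide by its standard deviation
mean = data.mean(axis=0)
std = data.std(axis=0)
scaled = (data - mean) / std

print(scaled)
print(scaled.mean(axis=0))  # approximately 0 for every column
print(scaled.std(axis=0))   # approximately 1 for every column
```

After scaling, both features cover a similar range, so a single learning rate suits every coefficient.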

Note that in the above example we calculate the gradients by hand, using the formula for the partial derivatives of the MSE, and write the update step for theta ourselves. TensorFlow has a built-in gradient function as well as a built-in optimizer framework that replace these manual calculations:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate)<br>
training_op = optimizer.minimize(mse)
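
For reference, the gradient formula written by hand in the gradient descent cell follows from the definition of the MSE. Here is a short derivation, writing $m$ for the number of rows of $X$ and $\theta$ for the coefficient vector, as in the code:

$$\mathrm{MSE}(\theta) = \frac{1}{m}(X\theta - y)^T(X\theta - y)$$
$$\nabla_\theta \mathrm{MSE}(\theta) = \frac{2}{m}X^T(X\theta - y)$$

The second line is what gradients = 2/m * tf.matmul(tf.transpose(X), error) computes, with error = y_pred - y.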

## Saving and visualizing models
TensorFlow provides an easy way to save and visualize models. As an example, we will save the gradient descent linear regression model below (updated with the built-in optimizer mentioned above). We only need to create a saver, save the session to a checkpoint, and later restore it:

saver = tf.train.Saver() <br>
save_path = saver.save(sess, '/tmp/gradient_descent_sample.ckpt')<br>
saver.restore(sess, '/tmp/gradient_descent_sample.ckpt')

In [9]:
# Set some constants for the gradient descent algorithm
n_epochs = 1000
learning_rate = 0.01

# Scale your data for gradient descent. We use the StandardScaler library
m, n = housing.data.shape
scaler = StandardScaler()
scaler.fit(housing.data)
scaled_data = scaler.transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m,1)), scaled_data]

# Set your input data as data matrices
X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name = "X")
y = tf.constant(housing.target.reshape(-1,1), dtype = tf.float32, name = "y")

# Set the answer to a random matrix of coefficients using random_uniform
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name = "theta")

# Calculate the current_cost function value
y_pred = tf.matmul(X, theta, name = "predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name = "mse")

# Let the built-in optimizer calculate the gradient (a multi-variable derivative) and the new theta value
# See https://spin.atomicobject.com/wp-content/uploads/linear_regression_gradient1.png to understand the math behind the gradient
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate)
training_op = optimizer.minimize(mse)

# This is a shortcut in TensorFlow to initialize all variables at once
init = tf.global_variables_initializer()

# Initialize the saver
saver = tf.train.Saver()

# Evaluate the session
with tf.Session() as sess:
    sess.run(init)
    # This runs a single training step; loop over n_epochs as in the previous cell to train fully
    sess.run(training_op)
    best_theta = theta.eval()
    print(best_theta)
    save_path = saver.save(sess, '/tmp/gradient_descent_sample.ckpt') # Add a save path for your model
sess.close()

[[-0.6959382 ]
 [ 0.36700886]
 [-0.53080046]
 [ 0.17899923]
 [-0.21365108]
 [ 0.87984574]
 [-0.75590307]
 [-0.57794285]
 [-0.02671256]]


In [10]:
# To load the saved model, restore the checkpoint into a new session and evaluate theta,
# whose value was saved at the end of the previous session

with tf.Session() as sess:
    saver.restore(sess, '/tmp/gradient_descent_sample.ckpt')
    print(theta.eval())
sess.close()

INFO:tensorflow:Restoring parameters from /tmp/gradient_descent_sample.ckpt
[[-0.6959382 ]
 [ 0.36700886]
 [-0.53080046]
 [ 0.17899923]
 [-0.21365108]
 [ 0.87984574]
 [-0.75590307]
 [-0.57794285]
 [-0.02671256]]


Lastly, we can visualize our models using TensorBoard, TensorFlow's visualization web server. This involves choosing a log directory and writing a summary of the model to it. Once the data has been written, you can start TensorBoard and open localhost:6006 in a browser. Note that 6006 is "goog" upside down.

In [12]:
# Recreate model while writing to a logdir

# Set up the logdir location
now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

# Set some constants for the gradient descent algorithm
n_epochs = 1000
learning_rate = 0.01

# Scale your data for gradient descent. We use the StandardScaler library
m, n = housing.data.shape
scaler = StandardScaler()
scaler.fit(housing.data)
scaled_data = scaler.transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m,1)), scaled_data]

# Set your input data as data matrices
X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name = "X")
y = tf.constant(housing.target.reshape(-1,1), dtype = tf.float32, name = "y")

# Set the answer to a random matrix of coefficients using random_uniform
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0), name = "theta")

# Calculate the current_cost function value
y_pred = tf.matmul(X, theta, name = "predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name = "mse")

# Let the built-in optimizer calculate the gradient (a multi-variable derivative) and the new theta value
# See https://spin.atomicobject.com/wp-content/uploads/linear_regression_gradient1.png to understand the math behind the gradient
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate)
training_op = optimizer.minimize(mse)

# This is a shortcut in TensorFlow to initialize all variables at once
init = tf.global_variables_initializer()

# Initialize the saver
saver = tf.train.Saver()

# Evaluate the session
with tf.Session() as sess:
    sess.run(init)
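    # This runs a single training step; loop over n_epochs as in the gradient descent cell to train fully
    sess.run(training_op)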
    best_theta = theta.eval()
    print(best_theta)
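    mse_summary = tf.summary.scalar('MSE', mse)  # summary node for the MSE (defined here, not evaluated)
    file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())  # writes the graph definition to logdir
    file_writer.close()  # flush the event file to disk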
sess.close()

[[ 0.5264683 ]
 [ 0.46899325]
 [-0.90169144]
 [ 0.7137366 ]
 [ 0.04586203]
 [ 0.34158915]
 [-0.01591481]
 [-0.8304522 ]
 [-0.7211892 ]]


In [None]:
# Start TensorBoard with this command. You can also run it (without the leading !) in your command prompt
!tensorboard --logdir tf_logs/

Sources: Aurélien Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow (O'Reilly)