### Your name:

<pre>Craig Barbisan</pre>

### Collaborators:

Solutions sourced/adapted from: https://github.com/ageron/handson-ml/blob/master/09_up_and_running_with_tensorflow.ipynb</pre>


In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
# to make this notebook's output stable across runs
np.random.seed(123)

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12


### TensorFlow

Q1. When is a variable initialized? When is it destroyed?

Variables are initialized when you call their initializer. Variables are destroyed when the session ends.

In a distributed environment, variables are held in containers on the cluster and do not get destroyed when the session ends. To destroy a variable in this scenario, you need to clear its container.

Q2. What is the difference between a placeholder and a variable?

Placeholders hold information about the type and shape of the tensor they represent. They have no value. They are used to feed training or test data to TensorFlow during the execution phase.

A variable is an operation that holds a value. If you run the variable, it returns that value. Variables need to be initialized before they can be run. A variable's value can be changed, and it keeps the same value upon successive runs of the graph. 

Q3. How many times does reverse-mode autodiff need to traverse the graph in order to compute the gradients of the cost function with regards to 10 variables? What about forward-mode autodiff? And symbolic differentiation?

Reverse-mode autodiff needs to traverse the graph 2 times to compute the gradients of the cost function with regards to 10 variables.

Forward-mode autodiff needs to traverse the graph 10 times to compute the gradients of the cost function with regards to 10 variables.

Symbolic differentiation doesn't traverse the graph to compute gradients.

Q4. Implement Logistic Regression with Mini-batch Gradient Descent using TensorFlow. Train it and evaluate it on the moons dataset (introduced in Chapter 5). Try adding all the bells and whistles:

- Define the graph within a logistic_regression() function that can be reused easily.

- Save checkpoints using a Saver at regular intervals during training, and save the final model at the end of training.

- Restore the last checkpoint upon startup if training was interrupted.

- Define the graph using name scopes so the graph looks good in TensorBoard.

- Add summaries to visualize the learning curves in TensorBoard.

- Try tweaking some hyperparameters such as the learning rate or the mini-batch size and look at the shape of the learning curve.

In [2]:
from datetime import datetime

# constructs the graph for the logistic regression equation
def logistic_regression(X, y, initializer=None, seed=42, learning_rate=0.01):
    n_inputs_including_bias = int(X.get_shape()[1])
    with tf.name_scope("logistic_regression"):
        with tf.name_scope("model"):
            if initializer is None:
                initializer = tf.random_uniform([n_inputs_including_bias, 1], -1.0, 1.0, seed=seed)
            theta = tf.Variable(initializer, name="theta")
            logits = tf.matmul(X, theta, name="logits")
            y_proba = tf.sigmoid(logits)
        with tf.name_scope("train"):
            loss = tf.losses.log_loss(y, y_proba, scope="loss")
            optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
            training_op = optimizer.minimize(loss)
            loss_summary = tf.summary.scalar('log_loss', loss)
        with tf.name_scope("init"):
            init = tf.global_variables_initializer()
        with tf.name_scope("save"):
            saver = tf.train.Saver()
    return y_proba, loss, training_op, loss_summary, init, saver


def random_batch(X_train, y_train, batch_size):
    rnd_indices = np.random.randint(0, len(X_train), batch_size)
    X_batch = X_train[rnd_indices]
    y_batch = y_train[rnd_indices]
    return X_batch, y_batch

def log_dir(prefix=""):
    now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
    root_logdir = "tf_logs"
    if prefix:
        prefix += "-"
    name = prefix + "run-" + now
    return "{}/{}/".format(root_logdir, name)

In [3]:
from sklearn.datasets import make_moons
import os

m = 1000
X_moons, y_moons = make_moons(m, noise=0.1)

# add an extra bias feature
X_moons_with_bias = np.c_[np.ones((m, 1)), X_moons]

# reshape y_train to make it a column vector
y_moons_column_vector = y_moons.reshape(-1, 1)

# split the data into training and test
test_ratio = 0.2
test_size = int(m * test_ratio)
X_train = X_moons_with_bias[:-test_size]
X_test = X_moons_with_bias[-test_size:]
y_train = y_moons_column_vector[:-test_size]
y_test = y_moons_column_vector[-test_size:]

# reset the default graph
tf.reset_default_graph()

n_inputs = 2
logdir = log_dir("logreg")

X = tf.placeholder(tf.float32, shape=(None, n_inputs + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

y_proba, loss, training_op, loss_summary, init, saver = logistic_regression(X, y)

file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

n_epochs = 10001
batch_size = 50
n_batches = int(np.ceil(m / batch_size))

checkpoint_path = "/tmp/my_logreg_model.ckpt"
checkpoint_epoch_path = checkpoint_path + ".epoch"
final_model_path = "./my_logreg_model"

with tf.Session() as sess:
    if os.path.isfile(checkpoint_epoch_path):
        # if the checkpoint file exists, restore the model and load the epoch number
        with open(checkpoint_epoch_path, "rb") as f:
            start_epoch = int(f.read())
        print("Training was interrupted. Continuing at epoch", start_epoch)
        saver.restore(sess, checkpoint_path)
    else:
        start_epoch = 0
        sess.run(init)

    for epoch in range(start_epoch, n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = random_batch(X_train, y_train, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        loss_val, summary_str = sess.run([loss, loss_summary], feed_dict={X: X_test, y: y_test})
        file_writer.add_summary(summary_str, epoch)
        if epoch % 500 == 0:
            print("Epoch:", epoch, "\tLoss:", loss_val)
            saver.save(sess, checkpoint_path)
            with open(checkpoint_epoch_path, "wb") as f:
                f.write(b"%d" % (epoch + 1))

    saver.save(sess, final_model_path)
    y_proba_val = y_proba.eval(feed_dict={X: X_test, y: y_test})
    os.remove(checkpoint_epoch_path)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Epoch: 0 	Loss: 0.9290874
Epoch: 500 	Loss: 0.2795907
Epoch: 1000 	Loss: 0.27295217
Epoch: 1500 	Loss: 0.2734188
Epoch: 2000 	Loss: 0.27229783
Epoch: 2500 	Loss: 0.2730588
Epoch: 3000 	Loss: 0.27329552
Epoch: 3500 	Loss: 0.2730586
Epoch: 4000 	Loss: 0.27253303
Epoch: 4500 	Loss: 0.27305058
Epoch: 5000 	Loss: 0.2732856
Epoch: 5500 	Loss: 0.27387607
Epoch: 6000 	Loss: 0.27318084
Epoch: 6500 	Loss: 0.27417946
Epoch: 7000 	Loss: 0.2733306
Epoch: 7500 	Loss: 0.27319005
Epoch: 8000 	Loss: 0.27357602
Epoch: 8500 	Loss: 0.27382684
Epoch: 9000 	Loss: 0.2734192
Epoch: 9500 	Loss: 0.2732019
Epoch: 10000 	Loss: 0.27364567


Q5. Similar to the linear regression implementation in class, write a lasso regression implementation. Use the same dataset, and choose a value for the penalty $\alpha$:

Using a Saver at regular intervals during training, and save the final model at the end of training.

Restore the last checkpoint upon startup if training was interrupted.

Define the graph using name scopes so the graph looks good in TensorBoard.

Add summaries to visualize the learning curves in TensorBoard.

Try tweaking some hyperparameters such as the learning rate or the mini-batch size and look at the shape of the learning curve.

In [4]:
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m, n = housing.data.shape

scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

In [5]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01
alpha = 0.1

logdir = log_dir("lasso")
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

checkpoint_path = "/tmp/my_lasso_model.ckpt"
checkpoint_epoch_path = checkpoint_path + ".epoch"
final_model_path = "./my_lasso_model"

with tf.name_scope("lasso_regression"):
    with tf.name_scope("model"):
        X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
        y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
        theta = tf.Variable(tf.random_normal(shape=(n+1,1), seed=42, dtype=tf.float32), name="theta")
        y_pred = tf.matmul(X, theta, name="predictions")
    with tf.name_scope("train"):
        error = y_pred - y
        reg_parm = tf.multiply(alpha,theta)
        cost = tf.add(tf.reduce_mean(tf.square(error)), reg_parm, name="cost")
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
        training_op = optimizer.minimize(cost)
    with tf.name_scope("init"):
        init = tf.global_variables_initializer()
    with tf.name_scope("save"):
        saver = tf.train.Saver()

with tf.Session() as sess:
    if os.path.isfile(checkpoint_epoch_path):
        # if the checkpoint file exists, restore the model and load the epoch number
        with open(checkpoint_epoch_path, "rb") as f:
            start_epoch = int(f.read())
        print("Training was interrupted. Continuing at epoch", start_epoch)
        saver.restore(sess, checkpoint_path)
    else:
        start_epoch = 0
        sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "Cost =", cost.eval())
            sess.run([training_op])
        
        if epoch % 500 == 0:
            saver.save(sess, checkpoint_path)
            with open(checkpoint_epoch_path, "wb") as f:
                f.write(b"%d" % (epoch + 1))

    saver.save(sess, final_model_path)
    os.remove(checkpoint_epoch_path)

Epoch 0 Cost = [[17.993376]
 [18.007679]
 [17.953821]
 [18.023912]
 [17.932095]
 [17.938606]
 [18.142143]
 [18.159555]
 [17.875874]]
Epoch 100 Cost = [[10.381736]
 [10.369181]
 [10.304965]
 [10.380938]
 [10.28957 ]
 [10.298252]
 [10.467155]
 [10.452883]
 [10.270487]]
Epoch 200 Cost = [[6.466881 ]
 [6.4313293]
 [6.362198 ]
 [6.4393077]
 [6.3486576]
 [6.358741 ]
 [6.5001454]
 [6.4682975]
 [6.351897 ]]
Epoch 300 Cost = [[4.3335385]
 [4.278538 ]
 [4.2077947]
 [4.2831492]
 [4.193444 ]
 [4.2047086]
 [4.3238735]
 [4.283011 ]
 [4.2096534]]
Epoch 400 Cost = [[3.1024516]
 [3.0311635]
 [2.9607437]
 [3.0326743]
 [2.9441018]
 [2.9565592]
 [3.0574539]
 [3.012652 ]
 [2.9673905]]
Epoch 500 Cost = [[2.354402 ]
 [2.2695804]
 [2.2005563]
 [2.2681901]
 [2.1809044]
 [2.1946325]
 [2.280323 ]
 [2.2344854]
 [2.2075741]]
Epoch 600 Cost = [[1.8799244]
 [1.783933 ]
 [1.7168378]
 [1.7798089]
 [1.6939337]
 [1.7090003]
 [1.7819062]
 [1.7366002]
 [1.7216749]]
Epoch 700 Cost = [[1.5686082]
 [1.4634519]
 [1.3984885]
 