# Part 0: Setting Up TensorFlow
By: Dylan Slack

The goal of this notebook is to provide an easy-to-use guide to the basics of TensorFlow (TF).  It corresponds to the slides posted along with this repo.  Follow along and learn! :) 



## Installing TensorFlow
## (TODO: Before launching this notebook)
Installing TF can be a trying process in its own right (ask Emile), particularly when trying to get it to run on the GPU.  These instructions should work for the lab machines without setting up for GPU use.  If you're trying to install on your own computer, hopefully they work as well.

---
##### GPU Aside
If you're wondering why we might want the GPU: GPUs allow us to perform lots of computations concurrently over many simple cores, whereas CPUs have a few complex cores.  If our computations are simple enough (which they are in many cases in deep learning, think matrix multiplication), we can let a GPU perform them in parallel.  This saves us *a lot* of time when we're building large models.

---

We're going to install TF in a virtual environment.  Virtual environments allow us to create different sets of dependencies for different projects; the good news is that if we screw something up trying to install TF in our virtual environment, it shouldn't mess anything else up!

First, set up the virtual environment:

``` conda create -n tf_tutorial ```

Because the environment was created by name with `-n`, you can activate it from any directory. Now:

``` source activate tf_tutorial ```

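Install pip, TF, a few supporting packages (matplotlib, pandas, scikit-learn), and `nb_conda`, which helps us manage different Jupyter notebook kernels.  This could take a while.  Note that the code in this notebook uses the TensorFlow 1.x API (for example `tf.Session` and `tf.placeholder`), so it will not run unmodified on a TensorFlow 2.x install: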

``` conda install pip```

``` pip install tensorflow ```

``` pip install matplotlib ```

``` pip install pandas ```

``` pip install scikit-learn ```

```conda install nb_conda```

Finally, load the notebook:

```jupyter notebook```

Navigate to this notebook and load it up!  Hopefully it works!
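
If you want a quick sanity check that the installs above landed in the active environment, a minimal sketch like the following can help. It only uses the standard library, and the helper name `find_missing` is made up for illustration. Note that scikit-learn is imported as `sklearn`, and numpy is assumed to arrive as a dependency of the other packages.

```python
import importlib.util

# Import names of the packages installed above.
# scikit-learn is imported as "sklearn"; numpy is assumed to come in as a dependency.
REQUIRED_MODULES = ["tensorflow", "numpy", "pandas", "matplotlib", "sklearn"]


def find_missing(modules):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages found.")
```

If anything is reported missing, check that `tf_tutorial` is the active environment before reinstalling.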

# Part 1: TensorFlow Basics

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
tf.logging.set_verbosity(tf.logging.ERROR)

In [2]:
##
# Slide 7
##

In [3]:
## Simple graph example 
x = 4
y = 2
add = tf.add(x,y)
mul = tf.multiply(x,y)
output_1 = tf.multiply(add,mul)
output_2 = tf.pow(add, y)

In [4]:
## What happens if we just print the tensor
print(output_1)

Tensor("Mul_1:0", shape=(), dtype=int32)


In [5]:
## But, what if we include a session
with tf.Session() as sess:
    correct_output = sess.run(output_1)
    print (correct_output)

48


In [6]:
# Further, we can also evaluate output_2
with tf.Session() as sess:
    second_output = sess.run(output_2)
    print (second_output)
    
# Starting to get the idea?

36


In [7]:
# Also, this works
with tf.Session() as sess:
    one,two = sess.run([output_1, output_2])
    print ("One:",one,"Two:",two)

One: 48 Two: 36
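
To make the "build the graph first, run it later" idea concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented; the `Node` class and `run` function are hypothetical stand-ins for tensors and `sess.run`. It only shows why printing a node gives a description while running it gives a value.

```python
class Node:
    """A toy graph node: stores an operation and its inputs, computes nothing yet."""

    def __init__(self, name, fn, *inputs):
        self.name = name
        self.fn = fn
        self.inputs = inputs

    def __repr__(self):
        # Like printing a TF tensor: a description, not a value.
        return f"Node({self.name!r})"


def run(node):
    """Evaluate a node by recursively evaluating its inputs (the 'session' step)."""
    if not isinstance(node, Node):
        return node  # plain Python numbers act as constants
    args = [run(i) for i in node.inputs]
    return node.fn(*args)


x, y = 4, 2
add = Node("Add", lambda a, b: a + b, x, y)
mul = Node("Mul", lambda a, b: a * b, x, y)
output_1 = Node("Mul_1", lambda a, b: a * b, add, mul)
output_2 = Node("Pow", lambda a, b: a ** b, add, y)

print(output_1)                      # Node('Mul_1')
print(run(output_1), run(output_2))  # 48 36
```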


In [8]:
##
# Slide 8
##

In [9]:
"""
 * There are many such operations we can do; here we multiply [3,4] by the identity matrix.
 * We introduce constants, which we can name
 * We want to name them because we can visualize them using tensorboard -- a **really** useful graph visualization
   tool
"""

m_1 = tf.constant([3,4], name="hello")
m_2 = tf.constant([[1,0],[0,1]], name="tensorflow")
r = tf.multiply(m_1,m_2, name="multiplication")

# We make our tensorboard call here. Open http://localhost:6006/#graphs&run=. to see the visualization
# Run tensorboard --logdir=graphs from the home directory of this project
writer = tf.summary.FileWriter('./graphs', tf.get_default_graph())

with tf.Session() as sess:
    print (sess.run(r))
writer.close()

[[3 0]
 [0 4]]
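
The output above is a 2x2 matrix rather than the vector [3, 4] because `tf.multiply` is an elementwise product with broadcasting, not a matrix product (that would be `tf.matmul`). The plain-Python sketch below contrasts the two; both helper functions are written here for illustration only.

```python
def elementwise_multiply(vec, mat):
    """Multiply each row of `mat` elementwise by `vec` (broadcasting `vec` over rows)."""
    return [[v * m for v, m in zip(vec, row)] for row in mat]


def vector_matrix_product(vec, mat):
    """Ordinary vector-matrix product: vec (length n) times mat (n x k)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


vec = [3, 4]
identity = [[1, 0], [0, 1]]
print(elementwise_multiply(vec, identity))   # [[3, 0], [0, 4]]
print(vector_matrix_product(vec, identity))  # [3, 4]
```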


In [10]:
"""
 * Constants are bad because they're hardcoded into the definition of the graph
 * Let's see what that means
"""
arr = [2.0,3.0]

bad_constant = tf.constant(arr)
with tf.Session() as sess:
        # Uncomment this print statement to see the graph definition
        # print (sess.graph.as_graph_def())
        pass
    

# This starts to get out of hand for really large constants

In [11]:
## 
# Slide 9
##

In [12]:
"""
 * Variables maintain the state of the graph across calls to run
 * Unlike constants they must be initialized
 
NOTE: if you try to run this cell again, it will fail because a variable
      with the same name already exists.  Hit the >> button on
      the toolbar up top to resolve the issue!!
"""

# This still suffers from the variable loading problem :O
bad_var_1 = tf.Variable(2, name="scalar_example")

# This is a better way to do things
var_1 = tf.get_variable("scalar_example", initializer=tf.constant(2.0)) 
var_2 = tf.get_variable("array_example", initializer=tf.constant([1.0,0.0]))

# This just gives a 3x3 matrix with random pulls from a normal distribution
var_3 = tf.get_variable("other_array", (3,3),
                        initializer=tf.random_normal_initializer())

out_1 = tf.add(var_1, var_3)
out_2 = tf.multiply(var_1, var_2)

# We need to initialize these variables and do so as such
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print (sess.run(out_1))

[[3.2048893 2.6847842 2.7117345]
 [2.0592587 1.1133807 1.8628153]
 [2.4075432 2.2154744 2.8005471]]


In [13]:
# Here's why they're variables

the_output = var_1.assign(2.0 * var_1)
with tf.Session() as sess:
    sess.run(var_1.initializer)
    print (sess.run(the_output))
    print (sess.run(the_output))
    print (sess.run(the_output))
    
# var_1 retains its value across session runs

4.0
8.0
16.0


In [14]:
##
# Slide 10
## 

In [15]:
"""
* Variables can have different values across different sessions!
"""

var_1 = tf.get_variable("sessions_example", initializer=tf.constant(5))

session_1 = tf.Session()
session_2 = tf.Session()

session_1.run(var_1.initializer)
session_2.run(var_1.initializer)

option_1 = var_1.assign(2 * var_1)
other_option = var_1.assign(100 * var_1)
print (session_1.run(option_1))
print (session_2.run(other_option))

# We need to close both of these sessions because we didn't use "with" here
# "with" automatically closes the session once execution leaves the with block
session_1.close() 
session_2.close()

10
500


In [16]:
"""
* Aside on shape.  If you're familiar with matrix operations with packages like numpy, skip this
* It might be the case that readers don't have a good sense of how computations using matrices
  are handled by matrix computation packages like TF but also popular packages like numpy and
  pandas
* Here we discuss this briefly
"""

# Suppose we have the scalar value 
a = 3 

# We use a package called numpy that handles matrices really well to convert it to a matrix
a_arr = np.array(3)
print (a)
print (a_arr.shape)
print ("---")

# We see that it has no shape. Consider next:
b = [3,3]
b_arr = np.array(b)
print (b)
print (b_arr.shape)
print ('---')

# The shape here is (2,) to reflect that there is one dimension with two values. Further:
c = [[1,2],[3,4]]
c_arr = np.array(c)
print (c)
print (c_arr.shape)
print ('---')

# Now the shape is (2,2) to reflect 2 values across two dimensions
# We can create larger arrays with more dimensions

d = np.random.rand(5,5,5,5)
print (d.shape)

# This array has four dimensions with 5 values of random numbers on the range of 0 - 1 

3
()
---
[3, 3]
(2,)
---
[[1, 2], [3, 4]]
(2, 2)
---
(5, 5, 5, 5)


In [17]:
##
# Slide 11
## 

In [18]:
"""
* We now introduce placeholders
* Placeholders define what we want our data to look like.  We can include placeholders in 
  our graph in locations where we want to feed in data later on.
"""

# The shape of our expected input is [None, 3], meaning we accept any value in the dimension with None
# and expect there to be three values in the second dimension
in_arr = tf.placeholder(tf.float32, shape=[None,3])

multiplier = tf.constant([1,2,3], tf.float32)

out = tf.multiply(in_arr, multiplier)

with tf.Session() as sess:
    # We include our desired input as a "feed dict"
    print (sess.run(out, feed_dict={in_arr: [[1,0,0], [0,1,0], [0,0,1]]}))
    
# We compute the array [1,2,3] times the identity matrix in this case 

[[1. 0. 0.]
 [0. 2. 0.]
 [0. 0. 3.]]


In [19]:
# Here's another example

in_one = tf.placeholder(tf.float32, shape=3)
in_two = tf.placeholder(tf.float32, shape=[])

out = tf.multiply(tf.reduce_sum(in_one),tf.add(in_two, in_two))

with tf.Session() as sess:
    print (sess.run(out, feed_dict={in_one:[2,2,2], in_two:5}))

60.0


In [20]:
"""
Aside: a really bad coding practice in TF, don't do this

You have to remember that tensorflow builds graph edges on calls like tf.add, tf.subtract, tf.multiply...
If you loop over these calls, you'll just end up adding more edges to the graph
"""

x = tf.Variable(1, name="bad_one")
y = tf.Variable(2, name="bad_two")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(5):
        sess.run(tf.add(x,y)) # Could you save lines? Nope!
writer.close()

Here, the user clearly wants one operation for add.  However, a new operation is created on each call to tf.add,
so another operation is added to the graph on every iteration. This is bad because the size of the graph could blow up in 
your face.
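
The usual remedy is to create the operation once, outside the loop, and only run it inside the loop. The sketch below illustrates the difference with a toy stand-in for a graph. `ToyGraph` and `run` are hypothetical names, not TensorFlow APIs, and the example only counts how many operations get created.

```python
class ToyGraph:
    """A stand-in for a TF graph that only records which ops were created."""

    def __init__(self):
        self.ops = []

    def add(self, a, b):
        # Every call creates a new op, just as each tf.add call does.
        op = ("add", a, b)
        self.ops.append(op)
        return op


def run(op):
    """Evaluate an ("add", a, b) op; stands in for sess.run."""
    _, a, b = op
    return a + b


# Anti-pattern: the op is created inside the loop.
bad_graph = ToyGraph()
for _ in range(5):
    run(bad_graph.add(1, 2))

# Better: create the op once, then run it as often as needed.
good_graph = ToyGraph()
total = good_graph.add(1, 2)
for _ in range(5):
    run(total)

print(len(bad_graph.ops), len(good_graph.ops))  # 5 1
```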

# Part 2: Building a Linear Regression in TensorFlow

In [21]:
##
# Slide 13 Onward
##
from IPython.display import Math

### Step 1: Load the data
First, we're going to load the data.  I've already created a train-test split for the Boston housing dataset.  The Boston housing dataset provides housing values in the suburbs of Boston in relation to features like crime, average rooms, age, tax, etc. All we have to do is load the data using pandas. 

In [22]:
train_data_path = "./data/train_boston_housing.csv"
test_data_path = "./data/test_boston_housing.csv"

# This is the label of the target variable
# It's the median value of owner-occupied homes in $1,000s
LABEL_COLUMN = "medv"
EPOCHS = 15
LEARNING_RATE = 0.001

train_data = pd.read_csv(train_data_path)

"""
If you're interested, you can uncomment the print statement to display the
training data
"""
# print (train_data)

train_data_y = train_data[LABEL_COLUMN].values
train_data_y = train_data_y.reshape(train_data_y.shape[0],1)
train_data_X = train_data.drop([LABEL_COLUMN], axis=1).values

test_data = pd.read_csv(test_data_path)
test_data_y = test_data[LABEL_COLUMN].values
test_data_y = test_data_y.reshape(test_data_y.shape[0],1)
test_data_X = test_data.drop([LABEL_COLUMN], axis=1).values

### Step 2: Preprocess the data
Next, we standardize features by removing the mean and scaling to unit variance.  Specifically, we compute z = (x - u)/s where x is the training data, u is the mean of the training data, s is the standard deviation of the training data, and z is the preprocessed training data.

This is important to do because differences in the feature ranges can cause issues with fitting our model.  Rescaling every feature to mean 0 and unit variance puts the features on a comparable scale, which makes gradient descent behave better.

Importantly, we apply the same u and s computed from the training data to the testing data.  If we computed them from the testing data instead, we would be "looking ahead" at our testing data, which is a no-no.
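
As a minimal sketch of the idea, the snippet below standardizes a single made-up feature using only the standard library. The helper names and the data are hypothetical. It uses the population standard deviation (`pstdev`), which, as far as I know, matches what `StandardScaler` computes.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Return (mean, standard deviation) estimated from the training values only."""
    return fmean(values), pstdev(values)


def transform(values, mean, std):
    """Apply z = (x - u) / s using previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical single-feature data, for illustration only.
train_feature = [2.0, 4.0, 6.0, 8.0]
test_feature = [5.0, 10.0]

u, s = fit_scaler(train_feature)          # fitted on training data only
train_z = transform(train_feature, u, s)
test_z = transform(test_feature, u, s)    # reuses the training u and s

print(u, s)
print(train_z)
print(test_z)
```

After the transform, the training feature has mean 0 and unit variance. The test feature generally does not, and that is expected.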

In [23]:
from sklearn.preprocessing import StandardScaler

scal = StandardScaler()
scal.fit(train_data_X)

train_data_X = scal.transform(train_data_X)
test_data_X = scal.transform(test_data_X)
NUM_SAMPLES = train_data_X.shape[0]
NUM_FEATURES = train_data_X.shape[1]

### Step 3: Set up the linear regression graph and optimizer
I give the code first, then work through an explanation.  There's not a lot of code but a good amount of theory here. 

In [24]:
# (1)
X = tf.placeholder(tf.float32,shape=[None,NUM_FEATURES])
y = tf.placeholder(tf.float32,shape=[None,1])

# (2)
weights = tf.get_variable('weights',shape=[NUM_FEATURES, 1],
                          initializer=tf.random_normal_initializer())
bias = tf.get_variable('bias',shape=[1],
                       initializer=tf.random_normal_initializer())

# (3)
mat_out = tf.matmul(X, weights)
y_pred = tf.add(mat_out, bias)

# (4)
loss = tf.reduce_sum(tf.square(y - y_pred)) / 2

optimizer = tf.train.GradientDescentOptimizer(learning_rate=LEARNING_RATE).\
    minimize(loss) #<< SO easy!!!!

(1) We're now ready to set up the graph of our linear regression.  We let our regression accept inputs with the dimensions [None, NUM_FEATURES] because we want to allow for computation across any number of inputs at once.  The same rationale holds true for y.

(2) We next initialize the weights and a bias using our variable initialization patterns from before.  These are our trainable parameters that we're going to optimize.

(3) Next, we can set up the computation for the outputs of our regression.  We define the output of our regression as follows: $y_{pred} = X \cdot \theta + b$

Here, $\theta$ represents our learned weights and $b$ is a bias term.  

Let's think about the dimensionality for a second and why this works.  Consider if we had 20 training points X each with 10 features.  The dimensionality of X would be (20,10) representing 20 rows of training points stacked on top of one another each with 10 features.    

$\theta$ would have 10 weights (one for each feature).  By performing a matrix multiplication with dimensions $(20,10)\cdot (10,1)$ we can get an output of $(20,1)$, which is the weights applied to each row.
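
The shape bookkeeping can be checked with a small plain-Python sketch. The `matmul` and `shape` helpers and the numbers are made up for illustration; in the notebook itself `tf.matmul` does this work.

```python
def matmul(A, B):
    """Multiply an (n, k) matrix by a (k, m) matrix, both given as nested lists."""
    assert len(A[0]) == len(B), "inner dimensions must match"
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def shape(M):
    """Return (rows, columns) of a nested-list matrix."""
    return (len(M), len(M[0]))


# 20 training points with 10 features each, and one weight per feature.
X = [[float(i + j) for j in range(10)] for i in range(20)]
theta = [[0.1] for _ in range(10)]
b = 0.5

# The single bias value is added to every row of the (20, 1) product.
y_pred = [[row[0] + b] for row in matmul(X, theta)]
print(shape(X), shape(theta), shape(y_pred))  # (20, 10) (10, 1) (20, 1)
```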

(4) We can now define our loss. The loss is a function that measures how far our predictions are from the targets; lower is better.  The loss we are using here is $$ \mathcal{L}_\theta = \frac{(y^{(i)} - y_{pred}^{(i)})^2}{2}  $$  

For simplicity, the training loop in this notebook feeds one example at a time, so the loss is computed over a single input in this implementation.  However, a more generalized version of the loss would look like: $$ \mathcal{L}_\theta = \frac{1}{2n} \sum\limits_{i=1}^{n}(y^{(i)} - y_{pred}^{(i)})^2$$
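
Both forms of the loss are easy to compute by hand. The sketch below uses made-up targets and predictions, and the function names are hypothetical.

```python
def example_loss(y, y_pred):
    """Squared-error loss for one example: (y - y_pred)^2 / 2."""
    return (y - y_pred) ** 2 / 2


def mean_loss(ys, y_preds):
    """Generalized loss: (1 / 2n) * sum of squared errors over n examples."""
    n = len(ys)
    return sum((y - p) ** 2 for y, p in zip(ys, y_preds)) / (2 * n)


ys = [3.0, 5.0, 7.0]        # hypothetical targets
y_preds = [2.0, 5.0, 10.0]  # hypothetical predictions

print(example_loss(ys[0], y_preds[0]))  # 0.5
print(mean_loss(ys, y_preds))
```

The generalized loss is simply the average of the per-example losses.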

Lastly, we can set up an optimizer called the GradientDescentOptimizer to run the gradient descent algorithm over our variables. For the single-example loss above, $\nabla_\theta \mathcal{L}_\theta = -(y^{(i)} - y_{pred}^{(i)})x^{(i)}$, so gradient descent iteratively computes: 
\begin{equation}
\begin{split}
\theta_{i+1} & = \theta_{i} - \alpha \nabla_\theta \mathcal{L}_\theta \\
 & =  \theta_{i} + \alpha(y^{(i)} - y_{pred}^{(i)})x^{(i)} 
\end{split}
\end{equation}
The intuition for why this works is that the loss function defines a goal: getting the predictions as close to the actual values as possible.  We take the gradient of our loss with respect to the current weights and then take a small "step" in the right direction.  The right direction is the opposite of the gradient because we want to minimize the loss (thus the subtraction).  We use $\alpha$ as a multiplier to limit how far of a step we take.  We refer to $\alpha$ as the learning rate.  Smaller learning rates mean smaller steps.
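
Here is a sketch of a single gradient descent step on one example, for the loss $(y - y_{pred})^2 / 2$, written in plain Python. The example values and function names are made up. TensorFlow derives the gradient automatically, whereas this sketch writes it out by hand.

```python
def predict(x, theta, b):
    """Linear prediction: dot(x, theta) + b."""
    return sum(xi * ti for xi, ti in zip(x, theta)) + b


def loss(x, y, theta, b):
    """Single-example squared-error loss."""
    return (y - predict(x, theta, b)) ** 2 / 2


def sgd_step(x, y, theta, b, alpha):
    """One gradient descent update on a single example."""
    error = y - predict(x, theta, b)
    # d(loss)/d(theta_j) = -error * x_j and d(loss)/db = -error,
    # so stepping against the gradient adds alpha * error * x_j.
    new_theta = [tj + alpha * error * xj for tj, xj in zip(theta, x)]
    new_b = b + alpha * error
    return new_theta, new_b


x, y = [1.0, 2.0], 4.0          # hypothetical example
theta, b, alpha = [0.0, 0.0], 0.0, 0.1

before = loss(x, y, theta, b)
theta, b = sgd_step(x, y, theta, b, alpha)
after = loss(x, y, theta, b)
print(before, after)
```

The loss on this example drops after the step, which is the behavior the learning rate is meant to control.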

It's amazing that we can do all this with such a simple call in TensorFlow!

### Step 4: The Learning Loop
We're finally ready to set up the learning loop.  The learning loop iteratively performs updates on $\theta$ using gradient descent.  We do this by calling sess.run() on the optimizer over multiple passes through all the training data.   We only include the loss in the sess.run() call so we can get its value for a printout. 

After we're done, we perform predictions on the test data set to evaluate how well we did.  

In [25]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(EPOCHS+1):
        accum_loss = 0
        
        # Iterate over the train data and labels
        for x_val, y_val in zip(train_data_X, train_data_y):
            _, l  = sess.run([optimizer, loss], 
                            feed_dict={X: [x_val], y: [y_val]})
            accum_loss += l
            
        print ("Epoch:",i,"Mean Loss:",accum_loss / NUM_SAMPLES)
            
    # Here are the weights and bias we learn
    print("Learned weights:\n",sess.run(weights))
    print("Learned bias:\n",sess.run(bias))
    
    # Show how well we did on our test set
    predicted_values = sess.run(y_pred, feed_dict={X: test_data_X})
    test_mse = sess.run(tf.reduce_mean(tf.square(predicted_values - test_data_y)))
    print ("Test MSE:",test_mse)

Epoch: 0 Mean Loss: 203.7207158579685
Epoch: 1 Mean Loss: 91.86812633962029
Epoch: 2 Mean Loss: 46.76892905055296
Epoch: 3 Mean Loss: 27.040234817608507
Epoch: 4 Mean Loss: 18.31549652841273
Epoch: 5 Mean Loss: 14.422666630707573
Epoch: 6 Mean Loss: 12.662005137636116
Epoch: 7 Mean Loss: 11.846505802636722
Epoch: 8 Mean Loss: 11.452822254092283
Epoch: 9 Mean Loss: 11.249580900348645
Epoch: 10 Mean Loss: 11.134139232822983
Epoch: 11 Mean Loss: 11.060740884882158
Epoch: 12 Mean Loss: 11.008747823848253
Epoch: 13 Mean Loss: 10.968701045546828
Epoch: 14 Mean Loss: 10.936096067328497
Epoch: 15 Mean Loss: 10.908657537852639
Learned weights:
 [[-0.6015169 ]
 [ 0.5940588 ]
 [ 0.06025974]
 [ 0.87217134]
 [-1.373715  ]
 [ 3.2051513 ]
 [ 0.04334285]
 [-2.126282  ]
 [ 1.4754201 ]
 [-1.0264769 ]
 [-1.9868598 ]
 [ 0.8629196 ]
 [-3.1517959 ]]
Learned bias:
 [22.523546]
Test MSE: 27.119338979078805
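
To see what the optimizer is doing inside the loop above, here is a plain-Python sketch of the same per-example training loop for a single feature. It is not the TensorFlow implementation. The `train` function, the synthetic data, and the hyperparameters are made up for illustration.

```python
import random


def train(xs, ys, epochs, alpha, seed=0):
    """Fit y ~ theta * x + b with per-example gradient descent.

    Returns (theta, b, losses), where losses holds the mean loss per epoch.
    """
    rng = random.Random(seed)
    theta, b = rng.gauss(0, 1), rng.gauss(0, 1)  # random initial parameters
    losses = []
    for _ in range(epochs):
        accum_loss = 0.0
        for x, y in zip(xs, ys):
            error = y - (theta * x + b)
            accum_loss += error ** 2 / 2
            # Step against the gradient of (error^2 / 2).
            theta += alpha * error * x
            b += alpha * error
        losses.append(accum_loss / len(xs))
    return theta, b, losses


# Hypothetical noise-free data from y = 2x + 1, with x already centered.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2 * x + 1 for x in xs]

theta, b, losses = train(xs, ys, epochs=200, alpha=0.05)
print("theta:", round(theta, 3), "b:", round(b, 3))
print("first / last mean loss:", losses[0], losses[-1])
```

As in the notebook, the mean loss is accumulated per example and reported once per epoch, so it reflects parameters that were still changing during that pass.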
