<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Linear-Regression-with-TensorFlow" data-toc-modified-id="Linear-Regression-with-TensorFlow-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Linear Regression with TensorFlow</a></span></li><li><span><a href="#TensorFlow-Graphs" data-toc-modified-id="TensorFlow-Graphs-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>TensorFlow Graphs</a></span></li><li><span><a href="#TensorFlow-Sessions" data-toc-modified-id="TensorFlow-Sessions-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>TensorFlow Sessions</a></span></li><li><span><a href="#Using-TensorBoard" data-toc-modified-id="Using-TensorBoard-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Using TensorBoard</a></span></li><li><span><a href="#Linear-Regression" data-toc-modified-id="Linear-Regression-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Linear Regression</a></span><ul class="toc-item"><li><span><a href="#Notation" data-toc-modified-id="Notation-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Notation</a></span></li><li><span><a href="#Importing-Data" data-toc-modified-id="Importing-Data-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Importing Data</a></span></li><li><span><a href="#Constructing-our-graph" data-toc-modified-id="Constructing-our-graph-5.3"><span class="toc-item-num">5.3&nbsp;&nbsp;</span>Constructing our graph</a></span></li><li><span><a href="#Creating-Summaries" data-toc-modified-id="Creating-Summaries-5.4"><span class="toc-item-num">5.4&nbsp;&nbsp;</span>Creating Summaries</a></span></li><li><span><a href="#Training-Session-for-Linear-Regression" data-toc-modified-id="Training-Session-for-Linear-Regression-5.5"><span class="toc-item-num">5.5&nbsp;&nbsp;</span>Training Session for Linear Regression</a></span></li><li><span><a href="#Evaluating-Model-Training-Using-TensorBoard" data-toc-modified-id="Evaluating-Model-Training-Using-TensorBoard-5.6"><span class="toc-item-num">5.6&nbsp;&nbsp;</span>Evaluating Model 
Training Using TensorBoard</a></span></li><li><span><a href="#Autodiff" data-toc-modified-id="Autodiff-5.7"><span class="toc-item-num">5.7&nbsp;&nbsp;</span>Autodiff</a></span></li></ul></li><li><span><a href="#Extensions" data-toc-modified-id="Extensions-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Extensions</a></span></li></ul></div>

# Linear Regression with TensorFlow
TensorFlow was built with deep learning in mind, but it is a flexible framework that can be used for many kinds of computational tasks. To emphasize this point, we will use TensorFlow for linear regression in the problems below. But first, let's start with a simpler computational graph...

*Please install the dependencies in the cell below!*

In [None]:
import tensorflow as tf
import numpy as np

# sklearn functions for preprocessing
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# TensorFlow Graphs
We will start off with a small graph as an example. We'll build the graph pictured below:

<img src="images/computational-graph.png" alt="Computational Graph" style="width: 300px;"/>

Let's begin by making the constant nodes, and the node that adds them together.

1. Use `tf.constant` to create a constant that holds the value 5. Save this constant to a variable, `const_5`. Make another constant called `const_2` that holds the value 2.
1. When you use `tf.constant`, you can pass a kwarg, `name`, to assign the node a name. Please name `const_5` and `const_2` (use the same names as you use for the Python variables!)
1. Use the normal addition operator, `+`, to create a node that adds `const_5` and `const_2`. Call this node `add`.

In [None]:
const_5 = ...
const_2 = ...
add = ...

All TensorFlow operations belong to a specific graph. In the cell above, we made three TensorFlow operations, but didn't explicitly add them to a graph. TensorFlow creates a default graph which is implicitly used when no other graph is specified. Every TensorFlow operation contains a reference to the graph it belongs to. If your code above is correct, the cell below should pass the assertions:

In [None]:
assert const_5.graph is tf.get_default_graph()
assert const_2.graph is tf.get_default_graph()
assert add.graph is tf.get_default_graph()

Constant nodes are used for numbers/tensors that will never change. TensorFlow gives us two main types of nodes for varying values:

1. `tf.placeholder`: A node that represents some user-provided input. Analogous to the input layer of a neural network. Frequently, placeholders are used to represent training data or test data in a model.
1. `tf.Variable`: A node that represents a value that can be reassigned/trained.

Below, create a placeholder named `u_input`, and store it to a variable of the same name. When you create a placeholder, you must specify what data type it is meant to hold. Set `dtype=tf.int32`.

Then, create another node named `mult`, which multiplies `u_input` and `add`.

In [None]:
u_input = ...
mult = ...

# TensorFlow Sessions

Now we've constructed the graph pictured above, but we haven't done any computation! To perform actual computational work in TensorFlow, we have to start a *Session*. Sessions are responsible for dividing computational work across available system resources, like the CPUs and GPUs on your local machine. In the cell below, we will start a new session:

* Use `tf.Session` to create a new session, and store this to the variable `sess_1`
* Use `sess_1.run` to compute the value of the node `add`

In [None]:
sess_1 = ...
# use sess_1.run to compute the value at the node `add`

The value of the node `add` relies on the value of the nodes `const_5` and `const_2`. In the code you wrote above, TensorFlow evaluates only the nodes that `add` relies on. So, for instance, our node `u_input` was never evaluated, since `add` doesn't rely on its value.
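The dependency-driven evaluation described above can be sketched in plain Python. This is only a toy stand-in, not how TensorFlow is implemented: the `Node` class, the `run` function, and the `evaluated` list are hypothetical names introduced for illustration. The sketch records which nodes are visited, and shows that a placeholder only needs a value when something that depends on it is evaluated.

```python
# Toy stand-in for a computational graph; NOT TensorFlow's implementation.
evaluated = []  # records the names of the nodes visited during a run


class Node:
    def __init__(self, name, fn=None, inputs=()):
        self.name = name
        self.fn = fn          # None marks a placeholder with no value of its own
        self.inputs = inputs  # nodes this node relies on


def run(node, feed=None):
    """Evaluate `node`, visiting only the nodes it depends on."""
    feed = feed or {}
    evaluated.append(node.name)
    if node.name in feed:
        return feed[node.name]
    if node.fn is None:
        raise ValueError(f"placeholder {node.name!r} needs a fed value")
    return node.fn(*(run(i, feed) for i in node.inputs))


const_5 = Node("const_5", lambda: 5)
const_2 = Node("const_2", lambda: 2)
add = Node("add", lambda a, b: a + b, (const_5, const_2))
u_input = Node("u_input")  # placeholder
mult = Node("mult", lambda a, b: a * b, (u_input, add))

add_value = run(add)             # never touches u_input
visited_for_add = list(evaluated)

evaluated.clear()
mult_value = run(mult, {"u_input": 10})  # the placeholder must be fed here
```

Calling `run(mult)` without a feed raises an error in this toy, which mirrors why a placeholder needs a fed value before anything that depends on it can be evaluated.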

Suppose that we want to evaluate the node `mult`. This node *does* rely on the node `u_input`. However, `u_input` doesn't have a value until we give it one ourselves; it is merely a placeholder for our input. In the cell below, we must *feed* the value of `u_input` into our graph.

* Create a dict (called `feed_dict`) that stores the values for all of our placeholders. In this case, `feed_dict` should have one key-value pair. Its key should be the name TensorFlow has assigned to our `u_input` node (you may access this via `u_input.name`). Let's set `u_input`'s value to 10.
* Use `sess_1.run` to evaluate the value of the `mult` node. You may pass `feed_dict` as a kwarg.

In [None]:
feed_dict = ...
# use sess_1.run to evaluate the value of the `mult` node


We've officially constructed a computational graph and used a session to evaluate nodes in that graph! Now that we are done using this session, we must close the session so that it releases resources it has acquired in order to perform computations.

* In the cell below, call `sess_1.close()`

In [None]:
# close our tf.Session


Every TensorFlow session has a TensorFlow graph associated with it. In our example above, we never explicitly declared which graph we wanted `sess_1` to evaluate, so TensorFlow picked the default graph. In the problems below, if we continued to create TensorFlow operations in the global variable environment, we would continue to add nodes to the same graph we've been building.

In future problems in this notebook, we will create a completely new graph. When we create a session to evaluate this graph, we must pass this graph explicitly to the Session constructor.

# Using TensorBoard

TensorFlow gives us tools to display information about our graphs and to keep track of node values as sessions progress. These data collection tools can all be found in the `tf.summary` module. The front-end that displays the metadata we collect on our graphs and sessions is called TensorBoard.

`tf.summary.FileWriter` gives us a way to record events that occur during a session. More specifically, an instance of the FileWriter class will write Summary [protocol buffers](https://developers.google.com/protocol-buffers/docs/pythontutorial) to event files. TensorBoard uses the event files created by a writer to display information about TensorFlow sessions.

* Store an instance of `tf.summary.FileWriter` in the variable `writer`.
* You must pass it a path to a directory to store event files in. Use the directory `tensorboard/simple-graph`
* You should also pass the kwarg `graph`, which specifies the graph we are considering.

In [None]:
writer = ...

A writer is used to save events to disk. However, events don't just happen on their own - we have to create nodes in our graph that track events. `tf.summary` has a handful of methods that track different types of data. In this example, we will use `tf.summary.scalar` to track the value of `mult` across many evaluations of `mult`.

* In the cell below, create a `tf.summary.scalar` named "mult_output", and set it up to track the value of `mult`. Save this to the variable `mult_summary_op`.

In [None]:
mult_summary_op = ...

Note that `mult_summary_op` is a node/operation that belongs to the graph, just like any other. The result of evaluating this node is a tensor, just like any other node in the graph. However, instead of containing numbers, the tensor produced by `mult_summary_op` will contain a serialized `Summary` protobuf that describes the value of the `mult` node.

Since `mult_summary_op` is a node in our graph, TensorFlow won't evaluate the node unless we explicitly ask it to (or if we ask it to evaluate a node that relies on `mult_summary_op` - it is exceedingly uncommon for TensorFlow operations to rely on summary operations, though).

In the cell below, we have created a new session. Instead of creating and closing a session manually, we can use sessions as context managers. This way, the session will automatically close once the `with` block is done executing. Within the `with` block you should:

* create a loop that iterates 5000 times
* on each iteration, evaluate the nodes `mult` and `mult_summary_op`. You may do this with a single call to `sess.run` by passing it a list of nodes/operations to evaluate.
* on each evaluation, you should feed the graph a different value for `u_input`. Try, for example, feeding it the square of the iteration you are on.
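* save the summary returned by each session run to a variable, `summary`. (When you pass `sess.run` a list of nodes, it returns a list of results in the same order.)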
* use `writer.add_summary` to record the `summary`. Pass it the iteration number you are on as well - this will act as the x-coordinate for a graph of `mult` over each iteration.

In [None]:
with tf.Session() as sess:
    pass

Now you've generated an event file in `tensorboard/simple-graph`! From the command line, run the command:

    tensorboard --logdir=tensorboard/simple-graph

This will start a server which watches for changes to event files in the `logdir`, and generates/serves a nice visualization of the summary information we've collected. After running the command above, TensorBoard should print a message telling you which port it's listening on - by default it should be port 6006. Visit `localhost:<portnum>` to see the summaries we've created.

The graph tab displays the graph we've created, while the scalar tab should display a line graph of the value of `mult` on each iteration.

In the next section, we will create a linear regression model which collects information on its performance and parameters as it trains. We'll use TensorBoard to display this information. 

# Linear Regression
Now we will create a linear regression model that learns using gradient descent! Before we proceed, a quick note on notation...

## Notation
We will denote numpy arrays with variable names beginning with an underscore. Variable names beginning with an alphanumeric character are reserved for tensors in TensorFlow graphs. Lower case variables will be used to denote row or column vectors. For example:

* `_X` is a numpy array (2 dimensional or larger)
* `X` is a TensorFlow tensor (2 dimensional or larger)
* `_y` is a numpy array (with shape (n,) or shape (n, 1))
* `y` is a TensorFlow tensor (with shape (n,) or shape (n, 1))

## Importing Data

Run the cell below to import a data set and run some small preprocessing steps. Read through it to see what it's doing!

In [None]:
def preprocess(_X, _y):
    pipeline = make_pipeline(StandardScaler()) # additional sklearn preprocessing goes here
    _X = pipeline.fit_transform(_X)
    _X = np.c_[_X, np.ones(len(_X))] # add column of ones for constant
    
    # split into train/test sets
    _X_train, _X_test, _y_train, _y_test = train_test_split(
        _X,
        _y,
        test_size=.2,
        random_state=42
    )
    
    # store y's as column vectors
    _y_train = _y_train.reshape(-1,1)
    _y_test = _y_test.reshape(-1,1)
    
    return _X_train, _X_test, _y_train, _y_test

_X_train, _X_test, _y_train, _y_test = preprocess(*load_diabetes(return_X_y=True))

## Constructing our graph 

We want to start work on a whole new graph. Use `tf.Graph` to construct a new graph. Store it in the variable `g_linreg`.

In [None]:
g_linreg = ...

TensorFlow graphs provide a Python context manager through their `as_default` method. By working inside the confines of a `with g_linreg.as_default():` block, we may ensure we are creating and adding nodes to the graph we intend to!

* Inside the context below, create placeholders for our training data
* `X` and `y` should each be placeholders with `dtype=tf.float32`. Name the placeholders accordingly.

In [None]:
with g_linreg.as_default():
    X = ...
    y = ...

Now let's use `tf.Variable` to create a variable `theta`, which stores the parameters of our regression model. Variables are nodes that may be reassigned or trained by other nodes in our graph.

* We must pass `tf.Variable` an initial value for the variable.
* Use `tf.random_uniform` to initialize `theta` to a tensor with shape (11,1) that holds random values between -1 and 1.

In [None]:
with g_linreg.as_default():
    theta = ...

Linear regression models make predictions the same way a neuron in a neural net calculates its weighted input. That is, we perform the computation:

$$\hat{y} = X\theta.$$

The vector $\hat{y}$ is the output of the regression model (i.e. the predicted value). Note that we haven't added a bias term here. This is because our preprocessing step added a new column of ones to every observation in our training data. That way, the last parameter in $\theta$ acts the same way as a bias term would.

* Create a new node on our graph called `y_hat` that applies `theta` to `X` using matrix multiplication (Hint: look into `tf.matmul`)

In [None]:
with g_linreg.as_default():
    y_hat = ...

The following cell creates nodes in our graph to compute the `error`, `mse`, and `gradient` of the mse function in a similar fashion to how we did this during the precourse.

In [None]:
with g_linreg.as_default():
    error = tf.identity(y_hat - y, name="error")
    mse = tf.reduce_mean(tf.square(error), name="mse")
    # 353 is the number of rows in the training set returned by preprocess()
    gradient = tf.identity(2.0 / 353 * tf.matmul(tf.transpose(X), error), name="gradient")

At this point, we've created a graph that can apply the parameters `theta` to our training data `X`, and can use the known values `y` to compute the gradient of the mse function. We have all of the pieces in place to perform gradient descent!

All we have left to do is reassign the variable `theta` to an improved value! In TensorFlow, assignment is modeled as another node in a graph, just like every other operation. In the cell below:

* Use `tf.assign` to create a node that reassigns the value of `theta` to `theta - (.01 * gradient)`. We will name this operation `train`.

In [None]:
with g_linreg.as_default():
    train = ...

## Creating Summaries

We're just about ready to train our model - but first let's create summary nodes so we can track our model's progress over time. In the cell below:

* create a writer to write event files to `tensorboard/linreg/run1`. Save it to a variable `writer_linreg`.
* create a scalar summary that tracks `mse`. Give it an appropriate name. Save it to a variable `scalar_op_linreg`
* create a histogram summary that tracks `theta` and name it appropriately. Save it to a variable `hist_op_linreg`.

*Remember: summary operations are operations like any other - they must be created on the proper graph. If you aren't careful, you will add them to the default graph, which still refers to our original simple graph from the first section.*

In [None]:
# create writer_linreg,  scalar_op_linreg, and hist_op_linreg


Keeping track of multiple summary operations and evaluating them all individually can become quite a pain. To combat this, TensorFlow provides us with convenient methods to bundle summaries into a single operation.

`tf.summary.merge_all` will merge all summary operations in the default graph into a single operation. We could use this as long as we are in a context where `g_linreg` is the default graph. `tf.summary.merge` allows us to explicitly pass a list of summary operations we would like to merge.

* use `tf.summary.merge_all` or `tf.summary.merge` to merge `scalar_op_linreg` and `hist_op_linreg`. Save the result to a variable, `summary_op_linreg`.

In [None]:
# create summary_op_linreg


## Training Session for Linear Regression

Now we have all the tools necessary to train and document our model! In order to actually perform computation, we have to begin a session. Note that when we create the session, we are passing the graph we created as a kwarg. If we don't do this, TensorFlow will associate the session with the default graph in our current context.

After creating our session, we have a number of things to do:
1. Whenever we create a `tf.Variable`, we also create a secondary operation whose sole job it is to initialize the variable. Before we use `theta`, we must use `sess.run` to run the initializer for `theta` (which is conveniently referenced by `theta.initializer`)
1. Create our `train_feed_dict` and `test_feed_dict` that store our training and test data, respectively. They will be used to pass values to our placeholder nodes.
1. Before training our model, let's see how it performs with random values for `theta`. Evaluate `mse`, feeding the model our test data. Print the result.
1. Let's train our model! Evaluate the `train` and `summary_op_linreg` operations 2000 times.
1. Each time, use `writer_linreg` to store the generated summary. Remember to pass the summary and an index representing the epoch the summary is from.
1. After training and recording summaries, let's evaluate `mse` on our test data once more. It should have improved greatly!

In [None]:
with tf.Session(graph=g_linreg) as sess:
    # 1. initialize theta
    # 2. create train_feed_dict and test_feed_dict
    # 3. evaluate the mse on our test data, prior to training theta
    # 4. train the model for 2000 epochs.
    #    remember to evaluate the summary operation each time and write it to disk with writer_linreg.
    # 5. evaluate the mse on the test data once more.
    #    it should be significantly lower after training
    pass

## Evaluating Model Training Using TensorBoard

Check out our summary in TensorBoard! What does the model's training curve look like? What is the final distribution of model parameters like? Are there other stats you'd like to record during training?

Also notice how messy the visualization of the computational graph is. This was manageable for a very small graph, but now it's hardly helpful because of how many nodes there are. Luckily, there's a fix: with the help of the `tf.name_scope` context manager, we can go from a very messy graph to a much better organized graph:

<img src="images/clean-vs-messy.png" alt="Messy Graph" style="height: 350px;"/>

If you create an operation named "an_operation" within the name_scope "a_name_scope", TensorFlow will assign the operation the full name "a_name_scope/an_operation". When TensorBoard displays our graph, it will group operations in the same name scope into a single node.

* Please refactor your code as follows:
    1. `X` and `y` belong to the name scope "input"
    1. `theta` belongs to the name scope "parameters"
    1. `y_hat` belongs to the name scope "predictions"
    1. `error` and `mse` belong to the name scope "loss"
    1. `gradient` and `train` belong to the name scope "training"

The graph displayed by TensorBoard should be identical to the clean graph in the picture above. You can click on a name scope to expand it, and view individual operations within that name scope.

## Autodiff

The `gradient` operation in our graph manually computes the gradient of the `mse` operation with respect to the variable `theta`. This is okay for a simple linear regression model. However, computing the gradient of our loss function with respect to model parameters would be a non-trivial problem for larger models like deep neural nets.

Luckily, TensorFlow has an autodiff feature which creates operations that calculate these gradients for us! The `tf.gradients` function accepts two main arguments:
1. `ys`: A tensor or list of tensors, like `mse`, to differentiate.
1. `xs`: A tensor or list of tensors, like `theta`, with respect to which we would like to differentiate.

`tf.gradients` then returns a list of tensors, one for each entry in `xs`, that compute the gradient of `ys` with respect to that entry. How does this work? Built-in TensorFlow operations implement code to compute their own gradients. When we compose built-in operations to build more complex operations (e.g. how we create the `mse` operation), TensorFlow is able to use well-known rules of differential calculus (e.g. the chain rule) to create a gradient operation by composing the component gradient operations. The algorithm TensorFlow uses to do this is called [reverse-mode automatic differentiation](https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation).
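To make the idea concrete, here is a minimal pure-Python sketch of reverse-mode differentiation on scalars. It is a toy written for this explanation, not TensorFlow's code: the `Var` class and `backward` function are hypothetical names. Each operation records its local derivatives, and `backward` applies the chain rule from the output back to the inputs.

```python
class Var:
    """Scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __sub__(self, other):
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into .grad for every node that feeds output."""
    order, seen = [], set()

    def visit(v):  # depth-first topological sort
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)

    visit(output)
    output.grad = 1.0
    for v in reversed(order):  # walk from the output back to the inputs
        for parent, local in v.parents:
            parent.grad += local * v.grad  # chain rule


# Squared error of a one-parameter model on a single made-up data point.
x, y, theta = Var(3.0), Var(10.0), Var(2.0)
error = theta * x - y
loss = error * error
backward(loss)
```

By hand, the derivative of `(theta * x - y) ** 2` with respect to `theta` is `2 * (theta * x - y) * x = 2 * (-4) * 3 = -24`, which is the value that ends up in `theta.grad`.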

* Refactor your code so that the `gradient` operation is computed by the `tf.gradients` function

# Extensions
1. Can you spot the data leak in the preprocessing function?
1. Can you build a fully connected neural net using the features of TensorFlow discussed in this notebook?
