Use Python to define a computational graph, which is executed in C++ code.
Due to the graph structure, TF is highly parallelizable.
TF can compute gradients automatically.
The main Python API is very flexible at the cost of higher complexity.
It comes with TensorBoard: a visualization tool for the computational graph.

### TensorFlow APIs for machine learning
* `tensorflow.estimator`: API for predefined models (very similar to scikit-learn)
* `tensorflow.losses`: Losses
* `tensorflow.metrics`: Metrics
* `tensorflow.layers`: Neural network layers



### High level APIs to Tensorflow
* `tensorflow.contrib.learn`: API compatible with scikit-learn
* `tensorflow.contrib.slim`: API to simplify computations
* `tensorflow.contrib.keras`: Keras API

### Rough structure of TensorFlow
In principle (as, e.g., in Spark) a TensorFlow computation consists of two steps:
1. **Construction phase:**
 This part specifies the computational graph, i.e., the model to compute.
 In TensorFlow the associated object is called the _dataflow graph_ (or just graph). Any additional variable is just a node added to the graph. The nodes of the graph are **operations**; the edges are **tensors**, which represent the values flowing between operations. Tensors don't carry any data themselves.
 
2. **Execution phase:**
 This part runs the actual computation and returns the results.
 In TensorFlow the associated object is called a _session_. The results returned by a session are **numpy arrays**.
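
The two phases can be mimicked in a few lines of plain Python. The following is a toy sketch, not TensorFlow's implementation: the `Node` class and the `constant` and `run` helpers are hypothetical names introduced only for illustration. Building `f` merely records operations; values are computed only when `run` is called.

```python
# Toy illustration only: this is NOT how TensorFlow is implemented.
class Node:
    """A node in a tiny dataflow graph; it stores how to compute, not the value."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const", "add" or "mul"
        self.inputs = inputs  # upstream nodes
        self.value = value    # only used by "const" nodes

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def constant(value):
    """Create a leaf node that holds a fixed value."""
    return Node("const", value=value)


def run(node):
    """Execution phase: recursively evaluate the graph below `node`."""
    if node.op == "const":
        return node.value
    left, right = (run(n) for n in node.inputs)
    return left + right if node.op == "add" else left * right


# Construction phase: nothing is computed here, f is just a graph.
x = constant(3)
y = constant(5)
f = x * x * x + x + y + constant(42)

# Execution phase: the result only exists after run().
result = run(f)
print(type(f).__name__, result)
```
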
### Todos / Brainstorming
* How to implement fit predict cleanly? 
* Warnings with Jupyter notebooks
* Visualization of the computational graph

### Possible Examples
* Logistic Regression
* Linear Regression
* Generalized linear models
* Survival Analysis
* Customer lifetime values
* Decision Tree
* Random Forest

# A first example

In [1]:
import tensorflow as tf



In [2]:
# Creating the computational graph
# Neither are functions evaluated nor are variables initialized
# Nothing is executed yet (just like a transformation in Spark)
x = tf.Variable(3, name="x")
y = tf.Variable(5, name="y")
f = x*x*x + x + y +42

In [3]:
# A session 1) initializes the variables and 2) computes the graph
sess = tf.Session()
sess.run(x.initializer) # Initialize Variables
sess.run(y.initializer)
result = sess.run(f) # Run Graph
print(result)

77


In [4]:
# Convenient alternative via a context manager
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
print(result)

77


# Initializing variables
**TF introduces shortcuts for readability:**
* `x.initializer.run()` is equivalent to `tf.get_default_session().run(x.initializer)`
* `f.eval()` is equivalent to `tf.get_default_session().run(f)` 

**Globally initializing variables**
* Instead of initializing each variable individually, all variables can be initialized at once.
* `tf.global_variables_initializer()` adds a node to the graph that initializes all variables when it is run.
* So the above code may be written more compactly:

In [5]:
init = tf.global_variables_initializer() # add init node
with tf.Session() as sess:
    init.run() # actually run init
    result = f.eval()
print(result)

77


# Using TensorFlow for experiments (Jupyter notebook)
* There are two ways to speed up experimenting with TF and to reduce boilerplate (explicit session calls):
    - use an interactive TF session
    - enable eager execution (which gets rid of session calls altogether)

### Interactive TF session
* Useful in particular for Jupyter notebooks
* An interactive session installs itself as the default session, so `run()` and `eval()` work without a `with` block
* Don't forget to close the session afterwards (this would be done automatically within the session context)

In [6]:
sess = tf.InteractiveSession()
init.run()
result = f.eval()
print(result)
sess.close()

77


### Eager execution
* [TensorFlow documentation](https://www.tensorflow.org/guide/eager)
* In this execution mode TF operates similarly to numpy
* I.e., operations are evaluated immediately and return concrete values, so there is no separate execution phase
* To this end we need to tell TensorFlow:
    - `tf.enable_eager_execution()`
    
**However, this operational mode is still under development!**

> Eager execution is not included in the latest release (version 1.4) of TensorFlow. To use it, you will need to build TensorFlow from source or install the nightly builds.


In [7]:
import tensorflow.contrib.eager as tfe
try:
    tfe.enable_eager_execution()
    x = tf.constant([[2., 3.]], dtype=tf.float32, name="x")
    m = tf.matmul(x, x)
except Exception:
    print("Currently you need to build TF in order to enable eager execution")

Currently you need to build TF in order to enable eager execution


# Graphs
* So far we did not assign any explicit graph
* The reason is that TensorFlow provides a so-called **default graph**
* Any variable that is created gets added to the default graph
* However, we may also create other graphs, apart from the default graph.

#### Graphs within Jupyter notebooks
* As Jupyter notebooks are used for experimenting and cells are executed many times, the default graph can get "messed up" with duplicate nodes.
* To start over with a clean graph, TensorFlow provides the command `tf.reset_default_graph()`.

In [8]:
# node is added to default graph
x1 = tf.Variable(3, name="x1")  # the name must be passed by keyword
print("Is x1 in default graph?:", x1.graph is tf.get_default_graph())

# create a new graph object
my_graph = tf.Graph()
with my_graph.as_default():
    x2 = tf.Variable(30, name="x2")

print("Is x2 in my_graph?:", x2.graph is my_graph)
print("Is x2 in default graph?:", x2.graph is tf.get_default_graph())

Is x1 in default graph?: True
Is x2 in my_graph?: True
Is x2 in default graph?: False


In [9]:
x3 = tf.Variable(333, name="x3") # add to default graph
tf.reset_default_graph() # reset default graph
print("Is x3 in default graph:", x3.graph is tf.get_default_graph())

Is x3 in default graph: False


### Using a graph in a session
* Instantiate a new graph
* Build the new graph
* Instantiate a session with the new graph

In [10]:
new_graph = tf.Graph()

with new_graph.as_default():
    x = tf.constant(3) # Python ints default to tf.int32
    y = tf.constant(4) # Python ints default to tf.int32

with tf.Session(graph=new_graph) as sess:
    print(sess.graph is new_graph)

True


# Node Lifecycle
* All node _values_ are dropped between graph runs (within a session).
* Node values are (intermediate) results.
* This implies that node values are not reused by default (as in Spark).
* This leads to inefficiencies if intermediate results are also of interest.
* Each separate `eval()` or `run()` call triggers its own graph run, so shared intermediate nodes are recomputed.
* In order to avoid redundant evaluations, you need to ask TensorFlow to **evaluate all interesting variables within a single graph run**.
* This can be achieved by passing all of them to one `run` call, e.g. `sess.run([y, z])`.



#### Additional information from the textbook (A. Géron, _Hands-On Machine Learning with Scikit-Learn and TensorFlow_, 2017):

> All node values are dropped between graph runs, except variable values, which are maintained by the
session across graph runs (queues and readers also maintain some state, as we will see in Chapter 12). A
variable starts its life when its initializer is run, and it ends when the session is closed.

> In single-process TensorFlow, multiple sessions do not share any state, even if they reuse the same graph (each session would have its own copy of every variable). In distributed TensorFlow (see Chapter 12), variable state is stored on the servers, not in the sessions, so multiple sessions can share the same variables.
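
The cost of separate graph runs can be made visible with a toy model in plain Python. This is only a sketch under simplifying assumptions, not TensorFlow code: `graph_run`, `compute_x` and the `calls` counter are hypothetical names. Intermediate values live in a cache that exists only for the duration of one run, which mirrors how node values are dropped between runs.

```python
# Toy model of graph runs (an assumption for illustration, not TensorFlow code).
calls = {"x": 0}


def compute_x(w):
    calls["x"] += 1  # count how often the shared node x is evaluated
    return w + 2


def graph_run(fetches, w=3):
    """One graph run: intermediate values live only inside this call."""
    cache = {}

    def x():
        if "x" not in cache:
            cache["x"] = compute_x(w)
        return cache["x"]

    nodes = {"y": lambda: x() + 5, "z": lambda: x() * 3}
    return [nodes[name]() for name in fetches]


# Two separate runs: x is computed once per run.
y_val, = graph_run(["y"])
z_val, = graph_run(["z"])
separate_calls = calls["x"]

# One combined run: x is computed once and shared by y and z.
calls["x"] = 0
y_val2, z_val2 = graph_run(["y", "z"])
combined_calls = calls["x"]

print(y_val, z_val, separate_calls)
print(y_val2, z_val2, combined_calls)
```
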

In [11]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3
# Redundant evaluation: two separate graph runs, so x is computed twice
with tf.Session() as sess:
    print(y.eval()) # 10
    print(z.eval()) # 15

10
15


In [12]:
# one graph computation only
with tf.Session() as sess:
    y_val, z_val =  sess.run([y, z])
    print(y_val)
    print(z_val)

10
15


# Data types

### Principle
* TensorFlow data types are symbolic: a tensor describes a value in the graph but holds no data until the graph is run

### Tensors:


### Constants and variables
* **A constant is immutable.**
* **A variable is mutable.** Thus, variables may change their values (e.g. via `assign`), whereas constants cannot.
* Variables need to be explicitly initialized before use, e.g. via `tf.global_variables_initializer()`.
 
### Casting datatypes
* `tf.cast`

### Broadcasting and reshaping
* https://colab.research.google.com/notebooks/mlcc/creating_and_manipulating_tensors.ipynb?utm_source=mlcc&utm_campaign=colab-external&utm_medium=referral&utm_content=tensors-colab&hl=de
* TensorFlow follows numpy in broadcasting and reshaping
* In principle, broadcasting is an elementwise operation in which dimensions of size one (or missing leading dimensions) are virtually expanded to match the other operand
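
The shape rule behind broadcasting can be written down in plain Python. This is a minimal sketch of the numpy-style rule that TensorFlow follows; `broadcast_shape` is a hypothetical helper written for illustration, not a TensorFlow or numpy function.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    # Align both shapes from the right; missing leading dimensions count as 1.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)  # equal sizes, or b is stretched to match a
        elif dim_a == 1:
            result.append(dim_b)  # a is stretched to match b
        else:
            raise ValueError(
                "Dimensions must be equal or 1, got {} and {}".format(dim_a, dim_b))
    return tuple(result)


for other in [(), (1,), (2,), (2, 2)]:
    print((2, 2), other, "->", broadcast_shape((2, 2), other))
```

Shapes such as `(2, 2)` and `(3,)` are rejected by this rule, because the trailing dimensions 2 and 3 differ and neither is 1.
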

### Feature columns

In [13]:
tf.reset_default_graph()

x = tf.constant([[3,4], [3, 3]], name="x")
y = tf.Variable([3,4])
y = y.assign([3, 0])

init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    print(x.eval())
    print(y.eval())

[[3 4]
 [3 3]]
[3 0]


In [14]:
# Constants are immutable -> error
tf.reset_default_graph()
x = tf.constant([[3,4], [3, 3]], name="x")
try:
    x = x.assign([[0,0],[0,0]])
except AttributeError:  # a constant tensor has no assign method
    print("Failed! Cannot assign constant!")
with tf.Session() as sess:
    print(x.eval())


Failed! Cannot assign constant!
[[3 4]
 [3 3]]


In [15]:
# Broadcasting: add tensors of different (but compatible) shapes to a 2x2 matrix
for y_value in [1, [1], [1, 2], [[1, 2], [3, 4]]]:
    tf.reset_default_graph()
    x = tf.constant([[1, 1], [2, 2]])  # default type is tf.int32
    y = tf.constant(y_value)
    x_plus_y = x + y

    with tf.Session() as sess:
        print("Elementwise sum between shapes {0} and {1}".format(x.shape, y.shape))
        print(x_plus_y.eval())

Elementwise sum between shapes (2, 2) and ()
[[2 2]
 [3 3]]
Elementwise sum between shapes (2, 2) and (1,)
[[2 2]
 [3 3]]
Elementwise sum between shapes (2, 2) and (2,)
[[2 3]
 [3 4]]
Elementwise sum between shapes (2, 2) and (2, 2)
[[2 3]
 [5 6]]


In [16]:
# Example with incompatible shapes -> raises a ValueError when the graph is built
tf.reset_default_graph()

x = tf.constant([[1,1],[2,2]]) # default type is tf.int32
y = tf.constant([1,2,4])
x_plus_y = x + y

with tf.Session() as sess:
    print("Elementwise sum between shapes {0} and {1}".format(x.shape, y.shape))
    print(x_plus_y.eval())

ValueError: Dimensions must be equal, but are 2 and 3 for 'add' (op: 'Add') with input shapes: [2,2], [3].

In [17]:
tf.reset_default_graph()

matrix = tf.constant(
    [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]],
    dtype=tf.int32)

reshaped_2x2x4_tensor = tf.reshape(matrix, [2, 2, 4])
one_dimensional_vector = tf.reshape(matrix, [16])

with tf.Session() as sess:
    print(matrix.eval())
    print(reshaped_2x2x4_tensor.eval())
    print(one_dimensional_vector.eval())

[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [11 12]
 [13 14]
 [15 16]]
[[[ 1  2  3  4]
  [ 5  6  7  8]]

 [[ 9 10 11 12]
  [13 14 15 16]]]
[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16]


# [Feeding data into Graphs:](https://www.tensorflow.org/guide/low_level_intro)
* There are two possibilities for feeding training data into graphs: via placeholders and via datasets
* Furthermore, layers are objects used for trainable (mutable) model parameters

### Placeholders:
* `tf.placeholder`
* The values are passed as a dictionary, `feed_dict`, to the `run` method of a session
* Placeholders throw an error if no value is fed to them

### Data sets:
* [Documentation for importing data with Datasets](https://www.tensorflow.org/guide/datasets)
* Datasets are used for streaming data into the model
* (Placeholders are good for experimenting)
* To get a runnable / iterable tensor from data, three steps are needed:
    1. **Convert the data** to a `tf.data.Dataset` object and create a `tf.data.Iterator` from it
    2. **Call** the `tf.data.Iterator.get_next()` method
    3. **Catch the exception:** when the end of the dataset stream is reached, a `tf.errors.OutOfRangeError` is raised
* The simplest iterator can be obtained with `tf.data.Dataset.make_one_shot_iterator`
* The dataset may depend on stateful operations. In this case, the iterator needs to be initialized before it is used


### Layers:
* While placeholders and datasets are used for providing training data, layers are used for the model parameters, because
* in an (iteratively) trainable model, model parameters must be able to change during training / optimization.
* In TensorFlow the preferred way to do this is with the `tf.layers` module.
* While this wording is motivated by neural networks, it is a general principle.
* Like variables, **layers need to be initialized** (`tf.global_variables_initializer()`)
* **Shortcuts:**
    - `tf.layers.Dense` is a class; it returns a layer object that needs to be called on a tensor
    - `tf.layers.dense` is a function that takes the input tensor together with the layer parameters and creates and applies the layer in one call
    - The difference in semantics is indicated by capitalization: capital vs. lowercase first letter
    
    
### Shared variables:
* **Todo**

In [18]:
# placeholder
tf.reset_default_graph()
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
z = x + y

# placeholders are not variables - no need for initialization
with tf.Session() as sess:
    print(sess.run(z, feed_dict={x: 3, y: 4.5}))
    print(sess.run(x, feed_dict={x: 10}))

7.5
10.0


In [19]:
# Combining variables and placeholders
tf.reset_default_graph()
a = tf.Variable(42.0, dtype=tf.float32)  # dtype must be passed by keyword
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
z = a + x + y

# As we are having a variable here we need to initialize 
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    print(sess.run(z, feed_dict={x: 3, y: 4.5}))    

49.5


In [None]:
# Datasets

tf.reset_default_graph()

data = [
    [0, 1,],
    [2, 3,],
    [4, 5,],
    [6, 7,]]

slices = tf.data.Dataset.from_tensor_slices(data) # 4 Tensors of shape 2
iterator = slices.make_one_shot_iterator() # initialize iterator
next_item = iterator.get_next()


with tf.Session() as sess:
    i = 0
    while True:
        try:
            item = sess.run(next_item)  # raises OutOfRangeError when exhausted
        except tf.errors.OutOfRangeError:
            break
        print("Portion: ", i)
        print(item)
        i = i + 1

Portion:  0
[0 1]
Portion:  1
[2 3]
Portion:  2
[4 5]
Portion:  3
[6 7]


In [None]:
# Layers
tf.reset_default_graph()

x = tf.placeholder(tf.float32, shape=[None, 3]) # 3 Features, N training samples as None
# Initialize linear regression, initialize with ones for reproduction
linear_model = tf.layers.Dense(units=1, 
                               activation=None, 
                               use_bias=True, 
                               kernel_initializer = tf.ones_initializer(),
                               bias_initializer=tf.ones_initializer())
# apply the linear model to x
y = linear_model(x)          

init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    print(sess.run(y, {x: [[1, 2, 3],[4, 5, 6]]}))
    
    
# Remark: we set all model parameters to one.
# So this is rather a prediction (for parameters all equal to one)
# The power of layers comes when we train using an optimizer!

[[ 7.]
 [16.]]


In [None]:
# Layers using shortcut
tf.reset_default_graph()

x = tf.placeholder(tf.float32, shape=[None, 3]) 
# Initialize linear regression, initialize with ones for reproduction
y = tf.layers.dense(x, 
                    units=1, 
                    activation=None, 
                    use_bias=True, 
                    kernel_initializer = tf.ones_initializer(),
                    bias_initializer=tf.ones_initializer())

init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    print(sess.run(y, {x: [[1, 2, 3],[4, 5, 6]]}))

# Loss functions
* In principle, losses can be defined manually using tensor operations.
* However, as it is so common to use losses in order to train a model via optimization (reducing the loss), TensorFlow provides the common loss functions
* [`tf.losses`](https://www.tensorflow.org/api_docs/python/tf/losses)

In [None]:
# Squared loss function for regression
tf.reset_default_graph()
y_true = tf.constant([1, 1, 1])
y_pred = tf.constant([2, 2, 2])
loss = tf.losses.mean_squared_error(labels=y_true, predictions=y_pred)
print("Loss is a symbolic tensor, not a number:", type(loss), "\n\n")

with tf.Session() as sess:
    print(loss.eval())

# Optimizers
* TensorFlow strongly follows the neural network paradigm:
    - Provide training data (`tf.data` or `tf.placeholder`)
    - Set up the model (`tf.layers`)
    - Set up the loss (`tf.losses`)
    - Find the model parameters by optimizing the loss evaluated on the training data

* So the last ingredient is the optimizer:
    - Optimizers live in `tf.train`
    - The simplest optimizer one can imagine is gradient descent: `tf.train.GradientDescentOptimizer`
    - An optimizer has to be instantiated and applied to a loss function (which in turn depends on the input data).

~~~~(.python)
# `loss` must be a scalar loss tensor defined beforehand, e.g. via `tf.losses`
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(loss)
~~~~
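
What a gradient descent optimizer does in each training step can be sketched in plain Python. The snippet below is an illustration under simplifying assumptions, not TensorFlow's implementation: it uses a one-feature linear model with hand-written gradients of the mean squared error, a learning rate of 0.01, and hypothetical helper names `mse` and `gradients`.

```python
# Plain-Python sketch of gradient descent on a mean squared error loss.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.0, -1.0, -2.0, -3.0]


def mse(w, b):
    """Mean squared error of the linear model y_hat = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(w, b):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


w, b = 0.0, 0.0
learning_rate = 0.01
losses = []
for step in range(20):
    losses.append(mse(w, b))
    grad_w, grad_b = gradients(w, b)
    # Move the parameters a small step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("first loss:", losses[0], "last loss:", losses[-1])
```

With only 20 small steps the parameters are still far from the least-squares solution; the point of the sketch is that the loss decreases from step to step.
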

# A First simple example: linear regression

In [None]:
# Adapted from the TF documentation
# Data
x = tf.constant([[1], [2], [3], [4]], dtype=tf.float32)
y_true = tf.constant([[0], [-1], [-2], [-3]], dtype=tf.float32)

# Model
linear_model = tf.layers.Dense(units=1)
y_pred = linear_model(x)

#Loss
loss = tf.losses.mean_squared_error(labels=y_true, predictions=y_pred)

#Optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    for i in range(20): # optimization steps
        _, loss_value = sess.run((train, loss))
        print(loss_value)
    print(sess.run(y_pred))

# Linear Regression - some Theory
For linear regression the closed-form solution is known via the so-called normal equation.

Todo:
1. Specify model
2. Specify loss function
3. Write down log-likelihood function
4. Minimize + get normal equation

### Approach 1: Non-probabilistic using loss function
The easiest approach to linear regression is to use the squared loss function,
$$l = \sum_{i=1}^N (y_i - \hat y_i)^2,$$
where $\hat y_i$ is the predicted value and $y_i$ is the true value of sample $i$, together with a linear, non-probabilistic model for the prediction. For a feature vector $x_i$ (where the first component is by convention the constant one, also known as the intercept) the prediction is given by the linear model. In scalar product notation:
$$\hat y_i =  x_i^T \theta,$$
with coefficient vector $\theta$. Rewriting this in matrix notation to account for all samples, the loss together with the linear model reads
$$l = (Y - X\theta)^T (Y - X\theta).$$
Now we would like to find $\theta^*$ that _minimizes_ the loss.
The derivative of the loss w.r.t. $\theta$ is $-2 X^T (Y - X\theta)$; setting it to zero gives the normal equation
$$X^T(Y - X\theta) = 0.$$
If $X^T X$ is non-singular (this is the case when $X$ has full column rank, which requires at least as many training examples as features; then $X^T X$ is positive definite), the normal equation has the unique solution

$$\theta ^* =  \left( X^T X  \right)^{-1} X^T Y$$

Note that this approach is non-probabilistic and thus does not explicitly account for uncertainty (in the form of probability distributions) in the data and coefficients.
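
As a small worked instance of the normal equation, the following plain-Python sketch computes $\theta^*$ for four data points with a single feature. It is only an illustration: the helpers `matmul`, `transpose` and `solve` are hypothetical, the data are toy values, and instead of forming the inverse $(X^T X)^{-1}$ explicitly the code solves the linear system $X^T X \theta = X^T Y$ by Gauss-Jordan elimination, which yields the same $\theta^*$.

```python
def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + list(rhs) for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[n:] for row in M]


# Toy data (assumed for illustration): one feature, four samples.
features = [[1.0], [2.0], [3.0], [4.0]]
Y = [[0.0], [-1.0], [-2.0], [-3.0]]
X = [[1.0] + row for row in features]  # prepend the constant-one intercept column

XT = transpose(X)
theta = solve(matmul(XT, X), matmul(XT, Y))
print(theta)  # intercept first, then slope
```
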
### Approach 2: Probabilistic + generative approach using max likelihood
This approach introduces a probability distribution but does not explicitly consider a loss function. The response is modelled via a Normal distribution ("Gaussian error") assuming a constant standard deviation,
$$y_i = x_i^T \theta + \epsilon_i, \quad \epsilon_i \sim \mathcal N (0, \sigma^2), \quad \text{i.e.} \quad y_i \sim \mathcal N (x_i^T \theta, \sigma^2).$$
In other words, the conditional distribution $p(y \mid x, \theta, \sigma^2)$ is given by a Normal distribution.
The likelihood function is just the joint pdf of __all__ data points, assuming they are i.i.d. (this assumption is what leads to the factorization),
$$\mathcal L = \prod_{i=1}^N p(y_i \mid x_i, \theta, \sigma^2).$$
As we are only interested in the optimal $\theta$, we may apply a strictly monotonic transformation to the likelihood function, since it leaves the location of the optimum invariant. The standard procedure is thus to consider the logarithm of the likelihood function:
$$\mathcal L_l = \sum_{i=1}^N \log p(y_i \mid x_i, \theta, \sigma^2).$$
Evaluating this expression for the Normal distribution gives
$$\mathcal L_l =  - \frac{1}{2\sigma^2}\sum_{i=1}^N (y_i - x_i^T \theta)^2  - \frac{N}{2}\log(2\pi\sigma^2).$$
Now we would like to _maximize_ the likelihood, and thus the log-likelihood, with respect to $\theta$. This is equivalent to _minimizing_ its negative. Dropping the terms and positive factors that don't depend on $\theta$ gives the function for which we would like to find the minimizer. That is, we want to solve
$$\text{argmin}_\theta\left( Y - X \theta \right )^T \left( Y - X \theta \right ),$$
where we have again rewritten the sum of squares over all training data in matrix notation. But this is exactly the same problem as in approach one, and it gives the same solution (under the same conditions),

$$\theta ^* =  \left( X^T X  \right)^{-1} X^T Y$$
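
The equivalence of maximizing the Gaussian log-likelihood and minimizing the sum of squares can be checked numerically. The following plain-Python sketch is an illustration only: the noisy data, the fixed value of `sigma` and the coarse parameter grid are assumptions made up for this example.

```python
import math

# Hypothetical noisy data and an assumed, fixed standard deviation.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.1, -0.9, -2.2, -2.9]
sigma = 0.5


def sse(theta0, theta1):
    """Sum of squared errors of the linear model theta0 + theta1 * x."""
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys))


def log_likelihood(theta0, theta1):
    """Gaussian log-likelihood with constant variance sigma**2."""
    n = len(xs)
    return (-sse(theta0, theta1) / (2 * sigma ** 2)
            - n / 2 * math.log(2 * math.pi * sigma ** 2))


# Candidate parameters (theta0, theta1) on a coarse grid.
grid = [(a / 10, b / 10) for a in range(0, 21) for b in range(-20, 1)]
best_by_likelihood = max(grid, key=lambda t: log_likelihood(*t))
best_by_sse = min(grid, key=lambda t: sse(*t))

print(best_by_likelihood, best_by_sse)
```

Both searches select the same grid point, because the log-likelihood is a decreasing linear function of the sum of squares once $\sigma$ is fixed.
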

**Remarks**
* Note that this procedure only puts a probability distribution on the response $y$ while treating the remaining ingredients as given variables (via the conditional pdf ansatz).
* This already implies that in this context the solution $\theta^*$ just tells us how the pdf is parametrized (not even completely, as we did not consider the optimal value of $\sigma$).
* In either case this approach does not really tell us how to predict a specific value $y$ for a given $x$. It just tells us the corresponding distribution of $y$. The fundamental reason is that we did not make any use of a loss function in this approach.
* Pragmatically, and in practice of course, the prediction is made by plugging $x$ into the linear model, as in the first approach.
* Note that a constant value for $\sigma$ is called homoscedasticity. This means that within the Normal model for the response only the mean, but not the variance, may depend on the features.
* Furthermore, note that this approach does not consider uncertainties in the parameters. This would eventually require a Bayesian approach.

### Approach 3: Probabilistic  +  discriminative approach using loss function and max likelihood

### Approach 4: A Bayesian approach

Approaches: 
- Via max likelihood https://www.quantstart.com/articles/Maximum-Likelihood-Estimation-for-Linear-Regression
- Minimize quadratic error directly

# Linear Regression via the Normal equation - with Tensorflow

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from collections import namedtuple
import matplotlib.pyplot as plt
import numpy as np
supervised = namedtuple("supervised", ["features", "target"])


def split_test_train(data):
    X_train, X_test, Y_train, Y_test = train_test_split(data.features, data.target, test_size = 0.2, random_state=5)
    return supervised(X_train, Y_train.reshape(-1, 1)), supervised(X_test, Y_test.reshape(-1, 1))

def add_intercept(features):
    """Add intercept to features
    Todo: as an exercise use tensorflow"""
    m, n = features.shape
    return np.c_[np.ones((m, 1)), features]

housing = fetch_california_housing()
data = supervised(housing.data, housing.target)
train, test = split_test_train(data)

### Approach 1: using the Normal equation
#### Training

In [None]:
# Initialize variables + graph
tf.reset_default_graph()
# Fit on the training split (the test split is reserved for prediction below)
X = tf.constant(add_intercept(train.features), dtype=tf.float64, name="X")
Y = tf.constant(train.target, dtype=tf.float64, name="Y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), Y)

In [None]:
# Initialize variables and compute the graph
# Why does it also work without variable initialization?
# (The graph contains only constants, so there are no variables to initialize.)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    theta_value = theta.eval()   
print(theta_value)

#### Prediction

In [None]:
tf.reset_default_graph()
X = tf.constant(add_intercept(test.features), dtype=tf.float64, name="X")
theta = tf.constant(theta_value, dtype=tf.float64, name="theta")
prediction = tf.matmul(X, theta)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    prediction_values = prediction.eval()

#### Plotting

In [None]:
# For comparison perform linear regression with scikit learn
from sklearn.linear_model import LinearRegression
lin_model = LinearRegression(fit_intercept=True)  # no normalization (the default)
lin_model.fit(train.features, train.target)

# plot
fig, ax = plt.subplots(ncols=2, figsize=(15, 7))
ax[0].plot(test.target, prediction_values, "o", color="red", alpha=.5)
ax[1].plot(test.target, lin_model.predict(test.features), "o", color="blue", alpha=.5)
for i in [0,1]:
    ax[i].plot([0,10], [0,10], "--", lw=2, color="black")
    ax[i].set_xlabel("True value",  fontsize=15)
    ax[i].set_ylim(0,5.3)
    ax[i].set_xlim(0,5.3)
ax[0].set_ylabel("Predicted value (Tensorflow)", fontsize=15);
ax[1].set_ylabel("Predicted value (Scikit)",  fontsize=15);

In [None]:
# MSE on training set:
tf.reset_default_graph()
X = tf.constant(add_intercept(train.features), dtype=tf.float64, name="X")
theta = tf.constant(theta_value, dtype=tf.float64, name="theta")
prediction = tf.matmul(X, theta)
y_true = tf.constant(train.target, dtype=tf.float64, name="true_label")
y_pred_scikit = tf.constant(lin_model.predict(train.features), dtype=tf.float64, name="prediction_label")
mse = tf.reduce_mean(tf.square(y_true - prediction))
mse_scikit = tf.reduce_mean(tf.square(y_true - y_pred_scikit))
init = tf.global_variables_initializer()


with tf.Session() as sess:
    init.run()
    print("Tensorflow MSE on training data:",  mse.eval())
    print("Scikit Learn MSE on training data:", mse_scikit.eval())
    
    
# MSE on test set: 
tf.reset_default_graph()
y_true = tf.constant(test.target, dtype=tf.float64)
y_pred_scikit = tf.constant(lin_model.predict(test.features), dtype=tf.float64)
y_pred = tf.constant(prediction_values, dtype=tf.float64)
mse = tf.reduce_mean(tf.square(y_true - y_pred))
mse_scikit = tf.reduce_mean(tf.square(y_true - y_pred_scikit))
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    print("Tensorflow MSE on test data:",  mse.eval())
    print("Scikit Learn MSE on test data:", mse_scikit.eval())

# Using functions
* The above code is ugly
* Rather we would like to put the graph construction in (a) function(s)

# Linear Regression using the estimator API

# Sharing variables via placeholders
# Name scopes

# Linear Regression via an optimizer
* Assume that we could not find the minimizer analytically.
* Thus we wish to find the minimizer computationally.

In [None]:
### Ops vs data objects