# Getting started with TensorFlow

**Learning Objectives**
  1. Practice defining and performing basic operations on constant Tensors
  1. Use TensorFlow's automatic differentiation capability
  1. Learn how to train a linear regression from scratch with TensorFlow


In this notebook, we will start by reviewing the main operations on Tensors in TensorFlow and learn how to manipulate TensorFlow Variables. We explain how these are compatible with Python built-in lists and NumPy arrays.

Then we will jump to the problem of training a linear regression from scratch with gradient descent. The first order of business will be to understand how to compute the gradients of a function (the loss here) with respect to some of its arguments (the model weights here). The TensorFlow construct allowing us to do that is `tf.GradientTape`, which we will describe. 

Finally, we will create a simple training loop to learn the weights of a 1-dim linear regression using synthetic data generated from a linear model.

As a bonus exercise, we will do the same for data generated from a non-linear model, forcing us to manually engineer non-linear features to improve the performance of our linear model.

In [None]:
!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst

In [None]:
# Ensure the right version of TensorFlow is installed.
!pip freeze | grep tensorflow==2.5

In [None]:
import numpy as np
from matplotlib import pyplot as plt
import tensorflow as tf

In [None]:
print(tf.__version__)

## Operations on Tensors

### Variables and Constants

Tensors in TensorFlow are either constants (`tf.constant`) or variables (`tf.Variable`).
The values of constants cannot be changed, while the values of variables can be.

The main difference is that instances of `tf.Variable` have methods allowing us to change
their values, while tensors constructed with `tf.constant` don't have these methods, and
therefore their values cannot be changed. When you want to change the value of a `tf.Variable`
`x`, use one of the following methods:

* `x.assign(new_value)`
* `x.assign_add(value_to_be_added)`
* `x.assign_sub(value_to_be_subtracted)`
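As a minimal sketch of how these three methods behave, the snippet below uses a throwaway variable `v` and a constant `c` (both names are illustrative and not used elsewhere in this notebook):

```python
import tensorflow as tf

# Hypothetical variable used only for illustration.
v = tf.Variable(1.0, dtype=tf.float32)

v.assign(5.0)      # overwrite the value: v is now 5.0
v.assign_add(2.0)  # add in place:        v is now 7.0
v.assign_sub(3.0)  # subtract in place:   v is now 4.0
print(v.numpy())

# A tensor built with tf.constant exposes no assign method.
c = tf.constant(1.0)
print(hasattr(c, "assign"))
```

Each call updates the variable in place, so the Python name `v` keeps pointing at the same `tf.Variable` object throughout.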



In [None]:
x = tf.constant([2, 3, 4])
x

In [None]:
x = tf.Variable(2.0, dtype=tf.float32, name='my_variable')

**Lab Task #1:** Use the `assign(..)` method to assign a new value to the variable `x` you created above. After each assignment, print `x` to verify that the value has changed.

In [None]:
# TODO 1
x.assign( # TODO: Your code goes here. 
x

In [None]:
# TODO 2
x.assign( # TODO: Your code goes here. 
x

In [None]:
# TODO 3
x.assign( # TODO: Your code goes here. 
x

### Point-wise operations

TensorFlow offers point-wise tensor operations similar to those in NumPy:

* `tf.add` adds two tensors component by component
* `tf.multiply` multiplies two tensors component by component
* `tf.subtract` subtracts one tensor from another component by component
* `tf.math.*` contains the usual math operations to be applied on the components of a tensor
* and many more...

Most of the standard arithmetic operations (`tf.add`, `tf.subtract`, etc.) are also available through the usual corresponding arithmetic symbols (`+`, `-`, etc.), which are overloaded for tensors.
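To make the overloading concrete, here is a small sketch using subtraction and a `tf.math` function. The tensors `p` and `q` are made-up values chosen so the results are easy to check by hand:

```python
import tensorflow as tf

# Illustrative tensors, not part of the lab tasks.
p = tf.constant([4.0, 9.0, 16.0])
q = tf.constant([1.0, 2.0, 3.0])

diff_fn = tf.subtract(p, q)  # function form
diff_op = p - q              # operator form

# Both forms give the same component-wise result.
same = tf.reduce_all(tf.equal(diff_fn, diff_op))
print(diff_op.numpy(), same.numpy())

# tf.math functions are applied to each component independently.
roots = tf.math.sqrt(p)
print(roots.numpy())
```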

**Lab Task #2:** Create two TensorFlow constants `a = [5, 3, 8]` and `b = [3, -1, 2]`. Then, compute
1. the sum of the constants `a` and `b` below using `tf.add` and `+` and verify both operations produce the same values.
2. the product of the constants `a` and `b` below using `tf.multiply` and `*` and verify both operations produce the same values.
3. the exponential of the constant `a` using `tf.math.exp`. Note, you'll need to specify the type for this operation.


In [None]:
# TODO 1
a = # TODO: Your code goes here.
b = # TODO: Your code goes here.
c = # TODO: Your code goes here.
d = # TODO: Your code goes here.

print("c:", c)
print("d:", d)

In [None]:
# TODO 2
a = # TODO: Your code goes here.
b = # TODO: Your code goes here.
c = # TODO: Your code goes here.
d = # TODO: Your code goes here.

print("c:", c)
print("d:", d)

In [None]:
# TODO 3
# tf.math.exp expects floats so we need to explicitly give the type
a = # TODO: Your code goes here.
b = # TODO: Your code goes here.

print("b:", b)

### NumPy Interoperability

In addition to native TF tensors, TensorFlow operations can take native Python types and NumPy arrays as operands.
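As a rough illustration, the sketch below passes Python lists and NumPy arrays to `tf.multiply`. The values are made up; in both cases the result comes back as a TensorFlow tensor:

```python
import numpy as np
import tensorflow as tf

# Python lists are converted to tensors on the fly.
from_lists = tf.multiply([1, 2, 3], [4, 5, 6])

# NumPy arrays are accepted in the same way.
from_arrays = tf.multiply(np.array([1, 2, 3]), np.array([4, 5, 6]))

# Whatever the input type, the output is a tf.Tensor.
print(type(from_lists).__name__, from_lists.numpy())
print(type(from_arrays).__name__, from_arrays.numpy())
```

The resulting dtype is inferred from the inputs, so it can differ between the list and the NumPy case.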

In [None]:
# native python list
a_py = [1, 2] 
b_py = [3, 4] 

**Lab Task #3:** Use `tf.add` to compute the sum of the native Python lists `a_py` and `b_py`.

In [None]:
# TODO 1
# TODO: Your code goes here.

In [None]:
# numpy arrays
a_np = np.array([1, 2])
b_np = np.array([3, 4])

**Lab Task #4:** Use `tf.add` to compute the sum of the NumPy arrays `a_np` and `b_np`.

In [None]:
# TODO 1
# TODO: Your code goes here.

In [None]:
# native TF tensor
a_tf = tf.constant([1, 2])
b_tf = tf.constant([3, 4])

**Lab Task #5:** Use `tf.add` to compute the sum of the TensorFlow constants `a_tf` and `b_tf`.

In [None]:
# TODO 1
# TODO: Your code goes here.

You can convert a native TF tensor to a NumPy array using `.numpy()`:
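The conversion also works in the other direction. A minimal round-trip sketch, using an illustrative tensor `t` and `tf.convert_to_tensor` for the way back:

```python
import numpy as np
import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])  # illustrative tensor

arr = t.numpy()                    # tf.Tensor -> np.ndarray
back = tf.convert_to_tensor(arr)   # np.ndarray -> tf.Tensor

print(type(arr).__name__, arr.sum())
print(tf.reduce_all(tf.equal(t, back)).numpy())
```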

In [None]:
a_tf.numpy()

## Linear Regression

Now let's use low-level TensorFlow operations to implement linear regression.

Later in the course you'll see more abstract ways to do this using high-level TensorFlow APIs.

### Toy Dataset

We'll model the following function:

\begin{equation}
y= 2x + 10
\end{equation}

In [None]:
X = tf.constant(range(10), dtype=tf.float32)
Y = 2 * X + 10

print("X:{}".format(X))
print("Y:{}".format(Y))

Let's also create a test dataset to evaluate our models:

In [None]:
X_test = tf.constant(range(10, 20), dtype=tf.float32)
Y_test = 2 * X_test + 10

print("X_test:{}".format(X_test))
print("Y_test:{}".format(Y_test))

### Loss Function

The simplest model we can build is a model that for each value of x returns the sample mean of the training set:

In [None]:
y_mean = Y.numpy().mean()


def predict_mean(X):
    y_hat = [y_mean] * len(X)
    return y_hat

Y_hat = predict_mean(X_test)

Using mean squared error, our loss is:
\begin{equation}
MSE = \frac{1}{m}\sum_{i=1}^{m}(\hat{Y}_i-Y_i)^2
\end{equation}
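As a quick sanity check of the formula, here is a small sketch with made-up predictions and targets that can be verified by hand (the names `y_hat` and `y_true` are illustrative):

```python
import tensorflow as tf

y_hat = tf.constant([1.0, 2.0, 3.0])   # made-up predictions
y_true = tf.constant([1.0, 3.0, 5.0])  # made-up targets

errors = (y_hat - y_true) ** 2  # squared errors: [0, 1, 4]
mse = tf.reduce_mean(errors)    # (0 + 1 + 4) / 3
print(mse.numpy())
```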

For this simple model the loss is then:

In [None]:
errors = (Y_hat - Y_test)**2
loss = tf.reduce_mean(errors)
loss.numpy()

This value of the MSE loss gives us a baseline against which to compare more complex models.

Now, if $\hat{Y}$ represents the vector containing our model's predictions when we use a linear regression model
\begin{equation}
\hat{Y} = w_0X + w_1
\end{equation}

we can write a loss function taking as arguments the coefficients of the model:

In [None]:
def loss_mse(X, Y, w0, w1):
    Y_hat = w0 * X + w1
    errors = (Y_hat - Y)**2
    return tf.reduce_mean(errors)

### Gradient Function

To use gradient descent we need to take the partial derivatives of the loss function with respect to each of the weights. We could manually compute the derivatives, but with TensorFlow's automatic differentiation capabilities we don't have to!

During gradient descent we think of the loss as a function of the parameters $w_0$ and $w_1$. Thus, we want to compute the partial derivative with respect to these variables. 
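For reference, applying the chain rule to the MSE with $\hat{Y}_i = w_0X_i + w_1$ gives the partial derivatives that automatic differentiation will compute for us:

\begin{equation}
\frac{\partial MSE}{\partial w_0} = \frac{2}{m}\sum_{i=1}^{m}(\hat{Y}_i-Y_i)X_i
\qquad
\frac{\partial MSE}{\partial w_1} = \frac{2}{m}\sum_{i=1}^{m}(\hat{Y}_i-Y_i)
\end{equation}

With $w_0 = w_1 = 0$ and the toy training set above ($X = 0, \dots, 9$ and $Y = 2X + 10$), these evaluate to $-204$ and $-38$ respectively, which is a handy check for any gradient code.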

For that we need to wrap our loss computation within the context of a `tf.GradientTape` instance, which will record gradient information:

```python
with tf.GradientTape() as tape:
    loss = ...  # computation of the loss goes here
```

This will allow us to later compute the gradients of any tensor computed within the `tf.GradientTape` context with respect to instances of `tf.Variable`:

```python
gradients = tape.gradient(loss, [w0, w1])
```
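To see the mechanics in isolation, here is a minimal sketch that differentiates a simple scalar function rather than the regression loss. The function $y = x^2 + 2x$ and the variable `x` are chosen purely for illustration:

```python
import tensorflow as tf

x = tf.Variable(3.0)

# Operations on x inside the context are recorded on the tape.
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x

# dy/dx = 2x + 2, which is 8 at x = 3.
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy())
```

Passing a list of variables to `tape.gradient` instead of a single one returns a list of gradients in the same order.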

We illustrate this procedure by computing the loss gradients with respect to the model weights:

**Lab Task #6:** Complete the function below to compute the loss gradients with respect to the model weights `w0` and `w1`. 

In [None]:
# TODO 1
def compute_gradients(X, Y, w0, w1):
    # TODO: Your code goes here.

In [None]:
w0 = tf.Variable(0.0)
w1 = tf.Variable(0.0)

dw0, dw1 = compute_gradients(X, Y, w0, w1)

In [None]:
print("dw0:", dw0.numpy())

In [None]:
print("dw1:", dw1.numpy())

### Training Loop

Here we have a very simple training loop that converges. Note we are ignoring best practices like batching, creating a separate test set, and random weight initialization for the sake of simplicity.

**Lab Task #7:** Complete the `for` loop below to train a linear regression. 
1. Use `compute_gradients` to compute `dw0` and `dw1`.
2. Then, re-assign the value of `w0` and `w1` using the `.assign_sub(...)` method with the computed gradient values and the `LEARNING_RATE`.
3. Finally, for every 100th step, we'll compute and print the `loss`. Use the `loss_mse` function we created above to compute the `loss`.
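The general pattern (compute a gradient, then step against it with `assign_sub`) can be sketched on a toy one-parameter problem. This is not the regression itself: the objective $(w - 3)^2$, the variable `w`, and the learning rate below are assumptions made only for this example:

```python
import tensorflow as tf

w = tf.Variable(0.0)  # illustrative parameter
learning_rate = 0.1   # illustrative step size

for step in range(100):
    with tf.GradientTape() as tape:
        loss = (w - 3.0) ** 2  # minimized at w = 3
    grad = tape.gradient(loss, w)
    # Move w a small step in the direction that decreases the loss.
    w.assign_sub(learning_rate * grad)

print(w.numpy())
```

After enough steps `w` settles close to the minimizer, 3.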

In [None]:
# TODO 1
STEPS = 1000
LEARNING_RATE = .02
MSG = "STEP {step} - loss: {loss}, w0: {w0}, w1: {w1}\n"


w0 = tf.Variable(0.0)
w1 = tf.Variable(0.0)


for step in range(0, STEPS + 1):

    dw0, dw1 = #TODO: Your code goes here.
    #TODO: Your code goes here.
    #TODO: Your code goes here.

    if step % 100 == 0:
        loss = #TODO: Your code goes here.
        print(MSG.format(step=step, loss=loss, w0=w0.numpy(), w1=w1.numpy()))


Now let's compare the test loss for this linear regression to the test loss from the baseline model that always outputs the mean of the training set:

In [None]:
loss = loss_mse(X_test, Y_test, w0, w1)
loss.numpy()

This is indeed much better!

## Bonus

Try modelling a non-linear function such as: $y=xe^{-x^2}$
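A model can stay linear in its weights while using non-linear features of $x$. As a rough sketch of the shapes involved, the snippet below stacks three hand-picked features into a matrix and applies a made-up weight column; the feature choice and weight values are assumptions for illustration only:

```python
import numpy as np
import tensorflow as tf

x = tf.constant(np.linspace(0, 2, 5), dtype=tf.float32)  # 5 sample points

# One column per feature: bias, x, x^2 -> shape (5, 3).
features = tf.stack([tf.ones_like(x), x, tf.square(x)], axis=1)

# Hypothetical weights, one per feature -> shape (3, 1).
weights = tf.constant([[1.0], [2.0], [3.0]])

# Matrix product has shape (5, 1); squeeze drops the last axis -> (5,).
y_hat = tf.squeeze(features @ weights, -1)  # 1 + 2x + 3x^2
print(features.shape, y_hat.numpy())
```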

In [None]:
X = tf.constant(np.linspace(0, 2, 1000), dtype=tf.float32)
Y = X * tf.exp(-X**2)

In [None]:
%matplotlib inline

plt.plot(X, Y)

In [None]:
def make_features(X):
    f1 = tf.ones_like(X)  # Bias.
    f2 = X
    f3 = tf.square(X)
    f4 = tf.sqrt(X)
    f5 = tf.exp(X)
    return tf.stack([f1, f2, f3, f4, f5], axis=1)

In [None]:
def predict(X, W):
    return tf.squeeze(X @ W, -1)

In [None]:
def loss_mse(X, Y, W):
    Y_hat = predict(X, W)
    errors = (Y_hat - Y)**2
    return tf.reduce_mean(errors)

In [None]:
def compute_gradients(X, Y, W):
    # X is expected to be the feature matrix (e.g. the output of make_features).
    with tf.GradientTape() as tape:
        loss = loss_mse(X, Y, W)
    return tape.gradient(loss, W)

In [None]:
# TODO 2
STEPS = 2000
LEARNING_RATE = .02


Xf = make_features(X)
n_weights = Xf.shape[1]

W = tf.Variable(np.zeros((n_weights, 1)), dtype=tf.float32)

# For plotting
steps, losses = [], []
plt.figure()


for step in range(1, STEPS + 1):

    # Pass the feature matrix Xf, not the raw inputs X.
    dW = compute_gradients(Xf, Y, W)
    W.assign_sub(dW * LEARNING_RATE)

    if step % 100 == 0:
        loss = loss_mse(Xf, Y, W)
        steps.append(step)
        losses.append(loss)
        plt.clf()
        plt.plot(steps, losses)


print("STEP: {} MSE: {}".format(STEPS, loss_mse(Xf, Y, W)))

plt.figure()
plt.plot(X, Y, label='actual')
plt.plot(X, predict(Xf, W), label='predicted')
plt.legend()

Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License