# Introduction to TensorFlow
# Notebook - Machine Learning

*Disclosure: This notebook is an adaptation of an official TensorFlow tutorial and Toronto's CSC421 tutorial material.*

## Installation

You can install any TensorFlow version via pip within the virtual machine provided by Colab. If you are using the most recent stable version, no installation is required, since the TensorFlow package is preinstalled in Colab.

If you want to use a hardware accelerator (GPU), go to Runtime -> Change runtime type -> Hardware accelerator -> GPU.

In [0]:
# Install TF stable version or release candidate
!pip install tensorflow

# Note: New package versions include CPU and GPU support
#       For releases 1.15 and older, CPU and GPU packages are separate (tensorflow-gpu)

In [0]:
# Import Tensorflow and NumPy
import tensorflow as tf  # regardless of the CPU/GPU package, the module is always imported as tensorflow
import numpy as np

In [0]:
# Make sure correct version is installed
print(f"Imported TensorFlow version: {tf.__version__}")  # accessing the package version
print(f"Imported TensorFlow version: {tf.version.VERSION}")  # using the tf api

## Basic Operations

Tensors can be held as **constants** or **variables**. As you might guess, constants hold tensors whose values can't change, while variables hold tensors whose values can change. In graph mode (the TensorFlow 1.x style, or code wrapped in `tf.function`), constants and variables are themselves operations in the graph: a constant is an operation that always returns the same tensor value, and a variable is an operation that returns whichever tensor has been assigned to it. In TensorFlow 2.x, operations run eagerly by default, so the examples below return concrete values immediately.

In [0]:
a = tf.constant([1, 2])
b = tf.constant([3, 4])

c_1 = a+b
c_2 = tf.add(a, b)

# Show result
print(c_1, c_2)
print('Tensor c', c_1, type(c_1))  # returns a Tensor
print('c.numpy', c_1.numpy(), type(c_1.numpy()))  # returns a NumPy n-d array

# Can be seamlessly used with numpy n-d arrays
d = np.eye(2, dtype=np.int32)  # identity matrix

# numpy dot product of tf.tensor and numpy array
e = np.dot(c_1, d)
print('e', e, type(e))

# tf tensor dot product of tf.tensor and numpy array
f = tf.tensordot(c_1, d, axes=1)
print('f', f, type(f))



tf.Tensor([4 6], shape=(2,), dtype=int32) tf.Tensor([4 6], shape=(2,), dtype=int32)
Tensor c tf.Tensor([4 6], shape=(2,), dtype=int32) <class 'tensorflow.python.framework.ops.EagerTensor'>
c.numpy [4 6] <class 'numpy.ndarray'>
e [4 6] <class 'numpy.ndarray'>
f tf.Tensor([4 6], shape=(2,), dtype=int32) <class 'tensorflow.python.framework.ops.EagerTensor'>
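
The cell above only uses constants. A minimal sketch of the variable side, assuming TensorFlow 2.x eager execution: a `tf.Variable` keeps its shape and dtype, but its value can be overwritten in place with `assign`, `assign_add` or `assign_sub` (the gradient descent loops later in this notebook use `assign_sub`). A `tf.constant` has no in-place update; you would create a new tensor instead.

```python
import tensorflow as tf

v = tf.Variable([1, 2])   # variable holding an int32 tensor of shape (2,)
v.assign([10, 20])        # overwrite the whole value in place
v.assign_add([1, 1])      # in-place increment: [11, 21]
v.assign_sub([5, 5])      # in-place decrement: [6, 16]
print(v.numpy())          # [ 6 16]
```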


## NumPy & TensorFlow n-d arrays

### Same functionality

In [0]:
# mostly similar commands
a = np.ones((2,2))
b = tf.ones((2,2))

print("Numpy n-d array: \n", a)
print("TensorFlow Tensor: \n", b)


Numpy n-d array: 
 [[1. 1.]
 [1. 1.]]
TensorFlow Tensor: 
 tf.Tensor(
[[1. 1.]
 [1. 1.]], shape=(2, 2), dtype=float32)
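
A related point is moving data between the two libraries. A minimal sketch, assuming TensorFlow 2.x: `tf.convert_to_tensor` turns a NumPy array into a tensor and `.numpy()` goes the other way. Note the different defaults: `np.ones` gives `float64`, while `tf.ones` gives `float32` (visible as `dtype=float32` above). Conversion preserves the dtype, and TensorFlow does not promote dtypes automatically between tensors, so `tf.cast` is needed before combining them.

```python
import numpy as np
import tensorflow as tf

a = np.ones((2, 2))            # NumPy default dtype: float64
b = tf.ones((2, 2))            # TensorFlow default dtype: float32

t = tf.convert_to_tensor(a)    # NumPy -> Tensor, keeps float64
back = b.numpy()               # Tensor -> NumPy, keeps float32
print(t.dtype, back.dtype)     # <dtype: 'float64'> float32

# Adding a float64 tensor to a float32 tensor raises an error, so cast first
s = tf.cast(t, tf.float32) + b
print(s.dtype)                 # <dtype: 'float32'>
```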


### Similar functionality with different names

In [0]:
# some commands have different names
a = np.ones((2,2))
b = tf.ones((2,2))

c = np.sum(a)
d = tf.reduce_sum(b)

print("Numpy n-d array: \n", c)
print("TensorFlow Tensor: \n", d)

Numpy n-d array: 
 4.0
TensorFlow Tensor: 
 tf.Tensor(4.0, shape=(), dtype=float32)
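
Reductions in both libraries also accept an `axis` argument, and several other NumPy functions have TensorFlow counterparts under a different name. A short sketch, assuming TensorFlow 2.x; the pairs shown are only a few common examples, not a complete mapping:

```python
import numpy as np
import tensorflow as tf

a = np.array([[1., 2.], [3., 4.]])
b = tf.constant([[1., 2.], [3., 4.]])

# np.sum / np.mean / np.max  ->  tf.reduce_sum / tf.reduce_mean / tf.reduce_max
print(np.sum(a, axis=0), tf.reduce_sum(b, axis=0).numpy())    # column sums: [4. 6.] [4. 6.]
print(np.mean(a, axis=1), tf.reduce_mean(b, axis=1).numpy())  # row means: [1.5 3.5] [1.5 3.5]
print(np.max(a), tf.reduce_max(b).numpy())                    # overall max: 4.0 4.0

# np.concatenate  ->  tf.concat
print(np.concatenate([a, a], axis=0).shape, tf.concat([b, b], axis=0).shape)
```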


### Symbolic operators and broadcasting

In [0]:
a = np.ones((2,2))
b = tf.ones((2,2))

# element-wise multiplication + addition
a = a * 5 + 1
b = b * 5 + 1

print("Numpy n-d array: \n", a)
print("TensorFlow Tensor: \n", b)


Numpy n-d array: 
 [[6. 6.]
 [6. 6.]]
TensorFlow Tensor: 
 tf.Tensor(
[[6. 6.]
 [6. 6.]], shape=(2, 2), dtype=float32)
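
The cell above only broadcasts a scalar. Both libraries apply the same broadcasting rules to arrays as well: size-1 or missing leading dimensions are stretched to match the other operand. A small sketch, assuming TensorFlow 2.x; the Python lists are converted to arrays or tensors automatically:

```python
import numpy as np
import tensorflow as tf

a = np.zeros((2, 3))
b = tf.zeros((2, 3))

row = [1., 2., 3.]       # shape (3,): repeated for each of the 2 rows
col = [[10.], [20.]]     # shape (2, 1): repeated for each of the 3 columns

print(a + row + col)
print((b + row + col).numpy())
# both print:
# [[11. 12. 13.]
#  [21. 22. 23.]]
```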


### Indexing and slicing

In [0]:
a = np.ones((2, 2))
b = tf.ones((2, 2))

print("Scalar entry: ", a[0, 0])
print("First row: ", a[0, :])

print("Scalar entry: ", b[0, 0])
print("First row: ", b[0, :])

Scalar entry:  1.0
First row:  [1. 1.]
Scalar entry:  tf.Tensor(1.0, shape=(), dtype=float32)
First row:  tf.Tensor([1. 1.], shape=(2,), dtype=float32)
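
One difference shows up when writing rather than reading: NumPy arrays support item assignment, but TensorFlow tensors are immutable. A sketch of two common workarounds, assuming TensorFlow 2.x: build a new tensor with `tf.tensor_scatter_nd_update`, or keep the data in a `tf.Variable` and assign into a slice of it.

```python
import tensorflow as tf

b = tf.ones((2, 2))

try:
  b[0, 0] = 5.0                  # fine for a NumPy array, but tensors are immutable
except TypeError as err:
  print("TypeError:", err)

# Option 1: create a new tensor with the entry at index [0, 0] replaced
b_new = tf.tensor_scatter_nd_update(b, indices=[[0, 0]], updates=[5.0])
print(b_new.numpy())
# [[5. 1.]
#  [1. 1.]]

# Option 2: store the data in a Variable and assign into a slice
v = tf.Variable(tf.ones((2, 2)))
v[0, 0].assign(5.0)
print(v.numpy())
```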


### Shape and rank

NumPy:

In [0]:
a = np.array([[1, 2, 3], [4, 5, 6]])
print("NumPy shape: ", a.shape)
print("NumPy ndim: ", a.ndim)  # Number of array dimensions

NumPy shape:  (2, 3)
NumPy ndim:  2


It is crucial to build your data pipeline and networks with the right shapes, so have a look at the following examples for TensorFlow.

In [0]:
a = tf.constant(1)                               # scalar
b = tf.constant([1, 2, 3])                       # vector
c = tf.constant([[1, 2, 3], [4, 5, 6]])          # matrix
d = tf.constant([[[1, 2, 3]], [[7, 8, 9]]])      # n-d array


elements = [a, b, c, d]

for idx, selected_element in enumerate(elements):
  print("Element Index: ", idx)
  print("TensorFlow Tensor: ", selected_element)
  print("TensorFlow Shape: ", selected_element.shape)  # Attribute
  print("TensorFlow Rank: ", tf.rank(selected_element))  # Function that returns number of array dimensions - not the rank of a matrix
  print("---------------------")

Element Index:  0
TensorFlow Tensor:  tf.Tensor(1, shape=(), dtype=int32)
TensorFlow Shape:  ()
TensorFlow Rank:  tf.Tensor(0, shape=(), dtype=int32)
---------------------
Element Index:  1
TensorFlow Tensor:  tf.Tensor([1 2 3], shape=(3,), dtype=int32)
TensorFlow Shape:  (3,)
TensorFlow Rank:  tf.Tensor(1, shape=(), dtype=int32)
---------------------
Element Index:  2
TensorFlow Tensor:  tf.Tensor(
[[1 2 3]
 [4 5 6]], shape=(2, 3), dtype=int32)
TensorFlow Shape:  (2, 3)
TensorFlow Rank:  tf.Tensor(2, shape=(), dtype=int32)
---------------------
Element Index:  3
TensorFlow Tensor:  tf.Tensor(
[[[1 2 3]]

 [[7 8 9]]], shape=(2, 1, 3), dtype=int32)
TensorFlow Shape:  (2, 1, 3)
TensorFlow Rank:  tf.Tensor(3, shape=(), dtype=int32)
---------------------
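
Since getting shapes right matters, it helps to know the basic reshaping functions. A sketch using the 3-D tensor `d` from above, assuming TensorFlow 2.x: `tf.reshape` keeps the number of elements fixed, while `tf.squeeze` and `tf.expand_dims` remove or add size-1 axes and therefore change the rank.

```python
import tensorflow as tf

d = tf.constant([[[1, 2, 3]], [[7, 8, 9]]])  # shape (2, 1, 3), as above

flat = tf.reshape(d, [-1])                   # -1 lets TF infer the size: shape (6,)
mat = tf.reshape(d, [2, 3])                  # same 6 elements as a 2x3 matrix
squeezed = tf.squeeze(d, axis=1)             # drop the size-1 axis: shape (2, 3)
expanded = tf.expand_dims(mat, axis=0)       # add a leading axis: shape (1, 2, 3)

for t in (flat, mat, squeezed, expanded):
  print(t.shape, int(tf.rank(t)))
# (6,) 1
# (2, 3) 2
# (2, 3) 2
# (1, 2, 3) 3
```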


## Automatic Differentiation with TensorFlow
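Like Autograd, TensorFlow can record the operations executed inside a `tf.GradientTape` context onto a "tape" and then replay them backward to compute gradients. Only operations on watched tensors are recorded; a plain tensor such as the constant `x` below must be watched explicitly with `tape.watch`. The tape is used as a Python context manager:

    x = tf.ones((2, 2))

    with tf.GradientTape() as tape:
      tape.watch(x)             # x is a constant, so watch it explicitly
      y = tf.reduce_sum(x * x)  # y = sum of x_ij^2

    dy_dx = tape.gradient(y, x)  # dy/dx = 2x, a 2x2 tensor of 2.0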

The gradient tape will be used in this notebook and will be revisited in Part II of the TensorFlow Introduction. For more information, see https://www.tensorflow.org/tutorials/customization/autodiff
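
By contrast with the `tape.watch(x)` call above, a `tf.Variable` (trainable by default) is watched automatically, which is the pattern the regression examples below rely on. A small sketch, assuming TensorFlow 2.x, for $y = w^2 + 2w$ at $w = 3$, where $dy/dw = 2w + 2 = 8$:

```python
import tensorflow as tf

w = tf.Variable(3.0)           # trainable variables are watched automatically

with tf.GradientTape() as tape:
  y = w ** 2 + 2.0 * w         # y = w^2 + 2w

dy_dw = tape.gradient(y, w)    # dy/dw = 2w + 2
print(dy_dw.numpy())           # 8.0
```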


# Examples - Linear, Polynomial, Non-linear Regression
The next three sections of the notebook show examples of using TensorFlow in the context of three problems:

1. **1-D linear regression**, where we try to fit a model to a function $y = wx + b$
2. **Linear regression using a polynomial feature map**, to fit a function of the form $y = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M$
3. **Nonlinear regression using a simple Neural Network**




## Linear Regression with TensorFlow

**Note:** This is a modified version of the Google Colab example notebook `linear_regression.ipynb`.

**Assumption:** The data follow a linear model
$$
y_i = w x_i + b + \epsilon = 2 x_i + 0.5 + \epsilon
$$
where $\epsilon \sim \mathcal{N}(0, 0.1^2)$, matching `stddev=0.1` in `make_linear_data` below.

A minimal example of linear regression in TensorFlow 2.0, written from scratch.  

We'll create a few points on a scatter plot, then find the best-fit line by minimizing the mean squared error $C(w, b) = \frac{1}{N} \sum\limits_{i=1}^N (y_i - (w x_i + b))^2$ with TensorFlow.
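
For reference, differentiating $C(w, b)$ gives the gradients that gradient descent follows (the quantities `tape.gradient` computes automatically in the training loop); with learning rate $\alpha$ (`alpha` in the code), each step moves both parameters against their gradient:

$$
\frac{\partial C}{\partial w} = -\frac{2}{N} \sum\limits_{i=1}^N x_i \big(y_i - (w x_i + b)\big), \qquad
\frac{\partial C}{\partial b} = -\frac{2}{N} \sum\limits_{i=1}^N \big(y_i - (w x_i + b)\big)
$$

$$
w \leftarrow w - \alpha \frac{\partial C}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial C}{\partial b}
$$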

In [0]:
# Import plotting functionality
import matplotlib.pyplot as plt

Create a noisy dataset that's roughly linear, according to the equation y = w * x + b + noise.

In [0]:
def make_linear_data(w=2.0, b=0.5, N=100):
  x = tf.random.uniform(shape=(N,))  # TensorFlow and NumPy provide similar functionality
  eps = tf.random.normal(shape=(len(x),), stddev=0.1)
  y = w * x + b + eps
  return x, y

In [0]:
x_train, y_train = make_linear_data()

In [0]:
plt.plot(x_train, y_train, 'r.')

Define trainable variables for our model. 

In [0]:
# Create trainable variables
w = tf.Variable(tf.random.normal(()))
b = tf.Variable(tf.random.normal(()))

Predict y given x.

In [0]:
def predict(x):
  y = w * x + b
  return y

Our loss is the mean of the squared differences between the predicted values and the true values, i.e. the cost $C(w, b)$ defined above.

In [0]:
def squared_error(y_pred, y_true):
  err = tf.reduce_mean(tf.square(y_pred - y_true)) 
  return err

Calculate loss before training.

In [0]:
loss = squared_error(predict(x_train), y_train)
print("Starting loss", loss)

Use gradient descent to gradually improve our guess for `w` and `b`. At each step, we'll nudge them a little bit in the direction that reduces the loss.

In [0]:
alpha = 0.025  # learning rate
steps = 2000

for i in range(steps):
  # GradientTape records the operations on the trainable variables (w, b)
  # so that their gradients can be computed; discussed in Part II
  with tf.GradientTape() as tape:
    predictions = predict(x_train)
    loss = squared_error(predictions, y_train)

  # Calculate gradients via tape
  gradients = tape.gradient(loss, [w, b])
  
  # Manual gradient descent of w and b
  w.assign_sub(gradients[0] * alpha)
  b.assign_sub(gradients[1] * alpha)
  
  if i % 50 == 0:    
    print(f"Step {i}, Loss {loss.numpy()}")
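
To see what `tape.gradient` returns inside the loop, here is a sketch that compares the tape's gradients with the analytic gradients of $C(w, b)$: with residuals $r_i = y_i - (w x_i + b)$, they are $-\frac{2}{N}\sum_i x_i r_i$ for $w$ and $-\frac{2}{N}\sum_i r_i$ for $b$. The seeded data and starting values are made up for illustration; the two results should agree up to float32 rounding.

```python
import tensorflow as tf

# Illustrative data generated like make_linear_data(), with a fixed seed
tf.random.set_seed(0)
x = tf.random.uniform(shape=(100,))
y = 2.0 * x + 0.5 + tf.random.normal(shape=(100,), stddev=0.1)

w = tf.Variable(0.3)
b = tf.Variable(-0.2)

# Gradients via the tape
with tf.GradientTape() as tape:
  loss = tf.reduce_mean(tf.square((w * x + b) - y))
dw_tape, db_tape = tape.gradient(loss, [w, b])

# Analytic gradients of the mean squared error
residual = y - (w * x + b)
dw_manual = -2.0 * tf.reduce_mean(x * residual)
db_manual = -2.0 * tf.reduce_mean(residual)

print(dw_tape.numpy(), dw_manual.numpy())
print(db_tape.numpy(), db_manual.numpy())
```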

The learned values for w and b.

In [0]:
print(f"w: {w.numpy()}, b: {b.numpy()}")
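
Because the model is linear in $w$ and $b$, the minimizer of $C(w, b)$ also has a closed form (ordinary least squares), which makes a useful sanity check for the values gradient descent converges to. A sketch using NumPy's `np.linalg.lstsq` on freshly generated data; to check the notebook's own result, one would pass `x_train.numpy()` and `y_train.numpy()` instead.

```python
import numpy as np

# Illustrative data drawn like make_linear_data(), but with NumPy and a fixed seed
rng = np.random.default_rng(0)
x = rng.uniform(size=100)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=100)

# Design matrix: one column for w, a column of ones for the bias b
X = np.stack([x, np.ones_like(x)], axis=1)
(w_ls, b_ls), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"w: {w_ls:.3f}, b: {b_ls:.3f}")   # should be close to 2 and 0.5
```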


Plot the best fit line.

In [0]:
plt.plot(x_train, y_train, 'r.')
plt.plot(x_train, predict(x_train), 'b.')

A couple of things you can explore:

* To understand gradient descent, try printing out the `gradients` calculated above. See how they're used to adjust the variables (`w` and `b`).

* You can use TF 2.0 a lot like NumPy. Try printing out the training data we created (`x_train`, `y_train`) and understand the format. Next, do the same for the variables (`w` and `b`). Notice that both can be converted to NumPy format with `.numpy()`.

### Bonus
Let's visualize the error surface as a function of w and b. This section is included purely for fun; you can skip it without missing anything.

In [19]:
# Warning: hacky code ahead

import numpy as np
from mpl_toolkits.mplot3d import Axes3D

# To plot the error surface, we'll need to get the loss
# for a bunch of different values for w and b.

ws = np.linspace(-3, 3)
bs = np.linspace(-3, 3)
w_mesh, b_mesh = np.meshgrid(ws, bs)

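def loss_for_values(w_val, b_val):
  # Use Python floats so the products stay float32 tensors like x_train
  y = float(w_val) * x_train + float(b_val)
  loss = squared_error(y, y_train)
  return loss.numpy()

zs = np.array([loss_for_values(w_val, b_val) for (w_val, b_val) in zip(np.ravel(w_mesh),
                                                                       np.ravel(b_mesh))])
z_mesh = zs.reshape(w_mesh.shape)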

fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(w_mesh, b_mesh, z_mesh, color='b', alpha=0.06)

# At this point we have an error surface. 
# Now we'll need a history of the gradient descent steps.
# So as not to complicate the above code,
# let's retrain the model here, keeping
# track of w, b, and loss at each step.

# Intentionally start with this guess to 
# make the plot nicer
w = tf.Variable(-.5)
b = tf.Variable(-.75)

history = []

for i in range(steps):
  with tf.GradientTape() as tape:
    predictions = predict(x_train)
    loss = squared_error(predictions, y_train)
  gradients = tape.gradient(loss, [w, b])
  history.append((w.numpy(), b.numpy(), loss.numpy()))
  w.assign_sub(gradients[0] * alpha)  # alpha: learning rate from the training cell above
  b.assign_sub(gradients[1] * alpha)

# Plot the trajectory
ax.plot([h[0] for h in history], 
        [h[1] for h in history], 
        [h[2] for h in history],
        marker='o')

ax.set_xlabel('w', fontsize=18, labelpad=20)
ax.set_ylabel('b', fontsize=18, labelpad=20)
ax.set_zlabel('loss', fontsize=18, labelpad=20)

ax.view_init(elev=22, azim=28)

## Polynomial Regression

In this example we will fit a polynomial using linear regression with a polynomial feature mapping.
The target function is:

$$
y = 0.6x^4 - 1.2 x^2 + 0.8 x + 0.5 + \epsilon
$$

where $\epsilon \sim \mathcal{N}(0, 0.01^2)$, matching `stddev=0.01` in `make_polynomial_data` below.

This is an example of a _generalized linear model_, in which we perform a fixed nonlinear transformation of the inputs $\mathbf{x}$, and the model is still linear in the _parameters_. We can define a set of _feature mappings_ (also called feature functions or basis functions) $\phi$ to implement the fixed transformations.

In this case, we have $x \in \mathbb{R}$, and for a polynomial of degree $D$ (the hyperparameter `D` in the code below) the model $y = \mathbf{w}^\top \mathbf{\phi}(x) = w_0 + w_1 x + \dots + w_D x^D$ has $D + 1$ weights and uses the feature mapping:
$$
\mathbf{\phi}(x) = \begin{pmatrix}\phi_0(x) \\ \phi_1(x) \\ \vdots \\ \phi_D(x) \end{pmatrix} = \begin{pmatrix}1\\x\\\vdots\\x^D\end{pmatrix}
$$
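
The feature mapping can also be applied to all $N$ inputs at once by stacking $\mathbf{\phi}(x_i)^\top$ as the rows of an $N \times (D+1)$ design matrix $\Phi$, so that the predictions are the matrix-vector product $\Phi \mathbf{w}$. A sketch of this vectorized alternative to the loop in `predict_poly` below, assuming TensorFlow 2.x; `feature_map` is a hypothetical helper name and the weights are example values.

```python
import tensorflow as tf

def feature_map(x, D):
  """Hypothetical helper: map x of shape (N,) to the (N, D+1) design matrix
  whose columns are x**0, x**1, ..., x**D."""
  powers = tf.range(D + 1, dtype=x.dtype)        # [0., 1., ..., D]
  return tf.pow(tf.expand_dims(x, -1), powers)   # (N, 1) broadcast against (D+1,)

x = tf.constant([1.0, 2.0, 3.0])
phi = feature_map(x, D=2)                        # rows are [1, x_i, x_i^2]
print(phi.numpy())

w = tf.constant([0.5, 0.8, -1.2])                # example weights (w_0, w_1, w_2)
y = tf.linalg.matvec(phi, w)                     # y_i = sum_d w_d * x_i**d
print(y.numpy())
```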

In [0]:
def make_polynomial_data(N=100):
  x = tf.random.uniform(shape=(N,))  # TensorFlow and NumPy provide similar functionality
  eps = tf.random.normal(shape=(len(x),), stddev=0.01)
  y = 0.6* x**4 - 1.2 * x**2 + 0.8 * x + 0.5 + eps
  return x, y

In [0]:
# Generate new polynomial synthetic data
x_train, y_train = make_polynomial_data()

In [0]:
plt.plot(x_train, y_train, 'r.')

In [0]:
# Create trainable variables
D = 4 # Degree of polynomial to fit to the data (this is a hyperparameter)
w = tf.Variable(tf.random.normal((D+1,)))

In [0]:
# Predict with feature mapping
def predict_poly(x):
  pred = w[0] * tf.ones_like(x) # fetch shape of x and create a tensor of ones
  for idx in range(1, D+1):
    pred = tf.add(pred, w[idx] * tf.pow(x, idx))
  return pred

In [0]:
# Redefine loss function
def squared_error_poly(y_pred, y_true):
  return tf.reduce_mean(tf.square(y_pred - y_true)) 

In [0]:
# Calculate loss
loss = squared_error_poly(predict_poly(x_train), y_train)
print("Starting loss", loss)

In [0]:
# Perform gradient descent for vector w
alpha = 0.5  # learning rate
steps = 10000

for i in range(steps):
  # GradientTape records the operations on the trainable variable w
  # so that its gradient can be computed; discussed in Part II
  with tf.GradientTape() as tape:
    predictions = predict_poly(x_train)
    loss = squared_error_poly(predictions, y_train)
    
  # Calculate gradients via tape
  gradients = tape.gradient(loss, [w])
  
  # Manual gradient descent of the weight vector
  w.assign_sub(gradients[0] * alpha)
  
  if i % 1000 == 0:    
    print(f"Step {i}, Loss {loss.numpy()}")

In [0]:
print(f'w true:    {[0.5, 0.8, -1.2, 0.0, 0.6]}')  # coefficients used in make_polynomial_data
print(f"w learned: {w.numpy()}")

Plot the best fit line.

In [0]:
plt.plot(x_train, y_train, 'r.')
plt.plot(x_train, predict_poly(x_train), 'b.')

It would also be possible to perform regression with a neural network in the same way. One would need to implement (stochastic) gradient descent and apply the gradients to each respective variable. As you will see, TensorFlow and Keras provide this functionality, making it easy to train complex networks.
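
A minimal sketch of that idea, assuming TensorFlow 2.x with `tf.keras`: a small `Sequential` network (layer sizes, learning rate and step count are arbitrary choices for illustration) is fit to the noise-free polynomial target from above, and `tf.keras.optimizers.SGD` applies the tape's gradients to all trainable variables at once. Since every step uses all 100 points, this is full-batch gradient descent; sampling mini-batches would make it stochastic.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Noise-free version of the polynomial target, shaped (N, 1) for Dense layers
x = tf.random.uniform(shape=(100, 1))
y = 0.6 * x**4 - 1.2 * x**2 + 0.8 * x + 0.5

# Small network: 1 input -> 8 tanh units -> 1 output
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

losses = []
for step in range(1000):
  with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))
  grads = tape.gradient(loss, model.trainable_variables)
  # Applies v <- v - learning_rate * grad to every variable,
  # replacing the manual assign_sub calls used earlier
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  losses.append(float(loss))
  if step % 200 == 0:
    print(f"Step {step}, Loss {losses[-1]:.5f}")
```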