<table align="center">
  <td align="center"><a target="_blank" href="https://colab.research.google.com/github/andrew-nash/CS6421-labs/blob/main/Lab2.ipynb">
        <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
  <td align="center"><a target="_blank" href="https://github.com/andrew-nash/CS6421-labs/blob/main/Lab2.ipynb">
        <img src="https://i.ibb.co/xfJbPmL/github.png"  height="70px" style="padding-bottom:5px;"  />View Source on GitHub</a></td>
</table>


# Basic Model Optimization with TensorFlow

During the first lab, we saw examples of defining simple feed-forward models containing a single layer of neurons.

In this lab, we will look at

1. The process of defining a loss function, calculating gradients and backpropagating weight updates
2. How to monitor the training process - assessing loss and accuracy over time, and comparing the overall performance of models with different hyper-parameters

We will not focus on specific data pre-processing, or the optimality/sub-optimality of any particular modelling choices - these are topics for future labs, which will use the techniques discussed here.

## Recap on a simple TensorFlow model from last week

In [None]:
import tensorflow as tf

In [None]:
### Defining a neural network using the Sequential API ###

# Import relevant packages
from tensorflow.keras import Sequential

# Define the number of outputs
n_output_nodes = 10

# First define the model
model = Sequential()

# Remember: dense layers are defined by the parameters W and b!
dense_layer = tf.keras.layers.Dense(n_output_nodes)

# Add the dense layer to the model
model.add(dense_layer)

We have seen that we can perform inference on this model with

In [None]:
x_input = tf.constant([[1,2.]], shape=(1,2))

print(model(x_input))

tf.Tensor(
[[-1.3462192   0.26335108 -1.2146258  -0.40529662 -0.87654626 -0.10545951
  -1.0769374   0.6932813   1.4673295   1.847245  ]], shape=(1, 10), dtype=float32)


The next question is, how can we perform *training* on this model, as discussed in class?

For the example of a supervised model:

1. We need to define some input data, with labels associated to each input. This data should be split into train and test partitions.
2. We need to define a *loss function*, to compute a measure of difference between predicted values and true labels
3. For each prediction and loss value, we must then *backpropagate* weight updates back through the network

Luckily for us, TensorFlow can abstract much of this process into simple functions.

## Input Data

For this example, we will use a pre-loaded dataset from TensorFlow - the MNIST (Modified NIST) dataset. This is a set of 70,000 28x28 greyscale images, with associated labels, of handwritten digits (0-9).

In this case TensorFlow has already split up the dataset to give us 60k images for training, and a separate 10k for evaluation.

Proper data loading, handling and pre-processing will be discussed more fully in the next lab - for now we will just consider this dataset as an example.



In [None]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
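
To make the division by 255.0 above concrete, here is a minimal sketch on a tiny made-up array (the 2x2 "image" is hypothetical, standing in for one MNIST image). It assumes the raw pixels are 8-bit integers in the range 0-255, as they are for MNIST.

```python
import numpy as np

# Hypothetical 2x2 "image" of raw 8-bit pixel intensities (0-255).
raw = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Dividing by a float promotes the array to floating point
# and rescales every pixel into the range [0, 1].
scaled = raw / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
```

Keeping inputs in a small, fixed range like this is a common choice, since it keeps the size of the initial outputs and gradients manageable.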

## Define a simple model to train

We need to specify the "shape" of the input that will be passed to the model. In this case, we pass the image as a 'flat' vector (rather than as a matrix) to simplify the code.
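
As a small illustration of what "flat" means here, the sketch below reshapes a made-up batch of 28x28 arrays into rows of 784 values. The `images` array is hypothetical filler data, not MNIST.

```python
import numpy as np

# Hypothetical batch of 3 "images", each 28x28, filled with distinct values.
images = np.arange(3 * 28 * 28).reshape(3, 28, 28)

# -1 lets NumPy infer the batch dimension; each image becomes one row of 784 values.
flat = images.reshape(-1, 784)

print(flat.shape)
# Image rows are laid end to end, so pixel (row 1, col 0) lands at index 28.
print(flat[0, 28] == images[0, 1, 0])
```

The same `reshape(-1, 784)` call is what we apply to `x_train` and `x_test` later in this lab.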


In [None]:
model = Sequential()
dense_layer = tf.keras.layers.Dense(10, input_shape=[784])
model.add(dense_layer)

In [None]:
model.summary()

Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_8 (Dense)             (None, 10)                7850      
                                                                 
Total params: 7850 (30.66 KB)
Trainable params: 7850 (30.66 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
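
The parameter count in the summary above can be checked by hand: a dense layer has one weight per (input, unit) pair plus one bias per unit. A quick sketch of that arithmetic, assuming 4-byte float32 parameters:

```python
n_inputs = 784    # 28 * 28 pixels per flattened image
n_outputs = 10    # units in the Dense layer

n_weights = n_inputs * n_outputs   # one weight per (input, unit) pair
n_biases = n_outputs               # one bias per unit
total_params = n_weights + n_biases

# float32 parameters take 4 bytes each
size_kb = total_params * 4 / 1024

print(total_params)        # 7850
print(round(size_kb, 2))   # 30.66
```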


## Loss Function

In this simple example, each true label is a single integer: the digit shown in the input image. The loss function measures how far the model's output is from that label.

In [None]:
def mean_absolute_error(true_label, predicted_label):
    abs_err = tf.abs(tf.subtract(true_label, predicted_label))
    return tf.reduce_mean(abs_err)
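
To see what this loss computes, here is a small worked example. It is a NumPy restatement of the function above (the name `mean_absolute_error_np` is ours, for illustration only), relying on NumPy and TensorFlow sharing the same broadcasting rule for subtraction. The second call shows what happens when a single scalar label is compared against a whole row of outputs, which is the situation we are in when one digit label meets ten model outputs.

```python
import numpy as np

def mean_absolute_error_np(true_label, predicted_label):
    # NumPy restatement of the TensorFlow function above, for illustration only.
    abs_err = np.abs(np.subtract(true_label, predicted_label))
    return np.mean(abs_err)

# Element-wise case: two labels, two predictions.
# |3 - 2.5| = 0.5 and |5 - 7| = 2, so the mean is 1.25.
mae_pairs = mean_absolute_error_np(np.array([3.0, 5.0]), np.array([2.5, 7.0]))

# Broadcast case: one scalar label against a (1, 4) row of outputs.
# Errors are 1, 0, 1 and 4, so the mean is 1.5.
mae_broadcast = mean_absolute_error_np(5.0, np.array([[4.0, 5.0, 6.0, 9.0]]))

print(mae_pairs, mae_broadcast)
```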

In [None]:
def mean_squared_error(y_true, y_pred):
    ''' TODO Implement Mean Squared Error Loss
    '''
    return

## Back-propagation

The code we saw last week can be used to automatically differentiate our loss function with respect to the weight values, and from there apply updates to our weights.
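
As a reminder of how automatic differentiation looks in isolation, here is a minimal sketch on a single made-up variable (the toy function and the name `w` are ours, not part of the lab model). The tape records the operations performed inside the `with` block, and `tape.gradient` then differentiates the result with respect to the variable.

```python
import tensorflow as tf

w = tf.Variable(3.0)   # a single trainable value

with tf.GradientTape() as tape:
    # Toy "loss": L(w) = w^2 + 2w
    loss = w * w + 2.0 * w

# dL/dw = 2w + 2, which is 8 at w = 3
grad = tape.gradient(loss, w)
print(float(grad))
```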

### First, we must track the forward pass with tf.GradientTape()

In [None]:
sample_image = x_train[0].reshape(1,784)
sample_label = y_train[0]

with tf.GradientTape() as tape:
    prediction = ''' TODO: Perform inference on the sample image '''
    loss_value = ''' TODO: compute the loss under mse '''

print("Pre-update loss:", loss_value)

Pre-update loss: tf.Tensor(189.28352, shape=(), dtype=float32)


### Then, find the gradients

In [None]:
gradients = tape.gradient(loss_value, model.trainable_variables)

### We can now use these gradients to apply weight updates

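$W_{t+1} = W_t - \lambda \nabla_W L(W_t)$

where $\lambda$ is the *learning-rate* hyperparameter and $\nabla_W L(W_t)$ is the gradient of the loss $L$ with respect to the weights, evaluated at the current weights $W_t$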
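
Before applying this to the network, it can help to see one update on a problem small enough to check by hand. The sketch below uses a made-up one-dimensional loss with a known derivative (the functions `loss` and `grad` are illustrative, not part of the lab model).

```python
def loss(w):
    return (w - 3.0) ** 2      # toy loss with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)     # derivative of the toy loss

lr = 0.1                        # learning rate
w_t = 0.0                       # current weight
w_next = w_t - lr * grad(w_t)   # one step of the update rule above

print(w_t, loss(w_t))
print(w_next, loss(w_next))
```

Starting from 0, the gradient is -6, so the weight moves up to roughly 0.6 and the loss falls from 9 to roughly 5.76. A single step moves towards the minimum without reaching it.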

In [None]:
lr = tf.constant(0.001)
# UPDATING THE WEIGHTS (of layer 1)
model.trainable_variables[0].assign(model.trainable_variables[0] - tf.multiply(lr,gradients[0]))
# UPDATING THE BIASES (of layer 1)
model.trainable_variables[1].assign(model.trainable_variables[1] - tf.multiply(lr,gradients[1]))

### Re-perform the inference, and see if the loss has reduced

In [None]:
new_prediction = ''' TODO: Perform inference on the sample image '''
new_loss_value = ''' TODO: compute the loss '''
print("Post-update loss:", new_loss_value)

Post-update loss: tf.Tensor(182.356, shape=(), dtype=float32)


As we can see, the loss has decreased: we have completed a successful forward and backward pass through the model.

## Backpropagation, but simpler

TensorFlow has a much simpler method to perform back-propagation, which is especially important for more complicated models.

In [None]:
model = Sequential()
dense_layer = tf.keras.layers.Dense(10, input_shape=[784])
model.add(dense_layer)

In [None]:
model.compile(
    optimizer = "SGD",
    loss = "mean_squared_error",
    metrics = ['accuracy']
)

In [None]:
model.fit(
    x_train.reshape(-1, 784),
    y_train,
    epochs=10,
    batch_size = 128,
    validation_data = (x_test.reshape(-1, 784),y_test)
)

This provides us with a quick and easy way to initialise the training process, and gives us some nicely formatted loss & accuracy metrics.
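
The progress bar printed during training counts batches rather than images. A quick sketch of the arithmetic for the settings used in the `model.fit` call above:

```python
import math

n_train = 60_000    # MNIST training images
batch_size = 128    # value passed to model.fit above
epochs = 10

# The final batch of an epoch is smaller if the data does not divide evenly.
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, last_batch)   # 469 96
print(steps_per_epoch * epochs)      # 4690
```

Each batch triggers one weight update, so these settings give 4690 updates over the whole run.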

One of the most useful additions to these training 'wrapper' functions is callbacks - custom functions that enable us to log all sorts of other useful information to monitor the model training process.


# Computation Graph \& Monitoring Training with TensorBoard

TensorBoard is TensorFlow's official monitoring tool (https://www.tensorflow.org/tensorboard/get_started). It can be installed and used as a standalone package on your machine to run alongside any TensorFlow model training - or it can be used in the form of a Jupyter Notebook extension.

In [None]:
%load_ext tensorboard

In [None]:
model = Sequential()
dense_layer = tf.keras.layers.Dense(10, input_shape=[784])
model.add(dense_layer)

model.compile(
    optimizer = "SGD",
    loss = "mean_squared_error",
    metrics = ['accuracy']
)

While your TensorBoard logs can be located and formatted however you want, it is highly recommended to place logs in a timestamped directory to keep everything well organised.

In [None]:
import datetime
log_dir = "logs/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
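
To show what these directory names look like, and why this format keeps runs in order, here is a small sketch using two made-up start times in place of `datetime.datetime.now()`:

```python
import datetime

# Two hypothetical run start times, formatted the same way as log_dir above.
fmt = "%Y%m%d-%H%M%S"
earlier = "logs/" + datetime.datetime(2024, 1, 31, 9, 5, 7).strftime(fmt)
later = "logs/" + datetime.datetime(2024, 2, 1, 14, 30, 0).strftime(fmt)

print(earlier)  # logs/20240131-090507
print(later)    # logs/20240201-143000

# Fields are zero-padded and ordered from year down to second,
# so sorting the names alphabetically also sorts the runs chronologically.
print(sorted([later, earlier]) == [earlier, later])
```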

TensorBoard can now be started, specifying the location in which logs will be placed

In [None]:
%tensorboard --logdir logs

The TensorBoard callback is an object that attaches to a particular model, and streams data live to the TensorBoard application - so we get real-time visualisation of our metrics.

How often this data is streamed is specified with the _freq arguments. Note that the two arguments use different units.

E.g.:

update_freq = 1 means that the losses and metrics are sent to TensorBoard after every batch (an integer value counts batches; pass 'epoch' to log once at the end of every epoch instead)

histogram_freq = 1 means that histograms of the model weights are sent to TensorBoard at the end of every epoch

In [None]:
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, update_freq=1, histogram_freq=1)

In [None]:
model.fit(
    x_train.reshape(-1, 784),
    y_train,
    epochs=10,
    batch_size = 128,
    validation_data = (x_test.reshape(-1, 784),y_test),
    callbacks = [tensorboard_callback]
)

## Interpreting the TensorBoard Data

In your own time, it is well worth exploring this tool to see the sorts of information that it can visualise. Some of the most important aspects:

1. Plots of train and test loss (and other metrics such as accuracy) over time
2. A complete representation of the computation graph of the model
3. Distributions of the weight and bias values over time

# Dashboards with *Weights And Biases* (wandb.ai)  

Based on the tutorial from: https://colab.research.google.com/github/wandb/examples/blob/master/colabs/intro/Intro_to_Weights_&_Biases_keras.ipynb#scrollTo=ru06gHFl1B0M

While TensorBoard is an excellent tool for monitoring and assessing the model parameters (weights & biases) in training, there are other monitoring tools which are useful for comparing the performance of models for different choices of hyper-parameters (such as number of layers, choice of activation function, epochs of training, optimization function, etc).

An example of such a tool is `Weights And Biases` - wandb.ai . It focuses on Tableau-like dashboards, generated automatically from your models.

## Installation

In [None]:
!pip install wandb -qU

In [None]:
import wandb
from wandb.keras import WandbMetricsLogger, WandbModelCheckpoint

First, create an account at https://wandb.ai

Once that's done, copy the API key for your account into the text box that appears when running the cell below

In [None]:
wandb.login()

## 👟 Run an experiment
1️⃣. **Start a new run** and pass in hyperparameters to track

2️⃣. **Log metrics** from training or evaluation

3️⃣. **Visualize results** in the dashboard

Here, we will use a model similar to the one before, only with 2 layers instead of 1

In [None]:
wandb.init(
    project="keras-intro",
    # (optional) set entity to specify your username or team name
    # entity="my_team",
    config={
        "layer_1_neurons": 10,
        "optimizer": "SGD",
        "loss": "mean_squared_error",
        "metric": "accuracy",
        "epoch": 10,
        "batch_size": 8,
    },
)
config = wandb.config

# Build the model
model = Sequential()
dense_layer_1 = tf.keras.layers.Dense( config['layer_1_neurons'], input_shape=[784])
model.add(dense_layer_1)
dense_layer_2 = tf.keras.layers.Dense(10)
model.add(dense_layer_2)


model.compile(optimizer=config.optimizer, loss=config.loss, metrics=[config.metric])

# Add WandbMetricsLogger to log metrics and WandbModelCheckpoint to log model checkpoints
wandb_callbacks = [
    WandbMetricsLogger(),
    WandbModelCheckpoint(filepath="model_{epoch:02d}"),
]

model.fit(
    x=x_train.reshape(-1, 784),
    y=y_train,
    epochs=config.epoch,
    batch_size=config.batch_size,
    validation_data=(x_test.reshape(-1, 784), y_test),
    callbacks=wandb_callbacks,
)

# Mark the run as finished
wandb.finish()
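
The `filepath="model_{epoch:02d}"` argument passed to `WandbModelCheckpoint` above is a Python format template. As a sketch of how such a template expands, assuming the callback fills in `epoch` with the 1-based epoch number as Keras's `ModelCheckpoint` does:

```python
template = "model_{epoch:02d}"

# :02d pads the epoch number to two digits, so names sort in training order.
names = [template.format(epoch=e) for e in (1, 2, 10)]
print(names)  # ['model_01', 'model_02', 'model_10']
```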

## Experiment 2

For this experiment, we will compare performance when varying the number of neurons in the first layer

In [None]:
for neurons in [5,10,15,20,25]:
  wandb.init(
      project=    ''' TODO: SET AN APPROPRIATE NAME FOR THIS RUN ''',
      # (optional) set entity to specify your username or team name
      # entity="my_team",
      config={
          ''' CONFIGURE THIS MODEL SO THAT THE FIRST LAYER CONTAINS
            THE SPECIFIED NUMBER OF NEURONS '''
      },
  )
  config = wandb.config

  # Build the model
  model = Sequential()
  dense_layer_1 = tf.keras.layers.Dense( config['layer_1_neurons'], input_shape=[784])
  model.add(dense_layer_1)
  dense_layer_2 = tf.keras.layers.Dense(10)
  model.add(dense_layer_2)


  model.compile(optimizer=config.optimizer, loss=config.loss, metrics=[config.metric])

  # Add WandbMetricsLogger to log metrics and WandbModelCheckpoint to log model checkpoints
  wandb_callbacks = [
      WandbMetricsLogger(),
      WandbModelCheckpoint(filepath="model_{epoch:02d}"),
  ]

  model.fit(
      x=x_train.reshape(-1, 784),
      y=y_train,
      epochs=config.epoch,
      batch_size=config.batch_size,
      validation_data=(x_test.reshape(-1, 784), y_test),
      callbacks=wandb_callbacks,
  )

  # Mark the run as finished
  wandb.finish()

## Experiment 3

Choose a number of neurons for the first layer, and consider the impact of choosing different activation functions for the first layer
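
As a reminder of how these activation functions differ, the sketch below evaluates three common choices on a few sample inputs using plain Python. The helper functions are written out here for illustration; in the model itself we would pass the activation by name to the layer.

```python
import math

def relu(z):
    return max(0.0, z)                   # zero for negatives, identity otherwise

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))    # squashes into (0, 1)

# math.tanh squashes into (-1, 1) and is centred on zero
for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), round(math.tanh(z), 3), round(sigmoid(z), 3))
```

Note that ReLU is unbounded above, while tanh and sigmoid saturate for large positive or negative inputs.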

In [None]:
for act_funcs in ["relu", "tanh", "sigmoid"]:
  wandb.init(
      project=    ''' TODO: SET AN APPROPRIATE NAME FOR THIS RUN ''',
      # (optional) set entity to specify your username or team name
      # entity="my_team",
      config={
          ''' CONFIGURE THIS MODEL TO SPECIFY THE ACTIVATION FUNCTION FOR THIS RUN'''
      },
  )
  config = wandb.config

  # Build the model
  model = Sequential()
  dense_layer_1 = tf.keras.layers.Dense(''' TODO: SPECIFY THE NUMBER
  OF NEURONS, AND CHOSEN ACTIVATION FUNCTION  ''',  input_shape=[784])
  model.add(dense_layer_1)
  dense_layer_2 = tf.keras.layers.Dense(10)
  model.add(dense_layer_2)


  model.compile(optimizer=config.optimizer, loss=config.loss, metrics=[config.metric])

  # Add WandbMetricsLogger to log metrics and WandbModelCheckpoint to log model checkpoints
  wandb_callbacks = [
      WandbMetricsLogger(),
      WandbModelCheckpoint(filepath="model_{epoch:02d}"),
  ]

  model.fit(
      x=x_train.reshape(-1, 784),
      y=y_train,
      epochs=config.epoch,
      batch_size=config.batch_size,
      validation_data=(x_test.reshape(-1, 784), y_test),
      callbacks=wandb_callbacks,
  )

  # Mark the run as finished
  wandb.finish()