# Deep Learning with TensorFlow
## Summative assessment 1

#### Instructions

There are 3 questions in this assessment. **You should attempt to answer all questions.** 

You can make imports as and when you need them throughout the notebook, and add code cells where necessary. Make sure your notebook executes correctly in sequence before submitting.

You have 2 hours and 30 minutes to complete this assessment.

#### How to submit

When you have finished and are happy with your code, make sure all cells are executed and their outputs printed, and then save as an html file. You should upload and submit the following files to Turnitin on Blackboard **in a single zip file**:

* Your completed Jupyter notebook file (`.ipynb` file format)
* The executed notebook saved as an `.html` file

You are also required to name your zip file as _'SurnameCID.zip'_, e.g. _Smith1234567.zip_. Do not submit multiple files. The submitted ipynb file must produce the output that appears in your html file.

Make sure you submit your files before the exam deadline of **Friday 18th March, 12.40pm** (an extra 10 minutes are included for preparing and uploading the files).

Please regularly check your email and Blackboard in case it is necessary to send any announcements during the assessment.

_Important:_ As this is assessed work you need to work on it individually. It must be your own and unaided work. You are not allowed to discuss the assessment with your fellow students or anybody else. All rules regarding academic integrity and plagiarism apply. Violations of this will be treated as an examination offence. In particular, letting somebody else copy your work constitutes an examination offence. 

In [None]:
import tensorflow as tf

### Question 1 (Total 30 marks)

This question uses the [CIFAR-10](https://keras.io/api/datasets/cifar10/) dataset, which you should load as follows:

In [None]:
from tensorflow.keras.datasets import cifar10
(cifar10_x_train, cifar10_y_train), (cifar10_x_test, cifar10_y_test) = cifar10.load_data()

a) Randomly select 20 examples from the CIFAR-10 training dataset and display the images along with their (integer) labels. **(5 marks)**

b) Use the Sequential API to implement an MLP classifier model for the CIFAR-10 dataset (as loaded above) with the following spec:

* The model first flattens the input image into a 1-D Tensor.
* The model then passes the image through 3 Dense layers of width 64, 32 and 16 respectively, using a sigmoid activation function.
* The final Dense layer has width 10, and no activation function.

Print the model summary, and compile it with a suitable optimizer and loss function, and an accuracy metric. **(10 marks)**

c) Create a custom layer called `GlobalMaxAveragePooling2D`, that performs a global pooling operation over the spatial dimensions (height and width). 

* The layer initialiser should take one required argument `data_format`, which can take the values `'channels_first'` or `'channels_last'`, corresponding to the ordering of dimensions in the inputs.   
  * `'channels_first'` assumes that the input is of the form `(batch_size, channels, height, width)`
  * `'channels_last'` assumes that the input is of the form `(batch_size, height, width, channels)`
* The layer should compute the global **mean** pixel value per channel, as well as the global **max** pixel value per channel, i.e. it should reduce out the height and width dimensions
* The layer should then concatenate the result of the two global pooling operations, to return a Tensor of shape `(batch_size, 2*channels)`
* Your layer implementation should not make use of any other Keras layers or TensorFlow pooling functions. 
* Test your layer by instantiating it (create a `GlobalMaxAveragePooling2D` object) and calling it on a Tensor of shape `(2, 8, 6, 3)` filled with randomly sampled values. **(10 marks)**

_Hint: the `tf.reduce_max`, `tf.reduce_mean` and `tf.concat` functions will be useful in defining the custom layer._
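The axis bookkeeping for the two data formats can be illustrated outside TensorFlow. The sketch below uses NumPy purely to show which axes are reduced and what shape comes out; it is not the Keras layer itself. The function name `global_max_average_pool` is hypothetical, and concatenating the mean before the max is an assumption, since the specification does not fix the order.

```python
import numpy as np

def global_max_average_pool(x, data_format):
    """NumPy illustration of the pooling arithmetic (not a Keras layer)."""
    if data_format == "channels_last":
        spatial_axes = (1, 2)   # (batch, height, width, channels)
    elif data_format == "channels_first":
        spatial_axes = (2, 3)   # (batch, channels, height, width)
    else:
        raise ValueError(f"Unknown data_format: {data_format!r}")
    mean = x.mean(axis=spatial_axes)      # shape (batch, channels)
    maximum = x.max(axis=spatial_axes)    # shape (batch, channels)
    # Assumed ordering: mean values first, then max values.
    return np.concatenate([mean, maximum], axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 6, 3))         # channels_last test input
out = global_max_average_pool(x, "channels_last")
print(out.shape)
```

In the TensorFlow layer the same reductions would be written with `tf.reduce_mean`, `tf.reduce_max` and `tf.concat`, as the hint suggests.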

d) Use the Sequential API to implement a CNN classifier model for the CIFAR-10 dataset (as loaded above), using any number of `Conv2D` layers, one `GlobalMaxAveragePooling2D` layer and one `Dense` layer. Your `Conv2D` layers can use any hyperparameter settings. 

Your CNN model should have roughly the same number of trainable parameters as the MLP model above (within $\pm 10\%$). Print the model summary. **(5 marks)**
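To judge the $\pm 10\%$ target it helps to know the MLP's parameter count in advance. The short calculation below is a sketch that assumes the standard Dense layer count of `inputs * units` weights plus `units` biases, and a flattened CIFAR-10 image of $32 \times 32 \times 3$ values.

```python
# Dense layer parameters = inputs * units (weights) + units (biases).
input_size = 32 * 32 * 3          # a flattened CIFAR-10 image
layer_widths = [64, 32, 16, 10]   # the three hidden layers plus the output layer

total_params = 0
fan_in = input_size
for units in layer_widths:
    layer_params = fan_in * units + units
    print(f"Dense({units}): {layer_params} parameters")
    total_params += layer_params
    fan_in = units

lower, upper = 0.9 * total_params, 1.1 * total_params
print(f"Total: {total_params}")
print(f"10% band: {lower:.0f} to {upper:.0f}")
```

The total should agree with the trainable parameter count reported by the MLP's model summary.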

### Question 2 (Total 30 marks)

This question uses the [CIFAR-10](https://keras.io/api/datasets/cifar10/) dataset, which you should load as follows:

In [None]:
from tensorflow.keras.datasets import cifar10
(cifar10_x_train, cifar10_y_train), (cifar10_x_test, cifar10_y_test) = cifar10.load_data()

a) Create train and test `tf.data.Dataset` objects for the CIFAR-10 dataset, that return a tuple of input images and integer labels. 

You should then write the following functions to process the input images:

* Write a function `rescale` that converts the images to type `tf.float32` and rescales the pixel values to lie in the range $[-1, 1]$
* Write a function `grayscale` that converts the input images to grayscale by averaging the pixel values of the colour channels. The image Tensors should retain the channel axis, with length 1.
* Write a function `random_flip` that randomly flips the input image horizontally, with probability $0.5$

Apply each of the functions above in sequence to the train and test `tf.data.Dataset` objects using the `map` method and print out the `element_spec`. Do not batch or shuffle your Datasets. **(17 marks)** 
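The arithmetic behind the three processing steps can be checked on a tiny array. This is a NumPy stand-in under stated assumptions, not the required solution: the functions used with `map` would be written with TensorFlow operations and would also pass the label through. Dividing by 127.5 and subtracting 1 is one way (assumed here) to send `uint8` values in $[0, 255]$ to $[-1, 1]$.

```python
import numpy as np

# A made-up 2x2 RGB image with uint8 pixel values, shape (2, 2, 3).
image = np.array([[[0, 0, 0], [255, 255, 255]],
                  [[255, 0, 0], [0, 127, 255]]], dtype=np.uint8)

def rescale(img):
    # Cast to float32, then map [0, 255] linearly onto [-1, 1].
    return img.astype(np.float32) / 127.5 - 1.0

def grayscale(img):
    # Average over the channel axis, keeping it with length 1.
    return img.mean(axis=-1, keepdims=True)

def random_flip(img, rng):
    # Reverse the width axis with probability 0.5.
    return img[:, ::-1, :] if rng.random() < 0.5 else img

rng = np.random.default_rng(0)
gray = grayscale(rescale(image))
result = random_flip(gray, rng)
print(gray.shape, gray.dtype, float(gray.min()), float(gray.max()))
```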

b) Write a function `filter_classes` that you will use with the Datasets' `filter` method, to filter out examples with any of the class labels $[0, 1, 8, 9]$.

Apply your function to the train and test Datasets with the `filter` method, and verify your function works correctly by iterating through the Datasets and confirming that the examples returned by the Datasets only contain the integer labels $[2, 3, 4, 5, 6, 7]$. **(8 marks)**

c) Batch your Dataset objects with a batch size of 20 (but do not shuffle them). 

Draw one batch of examples from your training Dataset and display them along with their string label names. The label names are provided below. **(5 marks)**

In [None]:
# These label names correspond with the integer labels loaded above

class_names = [
    'airplane', 
    'automobile', 
    'bird', 
    'cat', 
    'deer', 
    'dog', 
    'frog', 
    'horse', 
    'ship', 
    'truck'
]

### Question 3 (Total 40 marks)

This question uses the [MNIST](https://keras.io/api/datasets/mnist/) dataset, which you should load as follows (this question only uses the train split):

In [None]:
from tensorflow.keras.datasets import mnist
(mnist_x_train, mnist_y_train), _ = mnist.load_data()

a) Create a `tf.data.Dataset` object from the train split arrays loaded above, that returns a tuple of input images and integer labels.

* Process the input images by rescaling the pixel values to the range $[0, 1]$
* Shuffle the Dataset with buffer size 1000
* Batch the Dataset with batch size 64

**(5 marks)**

b) Use the functional API to implement an MLP classification model for the MNIST dataset.

Your model should first flatten the input image into a 1-D Tensor, and should have a single hidden layer with 100 neurons and a ReLU activation. The output layer is a softmax layer with 10 classes. Compile your model with a suitable loss function and the SGD optimizer. Print the model summary. **(7 marks)**

c) A common method for finding a suitable learning rate for a given model and optimizer is to perform a dummy training run as a learning rate probe as follows:

* Set the initial learning rate to a small value
* Fix a number of iterations to train the model, `num_iterations` 
* After each iteration/batch, slightly increase the learning rate by multiplying by a fixed factor $q>1$
* Repeat until the learning rate reaches a high value, recording the loss at each training iteration

The aim is to obtain a plot that looks something like this:
<img src="figures/example_loss_lr_curve.png" alt="Example loss versus learning rate curve" style="width: 350px;"/>
The loss decreases in the beginning, but eventually starts rising when the learning rate becomes too large. A good choice of learning rate is one slightly to the left of the minimum.

Write a custom callback `LearningRateProbe` to implement the above scheme. 

* The initializer should take the arguments `num_iterations`, `initial_learning_rate` and `final_learning_rate`
* The callback should set the optimizer learning rate to `initial_learning_rate` at the start of training, and then increase it by a factor of $q$ after each training iteration
  * The factor $q$ should be calculated to satisfy:
  
  `initial_learning_rate * q**(num_iterations - 1) = final_learning_rate`
  
* After each training iteration, the callback should store the learning rate values and the loss values
  
_Hint: the optimizer learning rate is a non-trainable TensorFlow Variable stored at `model.optimizer.lr`, for a compiled model. This Variable should be updated after each training iteration as above._ **(18 marks)**
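Solving the condition above for the factor gives $q = (\text{final} / \text{initial})^{1/(\text{num\_iterations} - 1)}$. The snippet below is a small worked check of that formula in plain Python; the helper name `growth_factor` and the example values (5 iterations from $10^{-3}$ to $10$) are illustrative assumptions, not part of the assessment.

```python
import math

def growth_factor(num_iterations, initial_learning_rate, final_learning_rate):
    """Factor q with initial * q**(num_iterations - 1) == final."""
    ratio = final_learning_rate / initial_learning_rate
    return ratio ** (1.0 / (num_iterations - 1))

# Illustrative values only: 5 iterations spanning 1e-3 to 10.
q = growth_factor(5, 1e-3, 10.0)
schedule = [1e-3 * q**k for k in range(5)]

print(q)         # approximately 10
print(schedule)  # approximately 1e-3, 1e-2, 1e-1, 1, 10
```

With this choice the first iteration uses `initial_learning_rate` and the last uses `final_learning_rate`, so the probe covers the whole range in exactly `num_iterations` steps.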

d) Use your `LearningRateProbe` callback to collect the loss values from training your (freshly initialized) model from part b) for a single epoch on the MNIST training data, with the batch size of 64.

* Set `num_iterations` to 938 (number of batches of size 64 in a single epoch through a training set of 60,000 examples)
* Set `initial_learning_rate` to $10^{-8}$
* Set `final_learning_rate` to $10$
* Train your model for 1 epoch, passing in your `LearningRateProbe` custom callback object
* Plot the loss values against the learning rate, using a log scale for the x-axis
* Calculate the learning rate that minimizes the loss curve, and suggest a good learning rate for this model, dataset and optimizer

**(10 marks)**
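The numbers in this part can be sanity-checked without training anything. The sketch below confirms the batch count, computes the growth factor implied by these settings, and shows one way to read off the learning rate at the lowest recorded loss. The helper `learning_rate_at_min_loss` is hypothetical, and the four learning rate and loss values passed to it are made-up placeholders, not results.

```python
import math

# 60,000 examples in batches of 64; the last batch is partial.
num_iterations = math.ceil(60000 / 64)

initial_learning_rate, final_learning_rate = 1e-8, 10.0
q = (final_learning_rate / initial_learning_rate) ** (1.0 / (num_iterations - 1))
reached = initial_learning_rate * q ** (num_iterations - 1)
print(num_iterations, q, reached)

def learning_rate_at_min_loss(learning_rates, losses):
    """Return the recorded learning rate at which the loss was lowest."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return learning_rates[best]

# Placeholder values, only to show the call; real lists come from the callback.
example_lrs = [1e-3, 1e-2, 1e-1, 1.0]
example_losses = [2.3, 1.9, 0.7, 5.0]
print(learning_rate_at_min_loss(example_lrs, example_losses))
```

Per-batch losses are noisy, so the raw minimum is only a starting point; the suggested learning rate would then be taken somewhat below it.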