<a href="https://colab.research.google.com/github/DanRHowarth/Tensorflow-2.0/blob/master/tf2_Notebook_2_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# tensorflow 2.0: Notebook 2: Computer Vision with CNNs 

## 1. Introduction to this Notebook 

* In the previous notebook we introduced deep learning concepts and the tensorflow 2.0 code to implement these concepts.
* In this notebook, we will use another dataset - the **`mnist`** dataset -  to build on our knowledge. In particular, we will:
  * introduce **`Computer Vision`** 
  * introduce **`convolutional layers`** into our models 
  * introduce the concept of **`regularisation`** and implement **`dropout layers`** in our models
  * introduce the **`validation set`** in training our model 
  * introduce how to **`save`** and reuse our model 
* The image below sets out how this fits within our deep learning framework and existing knowledge.

![alt text](https://github.com/DanRHowarth/Tensorflow-2.0/blob/master/Summary%20of%20Notebook%202%20Concepts.png?raw=true)

### 1.1  Load Libraries

In [1]:
# we need to install tensorflow 2.0 on the google cloud notebook we have opened
!pip install -q tensorflow==2.0.0-alpha0


In [2]:
## importing as per previous notebook

# We are future proofing by importing modules that modify or replace existing modules that we may have used now
from __future__ import absolute_import, division, print_function, unicode_literals

# import tensorflow and tf.keras
import tensorflow as tf
from tensorflow import keras

# import helper libraries
import numpy as np
import matplotlib.pyplot as plt

# let's print out the version we are using 
print(tf.__version__)

2.0.0-alpha0


In [0]:
## some additional imports for this notebook 
from tensorflow.keras import datasets, layers, models

### 1.2 Loading our Data

In [6]:
# split our data 
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [0]:
# let's have a quick look at our data
plt.imshow(train_images[0])

In [11]:
# and at our labels
np.unique(train_labels)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)

**What is the problem we are trying to solve?**
* As we can see, we have images of digits from 0 - 9, and labels from 0 - 9. We are trying to build a model that correctly classifies the digits in the image.

## 2. Data: Introduction to Computer Vision 

**What is Computer Vision?**
* Computer Vision is the field of how computers can gain understanding from images and videos. It includes tasks such as **`image recognition`** and **`object detection`**. Deep Learning is seen as the state of the art technology for solving computer vision problems. 

**Why is Deep Learning particularly good at it?**
* The layers within a deep learning model are good for identifying and modelling the different aspects of an image (such as edges, parts of faces, and other important parts of an image). The meaning that each layer extracts can be built up to form representations for lots of different image types that can then be classified.  
* In particular, **`convolutional layers`** are good at extracting representation from image data and they form the basis of deep learning models for image recognition. The ability to build larger and larger models that consist of these convolutional layers, and to train them with more and more data (thanks to increasing compute power), led to a leap forward in state of the art for computer vision. 

**How does it work?**
* Every image is represented by an array of numbers. You may have noticed this when we looked at the **`shape`** of the images we were processing. This shape represents the number of pixels in an image, and each pixel has a numerical value. This numerical value maps to a colour value that is displayed. It is also what we use as input values to our model. 

In [7]:
## let's start by looking at the shape of our training data

## we can see that there are 60000 images, each 28 x 28 pixels
train_images.shape

(60000, 28, 28)

In [0]:
## we can also see that these pixels are represented in an array of numbers 
train_images[0]

In [0]:
# we need plt.imshow() - or another library such as OpenCV or PIL - to output an image from this array
plt.imshow(train_images[0])

**What do the array values mean?**
* Each value determines the colour of the pixel it represents. Exactly which colour is displayed depends on the number of colour channels the array has. This dataset has only one channel, a grayscale channel. Colour images typically have three channels, one each for Red, Green and Blue, and a given value in one channel displays a different colour from the same value in another channel.
* See the tutorials [here](https://www.w3schools.com/colors/default.asp) for more detail on how the values within a channel map to a colour.
* It's worth noting here that there are typically 256 values (0 - 255) available in each channel, making a total of c. 16.8m possible colours for each pixel of a three-channel image!
* As per the previous notebook, we will rescale the arrays to between 0 and 1. Inputs on a small, consistent scale generally make training more stable.
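The points above can be sketched with a couple of tiny arrays. This is only an illustration: the 2 x 2 images below are invented for the example and are not taken from `mnist`.

```python
import numpy as np

# Hypothetical 2 x 2 images, invented for illustration (not mnist data).
gray = np.array([[0, 128], [200, 255]], dtype=np.uint8)  # shape (rows, cols)
rgb = np.zeros((2, 2, 3), dtype=np.uint8)                 # shape (rows, cols, channels)
rgb[0, 0] = [255, 0, 0]                                   # a pure red pixel

# Each channel holds one of 256 values, so three channels give 256**3 combinations.
n_colours = 256 ** 3

# Rescale from the 0 - 255 integer range to floats between 0 and 1.
gray_scaled = gray / 255.0

print(gray.shape, rgb.shape)
print(n_colours)
print(gray_scaled.min(), gray_scaled.max())
```

The same division by 255.0 is what we apply to `train_images` and `test_images` later in this section.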

**What about images of different shapes?**
* The size of an image can and does vary. In this case, we have small images of 28 x 28 pixels (or (28, 28, 1) once we include the single channel). This was the same for the previous dataset and it makes it easy to train models. 
* Outside of introductory tutorials, it is likely that you will see much larger images, meaning many more pixels and therefore larger arrays to train and learn representations on. This will make the models larger and training more involved. 
* One final thing to note is that a deep learning model like the ones we build here requires every input array to have the same shape. This means that images which differ in size need to be preprocessed so that they are the same size before being passed to the model. 
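As a rough sketch of that preprocessing step, the snippet below resizes two differently sized arrays to a common 28 x 28 shape using nearest-neighbour sampling. The helper `resize_nearest` and the input arrays are hypothetical, written for this example only; in practice an image library would normally do this job.

```python
import numpy as np

def resize_nearest(image, out_rows, out_cols):
    """Resize a 2D image by nearest-neighbour sampling (hypothetical helper)."""
    rows, cols = image.shape
    # For each output row/column, pick the index of the source row/column to copy.
    row_idx = np.arange(out_rows) * rows // out_rows
    col_idx = np.arange(out_cols) * cols // out_cols
    return image[row_idx[:, None], col_idx]

# Two made-up "images" of different sizes.
images = [
    np.arange(40 * 30).reshape(40, 30),
    np.arange(20 * 20).reshape(20, 20),
]

# After resizing, the images can be stacked into a single array for the model.
batch = np.stack([resize_nearest(img, 28, 28) for img in images])
print(batch.shape)
```

Nearest-neighbour sampling is the simplest option and was chosen here only to keep the sketch short; other interpolation methods usually give smoother results.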

In [0]:
# we now need to reshape the data to add a colour channel 
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

In [9]:
# we can view the new shape 
train_images.shape

(60000, 28, 28, 1)

In [0]:
# and normalize the data 
train_images, test_images = train_images / 255.0, test_images / 255.0

**So, what did we cover in this section?**
* An introduction to computer vision, including how images are represented by arrays.
* How the shape of an array matters for our model and the preprocessing required prior to feeding the arrays to our model.

**How does it add to our existing knowledge?**
* This builds on the deep learning concepts from notebook 1.

**What else can I learn to improve my knowledge?**
* Images have to be fed into the model in the same shape each time. This requires pre-processing.
* Prior to feeding images into a model, we can also change the images in certain ways to add noise and variety to the training data. This should mean that the model is more robust and better at generalizing to unseen data. We will look at both of these in the *Advanced: Data Augmentation* notebook.

## 3. Model Building 

### 3.1 Convolutional Models

**What did we do in the previous notebook?**
* In [Notebook 1](link), we looked at the main elements of deep learning models: input and output layers, hidden layers - which contain the learned parameters of the model, and activation functions. 
* We also looked briefly at the different sort of hidden layers available to us, such as dense and convolutional layers. 
* And we built a model that took an image as an input and flattened it to a **1D** array. 

**How do we build a convolutional neural network?**
* A convolutional neural network (CNN) contains both dense and convolutional layers. The convolutional layers form the **`base`** of the model and extract representations from the image. The dense layers form the **`head`** of the model and take these representations and map them to our output classes.
* A convolutional layer takes our image as it is (subject to any preprocessing to get it into a standard shape, or augmentation to add noise and variety to the dataset) - that is, we do not need to flatten the image into a **`1D`** array. We flatten the array after our final convolutional layer, prior to passing it to the dense layers.

**Why use a convolutional layer?**
* Convolutions encode the key information in an image better than other types of layers. Their application to computer vision resulted in a marked improvement in the state of the art. That's why we use them. 

**What is a convolutional layer?**
* Simply, a convolutional layer is a layer that performs a mathematical operation known as convolution on the input data. In contrast, a dense layer performs matrix multiplication on its inputs. 
* Each convolutional layer has a user-defined set of filters (or windows) that we pass over the image. We define the number and size of the filters, although they are typically 3 x 3 matrices. 
* Each filter contains a set of weights that are learned by the model. These weights are multiplied with the input values under the filter, and the results are summed to give a new value in the layer's output. It's these filters that contain the learning of the convolutional layers of the model: their weights are updated as we train so that they are more and more able to extract key information from the image.
* The filter is applied to all the image channels as it passes over each pixel location, such that it looks at a specific row and column index position and all the array values available at that index:
$$(row, column, :)$$
* We won't go into *how* convolution works here, but see the cell at the end of this section for links that do explain how it works. 
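For readers who would like a rough picture anyway, here is a minimal sketch of one filter sliding over a single-channel image. It assumes a stride of 1 and no padding, and the function name `conv2d_single_filter` and its inputs are made up for this illustration. In a real layer the filter weights are learned during training, whereas here they are fixed.

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Slide one filter over a 2D, single-channel image (stride 1, no padding)."""
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    output = np.zeros((out_rows, out_cols))
    for r in range(out_rows):
        for c in range(out_cols):
            window = image[r:r + k_rows, c:c + k_cols]
            # Multiply the window by the filter weights element-wise, then sum.
            output[r, c] = np.sum(window * kernel)
    return output

# Hypothetical inputs: a 5 x 5 image of ones and a 3 x 3 filter of ones.
image = np.ones((5, 5))
kernel = np.ones((3, 3))
feature_map = conv2d_single_filter(image, kernel)
print(feature_map.shape)
print(feature_map[0, 0])
```

Note that the output is smaller than the input: a 3 x 3 filter fits into a 5 x 5 image at only 3 x 3 positions.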

**So what does a convolutional layer return?**
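* Each filter in a convolutional layer returns a 2D output array (a feature map) with one channel only. The layer stacks the feature maps from all of its filters, so its output has as many channels as the layer has filters. With the Keras default of `padding='valid'`, the (row, column) size also shrinks slightly: a 3 x 3 filter turns a 28 x 28 input into a 26 x 26 output. 
* It tends to be the case that a convolutional layer is paired with a **`pooling layer`**. We won't cover these in any detail, but it's sufficient to know that a pooling layer tries to keep the key information from the convolutional layer's output while typically halving its height and width. 
* The diagram below sets this out. 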





![alt text](https://github.com/DanRHowarth/Tensorflow-2.0/blob/master/Notebook%202%20Convolution%20and%20Pooling%20Layers.png?raw=true)
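The pooling step can be sketched in a few lines of numpy. This is an illustrative version of non-overlapping 2 x 2 max pooling on one feature map; the helper name `max_pool_2x2` and the input array are assumptions made for the example.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2 x 2 max pooling; odd trailing rows/columns are dropped."""
    rows, cols = feature_map.shape
    rows, cols = rows - rows % 2, cols - cols % 2
    trimmed = feature_map[:rows, :cols]
    # Group the array into 2 x 2 blocks, then keep the largest value in each block.
    blocks = trimmed.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.max(axis=(1, 3))

# A made-up 4 x 4 feature map holding the values 0 to 15.
feature_map = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
print(pooled)
```

The 4 x 4 input becomes a 2 x 2 output, which is the halving of height and width described above.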

**How do these convolutional layers combine together?**
* This describes just one convolutional layer. A state of the art model has dozens of layers. So as our input is passed through the layers, more and more feature maps are created and more pooling is done.
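* This makes the array smaller in height and width but deeper: each pooling layer roughly halves the rows and columns, while the number of channels equals the number of filters in the most recent convolutional layer.
* By convention, the number of filters increases as we move deeper into the model (in the model below, from 32 to 64).
* It also means that we need to flatten the final output before passing it to the dense layers.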
* *add image of overall conv layer*

* **Add in the terminology here of that the filters and outputs are typically called - see FC CV section**
* **How does convolution work on the non-image layer - i.e do we take array (row, col, 28) and it become (row, col, 1).**

In [0]:
## lets build our convolutional base
## we use the Sequential API but use .add() rather than passing the layers in as a list

# build model using sequential -> use .add not list to show difference 
model = models.Sequential() 

In [0]:
# start adding layers. input shape has been defined, including the channel value we added via reshape earlier
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
# max pooling layers
model.add(layers.MaxPooling2D((2,2)))
# this is then repeated to build 
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# max pooling layers
model.add(layers.MaxPooling2D((2,2)))
# additional convolutional layer 
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

In [0]:
# print model 
model.summary()

**How do we get to the parameter count?**
* The number of parameters in a convolutional layer is given by:
$$((filter\ height\ \times filter\ width\ \times input\ channels)\ +\ bias\ term)\ \times number\ of\ filters$$

* Each filter has one bias term, and our input image has one channel, so the number of parameters for the first convolutional layer is:
$$((3 \times 3 \times 1)+ 1)\ \times 32 = 320$$
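The count can be checked with a few lines of Python. This sketch also multiplies in the number of input channels, which is 1 for our first layer (so the 320 above is unchanged) but matters for the later layers, whose input has as many channels as the previous layer has filters. The function name `conv2d_param_count` is made up for the example.

```python
def conv2d_param_count(filter_height, filter_width, input_channels, n_filters):
    """Weights per filter plus one bias per filter, times the number of filters."""
    weights_per_filter = filter_height * filter_width * input_channels
    return (weights_per_filter + 1) * n_filters

# The three Conv2D layers of the model built above.
first = conv2d_param_count(3, 3, 1, 32)    # input: 1 grayscale channel
second = conv2d_param_count(3, 3, 32, 64)  # input: 32 feature maps
third = conv2d_param_count(3, 3, 64, 64)   # input: 64 feature maps
print(first, second, third)
```

These values can be compared against the `Param #` column printed by `model.summary()`. Pooling layers have no parameters to learn.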

In [0]:
# exercise with parameter count


**How do we get to the shape?**
* Answer
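One way to work out the shapes is sketched below. It assumes the Keras defaults used in our model: `Conv2D` with a stride of 1 and no padding, and `MaxPooling2D` with non-overlapping windows. The helper functions are hypothetical and written only for this illustration.

```python
def conv_output_size(size, kernel_size):
    """Output size along one dimension for stride 1 and no padding."""
    return size - kernel_size + 1

def pool_output_size(size, pool_size):
    """Output size along one dimension for non-overlapping pooling."""
    return size // pool_size

# The layers of our convolutional base, with the channel count after each one.
base = [("conv", 32), ("pool", 32), ("conv", 64), ("pool", 64), ("conv", 64)]

size = 28  # our images are 28 x 28
shapes = []
for layer, n_channels in base:
    size = conv_output_size(size, 3) if layer == "conv" else pool_output_size(size, 2)
    shapes.append((size, size, n_channels))

print(shapes)

# Flattening the final output gives the length of the 1D array passed to the dense layers.
flattened_length = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(flattened_length)
```

Each 3 x 3 convolution removes two rows and two columns, and each 2 x 2 pooling layer halves the height and width (rounding down).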

**So that's it...?**
* Need a classification layer
* Takes the final output shape of the Conv layers. 
* Dense layers take a 1d array so we need to flatten first

In [0]:
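# flatten the output of the convolutional base into a 1D array
model.add(layers.Flatten())
# dense hidden layer
model.add(layers.Dense(64, activation='relu'))
# output layer with one unit per digit class (0 - 9)
model.add(layers.Dense(10, activation='softmax'))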

In [0]:
# complete model summary
model.summary()

### 3.2 Regularization

**What is regularization?**
* Dropout
* Weight Decay 
* Something else.
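To give a feel for the first item, here is a minimal numpy sketch of dropout in its common "inverted" form: during training a random fraction of a layer's outputs is set to zero and the remaining values are scaled up to compensate. The function `dropout`, the seed and the input array are assumptions made for this example, not the notebook's own implementation.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a random fraction of values and rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
activations = np.ones((4, 5))   # hypothetical layer output
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
```

In Keras the same idea is available as `layers.Dropout(rate)`, which is only active during training and can be added to a model in the same way as the other layers.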

**So, what did we cover in this section?**
* .add()

**How does it add to our existing knowledge?**
* Two ways of using Sequential 

**What else can I learn to improve my knowledge?**
* Detail of convolution - see advanced notebook 3. Francois Chollet, Cezanne, C Olah. look this up and write it out for yourself. Convolution arxiv paper
* Different ways of specifying the convolution 
* Other things include...

## 4. Training 

### 4.1 Building our training loop

**What did we do in the previous notebook?**
* Compile with...
* See what the differences are...

In [0]:
# Lets build a training loop again (with callbacks and learning rate, and validation)
# need to add in callbacks here
model.compile(optimizer = 'adam',
             loss = 'sparse_categorical_crossentropy',
             metrics = ['accuracy'])

**What is different?**
* Validation set, Callbacks, learning rate

**What is validation set?**
* Answer
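As a rough illustration, a validation set is a portion of the training data that is held out from training and used to monitor how the model performs on data it has not been trained on. The sketch below carves out 10% of a dataset with numpy; the random arrays are hypothetical stand-ins for `train_images` and `train_labels`, and the 10% figure is an arbitrary choice for the example.

```python
import numpy as np

# Hypothetical stand-ins for train_images and train_labels (random data, not mnist).
rng = np.random.default_rng(0)
images = rng.random((100, 28, 28, 1))
labels = rng.integers(0, 10, size=100)

# Shuffle the indices, then hold out the last 10% as a validation set.
indices = rng.permutation(len(images))
n_val = len(images) // 10
val_idx, train_idx = indices[-n_val:], indices[:-n_val]

x_train, y_train = images[train_idx], labels[train_idx]
x_val, y_val = images[val_idx], labels[val_idx]
print(x_train.shape, x_val.shape)
```

Keras can use such a split directly: `model.fit` accepts a `validation_data=(x_val, y_val)` argument, or a `validation_split` fraction that holds out part of the training data for us.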

**What is a callback?** 
* Answer here

**What is the learning rate?**
* Answer

In [0]:
# here is the training loop in action
model.fit(train_images, train_labels, epochs = 5)

**Again, what is going on with the training loop?**
* Answer 

**Did we do any good?**
* Answer

In [0]:
# plot 


In [0]:
# results


**How will we perform on the test set?**
* Answer

In [0]:
# evaluate the trained model on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels)

### 4.2 Under and Overfitting

**What is this?**
* Answer 

**What can we do to improve it?**
* Regularization 

In [0]:
# code here showing options for changing 


**What do different learning rates do?**
* Answer
* Let's try different approaches 

In [0]:
# plot learning rate here 


In [0]:
# maybe more code here


In [0]:
# pull it all together here for more training and show results 


### 4.3 Saving Models

**What do we mean by *saving a model*?**
* We can save our progress as we train our models. We can save our progress in two ways:
  * *during training*, so that our models are saved after each epoch (or, after an epoch that shows model improvement). 
  * *after training*, so that our model has completed its training before we save it
* Of course, we can opt to save the model both during *and* after training. 
* Using `tensorflow 2.0`, we can opt to save the model either manually, i.e. after the model has trained, or by using callbacks - i.e. incorporating saving into the training process. 
  
**What are we saving?**
* We can save the model weights only, the full model (including the weights and the architecture), and the optimizer state.
* It's useful to remember that when we are training a model, the parameters we are updating during the training process are the weights at each layer of the model. Our aim is to train on (data, label) pairs so that we can predict effectively on unseen data using the weights we have trained. So it is these weights that are saved, optionally along with the model architecture. 
* Optimizer state. We haven't focussed too much on optimizers, but remember that this is the way that the model weights are updated. The size of the update is set by the (user defined) `learning rate`. When we save a model we can therefore save the `optimizer-state`, meaning we can continue training a loaded model from the state it was in when the model was saved.  

**Why would we save only the model weights?**
* Space, size? 
* Don't want to associate it with the model architecture for some reason? 

**Why save a model?**
* Saves on training time 
* Deploy model 
* Train later

**Restore**
* Once we have saved our weights and/or model, we can restore the model in a couple of different ways. If we decide to save the weights only, we need to create an identical model to the one that was used to create our weights. If we saved both the model and weights, we can load this entire model.

**Why do we need an identical model?**
* Blah 

**What are the ways of doing it?**
* In this tutorial (notebook 2), we will look at manually saving and loading the model weights, and the full model (architecture + weights), i.e. after training. 
* In notebook 3, we will look at how to save during and after training using callbacks. 
* We will use the Keras API. Note there are some other ways to save the model covered in the tensorflow tutorials 

#### 4.3.1 Saving and Loading Weights Only

In [0]:
# save the model weights only
model.save_weights('./checkpoints/my_checkpoint')

In [0]:
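# to load the weights we need to create a new instance of the same model architecture 
# clone_model copies the architecture only; the new model starts with freshly initialised weights
new_instance = models.clone_model(model)

# then we can load the weights 
new_instance.load_weights('./checkpoints/my_checkpoint')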

## perhaps extract a weight from one of the layers

#### 4.3.2 Saving and Loading an entire model

In [0]:
## just look into this - might need to create a new instance here
## or perhaps save new_instance and compare to above

# this saves it to the HDF5 format
model.save('my_model.h5')

In [0]:
# recreate the saved model, including weights and optimizer ## where do i look to extract that?
new_model = keras.models.load_model('my_model.h5')

**So, what did we cover in this section?**
* Answer

**How does it add to our existing knowledge?**
* Answer

**What else can I learn to improve my knowledge?**
* Detail 
* Other things include...

## 5. Inference 

In [0]:
# load our model up 


In [0]:
# show how the model performs (re)using some code from previous notebook


## 6. Summary

**Did do:**
* Training - validation, overfitting, 

**Didn't do:**
* callbacks?