When getting into deep learning for the first time, you will hear a lot of common terms being thrown around. The first two are TensorFlow and PyTorch. These are two competing deep learning frameworks. TensorFlow was created by the Google Brain team and PyTorch was developed by Meta AI. Each of these frameworks has its own set of pros and cons, which we will not get into here[<sup id="fn1-back">1</sup>](#fn1). For the purpose of this exemplar we will be using TensorFlow.

The second common term you will hear is Keras. Keras is an API written in Python that interfaces with different deep learning backends and makes building models considerably easier. Thankfully, we do not need to concern ourselves with how this all works because, as of TensorFlow v2, Keras is fully integrated and available as `tf.keras`.

[<sup id="fn1">1</sup>](#fn1-back)For more on the pros and cons of each framework refer to this great blog post: https://www.v7labs.com/blog/pytorch-vs-tensorflow.


Now, with our newfound knowledge, let's see if TensorFlow is installed.

In [1]:
import tensorflow as tf
print(tf.__version__)

2023-07-04 19:24:46.978528: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-07-04 19:24:47.015074: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-07-04 19:24:47.015578: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


2.12.0


If everything went well with setting up the virtual environment, you should see a TensorFlow version >= 2.12.

Next we will import some packages that may prove to be useful later on.

In [2]:
import numpy as np 
import matplotlib.pyplot as pl #plotting (pyplot is preferred over the older pylab alias)
import seaborn as sns #pretty plots

The first thing we are going to do is familiarise ourselves with TensorFlow. As with most things in programming, the easiest way to do this is through an example, so we will follow a small TensorFlow tutorial. To begin, we need some data. Luckily, TensorFlow provides loaders for several standard datasets (the data is downloaded the first time you call the loader).


In [None]:
data = tf.keras.datasets.mnist.load_data()

This is the very popular MNIST dataset, which you may or may not have seen in other tutorials. It is a dataset of handwritten digits from 0 to 9. While it is a fairly boring dataset, it will work just fine for our purposes.

The next thing we need to do is explore the dataset:

In [None]:
print('Type: ', type(data))
print('Length: ', len(data))

Okay, so we know our data is a tuple of length 2. Let's unpack it into its own variables (the variable names may contain spoilers for what they are), then repeat the steps above:


In [None]:
train_data, test_data = data

print('---------Train Data-----------')
print('Type: ', type(train_data))
print('Length: ', len(train_data))


print('---------Test Data-----------')
print('Type: ', type(test_data))
print('Length: ', len(test_data))


Extract once more.....

In [None]:
x_train, y_train = train_data
x_test, y_test = test_data

print('---------Train x_data -----------')
print('Type: ', type(x_train))
print('Length: ', len(x_train))

print('---------Train y_data -----------')
print('Type: ', type(y_train))
print('Length: ', len(y_train))



Now we're getting somewhere! We can see that we have NumPy arrays. With NumPy arrays we can use the more informative `.shape` attribute instead of `len()` to see what the array looks like:

In [None]:
print('---------Train data -----------')
print('x Shape: ', x_train.shape )
print('y Shape: ', y_train.shape )

print('---------Test data -----------')
print('x Shape: ', x_test.shape )
print('y Shape: ', y_test.shape )


Now we have a picture of what our data is. We can see that the training data consists of 60000 images which are 28x28 pixels in dimension and the test data consists of 10000 images with the same dimensions. We can also see that the x data contains the images and the y data contains the labels. With this, let's use more informative variable names:

In [None]:
train_images, train_labels = x_train, y_train 
test_images, test_labels = x_test, y_test 


Now let's have a look at one of the images with the corresponding label:

In [None]:
cmap = sns.color_palette("Blues", as_cmap=True) #better colourmap from seaborn

image_idx = 1 #image to plot, change this number to plot different images from the training set

pl.figure(figsize = (5,4))
pl.imshow(train_images[image_idx],cmap  = cmap)
pl.colorbar()
pl.grid(False)
pl.show()
print('label:', train_labels[image_idx])

In [None]:
print('Min: ', np.min(train_images[image_idx]))
print('Max: ', np.max(train_images[image_idx]))
print('Ylabels: ', np.unique(train_labels))

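After playing around with the dataset we notice two things. The first is that the images have a flux/brightness range of 0 - 255. Training generally works better when the inputs are on a small, consistent scale, so we need our dataset to be normalised (sometimes even standardised, depending on what you're trying to do). Normalisation rescales the values to a fixed range, typically 0 to 1, whereas standardisation shifts and scales them to have zero mean and unit variance. So let's normalise now[<sup id="fn2-back">2</sup>](#fn2):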


[<sup id="fn2">2</sup>](#fn2-back) Remember that everything you do to the training set, you must also do to the test set. 
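
To make the difference between normalising and standardising concrete, here is a minimal NumPy sketch. The `pixels` array is a made-up 2x2 example rather than an MNIST image, and the variable names are only illustrative.

```python
import numpy as np

# Made-up 2x2 "image" with 8-bit brightness values (an assumption, not MNIST data).
pixels = np.array([[0.0, 51.0], [102.0, 255.0]])

# Normalisation: rescale the known 0-255 range onto 0-1.
normalised = pixels / 255.0

# Standardisation: subtract the mean and divide by the standard deviation,
# which gives values with zero mean and unit variance.
standardised = (pixels - pixels.mean()) / pixels.std()

print(normalised)
print(standardised.mean(), standardised.std())
```

For 8-bit images, dividing by 255 is usually enough; standardisation tends to matter more when features have very different scales.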

In [None]:
train_images = train_images/255.
test_images = test_images/255. 

print('Min: ', np.min(train_images[image_idx]))
print('Max: ', np.max(train_images[image_idx]))

The second thing we notice is that the labels range from 0 - 9. We are trying to classify each of the images into one of these classes. However, having the labels in this form will not work with the categorical cross-entropy loss we will be using, which expects each label to be a binary vector. The most common way to change categorical labels into binary vector labels is called One-Hot Encoding[<sup id="fn3-back">3</sup>](#fn3). Our data has 10 categories, so each label can be represented as a 10-digit binary vector in which every digit is 0 except for a single 1 at the position of the class. Let's have a look at some examples:




[<sup id="fn3">3</sup>](#fn3-back) For more on One-Hot encoding see: https://towardsdatascience.com/how-and-why-performing-one-hot-encoding-in-your-data-science-project-a1500ec72d85
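
As a rough illustration of what one-hot encoding does under the hood, the sketch below builds the binary vectors with plain NumPy. The helper name `one_hot` is hypothetical and is not part of TensorFlow; the result should agree with `tf.keras.utils.to_categorical` for integer labels, but treat that as an expectation rather than a guarantee.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an array of shape (len(labels), num_classes) with a single 1 per row."""
    labels = np.asarray(labels)
    # Row i of the identity matrix is exactly the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([0, 7, 9], num_classes=10)
print(encoded)
```

Each row contains exactly one 1, located at the index given by the original label.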

In [None]:
num_classes = 10

label = 7 #change this number to see different binary representations


print('Label:  ',label)
print('Binary: ',tf.keras.utils.to_categorical(label, num_classes))

In [None]:
# convert class vectors to binary class matrices - this is for use in the categorical_crossentropy loss

train_labels = tf.keras.utils.to_categorical(y_train, num_classes)
test_labels = tf.keras.utils.to_categorical(y_test, num_classes)

In [None]:
# reshape the data into a 4D tensor - (sample_number, x_img_size, y_img_size, num_channels)
# because the MNIST is greyscale, we only have a single channel - RGB colour images would have 3
image_shape = train_images[0].shape
train_images = train_images.reshape(len(train_images), image_shape[0], image_shape[1], 1)
test_images = test_images.reshape(len(test_images), image_shape[0], image_shape[1], 1)


Now we have everything we need to move on to building the network. TensorFlow makes this really easy and intuitive. All we need to do is decide what our architecture is going to be, then add each layer line by line. An example of this is shown below.

In [None]:
def make_model_simple(num_classes,input_shape):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(8, kernel_size=(16, 16), strides=(1, 1),
                     activation='relu',
                     input_shape=input_shape))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
    return model

Let's go through this line by line. The first line initialises an empty sequential model, in which layers are stacked one after another. In the second line we add the first layer of the network. This first layer is a convolutional layer[<sup id="fn4-back">4</sup>](#fn4). This layer performs the mathematical operation of convolution[<sup id="fn5-back">5</sup>](#fn5) between the input and a filter (kernel). To help with the explanation, an illustration of the process is shown below:

![ConvUrl](https://miro.medium.com/v2/resize:fit:2340/1*Fw-ehcNBR9byHtho-Rxbtw.gif "conv")

Here we have a 5x5 input image shown in blue and a 3x3 kernel shown in shaded grey. The convolution operation starts with the kernel in the top left corner of the input image. It then multiplies the kernel element by element with the part of the image that's underneath it and sums the results. This sum is the first pixel of the output image. The kernel is then slid across the image and this is repeated until it reaches the top right corner; the kernel is then moved down and back to the leftmost pixel. This whole process is repeated, building the output image pixel by pixel. The output image is shown being built pixel by pixel in white.
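
The sliding-window process described above can be sketched in a few lines of NumPy. This is a simplified, single-channel illustration with made-up input values, not how TensorFlow implements the layer internally. Like the Keras `Conv2D` layer, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding and return the output map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Part of the image currently underneath the kernel.
            patch = image[top:top + kh, left:left + kw]
            # Element-wise multiply, then sum to get one output pixel.
            output[i, j] = np.sum(patch * kernel)
    return output

# Made-up 5x5 input and 3x3 kernel, matching the sizes in the illustration.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)
```

A 5x5 input with a 3x3 kernel and a stride of 1 gives a 3x3 output, as in the animation.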

There are 3 important parameters to consider in convolution layers. The first is the number of filters (another name for kernels) to use; in our case we have chosen 8. The second is the kernel size, which is the dimensions of the kernel; in our case this is 16x16 (in the illustration it is 3x3). The final parameter is strides, which determines the 'sliding' action of the kernel. In the illustration the kernel slides right one pixel at a time until it gets to the end, then slides one pixel down. This means it has a stride of (1,1), which also happens to be the stride of our convolutional layer.
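
To see how these parameters determine the size of the network, the short calculation below works through the shapes and parameter counts for the layer settings used in `make_model_simple` (8 filters, a 16x16 kernel, a stride of 1 and no padding, which is the Keras default). The helper name `conv_output_size` is hypothetical and only used for this illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output length along one axis for a convolution without padding."""
    return (input_size - kernel_size) // stride + 1

num_filters = 8
kernel_size = 16
num_classes = 10

# Each 28x28 MNIST image shrinks to 13x13 after the convolution.
side = conv_output_size(28, kernel_size, stride=1)

# One weight per kernel pixel per input channel, plus one bias per filter.
conv_params = kernel_size * kernel_size * 1 * num_filters + num_filters

# Flattening the (13, 13, 8) output gives the input length of the Dense layer.
flat_length = side * side * num_filters
dense_params = flat_length * num_classes + num_classes

print(side, conv_params, flat_length, dense_params)
```

These numbers should match what `model.summary()` reports for the simple model, which is a handy way to check your understanding.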




[<sup id="fn4">4</sup>](#fn4-back) For more on CNNs and convolutional layers, see this blog post www.machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/

[<sup id="fn5">5</sup>](#fn5-back) For more on the convolutional operation, see this great video explanation by 3Blue1Brown: www.youtube.com/watch?v=KuXjwB4LzSA

After that really long aside, we can now move on to describing the rest of the network. The second layer is a flatten layer. This layer works much the same way as the flatten operation in NumPy: it converts the multidimensional output of the convolutional layer into a 1-D array.
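
As a quick illustration, the NumPy snippet below flattens an array with the same shape as the output of our convolutional layer for a single image. The array of zeros is just a stand-in for real feature maps.

```python
import numpy as np

# Stand-in for one convolutional output: 13 x 13 pixels with 8 feature maps.
feature_maps = np.zeros((13, 13, 8))

# Collapse all dimensions into a single 1-D array.
flat = feature_maps.reshape(-1)
print(flat.shape)  # (1352,)
```

The Keras `Flatten` layer does the same thing for every image in a batch while leaving the batch dimension untouched.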

The final layer of our network is a Dense layer. These are standard neural network layers, which are just a layer of fully connected neurons. The important parameter in these layers is the number of neurons, which in our case is the number of categories in our dataset. The softmax activation turns the ten outputs into probabilities that sum to 1.
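
For intuition, a Dense layer with a softmax activation can be sketched as a matrix multiplication followed by a normalisation. The weights and input below are random placeholders rather than trained values, and the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def dense_forward(x, weights, bias):
    """Fully connected layer: every input is connected to every neuron."""
    return softmax(x @ weights + bias)

rng = np.random.default_rng(0)
x = rng.normal(size=1352)                          # flattened input (placeholder)
weights = rng.normal(scale=0.01, size=(1352, 10))  # one column per class
bias = np.zeros(10)

probs = dense_forward(x, weights, bias)
print(probs.shape, probs.sum())
```

The class with the largest probability is the network's prediction for that image.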


In [None]:
#Calls the function and makes a model
model = make_model_simple(num_classes,train_images[0].shape)


#Compiles the network into a graph
model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])


In the last statement we have introduced a few different concepts, so let's go through them one by one. First let's talk about the function ```model.compile()```[<sup id="fn6-back">6</sup>](#fn6). This function combines our network architecture with the necessary loss function and optimiser into a computational graph. This is a directed graph that represents mathematical expressions. Computational graphs underpin all neural networks and are what allow forward and back-propagation to work [<sup id="fn7-back">7</sup>](#fn7).

The loss function, also known as a cost function or objective function, is used to quantify how well our machine learning model is performing on a given task. The primary goal of the model is to minimize this loss function during the training process. The choice of loss function depends on what you are trying to do and the nature of the data being analyzed[<sup id="fn8-back">8</sup>](#fn8). Selecting an appropriate loss function is essential for training a model effectively. In our case we are working with a multi-class classification problem, so we have chosen to use a categorical cross-entropy loss function.
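
To give a feel for what categorical cross-entropy measures, here is a minimal NumPy sketch for a single sample. It uses three classes instead of ten to keep the numbers readable, and the small `eps` guard against `log(0)` is my own choice for this illustration.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid taking log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 0.0, 1.0])          # the true class is index 2
good_guess = np.array([0.1, 0.2, 0.7])      # most probability on the true class
bad_guess = np.array([0.7, 0.2, 0.1])       # most probability on a wrong class

loss_good = categorical_cross_entropy(y_true, good_guess)
loss_bad = categorical_cross_entropy(y_true, bad_guess)
print(loss_good, loss_bad)
```

The loss is small when the model puts high probability on the correct class and grows quickly as that probability drops.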

Optimizers are algorithms or methods used to update the parameters of a model during the training process in order to minimize the chosen loss function, thereby improving the model's performance on a given task. The most suitable optimizer depends on the specific task, the architecture of the model, and the size of the dataset. One of the most common general-purpose optimisers is called Adaptive Moment Estimation (Adam) and is the one we choose to use here[<sup id="fn9-back">9</sup>](#fn9).
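
For the curious, the Adam update rule can be sketched in plain Python on a one-parameter toy problem. This is an illustrative sketch of the published algorithm with commonly used default values for `beta1`, `beta2` and `eps`; the toy loss `(x - 3)^2`, the learning rate and the function name are assumptions made for this example, and the Keras implementation differs in its details.

```python
import math

def adam_minimise(grad_fn, x0, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimise a one-parameter function given its gradient, using Adam."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of the gradient
        v = beta2 * v + (1 - beta2) * g * g    # running mean of the squared gradient
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy loss (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = adam_minimise(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

In a real network the same update is applied to every weight at once, with the gradients supplied by back-propagation.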



[<sup id="fn6">6</sup>](#fn6-back) www.tensorflow.org/api_docs/python/tf/keras/Model

[<sup id="fn7">7</sup>](#fn7-back) For more on computational graphs and their relation to ML see: www.towardsdatascience.com/evolution-of-graph-computation-and-machine-learning-3211e8682c83#

[<sup id="fn8">8</sup>](#fn8-back) For more on the pros and cons of different loss functions see: www.towardsdatascience.com/loss-functions-in-machine-learning-9977e810ac02

[<sup id="fn9">9</sup>](#fn9-back) For more on optimisers see this great blog post: www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-deep-learning-optimizers/

Okay, with that all out of the way, are we ready to finally train some models? Well, not quite yet. We need to introduce two parameters that are fundamental to training a model, namely batches and epochs.

In the previous section we went through loss functions and optimisers. Models train by calculating the loss for a given dataset and then using the optimiser to find a better set of parameters. Now, depending on the dataset, we may not be able to load the entire dataset into memory to calculate the loss[<sup id="fn10-back">10</sup>](#fn10). Instead, the dataset is split into a number of batches. The loss is calculated for each batch of data in turn and the parameters are updated after each one. One pass through all the batches in a dataset is called an epoch.
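
As a small worked example of these definitions, the sketch below splits the 60000 training images into batches of 256 and counts how many parameter updates 5 epochs would involve. The helper `iterate_batches` is hypothetical; `model.fit` handles the batching for you.

```python
def iterate_batches(num_samples, batch_size):
    """Yield (start, stop) index pairs that cover the dataset once."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)

num_samples, batch_size, epochs = 60000, 256, 5

batches = list(iterate_batches(num_samples, batch_size))
steps_per_epoch = len(batches)                       # batches in one epoch
last_batch_size = batches[-1][1] - batches[-1][0]    # the final batch is smaller
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)
```

The number of steps per epoch is the count Keras shows in its progress bar during training.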

With these definitions we can finally train our first model!

[<sup id="fn10">10</sup>](#fn10-back) For more on when and why to use batches see: https://medium.com/analytics-vidhya/when-and-why-are-batches-used-in-machine-learning-acda4eb00763

In [None]:
batch_size = 256 #Number of datapoints in one batch
epochs = 5  #Total number of training passes through the dataset

model.fit(train_images, train_labels,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1)

In [None]:
#Evaluate the model using the metric chosen above, which was accuracy.
predictions = model.evaluate(test_images,test_labels)

print('')

print('###--------------------------###')
print('###      Simple Model        ###')
print('###--------------------------###')
print('     Loss:      {} '.format(np.round(predictions[0],4)))
print('     Accuracy:  {}%\n\n'.format(np.round(predictions[1]*100,2)))

Okay, even with this super simple model we get pretty good results. But this is to be expected given the easy dataset we have. We could make the results a lot better by adding a few more layers. One has been added for you already: a max pooling layer[<sup id="fn11-back">11</sup>](#fn11). See how you can change the performance of the network by adding different combinations of the four layers you have been introduced to in this tutorial. To get some intuition for what adding more layers and neurons does to the performance of a network, have a play around with this visual toy model provided by TensorFlow: https://playground.tensorflow.org/.

[<sup id="fn11">11</sup>](#fn11-back) For more on pooling see: www.machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/
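
To show what a max pooling layer with a 2x2 pool and a stride of 2 does, here is a minimal NumPy sketch on a made-up 4x4 feature map. The function name is hypothetical and it assumes the height and width are even.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    # Split the map into (h/2) x (w/2) blocks of 2x2 pixels (assumes even h and w).
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
print(pooled)
```

Pooling halves the height and width of the feature map, which reduces the number of values passed on to later layers.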

In [None]:
def make_model_intermediate(num_classes,input_shape):

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(16, kernel_size=(5, 5), strides=(1, 1),
                     activation='relu',
                     input_shape=input_shape))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    ####------------------------------------------------------------------####
    #                                                                        #
    #                                                                        #
    #                                                                        #
    #                                                                        #
    #                    Add more layers here                                #
    #                                                                        #
    #                                                                        #
    #                                                                        #   
    ####------------------------------------------------------------------####
    model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
    
    return model

In [None]:
#This will not run if you didn't add anything to the previous cell. Hint: look at the shapes in the error message, what's wrong here?

model_inter = make_model_intermediate(num_classes,train_images[0].shape)

model_inter.compile(loss=tf.keras.losses.categorical_crossentropy,
                    optimizer=tf.keras.optimizers.Adam(),
                    metrics=['accuracy'])


#Have a play around with the batch sizes and epochs and see what effect they have.
batch_size = 256
epochs = 5

model_inter.fit(train_images, train_labels,
                batch_size=batch_size,
                epochs=epochs,
                verbose=1)

In [None]:
#Evaluate the model using the metric chosen above, which was accuracy.
predictions = model_inter.evaluate(test_images,test_labels)

print('')

print('###--------------------------###')
print('###    Intermediate Model    ###')
print('###--------------------------###')
print('     Loss:      {} '.format(np.round(predictions[0],4)))
print('     Accuracy:  {}%\n\n'.format(np.round(predictions[1]*100,2)))

The final network we will be introducing here follows the VGG[<sup id="fn12-back">12</sup>](#fn12) network architecture. VGG was a state-of-the-art architecture for image classification when it was introduced in 2014, and it remains a widely used baseline. This is of course overkill for our current toy problem, but it is a very powerful architecture that will be useful for the science case that we will be going through in the next section. This is quite a step up from the other architectures we have used so far, so take your time to have a look at the different layers in the network and use all the resources presented in this notebook to help you understand what each of them does.




[<sup id="fn12">12</sup>](#fn12-back) https://paperswithcode.com/method/vgg
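
When reading a deeper network like this, it can help to trace how the image shape changes from block to block. The sketch below does this by hand for the convolutional blocks of `make_model_VGG`, assuming a 28x28 input such as MNIST. The helper name is hypothetical, and the filter counts are read off the function definition.

```python
def trace_vgg_shapes(input_size=28, block_filters=(32, 64, 128, 256)):
    """Return the (height, width, channels) after each convolutional block."""
    size = input_size
    shapes = []
    for index, filters in enumerate(block_filters):
        # 'same' padding with a stride of 1 keeps the height and width unchanged.
        if index < len(block_filters) - 1:
            # The first three blocks end with 2x2 max pooling (stride 2),
            # which halves the size, rounding down.
            size = size // 2
        shapes.append((size, size, filters))
    return shapes

shapes = trace_vgg_shapes()
height, width, channels = shapes[-1]
flattened_length = height * width * channels
print(shapes, flattened_length)
```

The flattened length is the number of inputs to the first 1024-neuron Dense layer.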

In [None]:
def make_model_VGG(output = 1, loss = 'mean_squared_error', l_rate = 0.01):

    initializer = tf.keras.initializers.GlorotNormal()
    
    
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), strides=(1, 1),padding ='same',kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), strides=(1, 1),padding ='same',kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.BatchNormalization(beta_initializer=initializer,momentum = 0.9))
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), strides=(1, 1),padding ='same',kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), strides=(1, 1),padding ='same',kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.BatchNormalization(beta_initializer=initializer,momentum = 0.9))
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    model.add(tf.keras.layers.Conv2D(128, kernel_size=(3, 3), strides=(1, 1),padding ='same',kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.Conv2D(128, kernel_size=(3, 3), strides=(1, 1),padding ='same',kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.BatchNormalization(beta_initializer=initializer,momentum = 0.9))
    model.add(tf.keras.layers.Activation('relu'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    model.add(tf.keras.layers.Conv2D(256, kernel_size=(3, 3), strides=(1, 1),padding ='same',kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.Conv2D(256, kernel_size=(3, 3), strides=(1, 1),padding ='same',kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.BatchNormalization(beta_initializer=initializer,momentum = 0.9))
    model.add(tf.keras.layers.Activation('relu'))

    model.add(tf.keras.layers.Flatten())

    model.add(tf.keras.layers.Dense(1024,kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.BatchNormalization(beta_initializer=initializer,momentum = 0.9))
    model.add(tf.keras.layers.Activation('relu'))

    model.add(tf.keras.layers.Dense(1024,kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.BatchNormalization(beta_initializer=initializer,momentum = 0.9))
    model.add(tf.keras.layers.Activation('relu'))

    model.add(tf.keras.layers.Dense(1024,kernel_initializer=initializer,use_bias =False))
    model.add(tf.keras.layers.BatchNormalization(beta_initializer=initializer,momentum = 0.9))
    model.add(tf.keras.layers.Activation('relu'))

    model.add(tf.keras.layers.Dense(output,kernel_initializer=initializer,use_bias =False))

    model.compile(loss=loss,
              optimizer=tf.keras.optimizers.Adam(learning_rate = l_rate),
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
    
    return model

