# Practical Week 9: Neural Nets and Deep Learning

In this practical you will implement simple convolutional neural nets (CNNs) to classify images. First, we will consider the well-known MNIST Handwriting Recognition example to familiarise ourselves with the Keras library (part of TensorFlow) that we will use. Then, you will implement a simple CNN that is capable of identifying what type of object is shown in an image.

This work is not assessed, and you do not need to submit it. Please ask questions if you are facing difficulties with any of the content in this practical.



For this practical, we may want to use a GPU. Training deep neural nets on a CPU will take a long time.
Google Colab provides us with some limited access to GPU-enabled kernels. There are some restrictions on uptime (~3 hours) and maximum GPU usage per day/week.

To switch to a GPU-enabled kernel, go to the *Runtime* menu at the top of the page, select *Change Runtime Type*, and select *GPU* under *Hardware accelerator*.

Once you have done that, the kernel will restart.
Verify that the GPU is indeed available by executing this cell:

In [None]:
!nvidia-smi

If a GPU is available, we can see the GPU type (model name), GPU memory, and some further information about its state. Google Colab currently provides Tesla K80 and T4 GPUs. These are not the latest hardware, but will suffice for our purpose. After all, they are free to use!

If you can't use Google Colab and don't have a GPU, you can still run the code but it will take significantly longer. What may take 2 minutes on a GPU can take more than 30 minutes on a fast CPU!

Next, load the relevant libraries.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.datasets import mnist, cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Conv2D, BatchNormalization, MaxPooling2D, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from sklearn.metrics import classification_report


## Activity 1: Load MNIST

The MNIST dataset consists of 60,000 images of handwritten digits for training, and another 10,000 images for testing. Here are a few examples of the images:

![MNIST Example Images](https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png)

Our task is to create a neural net that can infer from the images which digit (0,...,9) is shown on each image.

First, let's load the MNIST dataset and examine the data.

In [None]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [None]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

We learn that the training set consists of 60,000 square images of size 28x28, and that the test set contains 10,000 images of the same size. Each image is associated with the target label. 

Let's look at the first training image:

In [None]:
plt.imshow(X_train[0], cmap='gray');

It appears to represent the digit '5'.

Let's verify that the correct label associated with the first training image is indeed '5':


In [None]:
y_train[0]

Next, let's examine the distribution of the labels. You can use [`np.unique()`](https://numpy.org/doc/stable/reference/generated/numpy.unique.html) to determine the frequency of each label in the training vector (use `return_counts=True`).

Is the data set balanced?

In [None]:
# TODO

We see that each class appears about equally often in the dataset, roughly 9-11% each. Thus, the dataset is balanced.

Let's see how the image is represented in the data set:

In [None]:
X_train[0]

We have a 2-dimensional array of size 28x28, where the first dimension represents the rows and the second dimension represents the columns. 

We see that each entry in the image is a single value. That is, the image is a monochrome image (greyscale). Each value in the array represents a pixel intensity. In the image, maximum intensity (255) is shown as white, while zero is black.

Let's see what the pixel value is at row 10 column 9. (The top left of the image is row 0 column 0.)

In [None]:
# TODO

Next, let's look at the value ranges in this image and in all images. Use [`min`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.min.html) and [`max`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.max.html).

In [None]:
# TODO

We see that the pixel values in the first training image range over 0,...,255. The same range is obtained for the full data set. That is, there are no images that have values outside this range.

## Activity 2: Transform the Data

To feed a 2-dimensional image into a fully connected neural net, we must first transform it into a 1-dimensional array. In this case, we must *flatten* the 28x28 array into an array of length 784. We can use [`np.reshape`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.reshape.html) to perform this task for all images in the entire training set at once. Thus, the 60000x28x28 array becomes a 60000x784 array. We transform the test set in a similar way.

In [None]:
X_train = X_train.reshape(60000,784)
X_test = X_test.reshape(10000,784)


In [None]:
X_train.shape, X_test.shape

Next, we need to **normalise** the data. Neural nets work best when the input values are all small numbers in the range 0,...,1 or -1,...,1. Since the pixel intensities in the MNIST dataset range over 0,...,255, we must normalise them to 0,...,1. Do this by dividing each value in the training and test set by 255.

In [None]:
# TODO

Verify that all values are in the range 0,...,1.

In [None]:
X_train.min(), X_train.max()

Next we look at the target labels associated with the images. Since there are 10 labels (digits '0',...,'9'), our neural net will have ten output units, one for each class. To train such a network, we must change the way the target label is represented.

Instead of using a single variable that can take on ten different values, we will use ten different target variables, each of which can take on either zero or one. If a variable has value `1` then this means that the target label is the class associated with that variable. Since each image is associated with a unique class, exactly one of these variables has value 1 for each image; all other variables must be zero.

This encoding is called *one-hot encoding*, since exactly one of the variables is non-zero for each image. Suppose that we introduce ten target variables, one for each digit. In this encoding, the first image (class 5) would be encoded as `[0,0,0,0,0,1,0,0,0,0]`. The variable representing class 5 is set to one while the other variables are set to zero.
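To make the encoding concrete, here is a minimal NumPy sketch on a handful of made-up labels. The array `labels` is a hypothetical stand-in for a few entries of `y_train`, and indexing an identity matrix is just one common way to build the encoding by hand.

```python
import numpy as np

# Hypothetical toy labels; in the practical the real labels live in y_train.
labels = np.array([5, 0, 4, 1, 9])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i, so indexing
# the identity matrix with the labels encodes all of them at once.
one_hot = np.eye(num_classes, dtype=int)[labels]

print(one_hot[0])               # encoding of the first label (class 5)
print(one_hot.sum(axis=1))      # every row contains exactly one 1
print(one_hot.argmax(axis=1))   # argmax recovers the original labels
```

Taking the `argmax` of each row reverses the encoding, which is the same trick used later to turn predicted probabilities back into class labels.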

Next, we will change `y_train` and `y_test` to use one-hot encoding. The function [`to_categorical`](https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical) does this for us.

In [None]:
Y_train = to_categorical(y_train, 10)
Y_test = to_categorical(y_test, 10)

We verify that the result has the correct shape and content:

In [None]:
# TODO

We obtain the correct 60000x10 and 10000x10 shape. The first few rows of the 2-dimensional array look correct, too. They each have exactly one 1 in the position of the target class, and are zero elsewhere.

## Activity 3: Train a Fully Connected Feed Forward Network

Now that we have prepared the data, we can train a fully connected feed forward neural net. 

We will create a simple network that consists of the following layers:

* Hidden layer 1: 512 units, ReLU activation, followed by Dropout with probability 0.2
* Hidden layer 2: 512 units, ReLU activation, Dropout (0.2)
* Output layer: 10 units (one per class), softmax activation

For each layer, we specify the number of units and the activation function. For units in the hidden layers we also adopt Dropout regularisation to avoid overfitting.

Each unit in the layers is connected to each unit in the layer below (or the input if there is no lower layer). This architecture is called a *fully connected* (*"dense"*) network.

The input to this network will be a vector of length 784 (one input per pixel in the input image).
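Before building the network in Keras, it can help to see what a forward pass through such an architecture computes. The sketch below uses plain NumPy with random, untrained weights and a made-up input; the helper names `dense`, `relu`, and `softmax` are illustrative only, and Dropout is left out because it is only active during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_out):
    """Fully connected layer with random (untrained) weights and zero biases."""
    W = rng.normal(scale=0.05, size=(x.shape[1], n_out))
    b = np.zeros(n_out)
    return x @ W + b

def relu(z):
    # ReLU keeps positive values and replaces negative values with zero.
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x = rng.random((1, 784))          # one hypothetical flattened image
h1 = relu(dense(x, 512))          # hidden layer 1
h2 = relu(dense(h1, 512))         # hidden layer 2
probs = softmax(dense(h2, 10))    # output layer: one probability per class

print(h1.shape, h2.shape, probs.shape)
print(probs.sum())                # the softmax outputs sum to 1
```

Because the weights are random, the resulting probabilities are meaningless; the point is only the flow of shapes from 784 inputs through two hidden layers of 512 units to 10 outputs.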

Let's create this network architecture using Keras: 

We create a [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) net architecture where we can add the layers one after another. We begin by adding the [`Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) connections between input and the first hidden layer. Here, we must specify the number of hidden units and the size of the input that the network will receive. Next, we add the [`relu`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Activation) activation function and [`Dropout`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout) for the first hidden layer. We repeat this for the second layer, omitting the input size specification (as Keras infers this from the units in the layer below). Finally, we add the output layer units.

In [None]:
def get_ff_mnist_model():
  model = Sequential()
  # Hidden layer 1
  # (784,) means a vector of length 784
  model.add(Dense(512, input_shape=(784,)))
  model.add(Activation('relu'))
  model.add(Dropout(0.2))
  # Hidden layer 2
  model.add(Dense(512))
  model.add(Activation('relu'))
  model.add(Dropout(0.2))
  # Output layer
  model.add(Dense(10))
  model.add(Activation('softmax'))
  return model
model = get_ff_mnist_model()

Let's print and verify the structure of the model.

In [None]:
model.summary()

We see that the elements we added are stacked on top of each other (the input enters at the top and the output is produced by the bottommost layer). For each layer, we see the output shape that the layer produces, and the number of learnable weights ("Param #") that the layer contains. We see that the net has 669,706 parameters that it must learn from the data.
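The parameter count can be reproduced by hand: each dense layer has one weight per input-output connection plus one bias per output unit. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_in, n_out):
    # One weight per input-output connection plus one bias per output unit.
    return n_in * n_out + n_out

# Input size followed by the number of units in each dense layer.
layer_sizes = [784, 512, 512, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)        # [401920, 262656, 5130]
print(sum(per_layer))   # 669706
```

The Activation and Dropout layers contribute no learnable parameters, which is why they show a count of zero in the summary.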

The `None` value in the first dimension on each layer represents the sample index in our data set. Since the model can be trained with any number of samples (we haven't told it how many there are), the value of this dimension is `None` (that means "variable").

Next, we must compile the model. Keras is built on top of TensorFlow, which is able to transform the model into code that can run directly on GPUs. This transformation is called model compilation.

When compiling a model, we must specify a loss function, an optimiser, and one or more metrics. 

* The loss function we will use here is called *categorical cross-entropy*, and is a loss function that is well-suited to comparing two probability distributions. This is a good loss function for our classification problem, since we wish to minimise the differences between the distribution of `y_train` and the distribution of labels predicted by the model. Ideally, these two would be identical and the loss would be zero.

* The optimiser helps to minimise the loss by gradient descent. It determines how to update the weights in the model to reduce the loss. This process is guided by a hyperparameter called *learning rate*. A larger learning rate may help the model learn faster, but it may not be able to learn the optimal weights if the learning rate is too large. We will use the popular *Adam* optimiser here, using its default learning rate.

* The metrics measure how well the model is doing on the classification task. Here, we will use *Accuracy* as the metric to monitor. 
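To give a feel for the loss function, the following NumPy sketch computes categorical cross-entropy for a single one-hot target and two made-up predictions. The function name, the clipping constant `eps`, and the example probabilities are assumptions for illustration and are not taken from the Keras implementation.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); eps is an illustrative choice.
    y_pred = np.clip(y_pred, eps, 1.0)
    # Per sample: -sum_k y_true[k] * log(y_pred[k]); then average over samples.
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.eye(10)[[5]]              # one-hot target for class 5

confident = np.full((1, 10), 0.01)
confident[0, 5] = 0.91                # most probability mass on the true class

unsure = np.full((1, 10), 0.1)        # uniform guess over the ten classes

loss_confident = categorical_cross_entropy(y_true, confident)
loss_unsure = categorical_cross_entropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

With a one-hot target, only the probability assigned to the true class enters the loss, so the confident and correct prediction yields a much smaller loss than the uniform guess.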


Let's compile our model.

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Now we are ready to train the model.

Recall that neural nets are trained iteratively in *epochs*. In each epoch, the entire training data is fed into the model. Since even modern GPUs don't have sufficient memory to take all training data at once, we feed the data in batches. We must decide how many samples per batch we shall use, and for how many epochs we want to train our model.

Here, we will use 128 samples per batch and train for 5 epochs. 

To be able to select a good model that does not suffer from overfitting, we hold out 20% of the 60,000 training samples. The optimiser will compute loss ("val_loss") and accuracy ("val_accuracy") on this held-out data set so that we can detect when the model begins to overfit. Recall that an increasing validation loss and decreasing validation accuracy are signs of overfitting.
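As a quick sanity check of these settings, the arithmetic below works out how many samples remain for training and how many batches make up one epoch. The variable names are illustrative; the numbers are those stated above.

```python
import math

n_samples = 60000
validation_split = 0.2
batch_size = 128

n_val = round(n_samples * validation_split)          # held-out validation samples
n_train = n_samples - n_val                          # samples used for weight updates
steps_per_epoch = math.ceil(n_train / batch_size)    # batches per epoch

print(n_train, n_val, steps_per_epoch)
```

This gives 48,000 training samples, 12,000 validation samples, and 375 batches per epoch, which should match the step counter shown in the training progress output.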

Keras provides a [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) function to train the model. It is similar to the fit function we have been using in scikit-learn in the past practicals. The fitting function returns a *history* object that we can interrogate for information about each epoch. 

In [None]:
history = model.fit(X_train, Y_train, batch_size=128, epochs=5, validation_split=0.2, verbose=1)

Training this model on a CPU takes about 7 seconds per epoch, for a total of ~34 seconds. On a GPU, we can achieve the same in about 2 seconds per epoch.

We see that the optimiser has reduced the loss from ~0.28 to ~0.047 while accuracy has increased from ~0.92 to ~0.98 on the training set as training progressed. At the same time, accuracy on the validation set increased from 0.96 to 0.975 after epoch 3, then started to decline slightly. This tells us that the model obtained at the end of epoch 3 is perhaps the one we should select. (Your results may differ slightly.)

Let's plot the learning curve to confirm:

In [None]:
training_df = pd.DataFrame.from_dict(history.history).assign(epoch=np.array(history.epoch)+1)
sns.lineplot(data=training_df, x='epoch', y='accuracy', color='green');
sns.lineplot(data=training_df, x='epoch', y='val_accuracy', color='red');

We confirm that the model at epoch 3 yields best validation accuracy.
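Besides reading the plot, the best epoch can also be picked programmatically. The sketch below uses a hypothetical list of validation accuracies; in the practical these values would come from `history.history['val_accuracy']`, and your numbers will differ.

```python
import numpy as np

# Hypothetical validation accuracies, one per epoch (made-up values).
val_accuracy = [0.960, 0.971, 0.975, 0.974, 0.973]

# argmax returns a zero-based position, while epochs are counted from 1.
best_epoch = int(np.argmax(val_accuracy)) + 1
print(best_epoch)
```

Differences of a fraction of a percent between epochs are often within run-to-run noise, so treat the selected epoch as a reasonable choice rather than a precise optimum.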

Now we can re-create the best model by stopping training after 3 epochs. We re-create the model, compile, and train. However this time we don't need the validation split as we have already chosen the best model.

In [None]:
model = get_ff_mnist_model()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=128, epochs=3, verbose=1)

## Activity 4: Evaluate the model

Now that we have trained our model, let's test how well it generalises to unseen data.

In [None]:
model.evaluate(X_test, Y_test)

We see that the accuracy on the test set closely matches the accuracy on the validation set during training. This builds our confidence that the model has learned well.

To analyse its performance in more detail, let's obtain predictions for each of the test set images:

In [None]:
Y_pred_proba = model.predict(X_test)

In [None]:
Y_pred_proba.shape

In [None]:
Y_pred_proba[:10].round(2)

For each image, we obtain 10 values, one for each class. The largest value denotes the class that the model predicts. If there is a single largest value then the model is "sure" of its prediction (but it may be sure yet be wrong!); otherwise if there are multiple values that are close to being largest then the model does not yield a clear unique prediction. After rounding to two decimal places, we see that for many images there is a single class predicted (a single 1 in each row). For some images, multiple non-zero values exist, but one dominates the others. This means that there is a good margin between the prediction of the most likely class and the next less-likely class. Although the classifier predicts a single most likely class for the samples we have inspected, this does not tell us if that prediction is actually correct. We need to compare this to the ground truth in the test set.
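The idea of a margin between the most likely and the second most likely class can be made precise with a few lines of NumPy. The array `proba` below is a made-up stand-in for three rows of `Y_pred_proba`.

```python
import numpy as np

# Hypothetical predicted probabilities for three images and ten classes.
proba = np.array([
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00],  # clear-cut
    [0.01, 0.00, 0.90, 0.05, 0.00, 0.00, 0.00, 0.04, 0.00, 0.00],  # one class dominates
    [0.00, 0.00, 0.00, 0.45, 0.00, 0.48, 0.00, 0.00, 0.07, 0.00],  # two classes compete
])

predicted = proba.argmax(axis=1)

# Sort each row in ascending order; the last two entries are the two largest values.
top_two = np.sort(proba, axis=1)[:, -2:]
margin = top_two[:, 1] - top_two[:, 0]

print(predicted)          # [7 2 5]
print(margin.round(2))    # [1.   0.85 0.03]
```

A small margin, as in the third row, flags an image for which the model barely prefers one class over another.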

Let's find the predicted class (the index where the predicted probability is largest) in each row, and compute a classification report.

In [None]:
y_pred = np.argmax(Y_pred_proba, axis=1)

In [None]:
# TODO

We observe that all metrics are close to one. Our classifier works well. :-)

## Activity 5: Train a CNN

We will repeat the above process with a convolutional neural network architecture.

We begin by preparing the data in the same way as before, except that
* we do not flatten the images into a 1-dimensional array, and
* we add an additional dimension at the end (for channel).

We keep the two image dimensions since the CNN will apply convolution operations to the 2-dimensional structure, and we add the additional dimension since the CNN architecture assumes that there is a channel dimension present, even if there is only one channel in a monochrome image.


In [None]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [None]:
# add an additional dimension to represent the single-channel
X_train = X_train.reshape(60000, 28, 28, 1)
X_test = X_test.reshape(10000, 28, 28, 1)
X_train = X_train / 255
X_test = X_test / 255

We'll verify shape and contents of the first image:

In [None]:
# TODO

We see that the additional dimension (1 at the end of the shape vector) has been added, and that each value in the image array is now a singleton array instead of a plain floating point number.

We can use one hot encoding for the target labels in the training and the testing set as before.

In [None]:
# TODO

Next, we create a CNN.

We'll use a simple architecture with 4 convolution layers, followed by 2 fully connected layers. Each convolution layer uses filters of size 3x3. The number of filters varies between layers; the first two layers use 32 [`Conv2D`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) filters whereas the subsequent two layers use 64 filters. After each convolution, [`BatchNormalization`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization) is applied. Batch normalization is a technique to rescale the activations to improve training performance. After normalization, the [`relu Activation`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Activation) is applied. After the second and the fourth convolution layer, the activation is followed by 2x2 [`MaxPooling`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D). At the end, we obtain 64 4x4 feature maps as the output of the last MaxPooling operation. These 64 matrices are then flattened into a 1-dimensional vector of length 1024. The remaining layers add a fully connected [`Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) network on top of that vector. [`BatchNormalization`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization) is used to improve the training, and [`Dropout`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout) is applied to combat overfitting.
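To illustrate what "rescaling the activations" means, here is a minimal NumPy sketch of the normalisation step that batch normalization performs during training. It is a simplification under stated assumptions: the function name `batch_norm_train` is hypothetical, `gamma` and `beta` stand for the learnable scale and shift, `eps` is a small illustrative constant, and the moving averages that Keras additionally maintains for use at inference time are ignored.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalise each feature over the batch, then rescale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Hypothetical batch of 128 samples with 4 features, far from zero mean and unit variance.
activations = rng.normal(loc=5.0, scale=3.0, size=(128, 4))

out = batch_norm_train(activations, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3))   # close to 0 for every feature
print(out.std(axis=0).round(3))    # close to 1 for every feature
```

For convolutional feature maps the statistics are computed per channel rather than per individual value, but the principle is the same.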


In [None]:

def get_mnist_cnn_model():
  model = Sequential()

  # Convolution Layer 1: 32 feature maps
  model.add(Conv2D(32, (3, 3), input_shape=(28,28,1)))
  model.add(BatchNormalization())
  model.add(Activation('relu', name='layer1'))

  # Convolution Layer 2: 32 feature maps
  model.add(Conv2D(32, (3, 3)))
  model.add(BatchNormalization())
  model.add(Activation('relu'))
  model.add(MaxPooling2D(pool_size=(2,2), name='layer2'))

  # Convolution Layer 3: 64 feature maps
  model.add(Conv2D(64,(3, 3)))
  model.add(BatchNormalization())
  model.add(Activation('relu', name='layer3'))

  # Convolution Layer 4: 64 feature maps
  model.add(Conv2D(64, (3, 3)))
  model.add(BatchNormalization())
  model.add(Activation('relu'))
  model.add(MaxPooling2D(pool_size=(2,2), name='layer4'))
  model.add(Flatten())

  # Fully Connected Layer 5
  model.add(Dense(512))
  model.add(BatchNormalization())
  model.add(Activation('relu'))

  # Fully Connected Layer 6                       
  model.add(Dropout(0.2))
  model.add(Dense(10))
  model.add(Activation('softmax'))

  return model
model = get_mnist_cnn_model()

Inspect the model structure.

In [None]:
# TODO

We can see that the layers progressively reduce the size of the image (from 28x28 down to 4x4) while the number of channels (feature maps) grows from 32 in the first two convolution layers to 64 in the last two. Eventually, the 64 feature maps of size 4x4 are flattened into a 1-dimensional vector of length 1024, which is then fed into two fully connected layers that compute the classification output.
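The shrinking feature-map sizes can be traced with simple arithmetic, assuming the Keras defaults used in the model above: convolutions without padding and with stride 1, and non-overlapping 2x2 pooling. The helper names below are illustrative only.

```python
def conv_out(size, kernel=3):
    # A convolution without padding trims kernel - 1 pixels in total.
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling divides the size by the pool width (rounding down).
    return size // pool

def trace_sizes(size):
    """Follow an image side length through two blocks of conv, conv, pool."""
    sizes = [size]
    for _ in range(2):
        for step in (conv_out, conv_out, pool_out):
            size = step(size)
            sizes.append(size)
    return sizes

mnist_sizes = trace_sizes(28)
print(mnist_sizes)                      # [28, 26, 24, 12, 10, 8, 4]

flattened = mnist_sizes[-1] ** 2 * 64   # 64 feature maps of size 4x4
print(flattened)                        # 1024
```

The same helper can be applied to other input sizes; for example, `trace_sizes(32)` ends at 5, so the same stack of layers applied to a 32x32 input would flatten to 5 * 5 * 64 = 1600 values.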

Let's compile and train the model as before. It will take ~ 1 minute when using a GPU.

In [None]:
# TODO

Plot the accuracy on the training set and the validation set, and determine the optimal number of epochs.

In [None]:
# TODO

We conclude that the validation result has stabilised after 2 (or 3) epochs. We now re-train the model on the full training set, stopping after 2 epochs.

In [None]:
# TODO

After ~22 seconds the model has completed training (when using a GPU).

## Activity 6: Evaluate the CNN Model

Let's evaluate the model on the test set. Use the same procedure as for the fully connected network we trained earlier.

In [None]:
# TODO

We find that all metrics are very close to 1. The model works even better than the fully connected network we trained earlier!

However, it is not perfect. Let's see where the CNN errs by finding the images on which the model's predictions differ from the ground truth in the test set.

In [None]:
# images it gets wrong
wrong_idx = np.nonzero(y_pred != y_test)[0]
len(wrong_idx)

We find that there are 116 images where predictions are wrong. (Your numbers may differ.) Let's visualise some of them.

In [None]:
def show_image(idx):
  wrong_img_data = X_test[idx].reshape(28,28)
  print(f'Image {idx} predicted as {y_pred[idx]} but is actually {y_test[idx]}.')
  plt.imshow(wrong_img_data, cmap='gray')
for idx in wrong_idx[:5]:
  show_image(idx)
  plt.show()

Some of them are actually quite difficult to decipher.

Finally, we can peek inside the CNN and visualise how the network sees an image. Let's pick the first training image (a '5') and visualise the feature maps that the model generates at the output of each of the 4 convolution blocks in the model. (The corresponding layers are labelled 'layer1'...'layer4' in the model.)

In [None]:
from tensorflow.keras.models import Model

In [None]:
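def visualise_features(model, layers, image):
  outputs = [model.get_layer(layer).output for layer in layers]
  vis_model = Model(inputs=model.inputs, outputs=outputs)
  # add a batch dimension at the front, as the model expects a batch of images
  image_data = np.expand_dims(image, axis=0)
  feature_maps = vis_model.predict(image_data)
  # predict may return a bare array when only one layer is requested
  if not isinstance(feature_maps, (list, tuple)):
    feature_maps = [feature_maps]
  for fmap in feature_maps:
    fmap = fmap[0]  # drop the batch dimension -> (height, width, channels)
    channels = fmap.shape[-1]
    square = int(np.ceil(np.sqrt(channels)))
    for ix in range(channels):
      # specify subplot and turn off axis ticks
      ax = plt.subplot(square, square, ix+1)
      ax.set_xticks([])
      ax.set_yticks([])
      # plot feature map channel in grayscale
      plt.imshow(fmap[:, :, ix], cmap='gray')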

In [None]:
for layer in ['layer1','layer2','layer3','layer4']:
  print(f'Layer {layer}:')
  visualise_features(model, [layer], X_train[0] )
  plt.show()

## Activity 8: Train a CNN for Image Classification

Now that we have successfully built a CNN for handwritten digits, let's expand our work to image classification. You will create a CNN classifier for images across a wide range of different objects based on the CIFAR10 data set.

You will follow the same process as before: understand and prepare the data, define and train a model, and evaluate the model on the test set. 

Let's begin by loading the CIFAR10 dataset. The dataset consists of 60,000 color images in 10 classes, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 testing images. The classes are mutually exclusive and there is no overlap between them.

In [None]:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

In [None]:
X_train.shape, X_test.shape

In [None]:
y_train.shape, y_test.shape

In [None]:
# The CIFAR labels happen to be arrays. Reshape the arrays to obtain 1-dimensional vectors.
y_train = y_train.reshape((-1,))
y_test = y_test.reshape((-1,))

We see that there are 60,000 images in total. Each image is 32x32 in size and has three channels (RGB).

As for MNIST, we must normalize the images to the range [0, 1] and one-hot encode the target class labels.

In [None]:
# TODO

Let's plot a few images to see what is in this data set.

In [None]:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer','dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train[i])
    plt.xlabel(class_names[y_train[i]])

Although the images are a little blurry, we can make out what is on each of them reasonably easily.

Let's see if we can adapt the CNN we used earlier to this new data set.

We will use the same layers, but change the input dimensions to 32x32x3 to match the CIFAR10 images.


In [None]:
# TODO

Summarise the model to see its structure:

In [None]:
# TODO

Then, compile and train the model.

We follow the same steps as for MNIST but train for 10 epochs and use a validation split of 10%. This will take ~3 minutes on a GPU.

In [None]:
# TODO

Plot the learning curve and determine the number of epochs that yields the model that performs best on the validation split.

In [None]:
# TODO

We see that the validation results climb until the end of epoch 6, and then decline and fluctuate. Although the model at epoch 10 has slightly higher validation accuracy, the gap between the training and the validation split has become large (which, as you may recall, is a sign that the model has overfitted). Hence, we will select the model at 6 epochs. (Your results may differ.)

Let's train the model on the full training set for 6 epochs. This may take ~2 minutes on a GPU.

In [None]:
# TODO

Now, test how well the model performs on the test set. Construct a classification report to assess the model's results for each class.

In [None]:
# TODO

Looking at the F1 scores, we learn that the model performs reasonably well for classes automobile, frog, horse, ship, and truck. It performed worst overall for cat and dog. 

Construct a confusion matrix to see how the model errs. Use the function below to plot the matrix, since plugging the Keras classifier into the scikit-learn function plot_confusion_matrix() will not work.

In [None]:
from sklearn.metrics import confusion_matrix
def plot_cm(y_true, y_pred, class_names):
  cm = confusion_matrix(y_true, y_pred)
  cm_df = pd.DataFrame(cm, index=class_names, columns=class_names)
  sns.heatmap(cm_df, annot=True, fmt='d'); # rows are true labels, columns are predictions

In [None]:
# TODO

We see that cats are often mistaken for deer or frogs, and dogs are often mistaken for cats.
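The confusion matrix also contains everything needed to derive the per-class scores shown in the classification report. The sketch below uses a small made-up 3-class matrix (not CIFAR10 results) with the same layout as above: rows are true labels, columns are predictions.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows are true labels, columns are predictions.
cm = np.array([
    [50,  5,  5],
    [10, 30, 20],
    [ 5, 15, 60],
])

true_positives = np.diag(cm)
recall = true_positives / cm.sum(axis=1)      # share of each true class that was found
precision = true_positives / cm.sum(axis=0)   # share of each predicted class that was right
f1 = 2 * precision * recall / (precision + recall)

print(recall.round(2))
print(precision.round(2))
print(f1.round(2))
```

A class with many off-diagonal entries in its row has low recall, while a class with many off-diagonal entries in its column has low precision.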

Let's look at some of the images that the model gets wrong.

In [None]:
wrong_idx = np.nonzero(y_pred != y_test)[0]
print(len(wrong_idx),' images classified incorrectly')

def show_image(idx):
  print(f'Image {idx} predicted as {class_names[y_pred[idx]]} but is actually {class_names[y_test[idx]]}.')
  plt.imshow(X_test[idx], cmap='gray')
for idx in wrong_idx[:5]:
  show_image(idx)
  plt.show()

We can also look at some of the images that the model predicted correctly.

In [None]:
correct_idx = np.nonzero(y_pred == y_test)[0]
print(len(correct_idx),' images classified correctly')

def show_image(idx):
  print(f'Image {idx} correctly predicted as {class_names[y_pred[idx]]}.')
  plt.imshow(X_test[idx], cmap='gray')
for idx in correct_idx[:5]:
  show_image(idx)
  plt.show()

This concludes our investigation into CNNs for CIFAR10. The model we have designed here helped us learn how to construct, train, and analyse a model. It is, however, still far from the best possible model that can be constructed. It is possible to design improved models that exhibit >96% accuracy on CIFAR10, and ~99.8% on MNIST. You can see the [list of high scores](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#43494641522d3130). To achieve this, more advanced models and data augmentation techniques must be used. If you are interested, [read about further ways to improve CNN models for image classification](https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-cifar-10-photo-classification/).

This concludes the practical. You should now be able to define a given CNN architecture in Keras and train and evaluate the model.