In [None]:
# Initialize Otter Grader
import otter
grader = otter.Notebook()

![data-x](https://raw.githubusercontent.com/afo/data-x-plaksha/master/imgsource/dx_logo.png)

___

#### NAME:

#### STUDENT ID:
___



# **HW3-4: Convolutional Neural Networks**
**(Total 120 points)**



In this homework, you will compare the performance of convolutional neural networks (CNNs) with that of fully connected networks and of shallow learning methods such as SVMs on the task of classifying images from the CIFAR-10 dataset.

## 1. Loading and Exploring the CIFAR-10 Dataset

Run the following cell to load the required modules.

In [1]:
## Load the required modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow import keras
from timeit import default_timer as timer

CIFAR is an acronym that stands for the Canadian Institute For Advanced Research. The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. This dataset was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Read more about this dataset [here](https://www.cs.toronto.edu/~kriz/cifar.html).

The class labels and their standard associated integer values are listed below:

* 0: airplane
* 1: car
* 2: bird
* 3: cat
* 4: deer
* 5: dog
* 6: frog
* 7: horse
* 8: ship
* 9: truck


Run the following cell without any modifications to load the CIFAR-10 dataset.

In [2]:
## Load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

Let's make sure that the number and shape of the training and test images are as described above.

In [3]:
## Run this cell, no need to modify
print('Training: x_train=%s, y_train=%s' % (x_train.shape, y_train.shape))
print('Test: x_test=%s, y_test=%s' % (x_test.shape, y_test.shape))

The next cell plots the first 16 images from this dataset. It is clear that the images are indeed very small compared to modern photographs; it can be challenging to see what exactly is represented in some of the images given the extremely low resolution. Check the 11th image for example.


In [4]:
# Run this cell, no need to modify
int2label = {0: 'airplane', 1: 'car', 2: 'bird', 3: 'cat', 
             4: 'deer', 5: 'dog', 6: 'frog', 7: 'horse', 8: 'ship', 
             9: 'truck'}
fig, axs = plt.subplots(4, 4, figsize=(10, 10))
axs = axs.flatten()
for i, ax in enumerate(axs):
  ax.imshow(x_train[i])
  ax.set_title('This is a %s' % int2label[y_train[i].item()])
plt.tight_layout()
plt.show()

You might think that this low resolution is likely to limit the performance achieved by machine learning algorithms, but you should not underestimate the power of deep learning. Check out this [leaderboard](https://paperswithcode.com/sota/image-classification-on-cifar-10) to see the performance that top-of-the-line deep learning algorithms are able to achieve on this dataset.

## 2. Data Preprocessing 

For the purpose of this homework, we pick the first 49000 training images as the training set and the last 1000 training images as the validation set. We do not touch the test set until the last part of this homework. 

Run the following cell to get the training, validation, and test sets and their corresponding labels.

In [5]:
## Run this cell, no need to modify

# Validation set
x_val = x_train[49000:]
y_val = np.squeeze(y_train[49000:])

# Training set
x_train = x_train[:49000]
y_train = np.squeeze(y_train[:49000])

# Test set
x_test = x_test
y_test = np.squeeze(y_test)

print('Training: x_train=%s, y_train=%s' % (x_train.shape, y_train.shape))
print('Validation: x_val=%s, y_val=%s' % (x_val.shape, y_val.shape))
print('Test: x_test=%s, y_test=%s' % (x_test.shape, y_test.shape))

The pixel values for each image in the dataset are unsigned integers between 0 (no color) and 255 (full color). Before we can rescale them, we need to convert the data type from unsigned integers to floats.

Furthermore, neural networks process inputs using small weight values, and inputs with large integer values can disrupt or slow down the learning process. As such, it is a good practice to normalize the pixel values so that each pixel value has a value between 0 and 1. Dividing the pixel values by the maximum value does the job.
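
To make the scaling step concrete, here is a minimal sketch on a made-up 2x2 single-channel patch. The array `pixels` is hypothetical and is not part of the homework data; the real images are handled by the notebook's own cell.

```python
import numpy as np

# Hypothetical 2x2 patch of raw pixel intensities (unsigned 8-bit integers).
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Convert to floating point first, then divide by the maximum value 255.
scaled = pixels.astype('float16') / 255.0

print(scaled.dtype)                # float16
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Every value now lies between 0 and 1, with 0 still meaning "no color" and 1 meaning "full color".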

Running the following cell changes the data type of each pixel and normalizes its value. We store the pixels as 16-bit (half-precision) floats to save memory without sacrificing much accuracy.

In [6]:
# Run this cell, no need to modify
x_train = x_train.astype('float16') / 255.0
x_val = x_val.astype('float16') / 255.0
x_test = x_test.astype('float16') / 255.0

##  3. Shallow Learning on CIFAR-10

**(Total 40 points)**

Our ultimate goal is to have a model achieving a high accuracy on the **validation set** (why not the test set?). First, let's see how some of the models you learnt in the previous homework (core concepts) perform on this dataset. In case they achieve a high validation accuracy, then there is no need to bother ourselves with the neural nets.

Run the following cell to load the required modules.



In [7]:
## Load the required modules
from sklearn.decomposition import PCA
from sklearn.svm import SVC

Since the models you have seen in HW2 do not process images directly but rather use a vector of features, we first need to flatten the images in our training and validation sets. 
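
As an illustration of what flattening means, the sketch below turns a made-up batch of tiny images into one feature vector per image. The array `batch` is hypothetical, and the sketch assumes that the image index is the first axis.

```python
import numpy as np

# Hypothetical mini-batch: 4 images of size 2x2 with 3 colour channels.
batch = np.arange(4 * 2 * 2 * 3).reshape(4, 2, 2, 3)

# Keep the first axis (one row per image) and merge the remaining axes.
flat = batch.reshape(batch.shape[0], -1)

print(flat.shape)  # (4, 12)
```

Each row of `flat` holds the 2 * 2 * 3 = 12 values of one image, in the same order in which they were stored.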


**3.1) (5 points)** Flatten the images in the training and validation sets and store the results in `x_train_flat` and `x_val_flat`. 

> **Note:** The shape of `x_train_flat` and `x_val_flat` should be (49000, 3072) and (1000, 3072), respectively.

<!--
BEGIN QUESTION
name: q31
manual: false
points: 5
-->

In [8]:
## Your code here
x_train_flat = ...
x_val_flat = ...

In [None]:
grader.check("q31")


Currently, we have 49000 training data points each with 3072 features. In order to be able to run some of the algorithms in HW2 on this dataset in a reasonable amount of time, we need to reduce the dimensionality of the features. 

There are plenty of ways to reduce the dimensionality of the problem, but here we use PCA. Do not worry if you are not familiar with this method as we have implemented it for you. If you are curious to know how PCA works, check [this](https://en.wikipedia.org/wiki/Principal_component_analysis#:~:text=Principal%20component%20analysis%20(PCA)%20is,components%20and%20ignoring%20the%20rest.) out. 
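
For the curious, the core idea can be sketched in a few lines of NumPy. This is a hand-written illustration on made-up data (`data` is hypothetical), not the scikit-learn implementation used in this homework: we center the data, take a singular value decomposition, and read off how much variance each principal direction captures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples with 5 features of very different spread.
data = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# PCA works on centered data; the squared singular values are proportional
# to the variance captured along each principal direction.
centered = data - data.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

explained_ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained_ratio)

# Project onto the first 2 principal components (5 features -> 2).
reduced = centered @ components[:2].T

print(reduced.shape)  # (200, 2)
```

Here `cumulative[k-1]` is the fraction of the total variance that survives when only the first `k` components are kept.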


In [11]:
## Run this cell, no need to modify
combined = np.vstack((x_train_flat, x_val_flat))
pca = PCA().fit(combined)
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance');

The plot that you observe above depicts how much variance in the data is retained for different choices of the number of principal components. For example, by reducing the number of components to 50 (from 3072), we retain 84.3% of the variance in the data. Run the following cell for different values of `n` to further explore the plot above. 

In [12]:
## Run this cell for different values of n
n = 50
print('By reducing the number of components to %d, \
we retain %s percent of the variance in the data.' \
% (n, np.round(100 * pca.explained_variance_ratio_.cumsum()[n-1], 1)))

Choose how many principal components you want to keep and store it in the variable `your_n`. Then, run the following cell to get the low dimensional training and validation sets (`x_train_reduced` and `x_val_reduced`).

In [13]:
## Your code here
your_n = ...

## Do not modify the following lines
pca = PCA(n_components=your_n) 
pca.fit(combined)
transformed = pca.transform(combined)
x_train_reduced = transformed[:49000]
x_val_reduced = transformed[49000:]

**3.2) (35 points)** Train a kernel SVM using the low dimensional training set that achieves a reasonable training and validation accuracy. You will receive credit according to the following scheme:

 > Full credit if **99% $<$ training accuracy** and  **50% $<$ validation accuracy**.

 > 15 points if **99% $<$ training accuracy** and **45% $<$ validation accuracy $\leq$ 50%**.

 > 0 points otherwise. 

**Make sure you follow these instructions:**

* You should already be familiar with sklearn function [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC) from part 5 of HW2. You need to set the following options/hyperparameters in SVC: `C`, `kernel`, `degree`, `gamma`, and `max_iter`.

* The variables `x_train_3` and `y_train_3` contain the training set and its labels, respectively. And `x_val_3`, and `y_val_3` contain the validation set and its labels, respectively. 

 * Although the number of features is significantly reduced (depending on your choice of the number of principal components), it could still take quite a bit of time to train the SVM model on the whole training set. You may want to choose a subset of your training data and their corresponding labels and assign them to `x_train_3` and `y_train_3`, respectively. Set `n_train` to choose the first `n_train` training data points.
   > For example, you may choose to set `n_train = 1000` in which case you are using the first 1000 training data points and their corresponding labels to train your model.

 * You must not modify either `x_val_3` or `y_val_3`.   

* Note that the number of components you chose to keep in the previous cell, `your_n`, is important for both the training time and the validation accuracy of your model.

* The autograder will be timed out after 30 minutes. Adjust the running time of your code (the entire notebook) accordingly. 
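
To build some intuition for the `gamma` hyperparameter, recall that the RBF kernel scores the similarity of two points as exp(-gamma * squared distance). The sketch below is a hand-written NumPy version on made-up points (`rbf_kernel` and `points` are illustrative names, not part of scikit-learn or of this homework).

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Return exp(-gamma * ||a_i - b_j||^2) for every pair of rows."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

# Hypothetical points in a 2-D feature space.
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])

narrow = rbf_kernel(points, points, gamma=1.0)   # only close points look similar
wide = rbf_kernel(points, points, gamma=0.01)    # distant points still look similar
```

With a large `gamma`, only very close points count as similar, which tends to push the training accuracy up at the risk of overfitting. A small `gamma` gives a smoother decision boundary.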

The test "q32a" checks if you achieve >99% training accuracy and >45% validation accuracy.

<!--
BEGIN QUESTION
name: q32a
manual: false
points: 15
-->

In [14]:
## Your code here
...

## Do not modify the following lines
x_train_3 = x_train_reduced[:n_train] 
y_train_3 = y_train[:n_train] 
x_val_3 = x_val_reduced
y_val_3 = y_val

time_start = timer()
svm_model = SVC(C=C, kernel=kernel, degree=degree, gamma=gamma, max_iter=max_iter)
svm_model.fit(x_train_3, y_train_3)
time_end = timer()
print ("Wall time for training the model: {0} second".format(time_end-time_start))

time_start = timer()
train_acc = np.mean(svm_model.predict(x_train_3) == y_train_3)
print('Training Accuracy = {0:f}'.format(train_acc))
val_acc = np.mean(svm_model.predict(x_val_3) == y_val_3)
print('Validation Accuracy = {0:f}'.format(val_acc))
time_end = timer()
print ("Wall time for computing the training and validation accuracies: {0} second".format(time_end-time_start))

In [None]:
grader.check("q32a")

The test "q32b" checks if you achieve >99% training accuracy and >50% validation accuracy.

<!--
BEGIN QUESTION
name: q32b
manual: false
points: 20
-->

In [21]:
## Dummy Cell, DO NOT MODIFY

In [None]:
grader.check("q32b")

Note that a random classifier would achieve an accuracy of about 10% on the CIFAR-10 dataset. The 50% accuracy we achieved using SVM (although it is the validation accuracy, not the test accuracy) is a big step up from a random classifier, but it is still far from ideal. 

You may also pick another model from HW2, train it the way we trained the SVM model above, and see if you could achieve a higher accuracy.

## 4. Fully Connected Neural Networks on CIFAR-10
**(Total 40 points)**

In this part, you will train a fully connected neural network on the CIFAR-10 dataset with the hope of getting a higher validation accuracy than the one you obtained using an SVM model.

Run the cell below to load the required modules.

In [28]:
## Load the required modules
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dropout
from tensorflow.keras import initializers



First of all, make sure you study the neural networks [module](https://datax.berkeley.edu/wp-content/uploads/2020/09/NN-.pdf) on the course website and go through the corresponding Jupyter notebook to understand how neural nets, specifically fully connected neural nets, work. To exercise your knowledge, you may either work on the neural nets homework on the course webpage, or check the [NeuralNet](https://github.com/scetx/datax/tree/master/dataxHWSp2021/HW3-4_NeuralNet/student) homework that we have recently released.

Before you start building and training your own model, as a demo, let's train a simple fully connected neural net with only one hidden layer on the CIFAR-10 dataset. 

As with the SVM model, we need to flatten the images before feeding them into the fully connected neural network. Note that this throws away the information about the 2D structure of the image. Also, since we will be using [categorical_crossentropy](https://keras.io/api/losses/probabilistic_losses/#categoricalcrossentropy-class) as the model's loss function, we need to encode the image classes using [one-hot](https://en.wikipedia.org/wiki/One-hot) encoding. 
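
To see what one-hot encoding looks like, here is a small NumPy sketch on made-up labels (`labels` is hypothetical). It only illustrates the encoding itself and is not the Keras utility used in this homework.

```python
import numpy as np

# Hypothetical integer labels for five images (classes 0..9).
labels = np.array([3, 0, 9, 3, 1])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype='float32')[labels]

print(one_hot.shape)  # (5, 10)
print(one_hot[2])     # [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
```

Each row contains a single 1 at the position of the class label and 0 everywhere else, so it can be compared directly with the 10 probabilities produced by a softmax output layer.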





**4.1) (5 points)** Store the training and validation sets and their labels in the variables `x_train_4`, `y_train_4`, `x_val_4`, and `y_val_4`. Note that we have already flattened the images in the previous part. Use the Keras function [to_categorical](https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical) to encode your training and validation labels. 

<!--
BEGIN QUESTION
name: q41
manual: false
points: 5
-->

In [29]:
## Your code here

# Training set
x_train_4 = ...
y_train_4 = ...

# Validation set
x_val_4 = ...
y_val_4 = ...

In [None]:
grader.check("q41")

We start by creating a sequential model using the Keras [Sequential](https://keras.io/guides/sequential_model/) class. This allows us to build neural nets like Lego, by adding one layer on top of the other and swapping layers. Note that a sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. Such a model is not appropriate when:

*  Your model has multiple inputs or multiple outputs.
*  Any of your layers has multiple inputs or multiple outputs.
*  Your model requires layer sharing.
*  You want to model non-linear topology (e.g. a residual connection, a multi-branch model).

In [34]:
## Run this cell, no need to modify
fc_demo = Sequential()

Now, we can import layer classes and stack layers by using `fc_demo.add()`. The model needs to know what input shape it should expect. For this reason, the first layer in a sequential model needs to receive information about its input shape. 

We let the hidden layer have 1000 neurons and use the ReLU activation function. To learn more about the syntax, consult the [Dense layer](https://keras.io/api/layers/core_layers/dense/) API.

In [35]:
## Run this cell, no need to modify
fc_demo.add(Dense(units=1000, activation='relu', \
                 input_shape=(3072, ), name='hidden'))
fc_demo.add(Dense(units=10, activation='softmax', name='output'))

Let's review the summary of our model.

In [36]:
## Run this cell, no need to modify
fc_demo.summary() 

As we mentioned at the beginning of this homework, the images in the CIFAR-10 dataset are indeed very small compared to modern photographs, yet as you can see, a simple fully connected neural network with only one hidden layer already has about 3 million trainable parameters. You can imagine how this number would grow if we wanted to work with high-resolution images and have a network consisting of several hidden layers. 
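
The parameter count can be checked by hand: a dense layer has one weight per (input, unit) pair plus one bias per unit. The helper `dense_params` below is a hypothetical name introduced only for this arithmetic.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(3072, 1000)  # 3,073,000
output = dense_params(1000, 10)    # 10,010
total = hidden + output

print(total)  # 3083010
```

Almost all of the parameters sit in the hidden layer, because every one of its 1000 neurons is connected to all 3072 input values.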

Before training a model, you need to configure the learning process, which is done via the [compile](https://keras.io/api/models/model_training_apis/) method `.compile()`. `.compile()` typically receives the following three arguments:

* 1) A [loss](https://keras.io/api/losses/) function - This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as `categorical_crossentropy` or `mse`), or it can be an objective function.
* 2) An [optimizer](https://keras.io/api/optimizers/) - This could be the string identifier of an existing optimizer (such as `rmsprop`, `sgd`, or `adam`), or an instance of the Optimizer class.
* 3) A list of [metrics](https://keras.io/api/metrics/) (optional) -  For any classification problem you will want to set this to `metrics=['accuracy']`. A metric could be the string identifier of an existing metric or a custom metric function.
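
To make the loss and the accuracy metric less abstract, here is a minimal NumPy sketch of both on made-up scores. The functions `softmax` and `categorical_crossentropy` below are hand-written illustrations, not the Keras implementations, and the arrays `logits` and `targets` are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 per row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# Hypothetical scores for 2 samples and 3 classes, with one-hot targets.
logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(targets, probs)
accuracy = float(np.mean(probs.argmax(axis=1) == targets.argmax(axis=1)))

print(accuracy)  # 0.5
```

The first sample is classified correctly and contributes a small loss. The second sample gets equal probability for every class, so its predicted class (index 0) misses the target (index 1) and it contributes a loss of log(3).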





We use the `categorical_crossentropy` as our loss function, and [SGD](https://keras.io/api/optimizers/sgd/) as our optimizer. 
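
As a rough picture of what SGD with Nesterov momentum does, the sketch below applies the update rule given in the Keras SGD documentation to a made-up one-dimensional loss. The toy loss, the starting point, and the hyperparameter values are assumptions chosen only for illustration.

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

learning_rate, momentum = 0.1, 0.9
w, velocity = 0.0, 0.0

for _ in range(200):
    g = grad(w)
    # The velocity accumulates a decaying sum of past gradient steps.
    velocity = momentum * velocity - learning_rate * g
    # Nesterov variant: step along the "looked-ahead" velocity.
    w = w + momentum * velocity - learning_rate * g

print(round(w, 4))  # 3.0
```

The weight oscillates around the minimum at `w = 3` and settles there. In a real network the same update is applied to every weight, using gradients computed on one mini-batch at a time.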

In [37]:
## Run this cell, no need to modify
fc_demo.compile(loss='categorical_crossentropy',
             optimizer=keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True),
             metrics=['accuracy'])

Now that we have created our model and configured its learning process, we need to use the [fit](https://keras.io/api/models/model_training_apis/) method `.fit()` to train the model. 

In [38]:
## Run this cell, no need to modify
fc_demo.fit(x=x_train_4, 
          y=y_train_4, 
          epochs=2, 
          validation_data=(x_val_4, y_val_4),
          batch_size=32)

_, train_acc = fc_demo.evaluate(x_train_4, y_train_4, verbose=0)
print('Training Accuracy = {0:f}'.format(train_acc))
_, val_acc = fc_demo.evaluate(x_val_4, y_val_4, verbose=0)
print('Validation Accuracy = {0:f}'.format(val_acc))

The training and validation accuracies are relatively low. Some potential reasons could be our training process (the optimizer is not tuned, or the number of epochs is small) or the simple structure of our network. 

**4.2) (35 points)** Train your own fully connected neural network that achieves a reasonable training and validation accuracy. You will receive credit according to the following scheme:

 > Full credit if **50% $<$ training accuracy** and  **50% $<$ validation accuracy**.

 > 15 points if **50% $<$ training accuracy** and **45% $<$ validation accuracy $\leq$ 50%**.

 > 0 points otherwise. 

**Make sure you follow these instructions:**

* You must use `x_train_4`, `y_train_4`, `x_val_4`, and `y_val_4` to train and validate your model. You must not modify any of them. So, you will be training your model on the whole training set as we did with our demo model. 

* A sequential model named `fc_model` is created below. As shown above, you need to add your desired layers to this model using the add method `.add()`. 

* You can only use dense layers, batch normalization layers, and dropout layers to build your model. You may want to leverage [BatchNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization?version=nightly) to accelerate the training process and [Dropout](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout) to get a higher validation accuracy.

 > If you are not familiar with these techniques, check the [NeuralNet](https://github.com/scetx/datax/tree/master/dataxHWSp2021/HW3-4_NeuralNet/student) homework that we have recently released.

* The way you [initialize](https://keras.io/api/layers/initializers/) the layer weights is also important in the learning process.

* You need to choose your own optimizer `fc_optimizer` as well as the number of training epochs `n_epochs` and the batch size `batch_size`.

* The autograder will be timed out after 30 minutes. Adjust the running time of your code (the entire notebook) accordingly. 
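
If dropout is new to you, the following NumPy sketch shows the idea at training time: a random fraction of the activations is set to zero and the survivors are scaled up so that the expected sum stays the same. The function `dropout` and the array `activations` are hypothetical illustrations, not the Keras layer itself.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))  # hypothetical layer output
dropped = dropout(activations, rate=0.5, rng=rng)

# Surviving units are scaled to 2.0, so the average stays close to 1.0.
print(np.unique(dropped))  # [0. 2.]
```

Because a different random mask is drawn at every training step, the network cannot rely on any single neuron, which usually helps the validation accuracy.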

The test "q42a" checks if you achieve >50% training accuracy and >45% validation accuracy.

<!--
BEGIN QUESTION
name: q42a
manual: false
points: 15
-->

In [41]:
fc_model = Sequential()

## Your code here
fc_optimizer = ...
n_epochs = ...
batch_size = ...

## Do not modify the following lines
fc_model.compile(loss='categorical_crossentropy',
             optimizer=fc_optimizer,
             metrics=['accuracy'])

history = fc_model.fit(x=x_train_4, 
          y=y_train_4, 
          epochs=n_epochs, 
          validation_data=(x_val_4, y_val_4),
          batch_size=batch_size)

fig, [ax1, ax2] = plt.subplots(2, 1, figsize=(5, 5))
# plot loss
ax1.set_title('Cross Entropy Loss')
ax1.plot(np.arange(n_epochs) + 1, history.history['loss'], color='blue', label='training')
ax1.plot(np.arange(n_epochs) + 1, history.history['val_loss'], color='orange', label='validation')
ax1.legend()
# plot accuracy
ax2.set_title('Classification Accuracy')
ax2.set_xlabel('Epoch')
ax2.plot(np.arange(n_epochs) + 1, history.history['accuracy'], color='blue', label='training')
ax2.plot(np.arange(n_epochs) + 1, history.history['val_accuracy'], color='orange', label='validation')
ax2.legend()
plt.tight_layout()
plt.show()

In [None]:
grader.check("q42a")

The test "q42b" checks if you achieve >50% training accuracy and >50% validation accuracy.

<!--
BEGIN QUESTION
name: q42b
manual: false
points: 20
-->


In [48]:
## Dummy Cell, DO NOT MODIFY

In [None]:
grader.check("q42b")

Compare the performance of `fc_model` and `svm_model`. Which one of these models would achieve a higher test accuracy? (You don't need to write any answers.) We will compare their performance on the test set in the last part of this homework. 

## 5. Convolutional Neural Networks on CIFAR-10

**(total 40 points)**

Given the limitations of shallow learning methods and the fact that fully connected neural networks are not well suited to processing images (huge number of parameters, inability to exploit the spatial information in the images, etc.), convolutional neural networks are the go-to model for most tasks in computer vision. In this part, you will train a convolutional neural network that achieves a relatively high validation accuracy on the CIFAR-10 dataset.

Run the following cell to load the required modules.

In [55]:
## Load the required modules 
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import AveragePooling2D

Check the convolutional neural networks [module](https://datax.berkeley.edu/wp-content/uploads/2020/09/slides-m430-convolutional-neural-networks.pdf) to understand how conv nets work. To get more familiar with this architecture and pick up some coding skills you may take a look at the two corresponding jupyter notebooks as well. 

Since conv nets use the 2D/3D structure of the image, we do not need to flatten the images before feeding them into these networks. Moreover, due to the relatively small number of trainable parameters in each filter, conv nets allow us to go much deeper (in terms of the number of layers) than the fully connected networks.

The training and validation sets and their corresponding labels are given to you below. Note that we again need to encode different image classes using [one-hot](https://en.wikipedia.org/wiki/One-hot) encoding. 
> Make sure you pass the test "q41" before running the cell below. 

In [56]:
## Run this cell, no need to modify 

# Training set
x_train_5 = x_train
y_train_5 = y_train_4

# Validation set
x_val_5 = x_val 
y_val_5 = y_val_4

As a demo, we build and train a simple conv net on our dataset. Same as the previous part, we start by creating a sequential model and then use the add method to import different layers into it and stack them on top of each other. 



In [57]:
## Run this cell, no need to modify
cnn_demo = Sequential()

Convolutional and pooling layers are the main building blocks of a conv net. We implement convolutional layers using  [Conv2D](https://keras.io/api/layers/convolution_layers/convolution2d/). 
> Make sure you understand the concepts of stride and padding. 

Pooling layers are used to subsample the input image to reduce the computational load, the memory usage, and the number of parameters. Pooling layers also take a pool size, a stride, and a padding type, but unlike convolutional layers, neurons in pooling layers do not have weights. We use [MaxPooling2D](https://keras.io/api/layers/pooling_layers/max_pooling2d/) and [AveragePooling2D](https://keras.io/api/layers/pooling_layers/average_pooling2d/) to implement these layers. 
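
The effect of kernel size, stride, padding, and pooling on the spatial size can be sketched without any deep learning library. The helpers `conv_output_size` and `pool2x2` below are hypothetical names written for this illustration. They assume the usual conventions for `'same'` and `'valid'` padding and a 2x2 pooling window with stride 2.

```python
import numpy as np

def conv_output_size(n, kernel, stride=1, padding='valid'):
    """Spatial output size along one axis of a convolution."""
    if padding == 'same':
        return -(-n // stride)         # ceil(n / stride)
    return (n - kernel) // stride + 1  # 'valid': no padding

def pool2x2(image, reduce):
    """Apply 2x2 pooling with stride 2 to an (H, W) array with even H, W."""
    h, w = image.shape
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return reduce(blocks, axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical 4x4 input

same_size = conv_output_size(32, kernel=3, padding='same')  # 32
valid_size = conv_output_size(32, kernel=5)                 # 28
max_pooled = pool2x2(image, np.max)    # block maxima: 5, 7, 13, 15
avg_pooled = pool2x2(image, np.mean)   # block means: 2.5, 4.5, 10.5, 12.5
```

Note that pooling halves each spatial dimension without introducing any trainable weights, while a `'valid'` convolution shrinks the image by `kernel - 1` pixels per axis.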

After using several blocks of convolutional and pooling layers, we get something small enough that we can flatten and feed into a standard fully connected layer. 

The architecture that we have chosen for `cnn_demo` is quite similar to LeNet-5, the architecture that Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner proposed for handwritten character recognition in the 1990s.

In [58]:
# Run this cell, no need to modify
cnn_demo.add(Conv2D(filters=6, kernel_size=(3, 3), padding='same', \
                        activation='relu', input_shape=(32,32,3))) # First layer
cnn_demo.add(AveragePooling2D()) # Second layer
cnn_demo.add(Conv2D(filters=16, kernel_size=(5, 5), \
                           activation='relu')) # Third layer
cnn_demo.add(AveragePooling2D()) # Fourth layer
cnn_demo.add(Flatten()) 
cnn_demo.add(Dense(units=100, activation='relu')) # Fifth layer
cnn_demo.add(Dense(units=10, activation = 'softmax')) # Output layer

Let's review the summary of our model.

In [59]:
cnn_demo.summary()

Note the difference between the number of trainable parameters in the network above and the simple fully connected network (`fc_demo`) we built in the previous part. Even though `fc_demo` has only one hidden layer, it has considerably more parameters than the 6-layer conv net above.  
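
The two parameter counts can be reproduced by hand. The sketch below uses the layer sizes of `cnn_demo` and `fc_demo` as defined in this notebook. The helper names `conv_params` and `dense_params` are hypothetical and introduced only for this arithmetic.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus a bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return (n_inputs + 1) * n_units

conv1 = conv_params(3, 3, 3, 6)    # 32x32x3 -> 32x32x6 ('same' padding)
# 2x2 average pooling: 32x32x6 -> 16x16x6 (no parameters)
conv2 = conv_params(5, 5, 6, 16)   # 16x16x6 -> 12x12x16 ('valid' padding)
# 2x2 average pooling: 12x12x16 -> 6x6x16, flattened to 576 values
dense1 = dense_params(6 * 6 * 16, 100)
dense2 = dense_params(100, 10)

cnn_total = conv1 + conv2 + dense1 + dense2
fc_total = dense_params(3072, 1000) + dense_params(1000, 10)

print(cnn_total, fc_total)  # 61294 3083010
```

The convolutional layers contribute only a few thousand parameters because each filter is reused at every image location. Most of the conv net's parameters still come from its first dense layer.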

Now, we need to configure the learning process and then train our model using the compile and fit methods, respectively. 

In [60]:
# Run this cell, no need to modify
cnn_demo.compile(loss = 'categorical_crossentropy',
             optimizer = keras.optimizers.SGD(learning_rate=0.001, momentum = 0.9, nesterov=True),
             metrics = ['accuracy'])

cnn_demo.fit(x=x_train_5, 
          y=y_train_5, 
          epochs=2, 
          validation_data=(x_val_5, y_val_5),
          batch_size=32)

_, train_acc = cnn_demo.evaluate(x_train_5, y_train_5, verbose=0)
print('Training Accuracy = {0:f}'.format(train_acc))
_, val_acc = cnn_demo.evaluate(x_val_5, y_val_5, verbose=0)
print('Validation Accuracy = {0:f}'.format(val_acc))

Like the previous demo, the training and validation accuracies are relatively low. You need to try different architectures and fine-tune some of the hyperparameters to get outstanding results.

**5.1) (40 points)** Train your own convolutional neural network that achieves a high training and validation accuracy. You will receive credit according to the following scheme:

 > Full credit if **70% $<$ training accuracy** and  **69% $<$ validation accuracy**.

 > 15 points if **70% $<$ training accuracy** and **65% $<$ validation accuracy $\leq$ 69%**.

 > 0 points otherwise. 

**Make sure you follow these instructions:**

* You must use `x_train_5`, `y_train_5`, `x_val_5`, and `y_val_5` to train and validate your model. You must not modify any of them. So, you will be training your model on the whole training set as we did with our demo model. 

* A sequential model named `cnn_model` is created below. You need to add your desired layers to this model using the add method `.add()`. 

* You can only use the following layers: Conv2D, MaxPooling2D, AveragePooling2D, Flatten, Dense, BatchNormalization, and Dropout. 

* The way you [initialize](https://keras.io/api/layers/initializers/) the layer weights is also important in the learning process.

* You need to choose your own optimizer `cnn_optimizer` as well as the number of training epochs `n_epochs` and the batch size `batch_size`.

* The autograder will be timed out after 30 minutes. Adjust the running time of your code (the entire notebook) accordingly. 

The test "q51a" checks if you achieve >70% training accuracy and >65% validation accuracy.

<!--
BEGIN QUESTION
name: q51a
manual: false
points: 15
-->

In [69]:
cnn_model = Sequential()

## Your code here
cnn_optimizer = ...
n_epochs = ...
batch_size = ...

## Do not modify the following lines
cnn_model.compile(loss='categorical_crossentropy',
             optimizer=cnn_optimizer,
             metrics=['accuracy'])

history = cnn_model.fit(x=x_train_5, 
          y=y_train_5, 
          epochs=n_epochs, 
          validation_data=(x_val_5, y_val_5),
          batch_size=batch_size)

fig, [ax1, ax2] = plt.subplots(2, 1, figsize=(5, 5))
# plot loss
ax1.set_title('Cross Entropy Loss')
ax1.plot(np.arange(n_epochs) + 1, history.history['loss'], color='blue', label='training')
ax1.plot(np.arange(n_epochs) + 1, history.history['val_loss'], color='orange', label='validation')
ax1.legend()
# plot accuracy
ax2.set_title('Classification Accuracy')
ax2.set_xlabel('Epoch')
ax2.plot(np.arange(n_epochs) + 1, history.history['accuracy'], color='blue', label='training')
ax2.plot(np.arange(n_epochs) + 1, history.history['val_accuracy'], color='orange', label='validation')
ax2.legend()
plt.tight_layout()
plt.show()

In [None]:
grader.check("q51a")

The test "q51b" checks if you achieve >70% training accuracy and >69% validation accuracy.

<!--
BEGIN QUESTION
name: q51b
manual: false
points: 25
-->


In [76]:
## Dummy Cell, DO NOT MODIFY

In [None]:
grader.check("q51b")

## 6. Test Accuracy

At the beginning of this homework, we put a test set aside because we would like to test our models' performance on a totally unseen dataset. As we fine-tune the hyperparameters of our models and change their structure to achieve a higher validation accuracy, the models indirectly gain access to some information from the validation set, so the validation set can no longer be regarded as unseen data.

Run the cell below to find out the accuracy of the different models you trained above on the test set.

In [83]:
## Run this cell, no need to modify
svm_test_acc = np.mean(svm_model.predict(pca.transform(x_test.reshape((x_test.shape[0], -1)))) == y_test)
print('Test accuracy of the SVM model= {0:f}'.format(svm_test_acc))

fc_test_acc = fc_model.evaluate(x_test.reshape((x_test.shape[0], -1)), to_categorical(y_test), verbose=0)[1]
print('Test accuracy of the fully connected neural network= {0:f}'.format(fc_test_acc))

cnn_test_acc = cnn_model.evaluate(x_test, to_categorical(y_test), verbose=0)[1]
print('Test accuracy of the convolutional neural network= {0:f}'.format(cnn_test_acc))


<font color='Blue'> *Important: The models you initialize and train are generally not deterministic unless you set all the random states (for the layers and also the optimizers). Consequently, you may pass the tests in one run but fail some of them in another. The same goes for the autograder: you might need to submit a few times until you get all the points. So plan ahead and don't wait until the last minute to submit your homework.* </font>

# Submit
Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output.
**Please save before submitting!**

In [None]:
# Save your notebook first, then run this cell to create a pdf for your reference.