# **Implementing CNN architecture**
<font color='grey' size='1.5'> Created by Parisa Hosseinzadeh for *Machine learning for proteins*, Spring 2022. 
This notebook is adapted from [Victor Zhou](https://victorzhou.com/blog/keras-cnn-tutorial/), [Jason Brownlee](https://machinelearningmastery.com/how-to-visualize-filters-and-feature-maps-in-convolutional-neural-networks/), and [Rohit Thakur](https://towardsdatascience.com/step-by-step-vgg16-implementation-in-keras-for-beginners-a833c686ae6c).

In today's in-class activity, we will build simple CNN models for image recognition and work our way up to more difficult examples.

## 1. Simple CNN for handwritten digit recognition

In the first part of this exercise, we will build a very simple CNN model to make predictions on the MNIST dataset.

A bit about the MNIST dataset, from the [MNIST Wikipedia page](https://en.wikipedia.org/wiki/MNIST_database):

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. 

<img src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png?format=250w" >

### Step 1. Loading and preparing MNIST dataset

In [None]:
pip install tensorflow numpy mnist

In [None]:
import numpy as np
import mnist
from tensorflow import keras

train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5

# Reshape the images.
train_images = np.expand_dims(train_images, axis=3)
test_images = np.expand_dims(test_images, axis=3)

print(train_images.shape) 
print(test_images.shape)  

#### Q1. Size and shape of data

Based on the numbers printed above, answer question 1:

1. How many training examples do you have?
2. How many test examples do you have?
3. What is the shape of each image (pixels and color channels)?







### Step 2. Building the model

Similar to a fully connected ANN, a CNN is also a *sequential* model. 

We will use **Conv2D** and **MaxPooling2D** from Keras to build this model, and add a **Dense** layer at the end for prediction. See the architecture below:

<img src="https://victorzhou.com/media/cnn-post/cnn-dims-3.svg">
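
The sizes in the figure follow from two small formulas. As a rough sketch (the helper names `conv_output_size` and `pool_output_size` are made up for illustration and are not part of Keras): a convolution with `padding='valid'`, filter size `f`, and stride `s` shrinks each side from `n` to `(n - f) // s + 1`, and a pooling layer whose stride equals its window `p` shrinks it to `n // p`.

```python
def conv_output_size(n, filter_size, stride=1):
    """Side length after a 'valid' (no padding) convolution."""
    return (n - filter_size) // stride + 1


def pool_output_size(n, pool_size):
    """Side length after max pooling with stride equal to the pool size."""
    return n // pool_size


side = 28  # MNIST images are 28x28
num_filters, filter_size, pool_size = 8, 3, 2

after_conv = conv_output_size(side, filter_size)      # side of each feature map
after_pool = pool_output_size(after_conv, pool_size)  # side after 2x2 max pooling
flattened = after_pool * after_pool * num_filters     # length of the flattened vector

print(after_conv, after_pool, flattened)
```

The last number is the length of the vector that the final Dense layer receives after flattening.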



Let's give it a try.

#### Importing necessary modules

In [None]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.models import Sequential

#### Setting up parameters

In [None]:
num_filters = 8  # we choose to use 8 filters to check 8 features
filter_size = 3  # our filters are 3x3
pool_size = 2    # our maxpool is 2x2

#### Building the model

In [None]:
# define the keras sequential model
model = Sequential()
# Build the convolution layer
# input_shape is the shape of one image: (height, width, channels).
model.add(
    Conv2D(
        num_filters, # number of filters
        filter_size, # size of filter
        input_shape=(28, 28, 1), #input image size, printed above
        strides=1,
        padding='valid',
        activation='relu',
    )
)
# add the maxpooling layer
model.add(
    MaxPooling2D(pool_size=pool_size)
)
# flatten (you need to flatten so that 
# everything is a vector to feed into dense layer)
model.add(Flatten())
# Add a dense layer for prediction
model.add(
    Dense(
         , # number of neurons
        activation= #activation
    )
)

#### Q2. Output layer

Based on the dataset (and the image above), what should be the number of neurons for your final layer? What activation function will you pick?

Let's take a look at your model:

In [None]:
!pip3 install keras-visualizer

In [None]:
from keras_visualizer import visualizer 
from IPython.display import Image

visualizer(model, format='png', view=True)
Image('graph.png')

#### Q3. Upload model photo

The model image is saved as `graph.png` (the file displayed by the cell above). Upload it as your answer to this question.

### Step 3. Compiling your model

Let's now compile the model. Try running the code yourself using what we used before for [ANNs](https://colab.research.google.com/drive/1Co5QuNEVsSIccx-dL6fGh2OTYok9fXYT?usp=sharing). Note that because we have more than two categories, the loss will be **categorical_crossentropy** instead of binary_crossentropy.
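
To make the loss concrete, here is a minimal NumPy sketch of categorical cross-entropy for a single example: the negative log of the probability that the model assigns to the true class. The function below and the example probabilities are illustrative assumptions, not the Keras implementation, which may differ in details such as clipping and batch averaging.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


# Hypothetical 3-class example: the true class is index 1.
y_true = np.array([0.0, 1.0, 0.0])
confident = np.array([0.05, 0.90, 0.05])
unsure = np.array([0.40, 0.30, 0.30])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

# The confident, correct prediction gets the lower loss.
print(loss_confident, loss_unsure)
```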

In [None]:
model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)

### Step 4. Training

Now let's train our model.

In [None]:
from tensorflow.keras.utils import to_categorical

# let's check some of our labels

print(train_labels[0:5])

If you remember, the categories need to be one-hot encoded. Let's try to do that and see what they look like.

In [None]:
print(to_categorical(train_labels[0:5]))
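
For intuition, one-hot encoding can also be sketched directly in NumPy. The helper `one_hot` and the example labels below are made up for illustration; in the notebook itself we keep using `to_categorical`.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype="float32")[labels]


example_labels = np.array([5, 0, 4])  # made-up digit labels
encoded = one_hot(example_labels, num_classes=10)
print(encoded)
```

Taking `argmax` along each row recovers the original integer label, which is what Step 5 does with the model's predictions.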

In [None]:
model.fit(
  train_images,
  to_categorical(train_labels),
  epochs=3,
  validation_data=(test_images, to_categorical(test_labels)),
)

### Step 5. Evaluation

Let's evaluate the model's performance on your test set.

In [None]:
# Predict on the first 5 test images.
predictions = model.predict(test_images[:5])

# Print our model's predictions.
print(np.argmax(predictions, axis=1)) # [7, 2, 1, 0, 4]
# Check our predictions against the ground truths.
print(test_labels[:5]) # [7, 2, 1, 0, 4]

#### Q4. Accuracy on test

What is the accuracy of your model on the test set? You can check how to calculate it in the [ANN code](https://colab.research.google.com/drive/1Co5QuNEVsSIccx-dL6fGh2OTYok9fXYT?usp=sharing).

In [None]:
# your code

In [None]:
#@markdown Sample solution

_, accuracy = model.evaluate(test_images, to_categorical(test_labels))
print('Accuracy: %.2f' % (accuracy*100))
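
Accuracy can also be computed by hand: take the most likely class for each image and count how often it matches the true label. The sketch below uses small made-up arrays in place of `model.predict(...)` and `test_labels`, so it only illustrates the calculation.

```python
import numpy as np

# Made-up softmax-like outputs for 4 images and 3 classes.
probabilities = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
true_labels = np.array([1, 0, 2, 2])

predicted_labels = np.argmax(probabilities, axis=1)  # most likely class per image
manual_accuracy = np.mean(predicted_labels == true_labels)
print('Accuracy: %.2f' % (manual_accuracy * 100))
```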

#### Q5. Changing number of filters

In the code above, try changing the number of filters from 8 to something else, for example 16. What do you observe in terms of performance (speed and accuracy)?

#### Q6. Adding more layers

Try adding another convolutional layer and re-running your model. How does the accuracy change?

In [None]:
# your code

In [None]:
#@markdown Sample solution

# define the keras sequential model
model = Sequential()
# Build the convolution layer
# input_shape is the shape of one image: (height, width, channels).
model.add(
    Conv2D(
        num_filters, # number of filters
        filter_size, # size of filter
        input_shape=(28, 28, 1), #input image size, printed above
        strides=1,
        padding='valid',
        activation='relu',
    )
)
# add a second convolution layer
# (input_shape is only needed on the first layer of the model)
model.add(
    Conv2D(
        num_filters, # number of filters
        filter_size, # size of filter
        strides=1,
        padding='valid',
        activation='relu',
    )
)
# add the maxpooling layer
model.add(
    MaxPooling2D(pool_size=pool_size)
)
# flatten (you need to flatten so that 
# everything is a vector to feed into dense layer)
model.add(Flatten())
# Add a dense layer for prediction
model.add(
    Dense(
        10, # number of neurons (one per digit class)
        activation='softmax' # softmax pairs with categorical_crossentropy
    )
)

model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)

model.fit(
  train_images,
  to_categorical(train_labels),
  epochs=3,
  validation_data=(test_images, to_categorical(test_labels)),
)

_, accuracy = model.evaluate(test_images, to_categorical(test_labels))
print('Accuracy: %.2f' % (accuracy*100))

### Step 6. Check filters and feature maps

Now let's take a look at what our filters are learning and what our feature maps look like.
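
As a reminder of how a filter produces a feature map, here is a minimal NumPy sketch with a toy image and a hand-written vertical-edge filter. Both are made up for illustration, and the helper `conv2d_valid` is not a Keras function. Like `Conv2D`, it slides the filter over the image without flipping it; the bias term is left out for simplicity.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the filter and the patch under it.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Toy 6x6 image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter (a learned filter would hold trained values).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU activation
print(feature_map)
```

The feature map is large only where the filter's pattern (a dark-to-bright vertical edge) appears in the image, which is the kind of structure to look for in the plots below.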

In [None]:
# summarize filter shapes
for layer in model.layers:
	# check for convolutional layer
	if 'conv' not in layer.name:
		continue
	# get filter weights
	filters, biases = layer.get_weights()
	print(layer.name, filters.shape)

In [None]:
# retrieve weights from the first convolutional layer
filters, biases = model.layers[0].get_weights()

# normalize filter values to 0-1 so we can visualize them
f_min, f_max = filters.min(), filters.max()
filters = (filters - f_min) / (f_max - f_min)

# let's see how many filters we have
# (the weight array has shape (height, width, channels, number of filters))
print('number of filters is {}'.format(filters.shape[-1]))

# let's take a look at one of our convolution filters
# These are the kernels you manually set in the in-class activity
# last lecture
print(filters[:, :, 0, 0])

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

# plot first few filters
n_filters, ix = 3, 1
for i in range(n_filters):
	# get the filter
	f = filters[:, :, :, i]
	# plot each channel separately
	# specify subplot and turn off axis ticks
	ax = plt.subplot(n_filters, 3, ix)
	ax.set_xticks([])
	ax.set_yticks([])
	# plot filter channel in grayscale
	plt.imshow(f[:, :, 0], cmap='gray')
	ix += 1
# show the figure
plt.show()

In [None]:
from tensorflow.keras.models import Model
# redefine model to output right after the first hidden layer
model = Model(inputs=model.inputs, outputs=model.layers[0].output)

In [None]:
# get feature map for first hidden layer
feature_maps = model.predict(test_images[:1])

In [None]:
# plot all 8 feature maps in a 4x2 grid
square1 = 4
square2 = 2
ix = 1
for _ in range(square1):
	for _ in range(square2):
		# specify subplot and turn off axis ticks
		ax = plt.subplot(square1, square2, ix)
		ax.set_xticks([])
		ax.set_yticks([])
		# plot filter channel in grayscale
		plt.imshow(feature_maps[0, :, :, ix-1], cmap='gray')
		ix += 1
# show the figure
plt.show()

#### Q7. Feature maps and filters

What did you notice after running the cells above?

## 2. VGG16

Now let's take it a step further and implement VGG16. If you remember from lecture, VGG looks like this:

<img src="https://miro.medium.com/max/940/1*3-TqqkRQ4rWLOMX-gvkYwA.png">

We will use Keras to implement a smaller version of VGG to run on the MNIST data.

### Step 1. Setting up for the run

Let's import all necessary modules.

In [None]:
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten

### Step 2. Building the model

Below is the code that builds a VGG-like model for our dataset.

In [None]:
model = Sequential()
model.add(
    Conv2D(
        input_shape=(28,28,1),
        filters=64,
        kernel_size=(3,3),
        padding="same", 
        activation="relu"
    )
)

model.add(
    MaxPool2D(
        pool_size=(2,2),
        strides=(2,2)
    )
)

model.add(
    Conv2D(
        filters=128, 
        kernel_size=(3,3), 
        padding="same", 
        activation="relu"
    )
)


model.add(
    MaxPool2D(
        pool_size=(2,2),
        strides=(2,2)
    )
)

model.add(
    Conv2D(
        filters=256, 
        kernel_size=(3,3), 
        padding="same", 
        activation="relu"
    )
)

model.add(
    Conv2D(
        filters=256, 
        kernel_size=(3,3), 
        padding="same", 
        activation="relu"
    )
)

model.add(
    MaxPool2D(
        pool_size=(2,2),
        strides=(2,2)
    )
)

model.add(
    MaxPool2D(
        pool_size=(3,3)
    )
)

model.add(Flatten())
model.add(Dense(units=512,activation="relu"))
model.add(Dense(units=512,activation="relu"))
model.add(Dense(units=10, activation="softmax"))

#### Q8. Size of the model

Based on the model above and our input data, can we add more CNN + pooling layers?

In [None]:
model.compile(optimizer='adam', loss=keras.losses.categorical_crossentropy, metrics=['accuracy'])

In [None]:
model.summary()
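
The parameter counts reported by `model.summary()` can be checked by hand. The sketch below assumes the VGG-like model exactly as defined above; the helpers `conv_params` and `dense_params` are illustrative and not part of Keras. Pooling and flatten layers have no trainable parameters, so only the Conv2D and Dense layers appear.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights plus one bias per unit for a Dense layer."""
    return (in_features + 1) * units


layer_params = [
    conv_params(3, 1, 64),     # first Conv2D on the 1-channel input
    conv_params(3, 64, 128),
    conv_params(3, 128, 256),
    conv_params(3, 256, 256),
    dense_params(256, 512),    # input is the flattened 1x1x256 output of the last pooling layer
    dense_params(512, 512),
    dense_params(512, 10),
]
total_params = sum(layer_params)
print(layer_params, total_params)
```

Comparing these numbers with the summary shows where most of the weights sit in this model.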

In [None]:
from keras_visualizer import visualizer 
from IPython.display import Image

visualizer(model, format='png', view=True)
Image('graph.png')

In [None]:
model.fit(
  train_images,
  to_categorical(train_labels),
  epochs=3,
  validation_data=(test_images, to_categorical(test_labels)),
)

#### Q9. Accuracy

What is the accuracy of your VGG model? How does the overall performance (accuracy and speed) of VGG compare with that of your simpler CNN?

In [None]:
# your code

In [None]:
#@markdown Sample solution

_, accuracy = model.evaluate(test_images, to_categorical(test_labels))
print('Accuracy: %.2f' % (accuracy*100))

## Optional ResNet

Try to see if you can build ResNet. Check [this blog](https://machinelearningknowledge.ai/keras-implementation-of-resnet-50-architecture-from-scratch/) for a step-by-step walkthrough of the code.

For such well-known models, there is an easier way too. Check [this blog](https://towardsdatascience.com/understanding-and-coding-a-resnet-in-keras-446d7ff84d33) for a guide to an easier solution. As you will see there, Keras ships pre-built (and optionally pretrained) versions of these models that you can use directly.