A typical convolutional neural network architecture
<img src="https://cdn-images-1.medium.com/max/1600/1*NQQiyYqJJj4PSYAeWvxutg.png"></img>
[image source](https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050)

In this notebook we'll review basic considerations for building any neural network, build a simple convolutional neural network, visualize its weights, and compare the learned 'representation' with that of a traditional non-convolutional neural network, aka a fully connected (FC) network or multilayer perceptron (MLP).

Our models might need a bit more computational power this week, so be sure to change your Google Colab runtime type by clicking `Runtime`, `Change runtime type`, and then selecting `GPU`. We'll see if Google lets us all get free GPUs!

<br><br><br>
Python / notebook cheatsheet
* `Shift+Enter` to execute a given cell and move to / make a new next cell
* `#` is for single line comments
* each cell in this notebook can be either of type `code` (the cell below) or type `markdown` (this cell)
* indentation (tabs or spaces) is required for the body of any flow control statement (`for`, `if..else`, `while` etc)
* `=` is for assignment
* `()` is to call a function or method
* `[]` is for indexing into a variable
* a quick way to put parentheses or brackets around something after the fact: select that something and type `(` or `[`

In [None]:
# keras-specific modules we need
from keras.callbacks import LambdaCallback
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Dropout, Input, Conv2D, Flatten, MaxPooling2D, Conv3D, Reshape, MaxPooling3D
from keras.datasets import mnist, fashion_mnist, cifar10
from keras.utils import to_categorical
from keras.initializers import RandomUniform, Zeros, Ones, glorot_uniform
from keras.regularizers import l1,l2

In [None]:
# more packages / functions that we'll need
from IPython.core.pylabtools import figsize
from IPython.display import HTML
import matplotlib.pyplot as plt
from matplotlib.animation import ArtistAnimation
import numpy as np

Let's first load in our sample data

In [None]:
# built-in keras datasets: https://keras.io/datasets/
# load the data
(x_train,y_train),(x_test,y_test) = mnist.load_data()
#(x_train,y_train),(x_test,y_test) = fashion_mnist.load_data()
#(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# one-hot encode labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# NOTE: we no longer need to linearize the data as we had with a non-convolutional network!
# add a dummy 4th dim for channel
if len(x_train.shape)<4: 
    x_train = np.reshape(x_train,(x_train.shape[0],x_train.shape[1],x_train.shape[2],1))
    x_test = np.reshape(x_test,(x_test.shape[0],x_test.shape[1],x_test.shape[2],1))    

In [None]:
# check data dimensions: n samples, by n pix, by n pix, by n channels
x_train.shape

In [None]:
# visualize sample images from input dataset
figsize(5,5)
for a in range(0,36):
    plt.subplot(6,6,a+1)
    plt.imshow(np.squeeze(x_train[a,:,:,:]),cmap='gray')
    plt.axis('off')  

In [None]:
# rescale and normalize
x_train = (x_train/255.-0.5)*2
x_test = (x_test/255.-0.5)*2

# get subsets of the data for faster model training -- for example purposes only
x_train_subset = x_train[:6000,:,:,:]
y_train_subset = y_train[:6000]
x_test_subset = x_test[:1000,:,:,:]
y_test_subset = y_test[:1000]

Now we'll define our convolutional network. We will explicitly define parameters as much as we can, but note that most of these are just the defaults; see e.g. the [keras documentation](https://keras.io/layers/about-keras-layers/).

Let's make a convolutional neural network with the following:
* Two convolutional layers with `5x5` filters followed by maxpooling for each
* One fully connected layer
* One output classification layer
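
To see what these layers do to the image size, the short sketch below tracks the spatial dimensions through the stack. It assumes 28x28 MNIST inputs, `'valid'` padding, stride-1 convolutions, and 2x2 pooling with stride 2 (the settings used in the cheatsheet at the end of this notebook). `conv_out` and `pool_out` are hypothetical helper names for illustration, not Keras functions.

```python
def conv_out(size, kernel, stride=1):
    """Output length along one axis of a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool, stride=None):
    """Output length along one axis of 'valid' max pooling."""
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1


size = 28            # assumed input width/height (MNIST)
sizes = [size]
for _ in range(2):   # two conv + maxpool stages
    size = conv_out(size, kernel=5)
    sizes.append(size)
    size = pool_out(size, pool=2)
    sizes.append(size)

print(sizes)  # [28, 24, 12, 8, 4]
```

Under these assumptions, the tensor handed to the fully connected layer is 4x4 pixels per filter of the second convolutional layer; you can compare this against the output shapes reported by `summary()`.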

In [None]:
def makeCNN(): # a function we can call to build our model
    myCNN = Sequential() # instantiate a model
    ## add your layers here

    ## end add your layers
    myCNN.compile(loss='categorical_crossentropy', metrics=['categorical_accuracy'],optimizer='SGD')
    return myCNN

Some notes from last month's exercise on data processing and model-building steps
* load data
    * vectorize data
    * rescale data: not entirely necessary; but do need conversion to `float` from `int`
    * one-hot encode the data labels
* build the model
    * add a layer with n units: you choose!
    * initialize weights: can't do 0 weights! can't do the same weights!
    * specify activation function: some work better than others, given gradient descent
    * (not shown: add regularization via dropout `Dropout(..)` or L1/L2, `kernel_regularizer=l1(0.001)`)
    * repeat for next layers
    * add a final layer of 10 units to get predictions
* compile the model
    * specify loss function: https://keras.io/losses/
    * specify the optimizer, i.e. how the loss is used to update the weights: learning rates, learning rate decay, momentum
    * specify evaluation metric(s): model doesn't care about these metrics!
* implementation notes
    * every time you run `fit()`, model will continue from where it left off
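
The note above about weight initialization can be checked numerically. The sketch below is not part of the original exercise: it uses a toy two-layer `tanh` network in plain NumPy (no biases, half mean-squared-error loss, made-up random data) and a hypothetical helper `hidden_weight_gradient` to show that all-zero weights give a zero gradient, and that identical weights give every hidden unit the same gradient.

```python
import numpy as np


def hidden_weight_gradient(W1, W2, x, y):
    """Gradient of 0.5 * mean squared error w.r.t. W1 for a tiny tanh network."""
    h = np.tanh(x @ W1)                 # hidden activations, shape (n, hidden)
    err = (h @ W2 - y) / len(x)         # dLoss/dPrediction, shape (n, 1)
    dpre = (err @ W2.T) * (1.0 - h**2)  # backpropagate through tanh
    return x.T @ dpre                   # shape (inputs, hidden)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))             # 8 toy samples with 3 features
y = rng.normal(size=(8, 1))             # toy regression targets

# all-zero weights: the hidden-layer gradient is exactly zero
g_zero = hidden_weight_gradient(np.zeros((3, 4)), np.zeros((4, 1)), x, y)

# identical weights: every hidden unit receives the same gradient column
g_same = hidden_weight_gradient(np.full((3, 4), 0.5), np.full((4, 1), 0.5), x, y)

# random weights: the columns differ, so hidden units can specialize
g_rand = hidden_weight_gradient(rng.normal(size=(3, 4)), rng.normal(size=(4, 1)), x, y)

print(np.allclose(g_zero, 0.0))             # True
print(np.allclose(g_same, g_same[:, :1]))   # True
```

Because identical units receive identical updates, they remain identical after every gradient step, which is why random initializers are used.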

In [None]:
myCNN = makeCNN() # make our model
myCNN.summary() # show model summary

# callback functions to store weights from our convolutional layers after each epoch
layer1weights = [] # empty list to hold our weights
layer2weights = []
get_weights1 = LambdaCallback(on_epoch_end=lambda epoch,logs: layer1weights.append(myCNN.layers[0].get_weights())) 
get_weights2 = LambdaCallback(on_epoch_end=lambda epoch,logs: layer2weights.append(myCNN.layers[2].get_weights()))

Note that the number of parameters for a second convolutional layer with 2 filters is not `5*5*2+2` but `5*5*32*2+2`: the previous layer outputs a 32-channel 'image', and each filter needs to learn a kernel for every one of those channels (you could also think of this as having `(5,5,32)` as our effective filter size). [See also visualization here](https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1)
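
The arithmetic in this note generalizes to any 2D convolutional layer. The helper below is a hypothetical illustration (`conv2d_param_count` is not a Keras function); it assumes one bias per filter, a first layer with 32 filters on a single-channel input, and a second layer with 2 filters, as in the note.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    """Trainable parameters of a 2D conv layer: one kernel per (channel, filter) pair plus one bias per filter."""
    kernel_h, kernel_w = kernel_size
    return kernel_h * kernel_w * in_channels * filters + filters


first = conv2d_param_count((5, 5), in_channels=1, filters=32)
second = conv2d_param_count((5, 5), in_channels=32, filters=2)

print(first)   # 832
print(second)  # 1602
```

These numbers should match the `Param #` column printed by `myCNN.summary()` for layers configured this way.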

In [None]:
# now train the model: note you'll need to run the cell above if you don't want to start from where you left off
history = myCNN.fit(x=x_train_subset,y=y_train_subset,batch_size=64,epochs=30,validation_data=(x_test_subset,y_test_subset), callbacks=[get_weights1,get_weights2])

Let's look at our model's performance over time with training

In [None]:
figsize(12,4)
plt.subplot(1,2,1)
plt.plot(history.history['loss'])
plt.title('Loss per epoch')
plt.xlabel('Training epoch')
plt.ylabel('categorical crossentropy loss')

plt.subplot(1,2,2)
plt.plot(history.history['categorical_accuracy'])
plt.title('10-way classification accuracy per epoch')
plt.xlabel('Training epoch')
plt.ylabel('categorical accuracy')
plt.show() # render the figure (also suppresses the text output of the last call)

Now let's visualize the learned weights. Per the lecture, we expect the network has learned a set of edge-detecting filters.
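
As a reminder of what an 'edge-detecting' filter does, here is a minimal NumPy sketch, separate from the trained model. It slides a hand-written 3x3 vertical-edge kernel over a toy 6x6 image with no padding and stride 1. Like `Conv2D`, it computes a cross-correlation (the kernel is not flipped). The function name `correlate2d_valid`, the toy image, and the kernel values are all assumptions for illustration.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation with stride 1."""
    kernel_h, kernel_w = kernel.shape
    out_h = image.shape[0] - kernel_h + 1
    out_w = image.shape[1] - kernel_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # elementwise product of the kernel with the patch under it, summed
            out[i, j] = np.sum(image[i:i + kernel_h, j:j + kernel_w] * kernel)
    return out


image = np.zeros((6, 6))
image[:, 3:] = 1.0                          # dark left half, bright right half
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # responds to dark-to-bright vertical edges

response = correlate2d_valid(image, kernel)
print(response.shape)  # (4, 4)
print(response[0])     # [0. 3. 3. 0.]
```

The response is nonzero only where the window straddles the edge, and zero over the flat regions.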

In [None]:
# first let's see what weights there are
modelweights = myCNN.get_weights()
modellayers = ['Conv1','C1bias','Conv2','C2bias','FC1','FC1bias','Output','Outputbias'] # one name per weight array, including the output layer
for layer,item in zip(modellayers,modelweights):
    print(str(layer) + ' weights shape: \t' + str(item.shape))

In [None]:
# plot out first convolutional layer's learned filters
modelweights = myCNN.get_weights() # get model weights
conv1weights = modelweights[0] # get first conv layer's weights

figsize(8, 4)
Counter = 0
plt.suptitle('1st convolutional layer filters')
for a in range(0,4):
    for b in range(0,8):
        plt.subplot(4,8,Counter+1)
        plt.imshow(np.squeeze(conv1weights[:,:,0,Counter]))
        plt.axis('off')
        Counter += 1

We can also visualize how the filters change after each training epoch

In [None]:
figsize(8, 4)
fig, ax = plt.subplots(4,8)
fig.suptitle('1st convolutional layer filters by epoch')
weightplots = []
for c in range(0,len(layer1weights)):
    currweights = layer1weights[c][0] # get weights from current epoch
    Counter = 0
    weightsubplots = []
    for a in range(0,4):
        for b in range(0,8):
            ax[a,b].axis('off')
            img = ax[a,b].imshow(np.squeeze(currweights[:,:,0,Counter]))
            Counter += 1
            weightsubplots.append(img) # append to list of subplots
    weightplots.append(weightsubplots) # append to list of figure frames

anim = ArtistAnimation(fig, weightplots, interval=200) # animate with prerendered plots in `weightplots`
plt.close() # close the actual plotted figure
HTML(anim.to_jshtml()) # ...and instead display a jshtml animation

What do learned filters look like for the 2nd convolutional layer?

In [None]:
# sample some random filters from the second convolutional layer
modelweights = myCNN.get_weights() # get model weights
conv2weights = modelweights[2] # get second conv layer's weights
conv2weights = np.reshape(conv2weights,(conv2weights.shape[0],conv2weights.shape[1],conv2weights.shape[2]*conv2weights.shape[3])) # reshape to put all weights into a single dimension

figsize(8, 8)
Counter = 0
shuffled_indices = np.arange(conv2weights.shape[2]) # all indices
np.random.shuffle(shuffled_indices) # ..shuffled
plt.suptitle('Random 2nd convolutional layer filters')
for a in range(0,8):
    for b in range(0,8):
        plt.subplot(8,8,Counter+1)
        plt.imshow(np.squeeze(conv2weights[:,:,shuffled_indices[Counter]]))
        plt.axis('off')
        Counter += 1

Hmm, that's strange! It just looks like noise? Why is this the case? (go back and examine our network, modify it, and re-run the code)

After re-running the code with new parameters, let's now see how the 2nd convolutional layer weights evolve with training
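
The plotting cells flatten the last two axes of the second layer's kernel, `(in_channels, filters)`, into a single axis of 5x5 images. The sketch below uses a dummy array rather than trained weights to show which (channel, filter) pair ends up at each flat position. It assumes a kernel shaped `(5, 5, 32, 2)`, i.e. 2 filters over 32 input channels; `flat_index` is a hypothetical helper.

```python
import numpy as np

kernel_h, kernel_w, in_channels, filters = 5, 5, 32, 2  # assumed layer configuration

# dummy kernel with unique values so that slices can be compared exactly
w = np.arange(kernel_h * kernel_w * in_channels * filters).reshape(
    kernel_h, kernel_w, in_channels, filters)

# merge (in_channels, filters) into one axis, as the plotting code does
flat = w.reshape(kernel_h, kernel_w, in_channels * filters)


def flat_index(channel, filt, n_filters=filters):
    """Position of the (channel, filt) kernel slice along the merged axis (C order)."""
    return channel * n_filters + filt


print(flat.shape)                                                   # (5, 5, 64)
print(np.array_equal(flat[:, :, flat_index(7, 1)], w[:, :, 7, 1]))  # True
```

So neighbouring images in the 8x8 grid alternate between the two filters for the same input channel.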

In [None]:
figsize(8, 8)
fig, ax = plt.subplots(8,8)
fig.suptitle('2nd convolutional layer filters by epoch (first 64)')
weightplots = []
for c in range(0,len(layer2weights)):
    currweights = layer2weights[c][0] # get weights from current epoch
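    currweights = np.reshape(currweights,(currweights.shape[0],currweights.shape[1],1,-1)) # merge (in_channel, filter) into one axis of 2D kernels; we plot the first 64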
    Counter = 0
    weightsubplots = []
    for a in range(0,8):
        for b in range(0,8):
            ax[a,b].axis('off')
            img = ax[a,b].imshow(np.squeeze(currweights[:,:,0,Counter]))
            Counter += 1
            weightsubplots.append(img) # append to list of subplots
    weightplots.append(weightsubplots) # append to list of figure frames

anim = ArtistAnimation(fig, weightplots, interval=200) # animate with prerendered plots in `weightplots`
plt.close() # close the actual plotted figure
HTML(anim.to_jshtml()) # ...and instead display a jshtml animation

Note how the network is 'forced' to learn higher order edge features (keep in mind that we're still doing a bit of qualitative hand-waving here)

In addition to visualizing the learned filters, we can also visualize how a given input image is transformed at different stages of the network, much like we did in the lecture
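
Between the convolution itself and the next layer, each feature map passes through a nonlinearity and max pooling. The NumPy sketch below illustrates those two steps on a toy 4x4 feature map; it is not the Keras implementation. The names `relu` and `max_pool_2x2` and the input values are assumptions, with 2x2 pooling at stride 2.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0.0)


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a single 2D feature map."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]  # drop an odd trailing row/column
    # split into 2x2 blocks and take the maximum of each block
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))


feature_map = np.arange(16, dtype=float).reshape(4, 4) - 6.0  # values from -6 to 9
activated = relu(feature_map)
pooled = max_pool_2x2(activated)

print(pooled.shape)  # (2, 2)
print(pooled)        # values: [[0, 1], [7, 9]]
```

Pooling halves each spatial dimension while keeping the strongest response in each 2x2 neighbourhood.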

In [None]:
# use the keras functional api (as opposed to Sequential model that we used before)
# make a new model that outputs at a layer of named choice
layer_name = 'Conv1'

# note how we reuse our previous model, and thus don't need to train this new model
middle_model = Model(inputs=myCNN.input,outputs=myCNN.get_layer(layer_name).output)

# get a prediction from the new model
layeroutput = middle_model.predict(x_test[:2,:,:,:]) 
print(layeroutput.shape)

In [None]:
# first let's see the example we pushed through the network
plt.imshow(np.squeeze(x_test[0,:,:,:]))

In [None]:
# plot sample transformed 'images' from the layer
figsize(8,4)
plt.suptitle('Data representation at layer: ' + layer_name)
for a in range(32):
    plt.subplot(4,8,a+1)
    plt.imshow(np.squeeze(layeroutput[0,:,:,a]))
    plt.axis('off')

Finally, as a point of comparison, here's (a slightly modified version of) the simple MLP we made during last month's coding exercise

In [None]:
# get a linearized (flattened) version of the data
x_train_subset_lin = np.reshape(x_train_subset,(x_train_subset.shape[0],x_train_subset.shape[1]*x_train_subset.shape[2]*x_train_subset.shape[3]))
x_test_subset_lin = np.reshape(x_test_subset,(x_test_subset.shape[0],x_test_subset.shape[1]*x_test_subset.shape[2]*x_test_subset.shape[3]))

In [None]:
def makeNN(): # a function we can call to build our model
    myNN = Sequential() # instantiate a model
    myNN.add(Dense(64,activation='relu',input_shape=(x_train_subset_lin.shape[1],)))
    myNN.add(Dense(32,activation='relu'))
    myNN.add(Dense(10,activation='softmax'))
    myNN.compile(loss='categorical_crossentropy', metrics=['categorical_accuracy'],optimizer='Adam')
    return myNN

In [None]:
# make our model
myNN = makeNN() 

# train our model
myNN.fit(x=x_train_subset_lin,y=y_train_subset,batch_size=64,epochs=10,validation_data=(x_test_subset_lin,y_test_subset))

In [None]:
myNN.summary()

Much as we did with the convolutional neural network, we can visualize the learned weights

In [None]:
# plot out first layer's learned weights
modelweights = myNN.get_weights() # get model weights
layerweights = modelweights[0] # get first layer's weights

figsize(8, 8)
Counter = 0
plt.suptitle('1st layer filters')
for a in range(0,8):
    for b in range(0,8):
        plt.subplot(8,8,Counter+1)
        plt.imshow(np.reshape(layerweights[:,Counter],(x_train.shape[1],x_train.shape[2])))
        plt.axis('off')
        Counter += 1

In [None]:
# plot out second layer's learned weights
modelweights = myNN.get_weights() # get model weights
layerweights = modelweights[2] # get second layer's weights

figsize(8, 4)
Counter = 0
plt.suptitle('2nd layer filters')
for a in range(0,4):
    for b in range(0,8):
        plt.subplot(4,8,Counter+1)
        plt.imshow(np.reshape(layerweights[:,Counter],(int(layerweights.shape[0]**0.5),int(layerweights.shape[0]**0.5))))
        plt.axis('off')
        Counter += 1

Take-home messages
* Simple 'feed-forward' networks such as MLPs and convolutional neural networks are easy to code up
    * we'll find that these days, even more complex architectures such as recurrent neural networks are almost as easy to code up with modern tools
* There are simple methods to visualize what the network has learned
    * via observing weights or intermediate outputs, as done here
    * or via other methods: [deconvolutional networks](https://arxiv.org/abs/1311.2901), [compute input that maximally activates a layer](https://arxiv.org/abs/1312.6034), [saliency maps](https://arxiv.org/abs/1810.03292), etc.
* Neural networks, and machine learning models in general, will learn whatever weights they need in order to solve the problem (i.e. minimize the loss function). And so there are no guarantees as to the structure of what they learn without imposing constraints
    * but then by imposing significant constraints (via loss functions, hyperparameters, architectures, etc), you limit how the network can learn and we get back to the 'hand-crafting' approaches of yore
    * nonetheless, this comes back to the concept from our lecture, i.e. choosing the 'right' prior
    * a convolutional neural network can be thought of as a network with constraints or priors built-in, and which (favorably) constrain how the network can learn
* There are canonical datasets that can be easily dropped into your given model for rapid testing, prototyping, and benchmarking, e.g. https://keras.io/datasets/
    * uncomment the lines for the other two datasets at the top and see how your model performs!
* Further still, there are pre-trained networks that can be easily loaded and used for, e.g. transfer learning: https://keras.io/applications/
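
To put a rough number on the 'built-in constraints' point above, the sketch below compares the parameter count of a convolutional layer with that of a fully connected layer producing an output of the same size. It assumes a 28x28 single-channel input and 32 5x5 filters with `'valid'` padding (so 24x24x32 outputs); both helper functions are hypothetical and only do arithmetic.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a 2D conv layer with square kernels."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# conv layer: 32 filters of size 5x5 on a single-channel image
conv = conv2d_params(kernel=5, in_channels=1, filters=32)

# dense layer mapping all 28*28 pixels to the same number of outputs (24*24*32)
dense = dense_params(28 * 28, 24 * 24 * 32)

print(conv)   # 832
print(dense)  # 14469120
```

Weight sharing and local connectivity are what shrink the count: the convolutional layer reuses the same small kernels at every image position.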

Cheatsheet below to complete the coding exercise
```
    # change filters of 2nd conv layer to 2, and run for 50, not 30 epochs; or run for 10
    # epochs with full training set
    
    myCNN.add(Conv2D(filters=32,kernel_size=(5,5),strides=(1,1),activation='relu', 
                     padding='valid',kernel_initializer='random_uniform',name='Conv1',
                     input_shape=(x_train.shape[1],x_train.shape[2],x_train.shape[3])))
    myCNN.add(MaxPooling2D(pool_size=(2,2),strides=(2,2)))
    myCNN.add(Conv2D(filters=64,kernel_size=(5,5),strides=(1,1),activation='relu',name='Conv2',
                     padding='valid',kernel_initializer='random_uniform'))
    myCNN.add(MaxPooling2D(pool_size=(2,2),strides=(2,2)))
    myCNN.add(Flatten())
    myCNN.add(Dense(100,activation='relu',kernel_initializer='random_uniform'))
    myCNN.add(Dense(10,activation='softmax',kernel_initializer='random_uniform')) # softmax pairs with categorical_crossentropy
```