# Exercise Sheet 6

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import decomposition
import tensorflow as tf
from tensorflow.keras.models import *
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import TensorBoard

## 2 AlexNet

As before, the aim of this exercise is to build a network, this time the one described in [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (Krizhevsky, Sutskever, Hinton). The architecture is summarised in Figure 2 of that paper, and the text gives more detailed descriptions.  
You only need to implement the architecture and check that your network is consistent. You can check your result by  
a) verifying that your model compiles in Keras and  
b) comparing your `model.summary()` with the desired dimensions.

### Solution

The parallel structure of AlexNet is due to the fact that it was designed to run on two GPUs. If you don't have access to a GPU, you can use [Google Colab](https://colab.research.google.com). There you can choose to run on a backend with a GPU or TPU: in the menu, click "Runtime" and then "Change runtime type".  
Unfortunately, only a single GPU is available there.

##### Simplified version with Batch Normalization

In [2]:
# Input layer
inputs = Input(shape=(224,224,3))

# 1st convolution block
conv1_up = Conv2D(48, kernel_size=11, strides=4, padding='same')(inputs)
batchnorm1_up = BatchNormalization(axis=-1)(conv1_up)
act1_up = Activation('relu')(batchnorm1_up)

conv1_down = Conv2D(48, kernel_size=11, strides=4, padding='same')(inputs)
batchnorm1_down = BatchNormalization(axis=-1)(conv1_down)
act1_down = Activation('relu')(batchnorm1_down)

maxpool1_up = MaxPooling2D(pool_size=3, strides=2)(act1_up)
maxpool1_down = MaxPooling2D(pool_size=3, strides=2)(act1_down)

# 2nd convolution block
conv2_up = Conv2D(128, kernel_size=5, strides=1, padding='same')(maxpool1_up)
batchnorm2_up = BatchNormalization(axis=-1)(conv2_up)
act2_up = Activation('relu')(batchnorm2_up)

conv2_down = Conv2D(128, kernel_size=5, strides=1, padding='same')(maxpool1_down)
batchnorm2_down = BatchNormalization(axis=-1)(conv2_down)
act2_down = Activation('relu')(batchnorm2_down)

maxpool2_up = MaxPooling2D(pool_size=3, strides=2)(act2_up)
maxpool2_down = MaxPooling2D(pool_size=3, strides=2)(act2_down)

# 3rd convolution block
merge3 = concatenate([maxpool2_up, maxpool2_down])
conv3_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)
conv3_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)

# 4th convolution block
conv4_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_up)
conv4_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_down)

# 5th convolution block
conv5_up = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_up)
conv5_down = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_down)
maxpool5_up = MaxPooling2D(pool_size=3, strides=2)(conv5_up)
maxpool5_down = MaxPooling2D(pool_size=3, strides=2)(conv5_down)

# Dense Layers 1st block (use dropout)
merge_dense1 = concatenate([maxpool5_up, maxpool5_down])
flatten_dense1 = Flatten()(merge_dense1)
dense1_up = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_up = Dropout(rate=0.5)(dense1_up)
dense1_down = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_down = Dropout(rate=0.5)(dense1_down)

# Dense Layers 2nd block (use dropout)
merge_dense2 = concatenate([dense1_dropout_up, dense1_dropout_down])
flatten_dense2 = Flatten()(merge_dense2)
dense2_up = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_up = Dropout(rate=0.5)(dense2_up)
dense2_down = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_down = Dropout(rate=0.5)(dense2_down)

# Softmax
merge_dense3 = concatenate([dense2_dropout_up, dense2_dropout_down])
flatten_dense3 = Flatten()(merge_dense3)
output = Dense(1000, activation='softmax')(flatten_dense3)

# Model
alex_net = Model(inputs=inputs, outputs=output)

# summarize layers
alex_net.compile(loss="categorical_crossentropy", optimizer='adam')  # softmax over 1000 classes
print(alex_net.summary())

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 56, 56, 48)   17472       input_1[0][0]                    
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 56, 56, 48)   17472       input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 56, 56, 48)   192         conv2d[0][0]                     
______________________________________________________________________________________________

As you can see, the output dimension after the first convolution ($56$) does not match the one in the paper ($55$). Several references state that the $224$ is a typo in the paper and should be $227$. I am not sure whether this is true, as the authors quote the number in so many places in the paper. A quick fix that gives the right output dimensions is to set `padding='valid'` in the first layer and add a symmetric zero padding of 2 in front of it. We cannot know for sure what the authors did without their pre-processed data or code.
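The numbers above follow from the standard output-size formulas for a strided convolution. The helper below is a minimal sketch of that arithmetic; the name `conv_output_size` and its `extra_pad` argument are illustrative and not part of Keras.

```python
import math

def conv_output_size(size, kernel, stride, padding, extra_pad=0):
    """Spatial output size of a convolution along one axis.

    extra_pad is explicit zero padding added on each side beforehand,
    which is what ZeroPadding2D(extra_pad) does.
    """
    size = size + 2 * extra_pad
    if padding == "same":
        # 'same' padding only depends on input size and stride
        return math.ceil(size / stride)
    if padding == "valid":
        # 'valid' padding drops positions where the kernel does not fit
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")

same_224 = conv_output_size(224, kernel=11, stride=4, padding="same")
valid_227 = conv_output_size(227, kernel=11, stride=4, padding="valid")
padded_224 = conv_output_size(224, kernel=11, stride=4, padding="valid", extra_pad=2)
print(same_224, valid_227, padded_224)  # 56 55 55
```

Both a $227$ input and a $224$ input padded by 2 on each side give the $55$ from the paper, so the output dimension alone cannot tell the two explanations apart.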

In [3]:
# Input layer
inputs = Input(shape=(224,224,3))

# 1st convolution block
pad1_up = ZeroPadding2D(2)(inputs) # fix for output dimensions
conv1_up = Conv2D(48, kernel_size=11, strides=4, padding='valid')(pad1_up)
batchnorm1_up = BatchNormalization(axis=-1)(conv1_up)
act1_up = Activation('relu')(batchnorm1_up)

pad1_down = ZeroPadding2D(2)(inputs) # fix for output dimensions
conv1_down = Conv2D(48, kernel_size=11, strides=4, padding='valid')(pad1_down)
batchnorm1_down = BatchNormalization(axis=-1)(conv1_down)
act1_down = Activation('relu')(batchnorm1_down)

maxpool1_up = MaxPooling2D(pool_size=3, strides=2)(act1_up)
maxpool1_down = MaxPooling2D(pool_size=3, strides=2)(act1_down)

# 2nd convolution block
conv2_up = Conv2D(128, kernel_size=5, strides=1, padding='same')(maxpool1_up)
batchnorm2_up = BatchNormalization(axis=-1)(conv2_up)
act2_up = Activation('relu')(batchnorm2_up)

conv2_down = Conv2D(128, kernel_size=5, strides=1, padding='same')(maxpool1_down)
batchnorm2_down = BatchNormalization(axis=-1)(conv2_down)
act2_down = Activation('relu')(batchnorm2_down)

maxpool2_up = MaxPooling2D(pool_size=3, strides=2)(act2_up)
maxpool2_down = MaxPooling2D(pool_size=3, strides=2)(act2_down)

# 3rd convolution block
merge3 = concatenate([maxpool2_up, maxpool2_down])
conv3_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)
conv3_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)

# 4th convolution block
conv4_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_up)
conv4_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_down)

# 5th convolution block
conv5_up = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_up)
conv5_down = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_down)
maxpool5_up = MaxPooling2D(pool_size=3, strides=2)(conv5_up)
maxpool5_down = MaxPooling2D(pool_size=3, strides=2)(conv5_down)

# Dense Layers 1st block (use dropout)
merge_dense1 = concatenate([maxpool5_up, maxpool5_down])
flatten_dense1 = Flatten()(merge_dense1)
dense1_up = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_up = Dropout(rate=0.5)(dense1_up)
dense1_down = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_down = Dropout(rate=0.5)(dense1_down)

# Dense Layers 2nd block (use dropout)
merge_dense2 = concatenate([dense1_dropout_up, dense1_dropout_down])
flatten_dense2 = Flatten()(merge_dense2)
dense2_up = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_up = Dropout(rate=0.5)(dense2_up)
dense2_down = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_down = Dropout(rate=0.5)(dense2_down)

# Softmax
merge_dense3 = concatenate([dense2_dropout_up, dense2_dropout_down])
flatten_dense3 = Flatten()(merge_dense3)
output = Dense(1000, activation='softmax')(flatten_dense3)

# Model
alex_net = Model(inputs=inputs, outputs=output)

# summarize layers
alex_net.compile(loss="categorical_crossentropy", optimizer='adam')  # softmax over 1000 classes
print(alex_net.summary())

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D)  (None, 228, 228, 3)  0           input_2[0][0]                    
__________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, 228, 228, 3)  0           input_2[0][0]                    
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 55, 55, 48)   17472       zero_padding2d[0][0]             
____________________________________________________________________________________________
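Since the summary is cut off here, the remaining spatial dimensions can be checked by hand. The following sketch walks through the padded network, assuming the default `padding='valid'` of `MaxPooling2D` and that the `'same'` convolutions with stride 1 leave the spatial size unchanged; the helper name `valid_size` is illustrative.

```python
def valid_size(size, window, stride):
    """Output size along one axis for a 'valid' convolution or pooling."""
    return (size - window) // stride + 1

padded = 224 + 2 * 2                 # ZeroPadding2D(2) on a 224 x 224 input
conv1 = valid_size(padded, 11, 4)    # first convolution, 11 x 11, stride 4
pool1 = valid_size(conv1, 3, 2)      # max pooling after block 1
pool2 = valid_size(pool1, 3, 2)      # max pooling after block 2
pool5 = valid_size(pool2, 3, 2)      # max pooling after block 5
flat = pool5 * pool5 * (128 + 128)   # both branches concatenated, then flattened

print(conv1, pool1, pool2, pool5, flat)  # 55 27 13 6 9216
```

These are the values to look for in the `Output Shape` column of `model.summary()`.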

##### Version with Local Response Normalization (LRN)

First we have to implement the LRN layer (see e.g. [here](https://resources.oreilly.com/examples/9781787128422/blob/0e1be827d0179cc535da74957866ed87a4ea0224/DeepLearningwithKeras_Code/Chapter07/tf-keras-func.py)).

In [4]:
from tensorflow.keras.layers import Layer  # public API instead of tensorflow.python

class LocalResponseNormalization(Layer):

    def __init__(self, n=5, alpha=0.0001, beta=0.75, k=2, **kwargs):
        self.n = n
        self.alpha = alpha
        self.beta = beta
        self.k = k
        super(LocalResponseNormalization, self).__init__(**kwargs)

    def build(self, input_shape):
        self.shape = input_shape
        super(LocalResponseNormalization, self).build(input_shape)

    def call(self, x, mask=None):
        # image_data_format() returns "channels_first" or "channels_last"
        channels_first = K.image_data_format() == "channels_first"
        if channels_first:
            _, f, r, c = self.shape
        else:
            _, r, c, f = self.shape
        channel_axis = 1 if channels_first else 3
        squared = K.square(x)
        # average the squared activations over an n x n spatial window
        pooled = K.pool2d(squared, (self.n, self.n), strides=(1, 1),
                          padding="same", pool_mode="avg")
        # sum over all channels and repeat the result for each of the f channels
        summed = K.sum(pooled, axis=channel_axis, keepdims=True)
        averaged = self.alpha * K.repeat_elements(summed, f, axis=channel_axis)
        denom = K.pow(self.k + averaged, self.beta)
        return x / denom

    def compute_output_shape(self, input_shape):
        # normalization does not change the shape
        return input_shape
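Note that the layer above averages the squared activations over an $n \times n$ spatial window and then sums over all channels. The paper instead defines LRN at a fixed spatial position as a sum over $n$ adjacent channels: $b^i_{x,y} = a^i_{x,y} / \big(k + \alpha \sum_{j=\max(0,\,i-n/2)}^{\min(N-1,\,i+n/2)} (a^j_{x,y})^2\big)^{\beta}$. As a reference for comparison, here is a minimal NumPy sketch of that formula for a single channels-last feature map; the function name `lrn_across_channels` is illustrative and this is not the code used in the cells of this notebook.

```python
import numpy as np

def lrn_across_channels(a, n=5, alpha=1e-4, beta=0.75, k=2.0):
    """Across-channel LRN for activations of shape (height, width, channels)."""
    num_channels = a.shape[-1]
    squared = a ** 2
    out = np.empty_like(a, dtype=float)
    half = n // 2
    for i in range(num_channels):
        # window of up to n neighbouring channels, clipped at the borders
        lo = max(0, i - half)
        hi = min(num_channels - 1, i + half)
        window_sum = squared[..., lo:hi + 1].sum(axis=-1)
        out[..., i] = a[..., i] / (k + alpha * window_sum) ** beta
    return out

a = np.ones((2, 2, 8))
b = lrn_across_channels(a)
print(b.shape)  # (2, 2, 8)
```

With all-ones input, channel 0 sees a window of 3 channels and channel 3 a full window of 5, so the two are normalized slightly differently.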

In [5]:
# Input layer
inputs = Input(shape=(224,224,3))

# 1st convolution block
pad1_up = ZeroPadding2D(2)(inputs)
conv1_up = Conv2D(48, kernel_size=11, strides=4, padding='valid', activation='relu')(pad1_up)
LRN1_up = LocalResponseNormalization()(conv1_up)

pad1_down = ZeroPadding2D(2)(inputs)
conv1_down = Conv2D(48, kernel_size=11, strides=4, padding='valid', activation='relu')(pad1_down)
LRN1_down = LocalResponseNormalization()(conv1_down)

maxpool1_up = MaxPooling2D(pool_size=3, strides=2)(LRN1_up)
maxpool1_down = MaxPooling2D(pool_size=3, strides=2)(LRN1_down)

# 2nd convolution block
conv2_up = Conv2D(128, kernel_size=5, strides=1, padding='same', activation='relu')(maxpool1_up)
LRN2_up = LocalResponseNormalization()(conv2_up)

conv2_down = Conv2D(128, kernel_size=5, strides=1, padding='same', activation='relu')(maxpool1_down)
LRN2_down = LocalResponseNormalization()(conv2_down)

maxpool2_up = MaxPooling2D(pool_size=3, strides=2)(LRN2_up)
maxpool2_down = MaxPooling2D(pool_size=3, strides=2)(LRN2_down)

# 3rd convolution block
merge3 = concatenate([maxpool2_up, maxpool2_down])
conv3_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)
conv3_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(merge3)

# 4th convolution block
conv4_up = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_up)
conv4_down = Conv2D(192, kernel_size=3, strides=1, padding='same', activation='relu')(conv3_down)

# 5th convolution block
conv5_up = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_up)
conv5_down = Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(conv4_down)
maxpool5_up = MaxPooling2D(pool_size=3, strides=2)(conv5_up)
maxpool5_down = MaxPooling2D(pool_size=3, strides=2)(conv5_down)

# Dense Layers 1st block (use dropout)
merge_dense1 = concatenate([maxpool5_up, maxpool5_down])
flatten_dense1 = Flatten()(merge_dense1)
dense1_up = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_up = Dropout(rate=0.5)(dense1_up)
dense1_down = Dense(2048, activation='relu')(flatten_dense1)
dense1_dropout_down = Dropout(rate=0.5)(dense1_down)

# Dense Layers 2nd block (use dropout)
merge_dense2 = concatenate([dense1_dropout_up, dense1_dropout_down])
flatten_dense2 = Flatten()(merge_dense2)
dense2_up = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_up = Dropout(rate=0.5)(dense2_up)
dense2_down = Dense(2048, activation='relu')(flatten_dense2)
dense2_dropout_down = Dropout(rate=0.5)(dense2_down)

# Softmax
merge_dense3 = concatenate([dense2_dropout_up, dense2_dropout_down])
flatten_dense3 = Flatten()(merge_dense3)
output = Dense(1000, activation='softmax')(flatten_dense3)

# Model
alex_net = Model(inputs=inputs, outputs=output)

# summarize layers
alex_net.compile(loss="categorical_crossentropy", optimizer='adam')  # softmax over 1000 classes
print(alex_net.summary())

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 3, expecting 4

Total params: 60,965,224
Trainable params: 60,965,224
Non-trainable params: 0
__________________________________________________________________________________________________
None
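The total parameter count in the summary can be reproduced by hand, which is a useful consistency check for the architecture. The sketch below counts weights and biases per layer for one branch and doubles them; the helper names `conv_params` and `dense_params` are illustrative, and the LRN and pooling layers contribute no parameters.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus biases of a Dense layer."""
    return inputs * units + units

# one entry per layer of a single branch (up / down)
branch_layers = [
    conv_params(11, 3, 48),           # conv1 on the RGB input
    conv_params(5, 48, 128),          # conv2 on its own branch only
    conv_params(3, 256, 192),         # conv3 on the merged 128 + 128 channels
    conv_params(3, 192, 192),         # conv4
    conv_params(3, 192, 128),         # conv5
    dense_params(6 * 6 * 256, 2048),  # dense1 on the flattened merge
    dense_params(4096, 2048),         # dense2 on the merged 2048 + 2048 units
]

# both branches plus the shared 1000-way softmax layer
total = 2 * sum(branch_layers) + dense_params(4096, 1000)
print(f"{total:,}")  # 60,965,224
```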


In [6]:
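# TF 2.x has no K.get_session() / tf.summary.FileWriter,
# so trace one forward pass of the model as a graph instead
@tf.function
def forward(x):
    return alex_net(x)

# write to files
tb_path = "logs_alexnet/"
writer = tf.summary.create_file_writer(tb_path)
tf.summary.trace_on(graph=True)
forward(tf.zeros((1, 224, 224, 3)))  # dummy input, only the graph matters
with writer.as_default():
    tf.summary.trace_export(name="alexnet_graph", step=0)
K.clear_session()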

In [None]:
# run tensorboard in shell -> interrupt kernel to stop
# you can click the link to see the graph of the network we built
!tensorboard --logdir=logs_alexnet

If the above link does not work for some reason, you should find TensorBoard at [localhost:6006](http://localhost:6006).