
# Going Deeper with Convolutions

Before **Inception v1** (**GoogLeNet**), the winner of the **ILSVRC** (ImageNet Large Scale Visual Recognition Challenge) 2014 classification task, most popular CNNs simply stacked convolution layers deeper and deeper in the hope of better performance.
The Inception network instead relies on several design choices, such as parallel convolutions at multiple scales, 1x1 convolutions for dimensionality reduction, and auxiliary classifiers, to improve both accuracy and computational efficiency.
**Inception v1** improves significantly on **ZFNet** (the 2013 winner) and **AlexNet** (the 2012 winner), and achieves a somewhat lower error rate than **VGGNet**, the 2014 runner-up.

In this task, we implement the Inception architecture from [this paper](https://arxiv.org/abs/1409.4842) in TensorFlow/Keras.

The goal of this task is to understand how to build the model in code. Training the model is not required, as long as the correctness of the code can be verified (e.g., through the Keras model summary).

## Set up environment

In [0]:
!pip install wandb -q

In [2]:
import wandb

wandb.login()

True

In [3]:
wandb.init(entity="daysm", project="id2223-lab2-1")

W&B Run: https://app.wandb.ai/daysm/id2223-lab2-1/runs/ar7wgoqq

In [4]:
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf

TensorFlow 2.x selected.


## The inception architecture

### The inception module

![](https://images.deepai.org/django-summernote/2019-06-18/2cec735b-2347-4ded-ae2b-e8a8384f7b46.png)

In [0]:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Dense, Input, concatenate, GlobalAveragePooling2D, AveragePooling2D, Flatten

In [0]:
def inception_module(x,
                     filters_1x1,
                     filters_3x3_reduce,
                     filters_3x3,
                     filters_5x5_reduce,
                     filters_5x5,
                     filters_pool_proj,
                     kernel_initializer=tf.keras.initializers.glorot_uniform(),
                     bias_initializer=tf.keras.initializers.Constant(value=0.2),
                     name='inception'):
    """Inception module: four parallel branches concatenated along the channel axis."""

    # Branch 1: 1x1 convolution
    conv_1x1 = Conv2D(filters_1x1, (1, 1), padding='same', activation='relu', kernel_initializer=kernel_initializer, bias_initializer=bias_initializer, name=name+"_conv_1x1")(x)

    # Branch 2: 1x1 dimensionality reduction, then 3x3 convolution
    conv_3x3_reduce = Conv2D(filters_3x3_reduce, (1, 1), padding='same', activation='relu', kernel_initializer=kernel_initializer, bias_initializer=bias_initializer, name=name+"_conv_3x3_reduce")(x)
    conv_3x3 = Conv2D(filters_3x3, (3, 3), padding='same', activation='relu', kernel_initializer=kernel_initializer, bias_initializer=bias_initializer, name=name+"_conv_3x3")(conv_3x3_reduce)

    # Branch 3: 1x1 dimensionality reduction, then 5x5 convolution
    # (uses the kernel_initializer argument, not the global kernel_init)
    conv_5x5_reduce = Conv2D(filters_5x5_reduce, (1, 1), padding='same', activation='relu', kernel_initializer=kernel_initializer, bias_initializer=bias_initializer, name=name+"_conv_5x5_reduce")(x)
    conv_5x5 = Conv2D(filters_5x5, (5, 5), padding='same', activation='relu', kernel_initializer=kernel_initializer, bias_initializer=bias_initializer, name=name+"_conv_5x5")(conv_5x5_reduce)

    # Branch 4: 3x3 max pooling with stride 1, then 1x1 projection
    max_pool_3x3 = MaxPool2D((3, 3), strides=(1, 1), padding='same', name=name+"_max_pool_3x3")(x)
    pool_proj = Conv2D(filters_pool_proj, (1, 1), padding='same', activation='relu', kernel_initializer=kernel_initializer, bias_initializer=bias_initializer, name=name+"_pool_proj")(max_pool_3x3)

    # Every branch keeps the spatial size ('same' padding, stride 1),
    # so the outputs can be concatenated along the channel axis (channels_last)
    output = concatenate([conv_1x1, conv_3x3, conv_5x5, pool_proj], axis=-1, name=name)

    return output
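Because the four branches are concatenated along the channel axis, the depth of a module's output is the sum of the 1x1, 3x3, 5x5 and pool-projection filter counts; the two "reduce" layers only feed the 3x3 and 5x5 convolutions and do not appear in the output. The plain-Python sketch below (no TensorFlow needed; `MODULES` and `output_channels` are hypothetical names of our own) applies this rule to the filter settings used in the stacking cell. It should give the output depths listed in Table 1 of the paper, from 256 for 3a up to 1024 for 5b.

```python
# Filter counts per inception module, copied from the stacking cell
# (order: 1x1, 3x3 reduce, 3x3, 5x5 reduce, 5x5, pool proj).
MODULES = {
    "3a": (64, 96, 128, 16, 32, 32),
    "3b": (128, 128, 192, 32, 96, 64),
    "4a": (192, 96, 208, 16, 48, 64),
    "4b": (160, 112, 224, 24, 64, 64),
    "4c": (128, 128, 256, 24, 64, 64),
    "4d": (112, 144, 288, 32, 64, 64),
    "4e": (256, 160, 320, 32, 128, 128),
    "5a": (256, 160, 320, 32, 128, 128),
    "5b": (384, 192, 384, 48, 128, 128),
}


def output_channels(f_1x1, f_3x3_reduce, f_3x3, f_5x5_reduce, f_5x5, f_pool_proj):
    """Channels produced by concatenating the four branch outputs.

    The reduce layers are internal to branches 2 and 3, so they are not counted.
    """
    return f_1x1 + f_3x3 + f_5x5 + f_pool_proj


for module_name, filters in MODULES.items():
    print(module_name, output_channels(*filters))
```

Comparing these numbers with the last dimension of each `*_inception` layer in `model.summary()` is a cheap way to catch a mistyped filter count.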

### Stacking inception modules

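To be able to train on the CIFAR-10 dataset, our architecture differs slightly from the original GoogLeNet: the last dense layer of the main classifier and of each auxiliary classifier has 10 neurons (instead of the 1000 needed for ImageNet), and the CIFAR-10 images are upscaled to the 224x224 input size the network expects.
The local response normalization layers of the original stem are also omitted.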

![](https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/10/googlenet.png)
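Only the strided layers change the spatial resolution. With `padding='same'`, Keras produces an output of size ceil(n / stride), while the auxiliary classifiers' average pooling uses the default `'valid'` padding, which gives floor((n - k) / stride) + 1. The plain-Python trace below (helper names are our own, not part of Keras) follows a 224x224 input through the strided layers of the model; it should reproduce the 112, 56, 28, 14 and 7 resolutions from the paper's Table 1 and the 4x4 maps entering the auxiliary classifiers.

```python
import math


def same_out(n, stride):
    """Spatial size after a Keras layer with padding='same'."""
    return math.ceil(n / stride)


def valid_out(n, kernel, stride):
    """Spatial size after a Keras layer with padding='valid' (the default)."""
    return (n - kernel) // stride + 1


# Every layer that changes the spatial size in the main path uses
# padding='same' with stride 2; all other layers use stride 1.
size = 224
trace = [("input", size)]
for layer, stride in [
    ("conv 7x7/2", 2),
    ("max pool 3x3/2 (stage 1)", 2),
    ("max pool 3x3/2 (before 3a)", 2),
    ("max pool 3x3/2 (before 4a)", 2),
    ("max pool 3x3/2 (before 5a)", 2),
]:
    size = same_out(size, stride)
    trace.append((layer, size))

for layer, s in trace:
    print(f"{layer:>28}: {s}x{s}")

# Auxiliary classifiers: 5x5 average pooling, stride 3, on the 14x14 maps of 4a and 4d.
aux_size = valid_out(14, kernel=5, stride=3)
print(f"{'aux avg pool 5x5/3':>28}: {aux_size}x{aux_size}")
```

The final `GlobalAveragePooling2D` then averages the 7x7 maps of 5b, which is equivalent to the paper's 7x7/1 average pooling for this input size.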

In [0]:
kernel_init = tf.keras.initializers.glorot_uniform()
bias_init = tf.keras.initializers.Constant(value=0.2)

In [22]:
input_layer = Input(shape=(224, 224, 3), name='input')

# 1
x = Conv2D(64, (7, 7), padding='same', strides=(2, 2), activation='relu', name='1_conv_7x7/2', kernel_initializer=kernel_init, bias_initializer=bias_init)(input_layer)
x = MaxPool2D((3, 3), padding='same', strides=(2, 2), name='1_max_pool_3x3/2')(x)

# 2
x = Conv2D(64, (1, 1), padding='same', strides=(1, 1), activation='relu', name='2_conv_1x1/1')(x)
x = Conv2D(192, (3, 3), padding='same', strides=(1, 1), activation='relu', name='2_conv_3x3/1')(x)
x = MaxPool2D((3, 3), padding='same', strides=(2, 2), name='2_max_pool_3x3/2')(x)

# 3a
x = inception_module(x,
                     filters_1x1=64,
                     filters_3x3_reduce=96,
                     filters_3x3=128,
                     filters_5x5_reduce=16,
                     filters_5x5=32,
                     filters_pool_proj=32,
                     kernel_initializer=kernel_init,
                     bias_initializer=bias_init,
                     name='3a_inception')
# 3b
x = inception_module(x,
                     filters_1x1=128,
                     filters_3x3_reduce=128,
                     filters_3x3=192,
                     filters_5x5_reduce=32,
                     filters_5x5=96,
                     filters_pool_proj=64,
                     kernel_initializer=kernel_init,
                     bias_initializer=bias_init,
                     name='3b_inception')

x = MaxPool2D((3, 3), padding='same', strides=(2, 2), name='3_max_pool_3x3/2')(x)

# 4a
x = inception_module(x,
                     filters_1x1=192,
                     filters_3x3_reduce=96,
                     filters_3x3=208,
                     filters_5x5_reduce=16,
                     filters_5x5=48,
                     filters_pool_proj=64,
                     kernel_initializer=kernel_init,
                     bias_initializer=bias_init,
                     name='4a_inception')

# 1st auxiliary classifier (attached after 4a)
x1 = AveragePooling2D((5, 5), strides=3, name='aux1_avg_pool_5x5')(x)
x1 = Conv2D(128, (1, 1), padding='same', activation='relu', name='aux1_conv_1x1_reduce')(x1)
x1 = Flatten(name='aux1_flatten')(x1)
x1 = Dense(1024, activation='relu', name='aux1_dense')(x1)
x1 = Dropout(0.7, name='aux1_dropout')(x1)
x1 = Dense(10, activation='softmax', name='aux1_output')(x1)

# 4b
x = inception_module(x,
                     filters_1x1=160,
                     filters_3x3_reduce=112,
                     filters_3x3=224,
                     filters_5x5_reduce=24,
                     filters_5x5=64,
                     filters_pool_proj=64,
                     kernel_initializer=kernel_init,
                     bias_initializer=bias_init,
                     name='4b_inception')

# 4c
x = inception_module(x,
                     filters_1x1=128,
                     filters_3x3_reduce=128,
                     filters_3x3=256,
                     filters_5x5_reduce=24,
                     filters_5x5=64,
                     filters_pool_proj=64,
                     kernel_initializer=kernel_init,
                     bias_initializer=bias_init,
                     name='4c_inception')

# 4d
x = inception_module(x,
                     filters_1x1=112,
                     filters_3x3_reduce=144,
                     filters_3x3=288,
                     filters_5x5_reduce=32,
                     filters_5x5=64,
                     filters_pool_proj=64,
                     kernel_initializer=kernel_init,
                     bias_initializer=bias_init,
                     name='4d_inception')

# 2nd auxiliary classifier (attached after 4d)
x2 = AveragePooling2D((5, 5), strides=3, name='aux2_avg_pool_5x5')(x)
x2 = Conv2D(128, (1, 1), padding='same', activation='relu', name='aux2_conv_1x1_reduce')(x2)
x2 = Flatten(name='aux2_flatten')(x2)
x2 = Dense(1024, activation='relu', name='aux2_dense')(x2)
x2 = Dropout(0.7, name='aux2_dropout')(x2)
x2 = Dense(10, activation='softmax', name='aux2_output')(x2)


# 4e
x = inception_module(x,
                     filters_1x1=256,
                     filters_3x3_reduce=160,
                     filters_3x3=320,
                     filters_5x5_reduce=32,
                     filters_5x5=128,
                     filters_pool_proj=128,
                     kernel_initializer=kernel_init,
                     bias_initializer=bias_init,
                     name='4e_inception')

x = MaxPool2D((3, 3), padding='same', strides=(2, 2), name='4_max_pool_3x3/2')(x)

# 5a
x = inception_module(x,
                     filters_1x1=256,
                     filters_3x3_reduce=160,
                     filters_3x3=320,
                     filters_5x5_reduce=32,
                     filters_5x5=128,
                     filters_pool_proj=128,
                     kernel_initializer=kernel_init,
                     bias_initializer=bias_init,
                     name='5a_inception')

# 5b
x = inception_module(x,
                     filters_1x1=384,
                     filters_3x3_reduce=192,
                     filters_3x3=384,
                     filters_5x5_reduce=48,
                     filters_5x5=128,
                     filters_pool_proj=128,
                     kernel_initializer=kernel_init,
                     bias_initializer=bias_init,
                     name='5b_inception')

# Global average pooling over the 7x7 maps (equivalent to the paper's 7x7/1 average pool)
x = GlobalAveragePooling2D(name='5_avg_pool_7x7/1')(x)

x = Dropout(0.4)(x)

x = Dense(10, activation='softmax', name='output')(x)
model = Model(input_layer, [x, x1, x2], name='inception_v1')
model.summary()

Model: "inception_v1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input (InputLayer)              [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
1_conv_7x7/2 (Conv2D)           (None, 112, 112, 64) 9472        input[0][0]                      
__________________________________________________________________________________________________
1_max_pool_3x3/2 (MaxPooling2D) (None, 56, 56, 64)   0           1_conv_7x7/2[0][0]               
__________________________________________________________________________________________________
2_conv_1x1/1 (Conv2D)           (None, 56, 56, 64)   4160        1_max_pool_3x3/2[0][0]           
_______________________________________________________________________________________
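Since the goal is to verify the model rather than train it, the `Param #` column can be checked by hand: a `Conv2D` layer has `k*k*C_in*C_out` weights plus `C_out` biases, and a `Dense` layer has `in*out` weights plus `out` biases. The sketch below (hypothetical helper names, plain Python) reproduces the 9472 and 4160 shown for `1_conv_7x7/2` and `2_conv_1x1/1`, and estimates the auxiliary classifier layers, assuming the 4x4x128 map that reaches their `Flatten` layer.

```python
def conv2d_params(kernel, in_channels, filters, use_bias=True):
    """Trainable parameters of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + (filters if use_bias else 0)


def dense_params(in_features, units, use_bias=True):
    """Trainable parameters of a Dense layer."""
    return in_features * units + (units if use_bias else 0)


# Stem layers shown in the summary above
print(conv2d_params(7, 3, 64))    # 9472  (1_conv_7x7/2)
print(conv2d_params(1, 64, 64))   # 4160  (2_conv_1x1/1)

# Auxiliary classifier 1: 1x1 reduction of the 512-channel output of 4a
print(conv2d_params(1, 512, 128))  # 65664

# After the reduction, the 4x4x128 map is flattened into 2048 features
aux_features = 4 * 4 * 128
print(dense_params(aux_features, 1024))  # 2098176 (aux dense layer)
print(dense_params(1024, 10))            # 10250   (aux output, 10 CIFAR-10 classes)
```

The 2048 x 1024 dense layer alone holds about 2.1 million weights per auxiliary classifier, which is why these heads are a noticeable share of the model's parameters despite being discarded at inference time in the paper.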

## Training

We limit the training set to 10000 samples and the validation set (taken from the CIFAR-10 test split) to 2000 samples, so that the images, once resized to 224x224 and converted to float32, fit in RAM.
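A rough estimate shows why the subset is needed: each image upscaled to 224x224x3 and stored as float32 takes 224 * 224 * 3 * 4 bytes, about 0.6 MB. The sketch below (a hypothetical helper that ignores Python and NumPy overhead and the temporary copies made while resizing) computes the size of the arrays returned by the loading cell.

```python
def dataset_bytes(num_images, height=224, width=224, channels=3, bytes_per_value=4):
    """Memory for a dense float32 array of resized images (no overhead counted)."""
    return num_images * height * width * channels * bytes_per_value


train_bytes = dataset_bytes(10_000)
valid_bytes = dataset_bytes(2_000)
print(f"train: {train_bytes / 1e9:.2f} GB")  # train: 6.02 GB
print(f"valid: {valid_bytes / 1e9:.2f} GB")  # valid: 1.20 GB

# For comparison, all 50,000 CIFAR-10 training images would need about 30 GB
print(f"full train split: {dataset_bytes(50_000) / 1e9:.2f} GB")  # full train split: 30.11 GB
```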

In [0]:
import numpy as np 
import cv2
from tensorflow.keras.datasets import cifar10

num_classes = 10

def load_cifar10_data(img_rows, img_cols):

    # Load the CIFAR-10 training and test splits (the test split is used for validation)
    (X_train, y_train), (X_valid, y_valid) = cifar10.load_data()

    # Resize the first 10000 training and 2000 validation images and scale pixels to [0, 1].
    # cv2.resize expects dsize as (width, height), i.e. (cols, rows).
    X_train = np.array([cv2.resize(img, (img_cols, img_rows)) for img in X_train[:10000]]).astype('float32') / 255.0
    X_valid = np.array([cv2.resize(img, (img_cols, img_rows)) for img in X_valid[:2000]]).astype('float32') / 255.0

    # One-hot encode the integer labels for categorical_crossentropy
    y_train = tf.keras.utils.to_categorical(y_train[:10000], num_classes)
    y_valid = tf.keras.utils.to_categorical(y_valid[:2000], num_classes)

    return X_train, y_train, X_valid, y_valid

X_train, y_train, X_test, y_test = load_cifar10_data(224, 224)

In [0]:
import math 
from tensorflow.keras.optimizers import SGD 
from tensorflow.keras.callbacks import LearningRateScheduler

epochs = 25
initial_lrate = 0.01

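def decay(epoch):
    # Step decay as in the paper: reduce the learning rate by 4% every 8 epochs
    drop = 0.96
    epochs_drop = 8
    lrate = initial_lrate * math.pow(drop, math.floor((1+epoch)/epochs_drop))
    return lrate

# `learning_rate` replaces the deprecated `lr` argument
sgd = SGD(learning_rate=initial_lrate, momentum=0.9, nesterov=False)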

lr_sc = LearningRateScheduler(decay, verbose=1)

model.compile(loss=['categorical_crossentropy', 'categorical_crossentropy', 'categorical_crossentropy'], loss_weights=[1, 0.3, 0.3], optimizer=sgd, metrics=['accuracy'])
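To see what the `decay` schedule does over the 25 epochs, the formula can be evaluated on its own. The sketch below re-implements it as a standalone function (`step_decay` is our own name, with the same constants as `decay`) and prints a few values; Keras passes a 0-based epoch index to the scheduler, while the training log counts epochs from 1.

```python
import math


def step_decay(epoch, initial_lrate=0.01, drop=0.96, epochs_drop=8):
    """Same formula as `decay`: multiply the rate by `drop` every `epochs_drop` epochs."""
    return initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))


# epoch is the 0-based index passed by LearningRateScheduler
schedule = [step_decay(epoch) for epoch in range(25)]
for epoch in (0, 6, 7, 15, 23, 24):
    print(f"epoch {epoch + 1:2d}: {schedule[epoch]:.6f}")
```

Counting epochs from 1, the rate is 0.01 for epochs 1 to 7, 0.0096 for epochs 8 to 15, 0.009216 for epochs 16 to 23 and 0.00884736 for the last two epochs, so over a 25-epoch run the schedule changes the learning rate only slightly.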

In [11]:
from wandb.keras import WandbCallback

history = model.fit(X_train, [y_train, y_train, y_train], validation_data=(X_test, [y_test, y_test, y_test]), epochs=epochs, batch_size=256, callbacks=[lr_sc, WandbCallback()])

Train on 10000 samples, validate on 2000 samples

Epoch 00001: LearningRateScheduler reducing learning rate to 0.01.
Epoch 1/25

Epoch 00002: LearningRateScheduler reducing learning rate to 0.01.
Epoch 2/25

Epoch 00003: LearningRateScheduler reducing learning rate to 0.01.
Epoch 3/25

KeyboardInterrupt: ignored

View run on [WandB](https://app.wandb.ai/daysm/id2223-lab2-1/runs/ajee4drk/overview)