In [1]:
import numpy as np

import tensorflow as tf
from tensorflow import keras


# Introduction

![](./Sources/idea.jpg)

**Inception v3** is a convolutional neural network for image analysis and object detection that grew out of the Inception module first used in GoogLeNet. It is the third version of Google's Inception convolutional neural network, which was originally introduced during the ImageNet Large Scale Visual Recognition Challenge. Just as ImageNet can be thought of as a database of classified visual objects, Inception helps classify objects in computer vision. One application is in the life sciences, where it aids leukemia research. The name Inception comes from the "we need to go deeper" internet meme, which quotes a phrase from Christopher Nolan's film Inception.

# Inception modules

Inception V3 is, as the name suggests, an architecture of the Inception type. Such networks are built from so-called Inception modules. The idea is that, instead of choosing a single filter size for a convolutional layer, you apply filters of several sizes in parallel and then stack (concatenate) their outputs into one. This allows objects of different sizes in an image to be detected effectively. The Inception module is illustrated in the figure below, from the paper "Going deeper with convolutions" (Szegedy et al., 2014), where the Inception network was first introduced.

## Naive version
![](./Sources/naive.png)

Call this module, version (a), the naive version. It uses three types of convolutional filters, of size 1x1, 3x3 and 5x5, plus a pooling filter. To ensure that the outputs of the different filters have the same spatial size, "same" padding is used on the convolutional filters as well as on the pooling filter.

This is, however, a computationally costly layer: the number of operations for each convolutional filter is the dimension of the input (height × width × number of channels) times the dimension of the filter (height × width × number of filters). To reduce the number of operations, module (b) is proposed.
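
Written out as a formula, the cost described above looks roughly as follows. The symbols are introduced here for illustration only ($H$, $W$, $C$ for the input height, width and number of channels; $k$ for the side of a square filter; $F$ for the number of filters), and stride 1 with "same" padding is assumed:

```latex
\text{ops} \;=\; \underbrace{H \cdot W \cdot C}_{\text{input}} \;\times\; \underbrace{k \cdot k \cdot F}_{\text{filter}}
```

As an illustration with assumed sizes, a bank of 32 filters of size 5x5 applied to a 28x28x192 input costs $28 \cdot 28 \cdot 192 \cdot 5 \cdot 5 \cdot 32 = 120{,}422{,}400$ multiply-accumulate operations.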

In [2]:
def inception_module_naive(x, f1, f2, f3):
    # 1x1 conv
    conv1 =  keras.layers.Conv2D(f1, (1,1), padding='same', activation='relu')(x)
    # 3x3 conv
    conv3 = keras.layers.Conv2D(f2, (3,3), padding='same', activation='relu')(x)
    # 5x5 conv
    conv5 = keras.layers.Conv2D(f3, (5,5), padding='same', activation='relu')(x)
    # 3x3 max pooling
    pool = keras.layers.MaxPooling2D((3,3), strides=(1,1), padding='same')(x)
    # concatenate filters
    out = keras.layers.concatenate([conv1, conv3, conv5, pool])
    return out

## Inception with reduction
![](./Sources/inception_module_with_reduction.png)

Here the authors have added a 1x1 filter before the 3x3 and 5x5 filters. This reduces the dimension of the input, specifically the number of channels: the output of the 1x1 filters has as many channels as there are 1x1 filters. The cost of going from the input to the output of the 5x5 filters then becomes the sum of two terms. The first is the cost of the 1x1 convolution: the dimension of the input (height × width × number of channels) times 1x1 times the number of 1x1 filters. The second is the cost of the 5x5 convolution itself, which equals the original cost in (a) divided by the ratio (number of input channels) / (number of 1x1 filters).
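
As a rough sanity check of this cost argument, the sketch below counts multiply-accumulate operations for the 5x5 branch with and without the 1x1 reduction. The input size (28x28x192) and the filter counts (16 reduction filters, 32 filters of size 5x5) are illustrative assumptions, not values taken from this notebook. Stride 1 and "same" padding are assumed, and biases and activations are ignored.

```python
def conv_cost(height, width, in_channels, kernel, filters):
    """Multiply-accumulates of a stride-1, same-padded kernel x kernel convolution."""
    return height * width * in_channels * kernel * kernel * filters


# Illustrative (assumed) sizes, not taken from the notebook
H, W, C = 28, 28, 192
REDUCE_FILTERS = 16   # number of 1x1 filters placed before the 5x5 convolution
CONV5_FILTERS = 32    # number of 5x5 filters

# Naive module: 5x5 convolution applied directly to the input
naive = conv_cost(H, W, C, 5, CONV5_FILTERS)

# Module with reduction: 1x1 convolution first, then 5x5 on the reduced channels
reduce_step = conv_cost(H, W, C, 1, REDUCE_FILTERS)
conv5_step = conv_cost(H, W, REDUCE_FILTERS, 5, CONV5_FILTERS)
reduced = reduce_step + conv5_step

print(f"naive 5x5 branch:   {naive:,}")
print(f"reduced 5x5 branch: {reduced:,}")
print(f"speed-up factor:    {naive / reduced:.2f}")
```

With these assumed sizes the 5x5 step alone is cheaper by exactly the channel ratio 192 / 16 = 12, and the extra 1x1 step adds back only a small fraction of that saving.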

In [3]:
def inception_module_with_reduction(x, f1, f2, f3):
    # f1 is used both for the plain 1x1 branch and for every 1x1 reduction
    # 1x1 conv
    conv1 = keras.layers.Conv2D(f1, (1,1), padding='same', activation='relu')(x)

    # 1x1 reduction followed by 3x3 conv
    conv3 = keras.layers.Conv2D(f1, (1,1), padding='same', activation='relu')(x)
    conv3 = keras.layers.Conv2D(f2, (3,3), padding='same', activation='relu')(conv3)

    # 1x1 reduction followed by 5x5 conv
    conv5 = keras.layers.Conv2D(f1, (1,1), padding='same', activation='relu')(x)
    conv5 = keras.layers.Conv2D(f3, (5,5), padding='same', activation='relu')(conv5)

    # 3x3 max pooling followed by 1x1 conv
    pool = keras.layers.MaxPooling2D((3,3), strides=(1,1), padding='same')(x)
    pool = keras.layers.Conv2D(f1, (1,1), padding='same', activation='relu')(pool)

    # concatenate filters along the channel axis
    out = keras.layers.concatenate([conv1, conv3, conv5, pool])
    return out

The popular versions of the Inception network are as follows:
* Inception v1.
* Inception v2 and Inception v3.
* Inception v4 and Inception-ResNet.

# Inception V3

Inception v3 is a widely-used image recognition model that has been shown to attain greater than 78.1% accuracy on the ImageNet dataset.

As the figure below shows, the Inception V3 architecture also includes reduction modules, which in principle are the same as Inception modules, except that they are designed to decrease the spatial dimensions of the input. In total, Inception V3 has about 24M parameters. It is also worth mentioning that V3 takes a 299x299x3 input by default and uses an RMSProp optimizer. Since it was designed for the ImageNet dataset, it outputs 1000 different classes; because we use it for the plankton dataset, we change the last layers to fit our desired output.
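
One possible way to swap the last layers for a different number of classes is sketched below. It uses the stock `keras.applications.InceptionV3` model rather than the hand-built network from this notebook. `NUM_CLASSES`, the learning rate and the loss are placeholder assumptions, since the number of plankton classes and the training setup are not stated here. `weights=None` keeps the example free of downloads.

```python
from tensorflow import keras

NUM_CLASSES = 10  # hypothetical: replace with the number of classes in your dataset

# Convolutional base without the 1000-class ImageNet head, randomly initialised
base = keras.applications.InceptionV3(
    include_top=False,
    weights=None,
    input_shape=(299, 299, 3),
)

# New classification head: global average pooling followed by a softmax layer
x = keras.layers.GlobalAveragePooling2D()(base.output)
outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = keras.Model(base.input, outputs, name="inception_v3_custom_head")

# RMSprop as mentioned in the text; the learning rate is an assumed value
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

print(base.output_shape)   # feature map produced by the convolutional base
print(model.output_shape)  # one probability per class
```

Passing `weights="imagenet"` instead would start from pretrained features, at the cost of a download on first use.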

![](./Sources/InceptionV3.png)

# Details and Implementation 

![](./Sources/stem.png)

In [4]:
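channel_axis = 3  # channels-last layout: (batch, height, width, channels)

def conv2d_bn(x, filters, num_row, num_col, padding='same', strides=(1, 1)):
    # Convolution, then batch normalization, then ReLU activation
    x = keras.layers.Conv2D(filters, (num_row, num_col), strides=strides, padding=padding)(x)
    x = keras.layers.BatchNormalization(axis=channel_axis, scale=False)(x)
    x = keras.layers.Activation('relu')(x)
    return x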

In [5]:
def inc_stem(x):
    x = conv2d_bn(x, 32, 3, 3, strides=(2, 2), padding='valid') # 149 x 149 x 32
    x = conv2d_bn(x, 32, 3, 3, padding='valid')  # 147 x 147 x 32
    x = conv2d_bn(x, 64, 3, 3) # 147 x 147 x 64

    x = keras.layers.MaxPooling2D((3, 3), strides=(2, 2))(x)   # 73  x 73 x 64
    x = conv2d_bn(x, 80, 1, 1, padding='valid') # 73 x 73 x 80
    x = conv2d_bn(x, 192, 3, 3, padding='valid')  # 71 x 71 x 192
    x = keras.layers.MaxPooling2D((3, 3), strides=(2, 2))(x)  # 35 x 35 x 192
    return x
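
The spatial sizes in the comments of `inc_stem` can be reproduced with a few lines of arithmetic. The sketch below is a standalone helper and not part of the model; the name `out_size` and the `stem_layers` list are introduced here purely for illustration. It applies the usual output-size rules for "valid" and "same" padding to a 299-pixel input.

```python
import math


def out_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling layer along one axis."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1


# (kernel, stride, padding) for each layer of inc_stem, in order
stem_layers = [
    (3, 2, "valid"),  # conv 3x3, stride 2
    (3, 1, "valid"),  # conv 3x3
    (3, 1, "same"),   # conv 3x3, same padding
    (3, 2, "valid"),  # max pool 3x3, stride 2
    (1, 1, "valid"),  # conv 1x1
    (3, 1, "valid"),  # conv 3x3
    (3, 2, "valid"),  # max pool 3x3, stride 2
]

size = 299
sizes = []
for kernel, stride, padding in stem_layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [149, 147, 147, 73, 73, 71, 35]
```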

![](./Sources/inc_a.png)

In [6]:
def inc_block_a(x):    
    branch1x1 = conv2d_bn(x, 64, 1, 1)  # 64 filters of 1*1

    branch5x5 = conv2d_bn(x, 48, 1, 1)  #48 filters of 1*1
    branch5x5 = conv2d_bn(branch5x5, 64, 5, 5)

    branch3x3dbl = conv2d_bn(x, 64, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)

    branch_pool = keras.layers.AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 32, 1, 1)
    x = keras.layers.concatenate([branch1x1, branch5x5, branch3x3dbl, branch_pool], axis=channel_axis)
    return x

![](./Sources/reduct_a.png)

In [7]:
def reduction_block_a(x):  
    branch3x3 = conv2d_bn(x, 384, 3, 3, strides=(2, 2), padding='valid')

    branch3x3dbl = conv2d_bn(x, 64, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3, strides=(2, 2), padding='valid')

    branch_pool = keras.layers.MaxPooling2D((3, 3), strides=(2, 2))(x)
    x = keras.layers.concatenate([branch3x3, branch3x3dbl, branch_pool],axis=channel_axis)
    return x

![](./Sources/inc_b.png)

In [8]:
# 17 x 17 x 768
def inc_block_b(x):
    branch1x1 = conv2d_bn(x, 192, 1, 1)

    branch7x7 = conv2d_bn(x, 128, 1, 1)
    branch7x7 = conv2d_bn(branch7x7, 128, 1, 7)
    branch7x7 = conv2d_bn(branch7x7, 192, 7, 1)

    branch7x7dbl = conv2d_bn(x, 128, 1, 1)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 7, 1)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 1, 7)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 7, 1)
    branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7)

    branch_pool = keras.layers.AveragePooling2D((3, 3), strides=(1, 1),padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
    x = keras.layers.concatenate([branch1x1, branch7x7, branch7x7dbl, branch_pool], axis=channel_axis)
    return x
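
The block above replaces a full 7x7 convolution with a 1x7 convolution followed by a 7x1 convolution. A quick weight count suggests why this is cheaper. The helper `conv_weights` is a hypothetical name used only for this illustration, biases are ignored, and 128 input and output channels are assumed for both variants so that the comparison is like for like.

```python
def conv_weights(kernel_rows, kernel_cols, in_channels, out_channels):
    """Number of kernel weights in a convolution layer (biases ignored)."""
    return kernel_rows * kernel_cols * in_channels * out_channels


CHANNELS = 128  # assumed channel count for both input and output

# A single full 7x7 convolution
full_7x7 = conv_weights(7, 7, CHANNELS, CHANNELS)

# The factorized version: 1x7 followed by 7x1
factorized = conv_weights(1, 7, CHANNELS, CHANNELS) + conv_weights(7, 1, CHANNELS, CHANNELS)

print(f"full 7x7:   {full_7x7:,} weights")
print(f"1x7 + 7x1:  {factorized:,} weights")
print(f"ratio:      {full_7x7 / factorized}")
```

The ratio is 49 / 14 = 3.5 regardless of the channel count, as long as both variants use the same number of channels.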

![](./Sources/reduct_b.png)

In [9]:
# mixed 8: 8 x 8 x 1280
def reduction_block_b(x): 
    branch3x3 = conv2d_bn(x, 192, 1, 1)
    branch3x3 = conv2d_bn(branch3x3, 320, 3, 3,strides=(2, 2), padding='valid')

    branch7x7x3 = conv2d_bn(x, 192, 1, 1)
    branch7x7x3 = conv2d_bn(branch7x7x3, 192, 1, 7)
    branch7x7x3 = conv2d_bn(branch7x7x3, 192, 7, 1)
    branch7x7x3 = conv2d_bn( branch7x7x3, 192, 3, 3, strides=(2, 2), padding='valid')

    branch_pool = keras.layers.MaxPooling2D((3, 3), strides=(2, 2))(x)
    x = keras.layers.concatenate([branch3x3, branch7x7x3, branch_pool], axis=channel_axis)
    return x

![](./Sources/inc_c.png)

In [10]:
def inc_block_c(x):
    branch1x1 = conv2d_bn(x, 320, 1, 1)

    branch3x3 = conv2d_bn(x, 384, 1, 1)
    branch3x3_1 = conv2d_bn(branch3x3, 384, 1, 3)
    branch3x3_2 = conv2d_bn(branch3x3, 384, 3, 1)
    branch3x3 = keras.layers.concatenate([branch3x3_1, branch3x3_2], axis=channel_axis)

    branch3x3dbl = conv2d_bn(x, 448, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 384, 3, 3)
    branch3x3dbl_1 = conv2d_bn(branch3x3dbl, 384, 1, 3)
    branch3x3dbl_2 = conv2d_bn(branch3x3dbl, 384, 3, 1)
    branch3x3dbl = keras.layers.concatenate([branch3x3dbl_1, branch3x3dbl_2], axis=channel_axis)

    branch_pool = keras.layers.AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 192, 1, 1)
    x = keras.layers.concatenate([branch1x1, branch3x3, branch3x3dbl, branch_pool], axis=channel_axis)
    return x

# Stacking blocks and creating models

In [13]:
def Inception_V3(img_input,classes):
    x=inc_stem(img_input)

    x=inc_block_a(x) #35, 35, 256
    x=inc_block_a(x) #35, 35, 256
    x=inc_block_a(x) #35, 35, 256

    x=reduction_block_a(x) #17, 17, 736

    x=inc_block_b(x) #17, 17, 768
    x=inc_block_b(x) #17, 17, 768
    x=inc_block_b(x) #17, 17, 768
    x=inc_block_b(x) #17, 17, 768

    x=reduction_block_b(x) #shape=(None, 8, 8, 1280)

    x=inc_block_c(x) # shape=(None, 8, 8, 2048) 
    x=inc_block_c(x) # shape=(None, 8, 8, 2048) 

    x = keras.layers.GlobalAveragePooling2D(name='avg_pool')(x) # shape=(None, 2048)

    x = keras.layers.Dense(classes, activation='softmax', name='predictions')(x) # shape=(None, classes)


    # Create model.
    inputs = img_input
    outputs = x
    model =  keras.Model(inputs,outputs, name='inception_v3')
    return model
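
The channel counts in the comments of `Inception_V3` follow from the fact that concatenation adds up the channels of each branch. The bookkeeping sketch below reproduces them without building the network. The `channels_*` function names are hypothetical helpers for this illustration, and the filter counts are copied from the block definitions in this notebook.

```python
def channels_inc_a(c_in):
    return 64 + 64 + 96 + 32            # 1x1, 5x5, double 3x3, pooled 1x1


def channels_reduction_a(c_in):
    return 384 + 96 + c_in              # the max-pool branch keeps c_in channels


def channels_inc_b(c_in):
    return 192 + 192 + 192 + 192        # 1x1, 7x7, double 7x7, pooled 1x1


def channels_reduction_b(c_in):
    return 320 + 192 + c_in             # the max-pool branch keeps c_in channels


def channels_inc_c(c_in):
    return 320 + (384 + 384) + (384 + 384) + 192


# Same block order as in Inception_V3
pipeline = (
    [channels_inc_a] * 3
    + [channels_reduction_a]
    + [channels_inc_b] * 4
    + [channels_reduction_b]
    + [channels_inc_c] * 2
)

channels = 192  # number of channels leaving inc_stem
trace = []
for block in pipeline:
    channels = block(channels)
    trace.append(channels)

print(trace)  # [256, 256, 256, 736, 768, 768, 768, 768, 1280, 2048, 2048]
```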


In [15]:
img_input = keras.Input(shape=(299, 299, 3))
model = Inception_V3(img_input, classes=1000)
model.summary()

Model: "inception_v3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            [(None, 299, 299, 3) 0                                            
__________________________________________________________________________________________________
conv2d_94 (Conv2D)              (None, 149, 149, 32) 896         input_2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_94 (BatchNo (None, 149, 149, 32) 96          conv2d_94[0][0]                  
__________________________________________________________________________________________________
activation_94 (Activation)      (None, 149, 149, 32) 0           batch_normalization_94[0][0]     
_______________________________________________________________________________________

In [12]:
# plot model architecture
# from tensorflow.keras.utils import plot_model
# plot_model(model, show_shapes=True, to_file='inception_model_3.png')

Thanks to JakobKallestad https://github.com/JakobKallestad/InceptionV3-on-plankton-images