# Transfer Learning: Retraining InceptionV3

- InceptionV3 is a convolutional neural network architecture. Here it is used for transfer learning, starting from weights pretrained on ImageNet. Transfer learning means using a pre-trained neural network as the starting point for a new task instead of training a new model from scratch.

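- Inception Modules: InceptionV3 is built from Inception modules (first introduced in GoogLeNet, the original Inception network), which capture features at multiple scales by applying filters of different sizes in parallel within the same block and concatenating their outputs.
- Efficient Use of Parameters: 1x1 convolutions reduce the number of channels before the more computationally expensive larger convolutions are applied.
- Factorized Convolutions: Large convolutions are factorized into smaller ones, for example a 5x5 convolution into two stacked 3x3 convolutions, or an nxn convolution into a 1xn followed by an nx1 convolution, which reduces computation and parameters. (Depthwise separable convolutions belong to the related Xception architecture, not to InceptionV3.)
- Auxiliary Classifier: The original architecture attaches an auxiliary classifier during training. It was first motivated as a way to improve gradient flow to earlier layers, but the InceptionV3 authors found it acts mainly as a regularizer. The Keras `InceptionV3` application does not include it.
- Global Average Pooling: Uses global average pooling before a single classification layer instead of a stack of fully connected layers, which reduces the number of parameters and the risk of overfitting.
- Good for Small Datasets: Because its ImageNet-pretrained features can be reused, InceptionV3 tends to be a strong starting point when the target dataset has limited training data.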
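The parameter savings from 1x1 reductions and factorized convolutions can be checked with a little arithmetic. This sketch counts convolution weights (biases ignored); the channel sizes are illustrative values chosen for the example, not taken from a specific InceptionV3 layer.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in a 2D convolution layer (bias terms ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels

# Illustrative channel sizes (assumed for this example)
c_in, c_reduce, c_out = 192, 16, 32

# 5x5 convolution applied directly to all c_in input channels
direct_5x5 = conv_weights(5, 5, c_in, c_out)
# 1x1 reduction to c_reduce channels, then the same 5x5 convolution
bottleneck_5x5 = conv_weights(1, 1, c_in, c_reduce) + conv_weights(5, 5, c_reduce, c_out)

# Factorization: two stacked 3x3 convolutions see the same 5x5 receptive field
single_5x5 = conv_weights(5, 5, c_out, c_out)
stacked_3x3 = 2 * conv_weights(3, 3, c_out, c_out)

print(direct_5x5, bottleneck_5x5)  # 153600 15872
print(single_5x5, stacked_3x3)     # 25600 18432
```

With these sizes the 1x1 bottleneck cuts the weights by almost 10x, and two stacked 3x3 convolutions need 28% fewer weights than one 5x5 convolution.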

In [1]:
import os
from glob import glob

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical  # convert to one-hot encoding

from keras.preprocessing.image import ImageDataGenerator
from keras import layers
from keras import Model
from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, EarlyStopping


%matplotlib inline
import matplotlib.pyplot as plt

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Load in the Dataset

In [3]:
X_train = np.load("/content/drive/MyDrive/Project_DBDA/x_train.npy")

In [4]:
y_train = np.load("/content/drive/MyDrive/Project_DBDA/y_train.npy")

In [5]:
X_val = np.load("/content/drive/MyDrive/Project_DBDA/x_validate.npy")

In [6]:
y_val = np.load("/content/drive/MyDrive/Project_DBDA/y_validate.npy")

In [7]:
X_train.shape, X_val.shape

((6759, 100, 125, 3), (752, 100, 125, 3))

In [8]:
y_train.shape, y_val.shape

((6759, 7), (752, 7))
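The labels are stored as one-hot vectors with 7 columns. A quick NumPy check confirms the encoding and counts samples per class, which shows whether the classes are imbalanced. `describe_one_hot` is a hypothetical helper written for this sketch, and the example runs on toy labels rather than the project data.

```python
import numpy as np

def describe_one_hot(y):
    """Return (is_one_hot, per-class counts, integer labels) for a label matrix."""
    y = np.asarray(y)
    # One-hot means every entry is 0 or 1 and each row contains exactly one 1
    is_one_hot = bool(np.isin(y, (0, 1)).all() and (y.sum(axis=1) == 1).all())
    counts = y.sum(axis=0).astype(int)   # number of samples in each class (column)
    labels = y.argmax(axis=1)            # integer class index for each sample
    return is_one_hot, counts, labels

# Usage in the notebook: describe_one_hot(y_train), describe_one_hot(y_val)
example = np.eye(3)[[0, 1, 1, 2]]        # toy labels, not the project data
print(describe_one_hot(example))
```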

## Load in Pretrained Inception Model

In [9]:
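# include_top=False drops the ImageNet classification head so that a new head can be added.
# Note: the arrays loaded above are 100x125x3; with include_top=False any input of at least
# 75x75 is accepted, so input_shape=X_train.shape[1:] would match the data directly.
pre_trained_model = InceptionV3(input_shape=(224, 224, 3), include_top=False, weights="imagenet")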
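The ImageNet weights expect inputs scaled the way `preprocess_input` from `keras.applications.inception_v3` scales them: pixel values in [0, 255] mapped to [-1, 1]. That function is imported at the top but never called. The notebook does not show the value range of the `.npy` arrays, so whether this step is needed is an assumption to check first (for example with `X_train.min(), X_train.max()`). The NumPy sketch below mirrors that scaling; `scale_to_unit_range` is a hypothetical helper name.

```python
import numpy as np

def scale_to_unit_range(x):
    """Map pixel values from [0, 255] to [-1, 1], the scaling InceptionV3's ImageNet weights expect."""
    x = np.asarray(x, dtype=np.float32)
    return x / 127.5 - 1.0

# Usage sketch (assumes X_train holds raw 0-255 pixel values):
# X_train_scaled = scale_to_unit_range(X_train)
print(scale_to_unit_range([0, 127.5, 255]))
```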

In [10]:
# Freeze the pretrained base so that only the new head is trained at first
for layer in pre_trained_model.layers:
    print(layer.name)
    layer.trainable = False

print(len(pre_trained_model.layers))

input_1
conv2d
batch_normalization
activation
conv2d_1
batch_normalization_1
activation_1
conv2d_2
batch_normalization_2
activation_2
max_pooling2d
conv2d_3
batch_normalization_3
activation_3
conv2d_4
batch_normalization_4
activation_4
max_pooling2d_1
conv2d_8
batch_normalization_8
activation_8
conv2d_6
conv2d_9
batch_normalization_6
batch_normalization_9
activation_6
activation_9
average_pooling2d
conv2d_5
conv2d_7
conv2d_10
conv2d_11
batch_normalization_5
batch_normalization_7
batch_normalization_10
batch_normalization_11
activation_5
activation_7
activation_10
activation_11
mixed0
conv2d_15
batch_normalization_15
activation_15
conv2d_13
conv2d_16
batch_normalization_13
batch_normalization_16
activation_13
activation_16
average_pooling2d_1
conv2d_12
conv2d_14
conv2d_17
conv2d_18
batch_normalization_12
batch_normalization_14
batch_normalization_17
batch_normalization_18
activation_12
activation_14
activation_17
activation_18
mixed1
conv2d_22
batch_normalization_22
activation_22
conv2d

In [11]:
last_layer = pre_trained_model.get_layer('mixed10')
print('last layer output shape:', last_layer.output_shape)
last_output = last_layer.output

last layer output shape: (None, 5, 5, 2048)
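The 5x5 grid follows from the stride-2 and 'valid'-padded layers in the base. The sketch below recomputes it, assuming the stem and grid-reduction settings of the Keras `inception_v3` implementation (3x3 kernels with 'valid' padding); 'same'-padded stride-1 layers keep the spatial size and are skipped. It also shows what the 100x125 arrays loaded earlier produce.

```python
def valid_out(size, kernel, stride):
    """Output length of one spatial dimension for a 'valid'-padded conv or pooling layer."""
    return (size - kernel) // stride + 1

# Size-changing layers of the base, in order, as (kernel, stride) -- assumed from the Keras source
REDUCING_LAYERS = [
    (3, 2),  # first 3x3 convolution, stride 2
    (3, 1),  # second 3x3 convolution
    (3, 2),  # first 3x3 max pooling
    (3, 1),  # 3x3 convolution with 192 filters
    (3, 2),  # second 3x3 max pooling
    (3, 2),  # grid reduction inside mixed3
    (3, 2),  # grid reduction inside mixed8
]

def mixed10_grid(height, width):
    """Spatial size of the mixed10 feature map for a given input height and width."""
    for kernel, stride in REDUCING_LAYERS:
        height = valid_out(height, kernel, stride)
        width = valid_out(width, kernel, stride)
    return height, width

print(mixed10_grid(224, 224))  # (5, 5), matching the output shape above
print(mixed10_grid(100, 125))  # (1, 2)
print(mixed10_grid(75, 75))    # (1, 1)
```

For 100x125 inputs the `mixed10` map is only 1x2, so the global pooling in the head runs over just two positions.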


## Define the Model

- A GlobalMaxPooling2D layer is applied to `last_output` (the output of `mixed10`). Global max pooling reduces each feature map to a single value, its maximum, so every channel contributes one number and the strongest response of each feature is kept.

- A fully connected (Dense) layer with 512 units and ReLU activation follows. ReLU adds the non-linearity that lets the head learn more than a linear combination of the pooled features.

- A Dropout layer with a rate of 0.5 is added to reduce overfitting. During training, dropout randomly sets half of its input units to 0 at each update, which keeps the network from relying too heavily on any particular feature; it is inactive at inference time.

- Finally, a Dense layer with softmax activation produces the classification probabilities. It has one unit per class (7 in this task), and softmax normalizes the outputs into a probability distribution that sums to 1.
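The head described above can be written out as a NumPy forward pass to make the shapes explicit. This is a sketch with random weights standing in for the learned ones; dropout uses the inverted-dropout form (surviving units are scaled by 1/(1 - rate)) and, as in Keras, is applied only when `training=True`.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def head_forward(features, w1, b1, w2, b2, dropout_rate=0.5, training=False):
    """features: (batch, H, W, C) feature maps, e.g. the mixed10 output."""
    pooled = features.max(axis=(1, 2))             # GlobalMaxPooling2D -> (batch, C)
    hidden = np.maximum(pooled @ w1 + b1, 0.0)     # Dense(512, relu)
    if training:
        # Inverted dropout: zero units with probability dropout_rate, rescale the rest
        keep = rng.random(hidden.shape) >= dropout_rate
        hidden = hidden * keep / (1.0 - dropout_rate)
    return softmax(hidden @ w2 + b2)               # Dense(7, softmax)

# Toy shapes: 2 images, a 5x5x2048 feature map, 512 hidden units, 7 classes
features = rng.standard_normal((2, 5, 5, 2048))
w1 = rng.standard_normal((2048, 512)) * 0.01
b1 = np.zeros(512)
w2 = rng.standard_normal((512, 7)) * 0.01
b2 = np.zeros(7)

probs = head_forward(features, w1, b1, w2, b2)
print(probs.shape, probs.sum(axis=1))
```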

In [12]:
# Reduce each feature map to its maximum value (global max pooling) -> 1D vector per image
x = layers.GlobalMaxPooling2D()(last_output)
# Add a fully connected layer with 512 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
# Add dropout with a rate of 0.5
x = layers.Dropout(0.5)(x)
# Add a final softmax layer with one unit per class (7 classes)
x = layers.Dense(7, activation='softmax')(x)

# Configure and compile the model

model = Model(pre_trained_model.input, x)
# `learning_rate` replaces the deprecated `lr` argument; epsilon and decay stay at their defaults
optimizer = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, amsgrad=True)
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])



In [13]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv2d (Conv2D)                (None, 111, 111, 32  864         ['input_1[0][0]']                
                                )                                                                 
                                                                                                  
 batch_normalization (BatchNorm  (None, 111, 111, 32  96         ['conv2d[0][0]']                 
 alization)                     )                                                             

## Training


In [14]:
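# Augment training images on the fly
train_datagen = ImageDataGenerator(rotation_range=60, width_shift_range=0.2, height_shift_range=0.2,
                                   shear_range=0.2, zoom_range=0.2, fill_mode='nearest')

# fit() only stores statistics for featurewise/ZCA options, none of which are enabled here,
# so these calls do not change the generated batches
train_datagen.fit(X_train)

val_datagen = ImageDataGenerator()  # no augmentation for validation data
val_datagen.fit(X_val)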
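For reference, this is what `fit()` would compute if `featurewise_center` and `featurewise_std_normalization` were enabled: a per-channel mean and standard deviation over every image and pixel of the training set, later used to standardize each batch. The NumPy sketch below uses toy data, and the small `eps` guarding the division is an assumption of this sketch. The statistics should come from the training set only and be reused for validation and test data.

```python
import numpy as np

def featurewise_stats(x):
    """Per-channel mean and std over all images and pixel positions of (N, H, W, C) data."""
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2))

def featurewise_standardize(x, mean, std, eps=1e-6):
    """Center and scale each channel; eps guards against division by zero."""
    return (x - mean) / (std + eps)

# Toy data standing in for X_train (random values, not the project data)
rng = np.random.default_rng(0)
x = rng.uniform(0, 255, size=(8, 10, 12, 3))
mean, std = featurewise_stats(x)
z = featurewise_standardize(x, mean, std)
print(z.mean(axis=(0, 1, 2)), z.std(axis=(0, 1, 2)))  # close to 0 and 1 for each channel
```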

In [17]:
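# Halve the learning rate when training accuracy has not improved for 4 epochs
# (this monitors training accuracy; 'val_accuracy' would respond to overfitting instead)
learning_rate_reduction = ReduceLROnPlateau(monitor='accuracy',
                                            patience=4,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)

# Stop when the training loss has not improved for 5 epochs and restore the best weights
# ('val_loss' is the more common choice for detecting overfitting)
early_stopping = EarlyStopping(monitor='loss', patience=5, restore_best_weights=True)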
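The early-stopping rule can be simulated in plain Python to see when training would stop for a given loss curve and which epoch's weights `restore_best_weights=True` would bring back. This is a simplified re-implementation of the documented behaviour (patience counted in epochs without improvement), not Keras's source; `simulate_early_stopping` is a hypothetical helper and the loss values are made up.

```python
def simulate_early_stopping(losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a loss curve; stop_epoch is None if training never stops.

    Epochs are 0-indexed. An epoch counts as an improvement only if the loss
    drops below the best value so far by more than min_delta.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Made-up loss curve: improves until epoch 2, then stalls
print(simulate_early_stopping([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.6]))  # (7, 2)
```

The late improvement at the last epoch is never seen, because training has already stopped at epoch 7.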


In [18]:
batch_size = 10
epochs = 3
# Use the same batch size for validation so that validation_steps matches the validation set
history = model.fit(train_datagen.flow(X_train, y_train, batch_size=batch_size),
                    epochs=epochs,
                    validation_data=val_datagen.flow(X_val, y_val, batch_size=batch_size, shuffle=False),
                    verbose=1,
                    steps_per_epoch=X_train.shape[0] // batch_size,
                    validation_steps=X_val.shape[0] // batch_size,
                    callbacks=[early_stopping, learning_rate_reduction])

Epoch 1/3
Epoch 2/3
Epoch 3/3
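The step counts passed to `fit` come from floor division of the sample counts shown earlier (6759 training and 752 validation images) by `batch_size`. The arithmetic below spells them out. It also shows why the validation iterator should use the same `batch_size`, since `ImageDataGenerator.flow` otherwise defaults to batches of 32.

```python
import math

n_train, n_val, batch_size = 6759, 752, 10   # sizes printed earlier in the notebook

# Floor division, as used for steps_per_epoch and validation_steps
steps_per_epoch = n_train // batch_size        # 675 batches = 6750 samples
validation_steps = n_val // batch_size         # 75 batches = 750 samples

# Ceiling division would also include the final, smaller batch
steps_all_train = math.ceil(n_train / batch_size)   # 676
steps_all_val = math.ceil(n_val / batch_size)       # 76

# With flow()'s default batch size of 32, 75 validation steps would ask for
# 75 * 32 = 2400 samples from a set of only 752 images
requested_at_default = validation_steps * 32
print(steps_per_epoch, validation_steps, steps_all_train, steps_all_val, requested_at_default)
```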


### Retraining

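Now we unfreeze and retrain the whole model (fine-tuning). The goal is only to adjust the pretrained weights slightly for our dataset without changing them too much, which is why the learning rate stays small (0.0001). The model must be recompiled after the layers are unfrozen, because changes to `trainable` only take effect after `compile()` is called again.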
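This notebook unfreezes every layer of the base. A more conservative variant, not what this notebook does, is to unfreeze only the top blocks and keep the early, generic feature extractors frozen. `unfreeze_from` is a hypothetical helper sketching that idea; it works on any list of objects with `name` and `trainable` attributes, so the usage below runs on toy stand-ins rather than real Keras layers.

```python
from types import SimpleNamespace

def unfreeze_from(layers, first_trainable_name):
    """Freeze every layer before `first_trainable_name`; unfreeze it and all later layers.

    Works on any sequence of objects with `.name` and `.trainable` attributes,
    such as `pre_trained_model.layers` in Keras. Returns the index of the first trainable layer.
    """
    names = [layer.name for layer in layers]
    start = names.index(first_trainable_name)   # raises ValueError if the name is absent
    for i, layer in enumerate(layers):
        layer.trainable = i >= start
    return start

# Toy stand-ins for Keras layers (hypothetical names, not the real InceptionV3 layer list)
toy = [SimpleNamespace(name=n, trainable=False) for n in ["conv2d", "mixed8", "mixed9", "mixed10"]]
unfreeze_from(toy, "mixed9")
print([(layer.name, layer.trainable) for layer in toy])
```

With Keras this would be called as `unfreeze_from(pre_trained_model.layers, 'mixed9')`, followed by `model.compile(...)` so the change takes effect. In the Keras layer order, the layers after the `mixed9` concatenation should form the final Inception block, but printing the layer names first (as in the earlier loop) is the safest way to pick the cut point.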

In [19]:
for layer in pre_trained_model.layers:
    layer.trainable = True

In [20]:
optimizer = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['acc'])

In [21]:
train_datagen = ImageDataGenerator(
    featurewise_center=False,             # set input mean to 0 over the dataset
    samplewise_center=False,              # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,   # divide each input by its std
    zca_whitening=False,                  # apply ZCA whitening
    rotation_range=60,                    # randomly rotate images by up to 60 degrees either way
    zoom_range=0.2,                       # randomly zoom images
    shear_range=0.2,                      # apply shear transformations
    width_shift_range=0.2,                # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.2,               # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,                 # randomly flip images left-right
    vertical_flip=True,                   # randomly flip images up-down
    fill_mode='nearest')                  # fill pixels created by the transformations with the nearest value

train_datagen.fit(X_train)

val_datagen = ImageDataGenerator()
val_datagen.fit(X_val)
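The `horizontal_flip` and `vertical_flip` options flip images left-right and up-down at random. A minimal NumPy sketch of the idea, assuming each flip is applied independently with probability 0.5; `random_flip` is a hypothetical helper. Flips only preserve the label when image orientation carries no class information.

```python
import numpy as np

def random_flip(image, rng, horizontal=True, vertical=True):
    """Flip an (H, W, C) image left-right and/or up-down, each with probability 0.5."""
    if horizontal and rng.random() < 0.5:
        image = image[:, ::-1, :]   # mirror the columns (left-right)
    if vertical and rng.random() < 0.5:
        image = image[::-1, :, :]   # mirror the rows (up-down)
    return image

rng = np.random.default_rng(42)
img = np.arange(2 * 3 * 1).reshape(2, 3, 1)   # tiny toy image
out = random_flip(img, rng)
print(out[..., 0])
```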

In [22]:
learning_rate_reduction = ReduceLROnPlateau(monitor='acc', patience=3, verbose=1, factor=0.5,
                                            min_lr=0.000001, cooldown=2)

# using early stopping

early_stopping = EarlyStopping(monitor='loss', patience=5, restore_best_weights=True)
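With `patience=3`, `factor=0.5` and `cooldown=2`, the learning rate halves after three epochs without improvement in training accuracy, and the following two epochs are not counted towards patience. A simplified plain-Python simulation of that rule on a made-up accuracy curve; it is not Keras's source, `simulate_reduce_lr` is a hypothetical helper, and `min_delta=1e-4` is assumed to match the callback's default.

```python
def simulate_reduce_lr(metric_history, lr=1e-4, factor=0.5, patience=3,
                       cooldown=2, min_lr=1e-6, min_delta=1e-4):
    """Learning rate in effect after each epoch, for a metric where higher is better."""
    best = float("-inf")
    wait = 0
    cooldown_counter = 0
    lrs = []
    for value in metric_history:
        if cooldown_counter > 0:          # still cooling down after a reduction
            cooldown_counter -= 1
            wait = 0
        if value > best + min_delta:      # meaningful improvement: reset patience
            best = value
            wait = 0
        elif cooldown_counter == 0:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)   # reduce, but never below min_lr
                cooldown_counter = cooldown
                wait = 0
        lrs.append(lr)
    return lrs

# Made-up accuracy curve: improves twice, then plateaus
acc = [0.50, 0.60, 0.60, 0.60, 0.60, 0.60, 0.60, 0.60, 0.60, 0.60]
print(simulate_reduce_lr(acc))
```

The rate drops at epoch 4 (0-indexed), then the two cooldown epochs delay the next plateau count, so the second halving comes at epoch 8 rather than epoch 7.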

In [23]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv2d (Conv2D)                (None, 111, 111, 32  864         ['input_1[0][0]']                
                                )                                                                 
                                                                                                  
 batch_normalization (BatchNorm  (None, 111, 111, 32  96         ['conv2d[0][0]']                 
 alization)                     )                                                             

In [24]:
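batch_size = 10
epochs = 50
# model.fit accepts generators directly (fit_generator is deprecated);
# validation uses the same batch size so that validation_steps matches the validation set
history = model.fit(train_datagen.flow(X_train, y_train, batch_size=batch_size),
                    epochs=epochs,
                    validation_data=val_datagen.flow(X_val, y_val, batch_size=batch_size, shuffle=False),
                    verbose=1,
                    steps_per_epoch=X_train.shape[0] // batch_size,
                    validation_steps=X_val.shape[0] // batch_size,
                    callbacks=[early_stopping, learning_rate_reduction])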

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 21: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 36: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05.
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 48: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05.
Epoch 49/50
Epoch 50/50
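The `matplotlib` import at the top is otherwise unused; a natural use is plotting the training curves stored in `history.history`. A sketch, assuming the history contains `loss`, `val_loss`, `acc` and `val_acc` keys (the metric was compiled as `'acc'`); `plot_history` is a hypothetical helper.

```python
import matplotlib.pyplot as plt

def plot_history(hist, metric="acc"):
    """Plot training vs. validation loss and `metric` from a Keras History.history dict."""
    epochs = range(1, len(hist["loss"]) + 1)
    fig, (ax_loss, ax_metric) = plt.subplots(1, 2, figsize=(12, 4))
    for ax, key in ((ax_loss, "loss"), (ax_metric, metric)):
        ax.plot(epochs, hist[key], label="train")
        ax.plot(epochs, hist["val_" + key], label="validation")
        ax.set_xlabel("epoch")
        ax.set_ylabel(key)
        ax.legend()
    return fig

# Usage in the notebook:
# plot_history(history.history, metric="acc"); plt.show()
```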


In [25]:
loss_val, acc_val = model.evaluate(X_val, y_val, verbose=1)
print("Validation: accuracy = %f  ;  loss_v = %f" % (acc_val, loss_val))

Validation: accuracy = 0.836436  ;  loss_v = 0.609765


In [26]:
import pickle
import joblib

# Folder on Google Drive where the artifacts are saved
folder_path = '/content/drive/MyDrive/Project_DBDA/'

# Save the model with joblib
joblib.dump(model, os.path.join(folder_path, 'CNN_inceptionV3_model.joblib'))

# Save the training history (the dict of per-epoch metrics), not its file name
history_filepath = os.path.join(folder_path, 'CNN_inceptionV3_history.pkl')
with open(history_filepath, 'wb') as f:
    pickle.dump(history.history, f)

# Also save the model to the working directory with pickle
filename = 'CNN_Inception_model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(model, f)

## Testing

Let's load the held-out test set and evaluate the model on it.

In [27]:
X_test = np.load("/content/drive/MyDrive/Project_DBDA/x_test.npy")

In [28]:
y_test = np.load("/content/drive/MyDrive/Project_DBDA/y_test.npy")
# y_test = to_categorical(y_test)

In [29]:
loss_test, acc_test = model.evaluate(X_test, y_test, verbose=1)
print("Test: accuracy = %f  ;  loss = %f" % (acc_test, loss_test))

Test: accuracy = 0.827476  ;  loss = 0.590223
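Overall accuracy (0.827 on the test set) can hide weak performance on rare classes. A per-class view helps: the sketch below builds a confusion matrix from one-hot labels and predicted probabilities with NumPy and derives per-class recall. The helper names are hypothetical, the toy arrays are illustrative, and the commented line shows how it would apply to this notebook.

```python
import numpy as np

def confusion_matrix_from_probs(y_true_one_hot, y_prob, n_classes):
    """Rows: true class, columns: predicted class (argmax of the probabilities)."""
    true_idx = np.asarray(y_true_one_hot).argmax(axis=1)
    pred_idx = np.asarray(y_prob).argmax(axis=1)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true_idx, pred_idx), 1)   # count each (true, predicted) pair
    return cm

def per_class_recall(cm):
    """Fraction of each true class predicted correctly (NaN for classes without samples)."""
    support = cm.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.diag(cm) / support

# Usage in the notebook: cm = confusion_matrix_from_probs(y_test, model.predict(X_test), 7)
y_true = np.eye(3)[[0, 0, 1, 2]]   # toy one-hot labels
y_prob = np.eye(3)[[0, 1, 1, 2]]   # toy "predictions"
cm = confusion_matrix_from_probs(y_true, y_prob, 3)
print(cm)
print(per_class_recall(cm))
```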


In [30]:
model.save("InceptionV3.h5")