# Hands-on Activity 3.2 - Transfer Learning

Technological Institute of the Philippines | Quezon City - Computer Engineering
--- | ---
Course Code: | CPE 313
Course Title: | Advanced Machine Learning and Deep Learning
2nd Semester | AY 2023-2024
<hr> | <hr>
**ACTIVITY NO.** | **Hands-on Activity 3.2 Transfer Learning**
**Name** | Mendoza, Paulo
<hr> | <hr>
**Section** | CPE32S8
**Date Performed**: | March 5, 2024
**Date Submitted**: | March 5, 2024
**Instructor**: | Engr. Roman M. Richard

<hr>

#### Objective(s):

This activity aims to introduce how to apply transfer learning.

#### Intended Learning Outcomes (ILOs):
* Demonstrate how to build and train a neural network
* Demonstrate how to apply transfer learning in a neural network


#### Resources:
* Jupyter Notebook
* MNIST dataset (loaded through `keras.datasets.mnist`)

#### Procedures
Load the necessary libraries

In [None]:
from __future__ import print_function

import datetime
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

Set the parameters

In [None]:
now = datetime.datetime.now
batch_size = 128
num_classes = 5
epochs = 5
img_rows, img_cols = 28, 28
filters = 32
pool_size = 2
kernel_size = 3

Set how the input data is loaded

In [None]:

if K.image_data_format() == 'channels_first':
    input_shape = (1, img_rows, img_cols)
else:
    input_shape = (img_rows, img_cols, 1)

* Write a function to include all the training steps.
* Use the model, training set, test set and number of classes as function parameters


In [None]:
def train_model(model, train, test, num_classes):
    x_train = train[0].reshape((train[0].shape[0],) + input_shape)
    x_test = test[0].reshape((test[0].shape[0],) + input_shape)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(train[1], num_classes)
    y_test = keras.utils.to_categorical(test[1], num_classes)

    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])

    t = now()
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=1,
              validation_data=(x_test, y_test))
    print('Training time: %s' % (now() - t))

    score = model.evaluate(x_test, y_test, verbose=0)
    print('Test score:', score[0])
    print('Test accuracy:', score[1])
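
The preprocessing inside `train_model` can be tried on its own. The sketch below reproduces the reshape, scaling, and one-hot steps with NumPy only, on a made-up batch of four 28x28 images. The `preprocess` helper and its `data_format` argument are hypothetical names used for illustration, and the one-hot step is only a stand-in for `keras.utils.to_categorical`, assuming the labels are integers in the range 0 to `num_classes - 1`.

```python
import numpy as np

def preprocess(images, labels, num_classes, data_format="channels_last"):
    """Hypothetical helper mirroring the preprocessing in train_model."""
    rows, cols = images.shape[1], images.shape[2]
    # Add the single grayscale channel where the backend expects it.
    if data_format == "channels_first":
        shape = (1, rows, cols)
    else:
        shape = (rows, cols, 1)
    x = images.reshape((images.shape[0],) + shape).astype("float32")
    x /= 255  # scale pixel values from [0, 255] to [0, 1]
    # One-hot encode integer labels (stand-in for keras.utils.to_categorical).
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

# Toy data, not MNIST: four random 28x28 "images" with labels in 0..4.
rng = np.random.default_rng(0)
toy_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
toy_labels = np.array([0, 4, 2, 2])

x, y = preprocess(toy_images, toy_labels, num_classes=5)
print(x.shape, y.shape)
```

Passing `data_format="channels_first"` would instead place the channel axis right after the batch axis.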

Shuffle and split the data between train and test sets

In [None]:

(x_train, y_train), (x_test, y_test) = mnist.load_data()



Create two datasets
* one with digits below 5
* one with 5 and above

In [None]:
x_train_lt5 = x_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
x_test_lt5 = x_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]

x_train_gte5 = x_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5
x_test_gte5 = x_test[y_test >= 5]
y_test_gte5 = y_test[y_test >= 5] - 5
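
The `- 5` shift matters because the one-hot encoding in `train_model` uses `num_classes = 5`, so labels have to fall in the range 0 to 4. A minimal NumPy sketch on a made-up label array (not the MNIST labels) shows the boolean masks and the shift; `labels` and `samples` are hypothetical stand-ins for `y_train` and `x_train`.

```python
import numpy as np

# Toy stand-ins for y_train and x_train; the real arrays come from mnist.load_data().
labels = np.array([0, 7, 3, 9, 5, 4, 8, 1])
samples = np.arange(len(labels))  # pretend each sample is just its index

lt5_mask = labels < 5
gte5_mask = labels >= 5

samples_lt5, labels_lt5 = samples[lt5_mask], labels[lt5_mask]
# Shift 5..9 down to 0..4 so both tasks share the same five output units.
samples_gte5, labels_gte5 = samples[gte5_mask], labels[gte5_mask] - 5

print(labels_lt5)   # [0 3 4 1]
print(labels_gte5)  # [2 4 0 3]
```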

* Define the feature layers that will be used for transfer learning
* Freeze these layers during the fine-tuning process

In [None]:


feature_layers = [
    Conv2D(filters, kernel_size,
           padding='valid',
           input_shape=input_shape),
    Activation('relu'),
    Conv2D(filters, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.25),
    Flatten(),
]

Define the classification layers

In [None]:


classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(num_classes),
    Activation('softmax')
]

Create a model by combining the feature layers and classification layers

In [None]:

model = Sequential(feature_layers + classification_layers)

Check the model summary

In [None]:

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 activation (Activation)     (None, 26, 26, 32)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_1 (Activation)   (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2  (None, 12, 12, 32)        0         
 D)                                                              
                                                                 
 dropout (Dropout)           (None, 12, 12, 32)        0         
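
The shapes and parameter counts in the summary follow from the layer definitions. The sketch below recomputes them in plain Python, assuming `'valid'` padding, stride-1 convolutions, and a non-overlapping 2x2 pool, which match the Keras defaults used in the layer lists. The helper names are hypothetical, and the values for the flatten and dense layers are calculated here rather than taken from the printed summary.

```python
def conv_out(size, kernel):
    """Output width/height of a stride-1 'valid' convolution."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units

size, channels, filters, kernel, pool = 28, 1, 32, 3, 2

size1 = conv_out(size, kernel)    # 26
size2 = conv_out(size1, kernel)   # 24
pooled = size2 // pool            # 12
flat = pooled * pooled * filters  # 4608

p_conv1 = conv_params(kernel, channels, filters)  # 320
p_conv2 = conv_params(kernel, filters, filters)   # 9248
p_dense1 = dense_params(flat, 128)                # 589952
p_dense2 = dense_params(128, 5)                   # 645

print(size1, size2, pooled, flat)
print(p_conv1, p_conv2, p_dense1, p_dense2)
```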
                                                        

Train the model on the digits 5, 6, 7, 8, 9

In [None]:
train_model(model,
            (x_train_gte5, y_train_gte5),
            (x_test_gte5, y_test_gte5), num_classes)

x_train shape: (29404, 28, 28, 1)
29404 train samples
4861 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:04:23.098578
Test score: 1.455298662185669
Test accuracy: 0.6665295362472534


Freeze only the feature layers

In [None]:

for l in feature_layers:
    l.trainable = False
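
Freezing changes which parameters the optimizer may update, not how many the model has. As a rough bookkeeping sketch in plain Python, the counts can be split into frozen and trainable groups. The `layers` list and its layer names are hypothetical, and the per-layer counts are derived from the layer definitions, with the dense-layer counts computed rather than read from the printed summary.

```python
# (illustrative name, parameter count, belongs to feature_layers?)
layers = [
    ("conv2d", 3 * 3 * 1 * 32 + 32, True),
    ("conv2d_1", 3 * 3 * 32 * 32 + 32, True),
    ("dense", 12 * 12 * 32 * 128 + 128, False),
    ("dense_1", 128 * 5 + 5, False),
]

# Parameters in frozen layers are excluded from weight updates.
frozen = sum(n for _, n, is_feature in layers if is_feature)
trainable = sum(n for _, n, is_feature in layers if not is_feature)

print("non-trainable:", frozen)
print("trainable:", trainable)
```

In Keras, a change to `trainable` takes effect once the model is compiled again, which `train_model` does on every call.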

Check the summary again and compare the parameter counts with the previous summary

In [None]:
model.summary()
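# Non-trainable params is now 9568 instead of 0 because we froze the feature layers
# (320 + 9248 parameters from the two Conv2D layers).
# Reusing these weights is reasonable because both halves of MNIST share the same input format
# (28x28 grayscale digits), so the same preprocessing applies to both.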

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 activation (Activation)     (None, 26, 26, 32)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_1 (Activation)   (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2  (None, 12, 12, 32)        0         
 D)                                                              
                                                                 
 dropout (Dropout)           (None, 12, 12, 32)        0         
                                                        

Train the model again using the digits 0 to 4

In [None]:
train_model(model,
            (x_train_lt5, y_train_lt5),
            (x_test_lt5, y_test_lt5), num_classes)
# Test accuracy is about 72% on digits 0-4, compared with about 67% on digits 5-9 in the first run,
# so freezing the feature layers did not hurt here: the features learned on 5-9 still work for 0-4.
# The higher accuracy is plausibly because training continued from where the first run left off
# instead of starting from scratch.
# Accuracy is still low after only 5 epochs. Rerunning this cell several times kept improving it,
# but repeating this too often risks overfitting.
# Training time also dropped from about 4 minutes to about 2 minutes because only the
# classification layers were updated, not the feature layers.

x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:02:23.083420
Test score: 1.3767049312591553
Test accuracy: 0.7168709635734558


#### Supplementary Activity
Now write code to reverse this training process. That is, you will train on the digits 0-4, and then fine-tune only the last layers on the digits 5-9.

In [22]:
# The feature layers are re-created here so that they can be trained on the 0-4 data;
# the feature layer objects from the procedure are still frozen.
feature_layers = [
    Conv2D(filters, kernel_size,
           padding='valid',
           input_shape=input_shape),
    Activation('relu'),
    Conv2D(filters, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.25),
    Flatten(),
]

In [23]:
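# New model with fresh feature layers and the same architecture as in the procedure.
# Note: classification_layers is the same list of layer objects used in the procedure,
# so these layers keep their already-trained weights rather than starting from scratch.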
model_2 = Sequential(feature_layers + classification_layers)

In [24]:
# Summary of the new model: the feature layers are not frozen, so Non-trainable params is 0
model_2.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_6 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 activation_8 (Activation)   (None, 26, 26, 32)        0         
                                                                 
 conv2d_7 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_9 (Activation)   (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d_3 (MaxPoolin  (None, 12, 12, 32)        0         
 g2D)                                                            
                                                                 
 dropout_4 (Dropout)         (None, 12, 12, 32)        0         
                                                      

In [25]:
# Train on the digits 0-4 first
train_model(model_2,
            (x_train_lt5, y_train_lt5),
            (x_test_lt5, y_test_lt5), num_classes)

x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:05:23.406826
Test score: 1.1250667572021484
Test accuracy: 0.8678731322288513


In [26]:
# Freeze only the feature_layers
for l in feature_layers:
    l.trainable = False

In [27]:
# Now fine-tune on the 5-9 data while reusing the feature-layer weights learned on 0-4
train_model(model_2,
            (x_train_gte5, y_train_gte5),
            (x_test_gte5, y_test_gte5), num_classes)

# Again it took less time to train because the frozen feature layers were not updated.
# This time the accuracy went down, from about 87% on 0-4 to about 74% on 5-9, the opposite of the procedure.
# That is not unusual: how well the features transfer depends on the task.

x_train shape: (29404, 28, 28, 1)
29404 train samples
4861 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time: 0:02:23.203627
Test score: 1.2127958536148071
Test accuracy: 0.7412055134773254
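
The time saved by freezing can be read off the logged training times. The sketch below uses only the standard library, with the four durations copied from the training outputs in this notebook; the `parse` helper and the `runs` dictionary are hypothetical names.

```python
from datetime import timedelta

def parse(duration):
    """Parse an 'H:MM:SS.ffffff' string as printed by train_model."""
    hours, minutes, seconds = duration.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))

# (full training, training with frozen feature layers)
runs = {
    "procedure": ("0:04:23.098578", "0:02:23.083420"),
    "supplementary": ("0:05:23.406826", "0:02:23.203627"),
}

for name, (full, frozen) in runs.items():
    ratio = parse(full) / parse(frozen)
    print(f"{name}: full training took {ratio:.2f}x as long as the frozen run")
```

The two digit subsets differ slightly in size (30596 versus 29404 training samples), so these ratios are only a rough comparison.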


#### Conclusion

Transfer learning is very useful. It saves time because only the classification layers need to be trained, not the feature layers: in this activity, training with frozen feature layers took roughly half as long as full training. It works here because both tasks use the same kind of input (28x28 grayscale digit images that are preprocessed the same way), so the features learned on one set of digits can be reused for the other.

# Google Colab Link:

https://colab.research.google.com/drive/1ssv6zD8H3k0N4ffVETfpn6IQkR5GEWFj?usp=sharing