# TRANSFER LEARNING FOR IMPROVED MODEL WITH LESS DATA

_**Applies transfer learning by reusing pretrained layers, to test whether it improves model performance when less training data is available.**_

**High-level Steps:**

1. Instead of taking an already trained model (containing pretrained layers), this experiment first trains a model that is then treated as the pretrained model. That model is trained on data for 8 of the 10 classes in the Fashion MNIST dataset. This is a dataset of 60,000 28x28 grayscale images of 10 fashion categories, along with a test set of 10,000 images; it can be used as a drop-in replacement for MNIST.

2. Then a binary classification model (the target model) is trained from scratch on the data for the remaining two classes of the same dataset, and its prediction performance is observed.

3. Then the same classification model is built by applying transfer learning, using pretrained layers from the model created in the first step.

4. Lastly, the prediction performance of the transfer-learning model is compared with that of the model created in the second step. The analysis also considers whether transfer learning speeds up training and makes training possible with less data.

In [1]:
# Imports required packages

import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


## Loading and Preparing Data

In [2]:
# Loads fashion mnist dataset
fashion = tf.keras.datasets.fashion_mnist.load_data()

In [3]:
# Each training and test example is assigned to one of the following labels.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal",
               "Shirt", "Sneaker", "Bag", "Ankle boot"]

In [4]:
# The dataset is a pair of (data, labels) tuples, so it is unpacked as follows
(X_train_full, y_train_full), (X_test, y_test) = fashion

In [5]:
# Checks the shape of the datasets
print("Train dataset shape:", X_train_full.shape)
print("Test dataset shape:", X_test.shape)

Train dataset shape: (60000, 28, 28)
Test dataset shape: (10000, 28, 28)


In [6]:
# Checks the data type of the data
X_train_full.dtype

dtype('uint8')

In [8]:
# Prints the labels to refer to the class indexes
y_train_full

array([9, 0, 0, ..., 3, 0, 5], dtype=uint8)

Since the target binary classification model is expected to distinguish "Pullover" from "T-shirt/top", the data for these two classes is separated out, leaving the data for the remaining 8 classes to train the model that is later treated as pretrained.

In [9]:
# Finds the indexes of the target classes "Pullover" and "T-shirt/top", as
# the dataset labels contain class indexes instead of class names

class_0_index = class_names.index("Pullover")
class_1_index = class_names.index("T-shirt/top")

print("Index of class_0:", class_0_index)
print("Index of class_1:", class_1_index)

Index of class_0: 2
Index of class_1: 0


In [10]:
# Flags the training labels that belong to either of the two classes
class_0_1_index_flag = [True if (x==class_0_index or x==class_1_index) else False for x in y_train_full]

# Shows a few flags
print(class_0_1_index_flag[:10])

[False, True, True, False, True, True, False, True, False, False]


In [11]:
# Separates the dataset containing data for the two classes
X_train_2_classes_full = X_train_full[class_0_1_index_flag]

# Checks the shape of the dataset
X_train_2_classes_full.shape

(12000, 28, 28)
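
The list-comprehension flags above can also be expressed as a vectorized membership test. The following is a minimal sketch on toy labels (the array `y` and the name `target_classes` are illustrative assumptions, not part of the notebook), showing that `np.isin` yields the same flags and that the complement selects the remaining classes.

```python
import numpy as np

# Toy labels standing in for y_train_full (assumption: uint8 class indexes)
y = np.array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5], dtype=np.uint8)
target_classes = [2, 0]  # indexes of "Pullover" and "T-shirt/top"

# Vectorized membership test; returns a boolean array with the same shape as y
mask = np.isin(y, target_classes)

# Flags built the same way as in the notebook, for comparison
flags = [True if (x == 2 or x == 0) else False for x in y]

print(mask.tolist() == flags)                # both approaches agree
print(int(mask.sum()), int((~mask).sum()))   # sizes of the two partitions
```

With a boolean array, `~mask` plays the role of the flipped flag list, so `X[mask]` and `X[~mask]` would partition the data without a second comprehension.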

In [12]:
# Flips bool values (True to False and False to True) to get the flags against
# other classes in the training label
class_0_1_index_flag_flipped = [not flag for flag in class_0_1_index_flag]

# Shows a few flags
print(class_0_1_index_flag_flipped[:10])

[True, False, False, True, False, False, True, False, True, True]


In [13]:
# Separates the dataset containing data for the remaining 8 classes
X_train_8_classes_full = X_train_full[class_0_1_index_flag_flipped]

# Checks the shape of the dataset
X_train_8_classes_full.shape

(48000, 28, 28)

In [14]:
# The first-dimension sizes of the two datasets should add up to the total number of training instances
X_train_2_classes_full.shape[0] + X_train_8_classes_full.shape[0]

60000

In [15]:
# Similarly, separates targets to contain only respective labels
y_train_2_classes_full = y_train_full[class_0_1_index_flag]
y_train_8_classes_full = y_train_full[class_0_1_index_flag_flipped]

# Checks the shape of the targets
print(y_train_2_classes_full.shape)
print(y_train_8_classes_full.shape)

(12000,)
(48000,)


## Modeling

### Training Model to be Considered as Pretrained

**Preprocesses Datasets**

In [16]:
# Separates validation dataset
X_train_8_classes, X_val_8_classes, y_train_8_classes, y_val_8_classes = train_test_split(
    X_train_8_classes_full, y_train_8_classes_full, test_size=5000, random_state=42, stratify=y_train_8_classes_full)

In [17]:
# Prints the shape of the separated datasets both containing 8 classes
print(X_train_8_classes.shape)
print(X_val_8_classes.shape)

(43000, 28, 28)
(5000, 28, 28)


In [18]:
# Then standardizes the datasets by first calculating mean and standard deviation, and then
# by subtracting the mean from the data and then dividing the data by standard deviation

pixel_means_8_classes = X_train_8_classes.mean(axis=0, keepdims=True)
pixel_stds_8_classes = X_train_8_classes.std(axis=0, keepdims=True)

X_train_8_classes_scaled = (X_train_8_classes - pixel_means_8_classes) / pixel_stds_8_classes
X_val_8_classes_scaled = (X_val_8_classes - pixel_means_8_classes) / pixel_stds_8_classes
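
The key point of the standardization above is that the per-pixel mean and standard deviation come from the training split only and are then reused for the validation split. The sketch below illustrates this on random toy arrays; the small constant `eps` is an assumption added here to guard against pixels with zero variance and is not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for image arrays of shape (n_samples, 28, 28)
X_train = rng.integers(0, 256, size=(100, 28, 28)).astype(np.float32)
X_val = rng.integers(0, 256, size=(20, 28, 28)).astype(np.float32)
X_train[:, 0, 0] = 0.0  # a constant pixel, so its standard deviation is exactly 0

# Statistics are computed per pixel, over the training samples only
means = X_train.mean(axis=0, keepdims=True)  # shape (1, 28, 28)
stds = X_train.std(axis=0, keepdims=True)    # shape (1, 28, 28)

eps = 1e-7  # hypothetical guard against division by zero (not in the notebook)
X_train_scaled = (X_train - means) / (stds + eps)
X_val_scaled = (X_val - means) / (stds + eps)  # reuses the training statistics

print(means.shape, X_train_scaled.shape, X_val_scaled.shape)
```

Broadcasting over the leading axis is what lets the `(1, 28, 28)` statistics apply to every image in either split.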


In [19]:
# As the remaining labels are [1, 3, 4, 5, 6, 7, 8, 9], encodes them as consecutive integers 0 through 7
label_encoder_8_classes = LabelEncoder()
y_train_8_classes_encoded = label_encoder_8_classes.fit_transform(y_train_8_classes)
y_val_8_classes_encoded = label_encoder_8_classes.transform(y_val_8_classes)

In [20]:
# Initializes the following dense neural network with an arbitrary number of layers and compiles it

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dense(100, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(8, activation="softmax")
])

model.compile(
    loss="sparse_categorical_crossentropy", 
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
    metrics=["accuracy"])




In [21]:
# Checks for model summary [optional]
model.summary()

In [22]:
# Fits the model for a fixed number of iterations (epochs), with validation data
# to observe the learning performance during training
model_history = model.fit(X_train_8_classes_scaled, y_train_8_classes_encoded, epochs=20, 
                          validation_data=(X_val_8_classes_scaled, y_val_8_classes_encoded))

Epoch 1/20
1344/1344 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.6154 - loss: 1.1450 - val_accuracy: 0.8316 - val_loss: 0.4857
Epoch 2/20
1344/1344 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8378 - loss: 0.4601 - val_accuracy: 0.8618 - val_loss: 0.3884
Epoch 3/20
1344/1344 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.8625 - loss: 0.3840 - val_accuracy: 0.8770 - val_loss: 0.3446
Epoch 4/20
1344/1344 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8807 - loss: 0.3361 - val_accuracy: 0.8848 - val_loss: 0.3196
Epoch 5/20
1344/1344 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8893 - loss: 0.3130 - val_accuracy: 0.8922 - val_loss: 0.3003
Epoch 6/20
1344/1344 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.8978 - loss: 0.2907 - val_accuracy: 0.8982 - val_loss: 0.2883
Epoch 7/20
...

In [23]:
# Saves the trained model to disk to be used as the pretrained model later.
# NOTE: The folder "models" must exist for the model file to be saved into it.
model.save("./models/my_fashion_mnist_model.keras")

### Training Target Model from Scratch

**Preprocesses Datasets**

In [24]:
# Separates validation dataset from the data containing 2 classes
X_train_2_classes, X_val_2_classes, y_train_2_classes, y_val_2_classes = train_test_split(
    X_train_2_classes_full, y_train_2_classes_full, test_size=3000, random_state=42, stratify=y_train_2_classes_full)

In [25]:
# Prints the shape of the separated datasets containing both classes
print(X_train_2_classes.shape)
print(X_val_2_classes.shape)

(9000, 28, 28)
(3000, 28, 28)


In [26]:
# Then standardizes the datasets by first calculating mean and standard deviation, and then
# by subtracting the mean from the data and then dividing the data by standard deviation

pixel_means_2_classes = X_train_2_classes.mean(axis=0, keepdims=True)
pixel_stds_2_classes = X_train_2_classes.std(axis=0, keepdims=True)

X_train_2_classes_scaled = (X_train_2_classes - pixel_means_2_classes) / pixel_stds_2_classes
X_val_2_classes_scaled = (X_val_2_classes - pixel_means_2_classes) / pixel_stds_2_classes

In [27]:
# As the labels here are [0, 2], encodes them as 0 and 1 for binary classification

label_encoder_2_classes = LabelEncoder()
y_train_2_classes_encoded = label_encoder_2_classes.fit_transform(y_train_2_classes)
y_val_2_classes_encoded = label_encoder_2_classes.transform(y_val_2_classes)
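
Which class ends up as the positive class depends on how the labels are encoded. `LabelEncoder` assigns codes in sorted order of the labels it was fitted on, so the mapping can be reproduced with plain NumPy. The sketch below uses toy labels (the arrays are illustrative assumptions) to show that label 0 ("T-shirt/top") maps to 0 and label 2 ("Pullover") maps to 1.

```python
import numpy as np

# Toy training labels containing only the two target classes (0 and 2)
y_train = np.array([2, 0, 0, 2, 2, 0], dtype=np.uint8)

# Sorted unique labels play the role of the encoder's learned classes;
# the inverse indexes are the encoded labels
classes, y_encoded = np.unique(y_train, return_inverse=True)

# Encoding another split with the classes learned from the training labels
y_test = np.array([0, 2, 2], dtype=np.uint8)
y_test_encoded = np.searchsorted(classes, y_test)

print(classes.tolist())         # [0, 2]
print(y_encoded.tolist())       # [1, 0, 0, 1, 1, 0]
print(y_test_encoded.tolist())  # [0, 1, 1]
```

Under this mapping, a sigmoid output close to 1 would correspond to "Pullover".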

In [28]:
# Clears the name counters and 
# sets the global random seed for operations that rely on a random seed
tf.keras.backend.clear_session()
tf.random.set_seed(42)

# Initializes the following dense neural network with an arbitrary number of layers and compiles it
model_from_scratch = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dense(100, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model_from_scratch.compile(
    loss="binary_crossentropy", 
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001), 
    metrics=["accuracy"])



In [29]:
# Checks for model summary [optional]
model_from_scratch.summary()

In [30]:
# Fits the model for a fixed number of iterations (epochs) on all the training data available for the 2 classes,
# with validation data to observe the learning performance during training
model_from_scratch_history = model_from_scratch.fit(X_train_2_classes_scaled, y_train_2_classes_encoded, epochs=20, 
                                                    validation_data=(X_val_2_classes_scaled, y_val_2_classes_encoded))

Epoch 1/20
282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.6550 - loss: 0.6034 - val_accuracy: 0.9407 - val_loss: 0.2608
Epoch 2/20
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9398 - loss: 0.2398 - val_accuracy: 0.9570 - val_loss: 0.1776
Epoch 3/20
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9509 - loss: 0.1691 - val_accuracy: 0.9577 - val_loss: 0.1502
Epoch 4/20
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9552 - loss: 0.1415 - val_accuracy: 0.9607 - val_loss: 0.1382
Epoch 5/20
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9585 - loss: 0.1271 - val_accuracy: 0.9617 - val_loss: 0.1319
Epoch 6/20
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9608 - loss: 0.1181 - val_accuracy: 0.9620 - val_loss: 0.1279
Epoch 7/20
...

In [31]:
# Flags the test labels that belong to either of the two classes
class_0_1_index_flag = [True if (x==class_0_index or x==class_1_index) else False for x in y_test]

In [32]:
# Separates the data for the two classes from the whole test set, which also contains the other classes
X_test_2_classes = X_test[class_0_1_index_flag]

# Checks the shape of the dataset
X_test_2_classes.shape

(2000, 28, 28)

In [33]:
# Similarly, separates targets to contain only respective labels
y_test_2_classes = y_test[class_0_1_index_flag]

In [34]:
# Normalizes the test labels for the 2 classes using the already fitted encoder
y_test_2_classes_encoded = label_encoder_2_classes.transform(y_test_2_classes)

In [35]:
# Prints the encoded classes for reference
y_test_2_classes_encoded

array([1, 1, 0, ..., 0, 0, 1])

In [36]:
# Standardizes the test set by subtracting the mean from the data and then dividing the data by standard deviation
X_test_2_classes_scaled = (X_test_2_classes - pixel_means_2_classes) / pixel_stds_2_classes

In [37]:
# Evaluates the test prediction performance on the model built from scratch
model_from_scratch.evaluate(X_test_2_classes_scaled, y_test_2_classes_encoded)

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 742us/step - accuracy: 0.9610 - loss: 0.1085


[0.11099033057689667, 0.9624999761581421]

The above model, built from scratch on **9000** training instances [12000 total − 3000 validation instances] covering the 2 classes, reached **96.25%** accuracy on the test set. The experiment continues by applying transfer learning, reusing pretrained layers from the first model (trained on the other 8 classes), to check whether a new model trained on less data can match the accuracy of the model built from scratch.

### Transfer Learning

In [88]:
# Loads the saved model created to be used as pretrained model
model_using_pretrained_layers = tf.keras.models.load_model("./models/my_fashion_mnist_model.keras")

In [89]:
# Checks the model summary especially to refer to the last layer i.e. the output layer
model_using_pretrained_layers.summary()

In [90]:
# Removes the last layer (containing 8 outputs) to make room for a task-specific binary output layer
model_using_pretrained_layers.pop()

# And then adds a binary output layer
model_using_pretrained_layers.add(tf.keras.layers.Dense(1, activation="sigmoid", name="output"))

# Then verifies the change by printing the model summary
model_using_pretrained_layers.summary()

**Fine-tuning already pretrained model**

In [None]:
# Uses only 70% of the 2-class training set to check the effectiveness of transfer learning
# NOTE: no random_state is set here, so the selected subset differs between runs

X_train_2_classes_scaled_subset, _, y_train_2_classes_encoded_subset, _ = train_test_split(
    X_train_2_classes_scaled, y_train_2_classes_encoded, train_size=0.70, stratify=y_train_2_classes_encoded)
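
As a quick check on the subset size, the sample counts can be related to the number of steps Keras reports per epoch in the training logs. The sketch below assumes the Keras default batch size of 32 (`fit()` is called without `batch_size` in this notebook) and that 70% of the 9000 training instances yields 6300; the helper name `steps_per_epoch` is hypothetical.

```python
import math

BATCH_SIZE = 32  # Keras default when fit() is called without batch_size


def steps_per_epoch(n_samples, batch_size=BATCH_SIZE):
    """Number of batches needed to cover n_samples once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


n_8_classes = 48000 - 5000           # 8-class training set after the validation split
n_2_classes = 12000 - 3000           # 2-class training set after the validation split
n_subset = n_2_classes * 70 // 100   # 70% subset used for transfer learning

print(n_subset)                       # 6300
print(steps_per_epoch(n_8_classes))   # 1344
print(steps_per_epoch(n_2_classes))   # 282
print(steps_per_epoch(n_subset))      # 197
```

These values line up with the `1344/1344`, `282/282`, and `197/197` step counters shown in the training output.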

In [92]:
# First marks all the pretrained layers (every layer except the newly added output layer) as non-trainable
for layer in model_using_pretrained_layers.layers[:-1]:
    layer.trainable = False
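
To get a feel for how little is trained while the hidden layers are frozen, the parameter counts can be worked out by hand from the layer sizes used in this notebook (784 inputs, three hidden layers of 100 units, one sigmoid output). This is a back-of-the-envelope sketch; the helper `dense_params` is hypothetical and simply counts weights plus biases of a fully connected layer.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


# Hidden layers reused from the pretrained model: 784 -> 100 -> 100 -> 100
hidden = [dense_params(28 * 28, 100), dense_params(100, 100), dense_params(100, 100)]

# Newly added binary output layer: 100 -> 1
output = dense_params(100, 1)

total = sum(hidden) + output
trainable_when_frozen = output  # only the new output layer is updated at this stage

print(hidden)                 # [78500, 10100, 10100]
print(output)                 # 101
print(total)                  # 98801
print(trainable_when_frozen)  # 101
```

So the first training phase adjusts only 101 of the 98,801 parameters, which is why it can only go so far before the pretrained layers are unfrozen.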

In [None]:
# Then trains just the output layer, at the same learning rate (0.001) and for fewer epochs

tf.keras.backend.clear_session()
tf.random.set_seed(42)

# metrics=["accuracy"] is included so that the accuracy values shown in the training log are reported
model_using_pretrained_layers.compile(
    loss="binary_crossentropy",
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
    metrics=["accuracy"])

model_using_pretrained_layers_history = model_using_pretrained_layers.fit(
    X_train_2_classes_scaled_subset, y_train_2_classes_encoded_subset, epochs=10, 
    validation_data=(X_val_2_classes_scaled, y_val_2_classes_encoded))


Epoch 1/10
197/197 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.2468 - loss: 1.1377 - val_accuracy: 0.6753 - val_loss: 0.6206
Epoch 2/10
197/197 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7705 - loss: 0.5438 - val_accuracy: 0.8813 - val_loss: 0.4155
Epoch 3/10
197/197 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8891 - loss: 0.3884 - val_accuracy: 0.9040 - val_loss: 0.3423
Epoch 4/10
197/197 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9046 - loss: 0.3290 - val_accuracy: 0.9137 - val_loss: 0.3056
Epoch 5/10
197/197 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9108 - loss: 0.2979 - val_accuracy: 0.9167 - val_loss: 0.2833
Epoch 6/10
197/197 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9156 - loss: 0.2786 - val_accuracy: 0.9213 - val_loss: 0.2682
Epoch 7/10
...

In [None]:
# Now makes all the pretrained layers trainable and continues training at the same
# learning rate (0.001) for a larger number of epochs

for layer in model_using_pretrained_layers.layers[:-1]:
    layer.trainable = True

# Recompiles the model due to change of trainability of the layers
model_using_pretrained_layers.compile(
    loss="binary_crossentropy", 
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
    metrics=["accuracy"])

model_using_pretrained_layers_history = model_using_pretrained_layers.fit(
    X_train_2_classes_scaled_subset, y_train_2_classes_encoded_subset, epochs=50,
    validation_data=(X_val_2_classes_scaled, y_val_2_classes_encoded))

Epoch 1/50
197/197 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9317 - loss: 0.2202 - val_accuracy: 0.9400 - val_loss: 0.1802
Epoch 2/50
197/197 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9422 - loss: 0.1745 - val_accuracy: 0.9460 - val_loss: 0.1575
Epoch 3/50
197/197 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9504 - loss: 0.1519 - val_accuracy: 0.9517 - val_loss: 0.1453
Epoch 4/50
197/197 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9570 - loss: 0.1375 - val_accuracy: 0.9543 - val_loss: 0.1374
Epoch 5/50
197/197 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9583 - loss: 0.1270 - val_accuracy: 0.9553 - val_loss: 0.1320
Epoch 6/50
197/197 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9605 - loss: 0.1190 - val_accuracy: 0.9560 - val_loss: 0.1279
Epoch 7/50
...

In [95]:
# Evaluates the test prediction performance on the model built using pretrained layers
model_using_pretrained_layers.evaluate(X_test_2_classes_scaled, y_test_2_classes_encoded)

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 752us/step - accuracy: 0.9677 - loss: 0.0950


[0.10547270625829697, 0.9664999842643738]

Although this model, built on pretrained layers, was trained on only 70% of the available training set, it achieved **96.65%** test accuracy, compared to **96.25%** for the model built from scratch on the full training set. Relative to the from-scratch model, the error rate dropped by about **10.7%** [(96.65−96.25)÷(100−96.25)×100]. The gap corresponds to 8 of the 2,000 test images in a single run, so it is indicative rather than conclusive.
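
The comparison can be reproduced directly from the two `evaluate()` results shown above. The sketch below assumes one common convention, in which the reduction is measured relative to the baseline (from-scratch) error rate; the variable names are illustrative.

```python
# Test accuracies as returned by evaluate() in this notebook
acc_scratch = 0.9624999761581421
acc_transfer = 0.9664999842643738
n_test = 2000  # number of test images for the two classes

# Error rate is the complement of accuracy
err_scratch = 1 - acc_scratch
err_transfer = 1 - acc_transfer

# Relative reduction, measured against the baseline (from-scratch) error rate
relative_reduction = (err_scratch - err_transfer) / err_scratch

# The same gap expressed as a number of test images
correct_scratch = round(acc_scratch * n_test)
correct_transfer = round(acc_transfer * n_test)

print(f"{relative_reduction:.1%}")          # 10.7%
print(correct_transfer - correct_scratch)   # 8
```

Expressing the gap as a count of images makes it easier to judge how sensitive the result might be to the random subset and to the seed.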