# TRANSFER LEARNING

**_Experimenting with Transfer Learning for a classification task._**

In transfer learning, features learned on one problem are reused for a new, similar problem. It is usually applied when the new dataset is too small to train a full-scale model from scratch.

**The Experiment:**

- Loads the _MobileNetV2_ model, pretrained on the _ImageNet_ dataset, as the base model, keeping all layers except the top one, which performs classification specific to the ImageNet task.

- Freezes all layers in the base model so that training on the new task (classifying cats and dogs) does not destroy the parameters it has already learned.

- Adds a few new, trainable layers (pooling, dropout and a dense layer) on top of the frozen layers so that they learn from the new dataset.

- Prepares the new dataset in the form the base model expects.

- Trains the head (the newly added layers) on the new data for a few epochs.

- Unfreezes all layers in the base model and fine-tunes the entire model by retraining it on the new data with a very low learning rate, aiming for further improvement by incrementally adapting the pretrained features to the new data.

- Evaluates the model performance on the test data.

## Importing Packages

In [1]:
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import data as tf_data

import matplotlib.pyplot as plt



## Data Acquisition

In [2]:
# Fetches the "cats vs. dogs" dataset using TFDS
# Only 40% of the data is used for training, to show that transfer learning works with limited data
train_set, val_set, test_set = tfds.load(
    "cats_vs_dogs",
    split=["train[:40%]", "train[40%:50%]", "train[50%:60%]"],  # 40% for training, 10% for validation and 10% for test
    as_supervised=True,                                         # Includes labels
)

print("Sample Counts:")
print(f"Training: {train_set.cardinality()}, Validation: {val_set.cardinality()}, Test: {test_set.cardinality()}")

Sample Counts:
Training: 9305, Validation: 2326, Test: 2326
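These counts can be reproduced with a little arithmetic. The sketch below assumes the TFDS `cats_vs_dogs` `train` split holds 23,262 images (its size in the TFDS catalog) and that each percentage boundary is rounded to the nearest integer; the helper names are hypothetical. It also derives the 291 training steps and 73 test steps that appear in the training and evaluation logs, since a partial final batch of 32 still counts as a step.

```python
import math

TOTAL_TRAIN_IMAGES = 23_262  # assumed size of the TFDS cats_vs_dogs "train" split
BATCH_SIZE = 32


def percent_boundary(percent: int, total: int = TOTAL_TRAIN_IMAGES) -> int:
    """Index where a 'train[p%]' slice boundary falls, rounded to the nearest integer."""
    return round(total * percent / 100)


def slice_size(start_pct: int, end_pct: int) -> int:
    """Number of examples in a 'train[start%:end%]' slice."""
    return percent_boundary(end_pct) - percent_boundary(start_pct)


train_count = slice_size(0, 40)
val_count = slice_size(40, 50)
test_count = slice_size(50, 60)

# A partial final batch still counts as one step
train_steps = math.ceil(train_count / BATCH_SIZE)
test_steps = math.ceil(test_count / BATCH_SIZE)

print(train_count, val_count, test_count)  # 9305 2326 2326
print(train_steps, test_steps)             # 291 73
```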




## Data Analysis

## Data Preprocessing

Since image sizes vary and each pixel consists of three integer values between 0 and 255 (its RGB levels), all images are resized to the same 160x160 size, and pixel values are scaled to the range [-1, 1] as the pretrained model requires (this scaling happens inside the model via `preprocess_input`). In addition, random data augmentation is applied to the training images.
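The scaling to [-1, 1] is the linear map x / 127.5 - 1, which is what MobileNetV2's `preprocess_input` applies. The NumPy sketch below (the function name is hypothetical) shows where the endpoints and the midpoint of the pixel range land.

```python
import numpy as np


def scale_to_unit_range(pixels: np.ndarray) -> np.ndarray:
    """Maps pixel values from [0, 255] to [-1, 1] via x / 127.5 - 1."""
    return pixels.astype(np.float32) / 127.5 - 1.0


sample = np.array([0, 127.5, 255], dtype=np.float32)
print(scale_to_unit_range(sample))  # -> -1.0, 0.0, 1.0
```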

**Resizing images**

In [4]:
# Uses `tf.keras.layers.Resizing` as a function to resize each image to 160x160
resizer = tf.keras.layers.Resizing(160, 160)

# Resizes every image in each dataset with `map`, passing the label through unchanged
train_set = train_set.map(lambda x, y: (resizer(x), y))
val_set = val_set.map(lambda x, y: (resizer(x), y))
test_set = test_set.map(lambda x, y: (resizer(x), y))

In [5]:
# Confirms the resizing by checking the image shape of the first sample
next(iter(train_set))[0].shape

TensorShape([160, 160, 3])

**Augmenting Data**

Data augmentation artificially generates new samples by applying random but realistic transformations to the training images, such as random horizontal flips or small random rotations. This exposes the model to different aspects of the training data and slows down overfitting, which is especially useful when the dataset is small.
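To make the two transformations concrete: a horizontal flip reverses the width axis of a (height, width, channels) image, and `RandomRotation(0.1)` interprets 0.1 as a fraction of a full turn (2π), so each image is rotated by a random angle of up to ±36 degrees. The NumPy sketch below only illustrates these definitions; it is not the Keras implementation, and the function names are hypothetical.

```python
import numpy as np


def horizontal_flip(image: np.ndarray) -> np.ndarray:
    """Mirrors a (height, width, channels) image left-to-right."""
    return image[:, ::-1, :]


def max_rotation_degrees(factor: float) -> float:
    """Largest rotation for a symmetric RandomRotation factor, given as a fraction of 360 degrees."""
    return factor * 360.0


# A tiny 2x3 single-channel "image" whose columns are easy to track
image = np.arange(6).reshape(2, 3, 1)
flipped = horizontal_flip(image)
print(flipped[..., 0])            # rows become [2 1 0] and [5 4 3]
print(max_rotation_degrees(0.1))  # 36.0
```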

In [6]:
# Configures the sequence of augmentation operations (once again using layers as utility functions)
augmentation_layers = [
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
]

def data_augmentation(x):
    """
    Randomly flips and rotates the image
    """
    for layer in augmentation_layers:
        x = layer(x)
    return x

# Augments only the training images by passing `data_augmentation` to the train set's `map`
train_set = train_set.map(lambda x, y: (data_augmentation(x), y))


**Batching Data and Optimizing Loading Speed using Prefetching and Caching**

In [7]:
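batch_size = 32

# The training set is not cached: caching after the random augmentation above would
# replay the same augmented images every epoch. Prefetching overlaps data preparation with training.
train_set = train_set.batch(batch_size).prefetch(tf_data.AUTOTUNE)

# Validation and test data are deterministic, so they are cached first and then prefetched
val_set = val_set.batch(batch_size).cache().prefetch(tf_data.AUTOTUNE)
test_set = test_set.batch(batch_size).cache().prefetch(tf_data.AUTOTUNE)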
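Where `.cache()` sits relative to random augmentation matters: a cache stores the elements produced during the first pass and replays them on later epochs, so anything random upstream of it runs only once. The pure-Python simulation below contrasts the two orderings over three epochs; all names are hypothetical, and a random sign flip stands in for the Keras augmentation layers.

```python
import random


def run_epochs(samples, epochs, cache_after_augment, seed=0):
    """Simulates a pipeline that randomly augments samples, optionally caching the augmented result."""
    rng = random.Random(seed)
    cache = None
    seen = []
    for _ in range(epochs):
        if cache is not None:
            epoch = cache  # replays the cached (already augmented) epoch
        else:
            # Stand-in for random augmentation: flip each sample's sign at random
            epoch = [s * rng.choice((-1, 1)) for s in samples]
            if cache_after_augment:
                cache = epoch
        seen.append(epoch)
    return seen


samples = list(range(1, 65))
cached = run_epochs(samples, epochs=3, cache_after_augment=True)
uncached = run_epochs(samples, epochs=3, cache_after_augment=False)

print(all(epoch == cached[0] for epoch in cached))  # True: every epoch replays the same augmentations
print(len({tuple(epoch) for epoch in uncached}))    # number of distinct epochs; almost surely 3
```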

## Modeling

In [10]:
# Instantiates `tf.keras.applications.MobileNetV2` as the base model with
# `input_shape=(160, 160, 3)`, `include_top=False` to drop the ImageNet classification head,
# and `weights="imagenet"` to load the ImageNet-pretrained weights
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3),
    include_top=False,
    weights="imagenet",
)

# Freezes all layers in the base model
base_model.trainable = False

In [16]:
# Builds the target model on top of the base model

# Model input expecting 160x160 RGB images
inputs = tf.keras.layers.Input(shape=(160, 160, 3))

# Scales input pixels from [0, 255] to [-1, 1], as MobileNetV2 expects
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)

# Runs the base model in inference mode (`training=False`) so its BatchNormalization layers
# keep using their stored statistics, including after the base model is unfrozen for fine-tuning
x = base_model(x, training=False)

# Averages the base model's spatial feature maps (`base_model.output_shape[1:]`) into one vector per image
x = tf.keras.layers.GlobalAveragePooling2D()(x)

# Regularizes the network with 20% dropout
x = tf.keras.layers.Dropout(0.2)(x)

# A single output unit with no activation, so it produces a logit
outputs = tf.keras.layers.Dense(1)(x)

# Creates the target model from `inputs` and `outputs`
model = tf.keras.Model(inputs, outputs)
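For a 160x160 input, MobileNetV2 without its top downsamples by a factor of 32 (160 / 32 = 5) and returns a 5x5x1280 feature map per image. The NumPy sketch below traces that map through global average pooling and the single-unit dense layer, using random numbers as stand-ins for real features and hypothetical weights, and counts the head's trainable parameters (1,280 weights plus one bias). Dropout is left out because it is inactive at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the base model's output on a batch of 4 images: (batch, 5, 5, 1280)
features = rng.normal(size=(4, 5, 5, 1280)).astype(np.float32)

# Global average pooling: averages over the two spatial axes -> (batch, 1280)
pooled = features.mean(axis=(1, 2))

# Hypothetical Dense(1) parameters: one weight per feature plus one bias
kernel = rng.normal(scale=0.01, size=(1280, 1)).astype(np.float32)
bias = np.zeros((1,), dtype=np.float32)

# Without dropout the head is a linear map -> one logit per image
logits = pooled @ kernel + bias

head_params = kernel.size + bias.size
print(pooled.shape, logits.shape, head_params)  # (4, 1280) (4, 1) 1281
```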

In [17]:
model.summary(show_trainable=True)      # Shows the model summary with both trainable and non-trainable parameters

## Training the Model

### Training Head of the Model

In [18]:
# Compiles the model with the Adam optimizer, binary cross-entropy computed from logits,
# and binary accuracy as the metric
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.BinaryAccuracy()],
)

# Trains only the new head for 2 epochs, measuring performance on the validation set
model.fit(train_set, epochs=2, validation_data=val_set)

Epoch 1/2
291/291 ━━━━━━━━━━━━━━━━━━━━ 84s 282ms/step - binary_accuracy: 0.9413 - loss: 0.1381 - val_binary_accuracy: 0.9742 - val_loss: 0.0628
Epoch 2/2
291/291 ━━━━━━━━━━━━━━━━━━━━ 69s 239ms/step - binary_accuracy: 0.9684 - loss: 0.0825 - val_binary_accuracy: 0.9785 - val_loss: 0.0551


<keras.src.callbacks.history.History at 0x7265c4552510>
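Because the final `Dense(1)` layer has no activation, the model outputs a logit z, and `BinaryCrossentropy(from_logits=True)` applies the sigmoid internally. The pure-Python sketch below (function names hypothetical) writes out the numerically stable form of that loss, max(z, 0) - z·y + log(1 + e^(-|z|)), and checks it against the textbook definition. It also shows a subtlety worth checking: `BinaryAccuracy` compares the raw model output with its `threshold` (0.5 by default), so on logits that default corresponds to a probability of about 0.62 rather than 0.5; a threshold of 0.0 would match a 0.5 probability cut-off.

```python
import math


def sigmoid(z: float) -> float:
    """Converts a logit into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_logits(y: float, z: float) -> float:
    """Binary cross-entropy for label y in {0, 1} and logit z, in a numerically stable form."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def bce_naive(y: float, z: float) -> float:
    """The textbook definition, applied to the sigmoid probability."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


for y, z in [(1, 2.0), (0, 2.0), (1, -3.0)]:
    print(y, z, round(bce_from_logits(y, z), 6), round(bce_naive(y, z), 6))

# Probability implied by a logit threshold of 0.5 (BinaryAccuracy's default) vs. 0.0
print(round(sigmoid(0.5), 4), sigmoid(0.0))  # 0.6225 0.5
```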

In [19]:
# Evaluates the model performance post head training

post_head_train_perf = model.evaluate(test_set)

print(f"Test set performance after head training:\n \
      Loss: {post_head_train_perf[0]:.2f}, Accuracy: {post_head_train_perf[1] * 100:.2f}%")

73/73 ━━━━━━━━━━━━━━━━━━━━ 16s 217ms/step - binary_accuracy: 0.9785 - loss: 0.0606
Test set performance after head training:
       Loss: 0.06, Accuracy: 97.85%


### Fine-tuning the Model
Unfreezing the base model and training the entire model end-to-end with a low learning rate.
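A learning rate of 1e-5 keeps the pretrained weights close to where they started because, in Adam, each parameter's step is roughly the learning rate times a normalized gradient, so its size is on the order of the learning rate regardless of how large the gradient is. The sketch below computes the very first Adam update for a single parameter in the textbook (Kingma & Ba) form, assuming the defaults β1 = 0.9, β2 = 0.999 and ε = 1e-7 used by Keras; Keras arranges the bias correction slightly differently, but the scale of the step is the same. The function name is hypothetical.

```python
import math


def first_adam_step(grad: float, lr: float, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-7) -> float:
    """Size of Adam's first update for one parameter, starting from zero moment estimates."""
    m = (1 - beta1) * grad       # first-moment estimate after one step
    v = (1 - beta2) * grad ** 2  # second-moment estimate after one step
    m_hat = m / (1 - beta1)      # bias-corrected moments at step t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)


for grad in (0.01, 3.0, 250.0):
    head_step = first_adam_step(grad, lr=1e-3)      # Keras "adam" default learning rate (head training)
    finetune_step = first_adam_step(grad, lr=1e-5)  # learning rate used for fine-tuning
    print(f"grad={grad:>6}: step {head_step:.2e} vs {finetune_step:.2e}")
```

Whatever the gradient, the fine-tuning step is about 100 times smaller than the step taken while training the head.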

In [20]:
# Unfreezes (makes trainable) all layers of the base model
base_model.trainable = True


In [21]:
model.summary(show_trainable=True)

In [25]:
# Recompiles the model so the updated `trainable` setting takes effect, this time with
# a very low learning rate (1e-5) for Adam
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.BinaryAccuracy()],
)

# Fine-tunes the whole model for 5 epochs, measuring performance on the validation set
model.fit(train_set, epochs=5, validation_data=val_set)

Epoch 1/5
291/291 ━━━━━━━━━━━━━━━━━━━━ 342s 1s/step - binary_accuracy: 0.9171 - loss: 0.2254 - val_binary_accuracy: 0.9716 - val_loss: 0.0984
Epoch 2/5
291/291 ━━━━━━━━━━━━━━━━━━━━ 327s 1s/step - binary_accuracy: 0.9622 - loss: 0.0945 - val_binary_accuracy: 0.9759 - val_loss: 0.0733
Epoch 3/5
291/291 ━━━━━━━━━━━━━━━━━━━━ 326s 1s/step - binary_accuracy: 0.9788 - loss: 0.0585 - val_binary_accuracy: 0.9768 - val_loss: 0.0655
Epoch 4/5
291/291 ━━━━━━━━━━━━━━━━━━━━ 327s 1s/step - binary_accuracy: 0.9913 - loss: 0.0353 - val_binary_accuracy: 0.9776 - val_loss: 0.0604
Epoch 5/5
291/291 ━━━━━━━━━━━━━━━━━━━━ 328s 1s/step - binary_accuracy: 0.9959 - loss: 0.0216 - val_binary_accuracy: 0.9794 - val_loss: 0.0599


<keras.src.callbacks.history.History at 0x7265c42f0690>

In [26]:
# Evaluates the model performance post fine-tuning the full model

post_full_model_train_perf = model.evaluate(test_set)

print(f"Test set performance after full model training:\n \
      Loss: {post_full_model_train_perf[0]:.2f}, Accuracy: {post_full_model_train_perf[1] * 100:.2f}%")

73/73 ━━━━━━━━━━━━━━━━━━━━ 14s 192ms/step - binary_accuracy: 0.9815 - loss: 0.0617
Test set performance after full model training:
       Loss: 0.06, Accuracy: 98.15%
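The two test accuracies can be put in perspective with some arithmetic on the 2,326 test images. The sketch below recovers the approximate number of misclassified images implied by each reported (rounded) accuracy and adds a rough binomial standard error, sqrt(p(1 - p) / n). Treating test predictions as independent trials is a simplifying assumption, and the names are hypothetical.

```python
import math

TEST_IMAGES = 2326
reported_accuracy = {"after head training": 0.9785, "after fine-tuning": 0.9815}

for stage, acc in reported_accuracy.items():
    errors = round(TEST_IMAGES * (1 - acc))             # misclassified images implied by the rounded accuracy
    std_err = math.sqrt(acc * (1 - acc) / TEST_IMAGES)  # rough binomial standard error of the accuracy
    print(f"{stage}: ~{errors} errors, accuracy {acc:.2%} ± {std_err:.2%}")
```

Under these assumptions, fine-tuning leaves roughly seven fewer test images misclassified (about 43 instead of 50), a gain of about 0.3 percentage points, which is comparable to one standard error; the improvement is real on this split but modest.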


**Observations:**

- Did the model learn well from a limited amount of data? Explain in detail.

- Why was data augmentation used?

- List the general transfer-learning workflow followed in this experiment.