# Transfer Learning with Pretrained Models

Transfer learning is a powerful technique in machine learning where a model developed for a particular task is reused as the starting point for a model on a second task. This is especially useful when the second task has limited data. Pretrained models, which have been trained on large datasets, can be fine-tuned for specific applications, leading to improved performance and reduced training time.

## Pretraining Work

Let's start by importing the necessary libraries and specifying some default paths for our training and test data, saved models, and checkpoints.

In [1]:
import os

In [None]:
#Specify the paths to the data we want to use and where we want to save our models
BASE_PATH = r"C:\Users\JTWit\Documents\ECE 579\Datasets\Split GTZAN Dataset"

TEST_PATH = os.path.join(BASE_PATH,'test')
TRAIN_PATH = os.path.join(BASE_PATH,'train')

SAVE_PATH = os.path.join(r"C:\Users\JTWit\Documents\ECE 579","Transfer Learning Models")

#Make the save path for the neural network just in case it does not yet exist
os.makedirs(SAVE_PATH,exist_ok = True)

checkpoint_dir = os.path.join(r"C:\Users\JTWit\Documents\ECE 579",'Training Checkpoints')
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}.weights.h5")

# Create the directory if it doesn't exist
os.makedirs(checkpoint_dir, exist_ok=True)
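
The paths above are hard-coded for one Windows machine. As a minimal sketch of a more portable setup, the same folder layout can be built under any project root with `pathlib`. The helper name `build_paths` and the temporary directory used as the root are illustrative assumptions, not part of the original notebook. The last lines show how the `{epoch}` placeholder in the checkpoint name expands; Keras fills it in when saving, and `str.format` mimics that here.

```python
import tempfile
from pathlib import Path


def build_paths(project_root):
    """Return the notebook's directories under a given root (hypothetical helper)."""
    root = Path(project_root)
    paths = {
        "train": root / "Datasets" / "Split GTZAN Dataset" / "train",
        "test": root / "Datasets" / "Split GTZAN Dataset" / "test",
        "save": root / "Transfer Learning Models",
        "checkpoints": root / "Training Checkpoints",
    }
    # Only the output directories are created; the dataset folders must already exist.
    for key in ("save", "checkpoints"):
        paths[key].mkdir(parents=True, exist_ok=True)
    return paths


# A temporary directory stands in for the real project folder.
demo_root = tempfile.mkdtemp()
paths = build_paths(demo_root)

# Expand the "{epoch}" placeholder the way it would be filled in at save time.
checkpoint_template = str(paths["checkpoints"] / "ckpt_{epoch}.weights.h5")
example_checkpoint = checkpoint_template.format(epoch=3)
print(Path(example_checkpoint).name)  # ckpt_3.weights.h5
```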

### Building the Models

Next we will specify the required imports for transfer learning for a few select models. The models we will be using for this task are:

- VGG16
- ResNet50
- InceptionV3
- MobileNetV2
- DenseNet121
- EfficientNetB0

In [9]:
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16,ResNet50,InceptionV3,MobileNetV2,DenseNet121,EfficientNetB0
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, BatchNormalization, Dropout

We can walk through the process of loading these models with pretrained weights and printing their summaries to understand their architectures.

In [8]:
INPUT_SHAPE = (432, 288, 3)  # (height, width, channels)
CLASSES_COUNT = 10           # number of classes in the dataset

In [None]:
vgg16 = VGG16(weights='imagenet', include_top=False, input_shape=INPUT_SHAPE)

resnet50 = ResNet50(weights='imagenet', include_top=False, input_shape=INPUT_SHAPE)

inceptionv3 = InceptionV3(weights='imagenet', include_top=False, input_shape=INPUT_SHAPE)

mobilenetv2 = MobileNetV2(weights='imagenet', include_top=False, input_shape=INPUT_SHAPE)

densenet121 = DenseNet121(weights='imagenet', include_top=False, input_shape=INPUT_SHAPE)

effnetb0 = EfficientNetB0(weights='imagenet', include_top=False, input_shape=INPUT_SHAPE)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94765736/94765736 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
87910968/87910968 ━━━━━━━━━━━━━━━━━━━━ 4s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/densenet/densenet121_weights_tf_dim_ordering_tf_kernels_notop.h5
29084464/29084464 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step
Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb0_notop.h5
16705208/16705208 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step


We can print a summary of each model to understand its architecture. I would not recommend running this cell, however, because the models are large and the output is very long.

In [None]:
vgg16.summary()

resnet50.summary()

inceptionv3.summary()

mobilenetv2.summary()

densenet121.summary()

effnetb0.summary()

Let's add all of these models to a dictionary for easy access later on.

In [15]:
models = {
"vgg16":vgg16,
"resnet50":resnet50,
"inceptionv3":inceptionv3,
"mobilenetv2":mobilenetv2,
"densenet121":densenet121,
"effnetb0":effnetb0
}

Next, we adapt each model to our classification task. Because the models were loaded with `include_top=False`, the original ImageNet classifier is already removed, so we only need to add a new head that ends in a softmax layer with one output per class. Let's do this for each model.

In [12]:
UNFROZEN_LAYERS = 10

In [28]:
for key, base_model in models.items():
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(256, activation='relu')(x)
    x = BatchNormalization()(x)
    x = Dropout(0.3)(x)
    prediction_layer = Dense(CLASSES_COUNT, activation='softmax')(x)
    model = Model(inputs=base_model.inputs, outputs=prediction_layer)
    
    # Freeze all layers except the last UNFROZEN_LAYERS layers of the combined model.
    # This count includes the five head layers added above.
    for layer in model.layers[:-UNFROZEN_LAYERS]:
        layer.trainable = False

    models[key] = model
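
To see what the slice `model.layers[:-UNFROZEN_LAYERS]` leaves trainable, here is a small sketch that uses stand-in objects instead of real Keras layers. The `FakeLayer` class, the helper `freeze_all_but_last`, and the 20-layer backbone are hypothetical and exist only to illustrate the slicing; real backbones have many more layers.

```python
from dataclasses import dataclass


@dataclass
class FakeLayer:
    """Stand-in for a Keras layer; only the attributes used here are modelled."""
    name: str
    trainable: bool = True


def freeze_all_but_last(layers, unfrozen):
    """Freeze every layer except the last `unfrozen` ones, as in the loop above."""
    for layer in layers[:-unfrozen]:
        layer.trainable = False
    return [layer.name for layer in layers if layer.trainable]


# Hypothetical 20-layer backbone followed by the five head layers added above.
backbone = [FakeLayer(f"backbone_{i}") for i in range(20)]
head = [FakeLayer(n) for n in ("gap", "dense_256", "batch_norm", "dropout", "softmax")]

trainable = freeze_all_but_last(backbone + head, unfrozen=10)
print(trainable)
```

With `UNFROZEN_LAYERS = 10`, five of the ten trainable layers are the new head, so only the last five backbone layers are fine-tuned. Note also that a value of 0 would freeze nothing, because `layers[:-0]` is an empty slice.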

### Training Data Generation

In [10]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [17]:
def get_train_and_validation_data(training_path, training_options, validation_split=0.2):
    # Create an ImageDataGenerator with validation_split
    datagen = ImageDataGenerator(
        rotation_range=training_options["rotation_range"],
        width_shift_range=training_options["width_shift_range"],
        height_shift_range=training_options["height_shift_range"],
        brightness_range=training_options["brightness_range"],
        rescale=1./255,  # Important for scaling pixel values
        validation_split=validation_split
    )

    # Training generator
    train_generator = datagen.flow_from_directory(
        training_path,
        target_size=training_options["target_size"],
        batch_size=training_options["batch_size"],
        class_mode='categorical',
        subset='training'
    )

    # Validation generator
    validation_generator = datagen.flow_from_directory(
        training_path,
        target_size=training_options["target_size"],
        batch_size=training_options["batch_size"],
        class_mode='categorical',
        subset='validation'
    )

    return train_generator, validation_generator

Now that the helper for building the training and validation generators is defined, we can configure it for our transfer learning task.

In [24]:
TARGET_SIZE = (432,288)

In [25]:
#Data generators
train_options = {
    "rotation_range": 0,           # No random rotation
    "width_shift_range": 0.0,      # No horizontal shift
    "height_shift_range": 0.0,     # No vertical shift
    "brightness_range": (1, 1),    # Brightness factor fixed at 1 (unchanged)
    "target_size": TARGET_SIZE,
    "batch_size": 16
}

train_gen,valid_gen = get_train_and_validation_data(TRAIN_PATH,train_options)

Found 640 images belonging to 10 classes.
Found 159 images belonging to 10 classes.
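
The 640/159 split reported above is not exactly 80/20 of the 799 images. The sketch below shows one way this can arise, assuming the split is applied per class with the first `int(validation_split * n)` files of each class going to validation, which is my understanding of how the Keras directory iterator behaves. The per-class counts (nine classes of 80 images and one of 79) are an assumption chosen to be consistent with the totals; they are not stated in the notebook.

```python
import math


def split_counts(files_per_class, validation_split):
    """Per-class split: the first int(split * n) files of each class go to validation."""
    n_val = sum(int(validation_split * n) for n in files_per_class)
    n_train = sum(files_per_class) - n_val
    return n_train, n_val


# ASSUMPTION: nine classes with 80 images and one with 79 (799 images in total).
files_per_class = [80] * 9 + [79]
n_train, n_val = split_counts(files_per_class, validation_split=0.2)
print(n_train, n_val)

# Number of batches each generator yields per epoch with batch_size = 16.
batch_size = 16
train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
print(train_batches, val_batches)
```

Under these assumptions each epoch runs 40 training batches and 10 validation batches, the last validation batch being one image short.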


## Performing Transfer Learning 

Let's start by importing the necessary libraries to perform the transfer learning task.

In [20]:
import os
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from datetime import datetime

Next, we will specify some of the hyperparameters for our training process, such as the number of epochs and the learning rate. The batch size was already set in the data generator options above.

In [19]:
LEARNING_RATE = 1e-5
EPOCHS = 30

TARGET_SIZE = (432,288)

NETWORK_NAME = "GTZAN Custom DNN"

Now that the hyperparameters are set, we can define the loss function, optimizer, and evaluation metrics for our models.

In [29]:
for key in models.keys():
    models[key].compile(
        loss='categorical_crossentropy', 
        optimizer=Adam(learning_rate=LEARNING_RATE), 
        metrics=['accuracy'])
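
As a reference point for reading the training logs, the sketch below computes categorical cross-entropy by hand for a single sample. It is a plain-Python illustration of the formula, not the Keras implementation, and the class index and probabilities are made up. A model that spreads its probability evenly over 10 classes scores about ln(10), so losses near 2.30 mean the model is still guessing.

```python
import math


def categorical_crossentropy(one_hot_label, predicted_probs):
    """Cross-entropy for a single sample: -sum(y * log(p)) over the classes."""
    return -sum(
        y * math.log(p)
        for y, p in zip(one_hot_label, predicted_probs)
        if y > 0
    )


num_classes = 10
label = [0.0] * num_classes
label[3] = 1.0  # arbitrary true class, for illustration only

# A model that has learned nothing: equal probability for every class.
uniform = [1.0 / num_classes] * num_classes
baseline_loss = categorical_crossentropy(label, uniform)
print(round(baseline_loss, 4))  # 2.3026

# A model that puts most of its probability on the true class.
confident = [0.01] * num_classes
confident[3] = 0.91
confident_loss = categorical_crossentropy(label, confident)
print(confident_loss < baseline_loss)  # True
```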

The final step before we can perform our transfer learning is to specify some callbacks to help with the training process.

In [22]:
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.2, 
    patience=2)

earlystop = EarlyStopping(
    monitor='val_accuracy',  # metrics=['accuracy'] is logged as 'val_accuracy' for validation data
    mode="max", 
    patience=3)

checkpoint = ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True,  
    monitor='val_loss',      
    save_best_only=False,    
    verbose=1                
)


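# Note: `checkpoint` is defined above but is not passed to training here;
# add it to this list to save weights after every epoch.
callbacks = [reduce_lr, earlystop]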
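
To build intuition for how these two callbacks interact, here is a simplified plain-Python simulation. It is only a sketch: the function `simulate_callbacks` is hypothetical, it ignores details of the real callbacks such as `min_delta` and `cooldown`, and the validation curves are invented for illustration. It uses the same settings as above: the learning rate is multiplied by 0.2 after 2 epochs without a lower validation loss, and training stops after 3 epochs without a higher validation accuracy.

```python
def simulate_callbacks(val_losses, val_accs, lr=1e-5,
                       lr_factor=0.2, lr_patience=2, stop_patience=3):
    """Simplified model of ReduceLROnPlateau plus EarlyStopping (no min_delta or cooldown)."""
    best_loss, lr_wait = float("inf"), 0
    best_acc, stop_wait = float("-inf"), 0
    lrs = []
    for epoch, (loss, acc) in enumerate(zip(val_losses, val_accs), start=1):
        lrs.append(lr)  # learning rate in effect during this epoch
        # Learning-rate schedule: shrink after `lr_patience` epochs without a better loss.
        if loss < best_loss:
            best_loss, lr_wait = loss, 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr, lr_wait = lr * lr_factor, 0
        # Early stopping: halt after `stop_patience` epochs without a better accuracy.
        if acc > best_acc:
            best_acc, stop_wait = acc, 0
        else:
            stop_wait += 1
            if stop_wait >= stop_patience:
                return epoch, lrs
    return len(lrs), lrs


# Invented validation curves: both metrics stop improving after epoch 3.
val_losses = [2.00, 1.80, 1.70, 1.75, 1.72, 1.71, 1.74]
val_accs = [0.30, 0.38, 0.42, 0.41, 0.42, 0.40, 0.41]

stopped_epoch, lrs = simulate_callbacks(val_losses, val_accs)
print(stopped_epoch)
print(lrs)
```

In this made-up run the learning rate drops once, after epoch 5, and training halts after epoch 6, well short of the 30-epoch budget. With patience values this close together, early stopping leaves little time for a reduced learning rate to help.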

Finally, we are ready to perform transfer learning on our selected models. We will loop through each model, train it, and save it under a file name that records its accuracy and the date.

In [None]:
for key in models.keys():

    print("-"*100)
    print(f"Now training: {key}")
    print("-"*100)

    history = models[key].fit(train_gen, validation_data=valid_gen, epochs=EPOCHS, callbacks = callbacks)
        
    # Final-epoch training accuracy (not validation accuracy), used to label the saved file
    accuracy = history.history['accuracy'][-1]
    date_str = datetime.today().strftime('%Y-%m-%d')
    name_string = f"{key} (accuracy = {accuracy:.4f})(date = {date_str}).keras"

    save_file = os.path.join(SAVE_PATH,name_string)

    models[key].save(save_file)
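
The file name above records the training accuracy of the last epoch. If you would rather label a model by its best validation accuracy, the sketch below shows one way to pull that out of a dictionary shaped like `history.history`. The helper `best_epoch` is hypothetical, and the numbers are placeholders, not results from this notebook.

```python
def best_epoch(history, metric="val_accuracy"):
    """Return (1-based epoch, value) of the highest value recorded for `metric`."""
    values = history[metric]
    index = max(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Placeholder values shaped like `history.history`; NOT results from this notebook.
fake_history = {
    "accuracy": [0.35, 0.52, 0.61, 0.70],
    "val_accuracy": [0.33, 0.47, 0.49, 0.46],
}

epoch, value = best_epoch(fake_history)
last_train = fake_history["accuracy"][-1]
print(f"last training accuracy = {last_train:.4f}")
print(f"best validation accuracy = {value:.4f} at epoch {epoch}")
```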

## Evaluating Model Performance

As with the DNN- and CNN-based models, we can evaluate the performance of our transfer-learned models using various metrics and visualizations.

We will start by defining a function to get a test data generator similar to our training and validation generators.

In [None]:
def get_test_data(test_path, testing_options):
    test_datagen = ImageDataGenerator(
        rotation_range=testing_options["rotation_range"],
        width_shift_range=testing_options["width_shift_range"],
        height_shift_range=testing_options["height_shift_range"],
        brightness_range=testing_options["brightness_range"],
        rescale=1./255
    )
    test_generator = test_datagen.flow_from_directory(
        test_path,
        target_size=testing_options["target_size"],
        batch_size=testing_options["batch_size"],
        class_mode='categorical',
        shuffle=False  # keep a fixed file order so predictions can be matched to labels
    )
    return test_generator

In [None]:
test_options = {
    "rotation_range": 0,              
    "width_shift_range": 0,
    "height_shift_range": 0,
    "brightness_range": (1, 1),       
    "target_size": TARGET_SIZE,
    "batch_size": 16
}

test_gen = get_test_data(TEST_PATH,test_options)

Now that we have a test generator, we can evaluate each of our transfer learned models on the test dataset. This will give us an unbiased estimate of how well our models perform on unseen data.

In [None]:
for key,model in models.items():
    print('-'*100)
    print(f"Model: {key}")
    print('-'*100)

    results = model.evaluate(test_gen)
    print('Test loss:', results[0])
    print('Test accuracy:', results[1])

Model: vgg16
Model: resnet50
Model: inceptionv3
Model: mobilenetv2
Model: densenet121
Model: effnetb0
