# Transfer Learning


## Introduction

In our previous session, we explored **Convolutional Neural Networks (CNNs)** as a powerful tool for image classification tasks. Specifically, we trained a CNN to classify different types of leaves. Through that project, we gained experience with:
1. Designing CNN architectures from scratch.
2. Training a model with multiple convolutional, pooling, and dense layers.
3. Evaluating the performance of the CNN on unseen data.

While designing and training a CNN from scratch can be effective, it is not always practical, especially when working with smaller datasets: training from scratch typically requires a large dataset and significant computational resources.

---

### Objective of This Session
In this session, we will introduce the concept of **transfer learning** and apply it to a new classification task: distinguishing between different types of shoes. We will build on our understanding of CNNs and demonstrate how to leverage pre-trained models to:
1. Reuse a network's learned features for a new task.
2. Adapt a pre-trained network's final layers to a specific dataset.
3. Optionally, fine-tune earlier layers to improve performance further.

---

### What is Transfer Learning?
**Transfer learning** is a machine learning technique where a model trained on one task is adapted to a related but different task. For example, a model trained on a large dataset like **ImageNet** (which contains millions of images from thousands of categories) can be adapted to classify images from a much smaller dataset, such as our shoe dataset.

In this project, we:
1. Use a pre-trained CNN model as a feature extractor, modifying its final layers to classify shoe types.
2. Fine-tune some earlier layers of the network to adapt them to the new dataset.

---

### Why Use Transfer Learning?
1. **Efficiency**: Reuses the computation and data already invested in training the pre-trained model.
2. **Accuracy**: Can achieve high performance even with a small dataset by reusing learned features.
3. **Fewer Resources**: Reduces the need for large datasets and extensive training time.

---

### Learning Outcomes
By completing this session, you will:
1. Understand the core principles of transfer learning.
2. Learn to adapt pre-trained models to new datasets by modifying layers.
3. Gain experience fine-tuning earlier layers of a pre-trained model for optimal results.
4. Implement transfer learning in TensorFlow/Keras.

---

### Dataset
The dataset for this session contains images of different types of shoes. The directory structure is organized as follows:
```
Shoes/
├── Train/
│   ├── Heels (40 Samples)/
│   └── Oxfords (40 Samples)/
└── Test/
    ├── Heels (9 Samples)/
    └── Oxfords (9 Samples)/
```
Each folder represents a category of shoes, and the images are appropriately labeled for training and testing.
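
Before building the generators, it can help to confirm that the folders on disk match the counts above. The following is a minimal sketch, assuming the class folders sit directly under `Shoes/Train` and `Shoes/Test`; `count_images` and the list of image suffixes are hypothetical choices for illustration, not part of Keras.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual files
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images(split_dir):
    """Return {class_folder_name: number_of_image_files} for one split folder."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


if __name__ == "__main__":
    for split in ("Train", "Test"):
        split_dir = Path("Shoes") / split
        if split_dir.is_dir():  # skip silently if the dataset is not present
            print(split, count_images(split_dir))
```

A mismatch between these counts and the tree above usually points to stray non-image files or a misplaced folder.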

---

### Project Workflow
1. **Dataset Preparation**:
   - Use `ImageDataGenerator` to augment the training images and rescale all images.

2. **Adapting a Pre-Trained Model**:
   - Load a pre-trained CNN (in this notebook, the leaf classifier from the previous session).
   - Replace its final dense layer with a new layer for shoe classification.

3. **Training and Fine-Tuning**:
   - Train the new layers with the shoe dataset.
   - Fine-tune earlier layers to improve performance further.

4. **Analysis and Discussion**:
   - Analyze the model’s performance.
   - Discuss the impact of fine-tuning earlier layers.

## Loading the Leaf Classifier Model

In [1]:
from tensorflow.keras.models import load_model, Sequential, Model
from tensorflow.keras.layers import Dense
import warnings

warnings.filterwarnings('ignore')

# Load the saved Sequential model
model = load_model('leaf_classifier_model.h5')

# Print the model's architecture
model.summary()

for i, layer in enumerate(model.layers):
    print(f"Layer {i}: {layer.name} - {layer.__class__.__name__} - Trainable: {layer.trainable}")




Layer 0: conv2d - Conv2D - Trainable: True
Layer 1: max_pooling2d - MaxPooling2D - Trainable: True
Layer 2: conv2d_1 - Conv2D - Trainable: True
Layer 3: max_pooling2d_1 - MaxPooling2D - Trainable: True
Layer 4: flatten - Flatten - Trainable: True
Layer 5: dense - Dense - Trainable: True
Layer 6: dense_1 - Dense - Trainable: True


**Quick Reminder**

In our previous session, we trained a CNN model that achieved a **test accuracy of around 93%**. 

## Loading the New Shoes Dataset and Preprocessing

In [2]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator


train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True,
                                   validation_split=0.2) # Hold out 20% of the training images for validation

training_set = train_datagen.flow_from_directory('./Shoes/Train',
                                                 target_size = (64, 64),
                                                 batch_size = 8,
                                                 class_mode = 'categorical',
                                                 subset='training')  # Specify that this is the training set

validation_set = train_datagen.flow_from_directory('./Shoes/Train',
                                                   target_size=(64, 64),
                                                   batch_size=8,
                                                   class_mode='categorical',
                                                   subset='validation')  # Specify that this is the validation set

test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory('./Shoes/Test',
                                            target_size = (64, 64),
                                            batch_size = 4,
                                            class_mode = 'categorical')

Found 64 images belonging to 2 classes.
Found 16 images belonging to 2 classes.
Found 18 images belonging to 2 classes.
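
The three counts above follow from the folder sizes and `validation_split=0.2`. The sketch below reproduces the arithmetic, assuming the split is taken separately within each class folder with integer truncation (my understanding of how `ImageDataGenerator` partitions files, not something the notebook states); `split_counts` is a hypothetical helper.

```python
import math


def split_counts(per_class_counts, validation_split):
    """Estimate (training, validation) totals for a per-class split."""
    n_val = sum(int(n * validation_split) for n in per_class_counts.values())
    n_total = sum(per_class_counts.values())
    return n_total - n_val, n_val


train_counts = {"Heels": 40, "Oxfords": 40}  # folder sizes from the dataset tree
n_train, n_val = split_counts(train_counts, validation_split=0.2)

# batch_size = 8 in the training generator, so one epoch is this many batches
steps_per_epoch = math.ceil(n_train / 8)

print(n_train, n_val, steps_per_epoch)
```

With only 16 validation images, each one accounts for 6.25 percentage points of validation accuracy, so the validation curve will move in coarse steps.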


## Adapting the Architecture of the Leaf Identifier Model for Binary Classification

In [3]:
from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Input, Dense
import numpy as np
import pandas as pd

# Reconstruct the model layer by layer
inputs = Input(shape=(64, 64, 3))
x = inputs
for layer in model.layers[:-1]:  # Skip the last layer (Dense with 6 units)
    x = layer(x)

# Add a new layer for transfer learning
# Binary classification
new_output = Dense(units=2, activation='softmax', name='new_dense_output')(x)

# Create a new model
new_model = Model(inputs=inputs, outputs=new_output)

# Compile the modified model
new_model.compile(optimizer='nadam', loss='categorical_crossentropy', metrics=['accuracy'])


# Evaluate the model on the test dataset
test_accuracy = new_model.evaluate(test_set, verbose=0)
print(f'Test Accuracy: {test_accuracy[1]}')

Test Accuracy: 0.5


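**The model has not been trained on the shoe dataset yet.** An accuracy of 0.5 is chance level for two balanced classes (9 test images each): the reused layers have only seen leaves, and the new output layer still has its random initial weights.

Let’s look at the individual predictions of the adapted leaf model on the shoe dataset **without any additional training**.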

In [4]:
# Get class indices to map predictions to actual class names
class_indices = test_set.class_indices
class_labels = {v: k for k, v in class_indices.items()}  # Reverse the mapping

# Initialize a list to store results
results = []

# Iterate through the test set
correct_count = 0
total_samples = 0

for i in range(len(test_set)):
    # Get a batch of test data
    images, labels = test_set[i]
    
    # Predict the labels
    predictions = new_model.predict(images, verbose=0)
    
    # Convert predictions and actual labels to class indices
    predicted_classes = np.argmax(predictions, axis=1)
    actual_classes = np.argmax(labels, axis=1)
    
    # Add results to the list
    for j in range(len(images)):
        predicted_label = class_labels[predicted_classes[j]]
        actual_label = class_labels[actual_classes[j]]
        is_correct = predicted_label == actual_label
        results.append({
            "Sample": total_samples + 1,
            "Predicted Label": predicted_label,
            "Actual Label": actual_label,
            "Correct": is_correct
        })
        if is_correct:
            correct_count += 1
        total_samples += 1
    
    # Safety stop once every test image has been processed
    # (looping over range(len(test_set)) already ends at the last batch)
    if (i + 1) * test_set.batch_size >= len(test_set.filenames):
        break

# Create a DataFrame from the results
df_results = pd.DataFrame(results)

# Save the table to a CSV file
df_results.to_csv("prediction_results.csv", index=False)

# Print a summary
print(f"Total Correct: {correct_count} out of {total_samples}")

df_results

Total Correct: 9 out of 18


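|   | Sample | Predicted Label | Actual Label | Correct |
|---|--------|-----------------|--------------|---------|
| 0 | 1 | Oxfords | Oxfords | True |
| 1 | 2 | Oxfords | Oxfords | True |
| 2 | 3 | Heels | Heels | True |
| 3 | 4 | Oxfords | Heels | False |
| 4 | 5 | Oxfords | Oxfords | True |
| 5 | 6 | Heels | Oxfords | False |
| 6 | 7 | Oxfords | Heels | False |
| 7 | 8 | Oxfords | Heels | False |
| 8 | 9 | Heels | Heels | True |
| 9 | 10 | Heels | Oxfords | False |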
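
A confusion matrix shows which class the errors fall into, which a single accuracy number hides. The sketch below is illustrative only: it uses just the ten rows displayed in the table (not all 18 test samples), and the variable names are my own. In the notebook, the same `pd.crosstab` call could be applied to `df_results` directly.

```python
import pandas as pd

# The ten (predicted, actual) pairs shown in the table, in order
rows = [
    ("Oxfords", "Oxfords"), ("Oxfords", "Oxfords"), ("Heels", "Heels"),
    ("Oxfords", "Heels"), ("Oxfords", "Oxfords"), ("Heels", "Oxfords"),
    ("Oxfords", "Heels"), ("Oxfords", "Heels"), ("Heels", "Heels"),
    ("Heels", "Oxfords"),
]
df = pd.DataFrame(rows, columns=["Predicted Label", "Actual Label"])

# Rows are the true classes, columns are the predicted classes
confusion = pd.crosstab(df["Actual Label"], df["Predicted Label"])

# Fraction of rows where prediction and truth agree
accuracy = (df["Predicted Label"] == df["Actual Label"]).mean()

print(confusion)
print(f"Accuracy on these rows: {accuracy:.2f}")
```

On these rows the errors are spread across both classes, which is what one would expect from an untrained output layer.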


## Training the New Layers (Classification Layers)

In [5]:
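# Train the modified model on the new dataset.
# No layer was frozen when new_model was built, so this call updates every
# layer, not only new_dense_output. To train the new layer alone, set
# layer.trainable = False on the reused layers before compiling.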
history = new_model.fit(training_set, validation_data=validation_set, epochs=10, verbose=0)

# Evaluate the model on the test dataset
test_accuracy = new_model.evaluate(test_set, verbose=0)
print(f'Test Accuracy: {test_accuracy[1]}')

Test Accuracy: 0.9444444179534912
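
The cell above calls `fit` with every layer trainable, because nothing in the notebook sets `trainable = False`. If the goal is to train only the new classification layer, the reused layers need to be frozen before compiling. The following is a minimal sketch of that variant, assuming a small stand-in network in place of `leaf_classifier_model.h5`; the `base` model and its layer sizes are invented for illustration.

```python
from tensorflow.keras import layers, models

# Hypothetical stand-in for the loaded leaf classifier (6 leaf classes)
base = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(16, activation="relu"),
    layers.Dense(6, activation="softmax"),
])

# Rebuild the network without its last layer, freezing each reused layer
inputs = layers.Input(shape=(64, 64, 3))
x = inputs
for layer in base.layers[:-1]:
    layer.trainable = False  # keep the previously learned weights fixed
    x = layer(x)

# New two-class output layer; this is the only part that will be trained
outputs = layers.Dense(2, activation="softmax", name="new_dense_output")(x)
head_only = models.Model(inputs, outputs)

# Compile after changing the trainable flags so they take effect
head_only.compile(optimizer="nadam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

print("Trainable weight tensors:", len(head_only.trainable_weights))
print("Frozen weight tensors:", len(head_only.non_trainable_weights))
```

Only the kernel and bias of the new dense layer remain trainable here. Setting `layer.trainable = True` on selected layers and recompiling with a low learning rate would then be a genuine fine-tuning step.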


## Fine-Tuning the Early Layers (Feature Extraction Layers)

In [6]:
from tensorflow.keras.optimizers import Adam

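# Mark the earliest layers as trainable for fine-tuning.
# new_model.layers[0] is the InputLayer, so [:4] covers the input plus
# conv2d, max_pooling2d, and conv2d_1. No layer was frozen earlier in this
# notebook, so these flags are already True and the loop changes nothing;
# the effective change in this cell is the lower learning rate below.
for layer in new_model.layers[:4]:
    layer.trainable = True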

# Recompile the model with a lower learning rate
new_model.compile(optimizer=Adam(learning_rate=1e-5),  # Lower learning rate for fine-tuning
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

# Continue training the model
history_fine_tuning = new_model.fit(training_set, validation_data=validation_set, epochs=10, verbose=0)

# Evaluate the model on the test dataset again
test_accuracy_fine_tuned = new_model.evaluate(test_set,verbose=0)
print(f"Test Accuracy After Fine-Tuning: {test_accuracy_fine_tuned[1]}")


Test Accuracy After Fine-Tuning: 0.9444444179534912
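
Both evaluations report the same value, which corresponds to 17 of the 18 test images (17/18 ≈ 0.944). On a test set this small, fine-tuning would have to flip at least one prediction to register at all, so the validation curves from the two `fit` calls give a finer-grained comparison. Below is a minimal sketch, assuming each `history.history` dictionary contains a `val_accuracy` list (which Keras records when the model is compiled with `metrics=['accuracy']` and validation data is passed). `summarize_history` is a hypothetical helper, and the numbers in `example` are invented purely to show the call.

```python
def summarize_history(history_dict):
    """Return the best and final validation accuracy from a Keras history dict."""
    val_acc = history_dict["val_accuracy"]
    # Index of the epoch with the highest validation accuracy (first one on ties)
    best_index = max(range(len(val_acc)), key=val_acc.__getitem__)
    return {
        "best_epoch": best_index + 1,  # report epochs starting from 1
        "best_val_accuracy": val_acc[best_index],
        "final_val_accuracy": val_acc[-1],
    }


# Invented values for illustration only; in the notebook, pass
# history.history and history_fine_tuning.history instead.
example = {"val_accuracy": [0.50, 0.75, 0.88, 0.81]}
print(summarize_history(example))
```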


## Why Transfer Learning Works
Transfer learning is possible because of how **Neural Networks** learn and organize information hierarchically across their layers. For example, each layer of a CNN extracts increasingly complex features from the input data:

1. **Early Layers**:
   - Learn to detect **basic patterns** like edges, corners, and textures.
   - These patterns are universal and apply to many types of images (e.g., edges in a leaf image are similar to edges in a shoe or a face image).

2. **Middle Layers**:
   - Combine basic patterns to detect **textures**, **shapes**, and **small objects**.
   - For example, these layers might identify curves, rectangles, or patterns specific to a leaf’s structure or the shape of a shoe.

3. **Later Layers**:
   - Focus on more **task-specific features**, such as combining all learned patterns to classify an entire object.
   - For instance, the later layers of a CNN trained on faces combine detected parts such as noses, eyes, and mouths to recognize the entire face.
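
To make the "early layers detect edges" idea concrete, the sketch below applies a hand-written Sobel filter to a tiny synthetic image. This is an illustration under assumptions: a trained CNN learns its own filters, which often resemble edge detectors but are not literally Sobel kernels, and `correlate2d_valid` is a hypothetical helper mirroring what a convolutional layer computes without padding.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# 6x6 image: dark left half, bright right half (one vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Sobel kernel that responds to horizontal changes in brightness
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

response = correlate2d_valid(image, sobel_x)
print(response)
```

The response is zero over the flat regions and non-zero only in the columns whose window straddles the edge. The same filter would respond to a vertical edge in a leaf, a shoe, or a face, which is why such low-level features transfer.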

---

### Example: Face Detection
- When training a CNN to detect faces:
  - **First layers** learn to detect edges and corners, which are universal to any image.
  - **Intermediate layers** detect eyes, noses, and mouths (parts of a face).
  - **Final layers** combine these features into a high-level representation of the entire face.

The early layers, which focus on basic feature extraction, are transferable because detecting edges, shapes, or textures in one dataset (e.g., ImageNet) carries over well to another dataset (e.g., shoes or leaves).

---

### How Transfer Learning Leverages Pre-Trained Models
1. **Feature Extraction**:
   - In transfer learning, we reuse the **early layers** of a pre-trained model because they are already excellent at extracting general features like edges and textures.
   - These features are **universal** and not tied to any specific dataset.

2. **Task-Specific Learning**:
   - The **final layers** of a pre-trained model are task-specific. For example, the last dense layer of our leaf classifier outputs one score per leaf class, which is of no use for telling heels from oxfords.
   - To adapt the model, we replace the final layers (classification layers) with new layers tailored to our specific dataset.
   - These new layers are trained from scratch to learn task-specific features (e.g., distinguishing shoe types).

3. **Fine-Tuning**:
   - If needed, we can also fine-tune earlier layers, especially if our dataset is significantly different from the original dataset used for pre-training. This allows the model to adapt its feature extraction to better suit the new task.
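
The same recipe applies to a network pre-trained on ImageNet. The following is a hedged sketch, assuming `MobileNetV2` from `tensorflow.keras.applications` as the backbone; the notebook itself reuses the leaf classifier, not this model. `build_transfer_model` and the 96×96 input size are illustrative choices. With `weights="imagenet"`, Keras downloads the weights on first use, and this backbone expects inputs scaled with its own `preprocess_input` function rather than the `1./255` rescaling used earlier.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2


def build_transfer_model(num_classes=2, input_shape=(96, 96, 3), weights="imagenet"):
    """Frozen pre-trained backbone plus a new softmax classification head."""
    # include_top=False drops the original 1000-class ImageNet classifier
    base = MobileNetV2(input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False  # feature extraction: keep pre-trained filters fixed

    inputs = layers.Input(shape=input_shape)
    x = base(inputs, training=False)  # run batch-norm layers in inference mode
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_transfer_model()` returns a model in which only the new dense layer is trainable. For the fine-tuning step, one option is to set `trainable = True` on the backbone (or on a subset of its layers) and recompile with a much lower learning rate.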

---

### Key Insight
Transfer learning works because:
- **Basic features** (edges, textures, shapes) are transferable across datasets.
- **Task-specific features** (e.g., recognizing specific categories) are dataset-dependent and require retraining.
- For many applications, replacing and training the final layers of a pre-trained model is often enough to achieve good results.

This saves time, computational resources, and the need for massive datasets.