


---



#### **6.1. What is Transfer Learning?**
- **Transfer Learning** is the process of taking a model pre-trained on one task and adapting it to a new, related task, often one with far less training data.
- This is particularly useful in deep learning, where training models from scratch on large datasets requires enormous computational resources and time.
- In computer vision, common pre-trained models include **ResNet**, **VGG**, and **Inception**, which are often trained on **ImageNet** (a dataset of 1.2 million images across 1,000 classes).

---

**6.1.1. Key Concepts in Transfer Learning**
- **Feature Extraction**: We freeze the convolutional layers of a pre-trained network and only train the fully connected layers on the new task. This approach uses the pre-trained model’s ability to extract features from images.
- **Fine-tuning**: In this approach, we unfreeze some or all of the convolutional layers of the pre-trained model and retrain them along with the fully connected layers to adapt the model to the new dataset.
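
In PyTorch, the two strategies differ only in which parameters have `requires_grad=True`. The sketch below makes that visible on a tiny stand-in network rather than a real pre-trained model; the names `backbone`, `head`, and `params_with_grad` are illustrative and not part of any library.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network: a small convolutional
# "backbone" followed by a linear classification "head".
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 10)
model = nn.Sequential(backbone, head)


def params_with_grad(module, x, y):
    """Run one forward/backward pass and list the parameters that got a gradient."""
    module.zero_grad(set_to_none=True)
    loss = nn.functional.cross_entropy(module(x), y)
    loss.backward()
    return [name for name, p in module.named_parameters() if p.grad is not None]


x = torch.randn(4, 3, 16, 16)    # dummy batch of four images
y = torch.randint(0, 10, (4,))   # dummy labels

# Feature extraction: freeze the backbone, so only the head receives gradients.
for p in backbone.parameters():
    p.requires_grad = False
feature_extraction = params_with_grad(model, x, y)
print(feature_extraction)  # ['1.weight', '1.bias']

# Fine-tuning: unfreeze the backbone, so every parameter receives gradients.
for p in backbone.parameters():
    p.requires_grad = True
fine_tuning = params_with_grad(model, x, y)
print(fine_tuning)  # ['0.0.weight', '0.0.bias', '1.weight', '1.bias']
```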

---



#### **6.2. Pre-trained Models in PyTorch**
- PyTorch provides pre-trained models through `torchvision.models`, allowing easy access to models like ResNet, VGG, and DenseNet, among others.
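- We can load a pre-trained model by passing the `weights` argument when initializing the model, for example `models.resnet18(weights=models.ResNet18_Weights.DEFAULT)`. The older `pretrained=True` flag has the same effect but is deprecated as of torchvision 0.13.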

**Common Pre-trained Models**:
- **ResNet**: A very deep network with residual connections to alleviate vanishing gradient problems.
- **VGG**: A simpler architecture that stacks many small (3x3) convolutional filters and has no residual connections.
- **Inception**: A complex architecture with multiple types of convolutional operations in parallel.
  
---

**6.2.1. Using a Pre-trained Model**
We will now explore how to use a pre-trained model (ResNet18) and apply it to a new classification task using transfer learning. We will demonstrate both **feature extraction** and **fine-tuning** approaches.

---



#### **6.3. Feature Extraction with a Pre-trained Model**
- In this approach, we freeze the convolutional base of the pre-trained model and replace the final fully connected layer to match the number of classes in our new dataset.

---

**6.3.1. Example: Transfer Learning with ResNet18 (Feature Extraction)**

**Demonstration: Feature Extraction using ResNet18**

In [3]:
import torch
import torch.nn as nn  # Import neural network functionalities (layers, loss functions, etc.)
import torch.optim as optim  # Import optimizers like Adam, SGD, etc.
import torchvision  # Provides datasets, pre-trained models, and transforms
import torchvision.transforms as transforms  # Tools for data preprocessing and augmentation
from torchvision import models  # Module to load pre-trained models


In [4]:

# Step 1: Define transformations for the dataset
# Resize the 32x32 CIFAR-10 images to 224x224 (the resolution the ImageNet weights were trained at) and normalize them
transform = transforms.Compose([
    transforms.Resize(224),  # Resize so the shorter side is 224; square CIFAR-10 images become 224x224
    transforms.ToTensor(),  # Convert images to PyTorch tensors (required for model input)
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize image pixels to range [-1, 1]
])
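
`Normalize` computes `(x - mean) / std` per channel. The sketch below reproduces that arithmetic with a hypothetical helper, `normalize`, to show that a mean and standard deviation of `0.5` map the `[0, 1]` output of `ToTensor` to `[-1, 1]`. It also shows the ImageNet statistics `(0.485, 0.456, 0.406)` and `(0.229, 0.224, 0.225)`, which the torchvision ImageNet weights were trained with and which may therefore be a better match for a pre-trained model.

```python
import torch


def normalize(x, mean, std):
    """Per-channel (x - mean) / std for a C x H x W tensor."""
    mean = torch.tensor(mean).view(-1, 1, 1)
    std = torch.tensor(std).view(-1, 1, 1)
    return (x - mean) / std


# A 3 x 1 x 2 "image" holding the extreme pixel values ToTensor can produce.
x = torch.tensor([[[0.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]]])

half = normalize(x, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
print(half.min().item(), half.max().item())  # -1.0 1.0

# ImageNet statistics give a different, channel-dependent range.
imagenet = normalize(x, (0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
print(imagenet.min().item(), imagenet.max().item())
```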


In [5]:

# Step 2: Load CIFAR-10 dataset (small dataset with 60,000 32x32 color images across 10 classes)
# CIFAR-10 is being used as an example for loading a small custom dataset
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)


Files already downloaded and verified
Files already downloaded and verified


In [6]:

# Step 3: Load the dataset into DataLoader for batching and shuffling
# DataLoader helps in loading data in batches, making it easier to handle during training and testing
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)  # Shuffle training data
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)  # No need to shuffle test data
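
The training loops later in this section divide the accumulated loss by `len(train_loader)`, which is the number of batches rather than the number of images. The sketch below illustrates this with a small stand-in dataset of random tensors (a placeholder, so that nothing has to be downloaded).

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 130 random "images" with labels in [0, 10).
images = torch.randn(130, 3, 32, 32)
labels = torch.randint(0, 10, (130,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=64, shuffle=True)

# len(loader) is the number of batches: ceil(130 / 64) = 3.
num_batches = len(loader)
batch_sizes = [x.shape[0] for x, _ in loader]
print(num_batches, batch_sizes)  # 3 [64, 64, 2]

# The same arithmetic for the 50,000-image CIFAR-10 training split.
cifar_batches = math.ceil(50_000 / 64)
print(cifar_batches)  # 782
```

The last batch is smaller than the rest unless `drop_last=True` is passed to the `DataLoader`.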


In [7]:

# Step 4: Load a pre-trained ResNet-18 model
# ResNet-18 is a deep neural network pre-trained on ImageNet. Here, we use it for transfer learning.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # Load ImageNet weights (torchvision >= 0.13; older versions use pretrained=True)


In [8]:

# Step 5: Freeze the convolutional base of ResNet-18
# Freezing means the weights of these layers will not be updated during training, allowing us to only train the new fully connected layer
for param in model.parameters():
    param.requires_grad = False  # Freeze all the layers (prevents weight updates)


In [9]:

# Step 6: Replace the fully connected (fc) layer
# ResNet's default final layer is designed for 1000 classes (ImageNet). We need to replace it with a new fully connected layer for CIFAR-10 (10 classes).
num_ftrs = model.fc.in_features  # Get the number of input features to the original fully connected layer
model.fc = nn.Linear(num_ftrs, 10)  # Replace the original fc layer with a new one for 10 classes


In [12]:

# Step 7: Define the loss function and optimizer
# Use CrossEntropyLoss for classification tasks (combines log-softmax and negative log-likelihood loss, so the model should output raw logits)
criterion = nn.CrossEntropyLoss()

# Use Adam optimizer, but only optimize the final layer (since all other layers are frozen)
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)  # Optimize only the new fully connected layer


In [None]:

# Step 8: Training loop (simplified)
epochs = 5  # Number of epochs (how many times the entire dataset is passed through the model)
for epoch in range(epochs):
    running_loss = 0.0  # Initialize running loss for each epoch
    model.train()  # Set model to training mode (batch norm uses batch statistics; dropout, if present, is active)

    # Loop over batches of images in the training set
    for inputs, labels in train_loader:
        optimizer.zero_grad()  # Zero the gradients from the previous step (avoids accumulation)

        # Forward pass: Get model predictions for the inputs
        outputs = model(inputs)

        # Calculate the loss between model outputs (predictions) and actual labels
        loss = criterion(outputs, labels)

        # Backward pass: Compute gradients for the trainable parameters (final layer)
        loss.backward()

        # Update the weights of the final fully connected layer using the optimizer
        optimizer.step()

        # Accumulate the loss for monitoring
        running_loss += loss.item()

    # Print the average loss for this epoch
    print(f'Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(train_loader):.4f}')

# Print when training is complete
print("Training Complete.")
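
The loop above reports only the training loss, and `test_loader` is never used. One way accuracy on held-out data might be measured is sketched below; `evaluate` is a hypothetical helper, and the toy model and data are placeholders that make the block runnable on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader):
    """Return the classification accuracy of `model` over `loader` (0.0 to 1.0)."""
    model.eval()  # use running batch-norm statistics, disable dropout
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for inputs, labels in loader:
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Placeholder model: a linear layer rigged to always predict class 0.
toy_model = nn.Linear(4, 3)
with torch.no_grad():
    toy_model.weight.zero_()
    toy_model.bias.copy_(torch.tensor([1.0, 0.0, 0.0]))

# Placeholder data: three of the four labels are class 0.
toy_data = TensorDataset(torch.randn(4, 4), torch.tensor([0, 0, 0, 1]))
accuracy = evaluate(toy_model, DataLoader(toy_data, batch_size=2))
print(accuracy)  # 0.75
```

With the objects defined in this notebook, the call would be `evaluate(model, test_loader)`.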


**Explanation**


1. **Transforms**:
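   - The **images are resized** to 224x224 pixels, the resolution at which the ImageNet weights were trained (the original CIFAR-10 images are 32x32). ResNet's adaptive average pooling accepts other input sizes, but pre-trained features generally transfer better when inputs are close to the training scale.
   - The **images are normalized** to the range `[-1, 1]` by subtracting a mean of `0.5` and dividing by a standard deviation of `0.5` in each channel (R, G, B).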

2. **Freezing the Convolutional Base**:
   - In transfer learning, the convolutional layers of a pre-trained model (in this case, ResNet-18) are used without updating their weights.
   - **Freezing** these layers means they won’t be modified during training, which speeds up training and reduces the chance of overfitting.

3. **Replacing the Fully Connected Layer**:
   - The **original fully connected (fc) layer** of ResNet-18 is designed for 1000 classes (ImageNet). Since CIFAR-10 has only 10 classes, we **replace the fc layer** with a new layer that outputs predictions for 10 classes.

4. **Training**:
   - The **Adam optimizer** is used to update only the weights of the final fully connected layer, while the rest of the model remains frozen.
   - The model is trained over 5 epochs, and the **loss is printed after each epoch** to monitor training progress.
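
One subtlety about freezing: `requires_grad = False` stops gradient updates, but batch-norm layers still refresh their running mean and variance whenever they are in training mode. ResNet-18 contains batch-norm layers, so calling `model.train()` lets those statistics drift toward the new data even though the weights are frozen. Whether that is desirable depends on the task. The sketch below shows the effect on a single layer and one possible way to prevent it; `set_bn_eval` is a hypothetical helper, not a PyTorch function.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)
for p in bn.parameters():
    p.requires_grad = False  # "frozen" in the requires_grad sense

x = torch.randn(8, 3, 4, 4) + 5.0  # data whose mean is far from 0

# Training mode: running statistics are updated by the forward pass.
bn.train()
before = bn.running_mean.clone()
bn(x)
changed_in_train = not torch.equal(before, bn.running_mean)

# Eval mode: running statistics are left untouched.
bn.eval()
before = bn.running_mean.clone()
bn(x)
changed_in_eval = not torch.equal(before, bn.running_mean)

print(changed_in_train, changed_in_eval)  # True False


def set_bn_eval(module):
    """Put every batch-norm layer inside `module` into eval mode."""
    for m in module.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()
```

Because `model.train()` switches every submodule back to training mode, a helper like this would have to be called after `model.train()` in each epoch.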


#### **6.4. Fine-tuning a Pre-trained Model**
- In **fine-tuning**, we allow some or all of the convolutional layers to be retrained, along with the new fully connected layer. This allows the model to adapt its feature extraction capabilities to the new dataset.

---

**6.4.1. Example: Fine-tuning ResNet18**

**Demonstration: Fine-tuning using ResNet18**

In [None]:
# Load a pre-trained ResNet18 model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # Load ResNet-18 pre-trained on ImageNet (torchvision >= 0.13; older versions use pretrained=True)

# Step 1: Unfreeze the last few layers for fine-tuning
# We will fine-tune the last residual block (layer4) and the fully connected (fc) layer.
for name, param in model.named_parameters():
    if "layer4" in name or "fc" in name:  # Unfreeze parameters of the last residual block (layer4) and the fully connected layer
        param.requires_grad = True  # Allow these layers to be updated during training
    else:
        param.requires_grad = False  # Freeze the rest of the model to prevent weight updates

# Step 2: Replace the fully connected (fc) layer to match CIFAR-10's 10 classes
# ResNet-18's original fully connected layer is designed for 1000 classes (ImageNet).
# We replace it with a new fully connected layer to output predictions for 10 classes (CIFAR-10).
num_ftrs = model.fc.in_features  # Get the number of input features to the fully connected layer
model.fc = nn.Linear(num_ftrs, 10)  # Replace the fully connected layer with a new one for 10 output classes

# Step 3: Define the loss function and optimizer
# Use CrossEntropyLoss, which is suitable for classification problems
criterion = nn.CrossEntropyLoss()

# Use Adam optimizer, but only optimize the unfrozen layers (layer4 and fc)
# 'filter' function is used to pass only the parameters that require gradients to the optimizer
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=0.0001)  # Lower learning rate for fine-tuning

# Step 4: Fine-tuning loop (simplified)
epochs = 5  # Number of epochs to fine-tune the model
for epoch in range(epochs):
    running_loss = 0.0  # Initialize running loss for each epoch
    model.train()  # Set model to training mode (batch norm uses batch statistics; dropout, if present, is active)

    # Iterate over batches of images in the training set
    for inputs, labels in train_loader:
        optimizer.zero_grad()  # Zero the gradients to avoid accumulation

        # Forward pass: Compute the model outputs for the inputs
        outputs = model(inputs)

        # Calculate the loss between model predictions (outputs) and the true labels
        loss = criterion(outputs, labels)

        # Backward pass: Compute gradients for the unfrozen parameters
        loss.backward()

        # Update the weights of the unfrozen layers (layer4 and fc) using the optimizer
        optimizer.step()

        # Accumulate the running loss for monitoring
        running_loss += loss.item()

    # Print the average loss for this epoch
    print(f'Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(train_loader):.4f}')

# Print when fine-tuning is complete
print("Fine-tuning Complete.")



**Explanation**:
- **Fine-tuning Strategy**: We selectively unfreeze the last block of convolutional layers (`layer4`) and the fully connected layer (`fc`). This allows the model to update its feature extraction for the specific task.
- **Lower Learning Rate**: A smaller learning rate (`0.0001`) is used for fine-tuning to avoid large updates that could destroy the pre-trained knowledge.

---



#### **6.5. Observations on Transfer Learning**
- **Feature Transferability**: The earlier layers in a CNN tend to capture low-level features (e.g., edges, textures) that are applicable to many tasks, while the later layers capture more task-specific features.
- **Efficiency**: Transfer learning significantly reduces the training time and computational resources required compared to training from scratch, especially when working with limited data.
- **Applications**: Transfer learning is widely used in practice, including in medical imaging, where pre-trained models are adapted to detect diseases from radiological scans, even with small datasets.

---



#### **6.6. State-of-the-Art Research on Transfer Learning**
- **Pre-trained Models in Natural Language Processing (NLP)**: In NLP, models like **BERT**, **GPT**, and **T5** are pre-trained on massive text corpora and fine-tuned for specific tasks like sentiment analysis or question answering.
- **Self-supervised Learning**: Recent advancements in **self-supervised learning** allow models to learn general-purpose features from unlabeled data, which can then be fine-tuned for various downstream tasks.
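- **Model Scaling**: Work on scaling has produced model families such as **EfficientNet**, which balances network width, depth, and input resolution through compound scaling and is a common backbone for transfer learning. Large generative models such as **BigGAN** show a related trend of gains from scale, although BigGAN is an image generator rather than a classification backbone.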

---



### Continuity to the Next Section
- In the next section, we will explore **Recurrent Neural Networks (RNNs)** and **Long Short-Term Memory Networks (LSTMs)**, which are particularly useful for sequential data such as time series, text, and speech.
  
This section covered the basics of transfer learning: using pre-trained models for feature extraction and for fine-tuning. Starting from ImageNet-trained weights such as those of ResNet-18 lets a model adapt to a new task with far less training than starting from scratch.