# Transfer Learning (ResNet-18)
In recent years, increasing the depth of neural networks has proven crucial for enhancing their learning capacity and performance in complex tasks, such as image recognition. However, training increasingly deep networks poses significant technical challenges, particularly related to the propagation of information and gradients during training.

Kaiming He and collaborators asked: "Driven by the significance of depth, a question arises: Is learning better networks as easy as stacking more layers?"

This is the question that ResNet addresses. ResNet is a convolutional neural network (CNN) architecture whose name indicates its depth: the number counts the weighted layers, not the parameters. Introduced in 2015 by researchers at Microsoft Research, it eased the training difficulties that appeared when extra layers were added (the paper calls this the degradation problem, and it is often discussed alongside vanishing gradients) through the use of **skip connections**. The ResNet family includes models of various depths, such as ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152, each adapted to different levels of complexity and learning capacity.

You can find more details in the original [ResNet paper](https://arxiv.org/abs/1512.03385).
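
To make the idea of a skip connection concrete, the following is a minimal sketch of a residual block in PyTorch. It is a simplified illustration, not the torchvision implementation: the class name `BasicResidualBlock` is hypothetical, and the block assumes that the input and output have the same number of channels and the same spatial size, so the input can be added back without any projection.

```python
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    """Simplified residual block: output = ReLU(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): two 3x3 convolutions, each followed by batch normalization
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x  # the skip connection carries the input forward unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # add the shortcut, then activate


block = BasicResidualBlock(channels=64)
x = torch.randn(2, 64, 56, 56)
y = block(x)
print(y.shape)  # same shape as the input
```

Because the block only has to learn the residual `F(x)`, a layer that has nothing useful to add can fall back to passing its input through, which is what makes stacking many such blocks practical.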

### 1. Library Imports for Transfer Learning

For the implementation of the second model, the core PyTorch libraries were utilized. Specifically, the models sub-module from torchvision was imported to access the ResNet18 architecture and its pre-trained weights. Additionally, seaborn was included to enhance the visualization of the final comparison metrics.

In [1]:
# Core deep learning libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models # Crucial for Transfer Learning

# Data handling and processing
import numpy as np
from torchvision import datasets, transforms

# Visualization and Evaluation
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report

# System and environment configuration
import os
import sys

# Aesthetic settings for high-quality plots in GitHub
plt.style.use('seaborn-v0_8-muted') 
plt.rcParams['figure.figsize'] = (10, 6)

### 2. Data Loading and Preprocessing

The system paths were configured to access local modules and the dataloaders were initialized. An image size of 228x228 was selected to balance detail and computational cost.

In [5]:
# Add the parent directory (the project root) to sys.path so the 'src' package can be imported
sys.path.append(os.path.abspath(os.path.join('..')))

# Import custom dataloaders
from src.data.dataloaders import get_loaders

# Execute the function to get data loaders
# batch_size=32 is a common default; img_size matches the 228x228 resolution chosen above
train_loader, test_loader = get_loaders(batch_size=32, img_size=228)

print(f"train_loader variable defined: {train_loader is not None}")

train_loader variable defined: True


### 3. ResNet18 Implementation and Customization

The ResNet18 architecture was adapted to serve as the primary model. Since the original model was trained on the ImageNet dataset (RGB images), the input layer was replaced to process single-channel grayscale X-rays. Furthermore, the pre-trained weights of the convolutional base were frozen to prevent their distortion during the initial training phase. The replacement input layer and the new classification head were randomly initialized and remained trainable.

In [6]:
# 1. The pre-trained ResNet18 model was loaded with the default (best available) ImageNet weights
weights = models.ResNet18_Weights.DEFAULT
model_tl = models.resnet18(weights=weights)

# 2. Feature Extraction: All pre-trained parameters were frozen
# This ensured that only the new custom layers were trained initially
for param in model_tl.parameters():
    param.requires_grad = False

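# 3. Input Layer Adaptation:
# ResNet18 expects 3 channels (RGB), but chest X-rays are grayscale (1 channel).
# The first convolutional layer was replaced to match our data dimensions.
# The new layer is created after the freeze above, so it starts from random
# weights and stays trainable (requires_grad=True by default).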
model_tl.conv1 = nn.Conv2d(
    in_channels=1, 
    out_channels=64, 
    kernel_size=7, 
    stride=2, 
    padding=3, 
    bias=False
)

# 4. Classification Head Redesign:
# The original 1000-class output was replaced with a binary classifier 
# (Normal vs Pneumonia) using a Dropout layer to mitigate overfitting.
num_ftrs = model_tl.fc.in_features
model_tl.fc = nn.Sequential(
    nn.Linear(num_ftrs, 256),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(256, 2)
)

# Hardware selection: prefer Apple Silicon (MPS), then NVIDIA CUDA, then CPU
if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Training on: Apple Silicon GPU (MPS)")
elif torch.cuda.is_available():
    device = torch.device("cuda")
    print("Training on: NVIDIA GPU (CUDA)")
else:
    device = torch.device("cpu")
    print("Training on: CPU")

# 5. Device Migration:
# The model was moved to the device selected above for hardware acceleration.
model_tl = model_tl.to(device)

print("ResNet18 model was successfully adapted and migrated to device.")

Training on: Apple Silicon GPU (MPS)
ResNet18 model was successfully adapted and migrated to device.
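
The cell above initializes the new single-channel `conv1` from scratch, which discards the pre-trained first-layer filters. One commonly used alternative, shown here only as a sketch and not as the approach taken in this notebook, is to sum the pre-trained RGB kernels over the input-channel axis. To keep the example offline, `rgb_conv` is a randomly initialized stand-in for the pre-trained layer; the names `rgb_conv` and `gray_conv` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the pre-trained 3-channel conv1 (random weights here, so nothing is downloaded)
rgb_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# New single-channel layer with the same geometry
gray_conv = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Sum the RGB kernels over the input-channel axis: (64, 3, 7, 7) -> (64, 1, 7, 7)
with torch.no_grad():
    gray_conv.weight.copy_(rgb_conv.weight.sum(dim=1, keepdim=True))

# A grayscale image copied into 3 channels and passed through rgb_conv
# should give the same result as the grayscale image passed through gray_conv
x_gray = torch.randn(2, 1, 228, 228)
x_rgb = x_gray.repeat(1, 3, 1, 1)
out_gray = gray_conv(x_gray)
out_rgb = rgb_conv(x_rgb)
print(torch.allclose(out_gray, out_rgb, atol=1e-4))
```

In the notebook, the same idea would require keeping a reference to the original `model_tl.conv1` before it is replaced. Whether this improves results on this dataset is not something the experiments here establish.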


### 4. Training Loop Definition

A function was created to manage the training process, error calculation, and weight optimization.

In [9]:
def train_model(model, train_loader, test_loader, criterion, optimizer, epochs=10):
    """
    The training loop was implemented to optimize the model parameters 
    and evaluate performance on the test set.
    """
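    # Use the device the model already lives on instead of relying on a global variable
    device = next(model.parameters()).device
    history = {'train_loss': [], 'test_loss': [], 'train_acc': [], 'test_acc': []}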
    
    for epoch in range(epochs):
        # --- TRAINING PHASE ---
        model.train()
        running_loss, correct_train, total_train = 0.0, 0, 0
        
        print(f"\n--- Epoch {epoch+1}/{epochs} ---")
        
        for images, labels in train_loader:
            # Data was moved to the selected device (MPS/CUDA/CPU)
            images, labels = images.to(device), labels.to(device)
            
            # Gradients were reset
            optimizer.zero_grad()
            
            # Forward pass: Predictions were generated
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            # Backward pass: Gradients were calculated and weights updated
            loss.backward()
            optimizer.step()
            
            # Training metrics were accumulated
            running_loss += loss.item()
            predicted = outputs.argmax(dim=1)
            total_train += labels.size(0)
            correct_train += (predicted == labels).sum().item()
            
        epoch_loss = running_loss / len(train_loader)
        epoch_acc = 100 * correct_train / total_train
        
        # --- EVALUATION PHASE ---
        model.eval()
        test_loss, correct_test, total_test = 0.0, 0, 0
        
        # Gradient calculation was disabled for evaluation to save memory
        with torch.no_grad():
            for images, labels in test_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                loss = criterion(outputs, labels)
                
                test_loss += loss.item()
                predicted = outputs.argmax(dim=1)
                total_test += labels.size(0)
                correct_test += (predicted == labels).sum().item()
        
        val_loss = test_loss / len(test_loader)
        val_acc = 100 * correct_test / total_test
        
        # Epoch metrics were stored in history
        history['train_loss'].append(epoch_loss)
        history['test_loss'].append(val_loss)
        history['train_acc'].append(epoch_acc)
        history['test_acc'].append(val_acc)
        
        print(f"Train Loss: {epoch_loss:.4f} | Acc: {epoch_acc:.2f}%")
        print(f"Test  Loss: {val_loss:.4f} | Acc: {val_acc:.2f}%")
        
    return history
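
One detail of the loop above is worth noting: `model.train()` also switches the BatchNorm layers of the frozen backbone into training mode, and in that mode their running mean and variance keep updating even when `requires_grad` is `False`. The sketch below shows one possible way to keep those statistics fixed. The helper `set_frozen_batchnorm_eval` is hypothetical and is not part of the notebook, and the tiny `backbone` is only a stand-in for ResNet18.

```python
import torch
import torch.nn as nn


def set_frozen_batchnorm_eval(model: nn.Module) -> None:
    """Put BatchNorm layers whose parameters are all frozen into eval mode."""
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    for module in model.modules():
        if isinstance(module, bn_types):
            params = list(module.parameters())
            if params and all(not p.requires_grad for p in params):
                module.eval()  # stop updating running_mean / running_var


# Tiny stand-in for a frozen backbone
backbone = nn.Sequential(nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.BatchNorm2d(8))
for p in backbone.parameters():
    p.requires_grad = False

backbone.train()                     # what train_model does at the start of every epoch
set_frozen_batchnorm_eval(backbone)  # must be called after .train()

before = backbone[1].running_mean.clone()
backbone(torch.randn(16, 1, 32, 32))
after = backbone[1].running_mean
print(torch.equal(before, after))
```

If this were used with `train_model`, the helper would have to be called right after each `model.train()` call, since that call resets every submodule to training mode. Whether fixing the statistics helps for grayscale X-rays, whose distribution differs from ImageNet, is an open question that this notebook does not test.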

### 5. Hyperparameter Setup and Training

The optimization strategy focused on the newly added layers. The Adam optimizer was configured to update only the parameters whose `requires_grad` flag was `True`, that is, the replaced input convolution and the new classification head.

In [10]:
# The loss function remained CrossEntropyLoss for binary classification
criterion_tl = nn.CrossEntropyLoss()

# Only the modified layers were passed to the optimizer
optimizer_tl = optim.Adam(
    filter(lambda p: p.requires_grad, model_tl.parameters()), 
    lr=1e-4
)

# The training process was executed for 10 epochs
NUM_EPOCHS = 10
history_tl = train_model(
    model_tl, 
    train_loader, 
    test_loader, 
    criterion_tl, 
    optimizer_tl, 
    epochs=NUM_EPOCHS
)


--- Epoch 1/10 ---
Train Loss: 0.4293 | Acc: 79.97%
Test  Loss: 0.4551 | Acc: 77.88%

--- Epoch 2/10 ---
Train Loss: 0.2634 | Acc: 89.74%
Test  Loss: 0.4943 | Acc: 78.69%

--- Epoch 3/10 ---
Train Loss: 0.2153 | Acc: 91.53%
Test  Loss: 0.4082 | Acc: 83.01%

--- Epoch 4/10 ---
Train Loss: 0.1991 | Acc: 91.89%
Test  Loss: 0.4896 | Acc: 80.13%

--- Epoch 5/10 ---
Train Loss: 0.1854 | Acc: 93.17%
Test  Loss: 0.3956 | Acc: 84.13%

--- Epoch 6/10 ---
Train Loss: 0.1726 | Acc: 93.44%
Test  Loss: 0.4152 | Acc: 83.81%

--- Epoch 7/10 ---
Train Loss: 0.1638 | Acc: 93.67%
Test  Loss: 0.5624 | Acc: 79.33%

--- Epoch 8/10 ---
Train Loss: 0.1618 | Acc: 93.79%
Test  Loss: 0.4880 | Acc: 82.53%

--- Epoch 9/10 ---
Train Loss: 0.1580 | Acc: 94.19%
Test  Loss: 0.4658 | Acc: 82.85%

--- Epoch 10/10 ---
Train Loss: 0.1563 | Acc: 94.06%
Test  Loss: 0.4851 | Acc: 82.85%
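
The log above can be summarized with a few lines of plain Python. The sketch below copies the per-epoch test metrics from the output and finds the epoch with the lowest test loss, along with the final gap between training and test accuracy. In the notebook the same lists would be available as `history_tl['test_loss']`, `history_tl['test_acc']`, and `history_tl['train_acc']`; the variable names used here are illustrative.

```python
# Test-set loss and accuracy per epoch, copied from the training log above
test_loss = [0.4551, 0.4943, 0.4082, 0.4896, 0.3956, 0.4152, 0.5624, 0.4880, 0.4658, 0.4851]
test_acc = [77.88, 78.69, 83.01, 80.13, 84.13, 83.81, 79.33, 82.53, 82.85, 82.85]
train_acc = [79.97, 89.74, 91.53, 91.89, 93.17, 93.44, 93.67, 93.79, 94.19, 94.06]

# Epoch (1-indexed) with the lowest test loss
best_epoch = min(range(len(test_loss)), key=test_loss.__getitem__) + 1

# Gap between training and test accuracy at the final epoch
final_gap = train_acc[-1] - test_acc[-1]

print(f"best epoch by test loss: {best_epoch}")
print(f"test accuracy at that epoch: {test_acc[best_epoch - 1]:.2f}%")
print(f"final train/test accuracy gap: {final_gap:.2f} points")
```

The lowest test loss occurs at epoch 5, while training accuracy keeps rising afterwards, a pattern consistent with overfitting. Selecting a checkpoint this way would normally be done on a separate validation split, since choosing an epoch from test metrics makes the reported test score optimistic.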
