# Fine-Tuning ResNet-50 on Real, Synthetic, and Mixed Fire Datasets

This notebook fine-tunes ResNet-50 classification models using three dataset compositions:
- **100% real** images from the D-Fire dataset.  
- **100% synthetic** images from the Yunnan dataset (generated with Unreal Engine 5).  
- **50/50 mixed** composition using the `FireClassificationMixedDataset` class.  

The objective is to assess whether fine-tuning improves performance relative to feature extraction, particularly when the amount of real data is limited. Fine-tuning is performed by unfreezing the final ResNet-50 block (`layer4`) and the fully connected layer (`fc`), while keeping earlier layers frozen.

Each model is trained for five epochs using the reusable `train_model` function from `utils/train_model.py` and validated on a held-out 20% split of the selected dataset (see Step 3). The results are then compared directly against their frozen (Phase 1) counterparts.


## Notebook Setup: Mount Drive and Clone GitHub Repository

This cell prepares the Colab environment so the notebook can be reproduced in any new session:

- Mount Google Drive to access datasets, secrets, and checkpoints.  
- Load the GitHub personal access token from Drive.  
- Clone the `fire-detection-dissertation` repository.  
- Navigate into the cloned repository directory.  
- Configure the Git author identity for commits within the session.  

**Note:** This cell must be executed at the start of every new Colab session.


In [None]:
# Notebook setup: mount Google Drive and clone the GitHub repository

# 1. Mount Google Drive
import os
from google.colab import drive
if not os.path.ismount("/content/drive"):
    drive.mount("/content/drive")

# 2. Load the GitHub personal access token from Drive
token_path = "/content/drive/MyDrive/fire-detection-dissertation/secrets/github_token.txt"
with open(token_path, "r") as f:
    token = f.read().strip()

# 3. Clone the GitHub repository (force a fresh copy each session)
username = "Misharasapu"
repo = "fire-detection-dissertation"
clone_url = f"https://{token}@github.com/{username}/{repo}.git"
repo_path = f"/content/{repo}"

# Remove any existing clone to avoid conflicts
!rm -rf {repo_path}

# Clone the repository and move into the directory
%cd /content
!git clone {clone_url}
%cd {repo}

# 4. Configure Git author identity (required in Colab sessions)
!git config --global user.name "Misharasapu"
!git config --global user.email "misharasapu@gmail.com"


## Step 1: Define Dataset Type, Paths, and Model Filenames

In this step, the training configuration is defined for each model to be fine-tuned. Three dataset types are used:

- `real`: 100% real images (D-Fire).  
- `synthetic`: 100% synthetic images (Yunnan, UE5-generated).  
- `mixed`: 50% real and 50% synthetic images.  

Model filenames are generated dynamically to reflect the dataset composition. For the mixed dataset, the total number of samples (before the train/validation split in Step 3) is fixed at 5,260 to ensure comparability across runs. A fixed random seed is also set for reproducibility.
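
As a rough illustration of the sizing and naming rules above, the sketch below derives per-source sample counts and the mixed-model filename from `syn_ratio` and `total_samples`. The helper names `mixed_counts` and `mixed_filename` are hypothetical and not part of the repository, and the allocation rule (rounded synthetic share, remainder real) is an assumption about how `FireClassificationMixedDataset` behaves. The sketch uses `round()` instead of `int()` because `int((1 - 0.9) * 100)` evaluates to `9` under floating-point truncation; this does not affect the 0.50 ratio used in this notebook.

```python
def mixed_counts(total_samples: int, syn_ratio: float) -> tuple[int, int]:
    """Return (n_synthetic, n_real) for a fixed-size mixed dataset.

    Assumed allocation rule: rounded synthetic share, remainder real.
    """
    if not 0.0 <= syn_ratio <= 1.0:
        raise ValueError("syn_ratio must be in [0, 1]")
    n_syn = round(total_samples * syn_ratio)
    return n_syn, total_samples - n_syn


def mixed_filename(syn_ratio: float, suffix: str = "ft_phase2") -> str:
    """Build a filename following the notebook's naming pattern."""
    syn_pct = round(syn_ratio * 100)   # round() avoids float truncation
    real_pct = 100 - syn_pct           # percentages always sum to 100
    return f"resnet_outdoor_{syn_pct}syn_{real_pct}real_{suffix}.pt"


n_syn, n_real = mixed_counts(5260, 0.50)
print(n_syn, n_real)           # prints: 2630 2630
print(mixed_filename(0.50))    # prints: resnet_outdoor_50syn_50real_ft_phase2.pt
```

Deriving the real percentage as `100 - syn_pct` keeps the two numbers in the filename consistent for any ratio.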


In [None]:
import os
import random
import torch
import numpy as np

# Adjustable configuration
syn_ratio = 0.50          # Used only for the mixed dataset
total_samples = 5260      # Fixed total sample size for mixed dataset

# Paths to training data
real_image_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/real/D-Fire/train/images"
real_label_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/real/D-Fire/train/labels"

# Yunnan synthetic data
syn_image_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/synthetic/yunnan/synthetic_all/images"
syn_label_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/synthetic/yunnan/synthetic_all/labels"

# Model filenames based on dataset type (aligned with Phase 2 naming convention)
model_paths = {
    "real": "resnet_outdoor_real_100_ft_phase2.pt",
    "synthetic": "resnet_outdoor_synthetic_100_ft_phase2.pt",
    "mixed": f"resnet_outdoor_{int(syn_ratio * 100)}syn_{int((1 - syn_ratio) * 100)}real_ft_phase2.pt",
}

# Set random seed for reproducibility
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)


## Step 2: Define Image Transformations

The image transformation pipeline resizes images to 224×224 and converts them to PyTorch tensors, with pixel values scaled to the [0,1] range.  

No further normalization (such as ImageNet mean/std standardization) is applied, in keeping with the final preprocessing policy adopted across all experiments. The resulting 224×224 tensors match the input size ResNet-50 is typically trained on.
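
To make the scaling concrete, here is a small pure-Python sketch of what happens to a single 8-bit RGB pixel. `to_unit_range` mimics the value scaling that `ToTensor()` performs on 8-bit images, and `imagenet_normalize` shows the additional step that this notebook deliberately omits. Both function names are illustrative; the mean and standard deviation are the standard ImageNet statistics documented for torchvision's pretrained models.

```python
# Standard ImageNet channel statistics (shown for contrast only;
# this notebook does NOT apply them).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def to_unit_range(pixel_rgb):
    """Scale 8-bit channel values from [0, 255] to [0.0, 1.0]."""
    return tuple(c / 255.0 for c in pixel_rgb)


def imagenet_normalize(unit_rgb):
    """Per-channel standardization that the notebook's pipeline skips."""
    return tuple(
        (c - m) / s for c, m, s in zip(unit_rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


pixel = (255, 128, 0)                      # a saturated orange pixel
scaled = to_unit_range(pixel)              # what the model receives here
normalized = imagenet_normalize(scaled)    # what it would receive otherwise

print("scaled:    ", scaled)
print("normalized:", normalized)
```

The scaled values stay inside [0, 1], whereas the standardized values can fall outside that range, so Phase 1 and Phase 2 models must be evaluated with the same policy they were trained with.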


In [None]:
# Step 2 – Image transformations (applied to both training and validation datasets)

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()  # scale pixel values to [0,1]
])

print("Image transforms set: Resize(224,224) → ToTensor() (no normalization applied)")


Transforms set: Resize(224,224) → ToTensor()  [no normalization]


## Step 3: Load Dataset, Pretrained Model, and Configure Fine-Tuning

Based on the selected training mode (`"real"`, `"synthetic"`, or `"mixed"`), this step:

1. Loads the appropriate dataset with the predefined transform:  
   - `FireClassificationDataset` for real images (D-Fire).  
   - `FireClassificationDataset` with `dataset_type="synthetic"` for Yunnan (UE5).  
   - `FireClassificationMixedDataset` for a fixed-ratio combination of real and synthetic images.

2. Splits the full dataset into training and validation subsets using an 80/20 split with a fixed random seed (`seed=42`) for reproducibility.

3. Loads the corresponding Phase-1 checkpoint (feature extraction; frozen backbone, trained `fc`) using the final naming convention (e.g., `resnet_outdoor_real_100_phase1.pt`).

4. Configures fine-tuning by unfreezing `layer4` and `fc` (all earlier layers remain frozen). The optimizer is constructed over parameters with `requires_grad=True` only.

This setup keeps dataset usage and training conditions consistent across modes and isolates the impact of fine-tuning deeper layers on classification performance.
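
The split arithmetic in item 2 can be sketched as follows. The helper name `split_sizes` is illustrative; it mirrors the `int(train_ratio * len(full_dataset))` computation used in the code cell, with the remainder assigned to validation so that no sample is dropped.

```python
def split_sizes(n_total: int, train_ratio: float = 0.8) -> tuple[int, int]:
    """Return (train_size, val_size); the training size is truncated."""
    train_size = int(train_ratio * n_total)
    return train_size, n_total - train_size


# 17,222 is the D-Fire total implied by the logged split for the real mode;
# 5,260 is the fixed total for the mixed dataset.
print(split_sizes(17222))  # prints: (13777, 3445)
print(split_sizes(5260))   # prints: (4208, 1052)
```

Because the mixed dataset is capped at 5,260 samples, its training subset is roughly a third the size of the real-only training subset, which is worth keeping in mind when comparing modes.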


In [None]:
from torch.utils.data import DataLoader, random_split
from torchvision import models
from torch import nn, optim
import torch
import os

from utils.fire_classification_dataset import (
    FireClassificationDataset,
    FireClassificationMixedDataset
)

# Select which model to train
selected_mode = "real"  # Options: "real", "synthetic", "mixed"
print(f"\nSelected mode: {selected_mode.upper()}")

# Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
using_gpu = torch.cuda.is_available()
print(f"Using device: {device} ({'GPU enabled' if using_gpu else 'CPU only'})")

# Training parameters
batch_size = 32
num_epochs = 5
learning_rate = 1e-4
print(f"Training config → Batch size: {batch_size}, Epochs: {num_epochs}, LR: {learning_rate}")

# Build dataset by mode
if selected_mode == "real":
    print("Loading 100% real dataset (D-Fire)...")
    full_dataset = FireClassificationDataset(
        image_dir=real_image_dir,
        label_dir=real_label_dir,
        transform=transform,
        dataset_type="real"
    )
    pretrained_path = "/content/drive/MyDrive/fire-detection-dissertation/models/resnet_outdoor_real_100_phase1.pt"

elif selected_mode == "synthetic":
    print("Loading 100% synthetic dataset (Yunnan)...")
    full_dataset = FireClassificationDataset(
        image_dir=syn_image_dir,
        label_dir=syn_label_dir,
        transform=transform,
        dataset_type="synthetic"
    )
    pretrained_path = "/content/drive/MyDrive/fire-detection-dissertation/models/resnet_outdoor_synthetic_100_phase1.pt"

elif selected_mode == "mixed":
    print("Loading mixed dataset (Real + Synthetic)...")
    print(f"   → Synthetic ratio: {syn_ratio:.2f}, Total samples: {total_samples}")
    full_dataset = FireClassificationMixedDataset(
        real_image_dir=real_image_dir,
        real_label_dir=real_label_dir,
        syn_image_dir=syn_image_dir,
        syn_label_dir=syn_label_dir,
        syn_ratio=syn_ratio,
        total_samples=total_samples,
        transform=transform
    )
    pretrained_path = (
        f"/content/drive/MyDrive/fire-detection-dissertation/models/"
        f"resnet_outdoor_{int(syn_ratio*100)}syn_{int((1-syn_ratio)*100)}real_phase1.pt"
    )
else:
    raise ValueError("Invalid mode. Choose from: 'real', 'synthetic', or 'mixed'.")

# Split into train / val
train_ratio = 0.8
train_size = int(train_ratio * len(full_dataset))
val_size = len(full_dataset) - train_size
generator = torch.Generator().manual_seed(42)

train_data, val_data = random_split(full_dataset, [train_size, val_size], generator=generator)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=False)
print(f"Dataset split → Train: {train_size}, Validation: {val_size}")

# Load Phase-1 checkpoint safely
print(f"\nLoading pretrained Phase 1 model: {pretrained_path}")
if not os.path.exists(pretrained_path):
    raise FileNotFoundError(f"Checkpoint not found at: {pretrained_path}")

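# Note: keeping parity with earlier runs that start from ImageNet weights.
# `weights=` replaces the deprecated `pretrained=True` (torchvision >= 0.13).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)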
model.fc = nn.Linear(model.fc.in_features, 2)

state = torch.load(pretrained_path, map_location=device)
model.load_state_dict(state)
model = model.to(device)
print("Model loaded and moved to device.")

# Fine-tuning policy (Phase 2): unfreeze only layer4 + fc
print("\nFine-tuning setup → freeze all, then unfreeze layer4 + fc")
for p in model.parameters():
    p.requires_grad = False
for p in model.layer4.parameters():
    p.requires_grad = True
for p in model.fc.parameters():
    p.requires_grad = True

# Quick summary of trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"Trainable params: {trainable_params:,} / {total_params:,} "
      f"({100.0 * trainable_params / total_params:.2f}% of model)")

# Optimizer on trainable params only
optimizer = optim.Adam((p for p in model.parameters() if p.requires_grad), lr=learning_rate)
print("Optimizer configured (Adam) on layer4 + fc only — fine-tuning mode active.")

# Save path for the fine-tuned model
save_path = f"/content/drive/MyDrive/fire-detection-dissertation/models/{model_paths[selected_mode]}"
print(f"Fine-tuned model will be saved to: {save_path}")



📦 Selected mode: REAL
🖥️ Using device: cpu (CPU only)
📌 Training config → Batch size: 32, Epochs: 5, LR: 0.0001
🔹 Loading 100% real dataset (D-Fire)...
📊 Dataset split → Train: 13777, Validation: 3445

🔧 Loading pretrained Phase 1 model: /content/drive/MyDrive/fire-detection-dissertation/models/resnet_outdoor_real_100_phase1.pt




✅ Model loaded and moved to device.

🧩 Fine‑tuning setup → Freeze all, then unfreeze layer4 + fc
🔎 Trainable params: 14,968,834 / 23,512,130 (63.66% of model)
🛠️ Optimizer configured (Adam) on layer4 + fc only — fine‑tuning mode active.
💾 Fine‑tuned model will be saved to: /content/drive/MyDrive/fire-detection-dissertation/models/resnet_outdoor_real_100_ft_phase2.pt
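
The trainable-parameter figures reported above can be reproduced by hand from the standard torchvision ResNet-50 layout (bias-free convolutions, batch-norm layers with a weight and a bias, and three bottleneck blocks of width 512 in `layer4`). The sketch below is only a sanity check; the helper names are illustrative, and the 25,557,032 figure is the parameter count of the stock 1000-class ResNet-50.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameters of a bias-free k x k convolution."""
    return c_in * c_out * k * k


def bn_params(channels: int) -> int:
    """Learnable batch-norm parameters (weight + bias; running stats are buffers)."""
    return 2 * channels


def bottleneck_params(c_in: int, width: int, downsample: bool) -> int:
    """Parameters of one ResNet bottleneck block (1x1 -> 3x3 -> 1x1)."""
    c_out = 4 * width
    n = conv_params(c_in, width, 1) + bn_params(width)
    n += conv_params(width, width, 3) + bn_params(width)
    n += conv_params(width, c_out, 1) + bn_params(c_out)
    if downsample:  # projection shortcut, present in the first block only
        n += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return n


layer4 = bottleneck_params(1024, 512, True) + 2 * bottleneck_params(2048, 512, False)
fc = 2048 * 2 + 2                                   # two-class head
trainable = layer4 + fc
total = 25_557_032 - (2048 * 1000 + 1000) + fc      # swap the 1000-class head

print(f"{trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
# prints: 14,968,834 / 23,512,130 (63.66%)
```

Almost all of the trainable parameters sit in `layer4`; the classification head contributes only 4,098 of them.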


## Step 4: Train the Fine-Tuned Model (layer4 + fc)

This step starts training using the `train_model()` helper. Fine-tuning has already been configured by unfreezing `layer4` and `fc` in the model setup, and the optimizer was constructed over the parameters with `requires_grad=True`. The weights were initialised from the Phase-1 checkpoint (feature extraction).

Training runs for five epochs on the same dataset composition used in Phase 1. Whenever the validation F1 improves on the previous best, the checkpoint is saved to the configured path.


In [7]:
from utils.train_model import train_model
from torch.nn import CrossEntropyLoss

# Define loss function
criterion = CrossEntropyLoss()

# Start training (fine-tuning is already configured by unfreezing layer4 + fc)
train_losses, val_losses = train_model(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    num_epochs=num_epochs,
    device=device,
    save_path=save_path,
    print_every=1,
    print_batch_loss=False
)



🔍 Model device: cpu
📊 Trainable parameters: 14968834

🔁 Epoch 1/5




✅ Epoch [1/5] | Train Loss: 0.1366 | Val Loss: 0.0937 | Acc: 0.9689 | Precision: 0.9497 | Recall: 0.9343 | F1: 0.9419 | Time: 16591.3s
💾 New best model saved (F1: 0.9419) → /content/drive/MyDrive/fire-detection-dissertation/models/resnet_outdoor_real_100_ft_phase2.pt

🔁 Epoch 2/5




✅ Epoch [2/5] | Train Loss: 0.0499 | Val Loss: 0.1230 | Acc: 0.9608 | Precision: 0.9553 | Recall: 0.8967 | F1: 0.9250 | Time: 4633.5s

🔁 Epoch 3/5




✅ Epoch [3/5] | Train Loss: 0.0321 | Val Loss: 0.1272 | Acc: 0.9704 | Precision: 0.9366 | Recall: 0.9548 | F1: 0.9456 | Time: 4691.5s
💾 New best model saved (F1: 0.9456) → /content/drive/MyDrive/fire-detection-dissertation/models/resnet_outdoor_real_100_ft_phase2.pt

🔁 Epoch 4/5




✅ Epoch [4/5] | Train Loss: 0.0221 | Val Loss: 0.1066 | Acc: 0.9724 | Precision: 0.9484 | Recall: 0.9494 | F1: 0.9489 | Time: 4680.4s
💾 New best model saved (F1: 0.9489) → /content/drive/MyDrive/fire-detection-dissertation/models/resnet_outdoor_real_100_ft_phase2.pt

🔁 Epoch 5/5




✅ Epoch [5/5] | Train Loss: 0.0145 | Val Loss: 0.1251 | Acc: 0.9733 | Precision: 0.9614 | Recall: 0.9386 | F1: 0.9499 | Time: 4607.4s
💾 New best model saved (F1: 0.9499) → /content/drive/MyDrive/fire-detection-dissertation/models/resnet_outdoor_real_100_ft_phase2.pt
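
As a quick consistency check on the log above, the sketch below recomputes F1 from the logged precision and recall and replays a best-so-far rule to see which epochs would trigger a save. The strict "greater than previous best" rule is an assumption about `train_model`, whose source is not shown here; it is simply the rule that reproduces the save messages in the log.

```python
# (epoch, precision, recall, logged_f1), copied from the training log above
log = [
    (1, 0.9497, 0.9343, 0.9419),
    (2, 0.9553, 0.8967, 0.9250),
    (3, 0.9366, 0.9548, 0.9456),
    (4, 0.9484, 0.9494, 0.9489),
    (5, 0.9614, 0.9386, 0.9499),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom > 0 else 0.0


best_f1 = float("-inf")
saved_epochs = []
max_deviation = 0.0
for epoch, p, r, logged_f1 in log:
    max_deviation = max(max_deviation, abs(f1_score(p, r) - logged_f1))
    if logged_f1 > best_f1:        # assumed rule: save only on strict improvement
        best_f1 = logged_f1
        saved_epochs.append(epoch)

print("Epochs that trigger a save:", saved_epochs)   # prints: [1, 3, 4, 5]
print(f"Largest F1 deviation from the log: {max_deviation:.4f}")
```

Note that validation loss is lowest at epoch 1 while F1 keeps improving through epoch 5, so selecting the checkpoint by F1 and selecting it by validation loss would keep different models in this run.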


In [1]:
# ✅ 1. Navigate to the Git-tracked repo folder
%cd /content/fire-detection-dissertation

# ✅ 2. Copy the notebook from Drive into the repo so Git can track it
!cp /content/drive/MyDrive/fire-detection-dissertation/notebooks/06_train_resnet_finetuned.ipynb /content/fire-detection-dissertation/notebooks/

# Optional: confirm it's now inside the repo
!ls notebooks/

# ✅ 3. Stage the notebook for commit
!git add notebooks/06_train_resnet_finetuned.ipynb

# ✅ 4. Commit with a message
!git commit -m "Refactored Phase 2 fine-tuning notebook with unnormalised pipeline and updated dataset handling"

# ✅ 5. Push to GitHub
!git push


[Errno 2] No such file or directory: '/content/fire-detection-dissertation'
/content
cp: cannot stat '/content/drive/MyDrive/fire-detection-dissertation/notebooks/06_train_resnet_finetuned.ipynb': No such file or directory
ls: cannot access 'notebooks/': No such file or directory
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
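
The errors above are consistent with this cell having been run in a fresh session before the setup cell mounted Drive and cloned the repository. A minimal pre-flight check, sketched below with a hypothetical helper named `missing_paths`, could be run first to report which required paths are absent instead of letting each shell command fail in turn.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the subset of paths that do not exist, as strings."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


required = [
    "/content/fire-detection-dissertation",
    "/content/drive/MyDrive/fire-detection-dissertation/notebooks/"
    "06_train_resnet_finetuned.ipynb",
]

missing = missing_paths(required)
if missing:
    print("Run the setup cell first; missing paths:")
    for path in missing:
        print("  ", path)
else:
    print("All required paths are present; safe to copy, commit, and push.")
```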
