# 🔧 Fine-Tuning ResNet-50 on Real, Synthetic, and Mixed Fire Datasets

This notebook fine-tunes ResNet-50 classification models using three dataset compositions:
- 100% real images from the D-Fire dataset
- 100% synthetic images from the Yunnan UE5-generated dataset
- 50/50 mixed composition using the `FireClassificationMixedDataset` class

The goal is to assess whether fine-tuning improves model performance compared to feature extraction, especially when only limited real data is available. Fine-tuning is performed by unfreezing the final ResNet-50 block (`layer4`) and fully connected layer (`fc`), while keeping earlier layers frozen.
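To give a sense of how much of the network this unfreezes, the parameter count of `layer4` plus a 2-class `fc` head can be worked out by hand. The sketch below assumes the standard torchvision ResNet-50 layout (three bottleneck blocks in `layer4`, bias-free convolutions, and a weight and bias per BatchNorm channel); the helper names are illustrative and are not part of the repository.

```python
def conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights of a bias-free 2D convolution."""
    return c_in * c_out * kernel * kernel


def bn_params(channels: int) -> int:
    """Learnable BatchNorm parameters (weight + bias per channel)."""
    return 2 * channels


def bottleneck_params(c_in: int, width: int, downsample: bool) -> int:
    """Parameters of one ResNet bottleneck block (1x1 -> 3x3 -> 1x1)."""
    c_out = 4 * width
    total = conv_params(c_in, width, 1) + bn_params(width)
    total += conv_params(width, width, 3) + bn_params(width)
    total += conv_params(width, c_out, 1) + bn_params(c_out)
    if downsample:
        # Projection shortcut used by the first block of the stage
        total += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return total


# layer4: first block projects 1024 -> 2048 channels, then two identity blocks
layer4 = bottleneck_params(1024, 512, downsample=True) \
    + 2 * bottleneck_params(2048, 512, downsample=False)
fc = 2048 * 2 + 2  # 2-class linear head: weights + biases
trainable = layer4 + fc

print(f"layer4: {layer4}, fc: {fc}, total trainable: {trainable}")
```

Under these assumptions the total comes to 14,968,834, which agrees with the `Trainable parameters` value printed at the start of training in Step 4.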

Each model is trained for 5 epochs using the same training loop defined in `utils/train_model.py`, with validation on a held-out 20% split of the selected dataset (see Step 3). Results will be compared directly to their frozen counterparts from Phase 1.


## 📦 Notebook Setup: Mount Drive & Clone GitHub Repo

This cell ensures the notebook is reproducible in any new Colab session by:

- Mounting your Google Drive (to access datasets, secrets, and checkpoints)
- Loading your GitHub token from Drive
- Cloning the fire-detection-dissertation repository
- Navigating into the correct folder
- Setting Git identity for future commits

⚠️ **Note:** This cell must be run every time you open this notebook in a new Colab session.


In [1]:
# 🔧 Minimal Colab setup for any working notebook

# 1. Mount Google Drive
import os
from google.colab import drive
if not os.path.ismount("/content/drive"):
    drive.mount("/content/drive")

# 2. Load GitHub token securely from Drive
token_path = "/content/drive/MyDrive/fire-detection-dissertation/secrets/github_token.txt"
with open(token_path, "r") as f:
    token = f.read().strip()

# 3. Clone the GitHub repo (force fresh clone for safety)
username = "Misharasapu"
repo = "fire-detection-dissertation"
clone_url = f"https://{token}@github.com/{username}/{repo}.git"
repo_path = f"/content/{repo}"

# Optional: Remove old clone (safe to rerun)
!rm -rf {repo_path}

# Clone fresh and move into the repo
%cd /content
!git clone {clone_url}
%cd {repo}

# 4. Set Git identity (required in Colab sessions)
!git config --global user.name "Misharasapu"
!git config --global user.email "misharasapu@gmail.com"


Mounted at /content/drive
/content
Cloning into 'fire-detection-dissertation'...
remote: Enumerating objects: 100, done.
remote: Counting objects: 100% (100/100), done.
remote: Compressing objects: 100% (78/78), done.
remote: Total 100 (delta 45), reused 71 (delta 20), pack-reused 0 (from 0)
Receiving objects: 100% (100/100), 4.08 MiB | 9.09 MiB/s, done.
Resolving deltas: 100% (45/45), done.
/content/fire-detection-dissertation


## 🧩 Step 1: Define Dataset Type, Paths, and Model Filenames

In this step, we define the training configuration for each model to be fine-tuned. Three dataset types are used:

- 🔵 `real`: 100% real images (D-Fire)
- 🟠 `synthetic`: 100% synthetic images (Yunnan UE5)
- 🟣 `mixed`: 50% synthetic, 50% real

We also define model filenames dynamically and ensure reproducibility with a fixed random seed. The mixed dataset is built with a fixed total of 5,260 samples (counted before the train/validation split in Step 3), and filenames are structured to reflect the data composition used during training.
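One detail worth noting about the dynamic filename: `int()` truncates, so a ratio such as 0.29 would yield `28syn` because `0.29 * 100` evaluates to just under 29 in floating point. The sketch below shows one possible way to build the name with `round()` instead. `mixed_model_filename` is a hypothetical helper, not a function from the repository.

```python
def mixed_model_filename(syn_ratio: float, suffix: str = "_ft") -> str:
    """Hypothetical helper: build the mixed-model filename from the synthetic ratio."""
    if not 0.0 <= syn_ratio <= 1.0:
        raise ValueError("syn_ratio must be between 0 and 1")
    syn_pct = round(syn_ratio * 100)  # round() avoids float truncation
    real_pct = 100 - syn_pct          # the two parts always sum to 100
    return f"resnet_mixed_{syn_pct}syn_{real_pct}real{suffix}.pt"


print(mixed_model_filename(0.50))          # resnet_mixed_50syn_50real_ft.pt
print(int(0.29 * 100), round(0.29 * 100))  # 28 29
```

For the 50/50 split used in this notebook both approaches give the same filename, so this only matters if `syn_ratio` is changed later.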


In [None]:
import os
import random
import torch
import numpy as np

# 🔧 Adjustable configuration
syn_ratio = 0.50                     # Used only for the mixed dataset
total_samples = 5260                # Fixed total sample size for mixed dataset

# 🗂️ Paths to training data
real_image_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/real/D-Fire/train/images"
real_label_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/real/D-Fire/train/labels"

syn_image_dir  = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/synthetic/yunnan/synthetic_all/images"
syn_label_dir  = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/synthetic/yunnan/synthetic_all/labels"

# 🧠 Model filenames based on dataset type
model_paths = {
    "real":    "resnet_real_100_ft.pt",
    "synthetic": "resnet_synthetic_100_ft.pt",
    "mixed":   f"resnet_mixed_{int(syn_ratio*100)}syn_{int((1-syn_ratio)*100)}real_ft.pt"
}

# ✅ Set random seed for reproducibility
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)


## 🧼 Step 2: Define Image Transformations

We define the image transformation pipeline used across all datasets. Images are resized to 224×224, converted to PyTorch tensors (which scales pixel values to the range [0, 1]), and then normalized with the ImageNet channel means and standard deviations.

This ensures compatibility with the input format expected by ResNet-50, which was pretrained on ImageNet.
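As a small worked example of the normalization step, the snippet below applies the per-channel formula `(value - mean) / std` to a single pixel in plain Python. It assumes the pixel has already been scaled to [0, 1], as `ToTensor()` does; `normalize_pixel` is an illustrative helper rather than a torchvision function.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Apply per-channel (value - mean) / std to one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


white = normalize_pixel((1.0, 1.0, 1.0))
black = normalize_pixel((0.0, 0.0, 0.0))

print([round(c, 4) for c in white])  # [2.2489, 2.4286, 2.64]
print([round(c, 4) for c in black])  # [-2.1179, -2.0357, -1.8044]
```

After normalization the model therefore sees inputs of roughly -2.1 to 2.6 rather than 0 to 1, which is the range the ImageNet-pretrained weights were trained on.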


In [None]:
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])


## ⚙️ Step 3: Load Dataset, Pretrained Model, and Configure Fine-Tuning

Based on the selected training mode (`"real"`, `"synthetic"`, or `"mixed"`), we perform the following:

1. Load the appropriate dataset class with the predefined transform:
   - `FireClassificationDataset` for real images (D-Fire)
   - `FireClassificationSyntheticDataset` for synthetic images (Yunnan UE5)
   - `FireClassificationMixedDataset` for a fixed-ratio combination of both

2. Split the full dataset into training and validation subsets using an 80/20 split with a fixed random seed (`seed=42`) to ensure reproducibility.

3. Load the corresponding Phase 1 checkpoint (`resnet_real_100.pt`, `resnet_synthetic_100.pt`, or `resnet_mixed_50syn_50real.pt`), which was trained using feature extraction (frozen base, trained `fc` layer only).

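4. Construct the Adam optimizer over the parameters with `requires_grad=True`. Nothing is frozen in this cell, so at this point the filter passes every parameter; the freezing itself is handled inside the `train_model()` helper, which unfreezes only `layer4` and `fc` when `fine_tune=True`. Parameters that are frozen there receive no gradients, so Adam leaves them unchanged.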

This setup ensures consistent dataset usage and training conditions across all modes, and enables us to isolate the impact of fine-tuning deeper convolutional layers on classification performance.
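The 80/20 split sizes follow directly from the arithmetic used in the cell below. As a quick sanity check, the sketch reproduces that calculation in plain Python; `split_sizes` is an illustrative helper, and the total of 17,222 real images is inferred from the split sizes printed in the cell output.

```python
def split_sizes(n_samples: int, train_ratio: float = 0.8):
    """Return (train_size, val_size) the same way the notebook computes them."""
    train_size = int(train_ratio * n_samples)  # truncates toward zero
    val_size = n_samples - train_size          # remainder goes to validation
    return train_size, val_size


print(split_sizes(17222))  # (13777, 3445) for the real D-Fire set
print(split_sizes(5260))   # (4208, 1052) for the mixed set
```

One consequence is that the mixed model sees 4,208 training samples per epoch rather than the full 5,260, since the fixed total is counted before the split.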


In [None]:
from torch.utils.data import DataLoader, random_split
from torchvision import models
from torch import nn, optim
from utils.fire_classification_dataset import (
    FireClassificationDataset,
    FireClassificationSyntheticDataset,
    FireClassificationMixedDataset
)

# 🔘 SELECT WHICH MODEL TO TRAIN
selected_mode = "real"  # Options: "real", "synthetic", "mixed"
print(f"\n📦 Selected mode: {selected_mode.upper()}")

# 🖥️ Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
using_gpu = torch.cuda.is_available()
print(f"🖥️ Using device: {device} ({'GPU enabled' if using_gpu else 'CPU only'})")

# ✅ Training parameters
batch_size = 32
num_epochs = 5
learning_rate = 1e-4
print(f"📌 Training config → Batch size: {batch_size}, Epochs: {num_epochs}, LR: {learning_rate}")

# ✅ Load dataset based on selected mode
if selected_mode == "real":
    print("🔹 Loading 100% real dataset (D-Fire)...")
    full_dataset = FireClassificationDataset(real_image_dir, real_label_dir, transform=transform)
    pretrained_path = "/content/drive/MyDrive/fire-detection-dissertation/models/resnet_real_100.pt"

elif selected_mode == "synthetic":
    print("🔸 Loading 100% synthetic dataset (Yunnan)...")
    full_dataset = FireClassificationSyntheticDataset(syn_image_dir, syn_label_dir, transform=transform)
    pretrained_path = "/content/drive/MyDrive/fire-detection-dissertation/models/resnet_synthetic_100.pt"

elif selected_mode == "mixed":
    print(f"🟣 Loading mixed dataset ({int(syn_ratio*100)}% synthetic / {int((1-syn_ratio)*100)}% real)...")
    print(f"   → Synthetic ratio: {syn_ratio}, Total samples: {total_samples}")
    full_dataset = FireClassificationMixedDataset(
        real_image_dir, real_label_dir,
        syn_image_dir, syn_label_dir,
        syn_ratio=syn_ratio,
        total_samples=total_samples,
        transform=transform
    )
    pretrained_path = f"/content/drive/MyDrive/fire-detection-dissertation/models/resnet_mixed_{int(syn_ratio*100)}syn_{int((1-syn_ratio)*100)}real.pt"

else:
    raise ValueError("❌ Invalid mode. Choose from: 'real', 'synthetic', or 'mixed'.")

# ✅ Split into train and validation subsets (80/20)
train_ratio = 0.8
train_size = int(train_ratio * len(full_dataset))
val_size = len(full_dataset) - train_size
generator = torch.Generator().manual_seed(42)

train_data, val_data = random_split(full_dataset, [train_size, val_size], generator=generator)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=False)

print(f"📊 Dataset split → Train: {train_size}, Validation: {val_size}")

# ✅ Load pretrained model from Phase 1 (CPU-safe)
print(f"\n🔧 Loading pretrained Phase 1 model: {pretrained_path}")
model = models.resnet50(weights=None)  # architecture only; all weights come from the Phase 1 checkpoint below
model.fc = nn.Linear(model.fc.in_features, 2)
model.load_state_dict(torch.load(pretrained_path, map_location=device))
model = model.to(device)
print("✅ Model loaded and moved to device.")

# ✅ Construct optimizer using only trainable parameters
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=learning_rate)
print("🛠️ Optimizer configured (Adam, fine-tuning mode).")

# ✅ Set save path for fine-tuned model
save_path = f"/content/drive/MyDrive/fire-detection-dissertation/models/{model_paths[selected_mode]}"
print(f"💾 Fine-tuned model will be saved to: {save_path}")



📦 Selected mode: REAL
🖥️ Using device: cpu (CPU only)
📌 Training config → Batch size: 32, Epochs: 5, LR: 0.0001
🔹 Loading 100% real dataset (D-Fire)...
📊 Dataset split → Train: 13777, Validation: 3445

🔧 Loading pretrained Phase 1 model: /content/drive/MyDrive/fire-detection-dissertation/models/resnet_real_100.pt
✅ Model loaded and moved to device.
🛠️ Optimizer configured (Adam, fine-tuning mode).
💾 Fine-tuned model will be saved to: /content/drive/MyDrive/fire-detection-dissertation/models/resnet_real_100_ft.pt


## 🔁 Step 4: Train the Fine-Tuned Model (layer4 + fc)

In this step, we begin training the selected model using the `train_model()` helper function.

- `fine_tune=True` ensures that both `layer4` and `fc` are unfrozen and trained.
- The model weights are initialized from the Phase 1 checkpoint (trained with feature extraction).
- The model will be trained for 5 additional epochs using the same dataset composition as in Phase 1.
- Results will be printed at the end of each epoch and the best-performing model (by F1 score) will be saved to the specified path.
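The "best model by F1" rule can be illustrated with the precision and recall values from the training log below. This is a minimal sketch of the selection logic under the assumption that a checkpoint is saved whenever F1 strictly improves; it is not the code from `train_model.py`, and `f1_score` here is a local helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, precision, recall) as printed in the training log for the real dataset
epoch_metrics = [
    (1, 0.9471, 0.9451),
    (2, 0.9161, 0.9634),
    (3, 0.9605, 0.9150),
    (4, 0.9523, 0.9451),
    (5, 0.9621, 0.9300),
]

best_f1 = -1.0
saved_epochs = []
for epoch, precision, recall in epoch_metrics:
    f1 = f1_score(precision, recall)
    if f1 > best_f1:  # assumed rule: save only on strict improvement
        best_f1 = f1
        saved_epochs.append(epoch)

print(saved_epochs, round(best_f1, 4))  # [1, 4] 0.9487
```

Because the logged precision and recall are already rounded to four decimals, an F1 recomputed this way can differ from the logged value in the last digit for some epochs, but the epochs at which a checkpoint is saved (1 and 4) come out the same.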


In [6]:
from utils.train_model import train_model
from torch.nn import CrossEntropyLoss

# ✅ Define loss function
criterion = CrossEntropyLoss()

# ✅ Start training (with fine-tuning)
train_losses, val_losses = train_model(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    num_epochs=num_epochs,
    device=device,
    save_path=save_path,
    fine_tune=True,  #  Fine-tune layer4 + fc
    print_every=1,
    print_batch_loss=False
)



🔍 Model device: cpu
🔧 Fine-tuning mode: Unfreezing layer4 and fc...
📊 Trainable parameters: 14968834

🔁 Epoch 1/5
✅ Epoch [1/5] | Train Loss: 0.1364 | Val Loss: 0.0884 | Acc: 0.9710 | Precision: 0.9471 | Recall: 0.9451 | F1: 0.9461 | Time: 21044.6s
💾 New best model saved (F1: 0.9461) → /content/drive/MyDrive/fire-detection-dissertation/models/resnet_real_100_ft.pt

🔁 Epoch 2/5
✅ Epoch [2/5] | Train Loss: 0.0546 | Val Loss: 0.1140 | Acc: 0.9663 | Precision: 0.9161 | Recall: 0.9634 | F1: 0.9391 | Time: 3536.8s

🔁 Epoch 3/5
✅ Epoch [3/5] | Train Loss: 0.0279 | Val Loss: 0.1315 | Acc: 0.9669 | Precision: 0.9605 | Recall: 0.9150 | F1: 0.9372 | Time: 3527.7s

🔁 Epoch 4/5
✅ Epoch [4/5] | Train Loss: 0.0226 | Val Loss: 0.1044 | Acc: 0.9724 | Precision: 0.9523 | Recall: 0.9451 | F1: 0.9487 | Time: 3575.8s
💾 New best model saved (F1: 0.9487) → /content/drive/MyDrive/fire-detection-dissertation/models/resnet_real_100_ft.pt

🔁 Epoch 5/5
✅ Epoch [5/5] | Train Loss: 0.0198 | Val Loss: 0.1169 | Acc: 0.9713 | Precision: 0.9621 | Recall: 0.9300 | F1: 0.9458 | Time: 3717.3s




In [None]:
# ✅ 1. Navigate to the Git-tracked repo folder
%cd /content/fire-detection-dissertation

# ✅ 2. Move the notebook from Drive into the repo so Git can track it
!mv /content/drive/MyDrive/fire-detection-dissertation/notebooks/06_train_resnet_finetuned.ipynb /content/fire-detection-dissertation/notebooks/

# Optional: confirm it's now inside the repo
!ls notebooks/

# ✅ 3. Stage the notebook for commit
!git add notebooks/06_train_resnet_finetuned.ipynb

# ✅ 4. Commit with a message
!git commit -m "Added Phase 2 fine-tuning notebook with modular support for real/synthetic/mixed models"

# ✅ 5. Push to GitHub
!git push
