# Training Indoor ResNet-50 Models on Real and Mixed Fire Datasets

This notebook trains ResNet-50 classification models for **indoor fire detection** using two dataset compositions:

- **100% real** images from the PLOS ONE indoor dataset.  
- **50/50 mixed** composition combining PLOS ONE real images with SYN-FIRE synthetic positives.  

The objective is to evaluate best-case indoor performance for deployment scenarios (e.g., factories, warehouses, enclosed environments) and to assess whether supplementing limited real data with synthetic positives improves classification outcomes.

Training is conducted in **frozen feature extraction mode** for comparability with Phase 1 results. Models are validated on the held-out PLOS ONE validation set, and final performance is measured on the PLOS ONE test set. Results are compared against outdoor-trained models from Phases 1–3 to highlight the impact of indoor-focused training.


## Notebook Setup: Mount Drive and Clone GitHub Repository

This cell prepares the Colab environment so the notebook can be reproduced in any new session:

- Mount Google Drive to access datasets, secrets, and checkpoints.  
- Load the GitHub personal access token from Drive.  
- Clone the `fire-detection-dissertation` repository.  
- Navigate into the cloned repository directory.  
- Configure the Git author identity for commits within the session.  

**Note:** This cell must be executed at the start of every new Colab session.
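
The token-loading and URL-building steps listed above can be sketched as two small helpers. This is an illustrative sketch only: `build_clone_url` and `mask_token` are hypothetical names that do not appear in the repository, and the sketch assumes the token file holds a single line of text.

```python
from pathlib import Path


def build_clone_url(token_path: str, username: str, repo: str) -> str:
    """Read a personal access token from disk and embed it in an HTTPS clone URL."""
    token = Path(token_path).read_text().strip()
    if not token:
        raise ValueError(f"Token file is empty: {token_path}")
    return f"https://{token}@github.com/{username}/{repo}.git"


def mask_token(url: str) -> str:
    """Hide the credential part of a URL so the URL can be printed safely."""
    scheme, rest = url.split("://", 1)
    if "@" not in rest:
        return url  # no credential embedded, nothing to hide
    _, host_and_path = rest.split("@", 1)
    return f"{scheme}://***@{host_and_path}"
```

Printing `mask_token(clone_url)` rather than `clone_url` keeps the token out of saved cell outputs if the URL ever needs to be inspected.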


In [17]:
# Notebook setup: mount Google Drive and clone the GitHub repository

# 1. Mount Google Drive
import os
from google.colab import drive
if not os.path.ismount("/content/drive"):
    drive.mount("/content/drive")

# 2. Load the GitHub personal access token from Drive
token_path = "/content/drive/MyDrive/fire-detection-dissertation/secrets/github_token.txt"
with open(token_path, "r") as f:
    token = f.read().strip()

# 3. Clone the GitHub repository (force a fresh copy each session)
username = "Misharasapu"
repo = "fire-detection-dissertation"
clone_url = f"https://{token}@github.com/{username}/{repo}.git"
repo_path = f"/content/{repo}"

# Remove any existing clone to avoid conflicts
!rm -rf {repo_path}

# Clone the repository and move into the directory
%cd /content
!git clone {clone_url}
%cd {repo}

# 4. Configure Git author identity (required in Colab sessions)
!git config --global user.name "Misharasapu"
!git config --global user.email "misharasapu@gmail.com"


/content
Cloning into 'fire-detection-dissertation'...
remote: Enumerating objects: 140, done.
remote: Counting objects: 100% (140/140), done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 140 (delta 69), reused 102 (delta 35), pack-reused 0 (from 0)
Receiving objects: 100% (140/140), 4.33 MiB | 32.38 MiB/s, done.
Resolving deltas: 100% (69/69), done.
/content/fire-detection-dissertation


## Step 1: Define Dataset Type, Paths, and Model Filenames

In this step, the Phase 4 indoor training configuration is defined. Only one model is trained at a time, controlled by the `selected_model` variable:

- **indoor_real**: 100% real images from the PLOS ONE dataset (indoor).  
- **indoor_mixed**: 50% PLOS ONE real + 50% SYN-FIRE synthetic positives.  

> **Note:** The SYN-FIRE dataset contains only positive (fire) images, stored in  
> `/synthetic/syn-fire/synthetic_all/images` and `/synthetic/syn-fire/synthetic_all/masks`.  
> It is used exclusively for training augmentation. Validation and test sets remain PLOS ONE only.

Model filenames follow the unified convention (`resnet_<domain>_<composition>_phase4.pt`). A fixed random seed is set for reproducibility.  
An optional `total_samples` cap can be applied to match the Phase 1 sample budget (e.g., 5,260 samples); leaving it as `None` uses the full PLOS ONE training split.
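
The naming convention can be expressed as a one-line helper. This is a minimal sketch: `checkpoint_name` is a hypothetical function introduced here for illustration, and the composition strings are taken from the two filenames used in this notebook.

```python
def checkpoint_name(domain: str, composition: str, phase: int) -> str:
    """Build a filename following resnet_<domain>_<composition>_phase<N>.pt."""
    return f"resnet_{domain}_{composition}_phase{phase}.pt"


print(checkpoint_name("indoor", "real_100", 4))      # resnet_indoor_real_100_phase4.pt
print(checkpoint_name("indoor", "50syn_50real", 4))  # resnet_indoor_50syn_50real_phase4.pt
```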


In [36]:
# Step 1: Dataset types, paths, and filenames (Phase 4: Indoor)

import os
import random
import torch
import numpy as np

# Paths to training/validation/test data (PLOS ONE, indoor focus)
plos_train_img_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/real/PLOS_ONE/train/images"
plos_train_lbl_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/real/PLOS_ONE/train/labels"
plos_val_img_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/real/PLOS_ONE/valid/images"
plos_val_lbl_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/real/PLOS_ONE/valid/labels"
plos_test_img_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/real/PLOS_ONE/test/images"
plos_test_lbl_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/real/PLOS_ONE/test/labels"

# SYN-FIRE (indoor synthetic positives only, used for mixed model)
synfire_img_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/synthetic/syn-fire/synthetic_all/images"
synfire_mask_dir = "/content/drive/MyDrive/fire-detection-dissertation/data/raw/synthetic/syn-fire/synthetic_all/masks"

# Model filenames (consistent with naming convention)
model_paths = {
    "indoor_real": "resnet_indoor_real_100_phase4.pt",
    "indoor_mixed": "resnet_indoor_50syn_50real_phase4.pt",
}

# Select which model to train for this run
# Options: "indoor_real" or "indoor_mixed"
selected_model = "indoor_mixed"

# Reproducibility
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

print(f"Phase 4 setup complete. Selected model: {selected_model}")
print("Checkpoint will be saved as:", model_paths[selected_model])
print("PLOS train path:", plos_train_img_dir)
if selected_model == "indoor_mixed":
    print("SYN-FIRE positives path:", synfire_img_dir)


Phase 4 setup complete. Selected model: indoor_mixed
Checkpoint will be saved as: resnet_indoor_50syn_50real_phase4.pt
PLOS train path: /content/drive/MyDrive/fire-detection-dissertation/data/raw/real/PLOS_ONE/train/images
SYN-FIRE positives path: /content/drive/MyDrive/fire-detection-dissertation/data/raw/synthetic/syn-fire/synthetic_all/images


## Step 2: Define Image Transformations

The preprocessing pipeline used for both training and validation performs the following steps:  
- Resize all images to 224×224.  
- Convert images to PyTorch tensors.  

No normalization is applied: all models from Phase 1 to Phase 4 are trained and evaluated on unnormalised images, so inputs are preprocessed identically during training, validation, and testing. It also makes visual inspection of images more straightforward.

No data augmentation is applied. The test set is handled separately in the evaluation notebook.
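
To make the "convert to tensor" step concrete, the sketch below reproduces the arithmetic that `ToTensor()` applies to an 8-bit image, using NumPy as a stand-in. It is an illustration on a made-up 2×2 image, not torchvision itself: values are scaled from [0, 255] to [0, 1] and the layout moves from height × width × channels to channels × height × width.

```python
import numpy as np

# A made-up 2x2 RGB image in the uint8 (H, W, C) layout produced by image decoders.
image_hwc = np.array(
    [[[0, 128, 255], [255, 255, 255]],
     [[64, 64, 64], [0, 0, 0]]],
    dtype=np.uint8,
)

# Same arithmetic as ToTensor() on uint8 input: scale to [0, 1], channels first.
tensor_like = image_hwc.astype(np.float32).transpose(2, 0, 1) / 255.0

print(tensor_like.shape)  # (3, 2, 2)
print(float(tensor_like.min()), float(tensor_like.max()))  # 0.0 1.0
```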


In [37]:
# Step 2: Image transformations (train and validation only; no augmentation, no normalization)

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()  # pixel values scaled to [0,1]
])

print("Image transforms set: Resize(224,224) → ToTensor() (no normalization applied)")


Transforms set: Resize(224,224) → ToTensor()  [no normalization]


## Step 3: Load Dataset and Build DataLoaders

The dataset helper functions are imported from the repository, and the **PLOS ONE (indoor)** splits are loaded. Only the training and validation sets are used in this notebook. The test split is reserved for the evaluation notebook.

- **Helper:** `FireClassificationDataset` with `dataset_type="plos"` (maps class 0 → fire).  
- **Splits:** use the existing `train/` and `valid/` directories.  
- **Configuration:** one model per run. If `selected_model="indoor_real"`, only PLOS ONE samples are used. Switching to `"indoor_mixed"` applies the `FireClassificationPhase4MixedFixed` loader (2000 PLOS real + 2000 SYN-FIRE positives) while keeping validation PLOS-only.
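
One way a fixed, reproducible 2000 + 2000 composition could be drawn is sketched below. This is an assumption-laden illustration, not the implementation of `FireClassificationPhase4MixedFixed`: `sample_fixed_mix` is a hypothetical helper, and the toy filename pools stand in for real directory listings.

```python
import random


def sample_fixed_mix(real_items, syn_items, n_real, n_syn, seed=42):
    """Draw a fixed-size, seeded subset from each pool and tag items with their source."""
    if n_real > len(real_items) or n_syn > len(syn_items):
        raise ValueError("Requested more samples than a pool contains")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    # Sorting first makes the draw independent of directory listing order.
    real_pick = rng.sample(sorted(real_items), n_real)
    syn_pick = rng.sample(sorted(syn_items), n_syn)
    return [(name, "real") for name in real_pick] + [(name, "syn") for name in syn_pick]


# Toy pools standing in for directory listings (sizes are arbitrary).
real_pool = [f"real_{i:04d}.jpg" for i in range(3000)]
syn_pool = [f"syn_{i:04d}.png" for i in range(2500)]

mix = sample_fixed_mix(real_pool, syn_pool, n_real=2000, n_syn=2000, seed=42)
print(len(mix))  # 4000
```

Using a local `random.Random(seed)` means the same subset is selected on every run, regardless of what other code has consumed from the global random generator.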


In [38]:
# Step 3: Load dataset and build DataLoaders (train + validation only)

import importlib
import utils.fire_classification_dataset as fcd
importlib.reload(fcd)  # ensure the notebook uses the latest version

from torch.utils.data import DataLoader

if selected_model == "indoor_real":
    # PLOS-only (train + validation)
    train_dataset = fcd.FireClassificationDataset(
        image_dir=plos_train_img_dir,
        label_dir=plos_train_lbl_dir,
        transform=transform,
        dataset_type="plos",
    )
else:  # indoor_mixed
    # Fixed 50/50 split: 2000 PLOS real + 2000 SYN-FIRE positives
    train_dataset = fcd.FireClassificationPhase4MixedFixed(
        real_image_dir=plos_train_img_dir,
        real_label_dir=plos_train_lbl_dir,
        syn_image_dir=synfire_img_dir,
        syn_mask_dir=synfire_mask_dir,
        n_real=2000,
        n_syn=2000,
        transform=transform,
        seed=42,
    )

# Validation dataset (always PLOS-only)
val_dataset = fcd.FireClassificationDataset(
    image_dir=plos_val_img_dir,
    label_dir=plos_val_lbl_dir,
    transform=transform,
    dataset_type="plos",
)

batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

print(f"Selected model: {selected_model}")
print(f"Training samples: {len(train_dataset)} | Validation samples: {len(val_dataset)}")
print(f"Batch size: {batch_size}")


Selected model: indoor_mixed
Train samples: 4000 | Valid samples: 500
Batch size: 32
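
The sample counts above determine how many batches each loader yields per epoch. The short calculation below is a worked check, assuming the `DataLoader` default of `drop_last=False` (the loaders in this notebook do not override it), so a final partial batch is kept.

```python
import math

batch_size = 32
n_train, n_val = 4000, 500  # sample counts reported above

train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
last_val_batch = n_val - (val_batches - 1) * batch_size

print(train_batches)   # 125
print(val_batches)     # 16
print(last_val_batch)  # 20 (the final validation batch is partial)
```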


In [39]:
# Debug: confirm composition of the indoor mixed dataset
if selected_model == "indoor_mixed":
    # Count samples
    n_total = len(train_dataset)
    n_real = len(train_dataset.real_indices)
    n_syn = len(train_dataset.syn_indices)

    print(f"Indoor mixed dataset composition → Real: {n_real}, Synthetic: {n_syn}, Total: {n_total}")

    # Quick label distribution sanity check (first 200 samples only for speed)
    labels = [int(train_dataset[i][1]) for i in range(200)]
    n_fire = sum(labels)
    n_no_fire = 200 - n_fire
    print(f"Label distribution (first 200 samples): Fire={n_fire}, No Fire={n_no_fire}")


📊 Indoor mixed dataset composition → Real: 2000, Synthetic: 2000, Total: 4000
✅ Sample label distribution (first 200): Fire=152, No Fire=48


## Step 4: Build ResNet-50 (Feature Extraction), Define Loss and Optimizer

This step follows the Phase 1 setup: load a pretrained ResNet-50, freeze the backbone, replace the final `fc` layer with a two-class head, and train only that head. The loss function is `CrossEntropyLoss` over the two output logits, and the optimizer is Adam (learning rate `1e-4`) over the head parameters only. The checkpoint filename is taken from `model_paths[selected_model]`.


In [40]:
# Step 4: Model (feature extraction), loss, optimizer

import os
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

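# 1) Pretrained ResNet-50 backbone
# `pretrained=True` is deprecated since torchvision 0.13; this enum selects the same ImageNet weights
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)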

# 2) Freeze all backbone layers (feature extraction mode)
for p in model.parameters():
    p.requires_grad = False

# 3) Replace classification head with a 2-class layer
in_features = model.fc.in_features  # typically 2048
model.fc = nn.Linear(in_features, 2)

# 4) Ensure the new head is trainable
for p in model.fc.parameters():
    p.requires_grad = True

# 5) Move model to device
model = model.to(device)

# 6) Loss and optimizer (train head only)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)

# 7) Checkpoint path
models_dir = "/content/drive/MyDrive/fire-detection-dissertation/models"
os.makedirs(models_dir, exist_ok=True)
checkpoint_path = os.path.join(models_dir, model_paths[selected_model])

# 8) Sanity check
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Device: {device} | Trainable params: {trainable} / {total} (only fc)")
print(f"Checkpoint will be saved to: {checkpoint_path}")




Device: cuda | Trainable params: 4098 / 23512130 (only fc)
Checkpoint will be saved to: /content/drive/MyDrive/fire-detection-dissertation/models/resnet_indoor_50syn_50real_phase4.pt
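
The parameter counts printed by the sanity check can be verified by hand: a linear layer holds one weight per input-output pair plus one bias per output. The calculation below is a worked check using only the figures reported above.

```python
in_features, num_classes = 2048, 2

# Linear head: weight matrix (2048 x 2) plus one bias per class.
head_params = in_features * num_classes + num_classes

total_params = 23_512_130  # total reported by the sanity check
backbone_params = total_params - head_params  # frozen parameters

print(head_params)      # 4098
print(backbone_params)  # 23508032
print(f"{head_params / total_params:.4%}")  # 0.0174%
```

Only about 0.02% of the network is updated during training, which is what feature extraction mode means in practice.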


## Step 5: Train the Model (Feature Extraction Baseline)

Training is executed using the `train_model()` helper from `utils/train_model.py`, which:

- Uses the GPU if available.  
- Tracks training and validation loss each epoch, and reports validation accuracy, precision, recall, and F1.  
- Saves a checkpoint to `models/{model_paths[selected_model]}` **only when the validation F1 score improves**.  

Training runs for `num_epochs` (default: 5) for consistency with earlier notebooks.
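
The save-on-improvement rule can be sketched in a few lines. This is not the code of `train_model()`: `f1_score` and `epochs_that_save` are hypothetical helpers, and the sketch assumes a checkpoint is written only on a strictly higher F1. The inputs are the validation metrics logged for epochs 1 to 4 of the `indoor_mixed` run.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def epochs_that_save(f1_per_epoch):
    """Return the 1-based epochs at which a strictly better F1 would trigger a save."""
    best, saved = float("-inf"), []
    for epoch, f1 in enumerate(f1_per_epoch, start=1):
        if f1 > best:
            best = f1
            saved.append(epoch)
    return saved


# Validation F1 logged for epochs 1-4 of the indoor_mixed run.
print(epochs_that_save([0.8986, 0.9264, 0.9302, 0.9294]))  # [1, 2, 3]

# Epoch 1 logged precision 0.8205 and recall 0.9931.
print(round(f1_score(0.8205, 0.9931), 4))  # 0.8986
```

Under this rule the epoch 4 weights are not saved, because the F1 of 0.9294 is below the epoch 3 best of 0.9302, even though validation loss was still decreasing.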


In [None]:
# Step 5: Train using the repository helper

from utils.train_model import train_model

# Epochs (consistent with earlier phases)
num_epochs = 5

train_losses, val_losses = train_model(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    num_epochs=num_epochs,
    device=device,
    save_path=checkpoint_path,  # e.g., models/resnet_indoor_real_100_phase4.pt
    print_every=1,              # print one summary per epoch
    print_batch_loss=False      # set True only for debugging
)

print("\nTraining complete.")
print(f"Best model saved to: {checkpoint_path}")



🔍 Model device: cuda:0
📌 Feature extraction mode: Notebook is responsible for freezing/unfreezing.
📊 Trainable parameters: 4098

🔁 Epoch 1/5




✅ Epoch [1/5] | Train Loss: 0.3785 | Val Loss: 0.4209 | Acc: 0.8700 | Precision: 0.8205 | Recall: 0.9931 | F1: 0.8986 | Time: 195.1s
💾 New best model saved (F1: 0.8986) → /content/drive/MyDrive/fire-detection-dissertation/models/resnet_indoor_50syn_50real_phase4.pt

🔁 Epoch 2/5




✅ Epoch [2/5] | Train Loss: 0.2057 | Val Loss: 0.3021 | Acc: 0.9120 | Precision: 0.8994 | Recall: 0.9552 | F1: 0.9264 | Time: 195.0s
💾 New best model saved (F1: 0.9264) → /content/drive/MyDrive/fire-detection-dissertation/models/resnet_indoor_50syn_50real_phase4.pt

🔁 Epoch 3/5




✅ Epoch [3/5] | Train Loss: 0.1528 | Val Loss: 0.2577 | Acc: 0.9160 | Precision: 0.8974 | Recall: 0.9655 | F1: 0.9302 | Time: 195.2s
💾 New best model saved (F1: 0.9302) → /content/drive/MyDrive/fire-detection-dissertation/models/resnet_indoor_50syn_50real_phase4.pt

🔁 Epoch 4/5


                                                              

✅ Epoch [4/5] | Train Loss: 0.1301 | Val Loss: 0.2266 | Acc: 0.9180 | Precision: 0.9278 | Recall: 0.9310 | F1: 0.9294 | Time: 195.9s

🔁 Epoch 5/5


🚂 Training:  90%|████████▉ | 112/125 [02:52<00:21,  1.67s/it]

In [None]:
# ✅ 1. Navigate to the Git-tracked repo folder
%cd /content/fire-detection-dissertation

# ✅ 2. Move the notebook from Drive into the repo so Git can track it
!mv /content/drive/MyDrive/fire-detection-dissertation/notebooks/07_train_resnet_indoor_models.ipynb /content/fire-detection-dissertation/notebooks/

# Optional: confirm it's now inside the repo
!ls notebooks/

# ✅ 3. Stage the notebook for commit
!git add notebooks/07_train_resnet_indoor_models.ipynb

# ✅ 4. Commit with a message
!git commit -m "Added Phase 4 indoor training notebook (PLOS ONE real + SYN-FIRE mixed, unnormalised pipeline)"

# ✅ 5. Push to GitHub
!git push
