# Image Detection with Deep Learning

Welcome to the **BILD 2025 Summer School** hands-on session on object detection! This notebook will guide you through the complete pipeline of building a model to locate signs of pneumonia in chest X-rays.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/albarqounilab/BILD-Summer-School/blob/main/notebooks/day1/detection_exercise.ipynb)

![BILD Banner](https://raw.githubusercontent.com/albarqounilab/BILD-Summer-School/refs/heads/main/images/helpers/notebook-banner.png)

---

### Today's Goals

This session is a practical journey into the world of medical object detection. By the end of this notebook, you will be able to:

1.  **Prepare Medical Imaging Data**: Load and process complex medical images in the DICOM format and parse their corresponding bounding box annotations.
2.  **Understand Detection Data Pipelines**: Create a custom PyTorch `Dataset` and `DataLoader` specifically for object detection.
3.  **Train a Detector Model**: Fine-tune a state-of-the-art **Faster R-CNN** model to draw precise boxes around infected lung areas.
4.  **Master Detection Metrics**: Use and interpret key evaluation metrics like **Intersection over Union (IoU)** and **Average Precision (AP)**.
5.  **Perform Advanced Quality Control**: Go beyond a single number to systematically analyze your model's failure modes.

*   **Objectives**: You'll see how AI can be trained to identify and localize pathological findings, a crucial step in computer-assisted diagnosis. You'll also apply your object detection skills to a challenging real-world problem using medical imaging data.

## 1. Environment Setup

First, we'll prepare our environment. This involves installing the packages needed for reading medical images and augmenting data, downloading the torchvision detection helper scripts, and importing all the Python libraries we'll need. The dataset itself is downloaded in Section 2.

> **Note**: This first code block installs a few packages and downloads the torchvision detection helper scripts. It may take a few minutes and requires an internet connection.

In [None]:
# Library Installations
!pip install pydicom albumentations -q

# Core Library Import
import os
# os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import sys, math, random, time, warnings
from glob import glob
from pathlib import Path

# Data Handling and Processing
import numpy as np
import pandas as pd
import cv2
import pydicom
from PIL import Image

# Deep Learning with PyTorch & Torchvision
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision
from torchvision import tv_tensors
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.utils import draw_bounding_boxes

# Data Augmentation & Visualization
import albumentations as A
from albumentations.pytorch import ToTensorV2
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Utilities
from tqdm.notebook import tqdm
# Download official torchvision helper scripts for training and evaluation
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/utils.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/engine.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_utils.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_eval.py")
os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/transforms.py")
import utils

# Environment Configuration
warnings.filterwarnings("ignore")

# Ensure reproducibility
def seed_everything(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

RANDOM_SEED = 42
seed_everything(RANDOM_SEED)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Setup complete. Using device: {device}")


## 2. The Dataset: RSNA Pneumonia Detection Challenge

Our task is to detect signs of pneumonia in chest X-rays. We are using a dataset from the **RSNA Pneumonia Detection Challenge**, which contains nearly 30,000 chest X-ray images in the DICOM format. A subset of these images has been annotated by expert radiologists with bounding boxes indicating areas of lung opacity, which are a sign of pneumonia.

### Downloading the Data (2 minutes)
> ~1.8 GB; this may take a few minutes. If you already have the data locally, skip this cell.

In [None]:
!pip install -q -U "huggingface_hub[cli]"
!hf download albarqouni/bild-dataset --repo-type dataset --include "Detection/rsna-pneumonia-detection-challenge.zip" --local-dir ./
!unzip -q ./Detection/rsna-pneumonia-detection-challenge.zip -d ./

### 2.1. Data Exploration & Quality Control

Before we build any model, we must first understand our data. If we train our model on noisy, inconsistent, or poorly understood data, we cannot trust its predictions.

This process, known as **Exploratory Data Analysis (EDA)**, is our first quality control step. We will:
1.  Analyze the class balance: How many patients have pneumonia versus how many do not?
2.  Inspect the annotations: How many bounding boxes are there per image?
3.  Visualize a sample: We will perform a "sanity check" to ensure our data is loading correctly and the labels align with the images.

We'll start by loading the labels file, which contains the patient IDs and bounding box information for each positive case.

In [None]:
# Define paths to our data files
DATA_PATH = './rsna-pneumonia-detection-challenge'
LABELS_PATH = os.path.join(DATA_PATH, 'labels.csv')
IMAGE_DIR = os.path.join(DATA_PATH, 'images')

# Load the labels into a pandas DataFrame
df = pd.read_csv(LABELS_PATH)

# Initial Analysis
# Calculate the number of unique patients with and without pneumonia
positive_cases = df[df['Target'] == 1]['patientId'].nunique()
total_cases = df['patientId'].nunique()
negative_cases = total_cases - positive_cases

print(f"--- Dataset Overview ---")
print(f"Total unique patients: {total_cases}")
print(f"Patients with pneumonia (positive cases): {positive_cases} ({positive_cases/total_cases:.1%})")
print(f"Patients without pneumonia (negative cases): {negative_cases} ({negative_cases/total_cases:.1%})")

# Check how many pneumonia cases have multiple bounding boxes
bbox_counts = df[df['Target'] == 1].groupby('patientId').size()
print(f"Pneumonia cases with >1 bounding box: {(bbox_counts > 1).sum()}")

> ### Deep Dive: The DICOM Format
>
> **DICOM** (Digital Imaging and Communications in Medicine) is the international standard for medical images and related information. It's more than just an image format like JPEG or PNG. A DICOM file is a complex container that holds not only the pixel data of the image but also a rich set of **metadata**.
>
> This metadata includes:
> -   **Patient Information**: Name, ID, age, sex (often anonymized in public datasets).
> -   **Study Information**: What kind of study was performed (e.g., Chest X-ray), the date, and referring physician.
> -   **Image Acquisition Details**: The type of machine used (e.g., scanner model), exposure settings, pixel spacing (the physical size of a pixel), and image orientation.
>
> For our task, the most important part is the **pixel array**, which contains the actual image data. We use the `pydicom` library to easily read these files and extract this pixel data for our model.
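
As a sketch of what "extracting the pixel data" can involve, the helper below applies the DICOM rescale slope and intercept (the `RescaleSlope`/`RescaleIntercept` attributes, which many plain radiographs omit) and then min-max scales the result to 8-bit for display. The name `to_display_uint8` is hypothetical and not part of `pydicom`; it simply operates on a NumPy array such as `dicom_data.pixel_array`.

```python
import numpy as np


def to_display_uint8(pixels, slope=1.0, intercept=0.0):
    """Hypothetical helper: rescale a raw DICOM pixel array and map it to uint8 [0, 255]."""
    # Modality rescale: stored values -> output values (identity when slope=1, intercept=0)
    values = pixels.astype(np.float64) * slope + intercept
    lo, hi = values.min(), values.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros(values.shape, dtype=np.uint8)
    # Min-max scale to [0, 255] and round to the nearest integer
    scaled = (values - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


# Synthetic 12-bit-style array (not real patient data)
raw = np.array([[0, 1024], [2048, 4095]], dtype=np.uint16)
display = to_display_uint8(raw)
```

With a real file you might call `to_display_uint8(ds.pixel_array, getattr(ds, 'RescaleSlope', 1.0), getattr(ds, 'RescaleIntercept', 0.0))`, falling back to the identity transform when the attributes are absent.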

#### Sanity Check: Visualizing a Single Case

Now for a crucial sanity check. We will load a single DICOM image and use the coordinates from our labels file to draw its corresponding bounding box.

This simple step is incredibly important for debugging. It visually confirms that our entire data pipeline is working as expected:
-   Are we reading the DICOM files correctly?
-   Are we parsing the CSV file properly?
-   Do the `(x, y, width, height)` coordinates from the file actually correspond to the pneumonia opacity visible in the image?

Catching a mistake here can save hours of confusion later.

In [None]:
# Visualize a Sample
# Select a sample row for a patient with pneumonia
sample_patient = df[df.Target == 1].iloc[13]
image_path = os.path.join(IMAGE_DIR, f"{sample_patient.patientId}.dcm")

# Read the DICOM file
dicom_data = pydicom.dcmread(image_path)
image_array = dicom_data.pixel_array

# Get the bounding box coordinates from the DataFrame
box = sample_patient[['x', 'y', 'width', 'height']].values

# Plotting
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
ax.imshow(image_array, cmap='gray')
ax.set_title(f"Patient ID: {sample_patient.patientId}")

# Create a Rectangle patch and add it to the plot
rect = patches.Rectangle(
    (box[0], box[1]), box[2], box[3],
    linewidth=2, edgecolor='r', facecolor='none'
)
ax.add_patch(rect)
ax.axis('off')
plt.show()

#### Analyzing Bounding Box Characteristics
Now that we've seen a single case, let's analyze the properties of *all* the bounding boxes in the dataset. Understanding the distribution of their sizes and shapes can reveal important characteristics of our data and potential challenges for our model.

-   **Area Distribution:** Are most pneumonia findings large and obvious, or small and subtle? If the dataset is dominated by very small boxes, our model might struggle to detect them.
-   **Aspect Ratio (Width / Height):** Are the findings generally square-shaped, or are they often very tall and thin, or short and wide? Extreme aspect ratios can sometimes be challenging for standard object detectors.

Let's plot these distributions to find out.

### Q1: What can we learn from the bounding box distributions?
**Your Task**: Look at the two histograms below.
1. The first shows the distribution of bounding box **areas**.
2. The second shows the distribution of **aspect ratios** (width / height).

What do these plots tell you about the pneumonia findings in this dataset? How might this information influence your model design or training strategy?

In [None]:
# Analyze Bounding Box Sizes and Aspect Ratios
positive_df = df[df['Target'] == 1].copy()
positive_df['area'] = positive_df['width'] * positive_df['height']
positive_df['aspect_ratio'] = positive_df['width'] / positive_df['height']

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot Area Distribution
axes[0].hist(positive_df['area'], bins=50, color='skyblue', edgecolor='black')
axes[0].set_title('Distribution of Bounding Box Areas')
axes[0].set_xlabel('Area (pixels^2)')
axes[0].set_ylabel('Frequency')

# Plot Aspect Ratio Distribution
axes[1].hist(positive_df['aspect_ratio'], bins=50, color='salmon', edgecolor='black')
axes[1].set_title('Distribution of Bounding Box Aspect Ratios (W/H)')
axes[1].set_xlabel('Aspect Ratio')
axes[1].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

<details>
<summary>Click for Solution & Discussion</summary>

**Area Distribution**: The area histogram is right-skewed: most of the mass sits at smaller areas, with a long tail toward large boxes. So while box sizes vary widely, the majority of pneumonia annotations are relatively **small**. This confirms our earlier suspicion that detecting small objects will be a key challenge for our model.

**Aspect Ratio Distribution**: This histogram is centered around 1.0, indicating that most bounding boxes are roughly square-shaped. However, there is a noticeable tail to the right, showing that some findings are significantly wider than they are tall. There are very few tall and thin boxes.

**Influence on Strategy**: This information is very useful. The prevalence of small objects reinforces the value of multi-scale detection, such as the Feature Pyramid Network (FPN) in the Faster R-CNN variant we use. The aspect ratio distribution can guide the design of anchor boxes in anchor-based detectors such as Faster R-CNN; anchor-free detectors are less sensitive to this. It also suggests that augmentations producing more varied aspect ratios might be beneficial.
</details>
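
Histograms can be hard to read precisely, so it can help to back them up with a few numbers. The sketch below (a hypothetical helper, `summarize_boxes`) computes the median area, the median W/H ratio, and the fraction of boxes that are wider than tall or taller than wide. It uses only the standard library and accepts any iterables, such as pandas columns.

```python
from statistics import median


def summarize_boxes(widths, heights):
    """Hypothetical helper: numeric summary of box areas and W/H aspect ratios."""
    widths, heights = list(widths), list(heights)
    areas = [w * h for w, h in zip(widths, heights)]
    ratios = [w / h for w, h in zip(widths, heights)]
    n = len(ratios)
    return {
        "median_area": median(areas),
        "median_aspect_ratio": median(ratios),
        # Share of boxes that are wider than tall (W/H > 1) or taller than wide (W/H < 1)
        "frac_wider_than_tall": sum(r > 1 for r in ratios) / n,
        "frac_taller_than_wide": sum(r < 1 for r in ratios) / n,
    }


# Toy input for illustration only
stats = summarize_boxes([100, 200, 300], [200, 200, 100])
```

On the real data, `summarize_boxes(positive_df['width'], positive_df['height'])` lets you check your reading of the two histograms directly.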



> 10 minutes



### 2.2. The `PneumoniaDataset` Class

To feed our data into a PyTorch model, we need a custom `Dataset` class. This class acts as a blueprint, telling PyTorch how to access and process each individual sample.

> ### Deep Dive: The Target Dictionary for Object Detection
>
> Unlike classification (which needs a single label) or segmentation (which needs a mask), object detection models in PyTorch require a specific dictionary format for each image's ground truth. This `target` dictionary must contain several key pieces of information:
>
> -   `boxes`: A tensor of shape `[N, 4]`, where `N` is the number of objects in the image. Each of the 4 values represents a bounding box in **[x_min, y_min, x_max, y_max]** format (also known as PASCAL_VOC format).
> -   `labels`: A tensor of shape `[N]` containing the integer class label for each box. In our case, `1` will represent "pneumonia".
> -   `image_id`: A unique integer identifier for the image, used during evaluation.
> -   `area`: The area of each bounding box.
> -   `iscrowd`: A tensor of 0/1 flags marking boxes that cover a "crowd" of objects; crowd boxes are ignored during evaluation (we'll set this to `0` for all boxes).
>
> Our `__getitem__` method is carefully constructed to load an image and build this precise target dictionary for it.
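
The relationship between the raw CSV columns and these fields can be sketched in plain Python, with lists standing in for tensors. The box values below are hypothetical, and `xywh_to_xyxy` and `box_areas` are illustrative names, not torchvision functions.

```python
def xywh_to_xyxy(boxes):
    """Convert [x, y, width, height] boxes to [x_min, y_min, x_max, y_max]."""
    return [[x, y, x + w, y + h] for x, y, w, h in boxes]


def box_areas(boxes_xyxy):
    """Area of each [x_min, y_min, x_max, y_max] box."""
    return [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes_xyxy]


# Hypothetical annotations for one image with two opacities, as (x, y, width, height)
raw_boxes = [[264.0, 152.0, 213.0, 379.0], [562.0, 152.0, 256.0, 453.0]]
boxes = xywh_to_xyxy(raw_boxes)

# Plain-Python stand-in for the target dictionary (the dataset uses tensors instead)
target = {
    "boxes": boxes,
    "labels": [1] * len(boxes),   # 1 = pneumonia
    "image_id": 0,                # hypothetical integer id
    "area": box_areas(boxes),
    "iscrowd": [0] * len(boxes),  # no crowd annotations
}
```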

### Q2: Complete the Dataset's `__getitem__` Method

**Why only positive cases?** For this detection task, we focus only on the images that contain pneumonia (`Target == 1`), as they are the ones with bounding boxes to learn from.

> **TODO**: The `__getitem__` method is the core of our data pipeline. It's responsible for loading a single image and formatting its corresponding labels correctly.
>
> Fill in the `...` placeholders in the code below to complete the following steps:
> 1.  **Get Bounding Boxes:** Extract the `x`, `y`, `width`, and `height` values for the current `image_id` from the dataframe.
> 2.  **Convert Box Format:** The raw data is in `[x, y, width, height]` format. Convert it to the `[x_min, y_min, x_max, y_max]` format that PyTorch models expect.
> 3.  **Construct Target Dictionary:** Create the `target` dictionary with all the required keys (`boxes`, `labels`, `image_id`, etc.), making sure they are the correct data types (PyTorch tensors).

In [None]:
class PneumoniaDataset(Dataset):
    def __init__(self, dataframe, image_dir, transforms=None):
        super().__init__()
        self.df = dataframe[dataframe['Target'] == 1].copy()
        self.image_ids = self.df['patientId'].unique()
        self.image_dir = image_dir
        self.transforms = transforms

    def __getitem__(self, index: int):
        # Image Loading and Preprocessing
        image_id = self.image_ids[index]
        image_path = os.path.join(self.image_dir, f"{image_id}.dcm")

        dicom_data = pydicom.dcmread(image_path)
        image = dicom_data.pixel_array

        # Convert to a 3-channel image for compatibility with torchvision models
        image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)

        # Q2: FILL IN THE ...

        # Get all annotation records for the current image_id
        records = ...
        boxes = ...

        # Convert boxes from [x, y, w, h] to [x_min, y_min, x_max, y_max]
        boxes[:, 2] = ... # x_max
        boxes[:, 3] = ... # y_max

        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

        # Create the target dictionary
        target = {
            'boxes': ..., # Keep as numpy array for albumentations initially
            'labels': ..., # Label 1 for pneumonia
            'image_id': ...,
            'area': ...,
            'iscrowd': ...,
        }

        # Data Augmentation
        if self.transforms:
            # Convert labels tensor to numpy array before passing to albumentations
            labels_np = target['labels'].numpy()

            # Pass numpy array for boxes and labels to albumentations
            sample = self.transforms(image=image, bboxes=target['boxes'], labels=labels_np)
            image = sample['image']

            # Handle the case where augmentations remove all boxes
            if len(sample['bboxes']) > 0:
                target['boxes'] = torch.as_tensor(sample['bboxes'], dtype=torch.float32)
                # Convert labels back to tensor
                target['labels'] = torch.as_tensor(sample['labels'], dtype=torch.int64)
                target['area'] = (target['boxes'][:, 3] - target['boxes'][:, 1]) * (target['boxes'][:, 2] - target['boxes'][:, 0])
                target['iscrowd'] = torch.zeros(len(target['boxes']), dtype=torch.int64)
            else:
                # If all boxes are removed, return empty tensors
                target['boxes'] = torch.zeros((0, 4), dtype=torch.float32)
                target['labels'] = torch.zeros((0,), dtype=torch.int64)
                target['area'] = torch.zeros((0,), dtype=torch.float32)
                target['iscrowd'] = torch.zeros((0,), dtype=torch.int64)


            # Wrap boxes as a torchvision BoundingBoxes tensor; canvas_size is the (H, W) of the
            # transformed CHW image tensor. An empty (0, 4) tensor is wrapped the same way.
            target['boxes'] = tv_tensors.BoundingBoxes(target['boxes'], format="XYXY", canvas_size=tuple(image.shape[-2:]))


        return image, target

    def __len__(self) -> int:
        return len(self.image_ids)

### 2.3. Data Augmentation

Data augmentation is a critical technique for improving model performance, especially in medical imaging where datasets can be limited. By applying random transformations to our training images, we create new, plausible training examples. This helps the model become more robust and prevents it from overfitting.

We will use the `albumentations` library, which is highly efficient and designed for tasks like object detection where the bounding boxes must be transformed along with the image.
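
To see why boxes must be transformed together with the image, consider two geometric operations on an `[x_min, y_min, x_max, y_max]` box: a horizontal flip mirrors the x-coordinates (and swaps which one is the minimum), and a resize scales each axis independently. The helpers below are a plain-Python sketch of that coordinate arithmetic, not albumentations' internal implementation.

```python
def hflip_box(box, image_width):
    """Mirror a pascal_voc box left-right within an image of the given width."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge and vice versa
    return [image_width - x_max, y_min, image_width - x_min, y_max]


def resize_box(box, orig_hw, new_hw):
    """Scale a pascal_voc box when the image is resized from orig_hw to new_hw (height, width)."""
    (orig_h, orig_w), (new_h, new_w) = orig_hw, new_hw
    sx, sy = new_w / orig_w, new_h / orig_h
    x_min, y_min, x_max, y_max = box
    return [x_min * sx, y_min * sy, x_max * sx, y_max * sy]


# A box in a 1024x1024 image, flipped and then resized to 256x256
flipped = hflip_box([100, 200, 300, 400], image_width=1024)
resized = resize_box(flipped, (1024, 1024), (256, 256))
```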

In [None]:
# Define the augmentation pipeline
def get_transforms(is_train=True, target_size=256):
    if is_train:
        return A.Compose([
            A.HorizontalFlip(p=0.5),
            A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=10, p=0.5),
            A.RandomBrightnessContrast(p=0.5),
            # Resize must be after geometric transforms that change coordinates
            A.Resize(height=target_size, width=target_size, p=1.0),
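            # Note: torchvision detection models also normalize inputs internally
            # (GeneralizedRCNNTransform), so this is applied on top of the model's own normalization.
            A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),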
            ToTensorV2(p=1.0)
        ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels'], min_area=1, min_visibility=0.1))
    else: # For validation and testing, we only resize and normalize
        return A.Compose([
            A.Resize(height=target_size, width=target_size, p=1.0),
            A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ToTensorV2(p=1.0)
        ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

print("Augmentation pipelines defined.")

### 2.4. Data Splitting and `DataLoader` Creation

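We'll now split our list of patient IDs into training, validation, and test sets. It is crucial to split by patient so that no patient appears in more than one split; otherwise the model could be evaluated on patients it has already seen (data leakage). We'll then create our `Dataset` and `DataLoader` instances for each split.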
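
A cheap safeguard after splitting is to assert that the ID sets are pairwise disjoint. The helper `check_disjoint_splits` below is a hypothetical sketch; in this notebook you could call it with `train_ids`, `val_ids`, and `test_ids`.

```python
from itertools import combinations


def check_disjoint_splits(**splits):
    """Raise ValueError if any patient ID appears in more than one named split."""
    id_sets = {name: set(ids) for name, ids in splits.items()}
    # Compare every pair of splits once
    for (name_a, a), (name_b, b) in combinations(id_sets.items(), 2):
        shared = a & b
        if shared:
            raise ValueError(f"{len(shared)} patient(s) shared by {name_a} and {name_b}")
    return True


# Toy IDs for illustration only
ok = check_disjoint_splits(train=["p1", "p2"], val=["p3"], test=["p4"])
```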

> ### Deep Dive: The `collate_fn`
>
> When creating a batch of data, the `DataLoader` usually stacks the individual tensors together. However, this doesn't work for object detection targets, because each image can have a *different number* of bounding boxes. You can't stack a tensor of 2 boxes with a tensor of 3 boxes.
>
> The `collate_fn` (collation function) is a function we pass to the `DataLoader` to tell it how to assemble a batch. Instead of trying to stack the target dictionaries, `utils.collate_fn` from the torchvision reference scripts we downloaded simply regroups the batch with `tuple(zip(*batch))`. The result is a tuple of image tensors and a tuple of target dictionaries, one per image.
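
The regrouping itself is a one-liner. The sketch below mirrors `utils.collate_fn` from the torchvision reference scripts, with strings standing in for image tensors so that it runs without PyTorch.

```python
def collate_fn(batch):
    """Turn [(img0, tgt0), (img1, tgt1), ...] into ((img0, img1, ...), (tgt0, tgt1, ...))."""
    return tuple(zip(*batch))


# Two samples with a different number of boxes: stacking would fail, regrouping does not
batch = [
    ("image_0", {"boxes": [[10, 10, 50, 60]], "labels": [1]}),
    ("image_1", {"boxes": [[0, 0, 20, 20], [30, 30, 80, 90]], "labels": [1, 1]}),
]
images, targets = collate_fn(batch)
```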

### Q3: Why is a `collate_fn` necessary for this task?
**Your Task**: In the `DataLoader` definitions below, we pass `collate_fn=utils.collate_fn`. Based on the Deep Dive above and your understanding of the data, explain in one or two sentences why this is necessary. What would happen if we removed it?



> 25 minutes



In [None]:
from sklearn.model_selection import train_test_split

TARGET_SIZE = 256  #@param {type: "number"}
BATCH_SIZE = 2  #@param {type: "number"}

# Patient-wise Data Splitting
# We focus on the positive pneumonia cases for this detection task
positive_patient_ids = df[df['Target'] == 1]['patientId'].unique()

# Split positive cases: 70% train, 15% validation, 15% test
train_ids, test_ids = train_test_split(positive_patient_ids, test_size=0.3, random_state=RANDOM_SEED)
val_ids, test_ids = train_test_split(test_ids, test_size=0.5, random_state=RANDOM_SEED)

# Create DataFrames for each split
train_df = df[df['patientId'].isin(train_ids)]
val_df = df[df['patientId'].isin(val_ids)]
test_df = df[df['patientId'].isin(test_ids)]

print(f"--- Data Splits (Positive Cases Only) ---")
print(f"Training samples: {len(train_ids)}")
print(f"Validation samples: {len(val_ids)}")
print(f"Test samples: {len(test_ids)}")

# Create Datasets
train_dataset = PneumoniaDataset(train_df, IMAGE_DIR, transforms=get_transforms(is_train=True, target_size=TARGET_SIZE))
val_dataset = PneumoniaDataset(val_df, IMAGE_DIR, transforms=get_transforms(is_train=False, target_size=TARGET_SIZE))
test_dataset = PneumoniaDataset(test_df, IMAGE_DIR, transforms=get_transforms(is_train=False, target_size=TARGET_SIZE))

# Create DataLoaders
train_loader = DataLoader(
    train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=utils.collate_fn
)
val_loader = DataLoader(
    val_dataset, batch_size=BATCH_SIZE, shuffle=False, collate_fn=utils.collate_fn
)
test_loader = DataLoader(
    test_dataset, batch_size=BATCH_SIZE, shuffle=False, collate_fn=utils.collate_fn
)

print("\nDatasets and DataLoaders created successfully.")

<details>
<summary>Discussion</summary>

The `collate_fn` is necessary because different images in our dataset have different numbers of pneumonia bounding boxes. PyTorch's default collate function tries to stack each field of the target dictionaries into a single tensor, which fails when the `boxes` tensors have different shapes.

If we removed it, the `DataLoader` would raise an error as soon as it tried to create a batch containing images with different numbers of annotations. Even when the counts happen to match, the default collate would return one dictionary of stacked tensors, whereas torchvision detection models expect a list of per-image target dictionaries.
</details>

Let's visualize a batch from our `train_loader` to confirm that the augmentations are being applied correctly.

In [None]:
# Visualization Helper
def show_detection_batch(dataloader, n=4):
    images, targets = next(iter(dataloader))
    k = min(n, len(images))

    for i in range(k):
        # Un-normalize image for display
        img = images[i].permute(1, 2, 0).numpy()
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        img = std * img + mean
        img = np.clip(img, 0, 1)

        # Plotting
        fig, ax = plt.subplots(1, 1, figsize=(6, 6))
        ax.imshow(img)
        ax.set_title(f"Sample {i+1}, Boxes: {len(targets[i]['boxes'])}")

        for box in targets[i]['boxes']:
            x1, y1, x2, y2 = box.numpy()
            rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, edgecolor='g', facecolor='none')
            ax.add_patch(rect)
        ax.axis('off')
        plt.show()

print("--- Sample Augmented Training Batch ---")
show_detection_batch(train_loader, n=4)



> 25 minutes



## 3. The Model: Faster R-CNN

For our detection task, we will use **Faster R-CNN (Region-based Convolutional Neural Network)**, a powerful and widely-used two-stage object detection model.

> ### Deep Dive: How Faster R-CNN Works
>
> Faster R-CNN breaks down the complex task of object detection into two manageable stages:
>
> 1.  **Region Proposal Network (RPN)**: Instead of scanning every possible location in an image, the RPN efficiently proposes a set of rectangular regions that are likely to contain an object. It acts as a quick "attention" mechanism, telling the model where to look more closely.
>
> 2.  **Detection Network (Fast R-CNN Head)**: For each proposed region, this network performs two tasks:
>     -   **Classification**: It determines the class of the object within the region (e.g., "pneumonia" or "background").
>     -   **Regression**: It refines the coordinates of the bounding box to make it fit the object more tightly.
>
> We use a model from `torchvision` that has been **pre-trained** on the [COCO dataset](https://cocodataset.org/#home). This means the model's backbone (a ResNet-50 with a Feature Pyramid Network) has already learned rich visual features. We only need to replace the final box predictor (the layers that output class scores and box refinements) with a new one tailored to our specific classes (background and pneumonia). This technique, called **transfer learning**, typically speeds up training substantially and improves performance, especially when labeled data is limited.
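
The RPN's proposals start from a fixed set of reference boxes called anchors, placed at every feature-map location with several sizes and aspect ratios. The sketch below computes anchor widths and heights the way torchvision's `AnchorGenerator` does, which defines the aspect ratio as height / width (the reciprocal of the W/H ratio plotted earlier). The sizes and ratios in the example are assumed to match torchvision's default anchors for FPN models (one size per pyramid level from 32 to 512 pixels, ratios 0.5, 1.0 and 2.0); treat them as illustrative.

```python
import math


def anchor_shapes(sizes, aspect_ratios):
    """(width, height) of each anchor; aspect ratio is height / width, area is size**2."""
    shapes = []
    for size in sizes:
        for ratio in aspect_ratios:
            h_ratio = math.sqrt(ratio)  # height scales with sqrt(ratio)
            w_ratio = 1.0 / h_ratio     # width scales inversely, so the area stays size**2
            shapes.append((size * w_ratio, size * h_ratio))
    return shapes


# Assumed defaults: one size per FPN level, three aspect ratios per size
shapes = anchor_shapes(sizes=[32, 64, 128, 256, 512], aspect_ratios=[0.5, 1.0, 2.0])
```

Comparing these shapes with the box statistics from Section 2.1 is one way to judge whether the default anchors suit the data.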

### Q4: Build the Transfer Learning Model
**TODO**: We need to adapt the pre-trained Faster R-CNN model for our specific task. This involves replacing the final classification layer (the "box predictor") with a new one that matches our number of classes (2: pneumonia + background).
Fill in the `...` placeholders below to:
 1. Get the number of input features for the model's box predictor.
 2. Create a new `FastRCNNPredictor` with the correct number of `in_features` and `num_classes`.
 3. Replace the model's original `box_predictor` with your new one.

In [None]:
from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2

def get_detection_model(num_classes=2):
    """
    Creates a Faster R-CNN model with a pre-trained ResNet-50 backbone.
    """
    # Load a model pre-trained on COCO
    model = fasterrcnn_resnet50_fpn_v2(weights='DEFAULT')

    # --- Q4: FILL IN THE ... ---

    # 1. Get the number of input features for the classifier from the model's RoI heads
    in_features = ...

    # 2. & 3. Create a new FastRCNNPredictor and replace the model's box_predictor
    model.roi_heads.box_predictor = ...

    return model

# Initialize the model and move it to the correct device
model = get_detection_model(num_classes=2)
model.to(device)

print("Faster R-CNN model created and moved to device.")

## 4. Training the Detector

We are now ready to train our model. This section defines the training loop, sets up the optimizer and learning rate scheduler, and executes the training process.

### 4.1. Training Setup

With our model defined, we now need to configure the components that will drive the learning process. Think of this like preparing a car for a race: you need an engine, an accelerator, and a strategy for managing your speed.

-   **The Optimizer (The Engine):** This is the core algorithm that updates the model's internal parameters (weights) to minimize the loss function. We will use **Stochastic Gradient Descent (SGD) with momentum**, a classic and powerful choice. It not only looks at the current error to decide which direction to go but also considers the direction it was recently moving in (momentum), which helps it navigate the complex loss landscape more smoothly.

-   **The Learning Rate (The Accelerator Pedal):** This is a critical hyperparameter that controls how *large* a step the optimizer takes when updating the model's weights. A learning rate that is too high can cause the model to overshoot the optimal solution, while one that is too low can make training very slow or leave it stuck.

-   **The Learning Rate Scheduler (The Gearbox/Cruise Control):** We rarely use a fixed learning rate throughout training. A **learning rate scheduler** dynamically adjusts the learning rate as training progresses. We will use a `StepLR`, which multiplies the learning rate by a factor `gamma` every `step_size` epochs. This strategy allows the model to make large progress early on and then take smaller, more careful steps to fine-tune its performance as it gets closer to a good solution.
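
Both ideas fit in a few lines of arithmetic. The sketch below follows the update rule documented for `torch.optim.SGD` with momentum and weight decay (no dampening, no Nesterov), together with the schedule `StepLR` produces: the base learning rate multiplied by `gamma` once every `step_size` epochs. It works on a single scalar parameter purely for illustration; the function names are hypothetical.

```python
def sgd_momentum_step(param, grad, velocity, lr, momentum, weight_decay):
    """One SGD step with momentum and L2 weight decay on a scalar parameter."""
    d_p = grad + weight_decay * param         # weight decay adds a pull toward zero
    velocity = momentum * velocity + d_p      # running direction of recent updates
    return param - lr * velocity, velocity


def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate StepLR would give at a (0-based) epoch."""
    return base_lr * gamma ** (epoch // step_size)


# Two steps with a constant gradient: momentum makes the second step larger
p, v = 1.0, 0.0
p, v = sgd_momentum_step(p, grad=0.5, velocity=v, lr=0.1, momentum=0.9, weight_decay=0.0)
p, v = sgd_momentum_step(p, grad=0.5, velocity=v, lr=0.1, momentum=0.9, weight_decay=0.0)

# The notebook's settings: LR 0.005, decreased 10x every 3 epochs
schedule = [step_lr(0.005, epoch, step_size=3, gamma=0.1) for epoch in range(5)]
```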

### Q5: Configure the Optimizer and Scheduler

**Your Task**: A crucial step in training is setting up the optimizer and learning rate scheduler. Fill in the `...` placeholders in the code below to:
1.  Define the `params` to be optimized (hint: these are the model parameters that require gradients).
2.  Create an `SGD` optimizer with the specified learning rate, momentum, and weight decay.
3.  Create a `StepLR` scheduler that will decrease the learning rate by a factor of `gamma` every `step_size` epochs.

In [None]:
# Training Hyperparameters
NUM_EPOCHS = 5      # For a real run, 10-20 epochs would be better
LEARNING_RATE = 0.005
MOMENTUM = 0.9
WEIGHT_DECAY = 0.0005

# Q5: FILL IN THE ...
# Gather the parameters that need to be trained
params = ...
optimizer = torch.optim.SGD(...)

# A learning rate scheduler that decreases the LR by 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(...)

print("Optimizer and Learning Rate Scheduler configured.")

### 4.2. The Training Loop

The training loop iterates through our dataset for a specified number of epochs. In each epoch, it processes the entire training set, updates the model's weights, steps the learning rate scheduler, and saves a checkpoint of the model's weights.

For brevity and to use a standardized, well-tested implementation, we use the `train_one_epoch` function from the official `torchvision` reference scripts (`engine.py`), which we downloaded earlier. The same file also provides an `evaluate` function for COCO-style evaluation.

In [None]:
from engine import train_one_epoch

# Main Training Loop
# This will take a significant amount of time to run.
# On a standard Colab GPU, expect ~15 minutes per epoch.
os.makedirs('./Detection', exist_ok=True)  # checkpoint folder (missing if the download was skipped)
start_time = time.time()

print("--- Starting Training ---")
for epoch in range(NUM_EPOCHS):
    print(f"\n--- Epoch {epoch+1}/{NUM_EPOCHS} ---")

    # Train for one epoch, printing every 100 iterations
    metric_logger = train_one_epoch(model, optimizer, train_loader, device, epoch, print_freq=100)

    # Update the learning rate
    lr_scheduler.step()

    # Save a checkpoint after every epoch (overwrites the previous one)
    torch.save(model.state_dict(), './Detection/detection_model.pth')

end_time = time.time()
total_training_time = (end_time - start_time) / 60
print(f"\n--- Training Finished in {total_training_time:.2f} minutes ---")



> Training takes roughly 1 hour on a Colab T4 GPU.



## 5. Evaluation

We have trained our model and saved its weights after each epoch. Now comes the moment of truth: the **final exam**.

Up to this point, the **test set** has been kept completely locked away. The model has never seen a single image from it. This is a critical principle of good machine learning practice. The performance on this held-out set provides the most honest and unbiased measure of how our model will perform on new, unseen patients in a real-world scenario. It is the final verdict on our model's capabilities.

### (Optional) Download Pre-Trained Model

Training can take a long time. If you are short on time or running into issues, you can skip the training step and download a pre-trained version of the model weights by running the cell below. You can then proceed directly to the evaluation sections.

In [None]:
!hf download albarqouni/bild-dataset --repo-type dataset --include "Detection/detection_model.pth" --local-dir ./
print("Pre-trained model downloaded.")

### 5.1. Analyzing Performance Metrics

> ### Deep Dive: Object Detection Evaluation Metrics
>
> Evaluating an object detector is more complex than simple classification. The primary metrics are based on the concept of **Intersection over Union (IoU)**.
>
> -   **Intersection over Union (IoU)**: This measures the overlap between a predicted bounding box and a ground truth bounding box. It's calculated as `(Area of Overlap) / (Area of Union)`. An IoU of 1 means a perfect match, while an IoU of 0 means no overlap.


<img src="https://raw.githubusercontent.com/albarqounilab/BILD-Summer-School/refs/heads/main/images/iou.png" width="60%">


In [None]:
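def compute_iou(box1, box2):
    """IoU of two boxes given as [x_min, y_min, x_max, y_max]."""
    # Corners of the intersection rectangle
    xA = max(box1[0], box2[0])
    yA = max(box1[1], box2[1])
    xB = min(box1[2], box2[2])
    yB = min(box1[3], box2[3])

    # Clamp at 0 so that non-overlapping boxes get zero intersection
    interArea = max(0, xB - xA) * max(0, yB - yA)
    box1Area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2Area = (box2[2] - box2[0]) * (box2[3] - box2[1])

    # Guard against division by zero for degenerate (zero-area) boxes
    union = float(box1Area + box2Area - interArea)
    return interArea / union if union > 0 else 0.0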
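As a quick sanity check of the formula, take two 10x10 boxes offset by 5 pixels in each direction: the overlap is a 5x5 square (area 25) and the union is 100 + 100 - 25 = 175, so IoU = 25/175 ≈ 0.143. The standalone sketch below (a hypothetical helper, `iou_xyxy`, with boxes in `[x_min, y_min, x_max, y_max]` format) reproduces this and two edge cases, and returns 0 when the union has zero area.

```python
def iou_xyxy(box1, box2):
    """IoU of two boxes given as [x_min, y_min, x_max, y_max]."""
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = inter_w * inter_h
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0

print(iou_xyxy([0, 0, 10, 10], [5, 5, 15, 15]))    # 25 / 175, about 0.1429
print(iou_xyxy([0, 0, 10, 10], [0, 0, 10, 10]))    # identical boxes: 1.0
print(iou_xyxy([0, 0, 10, 10], [20, 20, 30, 30]))  # disjoint boxes: 0.0
```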

>
> We use IoU to classify each prediction as a:
> -   **True Positive (TP)**: The model correctly detects an object (IoU > threshold, e.g., 0.5).
> -   **False Positive (FP)**: The model predicts an object where there is none, or the IoU is below the threshold.
> -   **False Negative (FN)**: The model fails to detect a ground truth object.
>
> From these, we calculate:
> -   **Precision**: `TP / (TP + FP)` - Of all the predictions made, how many were correct?
> -   **Recall**: `TP / (TP + FN)` - Of all the actual objects, how many did the model find?
>
> -   **Average Precision (AP)**: The standard single-number summary. It is the area under the Precision-Recall curve traced out by sweeping the confidence-score threshold. **mAP (mean Average Precision)** is the AP averaged over all classes (we only have one class, so AP and mAP are the same). COCO-style evaluation, as used by the official `torchvision` reference scripts, additionally averages AP over IoU thresholds from 0.50 to 0.95. Note that the simplified `evaluate` function used in this notebook does not compute AP; it reports mean precision, recall, and IoU at fixed IoU and confidence thresholds.
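To make the AP definition concrete, the sketch below computes AP for a single class from detections that have already been matched to ground truth (each flagged TP or FP). It sorts by confidence, accumulates precision and recall, replaces precision by its running maximum from the right (the "precision envelope"), and sums the area wherever recall increases. This is the all-point interpolation used by PASCAL VOC; COCO instead samples 101 recall points and averages over several IoU thresholds, so numbers from this sketch will not match COCO mAP exactly. The function name and inputs are illustrative.

```python
def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class.

    scores: confidence of each detection
    is_tp:  True if that detection was matched to a ground-truth box
    num_gt: total number of ground-truth boxes (TP + FN)
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: make precision non-increasing as recall grows
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas; FP steps add nothing because recall does not change
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Four detections, three ground-truth boxes (one is never found)
print(average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], num_gt=3))  # 5/9, about 0.556
```

Because one ground-truth box is never detected, recall tops out at 2/3 and the curve contributes no area beyond that point, which is how missed findings lower AP.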




<img src="https://raw.githubusercontent.com/albarqounilab/BILD-Summer-School/refs/heads/main/images/avg_precision.png" width="60%">

### Q6: Run Evaluation and Visualize Predictions
**Your Task**: Now, load the model you saved during training (or the pre-trained one you downloaded) and run the `evaluate` function on the `test_loader`.

After the evaluation, run the visualization code below. For each image, the **green boxes are the ground truth** and the **red boxes are the model's predictions**. Analyze the output:
-   What are the final precision, recall, and mean IoU on the test set?
-   Can you find examples of **True Positives**, **False Negatives**, and **False Positives**?

In [None]:
def evaluate(model, dataloader, device, iou_threshold=0.4, conf_threshold=0.7):
    """Simplified detection evaluation (not COCO mAP).

    For each image, predictions with score > conf_threshold are greedily matched
    one-to-one to ground-truth boxes with IoU >= iou_threshold. Returns per-image
    precision and recall averaged over images, plus the mean IoU of matched pairs.
    """
    model.eval()
    all_precisions = []
    all_recalls = []
    all_ious = []

    with torch.no_grad():
        for images, targets in dataloader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

            outputs = model(images)

            for pred, target in zip(outputs, targets):
                gt_boxes = target['boxes'].cpu().numpy()
                pred_boxes = pred['boxes'].cpu().numpy()
                pred_scores = pred['scores'].cpu().numpy()

                # Keep only confident predictions
                pred_boxes = pred_boxes[pred_scores > conf_threshold]

                # Greedy matching: each GT box takes its best still-unmatched prediction
                matched = set()
                ious = []
                for gt_box in gt_boxes:
                    best_iou, best_idx = 0, None
                    for i, pred_box in enumerate(pred_boxes):
                        if i in matched:
                            continue
                        iou = compute_iou(gt_box, pred_box)
                        if iou > best_iou:
                            best_iou = iou
                            best_idx = i
                    if best_idx is not None and best_iou >= iou_threshold:
                        matched.add(best_idx)
                        ious.append(best_iou)

                TP = len(matched)
                FP = len(pred_boxes) - TP   # confident predictions left unmatched
                FN = len(gt_boxes) - TP     # ground-truth boxes that were missed

                precision = TP / (TP + FP) if (TP + FP) > 0 else 0
                recall = TP / (TP + FN) if (TP + FN) > 0 else 0

                all_precisions.append(precision)
                all_recalls.append(recall)
                all_ious.extend(ious)

    return {
        "precision": np.mean(all_precisions),
        "recall": np.mean(all_recalls),
        "mean_iou": np.mean(all_ious) if all_ious else 0
    }


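The `evaluate` function above averages precision and recall per image (a macro average), so every image counts equally regardless of how many boxes it contains, and an image with no confident predictions contributes a precision of 0. An alternative is to pool TP, FP, and FN over the whole test set first (a micro average). The toy example below uses made-up counts for two images to show that the two conventions can differ substantially; neither is wrong, but results should state which one is used. The helper name is hypothetical.

```python
def per_image_and_pooled(counts):
    """counts: list of (TP, FP, FN) tuples, one per image."""
    def safe_div(a, b):
        return a / b if b > 0 else 0.0

    # Macro average: compute the metric for each image, then average
    macro_p = sum(safe_div(tp, tp + fp) for tp, fp, fn in counts) / len(counts)
    macro_r = sum(safe_div(tp, tp + fn) for tp, fp, fn in counts) / len(counts)

    # Micro average: pool the counts first, then compute the metric once
    TP = sum(c[0] for c in counts)
    FP = sum(c[1] for c in counts)
    FN = sum(c[2] for c in counts)
    return (macro_p, macro_r), (safe_div(TP, TP + FP), safe_div(TP, TP + FN))

# Hypothetical counts: image A is clean, image B has three false alarms and one miss
macro, micro = per_image_and_pooled([(1, 0, 0), (1, 3, 1)])
print("per-image (macro) precision/recall:", macro)  # (0.625, 0.75)
print("pooled (micro) precision/recall:   ", micro)  # (0.4, 0.666...)
```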
In [None]:
# Load the saved model
# Create a new model instance and load the saved weights
# (Alternatively, reuse the in-memory model: best_model = model)
best_model = get_detection_model(num_classes=2)
best_model.load_state_dict(torch.load('./Detection/detection_model.pth', map_location=device))
best_model.to(device)

print("--- Evaluating on Test Set ---")
# Run the evaluation and report the results
test_evaluator = evaluate(best_model, test_loader, device=device)
for name, value in test_evaluator.items():
    print(f"{name}: {value:.4f}")

### Q7: Improved visualization

**Your Task**: Now we will add a text tag to the ground-truth and predicted boxes via `axes[i].text()`. Fill in the label text, color, font size, and alpha values.

In [None]:
import math

def visualize_predictions_with_labels(model, dataloader, num_samples=8, score_threshold=0.5):
    model.eval()

    images, targets = next(iter(dataloader))
    images = [img.to(device) for img in images]

    with torch.no_grad():
        predictions = model(images)

    n = min(num_samples, len(images))
    print(f"\n--- Visualizing {n} Test Predictions with Labels ---")
    print(f"Green = Ground Truth | Red = Prediction (score > {score_threshold})")

    # Auto grid: ceil(sqrt(n)) rows x ceil(n / rows) columns
    rows = math.ceil(math.sqrt(n))
    cols = math.ceil(n / rows)

    fig, axes = plt.subplots(rows, cols, figsize=(5*cols, 5*rows))
    axes = np.array(axes).reshape(-1)  # flatten

    for i in range(n):
        img_cpu = images[i].cpu().permute(1, 2, 0).numpy()
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        img_cpu = std * img_cpu + mean
        img_cpu = np.clip(img_cpu, 0, 1)

        axes[i].imshow(img_cpu)
        axes[i].axis('off')

        # GT boxes (green)
        for box in targets[i]['boxes']:
            x1, y1, x2, y2 = box.cpu().numpy()
            rect = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=2, edgecolor='g', facecolor='none')
            axes[i].add_patch(rect)
            # Add text label for ground truth
            axes[i].text(x1, y1 - 5, "Ground Truth", color='white', fontsize=12, bbox=dict(facecolor='green', alpha=0.8, edgecolor='none'))

        # Predictions (red)
        pred_boxes = predictions[i]['boxes'][predictions[i]['scores'] > score_threshold]
        pred_scores = predictions[i]['scores'][predictions[i]['scores'] > score_threshold]
        pred_labels = predictions[i]['labels'][predictions[i]['scores'] > score_threshold]

        for box, score, label in zip(pred_boxes, pred_scores, pred_labels):
            x1, y1, x2, y2 = box.cpu().numpy()
            rect = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=2, edgecolor='r', facecolor='none')
            axes[i].add_patch(rect)

            # Q7 FILL OUT
            # Add text label for prediction with score
            label_text = ... # Assuming label 1 is Pneumonia
            axes[i].text(x1, y1 - 10, f"{label_text}: {score:.2f}", color=..., fontsize=..., bbox=dict(facecolor=..., alpha=..., edgecolor='none'))


    # hide unused axes
    for j in range(n, len(axes)):
        axes[j].axis('off')

    plt.tight_layout()
    plt.show()

visualize_predictions_with_labels(best_model, test_loader, num_samples=5, score_threshold=0.5)



> 55 minutes



## Part II: Quality Control

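Re-run the cells below to re-import the libraries and re-create the objects needed for this part (only necessary if your runtime has restarted since Part I).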

In [None]:
#@title re-import libraries
# Library Installations
!pip install pydicom albumentations -q

# Core Library Import
import os
# os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import sys, math, random, time, warnings
from glob import glob
from pathlib import Path

# Data Handling and Processing
import numpy as np
import pandas as pd
import cv2
import pydicom
from PIL import Image

# Deep Learning with PyTorch & Torchvision
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision
from torchvision import tv_tensors
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.utils import draw_bounding_boxes

# Data Augmentation & Visualization
import albumentations as A
from albumentations.pytorch import ToTensorV2
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Utilities
from tqdm.notebook import tqdm
# Download official torchvision helper scripts for training and evaluation
# -nc (no-clobber) skips files that already exist instead of saving duplicates like utils.py.1
os.system("wget -nc https://raw.githubusercontent.com/pytorch/vision/main/references/detection/utils.py")
os.system("wget -nc https://raw.githubusercontent.com/pytorch/vision/main/references/detection/engine.py")
os.system("wget -nc https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_utils.py")
os.system("wget -nc https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_eval.py")
os.system("wget -nc https://raw.githubusercontent.com/pytorch/vision/main/references/detection/transforms.py")
import utils

# Environment Configuration
warnings.filterwarnings("ignore")

# Ensure reproducibility
def seed_everything(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

RANDOM_SEED = 42
seed_everything(RANDOM_SEED)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Setup complete. Using device: {device}")

Setup complete. Using device: cuda


In [None]:
#@title re-load models
# Define paths to our data files
DATA_PATH = './rsna-pneumonia-detection-challenge'
LABELS_PATH = os.path.join(DATA_PATH, 'labels.csv')
IMAGE_DIR = os.path.join(DATA_PATH, 'images')

# Load the labels into a pandas DataFrame
df = pd.read_csv(LABELS_PATH)

# Initial Analysis
# Calculate the number of unique patients with and without pneumonia
positive_cases = df[df['Target'] == 1]['patientId'].nunique()
total_cases = df['patientId'].nunique()
negative_cases = total_cases - positive_cases

print(f"--- Dataset Overview ---")
print(f"Total unique patients: {total_cases}")
print(f"Patients with pneumonia (positive cases): {positive_cases} ({positive_cases/total_cases:.1%})")
print(f"Patients without pneumonia (negative cases): {negative_cases} ({negative_cases/total_cases:.1%})")

# Check how many pneumonia cases have multiple bounding boxes
bbox_counts = df[df['Target'] == 1].groupby('patientId').size()
print(f"Pneumonia cases with >1 bounding box: {(bbox_counts > 1).sum()}")

class PneumoniaDataset(Dataset):
    def __init__(self, dataframe, image_dir, transforms=None):
        super().__init__()
        # Q1 Focus: We only work with positive cases for this detection task
        self.df = dataframe[dataframe['Target'] == 1].copy()
        self.image_ids = self.df['patientId'].unique()
        self.image_dir = image_dir
        self.transforms = transforms

    def __getitem__(self, index: int):
        # Image Loading and Preprocessing
        image_id = self.image_ids[index]
        image_path = os.path.join(self.image_dir, f"{image_id}.dcm")

        dicom_data = pydicom.dcmread(image_path)
        image = dicom_data.pixel_array

        # Convert to a 3-channel image for compatibility with torchvision models
        image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)

        # Target Preparation
        records = self.df[self.df['patientId'] == image_id]
        boxes = records[['x', 'y', 'width', 'height']].values

        # Convert boxes from [x, y, w, h] to [x_min, y_min, x_max, y_max]
        boxes[:, 2] = boxes[:, 0] + boxes[:, 2] # x_max
        boxes[:, 3] = boxes[:, 1] + boxes[:, 3] # y_max

        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])

        # Create the target dictionary
        target = {
            'boxes': boxes, # Keep as numpy array for albumentations initially
            'labels': torch.ones(len(boxes), dtype=torch.int64), # Label 1 for pneumonia
            'image_id': torch.tensor([index]),
            'area': torch.as_tensor(area, dtype=torch.float32),
            'iscrowd': torch.zeros(len(boxes), dtype=torch.int64)
        }

        # Data Augmentation
        if self.transforms:
            # Convert labels tensor to numpy array before passing to albumentations
            labels_np = target['labels'].numpy()

            # Pass numpy array for boxes and labels to albumentations
            sample = self.transforms(image=image, bboxes=target['boxes'], labels=labels_np)
            image = sample['image']

            # Handle the case where augmentations remove all boxes
            if len(sample['bboxes']) > 0:
                target['boxes'] = torch.as_tensor(sample['bboxes'], dtype=torch.float32)
                # Convert labels back to tensor
                target['labels'] = torch.as_tensor(sample['labels'], dtype=torch.int64)
                target['area'] = (target['boxes'][:, 3] - target['boxes'][:, 1]) * (target['boxes'][:, 2] - target['boxes'][:, 0])
                target['iscrowd'] = torch.zeros(len(target['boxes']), dtype=torch.int64)
            else:
                # If all boxes are removed, return empty tensors
                target['boxes'] = torch.zeros((0, 4), dtype=torch.float32)
                target['labels'] = torch.zeros((0,), dtype=torch.int64)
                target['area'] = torch.zeros((0,), dtype=torch.float32)
                target['iscrowd'] = torch.zeros((0,), dtype=torch.int64)


            # Ensure canvas_size is set for the BoundingBoxes tensor
            if len(target['boxes']) > 0:
                target['boxes'] = tv_tensors.BoundingBoxes(target['boxes'], format="XYXY", canvas_size=image.shape[-2:])
            else:
                # Still need a BoundingBoxes tensor even if empty, with correct canvas_size
                target['boxes'] = tv_tensors.BoundingBoxes(torch.zeros((0, 4), dtype=torch.float32), format="XYXY", canvas_size=image.shape[-2:])


        return image, target

    def __len__(self) -> int:
        return len(self.image_ids)

def compute_iou(box1, box2):
    xA = max(box1[0], box2[0])
    yA = max(box1[1], box2[1])
    xB = min(box1[2], box2[2])
    yB = min(box1[3], box2[3])

    interArea = max(0, xB - xA) * max(0, yB - yA)
    box1Area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2Area = (box2[2] - box2[0]) * (box2[3] - box2[1])

    iou = interArea / float(box1Area + box2Area - interArea)
    return iou

def evaluate(model, dataloader, device, iou_threshold=0.4, conf_threshold=0.7):
    model.eval()
    all_precisions = []
    all_recalls = []
    all_ious = []

    with torch.no_grad():
        for images, targets in dataloader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

            outputs = model(images)

            for pred, target in zip(outputs, targets):
                gt_boxes = target['boxes'].cpu().numpy()
                pred_boxes = pred['boxes'].cpu().numpy()
                pred_scores = pred['scores'].cpu().numpy()

                # Filter by confidence
                pred_boxes = pred_boxes[pred_scores > conf_threshold]

                # Compute IoUs
                matched = set()
                ious = []
                for gt_box in gt_boxes:
                    best_iou = 0
                    for i, pred_box in enumerate(pred_boxes):
                        if i in matched:
                            continue
                        iou = compute_iou(gt_box, pred_box)
                        if iou > best_iou:
                            best_iou = iou
                            best_idx = i
                    if best_iou >= iou_threshold:
                        matched.add(best_idx)
                        ious.append(best_iou)

                TP = len(matched)
                FP = len(pred_boxes) - TP
                FN = len(gt_boxes) - TP

                precision = TP / (TP + FP) if (TP + FP) > 0 else 0
                recall = TP / (TP + FN) if (TP + FN) > 0 else 0

                all_precisions.append(precision)
                all_recalls.append(recall)
                all_ious.extend(ious)

    return {
        "precision": np.mean(all_precisions),
        "recall": np.mean(all_recalls),
        "mean_iou": np.mean(all_ious) if all_ious else 0
    }

!hf download albarqouni/bild-dataset --repo-type dataset --include "Detection/detection_model.pth" --local-dir ./
print("Pre-trained model downloaded.")

from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2

def get_detection_model(num_classes=2):
    """
    Creates a Faster R-CNN model with a pre-trained ResNet-50 backbone.
    Args:
        num_classes (int): The number of classes, including the background.
                           For pneumonia vs. not-pneumonia, this is 2.
    """
    # Load a model pre-trained on COCO
    model = fasterrcnn_resnet50_fpn_v2(weights='DEFAULT')

    # Get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features

    # Replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    return model




# Load the best model
# Create a new model instance and load the saved weights
# best_model= model
best_model = get_detection_model(num_classes=2)
best_model.load_state_dict(torch.load('./Detection/detection_model.pth', map_location=device))
best_model.to(device)


def get_transforms(is_train=True, target_size=256):
    if is_train:
        return A.Compose([
            A.HorizontalFlip(p=0.5),
            A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=10, p=0.5),
            A.RandomBrightnessContrast(p=0.5),
            # Resize must be after geometric transforms that change coordinates
            A.Resize(height=target_size, width=target_size, p=1.0),
            A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ToTensorV2(p=1.0)
        ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels'], min_area=1, min_visibility=0.1))
    else: # For validation and testing, we only resize and normalize
        return A.Compose([
            A.Resize(height=target_size, width=target_size, p=1.0),
            A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ToTensorV2(p=1.0)
        ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

print("Augmentation pipelines defined.")

from sklearn.model_selection import train_test_split

TARGET_SIZE = 256  #@param {type: "number"}
BATCH_SIZE = 2  #@param {type: "number"}

# Patient-wise Data Splitting
# We focus on the positive pneumonia cases for this detection task
positive_patient_ids = df[df['Target'] == 1]['patientId'].unique()

# Split positive cases: 70% train, 15% validation, 15% test
train_ids, test_ids = train_test_split(positive_patient_ids, test_size=0.3, random_state=RANDOM_SEED)
val_ids, test_ids = train_test_split(test_ids, test_size=0.5, random_state=RANDOM_SEED)

# Create DataFrames for each split
train_df = df[df['patientId'].isin(train_ids)]
val_df = df[df['patientId'].isin(val_ids)]
test_df = df[df['patientId'].isin(test_ids)]
test_dataset = PneumoniaDataset(test_df, IMAGE_DIR, transforms=get_transforms(is_train=False, target_size=TARGET_SIZE))
test_loader = DataLoader(
    test_dataset, batch_size=BATCH_SIZE, shuffle=False, collate_fn=utils.collate_fn
)

--- Dataset Overview ---
Total unique patients: 26684
Patients with pneumonia (positive cases): 6012 (22.5%)
Patients without pneumonia (negative cases): 20672 (77.5%)
Pneumonia cases with >1 bounding box: 3398
Pre-trained model downloaded.
Augmentation pipelines defined.


## 6. Advanced Quality Control

A single mAP score is a useful summary, but to truly trust and improve our model, we need to dig deeper. This section introduces advanced quality control techniques to systematically analyze the model's behavior and identify specific failure modes.

### 6.1. Performance vs. Bounding Box Size

A common challenge for object detectors is accurately identifying objects across a wide range of sizes. In our case, pneumonia opacities can vary from small, subtle patches to large, consolidated areas. A robust model should perform well on all of them.

In this analysis, we will investigate whether our model has a **"size bias."** We'll categorize the ground truth bounding boxes in our test set into "small," "medium," and "large" groups. Then, for each group, we'll calculate the model's **Recall**.

**Recall** answers the critical question: **"Of all the actual pneumonia cases in this size category, what fraction did our model successfully find?"** A low recall for small boxes, for example, would indicate a clinically significant weakness.
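The binning logic in the code cell below reduces to a few lines. Here is a standalone sketch with plain Python lists: each ground-truth box is assigned to a bin by its area, using the same 64² and 96² pixel thresholds as the notebook (on the 256x256 resized images), and recall in a bin is the fraction of its boxes that were detected. The boxes and `detected` flags are made-up inputs, and the helper names are hypothetical.

```python
def size_category(box, small_max=64**2, medium_max=96**2):
    """Assign an [x_min, y_min, x_max, y_max] box to a size bin by its area in pixels."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < small_max:
        return "small"
    if area < medium_max:
        return "medium"
    return "large"

def recall_by_size(gt_boxes, detected):
    """gt_boxes: list of boxes; detected: parallel list of booleans (found or missed)."""
    bins = {"small": [], "medium": [], "large": []}
    for box, hit in zip(gt_boxes, detected):
        bins[size_category(box)].append(hit)
    # Recall per bin = detected / total; None marks an empty bin instead of a misleading 0
    return {k: (sum(v) / len(v) if v else None) for k, v in bins.items()}

# Hypothetical ground-truth boxes on a 256x256 image and whether each one was found
boxes = [[10, 10, 40, 40], [0, 0, 50, 50], [0, 0, 90, 90], [100, 100, 220, 230]]
hits = [False, True, True, True]
print(recall_by_size(boxes, hits))  # {'small': 0.5, 'medium': 1.0, 'large': 1.0}
```

Returning `None` for an empty bin is a small design choice: the notebook's cell reports 0 in that case, which would draw a zero-height bar that looks like a total failure rather than "no data".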

### Q8: Does the model have a size bias?
**Your Task**: Run the code below, which calculates and plots the model's **recall** for small, medium, and large pneumonia patches. Based on the bar chart, does our model perform equally well on all sizes, or does it have a bias? What are the clinical implications of this?

In [None]:
from torchvision.ops import box_iou
import collections

# Helper function to get all predictions and targets
@torch.no_grad()
def get_all_preds(model, dataloader, device):
    model.eval()
    all_preds, all_targets = [], []
    for images, targets in tqdm(dataloader, desc="Gathering Test Predictions"):
        images = [img.to(device) for img in images]
        preds = model(images)
        all_preds.extend([{k: v.cpu() for k, v in p.items()} for p in preds])
        all_targets.extend(targets)
    return all_preds, all_targets

all_predictions, all_targets = get_all_preds(best_model, test_loader, device)

# Categorize boxes by size
size_bins = {'small': [], 'medium': [], 'large': []}
IOU_THRESHOLD = 0.5

for preds, targets in zip(all_predictions, all_targets):
    gt_boxes = targets['boxes']
    pred_boxes = preds['boxes'][preds['scores'] > 0.5]

    if len(gt_boxes) == 0: continue

    # Calculate IoU between all predictions and ground truths for this image
    ious = box_iou(gt_boxes, pred_boxes) if len(pred_boxes) > 0 else torch.zeros((len(gt_boxes), 0))

    for i, gt_box in enumerate(gt_boxes):
        area = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])

        # Categorize size (thresholds are relative to 256x256 image)
        if area < 64**2: category = 'small'
        elif area < 96**2: category = 'medium'
        else: category = 'large'

        # Check if the box was detected (max IoU > threshold)
        detected = (ious[i].max() > IOU_THRESHOLD).item() if len(pred_boxes) > 0 else False
        size_bins[category].append(detected)

# Calculate and plot recall for each size category
recall_by_size = {k: np.mean(v) if v else 0 for k, v in size_bins.items()}
counts_by_size = {k: len(v) for k, v in size_bins.items()}

plt.figure(figsize=(8, 5))
plt.bar(list(recall_by_size.keys()), list(recall_by_size.values()), color=['#FF9999', '#66B3FF', '#99FF99'])
plt.title('Model Recall vs. Bounding Box Size')
plt.ylabel('Recall')
plt.ylim(0, 1)
for i, (cat, recall) in enumerate(recall_by_size.items()):
    plt.text(i, recall + 0.02, f'{recall:.2f}\n(n={counts_by_size[cat]})', ha='center')
plt.show()

<details>
<summary>Click for Discussion</summary>

You will likely observe that the recall for **small** boxes is significantly lower than for medium and large boxes. This is a very common finding.

**Interpretation**: The model is much better at finding large, clear areas of pneumonia but struggles with smaller, potentially early-stage findings.

**Clinical Implications**: This is a critical limitation. A model that consistently misses small findings could fail to detect pneumonia in its early stages, delaying treatment. This analysis highlights the need for strategies specifically aimed at improving small object detection, such as using higher-resolution images, employing specialized network architectures (like Feature Pyramid Networks, which our model already uses), or using data augmentation techniques that create more small object examples.
</details>

### 6.2. Confidence Score Analysis & Calibration

A good model should not only be accurate, but its confidence scores should be meaningful. A prediction with 95% confidence should be correct much more often than a prediction with 60% confidence. This property is called **calibration**. Analyzing it is essential for deciding what confidence threshold to use in a real-world application.

We will first visually inspect the distribution of confidence scores for the model's correct predictions (**True Positives**) versus its incorrect predictions (**False Positives**). This will give us an initial intuition about whether the model is generally more confident when it is correct. We will then quantify this relationship more formally with a **Reliability Diagram**.

> ### Deep Dive: Model Calibration
>
> **What is calibration?** A model is considered well-calibrated if its predicted confidence scores accurately reflect the true probability of an event. For example, if we look at all the detections the model made with a confidence of 80-90%, a well-calibrated model would be correct on about 85% of them.
>
> **Why does it matter?** In high-stakes applications like medicine, calibration is crucial for building trust. A clinician needs to know if they can trust a model's confidence. An **over-confident** model (one that predicts high scores but is often wrong) is dangerous because it can lead to false assurances. An **under-confident** model is less useful because it may not be trusted even when it's correct.
>
> **How do we measure it?** We use a **Reliability Diagram**. This plot bins all predictions by their confidence score (e.g., 0-10%, 10-20%, etc.) on the x-axis. For each bin, it then calculates the actual accuracy (the fraction of true positives) and plots that on the y-axis. For a perfectly calibrated model, all points would lie on the diagonal line `y=x`. Deviations from this line show miscalibration.
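The reliability diagram in the code cell below is produced by scikit-learn's `calibration_curve`. To see what that computes, here is a plain-Python sketch of uniform binning: scores are grouped into `n_bins` equal-width confidence bins, and for each non-empty bin we report the mean confidence (x-axis) and the fraction of true positives (y-axis). It is an illustration, not scikit-learn's implementation; handling of scores that fall exactly on a bin edge may differ. The scores and labels are made-up.

```python
def reliability_bins(scores, labels, n_bins=10):
    """Return (mean_confidence, fraction_positive, count) for each non-empty bin.

    scores: confidences in [0, 1]; labels: 1 for a true positive, 0 for a false positive.
    """
    sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        sums[b] += s
        positives[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins) if counts[b] > 0
    ]

# Hypothetical detections: the 0.9-1.0 bin is only 50% correct, i.e. over-confident there
scores = [0.92, 0.95, 0.91, 0.97, 0.55, 0.58]
labels = [1, 0, 1, 0, 1, 0]
for conf, acc, n in reliability_bins(scores, labels):
    print(f"mean confidence {conf:.2f} | fraction TP {acc:.2f} | n={n}")
```

A point far below the diagonal (mean confidence about 0.94, fraction TP 0.50 in this toy data) is exactly the over-confidence pattern the diagram is meant to reveal.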


### Q9: Analyze Confidence and Calibration
**Your Task**: Analyze the two plots below.
1.  **Score Distribution Plot**: This plot shows two histograms: one for the confidence scores of True Positives (correct detections) and one for False Positives (incorrect detections).
    -   What do you observe? Ideally, where would you want these two distributions to be?
    -   Based on this plot, is the model generally more confident when it is correct than when it is wrong?
2.  **Reliability Diagram**: This plot shows model accuracy as a function of its confidence.
    -   Does your model appear to be well-calibrated, over-confident, or under-confident? Why?

In [None]:
# Separate True Positives and False Positives
tp_scores, fp_scores = [], []

for preds, targets in zip(all_predictions, all_targets):
    gt_boxes = targets['boxes']
    pred_boxes = preds['boxes']
    pred_scores = preds['scores']

    if len(pred_boxes) == 0: continue

    # Match predictions to ground truths
    ious = box_iou(gt_boxes, pred_boxes) if len(gt_boxes) > 0 else torch.zeros((0, len(pred_boxes)))

    # Track which GTs have been matched to prevent double counting
    gt_matched = [False] * len(gt_boxes)

    for i, pred_score in enumerate(pred_scores):
        # Find the best GT match for this prediction
        if ious.shape[0] > 0:
            max_iou, max_idx = ious[:, i].max(dim=0)
            if max_iou > IOU_THRESHOLD and not gt_matched[max_idx]:
                tp_scores.append(pred_score.item())
                gt_matched[max_idx] = True # Mark this GT as used
            else:
                fp_scores.append(pred_score.item())
        else: # No GT boxes, all preds are FPs
            fp_scores.append(pred_score.item())

# Plot the Score Distributions
plt.figure(figsize=(10, 6))
plt.hist(tp_scores, bins=50, range=(0,1), density=True, color='green', alpha=0.7, label=f'True Positives (n={len(tp_scores)})')
plt.hist(fp_scores, bins=50, range=(0,1), density=True, color='red', alpha=0.7, label=f'False Positives (n={len(fp_scores)})')
plt.title('Confidence Score Distribution for Predictions')
plt.xlabel('Confidence Score')
plt.ylabel('Density')
plt.legend()
plt.show()

# Plot Reliability Diagram
from sklearn.calibration import calibration_curve
all_scores = tp_scores + fp_scores
true_labels = [1] * len(tp_scores) + [0] * len(fp_scores)
prob_true, prob_pred = calibration_curve(true_labels, all_scores, n_bins=10)

plt.figure(figsize=(8, 8))
plt.plot(prob_pred, prob_true, "o-", label="Model Calibration")
plt.plot([0, 1], [0, 1], "k--", label="Perfect Calibration")
plt.xlabel("Mean Predicted Confidence (in bin)")
plt.ylabel("Fraction of True Positives (in bin)")
plt.title("Reliability Diagram for Pneumonia Detector")
plt.legend()
plt.grid(True)
plt.show()


<details>
<summary>Click for Discussion</summary>

1.  **Score Distribution Plot**:
    -   **Observation**: You will likely see that the distribution for True Positives is shifted to the right (higher scores) compared to the distribution for False Positives. However, there will be a significant overlap.
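    -   **Ideal Scenario**: Ideally, the True Positive scores would form a sharp peak near a confidence of 1.0 and the False Positive scores a sharp peak near 0.0, with very little overlap. Strictly speaking, this describes good *separation* (discrimination) between correct and incorrect predictions, which is related to, but distinct from, calibration.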
    -   **Interpretation**: The overlap indicates that the model sometimes makes mistakes with high confidence and sometimes makes correct predictions with low confidence. While it is generally more confident when it is correct, its scores are not perfectly reliable indicators of correctness.

2.  **Reliability Diagram**:
    -   **Observation**: You will likely see that the blue line (model calibration) lies consistently **below** the dashed line (perfect calibration), especially for higher confidence bins.
    -   **Interpretation**: This indicates the model is **over-confident**. For example, for the bin of predictions where the model's average confidence is ~0.9 (90%), the actual accuracy (fraction of true positives) might only be ~0.75 (75%). It systematically overestimates its own correctness, which is a common issue in modern neural networks.
</details>

### 6.3. Visualizing the Hardest Cases

While aggregated statistics and distributions are powerful, nothing builds understanding like looking at individual images. Instead of choosing random samples, it is far more insightful to programmatically find and visualize the model's most significant errors. This helps us build an intuition for *why* the model fails.

We will focus on two critical error types:

1.  **Top False Positives:** These are the detections the model made with the *highest confidence* but were actually incorrect. These are the model's most "confident mistakes" and reveal what kinds of image features are most likely to confuse it.
2.  **Top False Negatives:** These are the ground truth pneumonia cases that the model *failed to detect* with a confidence score above 0.5. From a clinical perspective, these "misses" are often the most serious type of error.

In [None]:
from torchvision.ops import box_iou
from matplotlib import patches
import matplotlib.pyplot as plt
import numpy as np
import torch

# Define a label map for our single class
LABEL_MAP = {1: "Pneumonia"}

# Thresholds used to classify predictions as errors
IOU_THRESHOLD = 0.5    # standard IoU threshold for a match
SCORE_THRESHOLD = 0.5  # predictions below this score are ignored when looking for misses

# Helper to visualize a single image with optional ground-truth and predicted boxes
def visualize_single_case(image, gt_boxes=None, pred_boxes=None, title=""):
    """
    image:      normalized (C, H, W) tensor as returned by the dataset
    gt_boxes:   iterable of (x1, y1, x2, y2) boxes, drawn in green
    pred_boxes: iterable of (box, score, label) tuples, drawn in red
    """
    # Undo the ImageNet normalization for display
    image = image.cpu().numpy().transpose(1, 2, 0)
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = np.clip(std * image + mean, 0, 1)

    fig, ax = plt.subplots(1, 1, figsize=(8, 8))
    ax.imshow(image)
    ax.set_title(title, fontsize=16)

    # Draw ground-truth boxes (green) with a label above each box
    if gt_boxes is not None:
        for box in gt_boxes:
            x1, y1, x2, y2 = box
            rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, edgecolor='g', facecolor='none')
            ax.add_patch(rect)
            ax.text(x1, y1 - 1, 'Ground Truth',
                    color='white',
                    fontsize=14,
                    bbox=dict(facecolor='green', edgecolor='none', pad=1.5))

    # Draw predicted boxes (red) with class name and confidence score
    if pred_boxes is not None:
        for box, score, label in pred_boxes:
            x1, y1, x2, y2 = box
            rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=2, edgecolor='r', facecolor='none')
            ax.add_patch(rect)
            label_text = LABEL_MAP.get(int(label), f'Class {int(label)}')
            ax.text(x1, y1 - 1, f"{label_text}: {score:.2f}",
                    color="white",
                    fontsize=14,
                    bbox=dict(facecolor="red", edgecolor="none", pad=1.5))

    ax.axis('off')
    plt.show()

# --- False Positives: predictions that overlap no GT box with IoU >= IOU_THRESHOLD ---
# Assumes 'all_predictions' and 'all_targets' are available from the previous QC steps,
# and that targets['image_id'] is the index of the image in test_dataset.
fp_info = []
for preds, targets in zip(all_predictions, all_targets):
    pred_boxes = preds['boxes'].cpu()
    gt_boxes = targets['boxes'].cpu()
    ious = box_iou(pred_boxes, gt_boxes) if len(gt_boxes) > 0 else torch.zeros((len(pred_boxes), 0))

    for j, pred_box in enumerate(pred_boxes):
        max_iou = ious[j].max().item() if len(gt_boxes) > 0 else 0.0
        if max_iou < IOU_THRESHOLD:
            fp_info.append({
                'image_idx': targets['image_id'].item(),
                'box': pred_box.numpy(),
                'score': preds['scores'][j].item(),
                'label': preds['labels'][j].item()
            })

# Most confident mistakes first
fp_info.sort(key=lambda x: x['score'], reverse=True)

print("--- Top 3 Most Confident False Positives ---")
for fp in fp_info[:3]:
    img, _ = test_dataset[fp['image_idx']]
    # Pack the prediction as a list of (box, score, label) tuples for the helper
    pred_data_for_vis = [(fp['box'], fp['score'], fp['label'])]
    visualize_single_case(img, pred_boxes=pred_data_for_vis, title=f"False Positive (Score: {fp['score']:.3f})")

# --- False Negatives: GT boxes not matched by any confident prediction ---
fn_info = []
for preds, targets in zip(all_predictions, all_targets):
    gt_boxes = targets['boxes'].cpu()
    if len(gt_boxes) == 0:
        continue

    # Only consider confident predictions
    pred_boxes = preds['boxes'].cpu()[preds['scores'].cpu() > SCORE_THRESHOLD]
    ious = box_iou(gt_boxes, pred_boxes) if len(pred_boxes) > 0 else torch.zeros((len(gt_boxes), 0))

    for j, gt_box in enumerate(gt_boxes):
        max_iou = ious[j].max().item() if len(pred_boxes) > 0 else 0.0
        if max_iou < IOU_THRESHOLD:
            fn_info.append({
                'image_idx': targets['image_id'].item(),
                'box': gt_box.numpy()
            })

print("\n--- 3 Examples of False Negatives (Missed Detections) ---")
for fn in fn_info[:3]:
    img, _ = test_dataset[fn['image_idx']]
    visualize_single_case(img, gt_boxes=[fn['box']], title="False Negative (Missed)")
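
The loop above marks a prediction as a False Positive only if it overlaps *no* ground-truth box, so two near-duplicate boxes on the same opacity both count as correct. Evaluations that compute Average Precision usually match predictions to ground truth one-to-one instead, visiting predictions in descending score order (the details differ slightly between benchmarks). The NumPy sketch below illustrates that greedy matching. `iou_matrix` and `greedy_match` are hypothetical helpers; boxes are assumed to be `(x1, y1, x2, y2)` arrays, so tensors would be converted with `.cpu().numpy()` first.

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in (x1, y1, x2, y2) format."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)

def greedy_match(pred_boxes, pred_scores, gt_boxes, iou_thr=0.5):
    """One-to-one greedy matching of predictions to ground truth.

    Predictions are visited from highest to lowest score; each may claim the
    still-unmatched GT box with the highest IoU if that IoU >= iou_thr.
    Returns (is_tp per prediction, missed flag per GT box).
    """
    pred_scores = np.asarray(pred_scores, dtype=float)
    order = np.argsort(-pred_scores, kind="stable")
    ious = iou_matrix(pred_boxes, gt_boxes)
    gt_taken = np.zeros(ious.shape[1], dtype=bool)
    is_tp = np.zeros(len(pred_scores), dtype=bool)
    for p in order:
        if ious.shape[1] == 0:
            break  # no GT boxes: every prediction is a False Positive
        # Already-claimed GT boxes cannot be matched again
        candidate = np.where(gt_taken, -1.0, ious[p])
        g = int(np.argmax(candidate))
        if candidate[g] >= iou_thr:
            gt_taken[g] = True
            is_tp[p] = True
    missed_gt = ~gt_taken  # GT boxes no prediction claimed: the False Negatives
    return is_tp, missed_gt

# One GT box and two overlapping predictions: only the higher-scoring one is a TP
is_tp, missed = greedy_match(
    [[10, 10, 50, 50], [12, 12, 50, 50]], [0.9, 0.8], [[10, 10, 50, 50]]
)
print(is_tp, missed)  # [ True False] [False]
```

Under this convention the duplicate box counts as a False Positive, while the max-IoU rule used in the cell above would count it as correct.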


> 65 minutes


## 7. Advanced Quality Control II: Model Explainability with Grad-CAM

We have now evaluated our model's performance with metrics (the **what**) and analyzed its specific failure modes (the **where**). But to build true trust in a model, especially in a clinical setting, we must be able to answer the most important question: **why?**

-   *Why* did the model make a particular prediction?
-   *Which specific features* in the image led to its decision?
-   Is the model "looking" at the clinically relevant pathology, or is it exploiting a spurious correlation or artifact in the image?

This is the domain of **Explainable AI (XAI)**. XAI techniques aim to peek inside the "black box" of a neural network to make its decision-making process more transparent and interpretable.

For this task, we will use **Grad-CAM (Gradient-weighted Class Activation Mapping)**. Grad-CAM is a powerful and popular XAI technique that produces a visual **heatmap**, highlighting the most important regions in an input image for a particular prediction.

-   **Hotter areas (red/yellow)** on the heatmap indicate pixels that strongly influenced the model to make its prediction.
-   **Cooler areas (blue/green)** indicate pixels that were less influential.

By overlaying this heatmap on our original X-ray, we can visually verify if our model is focusing on the actual pneumonia opacities or if it's being "distracted" by irrelevant features. This is a critical final step in validating whether our model is not just accurate, but also right for the right reasons.

> ### Deep Dive: How Grad-CAM Works
>
> Grad-CAM creates its heatmap by combining two key pieces of information from the model:
>
> 1.  **Feature Maps from a Convolutional Layer**: Deep inside the network, convolutional layers produce feature maps that respond to abstract patterns like textures, edges, and shapes. The final convolutional layers capture the most abstract, class-specific information.
>
> 2.  **Gradients**: It calculates the gradient of the target prediction score with respect to each feature map. A large positive average gradient for a feature map means that increasing that map's activation would raise the score, so the map carries important evidence for the prediction.
>
> Grad-CAM then averages each feature map's gradient over all spatial positions to obtain one weight per map, computes the weighted sum of the feature maps, and applies a ReLU so that only regions that *increase* the score remain. The result, upsampled to the input size, is a single heatmap that shows where the model found the most important evidence for its prediction.
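
The weighting step described above fits in a few lines of NumPy. This sketch assumes you already have the chosen layer's activations and the gradients of the target score with respect to them as `(K, H, W)` arrays (the `pytorch-grad-cam` library captures these with hooks internally). `grad_cam_map` is a hypothetical name, and unlike the library it does not resize the map to the input size.

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Grad-CAM heatmap for one image from a single layer.

    activations: (K, H, W) feature maps A^k of the target layer
    gradients:   (K, H, W) gradients dy/dA^k of the target score y
    Returns an (H, W) map min-max scaled to [0, 1].
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    # One weight per feature map: the spatial average of its gradient
    alpha = gradients.mean(axis=(1, 2))                              # (K,)
    # Weighted sum over feature maps, then ReLU keeps only positive evidence
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)  # (H, W)
    # Min-max scaling for display; an all-zero map is returned unchanged
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example: map 0 helps the score (positive gradient), map 1 hurts it,
# so only map 0's active cell (top-left) survives the ReLU
A = np.array([[[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 0.0], [0.0, 1.0]]])
G = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
print(grad_cam_map(A, G))
```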

In [None]:
# First, we need to install the pytorch-grad-cam library
!pip install grad-cam -q

from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

print("Grad-CAM library and helpers imported.")

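#### Helper function
`faster_rcnn_target_function` takes the model's output for a single image and returns one scalar for Grad-CAM to explain: the sum of all pneumonia detection scores. Because it sums over every pneumonia detection, including low-confidence boxes below the display threshold used later, the heatmap reflects the model's overall pneumonia evidence in the image rather than a single box, and no reference box needs to be supplied.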

In [None]:
def faster_rcnn_target_function(output):
    # For a single image, torchvision's Faster R-CNN returns a dict with
    # 'boxes', 'scores' and 'labels'.
    scores = output['scores']
    labels = output['labels']

    # Keep the scores of all boxes predicted as "pneumonia" (label 1)
    pneumonia_scores = scores[labels == 1]

    if len(pneumonia_scores) == 0:
        # No pneumonia detected, so there is nothing to explain.
        # Multiplying by zero (instead of creating a fresh tensor) keeps the
        # result attached to the autograd graph, so Grad-CAM's backward pass
        # does not raise an error.
        return scores.sum() * 0.0

    # Sum the scores of all detected pneumonia boxes
    return pneumonia_scores.sum()

### Q10: Select a target layer for explanation
**Your Task**: Grad-CAM needs to hook into a specific convolutional layer to generate its heatmap. The best layers are usually the final, deep feature-extracting layers of the model's backbone, as they contain the richest semantic information.

Let's inspect the model's backbone architecture. Based on the output, identify a suitable final layer. A good choice is often the last block of the main feature extractor. Fill in the `...` with the correct layer.
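
One practical consequence of this choice is heatmap resolution: the ResNet stem reduces resolution by 4 and each later stage halves it again, so `layer4` yields the coarsest grid. The sketch below assumes a standard torchvision ResNet body (output strides 4, 8, 16 and 32 for `layer1` to `layer4`) and uses an 800 x 800 input purely as an example; torchvision's Faster R-CNN resizes images internally (by default to a shorter side of 800), so the true grid depends on that resized size. `cam_grid_size` is a hypothetical helper.

```python
import math

# Output stride of each stage of a standard torchvision ResNet body (assumption)
RESNET_STAGE_STRIDES = {"layer1": 4, "layer2": 8, "layer3": 16, "layer4": 32}

def cam_grid_size(height, width, layer_name):
    """Spatial size (rows, cols) of the feature map Grad-CAM sees at a given stage."""
    stride = RESNET_STAGE_STRIDES[layer_name]
    return math.ceil(height / stride), math.ceil(width / stride)

for name, stride in RESNET_STAGE_STRIDES.items():
    rows, cols = cam_grid_size(800, 800, name)
    print(f"{name}: {rows}x{cols} grid, each cell covers about {stride}x{stride} px")
```

Deeper stages carry more semantic, class-specific information but on a coarser grid; earlier stages give finer but less class-specific maps.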

In [None]:
# Let's print the model's backbone to see the layers
print(best_model.backbone.body)

In [None]:
# Q10: Define the target layer for Grad-CAM
target_layer = [best_model.backbone.body.layer4]

# Instantiate the GradCAM object
cam = GradCAM(model=best_model, target_layers=target_layer)

In [None]:
# Q10: Define the target layer for Grad-CAM
# target_layer = [...] # Fill this in based on the architecture above

# Instantiate the GradCAM object
cam = GradCAM(model=best_model, target_layers=target_layer)

### Q11: Generate and Interpret the Grad-CAM Visualizations
**Your Task**: Run the code below to generate Grad-CAM heatmaps for a few test images. The code will display three images for each sample:
1.  The original image with ground truth (green) and prediction (red) boxes.
2.  The raw heatmap generated by Grad-CAM.
3.  The heatmap overlaid on the original image.

**Analyze the results:**
-   Do the high-attention areas (red/yellow) in the heatmap correspond to the actual pneumonia opacities?
-   In cases where the model made a **False Positive** (a red box with no green box), where is the heatmap located? Does this give you a clue as to what might have confused the model?
-   In cases where the model made a **False Negative** (a green box with no red box), is there any activation in the heatmap at all, or did the model completely ignore the area?
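
The first question can also be answered with a number rather than by eye: the fraction of the heatmap's total activation that falls inside the ground-truth boxes, similar in spirit to the energy-based pointing game used in some XAI papers. This is a minimal sketch assuming `cam` is the `(H, W)` map returned by Grad-CAM at input resolution and the boxes use the same pixel coordinates; `cam_energy_in_boxes` is a hypothetical helper.

```python
import numpy as np

def cam_energy_in_boxes(cam, boxes):
    """Fraction of total Grad-CAM activation that falls inside any of the boxes.

    cam:   (H, W) non-negative heatmap at image resolution
    boxes: iterable of (x1, y1, x2, y2) pixel boxes, e.g. ground-truth boxes
    Returns a value in [0, 1]; 1.0 means all attention lies inside the boxes.
    """
    cam = np.asarray(cam, dtype=float)
    h, w = cam.shape
    inside = np.zeros(cam.shape, dtype=bool)
    for x1, y1, x2, y2 in boxes:
        # Round outward to whole pixels and clip to the image bounds
        c1, r1 = max(int(np.floor(x1)), 0), max(int(np.floor(y1)), 0)
        c2, r2 = min(int(np.ceil(x2)), w), min(int(np.ceil(y2)), h)
        inside[r1:r2, c1:c2] = True
    total = cam.sum()
    return float(cam[inside].sum() / total) if total > 0 else 0.0

# Toy example: 4 units of activation inside the box, 1 unit outside
cam = np.zeros((10, 10))
cam[2:4, 2:4] = 1.0
cam[8, 8] = 1.0
print(cam_energy_in_boxes(cam, [(2, 2, 4, 4)]))
```

Inside `visualize_grad_cam`, this could be called as `cam_energy_in_boxes(grayscale_cam, targets[i]['boxes'].cpu().numpy())`. Values close to 1.0 mean the attention sits on the annotated opacity, while values near the boxes' share of the image area suggest the model is not focusing on them.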

In [None]:
# Define a label map for our single class
LABEL_MAP = {1: "Pneumonia"}

def visualize_grad_cam(model, dataloader, cam_instance, num_samples=4, score_threshold=0.5):
    model.eval()
    images, targets = next(iter(dataloader))
    images = list(img.to(device) for img in images)

    print("--- Grad-CAM Explanations ---")
    print("Green = Ground Truth | Red = Prediction")

    for i in range(min(num_samples, len(images))):
        input_tensor = images[i].unsqueeze(0)

        with torch.no_grad():
            pred = model(input_tensor)[0]

        # Skip CAM generation if the model detected nothing in this image
        if len(pred['boxes']) == 0:
            print(f"Sample {i+1}: No objects detected by the model. Skipping CAM generation.")
            continue

        # Use our custom target function for Grad-CAM
        cam_targets = [faster_rcnn_target_function]
        grayscale_cam = cam_instance(input_tensor=input_tensor, targets=cam_targets)[0, :]

        # Visualization
        img_np = input_tensor[0].cpu().permute(1, 2, 0).numpy()
        mean, std = np.array([0.485, 0.456, 0.406]), np.array([0.229, 0.224, 0.225])
        img_unnormalized = std * img_np + mean
        img_unnormalized = np.clip(img_unnormalized, 0, 1)

        cam_overlay = show_cam_on_image(img_unnormalized, grayscale_cam, use_rgb=True)

        fig, axes = plt.subplots(1, 3, figsize=(18, 6))

        # Plot 1: Original with Boxes and Labels
        axes[0].imshow(img_unnormalized)
        axes[0].set_title('Original + Predictions')
        axes[0].axis('off')

        # Plot 2: Raw Heatmap
        axes[1].imshow(grayscale_cam)
        axes[1].set_title('Grad-CAM Heatmap')
        axes[1].axis('off')

        # Plot 3: Overlay with Boxes and Labels
        axes[2].imshow(cam_overlay)
        axes[2].set_title('Overlay')
        axes[2].axis('off')

        # Draw boxes and labels on both relevant plots (axes[0] and axes[2])
        for ax in [axes[0], axes[2]]:
            # Ground truth boxes (Green)
            for box in targets[i]['boxes']:
                x1, y1, x2, y2 = box.cpu().numpy()
                ax.add_patch(patches.Rectangle((x1, y1), x2-x1, y2-y1, lw=2, ec='g', fc='none'))

                # Add styled text label for the ground truth
                ax.text(x1, y1 - 1, "Ground truth",
                        color="white",
                        fontsize=14,
                        bbox=dict(facecolor="green", edgecolor="none", pad=1.5))

            # Prediction boxes (Red) with labels
            for box, score, label in zip(pred['boxes'], pred['scores'], pred['labels']):
                if score > score_threshold:
                    x1, y1, x2, y2 = box.cpu().numpy()
                    ax.add_patch(patches.Rectangle((x1, y1), x2-x1, y2-y1, lw=2, ec='r', fc='none'))

                    # Add styled text label for predictions
                    label_text = LABEL_MAP.get(int(label), f'Class {int(label)}')
                    ax.text(x1, y1 - 1, f"{label_text}:{score:.2f}",
                            color="white",
                            fontsize=14,
                            bbox=dict(facecolor="red", edgecolor="none", pad=1.5))

        plt.tight_layout()
        plt.show()


> 85 minutes

In [None]:
# Score threshold for drawing predicted boxes
threshold = 0.6  #@param {type: "slider", min: 0, max: 1, step: 0.05}

# Select the backbone stage to explain (deeper = more semantic but coarser heatmap)
layer_to_use = "layer3"  #@param ['layer1', 'layer2', 'layer3', 'layer4']

# Look the layer up by name; the dropdown should prevent invalid names,
# but fall back to layer4 so GradCAM never receives an invalid target.
if hasattr(best_model.backbone.body, layer_to_use):
    target_layer = [getattr(best_model.backbone.body, layer_to_use)]
    print(f"Target layer for Grad-CAM set to: {layer_to_use}")
else:
    print(f"Layer '{layer_to_use}' not found in the model's backbone body; falling back to layer4.")
    target_layer = [best_model.backbone.body.layer4]

# Instantiate the GradCAM object for the selected layer
cam = GradCAM(model=best_model, target_layers=target_layer)

# Run the visualization on the test loader
visualize_grad_cam(best_model, test_loader, cam, num_samples=5, score_threshold=threshold)

## Conclusion and final thoughts

Congratulations on completing this comprehensive, hands-on journey through medical object detection!

Over the course of this notebook, you have successfully built and analyzed a deep learning model from start to finish. You have not just trained a model, but have also engaged in the critical practices that separate a simple experiment from a robust, well-understood AI system.

**Let's recap what you have achieved:**
-   You successfully handled and prepared a real-world medical imaging dataset in the complex **DICOM** format.
-   You built a custom **PyTorch `Dataset` and `DataLoader`**, mastering the specific data pipeline required for object detection.
-   You fine-tuned a state-of-the-art **Faster R-CNN** model, a powerful architecture used across the industry.
-   You learned to interpret crucial **object detection metrics** like mAP, moving beyond simple accuracy.
-   Most importantly, you performed an **advanced quality control analysis**, investigating your model's biases, calibration, and explainability with **Grad-CAM**.

**Key Takeaways:**
-   **Data is paramount:** A deep understanding of your data through EDA and quality control is the foundation of any successful model.
-   **Evaluation is multi-faceted:** A single metric is never enough. A thorough analysis of performance across different subsets (like object size), along with visual inspection of failure modes, is essential.
-   **Explainability builds trust:** For a model to be useful in a high-stakes field like medicine, we must be able to understand *why* it makes its decisions. Tools like Grad-CAM are a vital step toward building trustworthy AI.

### Next steps

> This notebook provides a strong foundation. From here, you could explore more advanced models, experiment with different data augmentation strategies, or incorporate the negative (non-pneumonia) samples to build a more comprehensive diagnostic tool.

Here are some exciting next steps you could take to build upon what you've learned:

-   **Incorporate Negative Samples:** Modify the `Dataset` to include the non-pneumonia cases. This would allow you to train a more complete diagnostic tool that can act as both a detector and a classifier.
-   **Experiment with Different Architectures:** Try replacing the two-stage Faster R-CNN model with single-stage detectors such as **YOLO** or **RetinaNet**.
-   **Hyperparameter Tuning:** Systematically experiment with different learning rates, batch sizes, and data augmentation strategies to further improve your model's mAP score.
-   **Explore 3D Medical Imaging:** Apply these concepts to 3D datasets like CT or MRI scans, where the challenges of data handling and model architecture become even more interesting.




> 90 minutes