# Pediatric Pneumonia Detection - Complete Pipeline

This notebook implements an end-to-end workflow for detecting pneumonia in pediatric chest X-rays.

**Workflow:**
1.  **Data Preparation**: Download dataset from Kaggle, explore structure, and create data generators.
2.  **Training**: Train a ResNet-50 model with two-stage transfer learning (feature extraction, then fine-tuning).
3.  **Evaluation**: Evaluate performance on the test set using comprehensive metrics (Accuracy, AUC, Sensitivity, Specificity).
4.  **Explainability**: Visualize model focus regions using Grad-CAM.

---

## 1. Environment Setup
Import necessary libraries and project modules.

In [None]:
import sys
import os
import kagglehub
from pathlib import Path

# Add project root to path to import model_core
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

from model_core.data_pipeline import DataPipeline
from model_core.model_builder import ModelBuilder
from model_core.trainer import Trainer
from model_core.evaluator import ModelEvaluator
from model_core.gradcam import GradCAMVisualizer
from model_core.utils import Utils

# Configuration
OUTPUT_DIR = "../outputs"
IMG_SIZE = (224, 224)
BATCH_SIZE = 32

## 2. Data Preparation
We download the dataset from Kaggle and prepare the data generators.

In [None]:
# Download latest version of the dataset
path = kagglehub.dataset_download("paultimothymooney/chest-xray-pneumonia")
print("Path to dataset files:", path)

# Set dataset path
DATASET_PATH = os.path.join(path, "chest_xray")
print(f"Using dataset at: {DATASET_PATH}")

In [None]:
# Initialize pipeline
pipeline = DataPipeline(DATASET_PATH, img_size=IMG_SIZE, batch_size=BATCH_SIZE)

# Explore dataset structure
stats = pipeline.explore_dataset()

# Create stratified validation split
pipeline.create_validation_split(val_ratio=0.15)

# Create data generators
train_gen, val_gen, test_gen = pipeline.create_generators(use_augmentation=True)

# Calculate class weights for imbalance handling
class_weights = Utils.calculate_class_weights(train_gen)
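
The implementation of `Utils.calculate_class_weights` is not shown in this notebook. As a rough sketch of what such a helper might compute, the snippet below uses the common "balanced" heuristic, `n_samples / (n_classes * class_count)`, so that the rarer class gets a proportionally larger weight. The function name `balanced_class_weights`, the toy label counts, and the label mapping (0 = NORMAL, 1 = PNEUMONIA) are assumptions for illustration, not values taken from the project.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} for a list of labels."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }


# Toy labels (NOT the real dataset counts); assumed mapping 0 = NORMAL, 1 = PNEUMONIA.
labels = [0] * 25 + [1] * 75
weights = balanced_class_weights(labels)
print(weights)
```

A dictionary of this shape is what Keras expects for the `class_weight` argument of `fit`, which is presumably where `trainer.train` forwards it.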

### Visualize Samples
Let's look at some representative X-ray images from the dataset.

In [None]:
pipeline.visualize_samples()

## 3. Training Stage 1: Feature Extraction
In this stage, we freeze the ResNet-50 backbone and train only the custom classification head.

In [None]:
# Build model with frozen backbone
model = ModelBuilder.build(img_size=IMG_SIZE, trainable_backbone=False)
ModelBuilder.compile(model, learning_rate=1e-4)

# Initialize trainer
trainer = Trainer(model, output_dir=OUTPUT_DIR)

# Train Stage 1
history1 = trainer.train(
    train_gen, val_gen, 
    epochs=8,
    stage='stage1',
    class_weight=class_weights
)

# Visualize training history
trainer.plot_history(history1, 'stage1')
Utils.print_best_metrics(history1, 'Stage 1')

## 4. Training Stage 2: Fine-Tuning
Now we unfreeze the top layers of the backbone (from layer index 140 onward) and fine-tune their feature representations at a lower learning rate.

In [None]:
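# Unfreeze the backbone.
# Assumes ModelBuilder places the ResNet-50 backbone at index 1 of model.layers.
base_model = model.layers[1]
base_model.trainable = True
# Keep the first 140 backbone layers frozen to limit overfitting
for layer in base_model.layers[:140]:
    layer.trainable = False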

# Recompile with lower learning rate
ModelBuilder.compile(model, learning_rate=1e-5)

# Train Stage 2
history2 = trainer.train(
    train_gen, val_gen, 
    epochs=5,
    stage='stage2',
    class_weight=class_weights
)

# Visualize fine-tuning history
trainer.plot_history(history2, 'stage2')
Utils.print_best_metrics(history2, 'Stage 2')
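
The freezing step in Stage 2 sets `trainable` on each backbone layer by index. A minimal sketch of that logic as a reusable helper follows. The name `freeze_below` and the stand-in layer classes are hypothetical; the helper only assumes objects with a boolean `trainable` attribute, which Keras layers have. The optional `keep_batchnorm_frozen` flag reflects a common fine-tuning recommendation to leave batch-normalization layers frozen; it is not something this notebook does, so it is off by default.

```python
def freeze_below(layers, first_trainable, keep_batchnorm_frozen=False):
    """Freeze layers[:first_trainable] and unfreeze the rest.

    Returns the number of layers left trainable.
    """
    n_trainable = 0
    for index, layer in enumerate(layers):
        is_batchnorm = type(layer).__name__ == "BatchNormalization"
        skip = keep_batchnorm_frozen and is_batchnorm
        layer.trainable = index >= first_trainable and not skip
        n_trainable += int(layer.trainable)
    return n_trainable


# Stand-in classes so the sketch runs without TensorFlow installed.
class Conv2D:
    trainable = False


class BatchNormalization:
    trainable = False


toy_layers = [Conv2D(), BatchNormalization(), Conv2D(), BatchNormalization(), Conv2D()]

# Mirrors the notebook's behaviour: everything from the cut-off index is trainable.
n_all = freeze_below(toy_layers, first_trainable=2)

# Variant that additionally keeps batch-normalization layers frozen.
n_no_bn = freeze_below(toy_layers, first_trainable=2, keep_batchnorm_frozen=True)
print(n_all, n_no_bn)
```

With a real Keras backbone, the equivalent call would be `freeze_below(base_model.layers, 140)` after setting `base_model.trainable = True`, followed by recompiling the model so the change takes effect.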

## 5. Save Model
We save the final fine-tuned model for future inference.

In [None]:
# Define path for final model
final_model_path = os.path.join(OUTPUT_DIR, "final_pneumonia_model.h5")

# Save model
model.save(final_model_path)
print(f"✅ Final model saved to: {final_model_path}")
print(f"   (Checkpoints also saved in: {trainer.checkpoint_dir})")

## 6. Comprehensive Evaluation
Now that the model is trained, we evaluate its performance on the held-out test set.

In [None]:
# Initialize Evaluator with the trained model and test generator
evaluator = ModelEvaluator(model, test_gen)

# Calculate and print metrics
metrics = evaluator.calculate_metrics()

# Detailed Classification Report
evaluator.generate_classification_report();
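
Sensitivity and specificity are easy to confuse, so here is a small sketch of how they follow from confusion-matrix counts. It is independent of `ModelEvaluator`, whose internals are not shown here. The function name `binary_metrics` and the toy labels are illustrative assumptions, with 1 taken to mean PNEUMONIA.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Undefined when a class is absent from y_true.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # recall on the positive class
        "specificity": ratio(tn, tn + fp),  # recall on the negative class
    }


# Toy predictions, not results from this model.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

In a screening setting, sensitivity (missed pneumonia cases) and specificity (false alarms on healthy patients) usually matter more than raw accuracy, especially when the classes are imbalanced.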

### Performance Visualizations

In [None]:
evaluator.plot_confusion_matrix()

In [None]:
evaluator.plot_roc_curve()

In [None]:
evaluator.plot_precision_recall_curve()

## 7. Explainability (Grad-CAM)
We use Gradient-weighted Class Activation Mapping (Grad-CAM) to visualize which regions of the X-ray the model focuses on when making predictions. This is a useful check that the model attends to the lung fields rather than to irrelevant parts of the image.

In [None]:
gradcam = GradCAMVisualizer(model)

# Visualize a batch of random samples from the test set
gradcam.visualize_batch(test_gen, num_samples=8)
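
For readers unfamiliar with Grad-CAM, the core computation can be sketched in a few lines of NumPy. This is not the `GradCAMVisualizer` implementation, which is not shown in this notebook. It assumes the last convolutional layer's activations and the gradients of the class score with respect to them have already been extracted as arrays of shape `(H, W, K)`; the function name `gradcam_heatmap` and the random inputs are purely illustrative.

```python
import numpy as np


def gradcam_heatmap(activations, gradients):
    """Combine (H, W, K) activations and gradients into an (H, W) heatmap in [0, 1]."""
    # Channel importance: gradients averaged over the spatial dimensions.
    weights = gradients.mean(axis=(0, 1))                # shape (K,)
    # Weighted sum of feature maps, keeping only positive evidence (ReLU).
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    peak = cam.max()
    # Normalize to [0, 1] unless the map is all zeros.
    return cam / peak if peak > 0 else cam


# Random stand-ins for real feature maps; 7x7 matches ResNet-50's final
# convolutional output for 224x224 inputs, with only 4 channels for brevity.
rng = np.random.default_rng(0)
activations = rng.random((7, 7, 4))
gradients = rng.normal(size=(7, 7, 4))
heatmap = gradcam_heatmap(activations, gradients)
print(heatmap.shape)
```

In practice the low-resolution heatmap is resized to the input image size and overlaid on the X-ray, which is presumably what `visualize_batch` renders.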