# Pneumonia Detection from Chest X-Ray Images
## Using Transfer Learning with MobileNetV2

This notebook implements a binary classification model to detect pneumonia from chest X-ray images using transfer learning and fine-tuning techniques.

## 1. Install Required Libraries
Install kagglehub, which is used to download the dataset from Kaggle.

In [None]:
!pip install kagglehub -q

## 2. Download Dataset from Kaggle
Downloads the chest X-ray pneumonia dataset. It contains train, test, and val folders, each with NORMAL and PNEUMONIA subfolders of X-ray images.

In [None]:
import kagglehub
import os

# Download dataset
path = kagglehub.dataset_download("paultimothymooney/chest-xray-pneumonia")

print("Dataset downloaded at:", path)

data_dir = os.path.join(path, "chest_xray")
print("Folders:", os.listdir(data_dir))
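Before building generators, it can help to confirm the folder layout and the number of images per class. The sketch below assumes a `split/class/image` layout; `count_images` is a hypothetical helper written for illustration, not part of kagglehub.

```python
from pathlib import Path


def count_images(root, extensions=(".jpeg", ".jpg", ".png")):
    """Return {split: {class_name: n_images}} for a split/class/file layout."""
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if not split_dir.is_dir():
            continue
        # Count only files whose extension looks like an image
        counts[split_dir.name] = {
            class_dir.name: sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in extensions
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts
```

In this notebook it could be called as `count_images(data_dir)` to see how unevenly the two classes are represented in each split.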

## 3. Import Required Libraries
- **TensorFlow/Keras**: Deep learning framework for building neural networks
- **ImageDataGenerator**: Handles image loading and data augmentation
- **MobileNetV2**: Pre-trained model for transfer learning
- **scikit-learn**: For metrics and class weight calculation
- **Supporting libraries**: NumPy, Matplotlib, OpenCV for data processing and visualization

In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam
import numpy as np
import matplotlib.pyplot as plt
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import classification_report, confusion_matrix
import cv2

## 4. Configuration & Setup
- **IMG_SIZE**: 224x224 is the standard input size for MobileNetV2
- **BATCH_SIZE**: Number of images processed per training step (32 is a common default that fits in memory on most GPUs)
- Create paths to training and test directories

In [None]:
IMG_SIZE = 224
BATCH_SIZE = 32

train_path = os.path.join(data_dir, "train")
test_path = os.path.join(data_dir, "test")

## 5. Data Augmentation & Image Generators
### Training Data Augmentation:
- **rescale**: Normalize pixel values from 0-255 to 0-1
- **rotation_range**: Random rotations up to 15 degrees
- **zoom_range**: Random zoom between 80-120%
- **horizontal_flip**: Flip images left-right (an assumption worth checking for chest X-rays, since flipping mirrors the position of the heart)
- **validation_split**: Use 20% of training data for validation

### Test Data:
- Only rescaling (no augmentation for test data)

In [None]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2
)

test_datagen = ImageDataGenerator(rescale=1./255)
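The `rescale=1./255` step maps pixels to [0, 1]. Keras's `mobilenet_v2.preprocess_input` instead maps them to [-1, 1], the range used when the ImageNet weights were trained. The sketch below only illustrates the arithmetic of the two mappings on plain numbers; the function names are hypothetical, and whether switching changes results for this notebook is something to verify experimentally.

```python
def rescale_unit(pixel):
    """Map a 0-255 pixel value to [0, 1], as rescale=1./255 does."""
    return pixel / 255.0


def rescale_signed(pixel):
    """Map a 0-255 pixel value to [-1, 1] (pixel / 127.5 - 1)."""
    return pixel / 127.5 - 1.0


for value in (0, 127.5, 255):
    print(value, rescale_unit(value), rescale_signed(value))
```

One way to try the signed mapping would be to replace `rescale=1./255` with `preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input` in both generators, so that training and test images are scaled the same way.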

## 6. Create Data Generators
- **train_generator**: Loads 80% of training images with augmentation
- **val_generator**: Loads the remaining 20% of training images for validation (because it shares train_datagen, these images are also randomly augmented)
- **test_generator**: Loads test images without shuffling (maintains order for comparison)
- **class_mode='binary'**: For pneumonia (1) vs normal (0) classification

In [None]:
train_generator = train_datagen.flow_from_directory(
    train_path,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training'
)

val_generator = train_datagen.flow_from_directory(
    train_path,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation'
)

test_generator = test_datagen.flow_from_directory(
    test_path,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    shuffle=False
)
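`flow_from_directory` infers labels from the subfolder names in alphanumeric order, which is why NORMAL becomes 0 and PNEUMONIA becomes 1. A minimal sketch of that mapping rule, using this dataset's folder names and a hypothetical helper name:

```python
def infer_class_indices(folder_names):
    """Assign consecutive integer labels to class folders in sorted order."""
    return {name: index for index, name in enumerate(sorted(folder_names))}


print(infer_class_indices(["PNEUMONIA", "NORMAL"]))
```

The result can be compared with `train_generator.class_indices` to confirm which class the sigmoid output refers to.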

## 7. Compute Class Weights
The dataset is imbalanced: the training folder contains nearly three times as many pneumonia images as normal ones. Balanced class weights assign a higher weight to the minority class (normal here).
This prevents the model from being biased toward the majority class.

In [None]:
class_weights = compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes
)

class_weights = dict(enumerate(class_weights))
print("Class Weights:", class_weights)
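The `'balanced'` mode uses the formula `n_samples / (n_classes * count_of_class)`. The sketch below reimplements that formula in plain Python to make the arithmetic visible; the label counts are illustrative and are not the real dataset counts.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), the 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in sorted(counts.items())}


# Illustrative labels only: 1000 samples of class 0 and 3000 of class 1
labels = [0] * 1000 + [1] * 3000
weights = balanced_class_weights(labels)
print(weights)  # class 0 gets weight 2.0, class 1 gets about 0.667
```

The rarer class receives the larger weight, so each of its samples contributes more to the loss.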

## 8. Build Model with Transfer Learning
### MobileNetV2 Base Model:
- Pre-trained on ImageNet (1.4M images, 1000 classes)
- Input shape: 224x224x3 (RGB images)
- **include_top=False**: Remove the original classification layer
- **weights='imagenet'**: Load pre-trained weights
- **trainable=False**: Freeze all layers initially (transfer learning)

In [None]:
base_model = MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
    include_top=False,
    weights='imagenet'
)

base_model.trainable = False  # Freeze all layers

## 9. Add Custom Classification Layers
- **GlobalAveragePooling2D**: Converts spatial features to a single vector
- **Dense(128)**: Fully connected layer with 128 neurons, learns patterns from base model features
- **Dropout(0.5)**: Randomly disables 50% of the neurons during training to prevent overfitting
- **Dense(1) + sigmoid**: Output layer for binary classification (pneumonia probability)

In [None]:
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])
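As a rough check on the size of the new head, the trainable parameter count can be worked out by hand. The sketch assumes the base model emits 1280 feature channels, which is the case for MobileNetV2 with the default width multiplier and `include_top=False`; `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


BASE_FEATURES = 1280  # assumed channel count after GlobalAveragePooling2D

hidden = dense_params(BASE_FEATURES, 128)  # Dense(128)
output = dense_params(128, 1)              # Dense(1)
head_total = hidden + output
print(hidden, output, head_total)  # 163968 129 164097
```

These figures can be compared with the trainable parameter count reported by `model.summary()` while the base model is frozen.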

## 10. Compile Model
- **optimizer='adam'**: Adam optimizer with adaptive per-parameter step sizes (Keras default learning rate of 0.001)
- **loss='binary_crossentropy'**: Loss function for binary classification
- **metrics=['accuracy']**: Track accuracy during training

In [None]:
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
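For intuition, binary cross-entropy for a single sample can be computed by hand. The sketch below is a plain-Python illustration, not the Keras implementation; it also shows how a per-class weight, like the `class_weight` argument passed to `fit`, scales each sample's loss. The function names and example weights are hypothetical.

```python
import math


def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """-(y*log(p) + (1-y)*log(1-p)), with p clipped away from 0 and 1."""
    p = min(max(p_pred, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def weighted_bce(y_true, p_pred, class_weights):
    """Scale the sample's loss by the weight of its true class."""
    return class_weights[y_true] * binary_crossentropy(y_true, p_pred)


print(binary_crossentropy(1, 0.9))               # confident and correct: small loss
print(binary_crossentropy(1, 0.1))               # confident and wrong: large loss
print(weighted_bce(0, 0.5, {0: 2.0, 1: 0.5}))    # class 0 errors count double
```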

## 11. First Training Phase - Transfer Learning
Train only the custom top layers while the base model stays frozen.
This quickly adapts the pre-trained features to the pneumonia detection task.
- **epochs=5**: Five full passes over the training data
- **class_weight**: Use balanced weights to handle class imbalance

In [None]:
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=5,
    class_weight=class_weights
)

## 12. Fine-Tuning Phase - Unfreeze Base Model Layers
Unfreeze the last 20 layers of MobileNetV2 for fine-tuning.
- Early layers: Detect generic features (edges, textures) - keep frozen
- Last 20 layers: Learn task-specific patterns (pneumonia indicators) - unfreeze

In [None]:
base_model.trainable = True

for layer in base_model.layers[:-20]:
    layer.trainable = False
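The slicing pattern above can be checked on stand-in objects without loading the real network. In this sketch `FakeLayer` and `freeze_all_but_last` are hypothetical names, and the 100-layer list is an arbitrary stand-in rather than the real MobileNetV2 depth.

```python
from dataclasses import dataclass


@dataclass
class FakeLayer:
    name: str
    trainable: bool = True


def freeze_all_but_last(layers, n_unfrozen):
    """Mark only the last n_unfrozen layers trainable; return how many are trainable."""
    # An explicit cutoff avoids the layers[:-0] pitfall, which would freeze nothing
    cutoff = max(len(layers) - n_unfrozen, 0)
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return sum(layer.trainable for layer in layers)


fake_layers = [FakeLayer(f"layer_{i}") for i in range(100)]
print(freeze_all_but_last(fake_layers, 20))  # 20
```

On the real model, `sum(layer.trainable for layer in base_model.layers)` gives the same kind of count after the loop has run.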

## 13. Recompile Model for Fine-Tuning
Use a very small learning rate (1e-5) to make subtle adjustments without destroying pre-trained knowledge.

In [None]:
model.compile(
    optimizer=Adam(learning_rate=1e-5),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

## 14. Fine-Tune Model
Continue training with the last 20 base layers unfrozen, updating them with the small learning rate set above.

In [None]:
history_fine = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=5,
    class_weight=class_weights
)
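To view both training phases as one curve, the two history dictionaries can be concatenated. This is a small sketch with a hypothetical helper; it assumes both phases tracked the same metric keys, as they do when compiled with the same `metrics` list.

```python
def merge_histories(first, second):
    """Concatenate per-epoch metric lists from two training phases."""
    return {key: list(first[key]) + list(second.get(key, [])) for key in first}


# Illustrative values only, not results from this notebook
phase_one = {"loss": [0.50, 0.40], "val_loss": [0.55, 0.45]}
phase_two = {"loss": [0.30], "val_loss": [0.35]}
merged = merge_histories(phase_one, phase_two)
print(merged["loss"])  # [0.5, 0.4, 0.3]
```

In this notebook the call would be `merge_histories(history.history, history_fine.history)`, and the merged lists could then be plotted with Matplotlib, marking epoch 5 as the start of fine-tuning.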

## 15. Predictions & Evaluation (Default Threshold = 0.5)
Generate predictions on test set and evaluate performance.
- Probabilities > 0.5 → classified as pneumonia (1)
- Probabilities ≤ 0.5 → classified as normal (0)

In [None]:
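# Flatten the (N, 1) sigmoid output to shape (N,) so it lines up with test_generator.classes
predictions = model.predict(test_generator).ravel()
pred_labels = (predictions > 0.5).astype(int)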

print("Confusion Matrix:")
print(confusion_matrix(test_generator.classes, pred_labels))
print("\nClassification Report:")
print(classification_report(test_generator.classes, pred_labels))
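The confusion matrix printed by scikit-learn for labels 0 and 1 is laid out as `[[TN, FP], [FN, TP]]`. The plain-Python sketch below shows how sensitivity (recall for pneumonia) and specificity follow from those four counts; the labels are illustrative and the helper names are hypothetical.

```python
def binary_confusion(y_true, y_pred):
    """Return (tn, fp, fn, tp) for 0/1 labels."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tn, fp, fn, tp = binary_confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Illustrative labels only
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(binary_confusion(y_true, y_pred))         # (1, 1, 1, 2)
print(sensitivity_specificity(y_true, y_pred))  # sensitivity 2/3, specificity 0.5
```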

## 16. Threshold Optimization
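Lower the decision threshold from 0.5 to 0.36 to increase sensitivity (recall for the pneumonia class).
- More predictions become pneumonia-positive
- Reduces false negatives (missed pneumonia cases)
- May increase false positives
- In a screening setting, a missed pneumonia case is usually costlier than a false alarm
- The threshold should be chosen on the validation set; tuning it against the test set makes the reported test metrics optimistic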

In [None]:
threshold = 0.36  # Also worth trying 0.45 and 0.35

pred_labels = (predictions > threshold).astype(int).ravel()

print(f"Confusion Matrix (Threshold = {threshold}):")
print(confusion_matrix(test_generator.classes, pred_labels))
print(f"\nClassification Report (Threshold = {threshold}):")
print(classification_report(test_generator.classes, pred_labels))
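Rather than picking a threshold by hand, one option is to sweep candidate values on validation predictions and keep the highest one that still meets a target recall. The sketch below illustrates that idea with made-up labels and probabilities; the helper names, the candidate list, and the target recall are all assumptions.

```python
def recall_at(threshold, y_true, probs):
    """Fraction of true positives whose probability exceeds the threshold."""
    positives = [p for t, p in zip(y_true, probs) if t == 1]
    return sum(p > threshold for p in positives) / len(positives)


def highest_threshold_for_recall(y_true, probs, target_recall, candidates):
    """Return the largest candidate threshold meeting the target recall, or None."""
    feasible = [t for t in candidates if recall_at(t, y_true, probs) >= target_recall]
    return max(feasible) if feasible else None


# Hypothetical validation labels and predicted probabilities
val_true = [0, 0, 0, 1, 1, 1, 1]
val_probs = [0.10, 0.30, 0.55, 0.40, 0.62, 0.80, 0.95]
candidates = [0.30, 0.35, 0.40, 0.45, 0.50]

print(highest_threshold_for_recall(val_true, val_probs, 1.0, candidates))  # 0.35
```

Choosing the highest feasible threshold keeps false positives as low as the recall target allows. The selected value would then be applied once to the test predictions.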

## Summary
This notebook demonstrates:
1. **Transfer Learning**: Leveraging pre-trained MobileNetV2 for faster, better results
2. **Data Augmentation**: Creating variations to improve generalization
3. **Class Balancing**: Handling imbalanced datasets with class weights
4. **Fine-Tuning**: Selectively unfreezing layers for task-specific learning
5. **Threshold Optimization**: Adjusting decision boundaries for medical safety

Model performance is reported by the confusion matrices and classification reports in sections 15 and 16, with recall on the pneumonia class being the key figure for this task.

## 17. Save the Trained Model
Save the trained model in .keras format for future use and deployment.
The .keras format is Keras's native single-file format; it stores the model architecture, weights, and compile configuration (including optimizer state).

In [None]:
# Save the trained model in .keras format
# Note: data_dir is inside the kagglehub download location, which may be read-only in some environments
model_path = os.path.join(data_dir, "pneumonia_detection_model.keras")
model.save(model_path)

print(f"Model saved successfully at: {model_path}")
print("Model format: .keras (Keras native format)")