This project builds and evaluates deep learning models to classify chest X-ray images as PNEUMONIA or NORMAL. It explores multiple approaches including shallow CNNs, deep CNNs, transfer learning with MobileNet, and model interpretability using Grad-CAM.
Pneumonia is a serious lung infection that can be detected using chest X-rays. In this project, we:
- Build CNN models from scratch
- Compare shallow vs deep architectures
- Fine-tune a pretrained model (MobileNet)
- Evaluate performance using ROC curves
- Interpret model predictions using Grad-CAM
- Dataset: Chest X-ray images
- Classes:
  - NORMAL
  - PNEUMONIA
- Resize images to 112 x 112
- Convert to tensors
- Normalize using ImageNet statistics
- Split training data:
  - 80% train
  - 20% validation
Shallow CNN architecture:
- Conv → ReLU → MaxPool
- Conv → ReLU → MaxPool
- Flatten → Fully Connected
📌 Key Insight:
- Fast to train
- Limited feature extraction capability
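
A two-block Conv → ReLU → MaxPool stack followed by a single fully connected layer could look like the sketch below. The channel counts (16 and 32), the 3x3 kernels, and the class name `ShallowCNN` are assumptions; the source only gives the layer order and the 112 x 112 input size.

```python
import torch
from torch import nn


class ShallowCNN(nn.Module):
    """Two Conv -> ReLU -> MaxPool blocks, then Flatten -> Fully Connected."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # assumed width
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 112 -> 56
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # assumed width
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 56 -> 28
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 28 * 28, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


# Quick shape check on a random batch of four 112 x 112 RGB images.
logits = ShallowCNN()(torch.randn(4, 3, 112, 112))
```

With only two pooling stages, the flattened feature map is large (32 x 28 x 28), so most of the parameters sit in the final linear layer.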
Deep CNN architecture:
- 3 Convolutional layers
- Increasing channels (16 → 32 → 64)
- Dropout for regularization
- Fully connected layers
📌 Key Insight:
- Better feature learning
- Improved generalization vs shallow model
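
The three-layer design with 16 → 32 → 64 channels, dropout, and fully connected layers might be written as follows. Kernel sizes, the dropout probability (0.5), the hidden width (128), and the name `DeepCNN` are not given in the source and are assumed here for illustration.

```python
import torch
from torch import nn


class DeepCNN(nn.Module):
    """Three conv blocks (16 -> 32 -> 64 channels) with dropout before the head."""

    def __init__(self, num_classes=2, dropout=0.5):
        super().__init__()

        def block(in_ch, out_ch):
            # Each block halves the spatial resolution.
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )

        # 112 -> 56 -> 28 -> 14 after three pooling stages.
        self.features = nn.Sequential(block(3, 16), block(16, 32), block(32, 64))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),            # assumed placement and rate
            nn.Linear(64 * 14 * 14, 128),   # assumed hidden width
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


# Quick shape check on a random batch.
deep_logits = DeepCNN()(torch.randn(4, 3, 112, 112))
```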
- MobileNet pretrained on ImageNet
- Replaced final classifier layer
- Fine-tuned on chest X-ray dataset
📌 Key Insight:
- Best performance among all models
- Faster convergence
- Leverages learned features
- Loss Function: CrossEntropyLoss
- Optimizer: Adam
- Learning Rate: 0.0001
- Batch Size: 64
- Epochs: ~20
- Model checkpoint saved every epoch
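
A training loop using the settings above might look like this sketch. The helper names (`make_loaders`, `train`), the checkpoint file naming, and the `checkpoints` directory are assumptions; only the loss, optimizer, learning rate, batch size, epoch count, and per-epoch checkpointing come from the list.

```python
import os

import torch
from torch import nn
from torch.utils.data import DataLoader


def make_loaders(train_set, val_set, batch_size=64):
    """Wrap the train/validation subsets in DataLoaders (batch size 64)."""
    return (
        DataLoader(train_set, batch_size=batch_size, shuffle=True),
        DataLoader(val_set, batch_size=batch_size, shuffle=False),
    )


def train(model, train_loader, val_loader, epochs=20, lr=1e-4,
          ckpt_dir="checkpoints", device="cpu"):
    """Train with CrossEntropyLoss + Adam and save a checkpoint every epoch."""
    os.makedirs(ckpt_dir, exist_ok=True)
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = {"train_loss": [], "val_loss": []}

    for epoch in range(1, epochs + 1):
        model.train()
        total = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            total += loss.item() * images.size(0)
        history["train_loss"].append(total / len(train_loader.dataset))

        model.eval()
        total = 0.0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                total += criterion(model(images), labels).item() * images.size(0)
        history["val_loss"].append(total / len(val_loader.dataset))

        # One checkpoint per epoch, so any epoch can be restored later.
        torch.save(
            {"epoch": epoch, "model_state": model.state_dict(),
             "optimizer_state": optimizer.state_dict()},
            os.path.join(ckpt_dir, f"epoch_{epoch:02d}.pt"),
        )
    return history
```

The returned `history` holds the per-epoch average training and validation losses, which is the data needed for a training-vs-validation loss plot.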
- ROC Curve
- AUC (Area Under Curve)
- Training vs Validation Loss
- Model generalization analysis
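
The ROC curve and AUC can be computed from model outputs with scikit-learn, roughly as sketched here. The helper name `roc_from_logits` is hypothetical, and treating class index 1 as PNEUMONIA is an assumption (it holds when labels are assigned alphabetically, as torchvision's `ImageFolder` does).

```python
import numpy as np
from sklearn.metrics import auc, roc_curve


def roc_from_logits(logits, labels, positive_class=1):
    """Return (fpr, tpr, auc) for two-class logits of shape (N, 2)."""
    logits = np.asarray(logits, dtype=float)
    # Numerically stable softmax over the class dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    scores = probs[:, positive_class]
    # roc_curve sweeps the decision threshold over the positive-class scores.
    fpr, tpr, _ = roc_curve(labels, scores, pos_label=positive_class)
    return fpr, tpr, auc(fpr, tpr)
```

The returned `fpr` and `tpr` arrays can be passed directly to `matplotlib.pyplot.plot` to draw the curve; an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.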
| Model | Performance |
|---|---|
| Shallow CNN | Baseline performance |
| Deep CNN | Improved accuracy |
| MobileNet | Best performance |
📌 Best model: MobileNet (Transfer Learning)
- Training loss decreases steadily
- Validation loss stabilizes early
- Optimal model found around epoch 5–7
- Later epochs show signs of overfitting
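
One way to act on this pattern is to pick the checkpoint with the lowest validation loss, or to stop once validation loss has stopped improving. The helpers below are a sketch under that assumption; the source does not say how the best epoch was chosen, and `best_epoch`, `should_stop`, and the `patience` value are illustrative.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_idx = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_idx + 1


def should_stop(val_losses, patience=3):
    """True once the last `patience` epochs failed to beat the earlier best."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before
```

Since a checkpoint is saved every epoch, `best_epoch(history["val_loss"])` would identify which saved file to reload for evaluation.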
Grad-CAM visualizes which image regions contribute most to a model's prediction.
- Highlights lung regions relevant for pneumonia
- Helps validate model decisions
- Improves trust in predictions
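
To show what Grad-CAM computes, the sketch below implements the core calculation by hand in plain PyTorch. It is not the pytorch-grad-cam library the project lists, and the function name `grad_cam` and its arguments are hypothetical. It weights each feature map of a chosen convolutional layer by the spatial average of the class score's gradient, sums them, and keeps the positive part.

```python
import torch
import torch.nn.functional as F


def grad_cam(model, target_layer, image, class_idx):
    """Return an (H, W) heatmap in [0, 1] for one image of shape (1, C, H, W)."""
    activations, gradients = [], []

    def forward_hook(module, inputs, output):
        # Keep the feature maps and capture their gradient during backward.
        activations.append(output)
        output.register_hook(gradients.append)

    handle = target_layer.register_forward_hook(forward_hook)
    try:
        model.eval()
        model.zero_grad()
        logits = model(image)
        logits[0, class_idx].backward()
    finally:
        handle.remove()

    acts = activations[0].detach()   # (1, K, h, w)
    grads = gradients[0].detach()    # (1, K, h, w)
    # Channel weights: global average of the gradients over space.
    weights = grads.mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    # Upsample to the input resolution so it can be overlaid on the X-ray.
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)[0, 0]
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)
```

For a CNN, `target_layer` is usually the last convolutional layer, since it has the most class-specific features while still retaining spatial layout.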
- Compared shallow vs deep architectures
- Evaluated effect of model depth
- Tested transfer learning vs training from scratch
- Analyzed overfitting behavior
- Visualized predictions using Grad-CAM
- Overfitting in shallow models
- Limited dataset size
- Class imbalance
- Model interpretability
- Use larger medical datasets
- Apply data augmentation
- Use advanced architectures (ResNet, EfficientNet)
- Deploy as a web app (Streamlit / FastAPI)
- Add explainability dashboards
- Python
- PyTorch
- Torchvision
- NumPy
- Matplotlib
- Scikit-learn
- pytorch-grad-cam
- PyTorch Documentation: https://pytorch.org
- Torchvision Models: https://pytorch.org/vision/stable/models.html
- Grad-CAM Paper: https://arxiv.org/abs/1610.02391