Welcome to PulmoScan, a comprehensive deep learning project for automated detection and classification of pulmonary nodules from CT scans using 3D Convolutional Neural Networks and radiomics-based machine learning. This work implements the approach described in:
"3D Deep Learning from CT Scans Predicts Tumor Invasiveness of Subcentimeter Pulmonary Adenocarcinomas" Cancer Research, 2018 (arXiv:1801.09555)
PulmoScan is a modular pipeline that detects, classifies, and analyzes pulmonary nodules using 3D deep learning architectures and radiomic feature extraction. It preprocesses CT scans (lung segmentation, Hounsfield normalization, resampling), trains separate detection and classification models, and evaluates them with AUC, sensitivity, specificity, and F1 score.
- Advanced CT Preprocessing: Lung segmentation with Sobel edge detection, Otsu thresholding, and morphological operations
- 3D Dual Path Network (DPN) Architecture: Implements attention-enhanced 3D CNN with multi-scale feature extraction for nodule detection
- Radiomics-Enhanced Classification: PyRadiomics-based feature extraction with Random Forest, SVC, and KNN models for malignancy prediction
- Multi-Dataset Support: Works with LUNA16, LIDC-IDRI, and Chest CT-Scan datasets with standardized processing
- VGG19-based Subtype Classification: Validates cancer subtypes with transfer learning achieving 92% accuracy
- Clinical-Grade Benchmarking: Comprehensive metrics including AUC, sensitivity, specificity, and F1 scores
pulmoscan/
├── data/                          # Raw datasets
│   ├── luna16/
│   ├── lidc-idri/
│   └── chest-ct/
├── processed/                     # Preprocessed outputs
│   ├── segmented/
│   ├── normalized/
│   └── *.csv                      # Metadata and annotations
├── models/                        # Saved models
│   ├── nodule_detector/           # 3D DPN models
│   ├── malignancy_classifier/     # RF, SVC, KNN models
│   └── subtype_classifier/        # VGG19 models
├── experiments/                   # Experimental logs and config
├── app/                           # Flask application
│   ├── static/
│   ├── templates/
│   ├── utils/
│   │   ├── preprocessing.py       # Segmentation and normalization
│   │   ├── feature_extraction.py  # Radiomics and semantic features
│   │   └── visualization.py       # Result visualization
│   ├── routes.py
│   └── __init__.py
├── monitoring/                    # Prometheus & Grafana
│   ├── prometheus/
│   └── grafana/
├── tests/                         # Unit and integration tests
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── main.py                        # Pipeline launcher
Our implementation follows a four-stage clinical pipeline:
- Data Preparation: CT scan loading, lung segmentation, Hounsfield normalization, and 3D resampling
- Nodule Detection: 3D DPN architecture with multi-scale feature extraction and region proposal
- Malignancy Classification: Radiomic feature extraction followed by ensemble ML classification
- Subtype Identification: Transfer learning with VGG19 for cancer subtype prediction
| Parameter | Value | Purpose |
|---|---|---|
| BATCH_SIZE | 16 | Optimized for GPU memory with 3D volumes |
| LEARNING_RATE | 0.0001 | Fine-tuned for stable convergence in 3D CNNs |
| INPUT_SIZE | 64×64×64 | Voxel dimensions for nodule patches |
| HU_MIN / HU_MAX | -1000 / 400 | Hounsfield unit clipping for lung tissue |
| AUGMENTATION | Rotation, Flip, Elastic | Data augmentation for robust generalization |
| NUM_FEATURES | 107 | PyRadiomics features (first-order, shape, texture) |
| ENSEMBLE_MODELS | RF, SVC, KNN | Multi-model voting for classification |
| VGG19_LAYERS | 19 | Deep feature extraction with transfer learning |
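
One way to keep these values in a single place is a small frozen dataclass. This is only a sketch: the class name `PipelineConfig` and its field names are assumptions made for illustration and may not match the configuration code in the repository. The values are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical container for the hyperparameters in the table above."""

    batch_size: int = 16
    learning_rate: float = 1e-4
    input_size: tuple = (64, 64, 64)   # voxel dimensions of a nodule patch
    hu_min: int = -1000                # lower Hounsfield clipping bound
    hu_max: int = 400                  # upper Hounsfield clipping bound
    augmentations: tuple = ("rotation", "flip", "elastic")
    num_features: int = 107            # PyRadiomics features
    ensemble_models: tuple = ("RF", "SVC", "KNN")

    def voxels_per_patch(self) -> int:
        """Number of voxels in one input patch."""
        depth, height, width = self.input_size
        return depth * height * width


config = PipelineConfig()
print(config.voxels_per_patch())  # 262144
```

Freezing the dataclass keeps the settings from being changed by accident during a run, which makes experiments easier to reproduce.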
Our nodule detection implementation uses a 3D DPN architecture specifically optimized for CT volumes:
- Encoder: Multi-scale 3D convolutional layers with residual connections
- Dual Path Structure: Combines high-resolution and high-level semantic features
- Decoder: 3D transposed convolutions with skip connections
- Training: Weighted cross-entropy loss with AdamW optimizer and ReduceLROnPlateau scheduler
This approach achieves AUC ≈ 0.91 for nodule detection, outperforming traditional 2D approaches.
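
The weighted cross-entropy listed under Training can be illustrated in plain Python. The sketch below is not the project's training code: the class weights and predicted probabilities are made-up values. The averaging divides the summed weighted losses by the summed weights of the targets, which is the convention PyTorch's `CrossEntropyLoss` uses when a `weight` argument is given with mean reduction.

```python
import math


def weighted_cross_entropy(probs, targets, class_weights):
    """Weighted mean of -w[y] * log(p[y]) over a batch.

    probs: list of per-class probability lists (each row sums to 1)
    targets: list of integer class labels
    class_weights: list with one weight per class
    """
    total_loss = 0.0
    total_weight = 0.0
    for p, y in zip(probs, targets):
        w = class_weights[y]
        total_loss += -w * math.log(p[y])
        total_weight += w
    return total_loss / total_weight


# Hypothetical batch: class 0 = non-nodule (common), class 1 = nodule (rare).
probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
targets = [0, 0, 1]

unweighted = weighted_cross_entropy(probs, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(probs, targets, [1.0, 5.0])
print(f"unweighted={unweighted:.4f} weighted={weighted:.4f}")
```

Raising the weight of the rare class makes errors on nodule samples contribute more to the loss, which is the usual motivation for weighting when non-nodule candidates dominate the training set.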
All CT scans go through the same preprocessing steps with fixed parameters:
- Lung Segmentation: Sobel edge detection + Otsu thresholding + morphological operations
- HU Normalization: Clipping to [-1000, 400] and scaling to [0, 1]
- Resampling: Standardized voxel spacing of 1×1×1 mm³
- Patch Extraction: 64×64×64 voxel patches centered on nodule candidates
- Augmentation: 3D rotation, flipping, and elastic deformation
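
The HU normalization and patch extraction steps above can be sketched with NumPy. This is a minimal illustration, not the code in `app/utils/preprocessing.py`: the function names are hypothetical, the volume is synthetic random data, and zero-padding at the volume border is an assumption about how edge candidates are handled.

```python
import numpy as np

HU_MIN, HU_MAX = -1000.0, 400.0
PATCH_SIZE = 64


def normalize_hu(volume):
    """Clip Hounsfield units to [HU_MIN, HU_MAX] and scale to [0, 1]."""
    clipped = np.clip(volume.astype(np.float32), HU_MIN, HU_MAX)
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)


def extract_patch(volume, center, size=PATCH_SIZE, pad_value=0.0):
    """Return a size^3 patch centered on `center`, given as (z, y, x).

    The volume is padded first so that candidates near the border still
    yield a full-size patch. A pad value of 0.0 corresponds to HU_MIN
    (air) once the volume has been normalized.
    """
    half = size // 2
    padded = np.pad(volume, half, mode="constant", constant_values=pad_value)
    # After padding by `half`, original index c sits at c + half, so the
    # patch starts at (c + half) - half = c in padded coordinates.
    z, y, x = (int(c) for c in center)
    return padded[z:z + size, y:y + size, x:x + size]


# Synthetic volume standing in for a CT scan resampled to 1x1x1 mm.
rng = np.random.default_rng(0)
scan = rng.integers(-2000, 2000, size=(80, 96, 96)).astype(np.int16)

normalized = normalize_hu(scan)
patch = extract_patch(normalized, center=(5, 48, 90))
print(patch.shape, float(normalized.min()), float(normalized.max()))
```

Normalizing before extracting patches means the padding value has a clear physical meaning (air), so border patches do not introduce artificial high-intensity regions.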
| Dataset | Scans | Annotations | Task | Notes |
|---|---|---|---|---|
| LUNA16 | 888 CT scans | 1,186 nodules | Detection | Grand Challenge for nodule detection |
| LIDC-IDRI | 1,018 scans | ~2,600 nodules | Classification | 4 radiologist annotations per nodule |
| Chest CT-Scan | 1,000 images | 4 classes | Subtyping | Adenocarcinoma, Squamous, Large cell, Normal |
Our 3D DPN implementation incorporates several architectural innovations:
- Multi-Scale Feature Fusion: Combines features from multiple resolution levels
- Residual Connections: Mitigate vanishing gradients in deep networks
- Weighted Loss Function: Addresses class imbalance (nodule vs. non-nodule)
- Early Stopping & LR Scheduling: Lower the learning rate and then stop training once validation performance plateaus, which limits overfitting
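
How learning-rate reduction and early stopping interact can be shown with a plain-Python sketch. It uses the patience values from the training configuration in this README (5 epochs for the scheduler, 10 for early stopping). It is a simplified stand-in rather than PyTorch's `ReduceLROnPlateau`; the class name, the reduction factor of 0.1, and the validation losses are assumptions.

```python
class PlateauController:
    """Simplified plateau-based LR reduction plus early stopping."""

    def __init__(self, lr, lr_patience=5, stop_patience=10, factor=0.1):
        self.lr = lr
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.factor = factor
        self.best_loss = float("inf")
        self.epochs_since_best = 0
        self.epochs_since_lr_change = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            # New best: reset both counters.
            self.best_loss = val_loss
            self.epochs_since_best = 0
            self.epochs_since_lr_change = 0
            return False
        self.epochs_since_best += 1
        self.epochs_since_lr_change += 1
        if self.epochs_since_best >= self.stop_patience:
            return True
        if self.epochs_since_lr_change >= self.lr_patience:
            # No improvement for lr_patience epochs: shrink the LR.
            self.lr *= self.factor
            self.epochs_since_lr_change = 0
        return False


controller = PlateauController(lr=1e-4)
# Hypothetical validation losses: three improving epochs, then a plateau.
losses = [0.9, 0.7, 0.6] + [0.65] * 12
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if controller.step(loss):
        stopped_at = epoch
        break
print(stopped_at, controller.lr)
```

With these numbers the learning rate is reduced once after five stagnant epochs, and training stops after ten, so the scheduler gets one chance to recover before early stopping ends the run.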
| Metric | Value | Clinical Significance |
|---|---|---|
| AUC-ROC | 0.91 | Excellent discrimination capability |
| Sensitivity | 87.3% | High true positive rate |
| Specificity | 89.6% | Low false positive rate |
| Precision | 0.88 | Reliable positive predictions |
| F1 Score | 0.875 | Balanced performance |
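
For reference, the threshold-based metrics in this table all derive from the four cells of a binary confusion matrix. The sketch below uses hypothetical counts, not the project's evaluation data, and the function name is an assumption. AUC-ROC is not included because it requires the continuous prediction scores rather than counts at a single threshold.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, precision, and F1 from counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical confusion-matrix counts for illustration only.
metrics = binary_metrics(tp=87, fp=12, tn=88, fn=13)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```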
The radiomics-based classification represents our interpretable ML approach:
- Feature Extraction: 107 PyRadiomics features (first-order statistics, shape, texture)
- Semantic Features: XML-based annotations from LIDC-IDRI using PyLIDC
- Feature Selection: Recursive feature elimination with cross-validation
- Ensemble Classification: Random Forest, SVC, and KNN with voting strategy
- Validation: 5-fold stratified cross-validation for robust evaluation
| Model | Accuracy | F1 Score | Notable Strength |
|---|---|---|---|
| Random Forest | 91.2% | 0.90 | Interpretable features & fast inference |
| SVC | 88.7% | 0.87 | Strong generalization with RBF kernel |
| KNN | 85.3% | 0.84 | Simple baseline with good performance |
| 3D CNN (End-to-end) | 89.8% | 0.89 | Automated feature learning |
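
The voting strategy used to combine Random Forest, SVC, and KNN is not specified in detail here. A minimal sketch of hard (majority) voting, assuming each model outputs one class label per sample, could look like the following; the function name and the predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard voting: return the most common label for each sample.

    predictions_per_model: list of equally long label lists, one per model.
    On a tie, the label seen first (from the earliest model) wins.
    """
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        votes = [model_preds[i] for model_preds in predictions_per_model]
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


# Hypothetical per-sample predictions (1 = malignant, 0 = benign).
rf = [1, 0, 1, 1, 0]
svc = [1, 0, 0, 1, 0]
knn = [0, 0, 1, 1, 1]

ensemble = majority_vote([rf, svc, knn])
print(ensemble)  # [1, 0, 1, 1, 0]
```

With three models and two classes a tie cannot occur, which is one practical reason to use an odd number of voters for binary malignancy prediction.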
Our subtype classification model identifies four key lung tissue categories:
| Subtype | Precision | Recall | F1 Score | Support |
|---|---|---|---|---|
| Adenocarcinoma | 0.94 | 0.91 | 0.92 | 250 |
| Squamous Cell | 0.91 | 0.93 | 0.92 | 250 |
| Large Cell | 0.89 | 0.90 | 0.89 | 250 |
| Normal | 0.95 | 0.94 | 0.94 | 250 |
| Overall | 0.92 | 0.92 | 0.92 | 1000 |
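
The Overall row is consistent with an unweighted (macro) average of the per-class rows. Because every class has the same support (250), the macro average and the support-weighted average coincide. The short check below recomputes it from the table values; the function name is an assumption.

```python
# Per-class (precision, recall, F1) copied from the table above.
per_class = {
    "Adenocarcinoma": (0.94, 0.91, 0.92),
    "Squamous Cell": (0.91, 0.93, 0.92),
    "Large Cell": (0.89, 0.90, 0.89),
    "Normal": (0.95, 0.94, 0.94),
}


def macro_average(rows):
    """Average each metric column across classes, rounded to 2 decimals."""
    n_classes = len(rows)
    column_sums = [sum(column) for column in zip(*rows.values())]
    return tuple(round(total / n_classes, 2) for total in column_sums)


overall = macro_average(per_class)
print(overall)  # (0.92, 0.92, 0.92)
```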
The integrated pipeline improves on the baseline across detection metrics:
| Metric | Baseline | PulmoScan | Improvement |
|---|---|---|---|
| Detection Rate | 82.4% | 91.3% | +8.9 points |
| False Positives | 4.2/scan | 2.1/scan | -50% (relative) |
| AUC | 0.87 | 0.91 | +4.6% (relative) |
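
The Improvement column mixes two kinds of change: the detection-rate figure is a difference in percentage points, while the false-positive and AUC figures are changes relative to the baseline. The sketch below reproduces each value from the table; the helper names are assumptions.

```python
def absolute_change(baseline, new):
    """Difference in the metric's own units (e.g. percentage points)."""
    return new - baseline


def relative_change(baseline, new):
    """Change expressed as a percentage of the baseline value."""
    return (new - baseline) / baseline * 100.0


detection_points = round(absolute_change(82.4, 91.3), 1)
false_positive_rel = round(relative_change(4.2, 2.1), 1)
auc_rel = round(relative_change(0.87, 0.91), 1)

print(detection_points)    # 8.9
print(false_positive_rel)  # -50.0
print(auc_rel)             # 4.6
```

Expressed the same way as the other two rows, the detection-rate gain would be about 10.8% relative to the baseline (8.9 / 82.4).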
| Approach | Accuracy | Sensitivity | Specificity | F1 Score |
|---|---|---|---|---|
| Radiologist Average | 88.5% | 84.2% | 91.3% | 0.87 |
| PulmoScan (RF) | 91.2% | 89.7% | 92.8% | 0.90 |
| Improvement | +2.7 points | +5.5 points | +1.5 points | +0.03 |
# 1. Clone Repository
git clone https://github.com/dhouhameliane/PulmoScan
cd PulmoScan
# 2. Create Virtual Environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# 3. Install Dependencies
pip install -r requirements.txt
# 4. Download Datasets
# LUNA16: https://luna16.grand-challenge.org/
# LIDC-IDRI: https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI
# Place datasets in data/ directory
# 5. Run Preprocessing
python main.py --mode preprocess --dataset luna16
# 6. Train Models
python main.py --mode train --model nodule_detector
# Upload CT scan for analysis
POST /upload
Content-Type: multipart/form-data
Body: {file: CT_scan.mhd}
# Get prediction results
GET /report/<scan_id>
Response: {
"nodules_detected": int,
"malignancy_scores": [float],
"subtypes": [string],
"confidence": float
}
# Visualize 3D results
GET /visualize/<scan_id>
Response: Interactive 3D visualization
# Download diagnostic report
GET /download/<scan_id>
Response: PDF report with findings

# Run all tests
pytest tests/
# Run specific test suites
pytest tests/test_preprocessing.py -v
pytest tests/test_models.py -v
pytest tests/test_api.py -v
# Generate coverage report
pytest --cov=app tests/

PulmoScan includes comprehensive monitoring:
- Prometheus Metrics: Inference time, model accuracy, error rates
- Grafana Dashboards: Real-time visualization of system health
- Custom Metrics: Per-model performance tracking
- Alerting: Automated notifications for anomalies
# Key preprocessing steps
1. Load DICOM/MHD files
2. Resample to 1×1×1 mm³ spacing
3. Apply lung segmentation mask
4. Clip HU values to [-1000, 400]
5. Normalize to [0, 1]
6. Extract 64×64×64 patches
7. Apply data augmentation

# Radiomic features (107 total)
- First Order (19): Mean, Median, Std, Skewness, Kurtosis, etc.
- Shape (14): Volume, Surface Area, Sphericity, etc.
- Texture (74): GLCM, GLRLM, GLSZM, NGTDM, GLDM

# Training configuration
- Optimizer: AdamW (weight_decay=1e-4)
- Loss: Weighted CrossEntropy
- LR Schedule: ReduceLROnPlateau (patience=5)
- Early Stopping: patience=10
- Batch Size: 16
- Epochs: 100 (with early stopping)

- Integrate explainability with Grad-CAM and attention visualization
- Add multi-task learning for simultaneous detection and classification
- Enhance with transformer-based architectures (3D Vision Transformers)
- Develop real-time inference pipeline for clinical deployment
- Expand to multi-center validation with external datasets
- Implement uncertainty quantification for model predictions
- Web interface with PACS integration for clinical workflow
Project Team (ESPRIT Data Science, 2024-2025):
- Asser Aydi - Lead Developer
- Dhouha Meliane - ML Engineer & Architecture
- Harold Agbervo - Data Preprocessing
- Nouha Aouachri - Model Evaluation
- Ranim Souissi - Web Development
Supervisors:
- Ms. Sarah Zouari - Academic Supervisor
- Mr. Fares Khfecha - Technical Advisor
Licensed under MIT License.
Created by ESPRIT Data Science Team
Contact: dhouhameliane@esprit.tn
- Wang, S., et al. "3D Deep Learning from CT Scans Predicts Tumor Invasiveness of Subcentimeter Pulmonary Adenocarcinomas" arXiv:1801.09555
- Setio, A. A., et al. "Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge" Medical Image Analysis, 2017
- Armato III, S. G., et al. "The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI)" Medical Physics, 2011
We thank ESPRIT for providing computational resources and the open-source community for maintaining the datasets and tools that made this project possible. Special thanks to the radiologists who contributed annotations to LIDC-IDRI and the LUNA16 challenge organizers.
If you find this project helpful, please consider starring the repository!

