This project implements a deep learning approach for classifying Non-Small Cell Lung Carcinoma (NSCLC) into its major subtypes: Adenocarcinoma (ADC) and Squamous Cell Carcinoma (SCC). The system uses a patch-based Convolutional Neural Network (CNN) with an Expectation-Maximization (EM) algorithm to identify discriminative regions in CT scans, enabling accurate cancer subtype classification.
- Patch-based analysis of high-resolution CT scans
- Automatic identification of discriminative regions using EM algorithm
- Two-level classification approach with decision fusion
- Interactive web interface for clinical use
- High accuracy comparable to expert pathologists
This project adapts the methodology described in the paper "Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification" to CT scans. The key insight is that not all regions of a medical image are equally informative for diagnosis, so the classifier should be trained only on the regions that are.
Approach:
- Patch Extraction: Extracts multiple high-resolution patches from CT scans
- EM-based Discriminative Patch Selection:
  - Initially consider all patches as discriminative
  - Train a CNN model to predict cancer subtypes
  - Apply spatial Gaussian smoothing to probability maps
  - Select patches with higher probability values as discriminative
  - Iterate until convergence
- Two-level Classification:
  - First level: Patch-level CNN classification
  - Second level: Decision fusion using logistic regression or SVM
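The selection loop above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the project's actual code: the function names, the per-image percentile threshold, and the `prob_fn` callable (a stand-in for "train the CNN on the currently selected patches and return each patch's true-class probability") are all hypothetical.

```python
import numpy as np


def extract_patches(image, patch_size, stride):
    """Slide a window over a 2D image; return the patches and their grid shape."""
    rows = range(0, image.shape[0] - patch_size + 1, stride)
    cols = range(0, image.shape[1] - patch_size + 1, stride)
    patches = np.stack([image[r:r + patch_size, c:c + patch_size]
                        for r in rows for c in cols])
    return patches, (len(rows), len(cols))


def gaussian_smooth(prob_map, sigma=1.0):
    """Separable Gaussian smoothing of a 2D probability map (edge-padded)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(prob_map, radius, mode="edge")
    # Convolve along each row, then along each column.
    tmp = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, tmp)


def select_discriminative(prob_map, sigma=1.0, percentile=50.0):
    """Keep patches whose smoothed probability reaches the given percentile.

    The percentile rule is an assumed, simplified thresholding scheme.
    """
    smoothed = gaussian_smooth(prob_map, sigma)
    return smoothed >= np.percentile(smoothed, percentile)


def em_patch_selection(prob_fn, grid_shape, max_iters=5,
                       sigma=1.0, percentile=50.0):
    """Alternate between model fitting (prob_fn) and patch selection."""
    mask = np.ones(grid_shape, dtype=bool)  # start: every patch is discriminative
    for _ in range(max_iters):
        prob_map = prob_fn(mask)  # hypothetical: train CNN, score every patch
        new_mask = select_discriminative(prob_map, sigma, percentile)
        if np.array_equal(new_mask, mask):  # selection stopped changing
            break
        mask = new_mask
    return mask
```

In practice `prob_fn` would retrain or fine-tune the patch-level CNN on the patches where `mask` is true; here it is left abstract so the selection logic can be read on its own.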
The CNN architecture consists of:
- 5 convolutional blocks with batch normalization
- Adaptive pooling to ensure consistent feature map dimensions
- Fully connected layers for final classification
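A network with this shape might look like the following PyTorch sketch. The channel widths, kernel sizes, the 4x4 pooled output, the hidden layer size, and the class name `PatchCNN` are assumptions chosen for illustration; only the overall structure (five convolutional blocks with batch normalization, adaptive pooling, fully connected layers) comes from the description above.

```python
import torch
from torch import nn


class PatchCNN(nn.Module):
    """Illustrative patch classifier: 5 conv blocks -> adaptive pool -> FC."""

    def __init__(self, in_channels=1, num_classes=2,
                 widths=(16, 32, 64, 128, 256)):
        super().__init__()
        blocks = []
        prev = in_channels
        for width in widths:
            # One block: convolution, batch norm, non-linearity, downsampling.
            blocks += [
                nn.Conv2d(prev, width, kernel_size=3, padding=1),
                nn.BatchNorm2d(width),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            ]
            prev = width
        self.features = nn.Sequential(*blocks)
        # Adaptive pooling fixes the feature map at 4x4 for any patch size.
        self.pool = nn.AdaptiveAvgPool2d((4, 4))
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(prev * 4 * 4, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.pool(self.features(x)))
```

Because of the adaptive pooling layer, the same model accepts patches of different spatial sizes, for example `PatchCNN()(torch.randn(2, 1, 64, 64))` and `PatchCNN()(torch.randn(2, 1, 128, 96))` both return a tensor of shape `(2, 2)` holding ADC/SCC logits.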
The system combines patch-level predictions using a Count-based Multiple Instance (CMI) learning approach:
- Creates histograms of patch-level predictions
- Trains a second-level classifier (logistic regression or SVM)
- Makes final image-level predictions
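The fusion step can be illustrated with a small NumPy sketch. It assumes each image is summarised by the fraction of its patches predicted as each class, and it uses a hand-written gradient-descent logistic regression as a stand-in for whatever library classifier the project actually uses; all function names here are hypothetical.

```python
import numpy as np


def prediction_histogram(patch_preds, num_classes=2):
    """Count patch-level class predictions for one image, as fractions."""
    counts = np.bincount(np.asarray(patch_preds, dtype=int),
                         minlength=num_classes).astype(float)
    total = counts.sum()
    return counts / total if total else counts


def fit_logistic_regression(X, y, lr=0.5, epochs=500):
    """Binary logistic regression trained by batch gradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(class 1)
        grad = p - y                             # gradient of the log loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_image_label(X, w, b):
    """Image-level label: 1 where the decision score is positive, else 0."""
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)


# Toy data: each row lists the patch-level predictions for one image
# (0 = ADC, 1 = SCC is an assumed encoding).
train_patch_preds = [[0, 0, 0, 1], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
train_labels = [0, 0, 1, 1]

X_train = np.stack([prediction_histogram(p) for p in train_patch_preds])
w, b = fit_logistic_regression(X_train, train_labels)
```

A new image is then classified by building its histogram with `prediction_histogram` and passing it to `predict_image_label`.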
Technologies:
- Framework: PyTorch
- Dataset: NSCLC-Radiomics dataset
- Web Interface: Streamlit
- Image Processing: OpenCV, scikit-image, pydicom
Install required packages:
pip install torch torchvision streamlit opencv-python scikit-image pydicom matplotlib numpy pillow
Run the web application:
streamlit run deploy.py
Create a .streamlit/config.toml file with the following content:
[server]
runOnSave = false
enableStaticServing = true
[runner]
fastReruns = false
The model distinguishes ADC from SCC with accuracy reported as comparable to that of expert pathologists. Restricting training to the patches selected by the EM algorithm focuses the classifier on the most discriminative regions of each CT scan, which improves classification accuracy.
The project includes a Streamlit web application that allows medical professionals to:
- Upload DICOM files or CT scan images
- Visualize the uploaded images
- Get cancer subtype predictions with confidence scores
- View probability distributions for different subtypes
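Before an uploaded DICOM slice can be displayed or passed to the model, its raw pixel values usually need converting to Hounsfield units and then to an 8-bit image. The sketch below shows one way to do this; the lung window values (center -600, width 1500) and both function names are assumptions, not settings taken from this project.

```python
import numpy as np


def window_to_uint8(hu, center=-600.0, width=1500.0):
    """Clip Hounsfield units to a display window and rescale to 0..255."""
    lo, hi = center - width / 2, center + width / 2
    scaled = (np.clip(hu, lo, hi) - lo) / (hi - lo)
    return (scaled * 255).round().astype(np.uint8)


def load_dicom_slice(path):
    """Read one DICOM file and return a windowed 8-bit image."""
    import pydicom  # imported lazily so window_to_uint8 works without it

    ds = pydicom.dcmread(path)
    # Rescale tags map stored values to Hounsfield units; default to identity.
    slope = float(getattr(ds, "RescaleSlope", 1.0))
    intercept = float(getattr(ds, "RescaleIntercept", 0.0))
    hu = ds.pixel_array.astype(np.float32) * slope + intercept
    return window_to_uint8(hu)
```

The returned array can be shown directly in the web interface or converted to a tensor for patch extraction.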
Future improvements:
- Extend the model to classify additional NSCLC subtypes
- Implement explainable AI techniques for better interpretability
- Integrate with hospital PACS systems for seamless clinical workflow
- Develop mobile applications for remote diagnosis
This tool is for research purposes only and should not be used for clinical diagnosis without proper validation.
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:
- The methodology is based on the paper "Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification"
- The NSCLC-Radiomics dataset provided the training and testing data