End-to-End Machine Learning Pipeline
A complete ML pipeline that classifies breast tumors as malignant or benign using the Breast Cancer Wisconsin (Diagnostic) dataset. It covers preprocessing, unsupervised exploration, supervised training, and a clinical recommendation.
- Source: Breast Cancer Wisconsin (sklearn.datasets)
- Samples: 569 | Features: 30
- Target: Malignant (212) vs Benign (357)
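
The dataset figures above can be checked directly against scikit-learn. This is a minimal sketch, assuming the notebook loads the data with `load_breast_cancer` (the README only names `sklearn.datasets`); the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: the notebook uses sklearn's bundled copy of the dataset.
data = load_breast_cancer()
X, y = data.data, data.target

n_samples, n_features = X.shape

# In sklearn's encoding, label 0 is "malignant" and label 1 is "benign".
counts = {str(name): int(c) for name, c in zip(data.target_names, np.bincount(y))}

print(f"Samples: {n_samples} | Features: {n_features}")
print(f"Class counts: {counts}")
```

Note that benign is encoded as 1, so scikit-learn's default metrics treat benign as the positive class unless `pos_label` is set otherwise.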
| Model | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| Logistic Regression | 98.2% | 99.1% | 98.1% | 98.6% | 0.998 |
| SVM (RBF Kernel) | 97.1% | 98.1% | 97.2% | 97.7% | 0.996 |
| Decision Tree (depth=5) | 97.1% | 98.1% | 97.2% | 97.7% | 0.968 |
| KMeans Clustering (unsupervised) | 90.5% | n/a | n/a | n/a | n/a |
| PCA, 2 components (value is variance explained, not accuracy) | 63.2% | n/a | n/a | n/a | n/a |
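Logistic Regression is recommended for clinical use: it has the highest AUC (0.998) and the best accuracy, precision, recall, and F1 in the table, with only 1 false positive in the roughly 170-sample test set (30% of 569).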
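
To show how the table's metrics relate to a confusion matrix, here is a small worked sketch. The counts below are hypothetical: they were chosen to be consistent with the Logistic Regression row for a 171-sample test set and are not taken from the notebook. The positive class is assumed to be label 1 (benign in scikit-learn's encoding).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical confusion counts (an assumption, not the notebook's output).
metrics = classification_metrics(tp=105, fp=1, fn=2, tn=63)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# accuracy: 98.2%
# precision: 99.1%
# recall: 98.1%
# f1: 98.6%
```

For a clinical recommendation, it matters which class is "positive": if benign is the positive class, a false positive is a malignant tumor predicted as benign, which is the costlier error.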
- Part A: Data loading, StandardScaler, EDA, correlation heatmap
- Part B: PCA (2 components), KMeans clustering, 2D visualization
- Part C: LR, SVM, Decision Tree on 70/30 split
- Part D: Confusion matrices, ROC curves, clinical recommendation
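
Parts B and C can be sketched as follows. This is an illustrative outline, not the notebook's code: the `random_state` values, `stratify=y`, `max_iter`, running KMeans on the scaled features, and the use of `make_pipeline` are all assumptions, so the exact numbers may differ from the table.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Part B: standardize, project to 2 principal components, cluster with KMeans.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)  # coordinates for the 2D visualization
variance_explained = pca.explained_variance_ratio_.sum()

clusters = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_scaled)
# Cluster ids are arbitrary, so score the better of the two label mappings.
agreement = max(np.mean(clusters == y), np.mean(clusters != y))

# Part C: 70/30 split, then fit the three supervised models.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    # Scaling lives inside the pipeline so it is fit on training data only.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM (RBF Kernel)": make_pipeline(
        StandardScaler(), SVC(kernel="rbf", probability=True, random_state=42)
    ),
    "Decision Tree (depth=5)": DecisionTreeClassifier(max_depth=5, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # probability of label 1 (benign)
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, proba),
    }

print(f"PCA variance explained (2 components): {variance_explained:.1%}")
print(f"KMeans agreement with labels: {agreement:.1%}")
for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, AUC={scores['auc']:.3f}")
```

Putting the scaler inside each pipeline avoids leaking test-set statistics into training; the decision tree is left unscaled because tree splits do not depend on feature scale.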
git clone https://github.com/Phoenixking-04/Breast-Cancer-Classification.git
pip install numpy pandas scikit-learn matplotlib seaborn jupyter
cd Breast-Cancer-Classification
jupyter notebook breast_cancer_final.ipynb

Developer: Kalyankumar Sandireddy