Phoenixking-04/Breast-Cancer-Classification

🩺 Breast Cancer Tumor Classification

End-to-End Machine Learning Pipeline

Python Scikit-learn Jupyter

Overview

A complete ML pipeline that classifies breast tumors as malignant or benign using the Breast Cancer Wisconsin (Diagnostic) dataset. It covers preprocessing, unsupervised exploration, supervised training, and a clinical recommendation.

Dataset

  • Source: Breast Cancer Wisconsin (sklearn.datasets)
  • Samples: 569 | Features: 30
  • Target: Malignant (212) vs Benign (357)
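The dataset figures above can be checked directly against the copy bundled with scikit-learn. This is a minimal sketch, not taken from the notebook; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the bundled Breast Cancer Wisconsin (Diagnostic) dataset.
data = load_breast_cancer()
X, y = data.data, data.target

n_samples, n_features = X.shape

# In scikit-learn's encoding, label 0 is "malignant" and label 1 is "benign".
counts = {str(name): int(n) for name, n in zip(data.target_names, np.bincount(y))}

print(f"Samples: {n_samples} | Features: {n_features}")
print(f"Class counts: {counts}")
```

Note that scikit-learn encodes benign as the positive label (1), which matters when reading precision, recall, and false-positive counts.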

🏆 Tested Results (70/30 Split)

| Model | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|
| Logistic Regression | 98.2% | 99.1% | 98.1% | 98.6% | 0.998 |
| SVM (RBF Kernel) | 97.1% | 98.1% | 97.2% | 97.7% | 0.996 |
| Decision Tree (depth=5) | 97.1% | 98.1% | 97.2% | 97.7% | 0.968 |
| KMeans Clustering | 90.5% | - | - | - | - |

PCA Variance Explained (2 components): 63.2%

Logistic Regression is recommended for clinical use: it achieves the highest AUC (0.998) with only 1 false positive on the held-out test set (about 170 cases, 30% of the 569 samples).
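To show how the Logistic Regression row fits together, the sketch below recomputes the four metrics from a confusion matrix. The counts are an assumption, not taken from the notebook: they are one set of values that reproduces the reported percentages, assuming a 171-sample test set with benign as the positive class (scikit-learn's default encoding).

```python
# Hypothetical confusion-matrix counts (assumed, not from the notebook),
# chosen because they reproduce the reported Logistic Regression metrics.
tp = 105  # benign correctly predicted benign
fn = 2    # benign predicted malignant
fp = 1    # malignant predicted benign (the single false positive)
tn = 63   # malignant correctly predicted malignant

total = tp + fn + fp + tn

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

for name, value in [("Accuracy", accuracy), ("Precision", precision),
                    ("Recall", recall), ("F1", f1)]:
    print(f"{name}: {value:.1%}")
```

Under this reading, a false positive is a malignant tumor predicted as benign, which is the costlier error clinically, so the actual confusion matrix in the notebook is worth checking with that in mind.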

Pipeline

  • Part A: Data loading, StandardScaler, EDA, correlation heatmap
  • Part B: PCA (2 components), KMeans clustering, 2D visualization
  • Part C: LR, SVM, Decision Tree on 70/30 split
  • Part D: Confusion matrices, ROC curves, clinical recommendation
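The four parts above can be sketched roughly as follows. This is an illustrative outline rather than the notebook's code: the random seed, stratified split, running KMeans on the scaled features, and hyperparameters other than the tree depth are all assumptions, so the numbers it prints may differ from the table.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Part A: standardize features (fitted on all data for exploration only).
X_scaled = StandardScaler().fit_transform(X)

# Part B: 2-component PCA and KMeans with 2 clusters.
pca = PCA(n_components=2).fit(X_scaled)
X_2d = pca.transform(X_scaled)  # coordinates for a 2D scatter plot
explained = pca.explained_variance_ratio_.sum()

clusters = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_scaled)
raw_match = (clusters == y).mean()
# Cluster IDs are arbitrary, so take the better of the two label mappings.
cluster_agreement = max(raw_match, 1 - raw_match)

# Part C: 70/30 split; scaling lives inside each pipeline to avoid leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42  # seed is an assumption
)
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM (RBF Kernel)": make_pipeline(
        StandardScaler(), SVC(kernel="rbf", probability=True, random_state=42)
    ),
    "Decision Tree (depth=5)": DecisionTreeClassifier(max_depth=5, random_state=42),
}

# Part D: accuracy, AUC, and confusion matrix per model.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of label 1 (benign)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
        "false_positives": int(fp),
        "false_negatives": int(fn),
    }

print(f"PCA variance explained (2 components): {explained:.1%}")
print(f"KMeans agreement with labels: {cluster_agreement:.1%}")
for name, metrics in results.items():
    print(name, metrics)
```

Wrapping the scaler and classifier in one pipeline keeps the test set out of the scaling statistics; the decision tree is left unscaled because tree splits do not depend on feature scale.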

How to Run

git clone https://github.com/Phoenixking-04/Breast-Cancer-Classification.git
cd Breast-Cancer-Classification
pip install numpy pandas scikit-learn matplotlib seaborn jupyter
jupyter notebook breast_cancer_final.ipynb

🔗 Developer: Kalyankumar Sandireddy

About

End-to-end ML pipeline for breast tumor classification — Logistic Regression 98.2% accuracy, AUC 0.998 on Wisconsin Diagnostic dataset
