**Breast Cancer Diagnosis Prediction using K-Nearest Neighbors (KNN) and K-Means Clustering** 
##
This notebook classifies breast cancer diagnoses into two categories: malignant (M) and benign (B). It analyzes tumor characteristics from the Wisconsin Diagnostic Breast Cancer dataset.
##

**Objective**
##
Apply both unsupervised (K-Means) and supervised (KNN) learning techniques

Compare clustering vs classification approaches

Achieve high accuracy in cancer diagnosis prediction

Create reproducible and deployable machine learning models
##

**Import Libraries**
##
Libraries used across all approaches and stages of the notebook.
##

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import pickle
import warnings
warnings.filterwarnings('ignore')

**Load and Process Data**
##
-  Load the dataset from a CSV file and clean it: encode the diagnosis as 1 (malignant) or 0 (benign), and drop the id column and any unnamed columns.
-  Normalize the features to the range 0 to 1 using MinMaxScaler.
-  Split the data into 80% train and 20% test, stratified by diagnosis.
#
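
As a quick illustration of the normalization step listed above, here is a minimal sketch of what min-max scaling computes. The array `X_toy` and its values are made up for the example and are not taken from the dataset; the manual formula is compared against `MinMaxScaler` with its default range of 0 to 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix (made-up values, not from the dataset):
# two columns on very different scales.
X_toy = np.array([
    [10.0, 200.0],
    [15.0, 400.0],
    [20.0, 1000.0],
])

# Min-max scaling, column by column: (x - min) / (max - min)
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
X_manual = (X_toy - col_min) / (col_max - col_min)

# MinMaxScaler with its default feature_range=(0, 1) applies the same formula
X_sklearn = MinMaxScaler().fit_transform(X_toy)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))
```

After scaling, each column has a minimum of 0 and a maximum of 1, so no single feature dominates the distance calculations that both KNN and K-Means rely on.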

In [12]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Load the raw dataset
df = pd.read_csv('../Data/dataset.csv')

# Encode the target: malignant (M) -> 1, benign (B) -> 0
df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})

# Drop the identifier column and any columns whose name starts with 'Unnamed'
df.drop(columns=['id'], inplace=True, errors='ignore')
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

# Separate features and target
X = df.drop(columns=['diagnosis'])
y = df['diagnosis']

# Scale every feature to [0, 1]; note the scaler is fitted on all rows, before the split
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
X_scaled = pd.DataFrame(X_scaled, columns=X.columns)

# Stratified 80/20 train/test split with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42, stratify=y)
print("Data preprocessing complete.")
print(f"Training samples: {X_train.shape[0]}, Test samples: {X_test.shape[0]}")

# Save the scaled features together with the encoded target
df_preprocessed = X_scaled.copy()
df_preprocessed['diagnosis'] = y.reset_index(drop=True)

preprocess_path = '../Data/preprocess.csv'
df_preprocessed.to_csv(preprocess_path, index=False)
print(f"Saved preprocessed data to {preprocess_path}")

Data preprocessing complete.
Training samples: 455, Test samples: 114
Saved preprocessed data to ../Data/preprocess.csv
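
The cell above fits the scaler on all rows before splitting, so the test rows contribute to the minimum and maximum used for scaling. A possible variant, sketched here under assumptions, splits first and fits the scaler on the training rows only. Because `../Data/dataset.csv` is not available outside this project, the sketch uses scikit-learn's bundled copy of the Wisconsin Diagnostic Breast Cancer data as a stand-in; the names `X_demo`, `y_demo`, `scaler_demo`, and the `_tr`/`_te` variables are illustrative and do not appear elsewhere in the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Stand-in data (assumption): scikit-learn's bundled copy of the Wisconsin
# Diagnostic Breast Cancer data. Its target is coded 0 = malignant, 1 = benign,
# the reverse of the M -> 1, B -> 0 mapping used in this notebook.
X_demo, y_demo = load_breast_cancer(return_X_y=True, as_frame=True)

# Split first so the test rows play no part in fitting the scaler
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

scaler_demo = MinMaxScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)  # learn min/max from training rows only
X_te_scaled = scaler_demo.transform(X_te)      # reuse the training min/max

print(X_tr_scaled.shape, X_te_scaled.shape)
print(X_tr_scaled.min(), X_tr_scaled.max())
```

With this ordering the training features span exactly 0 to 1, while scaled test values may fall slightly outside that range. That is expected and reflects how the model would see genuinely unseen data.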
