# 🧠 Breast Cancer Detection using K-Nearest Neighbors (KNN)
**Dataset**: Breast Cancer Wisconsin (Diagnostic)

This notebook demonstrates how to build a machine learning model using the KNN algorithm to classify breast tumors as benign or malignant.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

In [None]:
# Load dataset
df = pd.read_csv("breast_cancer_data.csv")
df.head()
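
If `breast_cancer_data.csv` is not at hand, a similar frame can be built from scikit-learn's bundled copy of the same dataset. This is only a sketch: it assumes the CSV mirrors that copy (30 numeric features plus a `target` column), which the notebook does not confirm. The name `df_sklearn` is introduced here for illustration.

```python
from sklearn.datasets import load_breast_cancer

# Assumption: the CSV mirrors scikit-learn's bundled copy of the dataset
bunch = load_breast_cancer(as_frame=True)
df_sklearn = bunch.frame  # feature columns plus a 'target' column

print(df_sklearn.shape)
print(list(bunch.target_names))  # class names, indexed by the target value
```

In the bundled copy, target `0` is malignant and `1` is benign, so it is worth checking that the CSV uses the same encoding before labeling plots.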

## 🔍 Preprocessing

In [None]:
# Check for null values
print(df.isnull().sum())

In [None]:
# Separate features and target
X = df.drop('target', axis=1)
y = df['target']

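# Split first so the test set stays unseen while the scaler is fitted
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features: fit on the training set only, then apply to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)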
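
One way to keep the scaling statistics confined to the training rows is to bundle the scaler and the classifier in a scikit-learn `Pipeline`. The sketch below assumes scikit-learn's bundled copy of the dataset stands in for the CSV; the names `X_demo`, `knn_pipeline`, and `pipeline_accuracy` are illustrative and not part of the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled copy stands in for breast_cancer_data.csv
X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# fit() learns the mean and standard deviation from the training rows only;
# score() reuses those statistics when transforming the test rows
knn_pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_pipeline.fit(X_tr, y_tr)

pipeline_accuracy = knn_pipeline.score(X_te, y_te)
print("Pipeline accuracy:", pipeline_accuracy)
```

Because the pipeline is a single estimator, it can also be passed directly to cross-validation helpers without repeating the scaling step by hand.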

## 🤖 Train KNN Model

In [None]:
# Fit a KNN classifier with k=5 on the training set
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict labels for the held-out test set
y_pred = knn.predict(X_test)
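
To make the prediction step concrete, here is a minimal from-scratch sketch of what a KNN classifier does for a single query point: measure the Euclidean distance to every training point, take the `k` closest, and return the majority label. The helper `knn_predict` and the toy arrays are hypothetical, and tie-breaking is ignored, so this is not a replacement for `KNeighborsClassifier`.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=5):
    """Return the majority label among the k training points nearest to x_query."""
    # Euclidean distance from the query to every training row
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters labeled 0 and 1
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_near_origin = knn_predict(X_toy, y_toy, np.array([0.3, 0.3]), k=3)
pred_near_five = knn_predict(X_toy, y_toy, np.array([4.8, 5.0]), k=3)
print(pred_near_origin, pred_near_five)
```

Because the vote depends on raw distances, features with large numeric ranges would dominate unless they are standardized first, which is why the preprocessing step matters for KNN.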

## 📊 Evaluation

In [None]:
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
print("\nClassification Report:\n", classification_report(y_test, y_pred))

In [None]:
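# Rows are actual classes, columns are predicted classes.
# The tick labels assume target 0 = malignant and 1 = benign (scikit-learn's encoding);
# verify this against the CSV before reading the plot.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Malignant', 'Benign'], yticklabels=['Malignant', 'Benign'])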
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
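
For a diagnostic task, per-class recall is often more informative than overall accuracy, since a missed malignant case is usually the costlier error. The sketch below reads malignant and benign recall straight from a confusion matrix. It uses made-up toy labels, not the notebook's results, and assumes the encoding 0 = malignant, 1 = benign.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels for illustration only (assumed encoding: 0 = malignant, 1 = benign)
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_toy = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes
cm_toy = confusion_matrix(y_true_toy, y_pred_toy, labels=[0, 1])

# Recall for a class = correct predictions in its row / row total
malignant_recall = cm_toy[0, 0] / cm_toy[0].sum()
benign_recall = cm_toy[1, 1] / cm_toy[1].sum()

# The same malignant recall via the library helper
malignant_recall_check = recall_score(y_true_toy, y_pred_toy, pos_label=0)

print(cm_toy)
print("Malignant recall:", malignant_recall)
print("Benign recall:", benign_recall)
```

The same row-wise arithmetic applies to the notebook's `cm` once the class encoding has been confirmed.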

## 📈 Error Rate vs K Value

In [None]:
# Test-set misclassification rate for each k from 1 to 20
error_rate = []
for k in range(1, 21):
    knn_k = KNeighborsClassifier(n_neighbors=k)
    knn_k.fit(X_train, y_train)
    pred_k = knn_k.predict(X_test)
    error_rate.append(np.mean(pred_k != y_test))

plt.figure(figsize=(10, 6))
plt.plot(range(1, 21), error_rate, marker='o', linestyle='--', color='red')
plt.title('Error Rate vs. K Value')
plt.xlabel('K')
plt.ylabel('Error Rate')
plt.xticks(range(1, 21))
plt.grid(True)
plt.show()
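
Picking `k` from a curve computed on the test set lets the test data influence the model choice, which can make the reported accuracy optimistic. A common alternative, sketched here, is to compare values of `k` with cross-validation on the training set only. The sketch assumes scikit-learn's bundled copy of the dataset in place of the CSV; the names `cv_error` and `best_k` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled copy stands in for breast_cancer_data.csv
X_cv, y_cv = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_cv, y_cv, test_size=0.2, random_state=42
)

k_values = list(range(1, 21))
cv_error = []
for k in k_values:
    # Scaling lives inside the pipeline, so each fold is scaled on its own training part
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_tr, y_tr, cv=5, scoring="accuracy")
    cv_error.append(1.0 - scores.mean())

# k with the lowest mean cross-validated error; the test set is never touched here
best_k = k_values[int(np.argmin(cv_error))]
print("Best k by 5-fold CV:", best_k)
```

The test set is then used once, at the end, to report the accuracy of the model refit with the chosen `k`.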

## 📝 Summary
- **Dataset**: Breast Cancer Wisconsin Diagnostic Dataset
- **Preprocessing**: Standardization (`StandardScaler`), 80/20 train-test split
- **Algorithm**: K-Nearest Neighbors (k=5)
- **Accuracy**: ~96% on the held-out 20% test split
- **Conclusion**: With standardized features, KNN separates benign from malignant cases well on this dataset.