# 📅 Day 14: KNN & SVM

## 🎯 Objective
Learn how to use K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) for classification, and compare their performance.

## 🔎 What is K-Nearest Neighbors (KNN)?
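- A **lazy learning** algorithm: "training" just stores the dataset, and the real work (distance calculations) happens at prediction time
- Classifies a new data point by **majority vote** among its K closest training points (Euclidean distance by default)
- Relies on distances, so features should be **scaled** to comparable ranges
- Works best on **small to medium datasets**, because prediction cost grows with the size of the stored training set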
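The voting rule above fits in a few lines of NumPy. This is a minimal sketch for intuition, not scikit-learn's implementation: the helper `knn_predict` and the toy points are hypothetical, and it assumes uniform votes with Euclidean distance (the defaults of `KNeighborsClassifier`).

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical helper: label each query row by majority vote of its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)
    preds = []
    for x in X_query:
        # Euclidean distance from this query point to every stored training point
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training points, nearest first
        nearest = np.argsort(dists)[:k]
        # Majority vote; on a tie, Counter keeps the label it saw first (the nearer one)
        vote = Counter(y_train[i] for i in nearest).most_common(1)[0][0]
        preds.append(vote)
    return np.array(preds)

# Toy data (made up for illustration): two well-separated clusters
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_toy, y_toy, [[0.5, 0.5], [5.5, 5.5]], k=3))  # [0 1]
```

Because every query scans all stored points, prediction time in this sketch grows linearly with the training set; scikit-learn can instead use tree-based indexes (`algorithm='kd_tree'` or `'ball_tree'`) to speed up the neighbor search.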

## 💡 What is Support Vector Machine (SVM)?
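- A **supervised learning model** that finds the **maximum-margin hyperplane**: the boundary that separates the classes with the widest possible gap to the nearest training points (the support vectors)
- Works well on **high-dimensional data**; a soft margin (controlled by the parameter `C`) lets it tolerate some overlapping or noisy points
- Can use **linear boundaries** or, via kernels such as RBF or polynomial, **nonlinear ones**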
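For a linear kernel, the learned boundary is the hyperplane `w · x + b = 0`, and the sign of `w · x + b` decides the class. The sketch below fits `SVC(kernel='linear')` on six made-up 2-D points (an illustrative assumption, not lesson data) and checks that `coef_` and `intercept_` reproduce `decision_function`.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-D data (made up for illustration)
X = np.array([[0, 0], [1, 1], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
scores = X @ w + b      # signed score: positive -> class 1, negative -> class 0

# decision_function computes the same signed score
print('Matches decision_function:', np.allclose(scores, clf.decision_function(X)))
# Support vectors: the training points that sit on the margin and define it
print('Support vectors:\n', clf.support_vectors_)
# Width of the gap between the two margin hyperplanes
print('Margin width:', 2 / np.linalg.norm(w))
```

`coef_` is only defined for the linear kernel; with RBF or polynomial kernels the boundary lives in an implicit feature space, so you work with `decision_function` directly.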

## 📦 Step 1 – Load & Prepare Data

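```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Load the Wisconsin breast cancer dataset (569 samples, 30 numeric features)
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # 0 = malignant, 1 = benign

X = df.drop('target', axis=1)
y = df['target']

# Hold out 20% for testing; stratify keeps the class ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Fit the scaler on the training data only, then apply the same transform to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```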
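Scaling matters here because KNN compares raw distances and the SVM margin also depends on feature magnitudes. The sketch below is a standalone re-run of Step 1 (using `return_X_y=True, as_frame=True`, an equivalent loading shortcut) that shows how different two raw feature scales are and confirms that the scaler, fitted on the training split only, centers and rescales it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Raw features live on very different scales (hundreds vs. hundredths)
print(X_train[['mean area', 'mean smoothness']].std())

# Statistics (mean, std) come from the training split only, so nothing leaks from the test set
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Training columns now have mean ~0 and std ~1; test columns only approximately,
# because they were transformed with the training statistics
print('train means:', X_train_scaled.mean(axis=0).round(3)[:3])
print('train stds: ', X_train_scaled.std(axis=0).round(3)[:3])
print('test means: ', X_test_scaled.mean(axis=0).round(3)[:3])
```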

## 🤖 Step 2 – Train a K-Nearest Neighbors Classifier

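```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# K = 5 with uniform votes and Euclidean distance (all scikit-learn defaults; K spelled out here)
knn = KNeighborsClassifier(n_neighbors=5)
# "Fitting" a lazy learner essentially stores the scaled training data for later lookups
knn.fit(X_train_scaled, y_train)
# Each test point gets the majority label of its 5 nearest training points
y_pred_knn = knn.predict(X_test_scaled)

print('KNN Accuracy:', accuracy_score(y_test, y_pred_knn))
# Per-class precision, recall, and F1 (0 = malignant, 1 = benign)
print(classification_report(y_test, y_pred_knn))
```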
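`n_neighbors=5` is a reasonable starting point, but K is a hyperparameter worth tuning. One common approach, sketched below under the assumption that 5-fold cross-validation on the training split is acceptable, is a `GridSearchCV` over a `Pipeline`, so the scaler is refit inside every fold and the test set stays untouched until the end. The grid values are illustrative choices, not recommendations from this lesson.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# The scaler is refit on each CV training fold, so no fold sees its own validation data
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('knn', KNeighborsClassifier()),
])

# Odd K values avoid ties in a two-class uniform vote; also try distance-weighted votes
param_grid = {
    'knn__n_neighbors': list(range(1, 32, 2)),
    'knn__weights': ['uniform', 'distance'],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train)

print('Best params:', search.best_params_)
print('CV accuracy:', round(search.best_score_, 3))
# The test set is used exactly once, after model selection is finished
print('Test accuracy:', round(search.score(X_test, y_test), 3))
```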

## 🧠 Step 3 – Train a Support Vector Machine (SVM)

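```python
from sklearn.svm import SVC

# Linear kernel: a single separating hyperplane in the scaled feature space.
# probability=True enables predict_proba (needed in Step 4) via internally cross-validated
# Platt scaling; random_state makes that internal shuffling reproducible.
svm = SVC(kernel='linear', probability=True, random_state=42)
svm.fit(X_train_scaled, y_train)
y_pred_svm = svm.predict(X_test_scaled)

print('SVM Accuracy:', accuracy_score(y_test, y_pred_svm))
print(classification_report(y_test, y_pred_svm))
```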
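Step 3 uses a linear kernel with the default `C=1.0`. A quick way to check whether a nonlinear boundary helps is to cross-validate a few kernel and `C` combinations on the training split. The sketch below compares `'linear'` with `'rbf'` (scikit-learn's default kernel); the `C` values are arbitrary illustrative picks.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

results = {}
for kernel in ['linear', 'rbf']:
    for C in [0.1, 1.0, 10.0]:
        # Smaller C -> softer, wider margin; larger C -> fewer training errors tolerated
        model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=C))
        scores = cross_val_score(model, X_train, y_train, cv=5)
        results[(kernel, C)] = scores.mean()
        print(f'{kernel:>6}  C={C:<5} CV accuracy = {scores.mean():.3f}')

# Pick the combination with the highest mean cross-validated accuracy
best = max(results, key=results.get)
print('Best (kernel, C):', best)
```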

## 📊 Step 4 – Compare ROC Curves of KNN & SVM

```python
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Predicted probability of the positive class (1 = benign) from each model
y_proba_knn = knn.predict_proba(X_test_scaled)[:, 1]
y_proba_svm = svm.predict_proba(X_test_scaled)[:, 1]

# False/true positive rates at every threshold, plus the area under each curve
fpr_knn, tpr_knn, _ = roc_curve(y_test, y_proba_knn)
fpr_svm, tpr_svm, _ = roc_curve(y_test, y_proba_svm)
auc_knn = roc_auc_score(y_test, y_proba_knn)
auc_svm = roc_auc_score(y_test, y_proba_svm)

plt.figure(figsize=(8, 6))
plt.plot(fpr_knn, tpr_knn, label=f'KNN (AUC = {auc_knn:.2f})')
plt.plot(fpr_svm, tpr_svm, label=f'SVM (AUC = {auc_svm:.2f})')
# Diagonal: a classifier that guesses at random (AUC = 0.5)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve Comparison')
plt.legend()
plt.grid(True)
plt.show()
```
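AUC has a direct interpretation: it is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one (ties count as half). Because the ROC curve depends only on how the scores rank the test points, an SVM's `decision_function` works as the score, so `probability=True` is not strictly needed for this comparison. The sketch below is a standalone variant of the SVM setup; it computes AUC by brute-force pairwise comparison with a hypothetical helper `pairwise_auc` and checks it against `roc_auc_score`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# No probability=True: signed distances from the hyperplane are enough to rank test points
svm = make_pipeline(StandardScaler(), SVC(kernel='linear')).fit(X_train, y_train)
scores = svm.decision_function(X_test)

def pairwise_auc(y_true, y_score):
    """Hypothetical helper: P(random positive outscores random negative), ties count 1/2."""
    y_true = np.asarray(y_true)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score via broadcasting
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

auc_manual = pairwise_auc(y_test, scores)
auc_sklearn = roc_auc_score(y_test, scores)
print(f'pairwise AUC = {auc_manual:.4f}, roc_auc_score = {auc_sklearn:.4f}')
```

The scikit-learn documentation also notes that the Platt-scaled `predict_proba` outputs of `SVC` can be inconsistent with `predict`, which is another reason to prefer `decision_function` when only a ranking is needed.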

## ✅ Summary
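- **KNN** is simple and intuitive: it classifies by a distance-based vote among the nearest neighbors, so feature scaling matters
- **SVM** is powerful for high-dimensional data and can model nonlinear boundaries with kernels
- Use ROC curves and AUC to compare models across all decision thresholds, not just accuracy at a single threshold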