# 🧠 Support Vector Machine (SVM) - Exhaustive Master Notebook

Welcome to the **most exhaustive SVM tutorial notebook**!
We cover deep theory, math, intuition, visualizations, code, hyperparameters, evaluation metrics, tuning, and more.

---

# 📍 1. Learning Roadmap

**Flow:**
```
Real-world intuition ➔ Why SVM ➔ Core Idea ➔ Math ➔ Margins ➔ Support Vectors ➔ Kernels ➔ Hinge Loss ➔ Code ➔ Error Analysis ➔ Tuning ➔ Roadmap ➔ Final Projects
```

# ✨ 2. Introduction to SVM
- SVM = a classifier that **separates classes with the maximum margin**.
- Used in: face detection, OCR, bioinformatics, text classification (e.g., spam detection).
- Goal: Draw the decision boundary that **keeps the classes as far apart as possible**.

# 🧠 3. Why Do We Need SVM?
- Logistic Regression learns a linear boundary, so it struggles with non-linear class boundaries unless features are engineered by hand.
- KNN is sensitive to noise and to feature scaling.
- SVM offers **robust generalization**, even in high-dimensional spaces.

# 🔥 4. Core Idea Behind SVM: Maximize the Margin
- Margin = Distance between the decision boundary and the nearest training points of either class.
- Wider margin ➔ Typically better generalization ➔ Lower overfitting risk.

# 🔢 5. Mathematical Formulation
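Minimize (hard-margin form):
\[ \frac{1}{2} \|w\|^2 \]
Subject to, for every training point \(i\):
\[ y_i (w \cdot x_i + b) \geq 1 \]
Where:
- \(w\) = weight vector
- \(b\) = bias
- \(y_i\) = label (+1 or -1)
- \(x_i\) = feature vector.

The margin width is \(2 / \|w\|\), so minimizing \(\|w\|^2\) is the same as maximizing the margin.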
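
To make the constraint concrete, here is a minimal sketch that fits a linear SVM and checks the formulation numerically. The toy data, the random seed, and the very large `C` (used to approximate a hard margin with scikit-learn's `SVC`) are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumption): two well-separated Gaussian clusters, labels -1 / +1
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=[-3.0, -3.0], scale=1.0, size=(40, 2))
X_pos = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(40, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([-1] * 40 + [1] * 40)

# A very large C leaves almost no room for slack, approximating a hard margin
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Functional margins y_i (w . x_i + b): the constraint says each one is >= 1
functional_margins = y * (X @ w + b)

# Distance between the two hyperplanes w . x + b = +1 and w . x + b = -1
margin_width = 2.0 / np.linalg.norm(w)

print("Smallest functional margin:", functional_margins.min())
print("Margin width 2/||w||:", margin_width)
```

The smallest functional margin should come out close to 1 (up to solver tolerance); the points that attain it are the support vectors.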

# 📈 6. Visualizing Hyperplanes & Margins
```
Class A (o o o)

     | Margin |
------------------- Hyperplane -------------------
     | Margin |

Class B (x x x)
```

# 🧮 7. Support Vectors Intuition
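- Support vectors are the training points closest to the decision boundary (on or inside the margin).
- They alone define the decision boundary and the margin.
- Remove a support vector and the boundary can change; remove any other point and it stays the same.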
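
A quick way to check this intuition is to refit the model on the support vectors alone. The sketch below uses assumed toy data (two Gaussian clusters with a fixed seed); the two fits should agree up to solver tolerance.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumption): two Gaussian clusters with a fixed seed
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(50, 2)),
    rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

full_model = SVC(kernel="linear", C=1.0).fit(X, y)
sv_idx = full_model.support_  # indices of the support vectors within X

# Refit using only the support vectors and compare the two boundaries
sv_model = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])

print("Support vectors:", len(sv_idx), "of", len(X), "points")
print("Full fit    w, b:", full_model.coef_[0], full_model.intercept_[0])
print("SV-only fit w, b:", sv_model.coef_[0], sv_model.intercept_[0])

agreement = np.mean(full_model.predict(X) == sv_model.predict(X))
print("Prediction agreement on all points:", agreement)
```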

# 🧩 8. Hard Margin vs Soft Margin

| Aspect | Hard Margin | Soft Margin |
|:---|:---|:---|
| Definition | Perfect separation, no point inside the margin | Allows some margin violations and misclassifications |
| Use Case | Clean, linearly separable data | Noisy real-world data |
| Controlled by | Strict constraint | Penalty term (C) |
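
For reference, the soft-margin problem is usually written by adding one slack variable per training point to the formulation in Section 5. The symbol \(\xi_i\) is notation introduced here; it does not appear elsewhere in this notebook.

Minimize:
\[ \frac{1}{2} \|w\|^2 + C \sum_{i} \xi_i \]
Subject to:
\[ y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \]

- \(\xi_i = 0\): the point satisfies the hard-margin constraint.
- \(0 < \xi_i \leq 1\): the point is inside the margin but still on the correct side.
- \(\xi_i > 1\): the point is misclassified.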

# ⚙️ 9. Hyperparameters of SVM (Exhaustive)

### C (Regularization)
- Low C ➔ Wider margin, more tolerance for margin violations (risk: underfitting)
- High C ➔ Narrower margin, less tolerance (risk: overfitting)
- Tune via GridSearch
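
The effect of C on the margin can be seen directly on a linear SVM. This is a minimal sketch on assumed toy data (two overlapping Gaussian clusters); the C values are arbitrary picks for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumption): two overlapping Gaussian clusters
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[1.0, 1.0], scale=1.0, size=(100, 2)),
])
y = np.array([0] * 100 + [1] * 100)

C_values = [0.01, 1, 100]
margin_widths = []
for C in C_values:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])  # margin width = 2 / ||w||
    margin_widths.append(width)
    print(f"C={C:<6} margin width={width:.3f} support vectors={len(clf.support_)}")
```

The smallest C should give the widest margin, with more points sitting inside it.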

### Kernel
- 'linear', 'poly', 'rbf', 'sigmoid'
- Start with RBF

### Gamma
- Low gamma ➔ Each training point's influence reaches far (smoother boundary)
- High gamma ➔ Only nearby points matter (more flexible boundary, higher overfitting risk)
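
To see what "influence" means here, the sketch below evaluates the RBF kernel, exp(-gamma * ||x - z||^2), for one nearby and one distant point. The helper `rbf_kernel_value` and the sample points are made up for illustration.

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

x = [0.0, 0.0]
near = [0.5, 0.0]  # distance 0.5 from x
far = [3.0, 0.0]   # distance 3.0 from x

for gamma in [0.01, 1.0, 10.0]:
    k_near = rbf_kernel_value(x, near, gamma)
    k_far = rbf_kernel_value(x, far, gamma)
    print(f"gamma={gamma:<5} K(x, near)={k_near:.4f}  K(x, far)={k_far:.4f}")
```

With a low gamma the distant point still has a kernel value near 1; with a high gamma its value is practically 0, so it no longer affects predictions around `x`.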

### Degree (for poly)
- Degree of polynomial transformation
- Rarely > 3

# 🔁 10. Kernel Trick
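- Compute inner products as if the data were mapped into a higher-dimensional space, without ever building that mapping explicitly.
- RBF Kernel: Corresponds to an infinite-dimensional feature space.
- Polynomial Kernel: Corresponds to a finite feature space of polynomial terms up to the chosen degree, also computed implicitly.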
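
The trick can be verified by hand for a small case. The sketch below compares the degree-2 polynomial kernel (x . z + 1)^2 on 2-D inputs with the dot product of an explicit 6-D feature map. The functions `poly_kernel` and `explicit_map` are illustrative helpers, not library APIs.

```python
import numpy as np

def poly_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (x . z + 1)^2."""
    return (np.dot(x, z) + 1.0) ** 2

def explicit_map(x):
    """Explicit 6-D feature map whose dot product equals poly_kernel for 2-D inputs."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, s * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = poly_kernel(x, z)                          # one dot product in 2-D
k_explicit = np.dot(explicit_map(x), explicit_map(z))   # dot product in 6-D

print("Kernel value:      ", k_implicit)
print("Explicit map value:", k_explicit)
```

Both values agree (up to floating-point rounding), but the kernel never constructs the 6-D vectors. For the RBF kernel no finite explicit map exists, so the implicit route is the only practical one.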

# 🔍 11. Hinge Loss Function
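Loss for a single point:
\[ \max(0, 1 - y_i (w \cdot x_i + b)) \]
- No penalty if the point is correctly classified and on or outside the margin (\(y_i (w \cdot x_i + b) \geq 1\)).
- Linear penalty if the point is inside the margin or on the wrong side.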

```python
import numpy as np
import matplotlib.pyplot as plt

scores = np.linspace(-2, 3, 100)
hinge_loss = np.maximum(0, 1 - scores)

plt.plot(scores, hinge_loss, label="Hinge Loss")
plt.axvline(x=1, linestyle="--", color="grey", label="Margin")
plt.title("Hinge Loss vs Score")
plt.xlabel("Score (y*(w.x+b))")
plt.ylabel("Loss")
plt.legend()
plt.grid(True)
plt.show()
```
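
A small worked example complements the plot. The weights, bias, and four points below are hypothetical values chosen to hit each case: outside the margin, inside the margin, and on the wrong side.

```python
import numpy as np

# Hypothetical linear model and four labeled points (labels in {-1, +1})
w = np.array([1.0, -1.0])
b = 0.5
X = np.array([[2.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 1.0]])
y = np.array([1, 1, 1, -1])

scores = y * (X @ w + b)                 # y_i (w . x_i + b)
losses = np.maximum(0.0, 1.0 - scores)   # hinge loss per point

print("Scores:", scores)            # [ 2.5  0.5 -0.5  1.5]
print("Losses:", losses)            # [0.  0.5 1.5 0. ]
print("Mean loss:", losses.mean())  # 0.5
```

The first and last points are beyond the margin (loss 0), the second is correctly classified but inside the margin (loss 0.5), and the third is misclassified (loss 1.5).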

# 📊 12. Evaluation Metrics (Exhaustive)

| Metric | Formula | Most Useful When |
|:---|:---|:---|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) | Classes are balanced |
| Precision | TP/(TP+FP) | False positives are costly |
| Recall | TP/(TP+FN) | False negatives are costly |
| F1-Score | 2 × Precision × Recall / (Precision + Recall) | Data is imbalanced |
| ROC-AUC | Area under the ROC curve (TPR vs. FPR over all thresholds) | Judging how well scores rank positives above negatives, independent of a threshold |
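
The formulas in the table can be checked against scikit-learn on a tiny example. The labels below are invented for illustration and give TP=3, FN=1, FP=1, TN=5.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: 4 positives, 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels {0, 1}, ravel() returns the counts in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("By hand:", accuracy, precision, recall, f1)
print("sklearn:", accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```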

# 🛠️ 13. Hands-On SVM (Easy) - Iris Dataset

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build SVM
model = SVC(kernel='rbf', C=1, gamma='scale')
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
```

# 🔎 14. Code Walkthrough
- Load Iris data.
- Split train/test.
- Scale features, fitting the scaler on the training set only (important for SVM, which relies on distances between points).
- Build SVM (RBF kernel).
- Train and predict.
- Evaluate.
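
The same steps can also be bundled into a single estimator, so that scaling is always fitted on training data only. This is one possible sketch using scikit-learn's `make_pipeline` and 5-fold cross-validation; the choice of 5 folds is an assumption, not something prescribed above.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()

# Scaler and SVM travel together: each CV fold refits the scaler on its own training part
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1, gamma="scale"))
cv_scores = cross_val_score(pipeline, iris.data, iris.target, cv=5)

print("Fold accuracies:", cv_scores)
print("Mean accuracy:", cv_scores.mean())
```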

# 🎯 15. Hyperparameter Tuning via GridSearchCV
```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Reuses the scaled X_train / y_train from Section 13
param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01], 'kernel': ['rbf', 'poly']}
grid = GridSearchCV(SVC(), param_grid, verbose=2, cv=3)
grid.fit(X_train, y_train)
print(grid.best_params_)
```
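
A self-contained variant is sketched below. It wraps the scaler and the SVM in a `Pipeline` (so each cross-validation fold is scaled on its own training part) and gives each kernel its own sub-grid. The step names `scaler` and `svc` and the specific grid values are choices made for this example.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# One sub-grid per kernel, so gamma is only searched where it has an effect
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": [1, 0.1, 0.01]},
]
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
print("Held-out test accuracy:", search.score(X_test, y_test))
```

Parameters of a pipeline step are addressed as `<step name>__<parameter>`, which is why the grid keys start with `svc__`.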

# 📉 16. Error Analysis
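- Plot misclassified points to see where, and between which classes, errors occur.
- Analyze support vectors: how many there are and which class they belong to.
- Study decision boundaries for signs of under- or overfitting.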
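
A first pass at these checks might look like the sketch below, which rebuilds the Iris model from Section 13 as a pipeline so that it runs on its own. Depending on the split, the list of misclassified samples may well be empty.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1, gamma="scale"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows = true class, columns = predicted class
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Positions (within the test set) of the misclassified samples
misclassified = np.flatnonzero(y_pred != y_test)
for i in misclassified:
    print(f"test idx {i}: true={iris.target_names[y_test[i]]}, "
          f"predicted={iris.target_names[y_pred[i]]}, features={X_test[i]}")

# Number of support vectors per class (the SVC is the last pipeline step)
print("Support vectors per class:", model[-1].n_support_)
```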

# 🚨 17. Common Mistakes
- No feature scaling ➔ Features with large ranges dominate the distance computation ➔ Poor model.
- Wrong kernel choice ➔ Poor separation.
- Too small C ➔ Underfitting; too large C ➔ Overfitting.
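
The first mistake is easy to demonstrate. The sketch below uses scikit-learn's Wine dataset, chosen here only because its features have very different ranges (it is not used elsewhere in this notebook), and compares cross-validated accuracy with and without scaling.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Same RBF SVM, once on raw features and once behind a StandardScaler
unscaled_acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
scaled_acc = cross_val_score(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")), X, y, cv=5
).mean()

print("Mean CV accuracy without scaling:", unscaled_acc)
print("Mean CV accuracy with scaling:   ", scaled_acc)
```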

# 🧭 18. Where SVM Fits
- Small/Medium datasets (kernel SVM training time grows at least quadratically with the number of samples).
- High-dimensional sparse spaces (text, bioinformatics).
- Always needs feature scaling.

# 🎯 19. Final Takeaways Checklist
- ✅ Maximize the margin.
- ✅ Remember that only the support vectors determine the boundary.
- ✅ Tune hyperparameters carefully.
- ✅ Always scale the data.

# 🚀 20. Mini Projects
- Handwritten digits classification (MNIST)
- Spam mail classification
- Plant disease detection

# ⚡ 21. SVM vs Logistic Regression vs Perceptron

| Feature | SVM | Logistic Regression | Perceptron |
|:---|:---|:---|:---|
| Loss | Hinge | Log Loss | Perceptron Loss |
| Goal | Maximize Margin | Maximize Likelihood | Correct Errors |
| Probabilities | No (unless calibrated) | Yes | No |
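
The three losses in the table can be compared side by side as functions of the score m = y (w . x + b). The function names below are illustrative, and the log loss is written in its {-1, +1} label form, log(1 + exp(-m)).

```python
import numpy as np

def hinge(m):
    """SVM: zero only once the point is beyond the margin (m >= 1)."""
    return np.maximum(0.0, 1.0 - m)

def log_loss(m):
    """Logistic regression: never exactly zero, decays smoothly."""
    return np.log1p(np.exp(-m))

def perceptron(m):
    """Perceptron: zero as soon as the point is on the correct side (m >= 0)."""
    return np.maximum(0.0, -m)

for m in [-2.0, 0.0, 0.5, 2.0]:
    print(f"m={m:>4}: hinge={hinge(m):.3f}  log={log_loss(m):.3f}  perceptron={perceptron(m):.3f}")
```

At m = 0.5 the point is correctly classified, so the perceptron loss is already 0, while the hinge loss still penalizes it for being inside the margin.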

# 📜 22. Short History
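- Introduced by Vladimir Vapnik and Alexey Chervonenkis in the 1960s.
- Popularized in the 1990s, after the kernel trick (Boser, Guyon, and Vapnik, 1992) and the soft margin (Cortes and Vapnik, 1995) were added.
- Rooted in Statistical Learning Theory.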

# 🧹 23. Best Practices Summary
- Scale features.
- Start with RBF kernel.
- Use GridSearch.
- Watch for overfitting with C, gamma tuning.

# 🧠 24. Bonus (Advanced): Dual Form + SMO
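- Dual problem: Solves for one multiplier \(\alpha_i\) per training point instead of \(w\). The data enters only through inner products, which is what makes the kernel trick possible.
- SMO (Sequential Minimal Optimization) algorithm: Solves the dual efficiently by optimizing two multipliers at a time.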
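
The link between the dual and primal solutions, w = Σ α_i y_i x_i, can be checked on a fitted linear model. This sketch relies on scikit-learn's `SVC` exposing the products y_i α_i of the support vectors as `dual_coef_`; the toy data is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (assumption): two Gaussian clusters with labels -1 / +1
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(30, 2)),
    rng.normal(loc=[2.0, 2.0], scale=1.0, size=(30, 2)),
])
y = np.array([-1] * 30 + [1] * 30)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# dual_coef_ holds y_i * alpha_i for the support vectors only (alpha_i = 0 elsewhere)
w_from_dual = clf.dual_coef_ @ clf.support_vectors_  # w = sum_i alpha_i y_i x_i
alphas = np.abs(clf.dual_coef_[0])

print("w rebuilt from the dual:", w_from_dual[0])
print("w reported by the model:", clf.coef_[0])
print("Range of alpha values:", alphas.min(), "to", alphas.max(), "(upper bound is C)")
print("Sum of alpha_i * y_i:", clf.dual_coef_.sum())
```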