# Support Vector Machines (SVM)

SVM is a powerful supervised learning algorithm used for **classification** (SVC) and **regression** (SVR).

**Core Goal:** Find the optimal **Hyperplane** that best separates the classes with the **Maximum Margin**.



### 1. The Geometric Intuition
Imagine you have red and blue balls on a table. You want to place a stick (hyperplane) to separate them.
* You can put the stick in many places.
* **The Best Stick:** The one that has the widest possible gap (margin) between the red balls and the blue balls.
* **Support Vectors:** The specific data points closest to the hyperplane that "support" or define the margin. If you move other points (without pushing them into the margin), the boundary doesn't change. If you move a support vector, the boundary moves.
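
To make the idea concrete, here is a minimal sketch using scikit-learn's `SVC` on six made-up 2D points (the data and variable names are illustrative assumptions, not from a real dataset). It reads the support vectors from the fitted model and computes the margin width as `2 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (invented for illustration): two well-separated groups of points.
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 2.0],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard margin on separable data.
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

support_idx = clf.support_              # row indices of the support vectors in X
support_points = clf.support_vectors_   # the support vectors themselves
w = clf.coef_[0]                        # normal vector of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin lines

print("support vector indices:", support_idx)
print("support vectors:\n", support_points)
print("margin width:", margin_width)
```

On this toy data only the two facing points, `(2, 2)` and `(5, 5)`, should end up as support vectors; the other four points play no role in where the stick lands.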

---

### 2. Hard Margin vs. Soft Margin

**1. Hard Margin (Strict):**
* Does **not** allow any errors. Every data point must be correctly classified and lie on or outside the margin.
* **Problem:** Only works if data is perfectly linearly separable. Highly sensitive to outliers (one outlier can ruin the model).

**2. Soft Margin (Flexible):**
* Allows some misclassifications (slack) to maintain a wider, more generalizable margin.
* **Controlled by `C`:** The regularization parameter.
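
For reference, the soft-margin problem is commonly written as the optimization below. The symbols are standard textbook notation rather than anything defined in these notes: $w$ and $b$ describe the hyperplane, $y_i \in \{-1, +1\}$ is the label of point $x_i$, and $\xi_i$ is that point's slack.

$$
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0
$$

The margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert^2$ widens the margin, while the $C \sum \xi_i$ term penalizes points that violate it. Forcing every $\xi_i = 0$ recovers the hard margin.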

---

### 3. The "Kernel Trick" (The Magic)
What if the data is not linearly separable (e.g., a red circle inside a blue ring)? You cannot draw a straight line to separate them.

**Solution:** Project the data into a **higher dimension**.
* In 2D, no straight line can separate them.
* If we lift the red center up (into 3D), we can slide a flat sheet (hyperplane) between the red and blue points.
* **Kernel Trick:** A mathematical shortcut that calculates high-dimensional relationships without actually transforming the data (saving huge computational power).
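
The shortcut can be checked numerically. The sketch below uses a homogeneous degree-2 polynomial kernel as an example; `phi` and `poly2_kernel` are hypothetical helper names introduced here for illustration. It compares the dot product after an explicit 2D to 3D mapping with the kernel value computed directly in 2D.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D point (illustrative helper)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: K(a, b) = (a . b)^2."""
    return float(np.dot(a, b)) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = float(np.dot(phi(a), phi(b)))  # transform first, then take the dot product in 3D
implicit = poly2_kernel(a, b)             # same quantity, computed without leaving 2D

print(explicit, implicit)  # both are 121 (up to floating-point rounding)
```

For RBF the corresponding feature space is infinite-dimensional, so the explicit route is not even available; the kernel value is the only practical way to compute it.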



**Common Kernels:**
* **Linear:** For simple, large datasets (like text classification).
* **Polynomial:** Models feature interactions up to degree `d`.
* **RBF (Radial Basis Function):** Corresponds to an infinite-dimensional feature space. The default in scikit-learn's `SVC` and a strong general-purpose choice for non-linear data.
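
As a rough comparison, the sketch below fits each kernel on the "circle inside a ring" shape from above, generated with scikit-learn's `make_circles`. The dataset parameters and the `scores` dictionary are assumptions chosen for illustration, and exact accuracies will vary with them.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: a small inner circle (one class) inside a larger ring (other class).
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for kernel in ("linear", "poly", "rbf"):
    # degree is only used by the polynomial kernel; the others ignore it.
    clf = SVC(kernel=kernel, degree=2, gamma="scale").fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)

for kernel, acc in scores.items():
    print(f"{kernel:>6}: test accuracy = {acc:.2f}")
```

The linear kernel has no way to wrap a boundary around the inner circle, so it should trail the RBF kernel clearly on this data.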

---

### 4. Important Hyperparameters

**1. `C` (Regularization Parameter)**
* **High C:** Strict. Tries to classify *everything* correctly. Result: Small margin, risk of Overfitting.
* **Low C:** Loose. Accepts some errors to get a wider margin. Result: Smoother boundary, better Generalization.
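
The trade-off can be observed directly. This sketch, assuming two overlapping synthetic blobs from `make_blobs`, fits a linear SVM at several values of `C` and records the margin width (`2 / ||w||`) and the number of support vectors. The `margins` and `n_support` dictionaries are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so no hard margin exists and slack is required.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

margins = {}
n_support = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins[C] = 2.0 / np.linalg.norm(clf.coef_[0])  # margin width
    n_support[C] = int(clf.n_support_.sum())         # total support vectors

for C in margins:
    print(f"C={C:<6} margin width={margins[C]:.3f} support vectors={n_support[C]}")
```

The margin at a small `C` should come out wider than at a large `C`, since a low penalty on slack lets the optimizer favor width over fitting every point.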

**2. `gamma` ($\gamma$) (For RBF Kernel)**
* Defines how far the influence of a single training example reaches.
* **High Gamma:** Close reach. Points must be very close to be similar. Result: Complex, "wiggly" boundary (Overfitting).
* **Low Gamma:** Far reach. Even distant points are considered similar. Result: Smooth boundary (Underfitting).
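
A quick way to see both failure modes is to compare train and test accuracy across a range of `gamma` values. The sketch below assumes a noisy two-moons dataset from `make_moons`; the specific `gamma` values and the `results` dictionary are illustrative choices.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy, non-linear toy data.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for gamma in (0.01, 1.0, 1000.0):
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_train, y_train)
    # Store (train accuracy, test accuracy) for each gamma.
    results[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))

for gamma, (train_acc, test_acc) in results.items():
    print(f"gamma={gamma:<7} train={train_acc:.2f} test={test_acc:.2f}")
```

A very high `gamma` tends to memorize the training set (high train accuracy, weaker test accuracy), while a very low `gamma` tends to score modestly on both.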

---

### 5. Pros & Cons

| Advantages | Disadvantages |
| :--- | :--- |
| **Accuracy:** Very effective in high-dimensional spaces. | **Slow:** Training time grows at least quadratically with the number of samples, so it gets expensive for large datasets ($>100k$ rows). |
| **Memory Efficient:** Uses a subset of training points (support vectors). | **Noise:** Sensitive to overlapping classes and noise. |
| **Versatile:** Different Kernel functions for different decision boundaries. | **Black Box:** Outputs a signed score, not a probability (unlike Logistic Regression); probability estimates need an extra calibration step. |
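
On the probability point, one possible workaround is to wrap the SVM in scikit-learn's `CalibratedClassifierCV` with `method="sigmoid"` (Platt scaling). This is a sketch on synthetic data from `make_classification`, not a recommendation for every case; the variable names are illustrative.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# A plain SVC returns signed scores: the sign gives the class, the size is not a probability.
raw = SVC(kernel="rbf").fit(X, y)
scores = raw.decision_function(X[:5])

# Calibration fits a sigmoid on top of the scores using cross-validation.
calibrated = CalibratedClassifierCV(SVC(kernel="rbf"), method="sigmoid", cv=5).fit(X, y)
proba = calibrated.predict_proba(X[:5])

print("raw scores:", np.round(scores, 2))
print("calibrated probabilities:\n", np.round(proba, 2))
```

Calibration costs extra training time because the SVM is refit on each cross-validation fold.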

---

### 6. FAQ (Interview Questions)

**Q: Why is it called "Support Vector" Machine?**
**A:** The decision boundary relies *only* on the data points closest to the hyperplane (the Support Vectors). If you remove all other data points and retrain, the boundary remains the same.
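
This claim can be tested empirically. The sketch below, assuming two well-separated synthetic blobs, trains a linear SVM on all points, retrains it on the support vectors alone, and compares the two hyperplanes. `full` and `reduced` are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=42)

# Model trained on every point.
full = SVC(kernel="linear", C=1.0).fit(X, y)

# Model trained only on the points the first model marked as support vectors.
sv = full.support_
reduced = SVC(kernel="linear", C=1.0).fit(X[sv], y[sv])

print("points kept:", len(sv), "of", len(X))
print("max |w_full - w_reduced|:", np.abs(full.coef_ - reduced.coef_).max())
print("same predictions:", bool((full.predict(X) == reduced.predict(X)).all()))
```

Any remaining difference between the two weight vectors should be on the order of the solver's numerical tolerance.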

**Q: How does SVM handle multi-class classification?**
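**A:** SVM is natively binary. A multi-class problem is reduced to several binary ones using one of two strategies:
* **One-vs-Rest (OvR):** Trains 1 classifier per class (Class A vs. All Others). This is the default in scikit-learn's `LinearSVC`.
* **One-vs-One (OvO):** Trains a classifier for every pair ($A$ vs $B$, $B$ vs $C$, etc.), i.e. $n(n-1)/2$ classifiers for $n$ classes. This is what scikit-learn's `SVC` uses internally.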
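
The difference in classifier counts shows up in the shapes scikit-learn returns. This sketch assumes a synthetic 4-class dataset from `make_blobs`; with 4 classes, OvO needs 4 * 3 / 2 = 6 pairwise classifiers and OvR needs 4.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=4, random_state=0)

# One-vs-One: one decision score per pair of classes.
ovo = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
ovo_scores = ovo.decision_function(X)

# One-vs-Rest: an explicit wrapper that trains one binary SVC per class.
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

print("OvO score columns:", ovo_scores.shape[1])   # 6 pairwise classifiers
print("OvR estimators:", len(ovr.estimators_))     # 4 per-class classifiers
```

Note that `SVC` always trains with OvO internally; `decision_function_shape` only changes the shape of the scores it reports.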

**Q: When should I use a Linear Kernel vs. RBF?**
**A:**
* Use **Linear** if features > samples (e.g., Text Classification, DNA).
* Use **RBF** if samples > features and the relationship is non-linear.

---

### 7. Code Implementation

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# SVMs are sensitive to feature scale, so standardize inside the pipeline.
# kernel='rbf', C=1.0 and gamma='scale' are the SVC defaults, spelled out for clarity.
svm_model = make_pipeline(
    StandardScaler(),
    SVC(kernel='rbf', C=1.0, gamma='scale')
)

# svm_model.fit(X_train, y_train)
```