**Chapter 5 --- Support Vector Machines (Revision Notebook)**
===========================================================

**Goal:** compact theory plus runnable snippets. Use this notebook for quick revision: each key concept is explained in a short markdown block and demonstrated with a small code example.

Files: this notebook contains markdown theory sections and code cells. Run cells in order to reproduce models.

* * * * *

1 --- Linear SVM Classification
=============================

**Idea:**\
Find the decision boundary that **maximizes the margin**: the distance between the boundary and the closest training points of each class.

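**Decision function:**

`f(x) = wᵀx + b`

The decision boundary is the set of points where `wᵀx + b = 0`; predict the positive class when `f(x) ≥ 0`.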

**Margin:**

`margin = 2 / ||w||` (the distance between the hyperplanes `wᵀx + b = ±1`)

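Support vectors = training points that lie on the margin edges (or, with a soft margin, inside the margin or on the wrong side).\
Only these points affect the decision boundary; adding or moving points that stay outside the margin leaves it unchanged.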
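
To make the margin formula concrete, here is a minimal sketch that fits a near-hard-margin linear SVM and computes `2 / ||w||` from the learned weights. The four toy points are made up for illustration, and `SVC(kernel="linear")` with a very large `C` is used here (instead of `LinearSVC`) only because it exposes `support_vectors_`.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable toy points (not from the notebook's datasets).
X_toy = np.array([[0.0, 0.0], [-1.0, 0.5], [2.0, 0.0], [3.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])

# A very large C tolerates almost no violations, approximating a hard margin.
clf = SVC(kernel="linear", C=1e6).fit(X_toy, y_toy)

w = clf.coef_[0]                       # weight vector of the separating hyperplane
b = clf.intercept_[0]                  # bias term
margin_width = 2 / np.linalg.norm(w)   # distance between wᵀx + b = -1 and +1

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors:\n", clf.support_vectors_)
```

The closest pair of opposite-class points is `(0, 0)` and `(2, 0)`, so the margin width should come out close to 2, and those two facing points should be the ones reported as support vectors.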

* * * * *

### Linear SVM Example (Iris: 2 features)

* * * * *

In [None]:
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris

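iris = load_iris()
X = iris.data[:, [2, 3]]            # petal length, petal width (cm)
y = (iris.target == 2).astype(int)  # 1 if Iris virginica, else 0

# Scale first, then fit a soft-margin linear SVM with hinge loss.
model = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', LinearSVC(C=1, loss='hinge'))
])

model.fit(X, y)
model.predict([[5.5, 1.7]])         # classify a flower with 5.5 cm x 1.7 cm petals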

2 --- Hard Margin vs Soft Margin SVM
==================================

### Hard Margin

-   No violations allowed

-   Only works if data is **perfectly separable**

-   Very sensitive to noise and outliers

### Soft Margin

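Hyperparameter **C** sets the penalty for margin violations.

-   **Low C** → wider margin, more violations tolerated → stronger regularization (usually generalizes better, but can underfit if C is too low)

-   **High C** → narrower margin, few violations tolerated → can fit noise and overfit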
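
The effect of C can be checked directly. The sketch below is an illustration under assumptions: it reuses the two Iris petal features, swaps in `SVC(kernel="linear")` so that support-vector counts are available, and picks `C=0.01` and `C=100` arbitrarily as "low" and "high".

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X_iris = iris.data[:, [2, 3]]                 # petal length, petal width
y_iris = (iris.target == 2).astype(int)       # Iris virginica vs. the rest

results = {}
for C in (0.01, 100):                         # arbitrary "low" and "high" values
    pipe = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C))
    pipe.fit(X_iris, y_iris)
    svc = pipe.named_steps["svc"]             # make_pipeline names the step "svc"
    results[C] = {
        "n_support_vectors": int(svc.n_support_.sum()),
        "margin_width": 2 / np.linalg.norm(svc.coef_[0]),  # in scaled feature units
    }
    print(f"C={C}: {results[C]}")
```

The low-C model should report a wider margin and more support vectors, since more points are allowed to sit inside the margin.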

* * * * *

3 --- Feature Scaling is Critical
===============================

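SVMs are **sensitive to feature scale**: the margin is a distance in feature space, so a feature with a large numeric range dominates it and small-range features are nearly ignored.

Standardize features with:

`StandardScaler`

before fitting an SVM, ideally inside a `Pipeline` so the scaler is fit on the training data only.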
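
A quick way to see the effect is to cross-validate the same model with and without scaling. This sketch assumes scikit-learn's built-in wine dataset (not used elsewhere in this notebook), chosen because its features have very different ranges; the exact scores depend on the library version.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Wine features range from fractions (e.g. hue) to hundreds (e.g. proline).
X_wine, y_wine = load_wine(return_X_y=True)

# Same RBF SVM, once on raw features and once behind a StandardScaler.
unscaled_acc = cross_val_score(SVC(kernel="rbf"), X_wine, y_wine, cv=5).mean()
scaled_acc = cross_val_score(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")), X_wine, y_wine, cv=5
).mean()

print(f"mean CV accuracy without scaling: {unscaled_acc:.3f}")
print(f"mean CV accuracy with scaling:    {scaled_acc:.3f}")
```

Putting the scaler inside the pipeline means each cross-validation fold fits it on that fold's training split only, which avoids leaking test statistics.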

* * * * *

4 --- Nonlinear SVM & Kernels
===========================

SVM can learn nonlinear boundaries using the **kernel trick**.

Common kernels:

-   Polynomial kernel

-   RBF (Gaussian) kernel

-   Sigmoid kernel

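The kernel trick gives the same result as mapping inputs to a higher-dimensional space, but **implicitly**: the kernel computes dot products in that space directly from the original vectors, so the mapped features are never built (no manual feature engineering).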
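
The "implicit mapping" can be verified by hand for a small case. The sketch below uses the homogeneous degree-2 polynomial kernel `K(a, b) = (aᵀb)²` on 2-D inputs; the helper `phi` and the two sample vectors are illustrative choices, not part of the notebook.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input: (x1², √2·x1·x2, x2²)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = phi(a) @ phi(b)   # dot product computed in the mapped 3-D space
kernel = (a @ b) ** 2        # same quantity computed in the original 2-D space

print(explicit, kernel)
```

Both values agree (up to floating-point rounding), which is why an SVM can train with `kernel='poly'` without ever materializing the polynomial features.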

* * * * *

4.1 Polynomial Kernel Example
=============================

In [None]:
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

poly_svm = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='poly', degree=3, C=5))
])

poly_svm.fit(X, y)

* * * * *

4.2 RBF Kernel (Gaussian)
=========================

**Hyperparameters:**

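-   `gamma` → controls how far each training point's influence reaches (higher gamma = shorter reach, more curvature)

-   `C` → controls the softness of the margin

**Interpretation:**

-   **High gamma** → tight, wiggly boundary → risk of overfitting

-   **Low gamma** → smoother boundary → risk of underfitting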
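
One way to see the gamma trade-off is to compare training and held-out accuracy across a few values. This is a sketch under assumptions: the moons data, the train/test split, and the gamma grid `(0.01, 1, 100)` are illustrative choices.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_m, y_m = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_m, y_m, random_state=0)

scores = {}
for gamma in (0.01, 1, 100):   # low, moderate, high (arbitrary grid)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=2, gamma=gamma))
    clf.fit(X_train, y_train)
    # Store (training accuracy, held-out accuracy) for each gamma.
    scores[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"gamma={gamma}: train={scores[gamma][0]:.3f}, test={scores[gamma][1]:.3f}")
```

A large gap between training and held-out accuracy at high gamma is the usual sign of overfitting; low scores on both at low gamma suggest underfitting.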

### RBF Example

In [None]:
# Reuses the imports and the moons data (X, y) from the polynomial example above.
rbf_svm = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=2, gamma=0.1))
])

rbf_svm.fit(X, y)

5 --- SVM Regression (SVR)
========================

SVM can perform regression by fitting an **epsilon-insensitive tube**.

ε = half-width of the tube (the tube spans the prediction ± ε, so its total width is 2ε).\
Only points outside the tube contribute to the loss.
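
The ε-insensitive loss itself is short enough to write out. This is a minimal NumPy sketch, treating ε as the allowed deviation on each side of the prediction; the function name and sample values are illustrative.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the tube, linear in the excess error outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 1.0, 1.0])
y_pred = np.array([1.05, 0.8, 1.5])   # errors of 0.05, 0.2 and 0.5

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)   # approximately [0, 0.1, 0.4]
```

The first prediction falls inside the tube and costs nothing; the other two are charged only for the part of the error beyond ε.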

### SVR Example

In [None]:
from sklearn.svm import SVR
import numpy as np

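rng = np.random.default_rng(42)                  # fixed seed so the noise is reproducible
X = np.linspace(-3, 3, 50).reshape(-1, 1)        # 50 evenly spaced inputs, shape (50, 1)
y = 0.5 * X[:, 0]**2 + rng.standard_normal(50)   # quadratic target plus Gaussian noise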

svr = SVR(kernel='rbf', C=50, gamma=0.5, epsilon=0.1)
svr.fit(X, y)

6 --- Hinge Loss (Under the Hood)
===============================

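Used in SVM classification, with labels `y ∈ {-1, +1}`:

`Loss = max(0, 1 - y * (wᵀx + b))`

-   Penalizes points that are inside the margin or misclassified (`y * (wᵀx + b) < 1`)

-   Points on or outside the margin, on the correct side, contribute **zero** loss

-   Support vectors are the points with `y * (wᵀx + b) ≤ 1`: those with positive loss plus those exactly on the margin (zero loss)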
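
A small NumPy sketch of the hinge loss, with the bias term `b` included as in the decision boundary `wᵀx + b = 0`. The weights and the four sample points are made up to cover each case: outside the margin, on the margin, inside the margin, and misclassified.

```python
import numpy as np

def hinge_loss(w, b, X, y):
    """Per-sample hinge loss; labels y must be -1 or +1."""
    scores = X @ w + b
    return np.maximum(0.0, 1.0 - y * scores)

w = np.array([1.0, 0.0])   # illustrative weights
b = -1.0                   # illustrative bias

X_demo = np.array([
    [3.0, 0.0],   # score  2.0: correct side, outside the margin
    [2.0, 0.0],   # score  1.0: exactly on the margin
    [1.5, 0.0],   # score  0.5: correct side, inside the margin
    [0.5, 0.0],   # score -0.5: misclassified
])
y_demo = np.array([1, 1, 1, 1])

losses = hinge_loss(w, b, X_demo, y_demo)
print(losses)   # [0.  0.  0.5 1.5]
```

The loss grows linearly once a point crosses its margin edge, which is why a single distant outlier can pull the boundary noticeably when C is high.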

* * * * *

7 --- Practical Notes
===================

-   **Always scale features** (StandardScaler).

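-   `LinearSVC` scales well for **large datasets** (training time grows roughly linearly with the number of samples and features).

-   Kernel `SVC` is powerful but **slow** for large sample sizes (training time grows at least quadratically with the number of samples).

-   Often works well in **high-dimensional** feature spaces, even when features outnumber samples.

-   SVMs tolerate outliers *only with soft margins* (low C); a hard margin either bends to fit every outlier or has no solution.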

* * * * *

8 --- Summary Table (One-Glance)
==============================

| Concept | Meaning |
| --- | --- |
| Linear SVM | Fast, works for high-dimensional data |
| Polynomial SVM | Good for polynomial-shaped boundaries |
| RBF SVM | General non-linear boundary, the default kernel in `SVC` |
| SVR | Regression with an epsilon-insensitive tube |
| C | Lower C → more regularization |
| gamma | Higher gamma → more complex curves |

* * * * *

9 --- Quick Reference
===================

-   **Linear SVM** = max-margin linear classifier

-   **Soft Margin SVM** = adds flexibility with parameter C

-   **Kernel SVM** = nonlinear boundaries

-   **RBF kernel** = default kernel in `SVC` and a sound first choice for nonlinear data (try a linear model first on large datasets)

-   **SVR** = regression version of SVM