In [None]:
# === Environment Setup ===
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, Markdown, Image
from sklearn.svm import SVC
from sklearn.datasets import make_blobs, make_circles
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# --- Configuration ---
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({'font.size': 14, 'figure.figsize': (10, 6), 'figure.dpi': 150})
np.set_printoptions(suppress=True, linewidth=120, precision=4)

# --- Utility Functions ---
def note(msg): display(Markdown(f"<div class='alert alert-info'>📝 {msg}</div>"))
def sec(title): print(f'\n{80*"="}\n| {title.upper()} |\n{80*"="}')

note("Environment initialized for Support Vector Machines.")

# Chapter 7.3: Support Vector Machines (SVMs)

---

### Table of Contents

1.  [**The Geometric Intuition: Maximal Margin Classifier**](#intro)
2.  [**Support Vectors and the Soft Margin**](#soft-margin)
3.  [**The Kernel Trick for Non-linear Data**](#kernel-trick)
4.  [**Code Lab: SVM for Classification**](#code-lab)
5.  [**When to Use SVMs**](#when-to-use)
6.  [**Summary**](#summary)

<a id='intro'></a>
## 1. The Geometric Intuition: Maximal Margin Classifier

Support Vector Machines (SVMs) are a class of supervised learning algorithms built on a simple geometric idea. For a linearly separable dataset, there are infinitely many hyperplanes (lines in 2D, planes in 3D, etc.) that separate the classes.

> **Historical Context: The SVM**
> The original maximal margin algorithm is credited to Vladimir Vapnik and Alexey Chervonenkis in the early 1960s. In the 1990s, Vapnik and his colleagues at AT&T Bell Laboratories developed it further: Boser, Guyon, and Vapnik (1992) applied the kernel trick to maximal margin classifiers, and Cortes and Vapnik (1995) introduced the soft margin. Today SVMs are used for both classification and regression.

The question is: which hyperplane is best? The SVM answers this by choosing the hyperplane that **maximizes the margin** between the classes. The margin is the distance between the separating hyperplane and the closest data points from either class. The maximal margin hyperplane is preferred because it leaves the largest buffer on both sides of the boundary, so small perturbations in new data are less likely to push a point onto the wrong side.
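
To make the margin concrete, here is a minimal sketch that fits a linear SVM on a toy dataset and computes the margin width as $2 / \lVert w \rVert$. The dataset parameters, the variable names, and the very large `C` (used only to approximate a hard margin) are illustrative assumptions, not part of this chapter's lab.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy data: two tight clusters (parameters chosen only for illustration).
X_demo, y_demo = make_blobs(n_samples=50, centers=2, cluster_std=0.6, random_state=0)

# Assumption: a very large C approximates the hard-margin (maximal margin) classifier.
hard_margin = SVC(kernel='linear', C=1e6).fit(X_demo, y_demo)

# For a linear SVM the hyperplane is w·x + b = 0 and the margin width is 2 / ||w||.
w = hard_margin.coef_[0]
b = hard_margin.intercept_[0]
margin_width = 2 / np.linalg.norm(w)

print(f"w = {w}, b = {b:.4f}")
print(f"Margin width: {margin_width:.4f}")
```

Any other separating hyperplane for the same data would have a smaller distance to its nearest point than half of this width.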

In [None]:
display(Image(filename='../images/07-Machine-Learning/svm_hyperplanes.png'))

<a id='soft-margin'></a>
## 2. Support Vectors and the Soft Margin

The data points that lie exactly on the margin are called **support vectors**. They are the most critical points in the dataset because they alone define the position and orientation of the maximal margin hyperplane: moving a support vector changes the hyperplane, whereas points farther from the boundary can be removed, or moved without crossing the margin, and the hyperplane stays the same.
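
The claim that only the support vectors matter can be checked empirically. The sketch below (toy data and variable names are assumptions made for illustration) fits a near-hard-margin linear SVM, lists its support vectors, and then refits on the support vectors alone to compare the two models.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy data (illustrative parameters); a large C approximates a hard margin.
X_sv, y_sv = make_blobs(n_samples=50, centers=2, cluster_std=0.6, random_state=0)
full_model = SVC(kernel='linear', C=1e6).fit(X_sv, y_sv)

# Indices and coordinates of the support vectors found during training.
sv_idx = full_model.support_
print(f"{len(sv_idx)} of {len(X_sv)} training points are support vectors")
print(full_model.support_vectors_)

# Refit using only the support vectors. The boundary should be numerically
# almost the same, because the discarded points did not constrain it.
sv_only_model = SVC(kernel='linear', C=1e6).fit(X_sv[sv_idx], y_sv[sv_idx])
same_predictions = np.array_equal(full_model.predict(X_sv), sv_only_model.predict(X_sv))

print("w (all points):          ", full_model.coef_[0])
print("w (support vectors only):", sv_only_model.coef_[0])
print("Same predictions on all points:", same_predictions)
```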

In most real-world scenarios, data is not perfectly linearly separable. To handle this, the SVM introduces a **soft margin**, which allows some points to fall inside the margin or even on the wrong side of the hyperplane. In this setting, every point that lies on or violates the margin is a support vector. The trade-off between a wide margin and few violations is controlled by a hyperparameter, often denoted `C`. A smaller `C` penalizes violations less, producing a wider margin with more violations; a larger `C` penalizes them more, producing a narrower margin with fewer violations.
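
The effect of `C` can be observed directly. The following sketch, which assumes an overlapping toy dataset and three arbitrary `C` values chosen for illustration, fits a linear SVM for each value and reports the margin width $2 / \lVert w \rVert$ together with the number of support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so some margin violations are unavoidable
# (dataset parameters are illustrative assumptions).
X_soft, y_soft = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

C_values = [0.01, 1.0, 100.0]
margin_widths = []
for C in C_values:
    clf = SVC(kernel='linear', C=C).fit(X_soft, y_soft)
    width = 2 / np.linalg.norm(clf.coef_[0])   # margin width for a linear SVM
    margin_widths.append(width)
    print(f"C={C:>6}: margin width={width:.3f}, support vectors={clf.n_support_.sum()}")
```

The smallest `C` should give the widest margin, and the width should shrink as `C` grows.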

In [None]:
display(Image(filename='../images/07-Machine-Learning/svm_margin.png'))

<a id='kernel-trick'></a>
## 3. The Kernel Trick for Non-linear Data

The true power of SVMs is revealed when dealing with non-linearly separable data. The key idea is to project the data into a higher-dimensional space where it becomes linearly separable. This is done using a **kernel function**.

The **kernel trick** is a mathematical shortcut that lets us operate in this high-dimensional space without ever computing the coordinates of the data in that space. The SVM only needs dot products between the images of pairs of data points in the feature space, and a kernel function returns those dot products directly from the original inputs, which is much more efficient.
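
A small worked example may help. For 2D inputs, the degree-2 polynomial kernel $k(a, b) = (a \cdot b)^2$ corresponds to the explicit feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The sketch below checks that both routes give the same number; the helper names `phi` and `poly2_kernel` and the two sample points are hypothetical, introduced only for this illustration.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2D point (illustrative helper)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: k(a, b) = (a·b)^2."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = np.dot(phi(a), phi(b))   # dot product computed in the 3D feature space
implicit = poly2_kernel(a, b)       # same quantity computed in the 2D input space

print("Explicit feature-space dot product:", explicit)
print("Kernel evaluated on the inputs:    ", implicit)
```

The kernel never builds `phi(a)` or `phi(b)`, which is what makes very high-dimensional (or infinite-dimensional, in the RBF case) feature spaces tractable.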

Common kernels include:
- **Polynomial Kernel**: Captures polynomial relationships in the data.
- **Radial Basis Function (RBF) Kernel**: Can handle complex, non-linear relationships. It is the most commonly used kernel.
- **Sigmoid Kernel**: Similar to the activation function in neural networks.
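
To show what these kernels actually compute, the sketch below evaluates each formula by hand with NumPy and compares it against the corresponding function in `sklearn.metrics.pairwise`. The sample points and the values of `gamma`, `coef0`, and `degree` are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

rng_k = np.random.RandomState(0)
A = rng_k.randn(4, 2)   # four sample points in 2D
B = rng_k.randn(3, 2)   # three other points

gamma, coef0, degree = 0.5, 1.0, 3   # illustrative values, not tuned
dots = A @ B.T                        # pairwise dot products, shape (4, 3)

# Polynomial: (gamma * <a, b> + coef0) ** degree
poly_manual = (gamma * dots + coef0) ** degree

# RBF: exp(-gamma * ||a - b||^2)
sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
rbf_manual = np.exp(-gamma * sq_dists)

# Sigmoid: tanh(gamma * <a, b> + coef0)
sigmoid_manual = np.tanh(gamma * dots + coef0)

poly_ok = np.allclose(poly_manual, polynomial_kernel(A, B, degree=degree, gamma=gamma, coef0=coef0))
rbf_ok = np.allclose(rbf_manual, rbf_kernel(A, B, gamma=gamma))
sigmoid_ok = np.allclose(sigmoid_manual, sigmoid_kernel(A, B, gamma=gamma, coef0=coef0))

print("Polynomial matches:", poly_ok)
print("RBF matches:       ", rbf_ok)
print("Sigmoid matches:   ", sigmoid_ok)
```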

In [None]:
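import os  # needed for the output-directory check before saving the figure
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions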

sec("Visualizing the Kernel Trick")
# Generate concentric circles
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)

# Add a third dimension: z = x^2 + y^2
z = X[:, 0]**2 + X[:, 1]**2
X_3d = np.c_[X, z]

# Create the 3D plot
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')

# Plot the two classes
ax.scatter(X_3d[y==0, 0], X_3d[y==0, 1], X_3d[y==0, 2], c='blue', marker='o', label='Class 0')
ax.scatter(X_3d[y==1, 0], X_3d[y==1, 1], X_3d[y==1, 2], c='red', marker='^', label='Class 1')

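# Create a separating plane (for visualization).
# Inner-circle points have z near 0.25 and outer-circle points have z near 1.0,
# so a plane at z=0.6 sits between the two classes even with the added noise.
xx, yy = np.meshgrid(np.linspace(-1, 1, 50), np.linspace(-1, 1, 50))
zz = np.ones_like(xx) * 0.6 # A simple plane at z=0.6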
ax.plot_surface(xx, yy, zz, alpha=0.2, color='gray')

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z = X^2 + Y^2')
ax.set_title('Data Transformed to 3D Space')
ax.legend()
ax.view_init(elev=20, azim=45)
os.makedirs('../images/07-Machine-Learning', exist_ok=True)
plt.savefig('../images/07-Machine-Learning/kernel_trick.png')
plt.close()
display(Image(filename='../images/07-Machine-Learning/kernel_trick.png'))

note("In the transformed 3D space, the classes become linearly separable by a plane.")
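
The visual claim above can also be checked numerically. As a rough sketch (variable names are assumptions, and training accuracy is used only as a quick separability check), the code below fits a linear SVM on the original 2D circles and on the same points lifted with `z = x^2 + y^2`.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Same concentric-circle data as in the plot above.
X_c, y_c = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)

# Lift to 3D by appending the squared distance from the origin.
X_lifted = np.c_[X_c, X_c[:, 0]**2 + X_c[:, 1]**2]

# A linear SVM cannot separate the circles in 2D, but can in the lifted space.
acc_2d = SVC(kernel='linear').fit(X_c, y_c).score(X_c, y_c)
acc_3d = SVC(kernel='linear').fit(X_lifted, y_c).score(X_lifted, y_c)

print(f"Linear SVM training accuracy in 2D: {acc_2d:.3f}")
print(f"Linear SVM training accuracy in 3D: {acc_3d:.3f}")
```

A kernel SVM achieves the same effect without constructing `X_lifted` explicitly.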

<a id='code-lab'></a>
## 4. Code Lab: SVM for Classification

Let's generate two noisy, overlapping clusters that no straight line separates cleanly and see how an SVM with an RBF kernel classifies them.

In [None]:
sec("SVM with RBF Kernel")

# Generate two clusters of points
X, y = make_blobs(n_samples=200, centers=2, random_state=6, cluster_std=1.1)

# Add Gaussian noise so the clusters overlap and the task is more challenging
rng = np.random.RandomState(13)
X_noise = rng.randn(200, 2) * 2
X = X + X_noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# We instantiate the SVM classifier, specifying the RBF kernel and the regularization parameter C.
model = SVC(kernel='rbf', C=1.0, gamma='auto')
# We then fit the model to the training data.
model.fit(X_train, y_train)

# We can then use the trained model to make predictions on the test set.
y_pred = model.predict(X_test)
# We evaluate the model's performance using a classification report.
note("Classification Report for SVM with RBF Kernel:")
print(classification_report(y_test, y_pred))
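
The values `C=1.0` and `gamma='auto'` used above are starting points rather than tuned choices. One possible way to pick them, sketched here under the assumption that a small cross-validated grid search is acceptable, uses scikit-learn's `GridSearchCV`. The grid values and variable names are illustrative, and the data is regenerated so the block runs on its own.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Recreate the lab data so this block is self-contained.
X_gs, y_gs = make_blobs(n_samples=200, centers=2, random_state=6, cluster_std=1.1)
X_gs = X_gs + np.random.RandomState(13).randn(200, 2) * 2
X_tr, X_te, y_tr, y_te = train_test_split(X_gs, y_gs, test_size=0.3, random_state=42)

# Assumption: a coarse 3x3 grid is enough for a first look.
param_grid = {'C': [0.1, 1.0, 10.0], 'gamma': [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print(f"Cross-validated accuracy: {search.best_score_:.3f}")
print(f"Test accuracy: {search.score(X_te, y_te):.3f}")
```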

<a id='when-to-use'></a>
## 5. When to Use SVMs

SVMs are particularly effective in:
- **High-dimensional spaces**: They work well even when the number of dimensions exceeds the number of samples.
- **Cases where a clear margin of separation is desirable**.
- **Memory efficiency**: They use a subset of training points (the support vectors) in the decision function.

SVMs are less effective on very large datasets: for kernel SVMs such as scikit-learn's `SVC`, training time grows at least quadratically with the number of samples.
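
The memory-efficiency point can be made concrete: once trained, a kernel SVM needs only its support vectors, their dual coefficients, and the intercept to score new points. The sketch below (toy data, an arbitrary `gamma`, and variable names are all assumptions) rebuilds the decision function of a fitted binary `SVC` from those pieces and compares it with `decision_function`.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X_d, y_d = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

gamma_d = 1.0   # fixed numeric gamma so the kernel can be recomputed by hand
clf_d = SVC(kernel='rbf', C=1.0, gamma=gamma_d).fit(X_d, y_d)

# Decision value for x: sum_i dual_coef_i * k(sv_i, x) + intercept (binary case).
K = rbf_kernel(X_d, clf_d.support_vectors_, gamma=gamma_d)
manual_scores = K @ clf_d.dual_coef_[0] + clf_d.intercept_[0]
matches = np.allclose(manual_scores, clf_d.decision_function(X_d))

print(f"Support vectors stored: {len(clf_d.support_vectors_)} of {len(X_d)} training points")
print("Manual decision function matches SVC:", matches)
```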

<a id='summary'></a>
## 6. Summary

Support Vector Machines are a robust and versatile class of models that take a geometrically motivated approach to classification. By leveraging the kernel trick, they can efficiently model complex, non-linear relationships. Their strength lies in finding the maximal margin decision boundary, which depends only on the support vectors, making them a valuable tool for many classification tasks.