# Support Vector Machines (SVM)

Support Vector Machines (SVM) are a type of supervised machine learning algorithm that can be used for both classification and regression tasks. However, they are more commonly used for classification problems. SVMs are known for their ability to handle high-dimensional data and their versatility in modeling complex, non-linear decision boundaries.

In this tutorial, we'll explore the fundamentals of SVM, understand the mathematics behind it, and see it in action with Python.

## Linear SVM

A linear SVM uses a linear decision boundary. It is the natural choice when the data is linearly separable, meaning that the two classes can be separated by a straight line (in 2D), a plane (in 3D), or a hyperplane (in higher dimensions).

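The goal of the SVM algorithm is to find the hyperplane that best separates the classes: the one that maximizes the margin, the width of the empty band between the two classes. The training points that lie on the edges of this band are called support vectors, and they alone determine the hyperplane.

Mathematically, if the hyperplane is defined by $w^T x + b = 0$ and the labels are $y_i \in \{-1, +1\}$, then $w$ and $b$ are scaled so that the nearest points satisfy $|w^T x + b| = 1$. Each of those points lies at distance $\frac{1}{||w||}$ from the hyperplane, so the margin has a total width of $\frac{2}{||w||}$. The SVM maximizes this width subject to the constraint that every sample is classified correctly, $y_i (w^T x_i + b) \geq 1$ for all $i$.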
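
As a quick numerical check of the $\frac{2}{||w||}$ expression, here is a minimal sketch on a hypothetical four-point dataset (not part of this tutorial's data) whose widest separating band runs between $x_1 = 0$ and $x_1 = 2$. It assumes scikit-learn's `SVC` with a linear kernel and a large `C` to approximate a hard margin; the names `X_toy`, `y_toy`, and `toy_clf` are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two points per class, separated along the first axis
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])

# A large C approximates a hard margin on separable data
toy_clf = SVC(kernel='linear', C=1000)
toy_clf.fit(X_toy, y_toy)

w = toy_clf.coef_[0]        # normal vector of the fitted hyperplane
b = toy_clf.intercept_[0]   # offset of the fitted hyperplane
margin_width = 2 / np.linalg.norm(w)

print('w =', w, 'b =', b)
print('margin width =', margin_width)
```

Because the two classes sit at $x_1 = 0$ and $x_1 = 2$, the printed margin width should be close to 2, up to the solver's tolerance.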

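Let's visualize this with a simple example. The large value `C=1000` makes the classifier behave almost like a hard margin SVM; the role of `C` is covered in the soft margin section below.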

In [None]:
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
import matplotlib.pyplot as plt

# Generate synthetic data
X, y = make_blobs(n_samples=100, centers=2, random_state=6)

# Fit the SVM model
clf = SVC(kernel='linear', C=1000)
clf.fit(X, y)

# Plot the data points
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)

# Plot the decision boundary
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# Create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

# Plot decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
# Plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100, facecolors='none', edgecolors='k')
plt.title('Linear SVM with Decision Boundary and Support Vectors')
plt.show()

## Kernel SVM

While linear SVM works well for linearly separable data, real-world data is often not linearly separable. This is where the Kernel SVM comes into play.

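The idea behind Kernel SVM is to map the input data into a higher-dimensional feature space where it becomes linearly separable, and then find a linear decision boundary there. In practice the mapping is never computed explicitly. The SVM only needs inner products between pairs of points, and a kernel function $K(x, x')$ returns that inner product in the feature space directly from the original inputs. This shortcut is known as the kernel trick.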

There are several kernel functions used in practice:
- **Linear Kernel**: $K(x, x') = x^T x'$
- **Polynomial Kernel**: $K(x, x') = (1 + x^T x')^d$
- **Radial Basis Function (RBF) or Gaussian Kernel**: $K(x, x') = e^{-\gamma ||x - x'||^2}$
- **Sigmoid Kernel**: $K(x, x') = \tanh(\alpha x^T x' + c)$

Among these, the RBF kernel is a common first choice and is the default for scikit-learn's `SVC`. It can model complex decision boundaries, with $\gamma$ controlling how far the influence of a single training point reaches.
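
To make the kernel idea concrete, the sketch below checks that the polynomial kernel from the list above with $d = 2$ equals an ordinary inner product after an explicit mapping of 2D points into a 6D feature space. The helper names `poly2_kernel` and `poly2_features` and the example points are hypothetical and chosen for illustration. It also evaluates the RBF formula by hand and compares it with scikit-learn's `rbf_kernel` helper.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def poly2_kernel(x, z):
    """Polynomial kernel (1 + x^T z)^d with d = 2."""
    return (1.0 + x @ z) ** 2

def poly2_features(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2D inputs."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

# Hypothetical example points
x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_direct = poly2_kernel(x, z)                      # computed in the 2D input space
k_mapped = poly2_features(x) @ poly2_features(z)   # inner product in the 6D feature space
print(k_direct, k_mapped)

# RBF kernel evaluated by hand and with scikit-learn's helper
gamma = 0.5
k_rbf_manual = np.exp(-gamma * np.sum((x - z) ** 2))
k_rbf_sklearn = rbf_kernel(x.reshape(1, -1), z.reshape(1, -1), gamma=gamma)[0, 0]
print(k_rbf_manual, k_rbf_sklearn)
```

Both polynomial values agree (here $x^T z = 1$, so each equals $(1 + 1)^2 = 4$), which is why the SVM never has to build the higher-dimensional features itself.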

Let's visualize how the Kernel SVM works using the RBF kernel.

In [None]:
from sklearn.datasets import make_circles

# Generate synthetic data with concentric circles
X, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=42)

# Visualize the data
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
plt.title('Synthetic Data with Concentric Circles')
plt.show()

In [None]:
# Fit the SVM model with RBF kernel
clf_rbf = SVC(kernel='rbf', C=1000)
clf_rbf.fit(X, y)

# Plot the data points
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)

# Plot the decision boundary
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# Create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf_rbf.decision_function(xy).reshape(XX.shape)

# Plot decision boundary and margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
# Plot support vectors
ax.scatter(clf_rbf.support_vectors_[:, 0], clf_rbf.support_vectors_[:, 1], s=100, facecolors='none', edgecolors='k')
plt.title('Kernel SVM with RBF Kernel')
plt.show()

## Soft Margin and Hyperparameter Tuning

In real-world scenarios, data is often noisy and may contain outliers. A hard margin SVM requires every training point to be classified correctly and to lie outside the margin. On noisy data this can force a very narrow margin that overfits to the training data, and if the classes overlap, no such hyperplane exists at all. This is where the concept of a soft margin comes into play.

A soft margin SVM allows some training points to fall inside the margin, or even be misclassified, in order to generalize better to unseen data. How heavily these violations are penalized is controlled by a hyperparameter, often denoted as `C`.

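- **High C value**: Violations are penalized heavily, so the model favors a narrower margin and fits the training data more closely. This can lower the training error but raise the test error (potential overfitting).
- **Low C value**: Violations are penalized lightly, so the model favors a wider margin and tolerates some misclassified training points for better generalization. A value that is too low can underfit.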
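
The role of `C` can be made precise with the standard soft-margin formulation. It is written here with labels $y_i \in \{-1, +1\}$ and slack variables $\xi_i$ (a symbol introduced only for this illustration) that measure how far each sample violates the margin:

$$
\min_{w, b, \xi} \; \frac{1}{2} ||w||^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i (w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0
$$

Minimizing $\frac{1}{2} ||w||^2$ widens the margin $\frac{2}{||w||}$, while the second term charges a cost of `C` per unit of violation, so `C` sets the trade-off between the two.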

Tuning the `C` parameter is crucial for getting good performance from an SVM. Similarly, when using Kernel SVM, the kernel and its parameters (e.g., `gamma` for the RBF kernel) must also be tuned.

Hyperparameter tuning can be done using techniques like grid search or random search combined with cross-validation.
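
A minimal sketch of such a grid search, assuming scikit-learn's `GridSearchCV` and concentric-circles data like that used in the Kernel SVM section. The grid values and the names `X_gs`, `y_gs`, `param_grid`, and `search` are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Concentric-circles data, generated with the same settings as in the Kernel SVM section
X_gs, y_gs = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=42)

# Hypothetical search grid over the two RBF-SVM hyperparameters
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.01, 0.1, 1, 10],
}

# 5-fold cross-validated grid search: every (C, gamma) pair is scored on held-out folds
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X_gs, y_gs)

print('Best parameters:', search.best_params_)
print('Best cross-validated accuracy:', search.best_score_)
```

By default `GridSearchCV` refits the best configuration on all of the data, so `search.best_estimator_` can be used directly for prediction. For larger grids, `RandomizedSearchCV` samples a fixed number of configurations instead of trying them all.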

Let's see the effect of the `C` parameter on the decision boundary using the concentric-circles dataset from the Kernel SVM section.

In [None]:
C_values = [0.1, 1, 10, 100]

plt.figure(figsize=(15, 10))

for i, C in enumerate(C_values, 1):
    # Fit the SVM model with RBF kernel and different C values
    clf_rbf = SVC(kernel='rbf', C=C)
    clf_rbf.fit(X, y)

    plt.subplot(2, 2, i)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()

    # Create grid to evaluate model
    xx = np.linspace(xlim[0], xlim[1], 30)
    yy = np.linspace(ylim[0], ylim[1], 30)
    YY, XX = np.meshgrid(yy, xx)
    xy = np.vstack([XX.ravel(), YY.ravel()]).T
    Z = clf_rbf.decision_function(xy).reshape(XX.shape)

    # Plot decision boundary and margins
    ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
    # Plot support vectors
    ax.scatter(clf_rbf.support_vectors_[:, 0], clf_rbf.support_vectors_[:, 1], s=100, facecolors='none', edgecolors='k')
    plt.title(f'Kernel SVM with RBF Kernel (C={C})')

plt.tight_layout()
plt.show()