# Support Vector Machine (SVM) - In-Depth Notes

Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. SVMs are effective in high-dimensional spaces and remain usable even when the number of features exceeds the number of samples.


## 1. Core Idea
SVM looks for the hyperplane that best separates the data into classes. For linearly separable data, this is the hyperplane that maximizes the margin between the two classes.

**Key Terms:**
- **Hyperplane**: A decision boundary that separates different classes.
- **Margin**: Distance between the hyperplane and the nearest data points from each class.
- **Support Vectors**: Data points that lie closest to the hyperplane and influence its position and orientation.
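
The margin idea can be written as an optimization problem. The following is the standard hard-margin formulation, given here as a sketch; the notation is an assumption introduced for illustration and does not come from the notes above: labels are encoded as $y_i \in \{-1, +1\}$, $w$ is the normal vector of the hyperplane, and $b$ is its offset.

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1, \quad i = 1, \dots, n
$$

Under this scaling the nearest points of each class lie at distance $1/\lVert w \rVert$ from the hyperplane $w^\top x + b = 0$, so minimizing $\lVert w \rVert$ maximizes the margin. The points for which the constraint holds with equality are the support vectors.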

In [None]:
# Sample illustration of linear SVM
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVC

# Create a sample dataset
X, y = make_blobs(n_samples=50, centers=2, random_state=6)
clf = SVC(kernel='linear')
clf.fit(X, y)

# Plot
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap='bwr')
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# Create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1])
yy = np.linspace(ylim[0], ylim[1])
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

# Plot decision boundary and margins
plt.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
            linestyles=['--', '-', '--'])

# Plot support vectors
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
            linewidth=1, facecolors='none', edgecolors='k')
plt.title("SVM with Linear Kernel")
plt.show()
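
The fitted model can also be inspected numerically. The snippet below is a minimal sketch that refits the same toy data and reads the hyperplane from `coef_` and `intercept_` (available only for the linear kernel). The names `w`, `b`, `margin_width`, and `sv_scores` are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same toy data as in the plotting cell
X, y = make_blobs(n_samples=50, centers=2, random_state=6)
clf = SVC(kernel='linear')
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane (linear kernel only)
b = clf.intercept_[0]   # offset of the hyperplane

# Distance between the two dashed margin lines (decision values -1 and +1)
margin_width = 2 / np.linalg.norm(w)

# Decision values w.x + b evaluated at the support vectors
sv_scores = clf.decision_function(clf.support_vectors_)

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors per class =", clf.n_support_)
print("decision values at support vectors =", sv_scores)
```

For support vectors that sit exactly on the margin, the decision value should be close to -1 or +1, which matches the dashed contour levels drawn in the plot.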

## 2. Kernel Trick
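For non-linearly separable data, SVM uses a technique called the **kernel trick**: a kernel function computes inner products as if the data had been mapped into a higher-dimensional space, where the classes may become linearly separable, without ever computing that mapping explicitly.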

**Popular Kernels:**
- Linear
- Polynomial
- Radial Basis Function (RBF)
- Sigmoid
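
To make the trick concrete, the sketch below checks it for one small case: the homogeneous polynomial kernel of degree 2 on 2-D inputs. The helper names `phi` and `poly2_kernel` are hypothetical and exist only for this illustration; scikit-learn never builds the feature map explicitly.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)      # same quantity, computed in the original 2-D space

print(explicit, implicit)
```

Both values agree up to floating-point rounding, which is the point: the kernel gives the feature-space inner product while only touching the original coordinates.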

In [None]:
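# Example with non-linear kernel (RBF)
# Reuses np, plt, and SVC imported in the first code cell
from sklearn.datasets import make_circles

# Two concentric rings: not separable by a straight line
X, y = make_circles(n_samples=100, factor=.1, noise=.1, random_state=0)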
clf = SVC(kernel='rbf', C=1)
clf.fit(X, y)

# Plot
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap='bwr')
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

xx = np.linspace(xlim[0], xlim[1])
yy = np.linspace(ylim[0], ylim[1])
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

plt.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
            linestyles=['--', '-', '--'])
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
            linewidth=1, facecolors='none', edgecolors='k')
plt.title("SVM with RBF Kernel")
plt.show()

## 3. Important Parameters
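- `C`: Regularization parameter; the regularization strength is inversely proportional to `C`. A small `C` gives a wider margin and tolerates some misclassified training points. A large `C` penalizes misclassification heavily, which narrows the margin and can lead to overfitting.
- `kernel`: Specifies the kernel type (`linear`, `poly`, `rbf`, `sigmoid`). scikit-learn's `SVC` defaults to `rbf`.
- `gamma`: Kernel coefficient for `rbf`, `poly`, and `sigmoid`. Defines how far the influence of a single training example reaches: low values mean far, high values mean close.
- `degree`: Degree of the polynomial kernel function (`poly`). It is ignored by all other kernels.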
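
In practice `C` and `gamma` are usually tuned together. The following is a minimal sketch using scikit-learn's `GridSearchCV` with 5-fold cross-validation. The dataset settings and the grid values are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy non-linear dataset (parameters chosen only for this example)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

# Candidate values to try; every (C, gamma) pair is evaluated
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.1, 1, 10],
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```

On real data a logarithmic grid spanning several orders of magnitude is a common starting point, refined around the best values found.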

## 4. Pros and Cons
**Pros:**
- Effective in high-dimensional spaces.
- Memory efficient: the decision function uses only a subset of the training points (the support vectors).
- Versatile with different kernel functions.

**Cons:**
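- Scales poorly to large datasets: training time for kernel SVMs grows at least quadratically with the number of samples.
- Performance depends on the choice of kernel and parameters (`C`, `gamma`), which usually have to be tuned.
- Less effective when the data is very noisy and the classes overlap heavily.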