
# Support Vector Machine (SVM) – Binary Classification with Scattered Data

This notebook trains an SVM classifier on scattered 2D synthetic data, then visualizes the nonlinear decision boundary and highlights the support vectors.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.svm import SVC

In [None]:
# Generate scattered 2D synthetic data: one informative feature, one noise feature
X, y = make_classification(
    n_samples=100, n_features=2, n_informative=1, n_redundant=0,
    n_clusters_per_class=1, random_state=52
)
df = pd.DataFrame(X, columns=["Feature1", "Feature2"])
df["Label"] = y
df.head()

In [None]:
# Train an SVM with an RBF kernel so the decision boundary can be nonlinear
model = SVC(kernel='rbf', probability=True)
model.fit(X, y)

In [None]:
# Create mesh grid to visualize decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500),
                     np.linspace(y_min, y_max, 500))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot decision boundary and support vectors
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, palette='coolwarm', edgecolor='k')
plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
            s=100, facecolors='none', edgecolors='k', label='Support Vectors')
plt.title("SVM (RBF) – Decision Boundary with Scattered Features")
plt.xlabel("Feature1")
plt.ylabel("Feature2")
plt.legend()
plt.tight_layout()
plt.show()

A **Support Vector Machine (SVM)** is a supervised learning algorithm used primarily for binary classification. Its goal is to find the boundary (called a hyperplane) that separates the two classes with the widest possible gap between them.

**Support vectors** are the training points that lie closest to the decision boundary (hyperplane): on the margin, or inside it when the classes overlap. These points are essential because:

- They define the position and orientation of the optimal separating line (or hyperplane).
- The fitted boundary depends only on these points; points that lie farther away do not directly affect it.
- You can think of them as the most "at risk" points: moving or removing one would generally shift the decision boundary.
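
As a rough illustration of the points above, the sketch below refits an SVM on the same synthetic data and inspects its support vectors through scikit-learn's `support_`, `n_support_`, and `support_vectors_` attributes. It assumes an RBF kernel with the default `C`; the names `y_signed`, `functional_margin`, and `is_sv` are illustrative only. With labels mapped to -1/+1, support vectors should have `y * f(x)` of at most about 1, and all other points at least about 1, up to the solver's tolerance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Same synthetic data as in the cells above
X, y = make_classification(
    n_samples=100, n_features=2, n_informative=1, n_redundant=0,
    n_clusters_per_class=1, random_state=52
)
model = SVC(kernel="rbf").fit(X, y)  # assumption: RBF kernel, default C

sv_idx = model.support_         # indices of the support vectors within X
n_per_class = model.n_support_  # number of support vectors for each class

# Functional margin y * f(x), with labels mapped to -1 / +1
y_signed = np.where(y == model.classes_[1], 1, -1)
functional_margin = y_signed * model.decision_function(X)

# Boolean mask marking which training points are support vectors
is_sv = np.zeros(len(X), dtype=bool)
is_sv[sv_idx] = True

print("support vectors per class:", dict(zip(model.classes_, n_per_class)))
print("largest margin among support vectors:", functional_margin[is_sv].max())
if (~is_sv).any():
    print("smallest margin among other points:", functional_margin[~is_sv].min())
```

Points that are not support vectors sit outside the margin, which is why removing them leaves the fitted boundary unchanged.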

An SVM tries to **maximize the margin**: the distance between the separating hyperplane and the nearest support vectors from each class.
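
For reference, the margin can be written down explicitly in the simplest setting. The formulation below is the standard hard-margin linear case, under the assumption that labels are encoded as $y_i \in \{-1, +1\}$ and the hyperplane is $w \cdot x + b = 0$; the symbols $w$, $b$, $x_i$, and $y_i$ are introduced here and do not appear elsewhere in this notebook. scikit-learn's `SVC` solves a soft-margin variant that also allows some violations, controlled by its `C` parameter.

$$
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1 \ \ \text{for all } i
$$

$$
\text{margin width} = \frac{2}{\lVert w \rVert}
$$

Minimizing $\lVert w \rVert$ therefore maximizes the margin, and the support vectors are the points for which the constraint holds with equality.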

The intuition is: the more space we can leave between the classes, the better the model tends to generalize.

In simple terms:

- If the boundary is too close to one class, it might misclassify new points from that class.
- If it is as far away as possible from both classes (i.e., the margin is maximized), the classifier is more robust to small variations in new data.
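
To make the margin concrete, here is a minimal sketch that measures it on the same synthetic data. It assumes a linear kernel (only then does scikit-learn expose the weight vector as `coef_`) and `C=1.0`; the names `linear_model`, `margin_width`, and `signed_distance` are illustrative. For a linear SVM the margin width is 2 divided by the norm of the weight vector.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Same synthetic data as in the cells above
X, y = make_classification(
    n_samples=100, n_features=2, n_informative=1, n_redundant=0,
    n_clusters_per_class=1, random_state=52
)

# Assumption: linear kernel, so the hyperplane is w . x + b = 0
linear_model = SVC(kernel="linear", C=1.0).fit(X, y)

w = linear_model.coef_[0]       # weight vector, normal to the hyperplane
b = linear_model.intercept_[0]  # bias term

# Full width of the margin between the two classes
margin_width = 2.0 / np.linalg.norm(w)

# Signed distance of every training point from the hyperplane
signed_distance = (X @ w + b) / np.linalg.norm(w)

print("margin width:", margin_width)
print("closest point is at distance:", np.abs(signed_distance).min())
```

Because this is a soft-margin fit, some points may lie inside the margin, so the closest distance can be smaller than half the margin width.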