# Support Vector Machines

Support Vector Machines (SVMs) are a versatile family of supervised learning methods for both classification and regression. The central idea is to find a hyperplane in a (possibly high-dimensional) feature space that separates data points belonging to different classes. Among all separating hyperplanes, an SVM chooses the one that maximizes the margin, which is the distance between the hyperplane and the nearest data points of each class. Because some points are allowed to violate the margin, an SVM balances fitting the training data closely against generalizing well to new, unseen data [Scholkopf and Smola, 2018, scikit-learn Developers, 2023].

## Basic principles of SVM

- **Hyperplane**: The decision boundary that separates data points of different classes.
- **Support Vectors**: The training points that lie on the margin or violate it. They alone determine the position and orientation of the hyperplane.
- **Margin**: The distance between the hyperplane and the closest training points. A wider margin acts as a buffer against misclassification.
- **Kernel Trick**: A technique that lets an SVM fit non-linear decision boundaries by implicitly computing inner products in an enlarged feature space, without ever constructing the transformed features explicitly.
- **C Parameter**: Controls the trade-off between a wide margin and few margin violations. Conventions differ: in scikit-learn, `C` is a penalty on violations (a larger `C` tolerates fewer violations), whereas in the formulation of James et al. used below, $C$ is a budget (a larger $C$ tolerates more violations).
- **Kernel Functions**: The choice of kernel (e.g., linear, polynomial, radial basis function) determines the shape of the resulting decision boundary.
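To see the C parameter at work, the following sketch fits scikit-learn's `SVC` with a linear kernel for several values of `C` on synthetic, overlapping data from `make_blobs`. The data set, the chosen values of `C`, and the variable names are illustrative assumptions. Because scikit-learn's `C` penalizes margin violations, a small `C` should produce a wider margin supported by more support vectors, and a large `C` a narrower one.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping Gaussian clusters (synthetic data chosen for illustration)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)

results = {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel='linear', C=C).fit(X, y)
    n_sv = int(clf.n_support_.sum())       # total number of support vectors
    width = 2 / np.linalg.norm(clf.coef_)  # distance between the two margin boundaries
    results[C] = (n_sv, width)
    print(f'C = {C:>6}: support vectors = {n_sv:3d}, margin width = {width:.3f}')
```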

## Support Vector Classifiers


Support Vector Classifiers (SVCs) are a fundamental tool in binary classification problems, where the objective is to separate two classes, typically labeled as -1 and 1, by finding an optimal hyperplane. The mathematical representation of a hyperplane is as follows [James et al., 2023, Kuttler and Farah, 2020]:

\begin{equation} \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p = 0 \end{equation}

In this equation:
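- $\beta_1, \ldots, \beta_p$ are coefficients that define the orientation of the hyperplane (together they form its normal vector), and $\beta_0$ is the intercept that sets its position.
- $x_1, x_2, \ldots, x_p$ are the features of a data point. Points for which the left-hand side is positive lie on one side of the hyperplane, and points for which it is negative lie on the other.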

<font color='Blue'><b>Example:</b></font> Consider the two-dimensional space with coordinates $X_1$ and $X_2$. In this space a hyperplane is simply a line, for example [James et al., 2023]:

\begin{equation} 1 + 2X_1 + 3X_2 = 0\end{equation}

In [None]:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('https://raw.githubusercontent.com/HatefDastour/ENGG_680/main/Files/mystyle.mplstyle')

# Create a figure and axis with a specific size
fig, ax = plt.subplots(figsize=(5, 5))

# Generate a range of values for X1
X1 = np.linspace(-1.5, 1.5, 1000)

# Calculate corresponding values for X2 based on the hyperplane equation -(1 + 2*X1)/3
X2 = -(1 + 2*X1)/3

# Plot the hyperplane
ax.plot(X1, X2)

# Set labels, limits, and aspect ratio for X and Y axes
ax.set(xlabel=r'$X_1$', ylabel=r'$X_2$', xlim=[-1.5, 1.5], ylim=[-1.5, 1.5], aspect=1)

# Fill the region where 1 + 2X1 + 3X2 > 0 (above the line) with a light green color and annotate
ax.fill_between(X1, X2, 1.5, color='LimeGreen', alpha=0.1)
ax.annotate(r'$1 + 2X_1 + 3X_2 > 0$', xy=(-0.5, 0.5), fontsize='xx-large')

# Fill the region where 1 + 2X1 + 3X2 < 0 (below the line) with an orange color and annotate
ax.fill_between(X1, -1.5, X2, color='Orange', alpha=0.1)
ax.annotate(r'$1 + 2X_1 + 3X_2 < 0$', xy=(-0.9, -1), fontsize='xx-large')
ax.grid(False)
plt.tight_layout()

Running this code plots the line $1 + 2X_1 + 3X_2 = 0$, which divides the plane into two half-planes: one where $1 + 2X_1 + 3X_2 > 0$ and one where $1 + 2X_1 + 3X_2 < 0$. A point can therefore be assigned to a side of the hyperplane simply by the sign of the left-hand side.
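Classifying a point with respect to this hyperplane only requires the sign of $1 + 2X_1 + 3X_2$. A minimal NumPy sketch follows; the helper `side_of_hyperplane` and the sample points are hypothetical, chosen for illustration.

```python
import numpy as np

# Coefficients of the example hyperplane 1 + 2*X1 + 3*X2 = 0
beta_0, beta = 1.0, np.array([2.0, 3.0])


def side_of_hyperplane(points):
    """Hypothetical helper: +1 or -1 for the side of the hyperplane, 0 if on it."""
    return np.sign(beta_0 + points @ beta)


points = np.array([[-0.5, 0.5],   # 1 - 1.0 + 1.5 =  1.5 > 0
                   [-0.9, -1.0],  # 1 - 1.8 - 3.0 = -3.8 < 0
                   [1.0, -1.0]])  # 1 + 2.0 - 3.0 =  0   (on the hyperplane)
print(side_of_hyperplane(points))
```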

## Formulating the Maximal Margin Classifier

This section describes how to construct the optimal separating hyperplane from a set of $n$ training observations $x_1, \ldots, x_n \in \mathbb{R}^p$ with class labels $y_1, \ldots, y_n \in \{-1, 1\}$. The goal of a Support Vector Classifier (SVC) is to maximize the margin $M$, the perpendicular distance between the hyperplane and the nearest data points of each class. The optimization problem is to find the coefficients $\beta_0, \beta_1, \ldots, \beta_p$ that achieve the largest $M$, subject to a constraint for each training observation $i$ [James et al., 2023]. Because the formulation below includes slack variables, it describes the soft-margin support vector classifier; forcing every slack variable to zero recovers the maximal margin classifier.

Mathematically, each training observation $i = 1, \ldots, n$ must satisfy:

\begin{equation}
y_i (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}) \geq M (1 - \epsilon_i)
\end{equation}

Where:
- $y_i$ denotes the class label of the $i$-th observation, taking values $-1$ or $1$.
- $x_{i1}, x_{i2}, \ldots, x_{ip}$ represent the features of the $i$-th observation.
- $\epsilon_i \geq 0$ is a slack variable that allows the $i$-th observation to violate the margin: $\epsilon_i = 0$ if the observation is on the correct side of the margin, $0 < \epsilon_i \leq 1$ if it lies inside the margin but on the correct side of the hyperplane, and $\epsilon_i > 1$ if it is on the wrong side of the hyperplane (misclassified).

When all $\epsilon_i = 0$, these constraints ensure that every observation lies on the correct side of the hyperplane at a distance of at least $M$. The slack variables relax this requirement, allowing some observations to fall inside the margin or even on the wrong side of the hyperplane.
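The three regimes of the slack variable can be checked numerically. The sketch below assumes scikit-learn's `SVC` and synthetic overlapping data from `make_blobs` (illustrative choices). scikit-learn does not store slack variables, so they are recovered from `decision_function`: scikit-learn scales the coefficients so that points on the margin satisfy $y_i f(x_i) = 1$, which turns the constraint above into $y_i f(x_i) \geq 1 - \epsilon_i$, and the smallest feasible slack is $\epsilon_i = \max(0, 1 - y_i f(x_i))$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping synthetic classes, relabelled to {-1, 1} to match the text
X, y01 = make_blobs(n_samples=100, centers=2, cluster_std=1.5, random_state=0)
y = np.where(y01 == 1, 1, -1)

svc = SVC(kernel='linear', C=1.0).fit(X, y)

# Smallest slack satisfying y_i f(x_i) >= 1 - eps_i
eps = np.maximum(0, 1 - y * svc.decision_function(X))

n_correct_side = int(np.sum(eps == 0))                # outside the margin, correct side
n_inside_margin = int(np.sum((eps > 0) & (eps <= 1)))  # inside the margin
n_misclassified = int(np.sum(eps > 1))                # wrong side of the hyperplane

print('eps = 0      (correct side of margin):', n_correct_side)
print('0 < eps <= 1 (inside the margin)     :', n_inside_margin)
print('eps > 1      (misclassified)         :', n_misclassified)
```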

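A second constraint normalizes the coefficients:

\begin{equation}
\sum_{j=1}^{p} \beta_j^2 = 1
\end{equation}

Multiplying all coefficients by the same nonzero constant describes the same hyperplane, so this constraint only fixes their scale. With it in place, $y_i (\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip})$ equals the perpendicular distance from the $i$-th observation to the hyperplane when the observation is on the correct side (and is negative otherwise), so the constraint above measures the margin $M$ as a true distance. Without it, $M$ could be made arbitrarily large simply by rescaling the coefficients.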

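Finally, a nonnegative tuning parameter $C$ bounds the total amount of slack, acting as a budget for margin violations:

\begin{equation}
\epsilon_i \geq 0, \qquad \sum_{i=1}^{n} \epsilon_i \leq C
\end{equation}

A small $C$ tolerates few violations and yields a narrow margin; a large $C$ tolerates more violations and yields a wider margin. With $C = 0$ no violations are allowed, and the problem reduces to the maximal margin classifier, which exists only if the classes are perfectly separable.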

The optimization problem is therefore to choose $\beta_0, \beta_1, \ldots, \beta_p$ and $\epsilon_1, \ldots, \epsilon_n$ that satisfy these constraints while maximizing $M$. The solution is a hyperplane that separates the classes as well as possible, trading margin width against margin violations.

Solving this problem reveals a notable property: only the observations that lie exactly on the margin or violate it (the support vectors) affect the resulting hyperplane. Moving any other observation, as long as it stays on the correct side of the margin, leaves the hyperplane unchanged.
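The link between the normalization constraint and the margin can be checked on a small data set. This sketch uses scikit-learn's `SVC` on the same eight points as the example below (the variable names are illustrative). It rescales the fitted coefficients to unit norm and computes $y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2})$ for every observation; the smallest of these distances should match $M = 1/\lVert w \rVert$, where $w$ is scikit-learn's unnormalized coefficient vector, and should be attained by the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Same eight observations as in the example below
X = np.array([[2, 1], [3, 3], [4, 3], [5, 4], [6, 5], [7, 5], [8, 6], [9, 7]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
svc = SVC(kernel='linear').fit(X, y)

w, b = svc.coef_[0], svc.intercept_[0]
norm_w = np.linalg.norm(w)

# Rescale so that beta_1^2 + beta_2^2 = 1, as in the normalization constraint
beta, beta_0 = w / norm_w, b / norm_w

# With unit-norm coefficients, y_i * (beta_0 + beta . x_i) is the perpendicular
# distance of x_i from the hyperplane (positive when x_i is on the correct side)
distances = y * (beta_0 + X @ beta)
M = 1 / norm_w

print('Distances to the hyperplane:', np.round(distances, 3))
print(f'Smallest distance: {distances.min():.3f}   M = 1/||w||: {M:.3f}')
print('Support vectors:\n', svc.support_vectors_)
```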

In [None]:
import numpy as np
from sklearn.svm import SVC

# Sample data (features and class labels)
X = np.array([[2, 1], [3, 3], [4, 3], [5, 4], [6, 5], [7, 5], [8, 6], [9, 7]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# Create a Support Vector Classifier with a linear kernel
svc = SVC(kernel='linear')

# Fit the model to the data
svc.fit(X, y)

# Get the support vectors (one row per support vector)
support_vectors = svc.support_vectors_
print("Support Vectors:")
for sv in support_vectors:
    print(f'\t\t{sv}')

# Get the coefficients (beta_1, ..., beta_p) and the intercept (beta_0)
coefficients = svc.coef_
intercept = svc.intercept_
print("Coefficients (Beta values):\t\t", coefficients)
print("Intercept (Beta_0):\t\t\t", intercept)

# Margin M: distance from the hyperplane to each margin boundary (1 / ||beta||);
# the full width of the margin band is 2M
margin = 1 / np.linalg.norm(coefficients)
print(f'Margin (M):\t\t\t\t {margin:.3f}')
print(f'Margin width (2M):\t\t\t {2 * margin:.3f}')

# Slack variables: epsilon_i = max(0, 1 - y_i * f(x_i)), with f the decision function
epsilon_values = np.maximum(0, 1 - y * svc.decision_function(X))
print("Slack Variables (Epsilon values):\t", np.round(epsilon_values, 3))

# scikit-learn's C is a penalty on margin violations (default 1.0),
# not the budget C of the formulation above
print("Penalty parameter C (scikit-learn):\t", svc.C)

---

<font color='Red'><b>Note:</b></font>

Note that in the above code:

* `support_vectors` (`svc.support_vectors_`) holds the support vectors identified while fitting the model, one row per vector. These are the training observations that lie on the margin or violate it, and they alone determine the position and orientation of the decision boundary (hyperplane).

* `coefficients` (`svc.coef_`) holds the weights $\beta_1, \ldots, \beta_p$ of the linear decision function; this attribute is available only for `kernel='linear'`. Together the weights form the normal vector of the hyperplane and therefore set its orientation. Their magnitudes indicate the relative influence of each feature only when the features are measured on comparable scales.

* `intercept` (`svc.intercept_`) holds $\beta_0$. It shifts the hyperplane along its normal vector without changing its orientation; the distance of the hyperplane from the origin is $|\beta_0| / \lVert \beta \rVert$.

* `margin`: scikit-learn does not impose $\sum_{j=1}^{p} \beta_j^2 = 1$. Instead, it scales the coefficients so that support vectors on the margin satisfy $y_i f(x_i) = 1$, where $f(x) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$. Dividing by $\lVert \beta \rVert$ converts this into a geometric distance, so the margin $M$ (from the hyperplane to either margin boundary) and the full width of the margin band are:

    ```python
    margin = 1 / np.linalg.norm(coefficients)        # M
    margin_width = 2 / np.linalg.norm(coefficients)  # 2M
    ```

* Slack variables: scikit-learn does not store them, but they can be recovered from the decision function `svc.decision_function(X)`. With the scaling above, the constraint becomes $y_i f(x_i) \geq 1 - \epsilon_i$, so the smallest feasible slack is $\epsilon_i = \max(0, 1 - y_i f(x_i))$. A value of 0 means the observation is on the correct side of the margin, a value between 0 and 1 means it lies inside the margin, and a value above 1 means it is misclassified. For this separable sample data, all values should be zero or very close to it. Note that `svc.dual_coef_` does not contain slack values: it stores the products $y_i \alpha_i$ of the labels and the Lagrange multipliers of the support vectors.

* `svc.C`: scikit-learn's `C` is a penalty on margin violations rather than the budget $C$ of the formulation above, and there is no closed-form conversion between the two. They act in opposite directions: a larger scikit-learn `C` tolerates fewer violations and produces a narrower margin.

---

In [None]:
from sklearn.inspection import DecisionBoundaryDisplay
from matplotlib.colors import ListedColormap

colors = ["#f5645a", "#b781ea"]
edge_colors = ['#8A0002', '#3C1F8B']
cmap_light = ListedColormap(['#f7dfdf', '#e3d3f2'])

# Create a figure and axis
fig, ax = plt.subplots(figsize=(7, 6))

DecisionBoundaryDisplay.from_estimator(svc, X, cmap=cmap_light, ax=ax,
                                       response_method="predict", plot_method="pcolormesh",
                                       xlabel='Feature 1', ylabel='Feature 2', shading="auto")

# Define labels and markers for different classes
class_labels = [1, -1]
markers = ['o', 's']

for label, marker in zip(class_labels, markers):
    class_data = X[y == label]
    ax.scatter(class_data[:, 0], class_data[:, 1], fc=colors[label == 1],
               ec=edge_colors[label == 1], label=str(label), marker=marker)

# Plot support vectors
support_vectors = svc.support_vectors_
ax.scatter(support_vectors[:, 0], support_vectors[:, 1], s=200,
           facecolors='none', edgecolors='k', lw=2, label='Support Vectors')

ax.legend(loc = 'lower right')
ax.set_title("""Support Vector Classifier""", fontweight='bold', fontsize=16)
ax.grid(False)

plt.tight_layout()

For binary classification with a linear-kernel Support Vector Machine (SVM), the coefficients of the decision boundary can be retrieved directly from the fitted model. They are the feature weights that define the orientation of the decision boundary (the hyperplane), which takes the form:

\begin{equation}w_1 \cdot x_1 + w_2 \cdot x_2 + b = 0\end{equation}

Here, $w_1$ and $w_2$ are the coefficients (the $\beta_1$ and $\beta_2$ of the earlier notation), $x_1$ and $x_2$ are the features, and $b$ is the intercept ($\beta_0$).

For a fitted model `svc`, the coefficients $w_1$ and $w_2$ are stored in `svc.coef_` and the intercept $b$ in `svc.intercept_`:

In [None]:
# Hyperplane equation
w1, w2 = svc.coef_[0]
b = svc.intercept_[0]
# Raw f-string so that '\cdot' is not treated as an escape sequence
equation = rf'${w1:+.3f} \cdot x_1 {w2:+.3f} \cdot x_2 {b:+.3f} = 0$'

# Display the equation using LaTeX
from IPython.display import Latex, display
display(Latex(equation))

In [None]:
from sklearn.inspection import DecisionBoundaryDisplay
from matplotlib.colors import ListedColormap

colors = ["#f5645a", "#b781ea"]
edge_colors = ['#8A0002', '#3C1F8B']
cmap_light = ListedColormap(['#f7dfdf', '#e3d3f2'])

# Create a figure and axis
fig, ax = plt.subplots(figsize=(7, 6))

# Decision boundary display
DecisionBoundaryDisplay.from_estimator(svc, X, cmap=cmap_light, ax=ax, response_method="predict", plot_method="pcolormesh", xlabel='Feature 1', ylabel='Feature 2', shading="auto")

# Define labels and markers for different classes
class_info = [(1, 'o', colors[1]), (-1, 's', colors[0])]

for label, marker, color in class_info:
    class_data = X[y == label]
    ax.scatter(class_data[:, 0], class_data[:, 1], fc=color, ec=edge_colors[label == 1],
               label=str(label), marker=marker)

# Plot support vectors
support_vectors = svc.support_vectors_
ax.scatter(support_vectors[:, 0], support_vectors[:, 1], s=200, facecolors='none',
           edgecolors='k', lw=2, label='Support Vectors')

# Decision boundary line
w1, w2 = svc.coef_[0]
b = svc.intercept_[0]
line_x = np.linspace(ax.get_xlim()[0], ax.get_xlim()[1], 100)
line_y = (-w1 / w2) * line_x - (b / w2)
ax.plot(line_x, line_y, color='black', linestyle='--', label= f'Decision Boundary: {equation}')

# Plot settings
ax.legend(loc = 'lower right')
ax.set_title("""Support Vector Classifier""", fontweight='bold', fontsize=16)
ax.grid(False)
ax.set_ylim(0, 8)
plt.tight_layout()

## Classification with Non-Linear Decision Boundaries

Support Vector Machines (SVMs) extend the support vector classifier to handle non-linear decision boundaries. The underlying idea is to enlarge the feature space, for example with polynomial functions of the original features, so that a linear boundary in the enlarged space corresponds to a non-linear boundary in the original space [James et al., 2023].

### Augmented Feature Space

Instead of fitting a support vector classifier with the $p$ original features:

\begin{equation}
X_1, X_2, \ldots , X_p.
\end{equation}

we can fit it in an enlarged feature space containing $2p$ features, namely each original feature and its square:

\begin{equation}
X_1, X_1^2, X_2, X_2^2, \ldots , X_p, X^2_p.
\end{equation}

The decision boundary is linear in the enlarged space but quadratic in the original feature space, which allows the classifier to capture non-linear relationships between the features and the class labels [James et al., 2023].

### Mathematical Formulation

In the enlarged feature space, the optimization problem becomes:

\begin{align}
&\text{Maximize the margin } M \\
&\text{subject to:} \\
&\begin{cases}
\displaystyle{y_i \left(\beta_0 + \sum_{j=1}^{p}\beta_{j1} x_{ij}+\sum_{j=1}^{p}\beta_{j2} x_{ij}^2 \right)
\geq M (1-\epsilon_i)}
\\
\displaystyle{\sum_{j=1}^{p} \sum_{k=1}^{2} \beta_{jk}^2 = 1}
\\
\displaystyle{\epsilon_i \geq 0}
\\
\displaystyle{\sum_{i=1}^{n} \epsilon_i \leq C.}
\end{cases}
\end{align}

Key insights from this formulation:

- The objective is still to maximize the margin $M$ that separates the classes.
- Adding a quadratic term $x_{ij}^2$ for each of the $p$ original features makes the decision boundary quadratic in the original features, so it can capture non-linear structure in the data.
- As in the standard support vector classifier, the constraints require each training observation $x_i$ to lie on the correct side of the hyperplane at a distance of at least $M$, up to its slack variable $\epsilon_i$, which accommodates margin violations and misclassifications.
- Each original feature now has two coefficients: $\beta_{j1}$ for its linear term and $\beta_{j2}$ for its quadratic term.
- The constraint $\sum_{j=1}^{p} \sum_{k=1}^{2} \beta_{jk}^2 = 1$ normalizes all $2p$ coefficients, so the margin is again measured as a distance, now in the enlarged feature space.
- As in the original support vector classifier, $C$ is the budget for margin violations and controls the trade-off between a wide margin and the number of violations.
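A minimal sketch of the enlarged feature space, assuming scikit-learn's `make_circles` to generate two concentric classes (an illustrative choice, not part of the original example). The same linear support vector classifier is fitted once on $(X_1, X_2)$ and once on $(X_1, X_1^2, X_2, X_2^2)$. In the enlarged space a linear boundary can separate the circles, which corresponds to a quadratic boundary in the original space.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original (X1, X2) space
X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

# Enlarged feature space with 2p = 4 features: X1, X1^2, X2, X2^2
X_aug = np.column_stack([X[:, 0], X[:, 0] ** 2, X[:, 1], X[:, 1] ** 2])

# The same linear support vector classifier in both feature spaces
linear_orig = SVC(kernel='linear', C=1.0).fit(X, y)
linear_aug = SVC(kernel='linear', C=1.0).fit(X_aug, y)

acc_orig = linear_orig.score(X, y)
acc_aug = linear_aug.score(X_aug, y)
print(f'Training accuracy, original features:  {acc_orig:.3f}')
print(f'Training accuracy, augmented features: {acc_aug:.3f}')
```

Explicit augmentation becomes expensive as $p$ or the polynomial degree grows, which is the motivation for the kernel formulation that follows.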

<font color='Blue'><b>Example</b></font>: In this code example, support vector classifiers with polynomial kernels of degree 1 (a linear boundary) and degree 2 (a quadratic boundary) are used to illustrate decision boundaries on synthetic data. The data are generated with scikit-learn's `make_blobs` function, which creates artificial clustered data sets for machine learning experiments. This particular data set has the following characteristics:

- **Number of Samples:** 1000
- **Number of Features:** 2
- **Number of Classes:** 2
- **Random Seed (random_state):** 0
- **Cluster Standard Deviation (cluster_std):** 1.0

**Features:**
- The dataset contains 1000 data points, each described by a pair of feature values. These features are represented as 'Feature 1' and 'Feature 2'.

**Outcome (Target Variable):**
- The dataset also includes a target variable called 'Outcome.' This variable assigns each data point to one of two distinct classes, identified as 'Class 0' and 'Class 1'.

Each class forms a Gaussian cluster, which makes the data set suitable for practicing and evaluating binary classification algorithms.

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn import svm
from sklearn.inspection import DecisionBoundaryDisplay
from matplotlib.colors import ListedColormap

# Generate synthetic data
X, y = make_blobs(n_samples=1000, centers=2, random_state=0, cluster_std=1.0)

# Define colors and markers for data points
colors, markers = ["#f44336", "#40a347"], ['o', 's']
cmap_ = ListedColormap(colors)
# Titles for the subplots
titles = ['Linear Decision Boundaries', 'Non-Linear Decision Boundaries']

# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(9.5, 5))
axes = axes.ravel()

# Create SVM models and visualize decision boundaries
for ax, deg in zip(axes, [1, 2]):
    # Create SVM model with specific parameters
    clf = svm.SVC(kernel="poly", degree=deg, gamma="auto", C=1).fit(X, y)

    # Scatter plot of data points
    for num in np.unique(y):
        ax.scatter(X[:, 0][y == num], X[:, 1][y == num], c=colors[num],
                   s=40, edgecolors="k", marker=markers[num], label=str(num))

    # Display decision boundaries
    disp = DecisionBoundaryDisplay.from_estimator(clf, X, response_method="predict",
                                                  cmap=cmap_, alpha=0.2, ax=ax,
                                                  xlabel=r'$X_{1}$', ylabel=r'$X_{2}$')

    # Set the title for the subplot
    ax.set_title(f'Classification with {titles.pop(0)}', weight='bold')
    ax.grid(False)
    ax.legend(title='Class', fontsize=12)

# Add a title to the entire figure
plt.suptitle("SVM Poly Classifiers Comparison", fontsize=16, fontweight='bold')
# Adjust layout
plt.tight_layout()

## The Support Vector Machine: Kernelized Enhancement

### Linear Support Vector Classifier

The linear support vector classifier can be expressed as follows [James et al., 2023]:

\begin{align}
f(x) = \beta_0 + \sum_{i\in \mathcal{S}} \alpha_i\langle x, x_i\rangle,
\end{align}

Where:
- $f(x)$ represents the decision function that outputs a classification score for the input feature vector *$x$*.
- $\beta_0$ is the intercept term.
- $\alpha_i$ are the coefficients associated with the support vectors *$x_i$*.
- $\langle x, x_i\rangle$ denotes the inner product between the input vector *$x$* and the support vector *$x_i$*.
- The set $\mathcal{S}$ contains indices corresponding to the support vectors, which are the training samples that lie closest to the decision boundary.
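The inner-product form can be verified against scikit-learn. In scikit-learn, `svc.dual_coef_` stores the products $y_i \alpha_i$ for the support vectors, which play the role of the $\alpha_i$ in the expression above, and `svc.intercept_` stores $\beta_0$. This sketch reuses the eight-point data set from the earlier example; the query points in `X_new` are arbitrary.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2, 1], [3, 3], [4, 3], [5, 4], [6, 5], [7, 5], [8, 6], [9, 7]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
svc = SVC(kernel='linear').fit(X, y)

alphas = svc.dual_coef_[0]   # y_i * alpha_i for each support vector
S = svc.support_vectors_     # the support vectors x_i, i in S
beta_0 = svc.intercept_[0]

X_new = np.array([[4.0, 4.0], [7.0, 6.0]])   # arbitrary query points

# f(x) = beta_0 + sum_{i in S} alpha_i <x, x_i>
f_manual = beta_0 + (X_new @ S.T) @ alphas

print('Manual f(x):      ', f_manual)
print('decision_function:', svc.decision_function(X_new))
```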

---
<font color='Red'><b>Remark:</b></font>


Recall that the inner product of two *r*-vectors *$a$* and *$b$* is formally defined as $\langle a, b \rangle = \sum_{i=1}^{r} a_i b_i$, where the sum extends over *$r$* components [James et al., 2023].

---

### Kernelized Format

More generally, we replace the inner product with a **kernel**, a function $K(x_i, x_{i'})$ that quantifies the similarity between two observations $x_i$ and $x_{i'}$. A kernel is symmetric in its arguments:

\begin{align}
K(x_i, x_{i'}) = K(x_{i'}, x_i).
\end{align}

Replacing each inner product $\langle x, x_i\rangle$ with $K(x, x_i)$ gives the decision function [James et al., 2023]:

\begin{align}
f(x) = \beta_0 + \sum_{i\in \mathcal{S}} \alpha_i K(x, x_i).
\end{align}
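The kernelized decision function can be reproduced by hand in the same way. The sketch below assumes an RBF kernel with `gamma=0.7` fitted to synthetic data from `make_circles` (illustrative choices). It evaluates $K(x, x_i)$ with `sklearn.metrics.pairwise.rbf_kernel` and compares the result with `decision_function`; only the support vectors enter the sum.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)
gamma = 0.7
svc = SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X, y)

X_new = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, -1.0]])   # arbitrary query points

# K(x, x_i) for every query point x and support vector x_i: shape (3, n_support)
K = rbf_kernel(X_new, svc.support_vectors_, gamma=gamma)

# f(x) = beta_0 + sum_{i in S} alpha_i K(x, x_i), with alpha_i taken from dual_coef_
f_manual = svc.intercept_[0] + K @ svc.dual_coef_[0]

print('Manual f(x):      ', f_manual)
print('decision_function:', svc.decision_function(X_new))
```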

Various types of kernels can be harnessed in SVMs to capture different types of relationships between data points. Here are some illustrative examples:

- **Linear Kernel:** $\displaystyle{K(x_i, x_{i'}) = \sum_{j=1}^{p} x_{ij}x_{i'j}}$
- **Polynomial Kernel:** $\displaystyle{K(x_i, x_{i'}) = \left(1+\sum_{j=1}^{p} x_{ij}x_{i'j}\right)^d}$
- **Radial Kernel (RBF):** $\displaystyle{K(x_i, x_{i'}) = \exp\left(-\gamma \sum_{j=1}^{p} (x_{ij}-x_{i'j})^2\right)}$

where the degree $d$ (a positive integer) and $\gamma > 0$ are hyperparameters that control the shape and flexibility of the decision boundary.

Scikit-learn's SVM implementation provides flexibility in choosing the appropriate kernel and tuning hyperparameters to achieve the best classification performance for different types of data. The classifier aims to maximize the margin while ensuring accurate classification and controlled misclassification, guided by the chosen kernel's ability to capture the data's underlying structure.
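The kernel formulas above can be written directly in NumPy and compared with the kernel functions in `sklearn.metrics.pairwise`; the random inputs, $d = 3$, and $\gamma = 0.5$ are arbitrary choices for this sketch. One caveat: scikit-learn's polynomial kernel is $(\gamma \langle x, x' \rangle + r)^d$, where $r$ is the `coef0` parameter. Setting `gamma=1` and `coef0=1` reproduces the formula above, whereas `SVC(kernel='poly')` uses `coef0=0` by default, which gives a homogeneous polynomial kernel.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))   # 4 observations with p = 3 features
B = rng.normal(size=(5, 3))   # 5 further observations

d, gamma = 3, 0.5

# Formulas from the text, written with NumPy
inner = A @ B.T                                               # sum_j x_ij * x_i'j
sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)  # sum_j (x_ij - x_i'j)^2
K_lin = inner
K_poly = (1 + inner) ** d
K_rbf = np.exp(-gamma * sq_dist)

# scikit-learn equivalents (gamma=1, coef0=1 give (1 + <x, x'>)^d)
print(np.allclose(K_lin, linear_kernel(A, B)))
print(np.allclose(K_poly, polynomial_kernel(A, B, degree=d, gamma=1, coef0=1)))
print(np.allclose(K_rbf, rbf_kernel(A, B, gamma=gamma)))
```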

---

<font color='Red'><b>Note:</b></font>


In the notation $x_{i'}$, the prime marks a second index: $x_i$ and $x_{i'}$ denote two (possibly different) observations, and $K(x_i, x_{i'})$ measures their similarity using the chosen kernel function.

The kernel can be applied to any pair of observations in the data set. In the decision function $f(x)$, one argument is the new point $x$, and the other runs over the support vectors $x_i$, $i \in \mathcal{S}$.

---

<font color='Blue'><b>Example:</b></font> In this example, we fit SVM classifiers with different kernels to the [Iris dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set) and use matplotlib to compare their decision boundaries in a 2x2 grid of plots. Only the first two features (sepal length and sepal width) are used, so that the boundaries can be drawn in two dimensions.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm
from sklearn.inspection import DecisionBoundaryDisplay
from matplotlib.colors import ListedColormap

# Load data
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# Create SVM models
models = [svm.SVC(kernel="linear", C=1.0),
          svm.SVC(kernel="rbf", gamma=0.7, C=1.0),
          svm.SVC(kernel="poly", degree=3, gamma="auto", C=1.0),
          svm.SVC(kernel="poly", degree=5, gamma="auto", C=1.0)]

# Feature labels
xlabel, ylabel = [x.title().replace('(Cm)', '(cm)') for x in iris.feature_names[:2]]

# Fit models to data
models = [clf.fit(X, y) for clf in models]

# Titles for the plots
titles = ["SVC with linear kernel",
          "SVC with RBF kernel",
          "SVC with polynomial (degree 3) kernel",
          "SVC with polynomial (degree 5) kernel"]

# Define colors and markers
colors = ["#f44336", "#40a347", '#0086ff']
edge_colors = ["#cc180b", "#16791d", '#11548f']
markers = ['o', 's', 'd']
cmap_ = ListedColormap(colors)

# Create 2x2 grid for plotting
fig, axes = plt.subplots(2, 2, figsize=(9.5, 9))

# Create a dictionary to map target names to numbers
target_names = {0: 'Setosa', 1: 'Versicolor', 2: 'Virginica'}

# Plot decision boundaries
for clf, title, ax in zip(models, titles, axes.flatten()):
    disp = DecisionBoundaryDisplay.from_estimator(clf, X, response_method="predict", cmap=cmap_,
                                                  alpha=0.2, ax=ax, xlabel=xlabel, ylabel=ylabel)

    # Scatter plot of data points with target names
    for num in np.unique(y):
        ax.scatter(X[:, 0][y == num], X[:, 1][y == num], c=colors[num],
                   s=40, ec=edge_colors[num], marker=markers[num], label=target_names[num])

    ax.set_title(title, weight='bold')
    ax.grid(False)
    ax.legend(title = 'Flower Type', fontsize = 12)

# Add a title
plt.suptitle("SVM Classifier Comparison", fontsize=16, fontweight='bold')

# Adjust layout
plt.tight_layout()

## sklearn's SVC and SVR

The scikit-learn (sklearn) library implements Support Vector Machines (SVMs) for both classification and regression tasks [scikit-learn Developers, 2023]. Its SVM module provides two primary classes:

* **Support Vector Classification (SVC):**
  SVC specializes in addressing classification tasks. Its core aim is to identify the optimal hyperplane that effectively segregates data points into distinct classes, with the additional goal of maximizing the margin separating these classes. The definition of this hyperplane is influenced by a subset of data points referred to as support vectors. Furthermore, SVC can adeptly address both linear and non-linear classification challenges through the application of various kernel functions, including linear, polynomial, radial basis function (RBF), and sigmoid kernels [scikit-learn Developers, 2023].

* **Support Vector Regression (SVR):**
  In contrast to SVC, SVR is designed for regression tasks. Instead of a hyperplane that separates classes, SVR fits a function that stays within a tolerance $\epsilon$ of the observed target values wherever possible: errors smaller than $\epsilon$ are ignored (the epsilon-insensitive loss), and larger errors are penalized, with the penalty weighted by `C`. Like SVC, SVR can model non-linear relationships through kernel functions [scikit-learn Developers, 2023].
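The role of the tolerance can be illustrated with scikit-learn's `SVR`, whose `epsilon` parameter sets the width of the tube inside which errors are ignored. This sketch uses a hypothetical noisy sine curve and a helper `epsilon_insensitive_loss` written for illustration. Widening the tube should leave fewer observations on or outside it, and hence fewer support vectors.

```python
import numpy as np
from sklearn.svm import SVR


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Hypothetical helper: zero inside the tube |y - f(x)| <= epsilon, linear outside."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)


# Hypothetical 1-D data: a noisy sine curve
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

n_support = {}
for eps in [0.01, 0.1, 0.3]:
    svr = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    n_support[eps] = len(svr.support_)   # observations on or outside the tube
    mean_loss = epsilon_insensitive_loss(y, svr.predict(X), eps).mean()
    print(f'epsilon = {eps:<4}: support vectors = {n_support[eps]:2d}, '
          f'mean loss = {mean_loss:.4f}')
```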

The choice of kernel and hyperparameters (such as `C`, `gamma`, `degree`, and, for SVR, `epsilon`) depends on the characteristics of the data and the problem at hand, and is usually made by cross-validation. Because the RBF and polynomial kernels depend on distances or inner products between observations, features should generally be standardized before fitting. Careful preprocessing and model evaluation are essential for good performance.
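Feature scaling matters in particular for the RBF kernel, which depends on Euclidean distances between observations: a feature measured in thousands (such as Weight in the Auto MPG data below) can dominate one measured in tens (such as Acceleration). The sketch below uses hypothetical synthetic data with those two scales, not the actual Auto MPG data, and compares the cross-validated $R^2$ of an `SVR` fitted on raw features with one wrapped in a `StandardScaler` pipeline.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data loosely mimicking Auto MPG scales
rng = np.random.default_rng(0)
n = 200
weight = rng.uniform(1500, 5000, n)   # weight-like feature, in the thousands
accel = rng.uniform(8, 25, n)         # acceleration-like feature, in the tens
X = np.column_stack([weight, accel])
y = 4.0 - 0.0004 * weight + 0.08 * accel + rng.normal(scale=0.05, size=n)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
raw = SVR(kernel='rbf')
scaled = make_pipeline(StandardScaler(), SVR(kernel='rbf'))

r2_raw = cross_val_score(raw, X, y, cv=kf, scoring='r2').mean()
r2_scaled = cross_val_score(scaled, X, y, cv=kf, scoring='r2').mean()
print(f'Mean CV R^2 without scaling: {r2_raw:.3f}')
print(f'Mean CV R^2 with scaling:    {r2_scaled:.3f}')
```

Placing the scaler inside the pipeline means that, within each fold, the scaling statistics are computed from the training portion only, so no information from the test fold leaks into the model.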

<font color='Blue'><b>Example:</b></font>
In this example, we focus on the Auto MPG dataset, which is sourced from the [UCI Machine Learning Repository](http://archive.ics.uci.edu/dataset/9/auto+mpg). The aim is to demonstrate the use of Support Vector Regression (SVR) with this dataset.

| **Feature**    | **Description**                                                                                                |
|----------------|----------------------------------------------------------------------------------------------------------------|
| MPG            | Fuel efficiency in miles per gallon. Higher values indicate better fuel efficiency.                        |
| Cylinders      | Number of engine cylinders, indicating engine capacity and power. Common values: 4, 6, and 8 cylinders.    |
| Displacement   | Engine displacement in cubic inches, reflecting engine size. Larger engines typically produce more power.      |
| Horsepower     | Engine horsepower, measuring its ability to perform work. Higher values indicate a more powerful engine. |
| Weight         | Vehicle weight in pounds, influencing fuel efficiency. Lighter vehicles tend to have better MPG. |
| Acceleration   | Time in seconds to accelerate from 0 to 60 mph; lower values mean quicker acceleration. |
| Model Year     | Model year (last two digits, 70 to 82), useful for tracking technology and efficiency trends. |
| Origin         | Region of origin, a categorical variable coded as 1 (USA), 2 (Europe), and 3 (Japan). |
| Car Name       | Name or model of the car, useful for identification and categorization of different car models. |

In [None]:
# Download the zip file using wget
!wget -N "http://archive.ics.uci.edu/static/public/9/auto+mpg.zip"

# Unzip the downloaded zip file
!unzip -o auto+mpg.zip auto-mpg.data

# Remove the downloaded zip file after extraction
!rm -r auto+mpg.zip

In [None]:
import numpy as np
import pandas as pd
# You can download the dataset from: http://archive.ics.uci.edu/static/public/9/auto+mpg.zip

# Define column names based on the dataset's description
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model_Year', 'Origin', 'Car_Name']

# Read the whitespace-separated file, treating '?' as missing values, and drop rows with missing values
# (sep=r'\s+' replaces the deprecated delim_whitespace=True)
auto_mpg_df = pd.read_csv('auto-mpg.data', names=column_names,
                          na_values='?', sep=r'\s+').dropna()

# Remove the 'Car_Name' column from the DataFrame
auto_mpg_df = auto_mpg_df.drop(columns=['Car_Name']).reset_index(drop=True)

# Model the natural logarithm of MPG and rename the column accordingly
auto_mpg_df['MPG'] = np.log(auto_mpg_df['MPG'])
auto_mpg_df = auto_mpg_df.rename(columns={'MPG': 'ln(MPG)'})

# Display the resulting DataFrame
display(auto_mpg_df)
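The feature table describes Origin as categorical, yet the cell above keeps it as the integer codes 1, 2, and 3. An RBF kernel treats these codes as distances, so Japan (3) ends up "further" from the USA (1) than Europe (2) is. One common alternative is one-hot encoding. The sketch below shows it with `pandas.get_dummies` on a small hypothetical frame (the rows are made up for illustration and are not taken from the Auto MPG file).

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the Auto MPG file
toy_df = pd.DataFrame({
    'Weight': [3500.0, 2100.0, 2400.0],
    'Origin': [1, 2, 3],
})

# Map the integer codes to region names (1 = USA, 2 = Europe, 3 = Japan)
toy_df['Origin'] = toy_df['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})

# Replace the single Origin column with one 0/1 indicator column per region
encoded_df = pd.get_dummies(toy_df, columns=['Origin'], prefix='Origin', dtype=int)
print(encoded_df)
```

The same mapping and `get_dummies` call could be applied to `auto_mpg_df` before building `X`; whether this changes the SVR scores for this dataset has not been checked here.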

In [None]:
import numpy as np
from sklearn import metrics
from sklearn.model_selection import KFold
from sklearn import svm

# Prepare the data
X = auto_mpg_df.drop('ln(MPG)', axis=1)  # Features
y = auto_mpg_df['ln(MPG)']  # Target variable

# Initialize KFold cross-validator
n_splits = 5
kf = KFold(n_splits=n_splits, shuffle=True, random_state= 0)

# Lists to store train and test scores for each fold
train_r2_scores, test_r2_scores, train_MSE_scores, test_MSE_scores = [], [], [], []

svr = svm.SVR(kernel='rbf')  # Create a Support Vector Regression model with an RBF kernel

def _Line(n = 80):
    print(n * '_')
    
def print_bold(txt):
    _left = "\033[1;34m"
    _right = "\033[0m"
    print(_left + txt + _right)
    
# Perform Cross-Validation
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]  # Extract training and testing subsets by index
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
    svr.fit(X_train, y_train)  # Train the SVR model
    # train
    y_train_pred = svr.predict(X_train)
    train_r2_scores.append(metrics.r2_score(y_train, y_train_pred))
    train_MSE_scores.append(metrics.mean_squared_error(y_train, y_train_pred))
    # test
    y_test_pred = svr.predict(X_test)
    test_r2_scores.append(metrics.r2_score(y_test, y_test_pred))
    test_MSE_scores.append(metrics.mean_squared_error(y_test, y_test_pred))

_Line()
# Print the train and test scores for each fold
for fold in range(n_splits):
    print_bold(f'Fold {fold + 1}:')
    print(f"\tTrain R-Squared Score = {train_r2_scores[fold]:.4f}, Test R-Squared Score = {test_r2_scores[fold]:.4f}")
    print(f"\tTrain MSE Score = {train_MSE_scores[fold]:.4f}, Test MSE Score = {test_MSE_scores[fold]:.4f}")

_Line()
print_bold('R-Squared Score:')
print(f"\tMean Train R-Squared Score: {np.mean(train_r2_scores):.4f} ± {np.std(train_r2_scores):.4f}")
print(f"\tMean Test R-Squared Score: {np.mean(test_r2_scores):.4f} ± {np.std(test_r2_scores):.4f}")
# The MSE is measured on the ln(MPG) scale, since the target was log-transformed
print_bold('MSE Score (ln(MPG) scale):')
print(f"\tMean Train MSE Score: {np.mean(train_MSE_scores):.4f} ± {np.std(train_MSE_scores):.4f}")
print(f"\tMean Test MSE Score: {np.mean(test_MSE_scores):.4f} ± {np.std(test_MSE_scores):.4f}")
_Line()
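The SVR above is fit on the raw feature values. Because the RBF kernel depends on Euclidean distances between samples, features with large numeric ranges, such as Weight (thousands of pounds), can dominate features such as Cylinders or Origin. A common remedy is to standardize the features inside a scikit-learn `Pipeline`, so the scaler is refit on each training fold and no test-fold statistics leak into training. The sketch below illustrates the effect on synthetic data (an assumption chosen to make the effect easy to see, not the Auto MPG file): two informative features on a unit scale plus one irrelevant feature whose scale is roughly 1000 times larger.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): x1, x2 drive the target, noise_feature does not
rng = np.random.default_rng(0)
n = 300
x1 = rng.uniform(-3, 3, n)
x2 = rng.uniform(-3, 3, n)
noise_feature = rng.normal(0, 1000, n)
X_demo = np.column_stack([x1, x2, noise_feature])
y_demo = np.sin(x1) + 0.5 * x2

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Unscaled: RBF distances are dominated by the large-scale, irrelevant column
unscaled_r2 = cross_val_score(SVR(kernel='rbf'), X_demo, y_demo,
                              cv=cv, scoring='r2').mean()

# Scaled: StandardScaler is fit on each training fold only, then the SVR is trained
scaled_model = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
scaled_r2 = cross_val_score(scaled_model, X_demo, y_demo,
                            cv=cv, scoring='r2').mean()

print(f"Mean test R^2 without scaling: {unscaled_r2:.4f}")
print(f"Mean test R^2 with scaling:    {scaled_r2:.4f}")
```

To apply the same idea to the Auto MPG cell, `make_pipeline(StandardScaler(), svm.SVR(kernel='rbf'))` could stand in for `svr` in the cross-validation loop; the effect on the scores reported there has not been checked here.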

<font color='Blue'><b>Example:</b></font> Here, we have a code example that demonstrates the use of a Support Vector Classifier (SVC) with an RBF kernel on synthetic data, evaluated with stratified 5-fold cross-validation. The synthetic dataset is created using scikit-learn's `make_blobs` function, which generates isotropic Gaussian clusters and is commonly used for testing classification and clustering methods. This particular dataset has the following characteristics:

- **Number of Samples:** 2000
- **Number of Features:** 2
- **Number of Classes:** 4
- **Random Seed (random_state):** 0
- **Cluster Standard Deviation (cluster_std):** 1.0

**Features:**
- Within the dataset, there are 2000 data points, each characterized by a pair of feature values, denoted as 'Feature 1' and 'Feature 2'.

**Target Variable:**
- The dataset also includes a target variable, referred to here as 'Outcome', that assigns each data point to one of four distinct classes, labeled 0, 1, 2, and 3.

In [None]:
import numpy as np
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Per-class fill colors, edge colors, and marker shapes
colors = ["#f5645a", "#b781ea", '#B2FF66', '#0096ff']
edge_colors = ['#8A0002', '#3C1F8B', '#6A993D', '#2e658c']
markers = ['o', 's', 'd', '*']

# Generate synthetic data: 2000 points in 2D drawn from 4 Gaussian clusters
X, y = make_blobs(n_samples=2000, centers=4, random_state=0, cluster_std=1.0)

# Create a scatter plot with Matplotlib, one call per class
fig, ax = plt.subplots(1, 1, figsize=(9.5, 7))

for num in np.unique(y):
    ax.scatter(X[:, 0][y == num], X[:, 1][y == num], c=colors[num],
               s=40, ec=edge_colors[num], marker=markers[num], label=str(num))

ax.grid(True)
ax.legend(title='Class', fontsize=14)

ax.set(xlim=[-6, 6], ylim=[-2, 12])
ax.set_title('Synthetic Dataset', weight='bold', fontsize=16)
plt.tight_layout()
plt.show()
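The scatter plot shows only the raw clusters. To see how an RBF-kernel SVC partitions the plane, and which points it keeps as support vectors, one option is to evaluate a fitted classifier on a dense grid and shade the predicted regions. In the sketch below, fitting on all 2000 points is for visualization only (the cross-validation that follows handles evaluation); the 300 × 300 grid resolution and the names `X_blobs`, `boundary_clf`, and `grid_pred` are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same synthetic data as in the cell above
X_blobs, y_blobs = make_blobs(n_samples=2000, centers=4, random_state=0, cluster_std=1.0)

# Fit an RBF-kernel SVC on all points (visualization only, not an evaluation)
boundary_clf = SVC(kernel='rbf').fit(X_blobs, y_blobs)

# Predict the class of every point on a grid spanning the plotted region
xx, yy = np.meshgrid(np.linspace(-6, 6, 300), np.linspace(-2, 12, 300))
grid_pred = boundary_clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

region_colors = ["#f5645a", "#b781ea", '#B2FF66', '#0096ff']
fig, ax = plt.subplots(figsize=(9.5, 7))

# Shade the predicted regions; the levels put one boundary between consecutive labels
ax.contourf(xx, yy, grid_pred, levels=np.arange(-0.5, 4.5, 1.0),
            colors=region_colors, alpha=0.3)

# Overlay the data points, colored by their true class
for num in np.unique(y_blobs):
    ax.scatter(X_blobs[y_blobs == num, 0], X_blobs[y_blobs == num, 1],
               color=region_colors[num], s=15, label=str(num))

# Circle the support vectors selected by the fitted model
sv = boundary_clf.support_vectors_
ax.scatter(sv[:, 0], sv[:, 1], s=60, facecolors='none', edgecolors='k',
           linewidths=0.8, label='Support vectors')

ax.set(xlim=[-6, 6], ylim=[-2, 12])
ax.set_title('RBF SVC Decision Regions', weight='bold', fontsize=16)
ax.legend(title='Class', fontsize=12)
plt.tight_layout()
plt.show()
```

Points far from any boundary are typically not support vectors, which is one way to see that only the points near the margin determine the decision regions.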

In [None]:
import numpy as np
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def _Line(n = 80):
    print(n * '_')
    
def print_bold(txt):
    _left = "\033[1;31m"
    _right = "\033[0m"
    print(_left + txt + _right)
    
# Initialize StratifiedKFold cross-validator
n_splits = 5
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
# With 5 folds, each split uses about 80% of the data for training and 20% for testing

# Lists to store train and test scores for each fold
train_acc_scores, test_acc_scores, train_f1_scores, test_f1_scores = [], [], [], []
train_class_proportions, test_class_proportions = [], []

cls = SVC(kernel = 'rbf')
    
# Perform Cross-Validation
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), 1):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    cls.fit(X_train, y_train)
    
    # Calculate class proportions for train and test sets
    train_class_proportions.append([float(np.mean(y_train == label)) for label in np.unique(y)])
    test_class_proportions.append([float(np.mean(y_test == label)) for label in np.unique(y)])
    
    # train
    y_train_pred = cls.predict(X_train)
    train_acc_scores.append(metrics.accuracy_score(y_train, y_train_pred))
    train_f1_scores.append(metrics.f1_score(y_train, y_train_pred, average = 'weighted'))
    
    # test
    y_test_pred = cls.predict(X_test)
    test_acc_scores.append(metrics.accuracy_score(y_test, y_test_pred))
    test_f1_scores.append(metrics.f1_score(y_test, y_test_pred, average = 'weighted'))

_Line()
# Print the train and test scores for each fold
for fold in range(n_splits):
    print_bold(f'Fold {fold + 1}:')
    print(f"\tTrain Class Proportions: {np.round(train_class_proportions[fold], 4)}")
    print(f"\tTest Class Proportions: {np.round(test_class_proportions[fold], 4)}")
    print(f"\tTrain Accuracy Score = {train_acc_scores[fold]:.4f}, Test Accuracy Score = {test_acc_scores[fold]:.4f}")
    print(f"\tTrain F1 Score (weighted) = {train_f1_scores[fold]:.4f}, Test F1 Score (weighted) = {test_f1_scores[fold]:.4f}")

_Line()
print_bold('Accuracy Score:')
print(f"\tMean Train Accuracy Score: {np.mean(train_acc_scores):.4f} ± {np.std(train_acc_scores):.4f}")
print(f"\tMean Test Accuracy Score: {np.mean(test_acc_scores):.4f} ± {np.std(test_acc_scores):.4f}")
print_bold('F1 Score (weighted):')
print(f"\tMean Train F1 Score (weighted): {np.mean(train_f1_scores):.4f} ± {np.std(train_f1_scores):.4f}")
print(f"\tMean Test F1 Score (weighted): {np.mean(test_f1_scores):.4f} ± {np.std(test_f1_scores):.4f}")
_Line()
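The cross-validation above uses SVC's default hyperparameters (`C=1.0`, `gamma='scale'`). Since `C` controls the trade-off between margin width and training errors, and `gamma` sets the width of the RBF kernel, one way to choose them is a grid search over the same stratified folds. The sketch below uses scikit-learn's `GridSearchCV`; the candidate values in `param_grid` are arbitrary assumptions spaced roughly on a log scale.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Same synthetic data as above
X_blobs, y_blobs = make_blobs(n_samples=2000, centers=4, random_state=0, cluster_std=1.0)

# Illustrative candidate values (assumption); 4 x 4 = 16 combinations
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 'scale']}

search = GridSearchCV(
    SVC(kernel='rbf'),
    param_grid=param_grid,
    scoring='f1_weighted',  # same metric as the weighted F1 reported above
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X_blobs, y_blobs)

print("Best parameters:", search.best_params_)
print(f"Best mean CV F1 (weighted): {search.best_score_:.4f}")
```

Because the same folds are used both to pick the hyperparameters and to score them, `best_score_` tends to be optimistic; nested cross-validation (an outer split around the whole search) gives a less biased estimate.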