### Project description

The aim of this project is to create an SVC (Support Vector Classifier) and configure it with different kernels to correctly separate the moons dataset.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.svm import SVC

#### Make data

In the first step, a synthetic moons dataset with 200 samples is generated with make_moons. The dataset consists of two interleaving half-moons, which makes it challenging for linear classifiers.

In [None]:
# Generate moon dataset
X, y = make_moons(n_samples=200, noise=0.1, random_state=42)
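As a quick sanity check on the generated data, the sketch below inspects the array shapes and the class balance. It assumes the same make_moons arguments as the cell above; with an even n_samples, make_moons splits the samples equally between the two classes.

```python
import numpy as np
from sklearn.datasets import make_moons

# Same arguments as the notebook cell above
X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

print(X.shape)         # (200, 2): two features per sample
print(y.shape)         # (200,): one label per sample
print(np.bincount(y))  # [100 100]: samples per class
```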

The SVCGenerator has methods to fit the model, make predictions, and plot support vectors to experiment with different kernels and parameters.

In [None]:
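class SVCGenerator:
    """Thin wrapper around sklearn's SVC for experimenting with kernels and parameters."""

    def __init__(self, kernel='rbf', C=1.0, gamma='scale'):
        self.kernel = kernel
        self.C = C
        self.gamma = gamma  # ignored by the linear kernel
        self.model = None   # set by fit()

    def fit(self, X, y):
        # Build a fresh SVC from the stored settings and train it
        self.model = SVC(kernel=self.kernel, C=self.C, gamma=self.gamma)
        self.model.fit(X, y)

    def predict(self, X):
        # Requires fit() to have been called first
        return self.model.predict(X)

    def plot_support_vectors(self, X, y):
        # Indices of the training samples that became support vectors
        sv_indices = self.model.support_
        # All samples, colored by class
        plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
        # Support vectors, circled in black
        plt.scatter(X[sv_indices, 0], X[sv_indices, 1], facecolors='none', edgecolors='k', s=100)
        # Pad the axes so points at the edge are not clipped
        plt.xlim(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5)
        plt.ylim(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5)
        plt.xlabel('X1')
        plt.ylabel('X2')
        plt.title('Support Vectors')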

A separate instance is created for each kernel: Radial Basis Function (RBF), linear, and polynomial.

In [None]:
# Create one instance per kernel configuration
svc_rbf = SVCGenerator(kernel='rbf', C=1.0, gamma='scale')  # RBF kernel
svc_linear = SVCGenerator(kernel='linear', C=1.0)  # linear kernel (gamma is not used)
svc_poly = SVCGenerator(kernel='poly', C=1.0, gamma=3.0)  # polynomial kernel (SVC's default degree is 3)

#### Fitting models
The aim of this step is to find, for each kernel, the decision boundary that best separates the two classes while maximizing the margin between them. The training points that lie on or inside the margin are the support vectors.

In [None]:
svc_rbf.fit(X, y)
svc_linear.fit(X, y)
svc_poly.fit(X, y)
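To put a number on how well each fit separates the classes, one option is to compare training accuracy. The sketch below is an illustration, not part of the original workflow: it uses SVC directly with the same kernel settings as the three instances above, and the names configs and train_accuracy are made up for this example. Training accuracy is optimistic, so a held-out split would give a fairer comparison.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

# Same kernel settings as the three SVCGenerator instances (hypothetical dict name)
configs = {
    "rbf": dict(kernel="rbf", C=1.0, gamma="scale"),
    "linear": dict(kernel="linear", C=1.0),
    "poly": dict(kernel="poly", C=1.0, gamma=3.0),
}

train_accuracy = {}
for name, params in configs.items():
    model = SVC(**params).fit(X, y)
    # score() returns the mean accuracy on the data passed in (here the training set)
    train_accuracy[name] = model.score(X, y)
    print(f"{name}: training accuracy = {train_accuracy[name]:.3f}")
```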

#### Visualizing the models

A mesh grid is created over the feature space to visualize the decision boundaries and regions. This grid makes it possible to see how well each classifier separates the data.

In the following code, the grid is built with np.meshgrid, which generates points spaced h = 0.02 apart along each axis. This step size is a reasonable balance between detail and speed. A smaller h places the points closer together and gives a more detailed picture, but the models must classify more points, so it takes longer. A larger h spaces the points farther apart, which is faster but gives a coarser picture.

In [None]:
# Create a mesh grid for visualization
h = 0.02
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
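The trade-off behind the step size h can be made concrete by counting grid points. The sketch below uses hypothetical bounds, roughly the extent of the moons data plus padding, rather than the notebook's actual x_min and x_max. Because the grid is two-dimensional, halving h roughly quadruples the number of points to classify.

```python
import numpy as np

# Hypothetical plot bounds (assumed for illustration, not taken from the data)
x_min, x_max = -1.5, 2.5
y_min, y_max = -1.0, 1.5

grid_sizes = {}
for h in (0.1, 0.05, 0.02, 0.01):
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    grid_sizes[h] = xx.size  # number of points each model must classify
    print(f"h={h}: grid shape {xx.shape}, {xx.size} points")
```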

#### Predictions
In this step, the trained models are used to make predictions on the mesh grid points.

In [None]:
# Get predictions for each model on the mesh grid
grid_points = np.c_[xx.ravel(), yy.ravel()]  # one (x, y) row per grid point
Z_rbf = svc_rbf.predict(grid_points)
Z_linear = svc_linear.predict(grid_points)
Z_poly = svc_poly.predict(grid_points)

#### Reshaping

The predictions are reshaped to match the shape of the mesh grid.

In [None]:
Z_rbf = Z_rbf.reshape(xx.shape)
Z_linear = Z_linear.reshape(xx.shape)
Z_poly = Z_poly.reshape(xx.shape)
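The flatten, predict, reshape round trip is easier to follow on a tiny grid. The sketch below is a toy illustration: it replaces the model's predict call with a stand-in (the x coordinate of each point, named fake_predictions here) so that the result can be checked by eye against xx.

```python
import numpy as np

# A tiny 2 x 3 grid so the intermediate arrays are easy to read
xx, yy = np.meshgrid(np.array([0.0, 1.0, 2.0]), np.array([10.0, 20.0]))

# Flatten both coordinate arrays and pair them up: one (x, y) row per grid point
points = np.c_[xx.ravel(), yy.ravel()]
print(points.shape)  # (6, 2)

# Stand-in for model.predict: one value per grid point (here just the x coordinate)
fake_predictions = points[:, 0]

# Reshaping restores the grid layout, so each value lines up with its (xx, yy) cell
Z = fake_predictions.reshape(xx.shape)
print(Z.shape)  # (2, 3)
print(np.array_equal(Z, xx))  # True: the order of the points was preserved
```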

#### Visualization:

To visualize the results for each kernel configuration, two methods are used:

plt.contourf: plots the decision regions as filled contours.

plot_support_vectors: plots the data points and circles the support vectors.

In [None]:
# Plot the moon dataset and decision boundaries with support vectors
plt.figure(figsize=(12, 4))
plt.subplot(131)
plt.contourf(xx, yy, Z_rbf, cmap=plt.cm.Paired, alpha=0.8)
svc_rbf.plot_support_vectors(X, y)
plt.title('SVC with RBF Kernel')

plt.subplot(132)
plt.contourf(xx, yy, Z_linear, cmap=plt.cm.Paired, alpha=0.8)
svc_linear.plot_support_vectors(X, y)
plt.title('SVC with Linear Kernel')

plt.subplot(133)
plt.contourf(xx, yy, Z_poly, cmap=plt.cm.Paired, alpha=0.8)
svc_poly.plot_support_vectors(X, y)
plt.title('SVC with Polynomial Kernel')

plt.tight_layout()
plt.show()

The plots show the moons dataset, the decision boundaries, and the support vectors for each kernel configuration. This makes it possible to compare how well each kernel separates the moon-shaped clusters.

RBF Kernel: The RBF kernel is designed for non-linear problems, so its decision boundary captures the non-linear pattern of the moons dataset. The support vectors concentrate in the areas where the two classes come closest or overlap.

Linear Kernel: The linear kernel is intended for linearly separable datasets. As the figure shows, the decision boundary is a straight line, and most of the support vectors lie near it. Because the two moons interleave, a straight line cannot separate them completely.

Polynomial Kernel: The polynomial kernel captures moderately non-linear relationships. As shown in the figure, it behaves like the RBF kernel but in a more moderate, smoother way. The decision boundary is curved to accommodate the moon shapes.
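The visual impression of where the support vectors sit can be complemented by counting them. The sketch below is an illustration under the same kernel settings as above, using SVC directly; the names configs and sv_counts are made up for this example. The counts describe how many training points define each boundary and are not a measure of accuracy.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

# Same kernel settings as the three SVCGenerator instances (hypothetical dict name)
configs = {
    "rbf": dict(kernel="rbf", C=1.0, gamma="scale"),
    "linear": dict(kernel="linear", C=1.0),
    "poly": dict(kernel="poly", C=1.0, gamma=3.0),
}

sv_counts = {}
for name, params in configs.items():
    model = SVC(**params).fit(X, y)
    # n_support_ holds the number of support vectors for each class
    sv_counts[name] = model.n_support_
    print(f"{name}: {model.n_support_.sum()} support vectors, per class {model.n_support_}")
```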