# <font color='#eb3483'> Support Vector Machines (SVM) </font>

Support vector machines were all the rage ten years ago and still remain an awesome machine learning algorithm.

SVMs work by trying to find a hyperplane (a fancy high-dimensional version of a line/plane) in N-dimensional space (where N is the number of features) that separates your data points into different classes while leaving the widest possible margin between them.
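
To make the hyperplane idea concrete, here is a minimal sketch on a tiny made-up 2-D dataset (the points and the names `X_toy`, `y_toy` are assumptions for illustration only). With a linear kernel, the fitted model exposes the hyperplane as a weight vector `coef_` and an offset `intercept_`, and the sign of `w·x + b` decides the class.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-D data (invented for this illustration)
X_toy = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y_toy = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear").fit(X_toy, y_toy)

w = clf.coef_[0]       # normal vector of the separating hyperplane
b = clf.intercept_[0]  # offset of the hyperplane

# Signed score w.x + b for every point; a positive score means class 1
scores = X_toy @ w + b
predicted = (scores > 0).astype(int)

print("w =", w, "b =", b)
print("scores match decision_function:", np.allclose(scores, clf.decision_function(X_toy)))
print("predicted classes:", predicted)
```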

In this module we'll walk through how to train an SVM using scikit-learn and visualize it.

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

### <font color='#eb3483'> Breast Cancer Wisconsin Dataset </font>

In [None]:
# Load the Breast Cancer Wisconsin dataset
breast_cancer = datasets.load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

In [None]:
print(breast_cancer.DESCR)

In [None]:
breast_cancer.target_names

### <font color='#eb3483'> Training an SVM </font>

SVM models in `scikit-learn` are in the module `sklearn.svm`.   

SVM is another algorithm that can be used both for regression (continuous variables) and classification (categories).  

There is an implementation for regression (`SVR`) and another one for classification (`SVC`).
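
This notebook focuses on `SVC`, but for completeness here is a minimal sketch of the regression variant. The noisy sine data and the names `X_reg`, `y_reg` are made up for illustration, and the hyperparameter values are arbitrary choices rather than recommendations.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a noisy sine wave (invented data)
rng = np.random.default_rng(0)
X_reg = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y_reg = np.sin(X_reg).ravel() + rng.normal(scale=0.1, size=80)

# SVR follows the same fit/predict API as SVC, but predicts continuous values
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svr.fit(X_reg, y_reg)

y_hat = svr.predict(X_reg[:5])
r2_train = svr.score(X_reg, y_reg)  # R^2 on the training data

print(y_hat)
print(f"R^2 on the training data: {r2_train:.2f}")
```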

In [None]:
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create an SVM classifier (SVC uses the RBF kernel by default)
svm = SVC()

# Train the model
svm.fit(X_train, y_train)

# Make predictions
y_pred = svm.predict(X_test)
y_pred[:10]

In [None]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Detailed classification report
print(classification_report(y_test, y_pred))

# Confusion matrix
print(confusion_matrix(y_test, y_pred))

Here are some parameters you might find helpful:
- **C** is the cost (regularization) parameter. It controls how heavily margin violations (the slack variables) are penalized: a smaller `C` means stronger regularization, a larger `C` fits the training data more tightly.
- **kernel** specifies the kernel (`rbf`, the radial basis function, by default). We can use any of the available ones (`rbf`, `poly` (polynomial), `linear` or `sigmoid`) or any kernel function we define ourselves.
- **class_weight** accepts a dictionary `{class: weight}` that assigns custom weights to classes. For imbalanced classification problems we can use `class_weight='balanced'`, which automatically weights the classes inversely proportional to their frequency in the training data.
- **decision_function_shape** chooses whether the decision function is returned in one-versus-one (`ovo`) or one-versus-rest (`ovr`) form for multiclass classification.
- **probability** enables class probability estimates, which are needed to use `predict_proba` (`False` by default). Turning it on slows down training because it relies on internal cross-validation.
- **cache_size** is the size (in megabytes) of the kernel cache the model can use to store calculation data in memory. SVMs are computationally intensive, so a larger cache can speed up training on large datasets.
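
As a rough sketch of how these parameters fit together, the cell below sets several of them at once. The specific values (`C=0.5`, `cache_size=500`) are arbitrary picks for illustration, not tuned settings, and the variable names are chosen so they don't overwrite the ones used elsewhere in this notebook.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_all, y_all = datasets.load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.3, random_state=42)

clf = SVC(
    C=0.5,                    # stronger regularization than the default C=1.0
    kernel="rbf",             # the default kernel, written out explicitly
    class_weight="balanced",  # reweight classes by inverse frequency
    probability=True,         # needed for predict_proba
    cache_size=500,           # kernel cache size in MB
    random_state=42,          # makes the probability calibration reproducible
)
clf.fit(X_tr, y_tr)

# One row per sample, one column per class; each row sums to 1
proba = clf.predict_proba(X_te[:3])
print(proba)
```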

## <font color='#eb3483'>  Kernels </font>

Let's see the effect of different kernels (i.e. ways to measure similarity between points) on the decision boundary.

We will use only the first 2 dataset variables to be able to plot them on a scatter plot.

In [None]:
X = breast_cancer.data[:, :2]
y = breast_cancer.target
X.shape

We are going to use a utility function from Mlxtend, `plot_decision_regions`, that plots a diagram indicating the different decision regions for each class. You will need to install Mlxtend (machine learning extensions), a Python library of useful tools for day-to-day data science tasks: `conda install mlxtend --channel conda-forge`

In [None]:
from mlxtend.plotting import plot_decision_regions

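**Linear Kernel**

This kernel is simple and works well when the data is (nearly) linearly separable.

The linear kernel is defined as:

$$\text{Linear Kernel}: k(x,y) = x^Ty+c$$

In scikit-learn's `linear` kernel the constant is $c = 0$, so it is just the dot product $x^Ty$.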

In [None]:
estimator_svm_lineal = SVC(kernel="linear")
estimator_svm_lineal.fit(X, y)

plot_decision_regions(X, y, clf=estimator_svm_lineal);

**Polynomial Kernel**

The polynomial kernel computes the dot product of two vectors in the feature space formed by polynomial combinations of their components. So if we have 2 vectors $V_1$ and $V_2$, each of the form $[x_1, x_2]$, the polynomial kernel implicitly maps them to $[x_1, x_2, x_1^2, x_1x_2, x_2^2, \dots]$ without ever building that expansion explicitly. It has the formula:

$$\text{polynomial kernel}: k(x,y) = (\alpha x^Ty+c)^d$$

The polynomial kernel has a hyperparameter $d$ (`degree` in scikit-learn) that indicates the degree of the polynomial expansion (3 by default). In `SVC`, $\alpha$ corresponds to `gamma` and $c$ to `coef0`.
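
A quick numerical check can make the formula less abstract. The sketch below evaluates the polynomial kernel by hand, compares it with scikit-learn's `polynomial_kernel`, and then shows that for degree 2 the same number comes out of an ordinary dot product in an explicitly expanded feature space. The two vectors and the helper `phi` are made up for illustration; `phi` is only valid for 2-D inputs with `alpha = 1`, `c = 1` and degree 2.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[3.0, 0.5]])
alpha, c, degree = 1.0, 1.0, 2

# Kernel straight from the formula: (alpha * x.z + c) ** degree
k_manual = (alpha * (x @ z.T) + c) ** degree

# Same computation with scikit-learn (gamma plays the role of alpha, coef0 of c)
k_sklearn = polynomial_kernel(x, z, degree=degree, gamma=alpha, coef0=c)

def phi(v):
    """Hypothetical explicit feature map for the degree-2 case above."""
    x1, x2 = v
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1**2, s * x1 * x2, x2**2])

# Plain dot product in the expanded 6-dimensional space
k_feature_space = phi(x[0]) @ phi(z[0])

print(k_manual[0, 0], k_sklearn[0, 0], k_feature_space)
```

All three values agree, which is the point of the kernel trick: the SVM gets the benefit of the expanded space while only ever computing the cheap expression on the left.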

In [None]:
estimator_svm_polinomial = SVC(kernel="poly")
estimator_svm_polinomial.fit(X, y)

plot_decision_regions(X, y, clf=estimator_svm_polinomial);

We see that the decision boundary became a polynomial curve. The higher the degree of the expansion, the more "curved" the boundary can be. If we use a polynomial kernel with `degree=1` we get a linear kernel.

In [None]:
estimator_svm_polinomial_1 = SVC(kernel="poly", degree=1).fit(X, y)
plot_decision_regions(X, y, clf=estimator_svm_polinomial_1);

In [None]:
estimator_svm_polinomial_2 = SVC(kernel="poly", degree=2).fit(X, y)
plot_decision_regions(X, y, clf=estimator_svm_polinomial_2);

In [None]:
estimator_svm_polinomial_3 = SVC(kernel="poly", degree=3, gamma=0.1).fit(X, y)
plot_decision_regions(X, y, clf=estimator_svm_polinomial_3);

A low `d` reduces the complexity of the polynomial kernel; with `degree=1` it behaves like a linear kernel.
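
We can also verify the degree-1 case numerically. This is a small sketch on random made-up points (`X_demo` is an assumption for illustration). Note that `SVC(kernel="poly", degree=1)` uses its default `gamma`, so there the kernel is a rescaled dot product rather than an identical one; the boundary is still a straight line.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel

# A handful of random 2-D points (invented data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 2))

# Polynomial kernel with degree 1, gamma 1 and coef0 0 is (x.y)^1
K_poly1 = polynomial_kernel(X_demo, degree=1, gamma=1.0, coef0=0.0)
K_linear = linear_kernel(X_demo)

same_kernel = np.allclose(K_poly1, K_linear)
print("degree-1 polynomial kernel equals linear kernel:", same_kernel)
```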

**Radial Basis Function (RBF) Kernel**

This kernel implicitly maps the data to a higher-dimensional space and is effective for data that is not linearly separable.

The RBF kernel measures similarity from the distance between two points: it equals 1 when the points coincide and decays toward 0 as they move apart. It has the formulation:

$$\text{radial kernel}: k(x,y) = \exp(-\gamma ||x - y||^2)$$
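
To see the RBF kernel in numbers, the sketch below computes $\exp(-\gamma \cdot \text{squared Euclidean distance})$ by hand for two made-up points and compares it with scikit-learn's `rbf_kernel`. The points and the `gamma` value are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[2.0, 4.0]])
gamma = 0.1

# Squared Euclidean distance between the two points: 1^2 + 2^2 = 5
sq_dist = np.sum((x - z) ** 2)

# Kernel straight from the formula
k_manual = np.exp(-gamma * sq_dist)

# Same computation with scikit-learn
k_sklearn = rbf_kernel(x, z, gamma=gamma)[0, 0]

print(k_manual, k_sklearn)
```

A larger `gamma` makes the value drop off faster with distance, so each point only "sees" its closest neighbours.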

In [None]:
estimator_svm_rbf = SVC(kernel="rbf")
estimator_svm_rbf.fit(X, y)

plot_decision_regions(X, y, clf=estimator_svm_rbf);

We can control the shape of the decision boundary with the hyperparameter `gamma`:

In [None]:
estimator_svm_rbf_a = SVC(kernel="rbf", gamma=0.1).fit(X, y)
plot_decision_regions(X, y, clf=estimator_svm_rbf_a);

In [None]:
estimator_svm_rbf_b = SVC(kernel="rbf", gamma=7).fit(X, y)
plot_decision_regions(X, y, clf=estimator_svm_rbf_b);

In [None]:
estimator_svm_rbf_c = SVC(kernel="rbf", gamma=100).fit(X, y)
plot_decision_regions(X, y, clf=estimator_svm_rbf_c);

Higher `gamma` values make each training point's influence more local, which increases the RBF kernel's capability to create tight areas around the data.
We see that for `gamma=100` the model is overfitting (basically creating tiny circles around each observation).
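
The plots suggest overfitting, and one way to check it is to compare accuracy on data the model has seen with accuracy on data it has not. The sketch below does this for the same three `gamma` values on the first 2 features; the train/test split and the variable names are assumptions added here, since the plots above were fit on all the data.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = datasets.load_breast_cancer()
X2 = data.data[:, :2]  # same first 2 features used for the plots
y2 = data.target
X2_tr, X2_te, y2_tr, y2_te = train_test_split(X2, y2, test_size=0.3, random_state=42)

scores = {}
for gamma in (0.1, 7, 100):
    model = SVC(kernel="rbf", gamma=gamma).fit(X2_tr, y2_tr)
    # Store (train accuracy, test accuracy) for this gamma
    scores[gamma] = (model.score(X2_tr, y2_tr), model.score(X2_te, y2_te))
    print(f"gamma={gamma}: train={scores[gamma][0]:.2f}, test={scores[gamma][1]:.2f}")
```

A large gap between the train and test accuracy for a given `gamma` is the usual sign that the model is memorizing the training points.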

## <font color='#eb3483'>  Evaluate SVM with different kernels </font>

In [None]:
# Load the Breast Cancer Wisconsin dataset
breast_cancer = datasets.load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Function to train and evaluate SVM with different kernels
def evaluate_svm(kernel_type, **kwargs):
    # Create an SVM classifier with the specified kernel
    svm = SVC(kernel=kernel_type, **kwargs)

    # Train the model
    svm.fit(X_train, y_train)

    # Make predictions
    y_pred = svm.predict(X_test)

    # Evaluate the model
    accuracy = accuracy_score(y_test, y_pred)
    print(f'Kernel: {kernel_type}')
    print(f'Accuracy: {accuracy:.2f}')
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
    print("\n")

# Evaluate SVM with different kernels
evaluate_svm('linear')
evaluate_svm('poly', degree=3)
evaluate_svm('rbf', gamma=0.1)