### Support Vector Machines

> A support vector machine (SVM) is a powerful and versatile machine learning model, capable of performing linear or nonlinear classification, regression, and even novelty detection.
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition - Chapter 5

> Does not scale very well to very large datasets.
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition - Chapter 5

> The decision boundary of an SVM classifier [is a] line [that] not only separates the two classes but also stays as far away from the closest training instances as possible.
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition - Chapter 5

> The best hyperplane, also known as the “hard margin,” is the one that maximizes the distance between the hyperplane and the nearest data points from both classes.
> https://www.geeksforgeeks.org/support-vector-machine-algorithm/

> Adding more training instances “off the street” will not affect the decision boundary at all: it is fully determined (or “supported”) by the instances located on the edge of the street. These instances are called the support vectors.
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition - Chapter 5
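A minimal sketch of the "off the street" claim, assuming the iris petal features, only the Iris setosa and Iris versicolor classes (linearly separable on these two features), and `SVC` with a linear kernel and a large `C` to approximate a hard margin. The two extra instances are hypothetical points placed deep on each class's side. The features are deliberately left unscaled, because adding points would otherwise shift a `StandardScaler`'s mean and variance and move the boundary for reasons unrelated to the support vectors.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = iris.target.values

# Keep only Iris setosa (0) and Iris versicolor (1)
mask = (y == 0) | (y == 1)
X, y = X[mask], y[mask]

# A large C approximates hard margin classification on this separable subset
svm_before = SVC(kernel="linear", C=100).fit(X, y)

# Hypothetical extra instances, each far "off the street" on its own class's side
X_extra = np.array([[1.0, 0.1], [5.0, 1.7]])
y_extra = np.array([0, 1])
svm_after = SVC(kernel="linear", C=100).fit(
    np.vstack([X, X_extra]), np.concatenate([y, y_extra])
)

print("support vectors:\n", svm_before.support_vectors_)
print("boundary before:", svm_before.coef_, svm_before.intercept_)
print("boundary after: ", svm_after.coef_, svm_after.intercept_)
```

The weights and intercept should agree up to the solver's stopping tolerance, since the new points never become support vectors.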

### Soft and Hard Margin Classification
> If we strictly impose that all instances must be off the street and on the correct side, this is called hard margin classification. There are two main issues with hard margin classification. First, it only works if the data is linearly separable. Second, it is sensitive to outliers.
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition - Chapter 5

> The objective is to find a good balance between keeping the street as large as possible and limiting the margin violations (i.e., instances that end up in the middle of the street or even on the wrong side). This is called soft margin classification.
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition - Chapter 5
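For reference, the textbook soft margin linear SVM objective can be written as below. The notation is assumed here rather than taken from the quotes above: weight vector $\mathbf{w}$, bias $b$, labels $t^{(i)} \in \{-1, +1\}$, slack variables $\zeta^{(i)}$ measuring each instance's margin violation, and $m$ training instances. Forcing every $\zeta^{(i)} = 0$ gives hard margin classification, and minimizing $\frac{1}{2}\lVert\mathbf{w}\rVert^2$ is equivalent to maximizing the street width $2 / \lVert\mathbf{w}\rVert$. Scikit-Learn's `LinearSVC` uses a squared hinge variant by default, so treat this as the standard form rather than exactly what that class optimizes.

```latex
\begin{aligned}
\min_{\mathbf{w},\, b,\, \boldsymbol{\zeta}} \quad & \frac{1}{2}\,\mathbf{w}^\top \mathbf{w} + C \sum_{i=1}^{m} \zeta^{(i)} \\
\text{subject to} \quad & t^{(i)}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b\right) \ge 1 - \zeta^{(i)},
\quad \zeta^{(i)} \ge 0, \quad i = 1, \dots, m
\end{aligned}
```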

> Regularization hyperparameters, such as the regularization parameter in linear regression or the dropout rate in neural networks, control the model's complexity. Higher values of these hyperparameters penalize complex models, helping to prevent overfitting.
> https://encord.com/glossary/hyper-parameters-definition/#:~:text=Regularization%20hyperparameters%2C%20such%20as%20the,models%2C%20helping%20to%20prevent%20overfitting.

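> If you set it [the regularization hyperparameter C] to a low value, then you end up with the model on the left of Figure 5-4. With a high value, you get the model on the right. As you can see, reducing C makes the street larger, but it also leads to more margin violations.
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition - Chapter 5

C works in the opposite direction from the regularization hyperparameters described in the previous quote: a smaller C means stronger regularization.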
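A minimal sketch of the effect of C, assuming the same iris petal features and Iris virginica target used below, but swapping in `SVC(kernel="linear")` instead of `LinearSVC` because `SVC` exposes its support vectors. The street width is computed as 2/‖w‖ in the standardized feature space; the two C values are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = (iris.target == 2).values  # Iris virginica vs. the rest

results = {}
for C in (0.01, 100):
    clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C)).fit(X, y)
    svc = clf[-1]  # the fitted SVC step of the pipeline
    street_width = 2 / np.linalg.norm(svc.coef_[0])
    n_support = svc.n_support_.sum()
    results[C] = (street_width, n_support)
    print(f"C={C}: street width={street_width:.3f}, support vectors={n_support}")
```

The low-C model should show a much wider street and many more support vectors, because many more instances end up on or inside the street.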

```python
# The following Scikit-Learn code loads the iris dataset and trains a linear SVM classifier to detect Iris virginica flowers.
# The pipeline first scales the features, then uses a LinearSVC with C=1:

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Load the dataset; with as_frame=True, iris.data is a pandas DataFrame (a two-dimensional data structure)
iris = load_iris(as_frame=True)
print(iris)

# X is a two-dimensional NumPy array: one row per flower, holding its
# petal length (cm) and petal width (cm)
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
# print(X)
# print(X[0][0])
# print(X[0][1])

# y is True for Iris virginica (target class 2) and False otherwise
y = (iris.target == 2)

# Create the classifier and train it on the extracted features
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, random_state=42))
svm_clf.fit(X, y)
# make_pipeline: Construct a pipeline
# Pipeline allows you to sequentially apply a list of transformers to preprocess the data and,
# if desired, conclude the sequence with a final predictor for predictive modeling.
# https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html
# https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline

# StandardScaler(): Standardize features by removing the mean and scaling to unit variance
# https://scikit-learn.org/1.6/modules/generated/sklearn.preprocessing.StandardScaler.html

# LinearSVC(C=1, random_state=42): Linear Support Vector Classification
# https://scikit-learn.org/1.6/modules/generated/sklearn.svm.LinearSVC.html

# Use the model to make predictions for two new flowers
X_new = [[5.5, 1.7], [5.0, 1.5]]
print(svm_clf.predict(X_new))

# Display each instance's decision score: a positive score means Iris virginica,
# and the magnitude is proportional to the signed distance from the decision boundary
print(svm_clf.decision_function(X_new))
```

```text
{'data':      sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                  5.1               3.5                1.4               0.2
1                  4.9               3.0                1.4               0.2
2                  4.7               3.2                1.3               0.2
3                  4.6               3.1                1.5               0.2
4                  5.0               3.6                1.4               0.2
..                 ...               ...                ...               ...
145                6.7               3.0                5.2               2.3
146                6.3               2.5                5.0               1.9
147                6.5               3.0                5.2               2.0
148                6.2               3.4                5.4               2.3
149                5.9               3.0                5.1               1.8

[150 rows x 4 columns], 'target': 0      0
1      0
2 
```
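The values from `decision_function()` are scores rather than raw distances. A minimal sketch, assuming the same iris pipeline as above, that recomputes the score by hand as w·x + b on the standardized inputs and then divides by ‖w‖ to get the geometric signed distance in the scaled feature space:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = (iris.target == 2).values

svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, random_state=42)).fit(X, y)

X_new = np.array([[5.5, 1.7], [5.0, 1.5]])
scores = svm_clf.decision_function(X_new)

# Recompute the score manually: scale the inputs, then apply w . x + b
scaler, linear_svc = svm_clf[0], svm_clf[-1]
w, b = linear_svc.coef_[0], linear_svc.intercept_[0]
manual_scores = scaler.transform(X_new) @ w + b

# Signed geometric distance to the boundary (in the standardized feature space)
distances = manual_scores / np.linalg.norm(w)

print("scores:   ", scores)
print("manual:   ", manual_scores)
print("distances:", distances)
print("predictions:", svm_clf.predict(X_new))
```

The predicted class is simply the sign of the score, which is why the book describes it as a signed distance.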

### Nonlinear SVM Classification

> Although linear SVM classifiers are efficient and often work surprisingly well, many datasets are not even close to being linearly separable. One approach to handling nonlinear datasets is to add more features, such as polynomial features (as we did in Chapter 4); in some cases this can result in a linearly separable dataset.
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition - Chapter 5
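A minimal sketch of that idea on hypothetical one-dimensional toy data (not from the book): class 1 sits at both ends and class 0 in the middle, so no single threshold on x1 separates them, but adding the feature x2 = x1² makes a straight line in the (x1, x2) plane sufficient. The `train_accuracy` helper is a name introduced here for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical 1D data: x1 = -4, -3, ..., 4
x1 = np.arange(-4, 5, dtype=float).reshape(-1, 1)
y = np.array([1, 1, 0, 0, 0, 0, 0, 1, 1])

def train_accuracy(X):
    """Fit a scaled linear SVM and return its accuracy on the training data."""
    clf = make_pipeline(StandardScaler(), LinearSVC(C=10, max_iter=10_000, random_state=42))
    return clf.fit(X, y).score(X, y)

acc_1d = train_accuracy(x1)                   # original feature only
acc_2d = train_accuracy(np.c_[x1, x1 ** 2])   # x1 plus the added feature x1**2
print(acc_1d, acc_2d)
```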

To implement this idea using Scikit-Learn, you can create a pipeline containing a PolynomialFeatures transformer (discussed in “Polynomial Regression”), followed by a StandardScaler and a LinearSVC classifier.

```python
# The following Scikit-Learn code generates a moons dataset and trains a polynomial SVM classifier.
# The pipeline first adds polynomial features, then scales the dataset, and finally applies
# a linear SVM

# Construct a Pipeline from the given estimators.
# https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html
from sklearn.pipeline import make_pipeline

# Generate a new feature matrix consisting of all polynomial combinations of
# the features with degree less than or equal to the specified degree.
# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
from sklearn.preprocessing import PolynomialFeatures

# Standardize features by removing the mean and scaling to unit variance
# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
from sklearn.preprocessing import StandardScaler

# Linear Support Vector Classification
# https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
from sklearn.svm import LinearSVC

# Make two interleaving half circles.
# A simple toy dataset to visualize clustering and classification algorithms.
# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html
from sklearn.datasets import make_moons

# n_samples - the total number of points generated
# noise - standard deviation of the Gaussian noise added to the data
# random_state - determines random number generation for dataset shuffling and noise
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# Create the classifier and train it on the moons dataset
# C - regularization parameter: "The regularization parameter in a Support Vector
#     Machine (SVM) is C, which controls how much to penalize misclassified data." (Google AI)
# max_iter - the maximum number of iterations to be run
# random_state - controls the pseudo-random number generation for shuffling the data
# https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
polynomial_svm_clf = make_pipeline(
    PolynomialFeatures(degree=3),
    StandardScaler(),
    LinearSVC(C=10, max_iter=10_000, random_state=42)
)
polynomial_svm_clf.fit(X, y)
```

```text
Pipeline(steps=[('polynomialfeatures', PolynomialFeatures(degree=3)),
                ('standardscaler', StandardScaler()),
                ('linearsvc',
                 LinearSVC(C=10, max_iter=10000, random_state=42))])
```
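To check that the added features actually help, a minimal sketch, assuming the same moons data and hyperparameters as above, compares the polynomial pipeline with a baseline that drops the `PolynomialFeatures` step. It reports training accuracy only, so it says nothing about generalization; `cross_val_score` would be the next step for that.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# Baseline: a linear SVM on the two original features only
linear_svm_clf = make_pipeline(
    StandardScaler(),
    LinearSVC(C=10, max_iter=10_000, random_state=42),
).fit(X, y)

# Same pipeline as above: degree-3 polynomial features, scaling, linear SVM
polynomial_svm_clf = make_pipeline(
    PolynomialFeatures(degree=3),
    StandardScaler(),
    LinearSVC(C=10, max_iter=10_000, random_state=42),
).fit(X, y)

linear_acc = linear_svm_clf.score(X, y)
poly_acc = polynomial_svm_clf.score(X, y)
print(f"training accuracy: linear={linear_acc:.2f}, polynomial features={poly_acc:.2f}")
```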


### Polynomial Kernel

> Adding polynomial features is simple to implement and can work great with all sorts of machine learning algorithms (not just SVMs). That said, at a low polynomial degree this method cannot deal with very complex datasets, and with a high polynomial degree it creates a huge number of features, making the model too slow.
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition - Chapter 5
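To put a number on "a huge number of features", a minimal sketch that asks `PolynomialFeatures` how many output columns it would produce and checks the count against the closed form C(n + d, d) for n input features and degree d (the count includes the bias column, which `PolynomialFeatures` adds by default). The input data is random placeholder data; only its number of columns matters.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)

counts = {}
for n_features in (2, 10):
    X_demo = rng.random((5, n_features))  # placeholder data; only the shape matters
    for degree in (2, 3, 10):
        poly = PolynomialFeatures(degree=degree).fit(X_demo)
        counts[(n_features, degree)] = poly.n_output_features_
        print(f"{n_features} features, degree {degree}: {poly.n_output_features_} features")
```

With just 10 input features, degree 10 already yields 184,756 columns, which is why the explicit approach becomes too slow.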

> The kernel trick makes it possible to get the same result as if you had added many polynomial features, even with a very high degree, without actually having to add them.
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition - Chapter 5
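A minimal numeric check of the kernel trick for a second-degree polynomial on 2D vectors: the explicit feature map φ(x) = (x1², √2·x1·x2, x2²) gives the same dot product as simply squaring the original dot product. The vectors `a` and `b` are arbitrary; `polynomial_kernel` from `sklearn.metrics.pairwise` computes (gamma·⟨a, b⟩ + coef0)^degree, so gamma=1 and coef0=0 reproduce the plain squared dot product.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(x):
    """Explicit second-degree polynomial feature map for a 2D vector."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

a = np.array([2.0, 3.0])
b = np.array([-1.0, 4.0])

explicit = phi(a) @ phi(b)   # dot product in the transformed 3D space
kernel = (a @ b) ** 2        # same value, computed in the original 2D space
via_sklearn = polynomial_kernel(a.reshape(1, -1), b.reshape(1, -1),
                                degree=2, gamma=1, coef0=0)[0, 0]
print(explicit, kernel, via_sklearn)
```

The SVM only ever needs these dot products, so it can work with the kernel value directly and never build φ(x).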

> Although there are some obstacles to understanding the kernel trick, it is highly important to understand how kernels are used in support vector classification. For practical reasons, it is important to understand because implementing support vector classifiers requires specifying a kernel function, and there are not established, general rules to know what kernel will work best for your particular data.
> https://medium.com/towards-data-science/the-kernel-trick-c98cdbcaeb3f

```python
# The following code demonstrates training an SVM classifier
# using a third-degree polynomial kernel

# Construct a Pipeline from the given estimators.
# https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html
from sklearn.pipeline import make_pipeline

# Standardize features by removing the mean and scaling to unit variance
# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
from sklearn.preprocessing import StandardScaler

# C-Support Vector Classification
# https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
from sklearn.svm import SVC

# Make two interleaving half circles.
# A simple toy dataset to visualize clustering and classification algorithms.
# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html
from sklearn.datasets import make_moons

# n_samples - the total number of points generated
# noise - standard deviation of the Gaussian noise added to the data
# random_state - determines random number generation for dataset shuffling and noise
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# Create the classifier and train it on the moons dataset
# kernel="poly", degree=3 - use a third-degree polynomial kernel
# coef0 - controls how much the model is influenced by high-degree terms versus low-degree terms
# C - regularization parameter
poly_kernel_svm_clf = make_pipeline(StandardScaler(),
                                    SVC(kernel="poly", degree=3, coef0=1, C=5))
poly_kernel_svm_clf.fit(X, y)
```
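A minimal sketch, assuming the same moons data and hyperparameters as above, showing that the kernelized model never materializes the polynomial features: its stored support vectors still have only the 2 original (scaled) columns, whereas an explicit degree-3 `PolynomialFeatures` transform would produce 10.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

poly_kernel_svm_clf = make_pipeline(StandardScaler(),
                                    SVC(kernel="poly", degree=3, coef0=1, C=5))
poly_kernel_svm_clf.fit(X, y)

svc = poly_kernel_svm_clf[-1]  # the fitted SVC step
n_explicit = PolynomialFeatures(degree=3).fit(X).n_output_features_

print("columns per stored support vector:", svc.support_vectors_.shape[1])
print("columns with explicit degree-3 features:", n_explicit)
print("number of support vectors:", svc.n_support_.sum(), "of", len(X))
print("training accuracy:", poly_kernel_svm_clf.score(X, y))
```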

```python
# The following code trains a binary classifier that detects wines from one
# cultivar (target class 2) based on two features from their chemical analysis

# Evaluate a score by cross-validation.
# https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html
from sklearn.model_selection import cross_val_score

# Construct a Pipeline from the given estimators.
# https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html
from sklearn.pipeline import make_pipeline

# Standardize features by removing the mean and scaling to unit variance
# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
from sklearn.preprocessing import StandardScaler

# Linear Support Vector Classification
# https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
from sklearn.svm import LinearSVC

# Load and return the wine dataset (classification).
# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html
from sklearn.datasets import load_wine

# Load the dataset; with as_frame=True, wines.data is a pandas DataFrame (a two-dimensional data structure)
wines = load_wine(as_frame=True)
print(wines.data)
print(wines.DESCR)

# X is a two-dimensional NumPy array: one row per wine, holding its
# flavanoids and proline values
X = wines.data[["flavanoids", "proline"]].values
# print(X)
# print(X[0][0])
# print(X[0][1])

# y is True for wines of cultivar class 2 and False otherwise
y = (wines.target == 2)

# Create the classifier and train it on the extracted features
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, random_state=42))
svm_clf.fit(X, y)
# make_pipeline: Construct a pipeline
# Pipeline allows you to sequentially apply a list of transformers to preprocess the data and,
# if desired, conclude the sequence with a final predictor for predictive modeling.
# https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html
# https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline

# StandardScaler(): Standardize features by removing the mean and scaling to unit variance
# https://scikit-learn.org/1.6/modules/generated/sklearn.preprocessing.StandardScaler.html

# LinearSVC(C=1, random_state=42): Linear Support Vector Classification
# https://scikit-learn.org/1.6/modules/generated/sklearn.svm.LinearSVC.html

# Use the model to make predictions for two wines
X_new = [[3.06, 1065.0], [0.61, 740.0]]
print(svm_clf.predict(X_new))

# Mean accuracy over cross-validation (by default, 5 stratified folds for a classifier)
print(cross_val_score(svm_clf, X, y).mean())
```

     alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  \
0      14.23        1.71  2.43               15.6      127.0           2.80   
1      13.20        1.78  2.14               11.2      100.0           2.65   
2      13.16        2.36  2.67               18.6      101.0           2.80   
3      14.37        1.95  2.50               16.8      113.0           3.85   
4      13.24        2.59  2.87               21.0      118.0           2.80   
..       ...         ...   ...                ...        ...            ...   
173    13.71        5.65  2.45               20.5       95.0           1.68   
174    13.40        3.91  2.48               23.0      102.0           1.80   
175    13.27        4.28  2.26               20.0      120.0           1.59   
176    13.17        2.59  2.37               20.0      120.0           1.65   
177    14.13        4.10  2.74               24.5       96.0           2.05   

     flavanoids  nonflavanoid_phenols  proanthocyan