### Linear SVM Classification
##### Soft vs. Hard Margin Classification
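The following Scikit-Learn code loads the iris dataset, keeps two features (petal length and petal width), scales them, and then trains a linear SVM model (using the LinearSVC class with C=1 and the hinge loss function) to detect Iris virginica flowers. The hyperparameter C sets the balance between a soft and a hard margin: a small C gives a wider margin at the cost of more margin violations, while a large C penalizes violations heavily and behaves more like a hard margin classifier.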

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 if Iris virginica, else 0.0

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])

svm_clf.fit(X, y)
svm_clf.predict([[5.5, 1.7]])  # petal length 5.5 cm, petal width 1.7 cm
```

array([1.])
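To see how C trades margin width against margin violations, here is a minimal sketch that refits the same scaled pipeline with a very small and a very large C and counts the training instances whose hinge loss max(0, 1 − t·f(x)) is positive, where t is the label mapped to −1/+1 and f(x) is the decision function. The helper name count_margin_violations and the two C values are illustrative choices, not part of the original code; max_iter is raised only to give the solver more room to converge.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width (cm)
y = (iris["target"] == 2).astype(np.float64)  # 1.0 for Iris virginica


def count_margin_violations(C):
    """Fit a scaled linear SVM with hinge loss and count instances with positive hinge loss."""
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("linear_svc", LinearSVC(C=C, loss="hinge", max_iter=100_000, random_state=42)),
    ])
    clf.fit(X, y)
    t = 2 * y - 1  # map labels {0, 1} to {-1, +1}
    hinge = np.maximum(0, 1 - t * clf.decision_function(X))
    return int(np.sum(hinge > 0))


violations = {C: count_margin_violations(C) for C in (0.01, 100)}
print(violations)
```

With the small C you should see far more violations, because the optimizer accepts a wider, softer margin; with the large C it behaves closer to a hard margin classifier.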

### Non-Linear SVM Classification
One way to handle nonlinear datasets is to add more features, such as polynomial features; in some cases this makes the dataset linearly separable. To implement this idea using Scikit-Learn, create a Pipeline containing a PolynomialFeatures transformer (discussed in “Polynomial Regression” on page 128), followed by a StandardScaler and a LinearSVC. Let’s test this on the moons dataset: this is a toy dataset for binary classification in which the data points are shaped as two interleaving half circles (see Figure 5-6). You can generate this dataset using the make_moons() function:

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),  # 2 inputs -> 10 polynomial features
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge"))
])

polynomial_svm_clf.fit(X, y)
# (5.5, 1.7) lies far outside the range of the moons data, so this is an extrapolation
polynomial_svm_clf.predict([[5.5, 1.7]])
```

array([1])
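Adding polynomial features quickly inflates the feature count. Assuming PolynomialFeatures' defaults (include_bias=True, interaction_only=False), n input features expanded up to degree d give C(n + d, d) output features, so the pipeline above turns the 2 moons features into 10. This sketch checks that count against PolynomialFeatures and prints a few larger cases; the helper n_poly_features is a hypothetical name used only for illustration.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_features, degree):
    """Number of monomials of total degree <= `degree` in `n_features` variables (bias included)."""
    return comb(n_features + degree, degree)


# Cross-check the formula against PolynomialFeatures fitted on a dummy input.
for n, d in [(2, 3), (2, 10), (10, 3)]:
    pf = PolynomialFeatures(degree=d).fit(np.zeros((1, n)))
    assert pf.n_output_features_ == n_poly_features(n, d)

for n, d in [(2, 3), (2, 10), (100, 3), (100, 10)]:
    print(f"{n} features, degree {d}: {n_poly_features(n, d):,} polynomial features")
```

With 100 input features, degree 3 already yields 176,851 features, which is why explicitly adding high-degree features does not scale.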

### Polynomial Kernel
Adding polynomial features is simple to implement, but at a low polynomial degree it cannot deal with very complex datasets, and at a high degree it creates a huge number of features, which makes the model too slow. Fortunately, when using SVMs you can apply an almost miraculous mathematical technique called the kernel trick. It makes it possible to get the same result as if you had added many polynomial features, even with very high-degree polynomials, without actually having to add them, so there is no combinatorial explosion of the number of features. This trick is implemented by the SVC class. Let’s test it on the moons dataset:

```python
from sklearn.svm import SVC

poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
])

poly_kernel_svm_clf.fit(X, y)
poly_kernel_svm_clf.predict([[5.5, 1.7]])
```

array([1])

This code trains an SVM classifier using a third-degree polynomial kernel. It is represented on the left in Figure 5-7. On the right is another SVM classifier using a 10th-degree polynomial kernel. If your model is overfitting, you might want to reduce the polynomial degree; conversely, if it is underfitting, you can try increasing it. The hyperparameter coef0 controls how much the model is influenced by high-degree polynomials versus low-degree polynomials: Scikit-Learn's polynomial kernel is (γ aᵀb + coef0)^degree, so a larger coef0 gives relatively more weight to the low-degree terms.
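For a concrete view of the kernel trick, this sketch takes 2-D inputs and a second-degree kernel: an explicit feature map φ (a hypothetical helper here) produces the same dot product as computing (aᵀb + r)² directly, with r playing the role of coef0. Scikit-Learn's polynomial_kernel evaluates (γ aᵀb + coef0)^degree, so gamma=1 is passed to match; note that SVC itself defaults to gamma="scale", so its kernel is scaled differently. The vectors a and b are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def phi(x, r):
    """Explicit degree-2 feature map for 2-D x such that phi(a) @ phi(b) == (a @ b + r) ** 2."""
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([x1**2, x2**2, s * x1 * x2, s * np.sqrt(r) * x1, s * np.sqrt(r) * x2, r])


a = np.array([2.0, 3.0])
b = np.array([-1.0, 0.5])
r = 1.0  # plays the role of coef0

explicit = phi(a, r) @ phi(b, r)  # dot product after explicitly adding 6 features
kernel = (a @ b + r) ** 2          # kernel trick: same value, no features added
sk = polynomial_kernel(a.reshape(1, -1), b.reshape(1, -1), degree=2, gamma=1, coef0=r)[0, 0]
print(explicit, kernel, sk)
```

All three values agree (0.25 here, up to floating-point rounding), but only φ needs extra coordinates per instance; with higher degrees and more input features the explicit map grows combinatorially while the kernel stays a single dot product plus a power.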

### Gaussian RBF Kernel
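Another technique for tackling nonlinear problems is to add features computed using a similarity function, which measures how much each instance resembles a particular landmark; the Gaussian Radial Basis Function (RBF) is a common choice. Just like the polynomial features method, the similarity features method can be useful with any Machine Learning algorithm, but it may be computationally expensive to compute all the additional features, especially on large training sets. Once again the kernel trick does its SVM magic, making it possible to obtain a similar result as if you had added many similarity features. Let’s try the SVC class with the Gaussian RBF kernel: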

```python
rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001))
])
rbf_kernel_svm_clf.fit(X, y)
```



This model is represented at the bottom left in Figure 5-9. The other plots show models trained with different values of hyperparameters gamma (γ) and C. Increasing gamma makes the bell-shaped curve narrower (see the left-hand plots in Figure 5-8). As a result, each instance’s range of influence is smaller: the decision boundary ends up being more irregular, wiggling around individual instances. Conversely, a small gamma value makes the bell-shaped curve wider: instances have a larger range of influence, and the decision boundary ends up smoother. So γ acts like a regularization hyperparameter: if your model is overfitting, you should reduce it; if it is underfitting, you should increase it (similar to the C hyperparameter).
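The Gaussian RBF similarity between an instance x and a landmark ℓ is exp(−γ‖x − ℓ‖²): it equals 1 at the landmark and decays toward 0 with distance, and γ controls how fast. The sketch below computes two similarity features for a 1-D instance by hand, checks them against Scikit-Learn's rbf_kernel, and prints how much weight one instance at distance 1 gets for a few γ values. The landmarks, γ values, and the gaussian_rbf helper are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def gaussian_rbf(x, landmark, gamma):
    """Similarity exp(-gamma * ||x - landmark||^2): 1 at the landmark, decaying with distance."""
    diff = np.asarray(x, dtype=float) - np.asarray(landmark, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))


# Two similarity features for a 1-D instance x = -1, with landmarks at -2 and 1.
x = np.array([-1.0])
landmarks = np.array([[-2.0], [1.0]])
features = np.array([gaussian_rbf(x, lm, gamma=0.3) for lm in landmarks])
print(features.round(2))

# rbf_kernel computes the same quantity for every pair of rows at once.
assert np.allclose(features, rbf_kernel(x.reshape(1, -1), landmarks, gamma=0.3)[0])

# Weight of one instance at distance 1 from x: a larger gamma shrinks its range of influence.
for gamma in (0.1, 1, 5):
    print(gamma, round(float(np.exp(-gamma * 1.0)), 4))
```

The last loop shows the effect described above: at distance 1 the similarity is about 0.90 for γ = 0.1 but under 0.01 for γ = 5, so a large γ confines each instance's influence to its immediate neighborhood.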

### SVM Regression
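SVM Regression reverses the classification objective: instead of trying to fit the widest possible margin between two classes while limiting margin violations, it tries to fit as many instances as possible within a margin around the predictions while limiting the instances that fall outside it. The width of that margin is controlled by the hyperparameter epsilon (ϵ). You can use Scikit-Learn’s LinearSVR class to perform linear SVM Regression. The following code produces the model represented on the left in Figure 5-10 (the training data should be scaled and centered first):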

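```python
import numpy as np
from sklearn.svm import LinearSVR
rng = np.random.default_rng(42)  # assumed toy data: a noisy line, since the moons y holds class labels
X_lin = 2 * rng.random((50, 1))
y_lin = 4 + 3 * X_lin[:, 0] + rng.standard_normal(50)
svm_reg = Pipeline([("scaler", StandardScaler()), ("linear_svr", LinearSVR(epsilon=1.5, random_state=42))]).fit(X_lin, y_lin)
```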


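In SVM Regression, epsilon sets the width of the margin around the predictions. LinearSVR's default loss, 'epsilon_insensitive', charges max(0, |y − ŷ| − ϵ) per instance, so errors smaller than ϵ cost nothing. This NumPy sketch evaluates that loss on made-up targets and predictions; epsilon_insensitive_loss is a hypothetical helper, and it omits the regularization term that LinearSVR also minimizes.

```python
import numpy as np


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-instance loss max(0, |y - y_hat| - epsilon): errors inside the margin cost nothing."""
    return np.maximum(0.0, np.abs(np.asarray(y_true) - np.asarray(y_pred)) - epsilon)


y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.2, 3.0, 0.0])
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5)
print(losses)  # [0.  0.5 2.5]
```

Only the two predictions that miss by more than 0.5 contribute, which is why the model is called ϵ-insensitive: adding training instances inside the margin does not change its predictions.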

To tackle nonlinear regression tasks, you can use a kernelized SVM model.
Figure 5-11 shows SVM Regression on a random quadratic training set, using a
second-degree polynomial kernel. There is little regularization in the left plot (i.e., a
large C value), and much more regularization in the right plot (i.e., a small C value).

The following code uses Scikit-Learn’s SVR class (which supports the kernel trick) to
produce the model represented on the left in Figure 5-11:

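```python
import numpy as np
from sklearn.svm import SVR
rng = np.random.default_rng(42)  # assumed toy data: a noisy quadratic, since the moons y holds class labels
X_quad = 2 * rng.random((100, 1)) - 1
y_quad = 0.2 + 0.1 * X_quad[:, 0] + 0.5 * X_quad[:, 0] ** 2 + rng.standard_normal(100) / 10
svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1).fit(X_quad, y_quad)
```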



The SVR class is the regression equivalent of the SVC class, and the LinearSVR class is the regression equivalent of the LinearSVC class. The LinearSVR class scales linearly with the size of the training set (just like the LinearSVC class), while the SVR class gets much too slow when the training set grows large (just like the SVC class, whose training time typically grows between quadratically and cubically with the number of instances).
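Like SVC, SVR keeps only part of the training set as support vectors: with the epsilon-insensitive loss, instances strictly inside the margin get zero dual coefficients, so a wider epsilon generally yields a sparser model. The sketch below checks this on assumed noisy quadratic data using SVR's support_ attribute (the indices of the support vectors); the dataset, the two epsilon values, and the n_support name are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

# Assumed toy data: a noisy quadratic, similar in spirit to Figure 5-11.
rng = np.random.default_rng(42)
X_quad = 2 * rng.random((100, 1)) - 1
y_quad = 0.2 + 0.1 * X_quad[:, 0] + 0.5 * X_quad[:, 0] ** 2 + rng.standard_normal(100) / 10

n_support = {}
for epsilon in (0.01, 0.5):
    reg = SVR(kernel="poly", degree=2, C=100, epsilon=epsilon).fit(X_quad, y_quad)
    # Only instances on or outside the epsilon margin can become support vectors.
    n_support[epsilon] = len(reg.support_)
print(n_support)
```

Expect almost every instance to be a support vector with the narrow margin and only a handful with the wide one.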