<a href="https://colab.research.google.com/github/AmiraliSajadi/handson-ml2-code-note/blob/main/5_support_vector_machines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Support Vector Machines

In [18]:
import numpy as np

from sklearn import datasets
from sklearn.datasets import make_moons

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.svm import LinearSVC, SVC, LinearSVR, SVR

## Linear SVM

In [2]:
iris = datasets.load_iris()

In [3]:
iris.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [4]:
iris["feature_names"]

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [5]:
iris["target_names"]

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [6]:
X = iris["data"][:, (2, 3)]    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)    # Iris virginica

C is a hyperparameter of SVM models. A smaller C gives a wider margin (a wider street, page 155) at the cost of more margin violations, and this usually generalizes better. A larger C gives a narrower street with fewer violations. </br>
* If the SVM model is overfitting, one way to regularize it is to reduce the C hyperparameter.
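To see the effect of C on the street, here is a minimal sketch that fits the same data with a small and a large C and measures the street width as 2 / ||w|| in the scaled feature space. It swaps in SVC(kernel="linear") instead of LinearSVC purely because its solver is deterministic; the two C values are arbitrary picks for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]                    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)   # 1.0 for Iris virginica

margin_widths = {}
for C in (0.01, 100):                          # illustrative values, not tuned
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svc", SVC(kernel="linear", C=C)),
    ])
    clf.fit(X, y)
    w = clf.named_steps["svc"].coef_[0]
    # Distance between the two margin lines, measured in the scaled space
    margin_widths[C] = 2 / np.linalg.norm(w)
    print(f"C={C}: street width = {margin_widths[C]:.3f}")
```

The smaller C should print the wider street, which is the regularization effect described above.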

In [7]:
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])

In [8]:
svm_clf.fit(X, y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('linear_svc', LinearSVC(C=1, loss='hinge'))])

Making Predictions </br> The model should now be able to predict whether a flower is Iris virginica from whatever petal length and width we throw at it. Looking at the data, a petal length of 5.5 cm and a petal width of 1.7 cm should belong to virginica, so we expect a 1:

In [9]:
svm_clf.predict([[5.5, 1.7]])

array([1.])

Nice!

Big statement with drum rolls:</br>**Unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class.**
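What you do get instead of probabilities is a signed score from `decision_function`: positive means the instance falls on the virginica side of the boundary, negative means the other side. The sketch below rebuilds the same kind of pipeline so it runs on its own; the three sample flowers, `max_iter`, and `random_state` are assumptions added for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]                    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)   # 1.0 for Iris virginica

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", max_iter=10_000, random_state=42)),
])
svm_clf.fit(X, y)

# Hypothetical flowers: (petal length, petal width) in cm
samples = np.array([[5.5, 1.7], [1.5, 0.3], [4.8, 1.6]])

scores = svm_clf.decision_function(samples)    # signed scores, not probabilities
preds = svm_clf.predict(samples)               # 1.0 where the score is positive

# LinearSVC has no predict_proba method at all
has_proba = hasattr(svm_clf.named_steps["linear_svc"], "predict_proba")

print("scores:", scores)
print("predictions:", preds)
print("has predict_proba:", has_proba)
```

If you really need probability estimates, SVC accepts `probability=True`, which calibrates the scores at the cost of slower training.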

**Pointers**:
1. Instead of LinearSVC, you could use SVC(kernel="linear"). It trains a comparable linear model but is much slower, especially on large training sets.
2. Another option is SGDClassifier(loss="hinge", alpha=1/(m*C)), where m is the number of training instances.
3. SGDClassifier uses regular SGD to train a linear SVM classifier. It doesn't converge as fast as LinearSVC, but it's useful for online classification or huge datasets that don't fit in memory in one go (out-of-core training).
4. LinearSVC regularizes the bias term, so you should center the training set first (subtract its mean). This happens automatically if you scale the data with StandardScaler.
5. Set LinearSVC's *loss* to "hinge" explicitly, because it is not the default value.
6. For better performance, set LinearSVC's *dual* hyperparameter to False unless there are more features than training instances. (scikit-learn does not support dual=False together with loss="hinge", so this applies when you use the default squared hinge loss.)
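The three alternatives from the pointers can be put side by side. This is only a rough sketch: it compares training accuracy on the iris data, and `max_iter` and `random_state` are assumptions added to keep the run repeatable. The models should land close to each other, though they are not guaranteed to be identical.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]                    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)   # 1.0 for Iris virginica

C = 1
m = len(X)                                     # number of training instances

models = {
    "LinearSVC": LinearSVC(C=C, loss="hinge", max_iter=10_000, random_state=42),
    "SVC(kernel='linear')": SVC(kernel="linear", C=C),
    "SGDClassifier": SGDClassifier(loss="hinge", alpha=1 / (m * C), random_state=42),
}

accuracies = {}
for name, model in models.items():
    # StandardScaler centers the data, which matters for the regularized bias term
    clf = make_pipeline(StandardScaler(), model)
    clf.fit(X, y)
    accuracies[name] = clf.score(X, y)
    print(f"{name}: training accuracy = {accuracies[name]:.3f}")
```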


## Nonlinear SVM Classification

Let's be real: most datasets are not linearly separable. One way of handling them is to add more features, such as polynomial features (chapter 4).
The moons dataset is a toy dataset for binary classification that we'll use here:

In [10]:
X, y = make_moons(n_samples=100, noise=0.15)

In [11]:
# we use a 3rd-degree polynomial because the moons dataset has the shape of two
# interleaving half circles

polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge")),
])

In [12]:
polynomial_svm_clf.fit(X, y)

Pipeline(steps=[('poly_features', PolynomialFeatures(degree=3)),
                ('scaler', StandardScaler()),
                ('svm_clf', LinearSVC(C=10, loss='hinge'))])

And remember that you can use polynomial features with any model, not just SVMs.
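As a quick illustration of that point, here is a sketch that puts the same polynomial features in front of a Logistic Regression model instead of an SVM. The choice of LogisticRegression and the `random_state` are assumptions for the example. It also shows how many columns a degree-3 expansion of two features produces.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

poly_log_reg = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("log_reg", LogisticRegression()),
])
poly_log_reg.fit(X, y)

# 2 input features expanded to degree 3 (bias column included)
n_features_out = poly_log_reg.named_steps["poly_features"].n_output_features_
train_accuracy = poly_log_reg.score(X, y)

print("features after expansion:", n_features_out)
print(f"training accuracy: {train_accuracy:.3f}")
```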

Here's the thing: a low polynomial degree cannot handle complex datasets well, while a high degree creates a huge number of features and makes the model slow. Solution?</br> **Kernel Trick**: a mathematical technique (it can feel like magic) that gives you the same result as if you had added the polynomial features, without actually adding them. So you get the benefit of the extra features without the explosion in their number. We use SVC for it. Here's the implementation:

In [14]:
# coef0 controls how much the model is influenced by high-degree vs. low-degree polynomials
# Do consider grid searches for finding the right hyperparameters here ;)
poly_kernel_svm_clf = Pipeline([
    ('scaler', StandardScaler()),
    ('svm_clf', SVC(kernel="poly", degree=3, coef0=1, C=5)),
])
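To make the kernel trick a bit less magical, here is a small worked check for the simplest case: a 2nd-degree polynomial on 2D vectors with no constant term. The mapping `phi` and the two vectors are assumptions chosen for illustration. The point is that the dot product in the expanded 3D space equals the squared dot product in the original 2D space, so the expansion never has to be computed.

```python
import numpy as np

def phi(x):
    """Explicit 2nd-degree polynomial mapping of a 2D vector into 3D."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = phi(a) @ phi(b)   # expand both vectors first, then take the dot product
kernel = (a @ b) ** 2        # stay in 2D: just square the original dot product

print(explicit, kernel)
```

SVC's polynomial kernel follows the same idea with extra knobs: it computes (gamma * a·b + coef0) ** degree.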

Similarity Features</br>
Another way to tackle nonlinear problems is to add similarity features. They are computed with a *similarity function* that measures how much each instance resembles a particular landmark (do look at page 159, as this one is hard to summarize without pictures).</br> The similarity function that we'll use here is the *Gaussian RBF*.


In [15]:
rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001)),
])

rbf_kernel_svm_clf.fit(X, y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('svm_clf', SVC(C=0.001, gamma=5))])
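The Gaussian RBF similarity feature mentioned above can also be computed by hand, which helps when there are no pictures around. The sketch below assumes the usual form exp(-gamma * ||x - landmark||²) and uses a tiny 1D dataset with two landmarks; all the values are illustrative. Each instance gets one new feature per landmark, close to 1 near the landmark and close to 0 far from it.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_rbf(x, landmark, gamma):
    """Similarity of each row of x to one landmark: exp(-gamma * squared distance)."""
    return np.exp(-gamma * np.linalg.norm(x - landmark, axis=1) ** 2)

X_1d = np.array([[-1.0], [0.0], [2.0]])   # three 1D instances
landmarks = np.array([[-2.0], [1.0]])     # two landmarks
gamma = 0.3

# One new feature column per landmark
features = np.column_stack([gaussian_rbf(X_1d, lm, gamma) for lm in landmarks])
print(features.round(4))

# scikit-learn's helper computes the same matrix in one call
print(rbf_kernel(X_1d, landmarks, gamma=gamma).round(4))
```

For the instance at x = -1, the two features come out as roughly 0.74 and 0.30: it sits one unit from the first landmark and two units from the second.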

How to choose a kernel?</br>
Always start with a linear kernel (LinearSVC is much faster than SVC(kernel="linear")). After that, move on to Gaussian RBF, which works well in most cases. If that does not work either, experiment with other kernels using grid search. You can also check the computational complexities of LinearSVC, SGDClassifier, and SVC in the table on page 162.
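The "try linear, then RBF, then others" advice can be folded into a single grid search. This is a minimal sketch on the moons data; the grid values, `cv=3`, and `random_state` are assumptions picked to keep the run short, not recommended settings.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC()),
])

# One sub-grid per kernel, so each kernel only gets the hyperparameters it uses
param_grid = [
    {"svm_clf__kernel": ["linear"], "svm_clf__C": [0.1, 1, 10]},
    {"svm_clf__kernel": ["rbf"], "svm_clf__C": [0.1, 1, 10],
     "svm_clf__gamma": [0.1, 1, 5]},
    {"svm_clf__kernel": ["poly"], "svm_clf__degree": [3],
     "svm_clf__coef0": [0, 1], "svm_clf__C": [1, 10]},
]

grid_search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
grid_search.fit(X, y)

print("best params:", grid_search.best_params_)
print(f"best CV accuracy: {grid_search.best_score_:.3f}")
```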

## SVM Regression
Who said SVMs can only classify?

In [17]:
# note: X, y still hold the moons data from above, so this only demonstrates the API
svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(X, y)

LinearSVR(epsilon=1.5)
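To see what epsilon does on actual regression data, here is a sketch with a synthetic noisy line. The data-generating formula, `max_iter`, and the seeds are assumptions for the example. Epsilon sets the half-width of the tube around the fitted line: instances inside the tube add nothing to the loss, so the model tries to fit as many instances as possible inside it.

```python
import numpy as np
from sklearn.svm import LinearSVR

# Synthetic data: y = 4 + 3x + Gaussian noise (made up for this example)
rng = np.random.default_rng(42)
m = 50
X_reg = 2 * rng.random((m, 1))
y_reg = 4 + 3 * X_reg[:, 0] + rng.standard_normal(m)

svm_reg = LinearSVR(epsilon=1.5, max_iter=10_000, random_state=42)
svm_reg.fit(X_reg, y_reg)

# Instances whose absolute error is at most epsilon lie inside the tube
residuals = np.abs(y_reg - svm_reg.predict(X_reg))
inside_tube = residuals <= svm_reg.epsilon

print(f"instances inside the tube: {inside_tube.sum()} of {m}")
```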

Need nonlinear regression done? No problem! Use a kernelized SVM model:

In [19]:
# the lower the C, the stronger the regularization:
svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(X, y)

SVR(C=100, degree=2, kernel='poly')

The rest of the chapter explains the math under the hood of the SVM models. Code's done.