<a href="https://colab.research.google.com/github/AmiraliSajadi/handson-ml2-code-note/blob/main/5_support_vector_machines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Support Vector Machines

In [20]:
import numpy as np

from sklearn import datasets
from sklearn.datasets import make_moons

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC
from sklearn.svm import SVC

## Linear SVM

In [3]:
iris = datasets.load_iris()

In [6]:
iris.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [7]:
iris["feature_names"]

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [11]:
iris["target_names"]

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [12]:
X = iris["data"][:, (2, 3)]    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)    # Iris virginica

C is a regularization hyperparameter for SVM models. A smaller C gives a wider margin for classification (a wider street - page 155) at the cost of more margin violations, which often means better generalization. A larger C gives a narrower margin with fewer violations. <br>
* If the SVM model is overfitting, one way to regularize it is to reduce the C hyperparameter.
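
To make the effect of C concrete, here is a minimal sketch (not part of the original notebook) that trains the same kind of pipeline with a small and a large C and compares the margin width, computed as 2/||w|| in the scaled feature space. The helper `margin_width`, the C values 0.01 and 100, and the `dual`, `max_iter` and `random_state` settings are assumptions made for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, [2, 3]]                    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)   # Iris virginica


def margin_width(C):
    """Hypothetical helper: fit a linear SVM and return 2 / ||w|| (scaled space)."""
    clf = Pipeline([
        ("scaler", StandardScaler()),
        # dual=True is set explicitly because loss="hinge" requires it;
        # max_iter is raised (an assumption) to help the large-C fit converge
        ("linear_svc", LinearSVC(C=C, loss="hinge", dual=True,
                                 max_iter=100_000, random_state=42)),
    ])
    clf.fit(X, y)
    w = clf.named_steps["linear_svc"].coef_[0]
    return 2 / np.linalg.norm(w)


width_small_c = margin_width(0.01)
width_large_c = margin_width(100)
print("margin width with C=0.01:", width_small_c)
print("margin width with C=100: ", width_large_c)
```

The smaller C should report the wider street, which is the regularization effect described above.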

In [13]:
svm_clf = Pipeline([
                    ("scalar", StandardScaler()),
                    ("linear_svc", LinearSVC(C=1, loss="hinge")),
  ])

In [14]:
svm_clf.fit(X, y)

Pipeline(steps=[('scalar', StandardScaler()),
                ('linear_svc', LinearSVC(C=1, loss='hinge'))])

Making Predictions <br> The model should now be able to predict whether an iris is virginica from its petal length and width, for whatever values we throw at it. Looking at the data, a flower with petal length 5.5 and petal width 1.7 should be virginica, so we expect a 1:

In [16]:
svm_clf.predict([[5.5, 1.7]])

array([1.])

Nice!

Big statement with drum rolls:<br>**Unlike Logistic Regression classifiers, SVM classifiers don't output probabilities for each of the classes.**
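
What a linear SVM does give you is a signed score per instance. The sketch below (an illustration, not part of the original notebook) rebuilds the pipeline from above, checks that the `LinearSVC` step has no `predict_proba` method, and shows that `predict` is just the sign of `decision_function`. The `dual` and `random_state` settings are assumptions added for reproducibility.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, [2, 3]]                    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)   # Iris virginica

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", dual=True, random_state=42)),
])
svm_clf.fit(X, y)

# Signed scores: positive means "virginica", negative means "not virginica"
scores = svm_clf.decision_function(X)
predictions = svm_clf.predict(X)

# LinearSVC exposes no probability estimates
has_proba = hasattr(svm_clf.named_steps["linear_svc"], "predict_proba")

print("first five scores:", scores[:5])
print("has predict_proba:", has_proba)
```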

**Pointers**:
1. Instead of LinearSVC you can use SVC(kernel="linear"). It trains a similar model but is much slower, especially on large training sets.
2. Another alternative is SGDClassifier(loss="hinge", alpha=1/(m*C)), where m is the number of training instances.
3. SGDClassifier uses regular SGD to train a linear SVM classifier. It doesn't converge as fast as LinearSVC, but it's useful for online classification or huge datasets that don't fit in memory in one go (out-of-core training).
4. To use LinearSVC you should center the training set (subtract its mean) because it regularizes the bias term. This happens automatically if you scale the data with StandardScaler.
5. Set LinearSVC's *loss* to "hinge" if you want the standard hinge loss. It is not the default (the default is "squared_hinge").
6. For better performance, set LinearSVC's *dual* hyperparameter to False unless you have more features than training instances. Note that dual=False only works with loss="squared_hinge", so it can't be combined with pointer 5.
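
As a rough illustration of pointers 1 to 3 (a sketch, not part of the original notebook), the three classifiers can be trained side by side on the scaled iris data. The `models` and `accuracies` dictionaries and the `random_state` values are assumptions made for illustration. The three models should end up with similar training accuracy, though not necessarily identical decision boundaries.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, [2, 3]]                    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)   # Iris virginica

# Scaling also centers the data, which LinearSVC needs (pointer 4)
X_scaled = StandardScaler().fit_transform(X)

m = len(X_scaled)   # number of training instances
C = 1

models = {
    "LinearSVC": LinearSVC(C=C, loss="hinge", dual=True, random_state=42),
    "SVC(kernel='linear')": SVC(kernel="linear", C=C),
    "SGDClassifier": SGDClassifier(loss="hinge", alpha=1 / (m * C),
                                   random_state=42),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_scaled, y)
    accuracies[name] = model.score(X_scaled, y)   # training accuracy
    print(f"{name}: {accuracies[name]:.3f}")
```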


## Nonlinear SVM Classification

Let's be real: most datasets are not linearly separable. One way of handling these datasets is to add more features, such as polynomial features (chapter 4).
The moons dataset is a toy dataset for binary classification that we'll be using here:

In [17]:
X, y = make_moons(n_samples=100, noise=0.15)

In [18]:
# we use a 3rd-degree polynomial because the moons dataset is in the shape of two
# interleaving half circles

polynomial_svm_clf = Pipeline([
                       ("poly_features", PolynomialFeatures(degree=3)),
                       ("scalar", StandardScaler()),
                       ("svm_clf", LinearSVC(C=10, loss="hinge")),
    ])

In [19]:
polynomial_svm_clf.fit(X, y)



Pipeline(steps=[('poly_features', PolynomialFeatures(degree=3)),
                ('scalar', StandardScaler()),
                ('svm_clf', LinearSVC(C=10, loss='hinge'))])

And remember that you can use polynomial features with any model (not just SVMs).
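
For example, here is a minimal sketch (not part of the original notebook) that swaps the SVM for Logistic Regression while keeping the same preprocessing steps. The pipeline name `polynomial_log_reg` and `random_state=42` are assumptions made for illustration. It also prints how many features the degree-3 expansion produces from the two original ones.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# random_state is fixed here only to make the run reproducible
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

polynomial_log_reg = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("log_reg", LogisticRegression()),
])
polynomial_log_reg.fit(X, y)

# Degree 3 on 2 features gives 10 columns (including the constant bias column)
n_features = polynomial_log_reg.named_steps["poly_features"].n_output_features_
train_accuracy = polynomial_log_reg.score(X, y)

print("features after expansion:", n_features)
print("training accuracy:", train_accuracy)
```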

Here's the thing: a high-degree polynomial creates a huge number of features (and a slow model), while a low-degree polynomial can't handle complex datasets well. Solution?<br> **Kernel Trick**: this concept is basically explained as mathematical magic (and nothing more). The kernel trick lets you get the same results as if you had added the polynomial features, without actually adding them. That means the benefit of many polynomial features without the combinatorial explosion in their number. We use SVC for it. Here's the implementation:

In [21]:
# coef0 controls how much the model is influenced by high-degree vs. low-degree polynomials
poly_kernel_svm_clf = Pipeline([
                                ('scalar', StandardScaler()),
                                ('svm_clf', SVC(kernel="poly", degree=3, coef0=1, C=5))
])
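
The "mathematical magic" behind the kernel trick can be checked numerically on a tiny case. The sketch below (an illustration, not part of the original notebook) uses a simplified 2nd-degree polynomial kernel, (a · b)², which corresponds to scikit-learn's polynomial kernel with gamma=1 and coef0=0. The mapping `phi` and the vectors `a` and `b` are assumptions chosen for illustration. It shows that the kernel computed on the original 2D vectors equals the dot product of the explicitly transformed 3D vectors.

```python
import numpy as np


def phi(x):
    """Explicit 2nd-degree polynomial mapping of a 2D vector into 3D."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

# The slow way: transform both vectors first, then take the dot product
explicit = phi(a) @ phi(b)

# The kernel trick: work on the original vectors only
kernel = (a @ b) ** 2

print("dot product of transformed vectors:", explicit)
print("kernel on original vectors:        ", kernel)
```

Both values agree (up to floating-point rounding), which is why the SVC above can behave as if polynomial features had been added without ever computing them.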