# Support Vector Machines
An SVM fits the widest possible lane (street) between the classes. This is called large margin classification.<br/>
Adding more training instances off the lane will not affect the decision boundary, because the boundary is fully determined, or "supported", by the instances located on the edges of the lane. These instances are called the support vectors.
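A minimal sketch of what "supported" means in Scikit-Learn terms, assuming a synthetic two-blob dataset (made up for illustration) and SVC with a linear kernel. For a fitted SVC, dual_coef_ holds one coefficient per support vector, so the weight vector and the decision function can be rebuilt from the support vectors alone.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (synthetic data, chosen only for illustration)
X_blobs, y_blobs = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                              cluster_std=0.8, random_state=42)

# Linear SVM; a large C approximates a hard margin
clf = SVC(kernel="linear", C=100)
clf.fit(X_blobs, y_blobs)

# Only a handful of instances become support vectors
print(len(clf.support_vectors_), "support vectors out of", len(X_blobs))

# The weight vector is a weighted sum of the support vectors only
w_from_svs = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_from_svs, clf.coef_))

# The decision function rebuilt without any non-support-vector instance
manual = X_blobs @ w_from_svs.ravel() + clf.intercept_[0]
print(np.allclose(manual, clf.decision_function(X_blobs)))
```

Every other instance has a dual coefficient of zero, which is why instances well off the lane do not move the boundary.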

## Soft Margin Classification
Hard margin classification strictly requires that all instances be off the lane and on the correct side. This only works if the data is linearly separable, and it is very sensitive to outliers: a single outlier on the wrong side of the lane can make it impossible to find a hard margin.<br/>
Soft margin classification balances keeping the lane as wide as possible against limiting margin violations (instances that end up inside the lane, or even on the wrong side of it).
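For reference, both objectives can be written in the standard linear SVM notation (not defined elsewhere in these notes): w is the weight vector, b the bias, t^(i) ∈ {-1, +1} the label of instance x^(i), and ζ^(i) ≥ 0 a slack variable measuring how much instance i violates the margin. The lane edges are where w^T x + b = ±1, so the lane width is 2/||w||, and minimizing w^T w widens the lane. The C-weighted slack term is the hyperparameter discussed below.

Hard margin:

$$
\min_{\mathbf{w},\,b}\ \frac{1}{2}\mathbf{w}^\top\mathbf{w}
\quad\text{subject to}\quad t^{(i)}\left(\mathbf{w}^\top\mathbf{x}^{(i)} + b\right) \ge 1 \quad\text{for } i = 1, \dots, m
$$

Soft margin:

$$
\min_{\mathbf{w},\,b,\,\boldsymbol{\zeta}}\ \frac{1}{2}\mathbf{w}^\top\mathbf{w} + C\sum_{i=1}^{m}\zeta^{(i)}
\quad\text{subject to}\quad t^{(i)}\left(\mathbf{w}^\top\mathbf{x}^{(i)} + b\right) \ge 1 - \zeta^{(i)},\ \ \zeta^{(i)} \ge 0 \quad\text{for } i = 1, \dots, m
$$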
### Scikit-Learn
Control this balance with the C hyperparameter. A smaller C leads to a wider lane but more margin violations, while a larger C gives a narrower lane with fewer margin violations. If the model is overfitting, try regularizing it by reducing C.
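A sketch of the C trade-off on the scaled iris petal features. It assumes SVC(kernel="linear") as a stand-in for LinearSVC, since it solves the same soft margin problem and converges reliably here; the two C values are arbitrary. It measures the lane width 2/||w|| and counts margin violations, i.e. instances whose signed decision value is below 1.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_petal = StandardScaler().fit_transform(iris["data"][:, (2, 3)])  # petal length, width
y_virginica = (iris["target"] == 2).astype(np.float64)            # 1.0 for Iris virginica
t = 2 * y_virginica - 1  # map labels {0, 1} to {-1, +1} for the margin check

results = {}
for C in (0.01, 100):
    clf = SVC(kernel="linear", C=C).fit(X_petal, y_virginica)
    lane_width = 2 / np.linalg.norm(clf.coef_)      # distance between the lane edges
    margins = t * clf.decision_function(X_petal)
    violations = int(np.sum(margins < 1))           # inside the lane or misclassified
    results[C] = (lane_width, violations)
    print(f"C={C}: lane width={lane_width:.3f}, margin violations={violations}")
```

The lane width shrinks as C grows, which is the regularization effect described above.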

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 for Iris virginica, else 0.0

# SVMs are sensitive to feature scales, so scale the features before fitting
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", random_state=42)),
])

svm_clf.fit(X, y)
```

```python
svm_clf.predict([[5.5, 1.7]])
```


## Nonlinear SVM Classification
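When a dataset is not linearly separable, one approach is to add more features, such as polynomial features; in some cases this makes the dataset linearly separable.<br/>
Scikit-Learn provides a PolynomialFeatures transformer that can be added as a step in the pipeline.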
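A minimal sketch of why adding polynomial features helps, using a made-up 1-D dataset where class 1 sits at both ends and class 0 in the middle. No single threshold on x1 separates it, but adding x1**2 as a second feature does. SVC(kernel="linear") with a large C is an assumed stand-in for a hard margin linear classifier.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

# Toy 1-D data (illustrative only)
x1 = np.arange(-4, 5, dtype=float).reshape(-1, 1)  # -4, -3, ..., 4
y1 = (np.abs(x1.ravel()) >= 2).astype(int)          # class 1 at both ends

# A linear classifier on x1 alone cannot fit every training instance
raw_acc = SVC(kernel="linear", C=1000).fit(x1, y1).score(x1, y1)

# Columns after the transform: x1, x1**2
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x1)
poly_acc = SVC(kernel="linear", C=1000).fit(X_poly, y1).score(X_poly, y1)

print(f"raw feature: {raw_acc:.2f}, with x1**2: {poly_acc:.2f}")
```

In the (x1, x1**2) plane, a horizontal line between x1**2 = 1 and x1**2 = 4 separates the classes.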

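```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X_moons, y_moons = make_moons(n_samples=100, shuffle=True, noise=0.1, random_state=42)

# Plot the two features, colored by class label
plt.scatter(X_moons[:, 0], X_moons[:, 1], c=y_moons)
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()

# Add all polynomial features up to degree 3, scale them, then fit a linear SVM
polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge", max_iter=10_000, random_state=42)),
])
```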

```python
polynomial_svm_clf.fit(X_moons, y_moons)
```


## Polynomial Kernel
Adding polynomial features is simple, but a low polynomial degree cannot deal with very complex datasets, while a high degree creates a huge number of features and makes the model too slow.<br/>
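To put numbers on "a huge number of features": PolynomialFeatures (with its default bias column) produces every monomial of degree at most d in n input features, which is C(n + d, d) columns. A quick check, assuming 10 input features (an arbitrary choice):

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features = 10
X_dummy = np.zeros((1, n_features))  # values don't matter, only the number of columns

counts = {}
for degree in (2, 3, 5, 10):
    # fit() is enough to compute the output size without building the columns
    poly = PolynomialFeatures(degree=degree).fit(X_dummy)
    counts[degree] = poly.n_output_features_
    # All monomials of degree <= d in n variables: C(n + d, d)
    print(f"degree {degree}: {counts[degree]} features (formula: {comb(n_features + degree, degree)})")
# degree 2: 66, degree 3: 286, degree 5: 3003, degree 10: 184756
```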

### Kernel Trick
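The kernel trick gets the same result as if you had added many polynomial features, even very high-degree ones, without actually adding them. Because no features are added, there is no combinatorial explosion in the number of features.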
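A small numeric check of the kernel trick for a degree-2 polynomial kernel on 2-D inputs, assuming the kernel K(a, b) = (a · b + 1)^2 (the form SVC uses with degree=2, coef0=1, gamma=1). The explicit feature map phi below is a hypothetical helper written for this check: computing the kernel directly in the original 2-D space gives the same value as mapping both points to 6-D and taking the dot product there.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(x):
    """Explicit 6-D feature map whose dot product equals (a . b + 1)**2 for 2-D inputs."""
    x1, x2 = x
    return np.array([x1**2, x2**2, 1.0,
                     np.sqrt(2) * x1 * x2, np.sqrt(2) * x1, np.sqrt(2) * x2])

a = np.array([2.0, 3.0])
b = np.array([-1.0, 0.5])

explicit = phi(a) @ phi(b)   # dot product in the transformed 6-D space
trick = (a @ b + 1) ** 2     # kernel evaluated in the original 2-D space
sk = polynomial_kernel(a.reshape(1, -1), b.reshape(1, -1),
                       degree=2, gamma=1, coef0=1)[0, 0]

# All three agree: 0.25, up to floating-point rounding
print(explicit, trick, sk)
```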

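```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Kernel trick on the moons dataset: a degree-3 polynomial kernel,
# without explicitly adding polynomial features.
# coef0 controls how much the model is influenced by high-degree
# polynomials versus low-degree polynomials.
poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5)),
])

poly_kernel_svm_clf.fit(X_moons, y_moons)
```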
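A rough sanity check, assuming training accuracy is an acceptable proxy here (it says nothing about generalization): on the same moons data, a plain linear SVM cannot follow the two interleaving half circles, while the degree-3 polynomial kernel with the settings above can. The linear model's C=5 is an arbitrary choice made to match.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_moons, y_moons = make_moons(n_samples=100, shuffle=True, noise=0.1, random_state=42)

# Plain linear SVM versus the polynomial kernel SVM used above
linear_clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=5))
poly_clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, coef0=1, C=5))

scores = {}
for name, clf in [("linear", linear_clf), ("poly kernel", poly_clf)]:
    clf.fit(X_moons, y_moons)
    scores[name] = clf.score(X_moons, y_moons)  # training accuracy
    print(f"{name}: training accuracy = {scores[name]:.2f}")
```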


## Adding Similarity Features
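Another way to handle nonlinear problems is to add features computed with a similarity function, which measures how much each instance resembles a particular landmark. It is a bell-shaped function ranging from 0 (very far from the landmark) to 1 (at the landmark).<br/>
The simplest way to choose landmarks is to create one at the location of every instance in the dataset. This creates many dimensions and increases the chance that the transformed training set is linearly separable, but a training set with m instances gets m new features, which is costly for large datasets.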
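A sketch of similarity features on a made-up 1-D dataset. It assumes the Gaussian RBF, exp(-gamma * ||x - landmark||^2), as the bell-shaped similarity function, with two hand-picked landmarks at -2 and 1 and gamma=0.3 (all arbitrary choices). Replacing x1 with the two similarity features makes the data linearly separable; the last lines show the "one landmark per instance" variant, which yields an m x m feature matrix.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Toy 1-D data (illustrative only): class 1 in the middle, class 0 at both ends
x1 = np.linspace(-4, 4, 9).reshape(-1, 1)
y1 = np.array([0, 0, 1, 1, 1, 1, 1, 0, 0])

landmarks = np.array([[-2.0], [1.0]])
gamma = 0.3
X_sim = rbf_kernel(x1, landmarks, gamma=gamma)  # shape (9, 2), values in (0, 1]

# Same features computed by hand, to show what rbf_kernel does
X_manual = np.exp(-gamma * (x1 - landmarks.T) ** 2)

# Linear SVM on the raw feature versus on the similarity features
raw_acc = SVC(kernel="linear", C=1000).fit(x1, y1).score(x1, y1)
sim_acc = SVC(kernel="linear", C=1000).fit(X_sim, y1).score(X_sim, y1)
print(f"raw feature: {raw_acc:.2f}, similarity features: {sim_acc:.2f}")

# One landmark per instance: m instances -> m similarity features
X_all_landmarks = rbf_kernel(x1, x1, gamma=gamma)
print(X_all_landmarks.shape)  # (9, 9)
```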



## Gaussian RBF Kernel
