A Support Vector Machine (SVM) is a very powerful and versatile Machine Learning
model, capable of performing linear or nonlinear classification, regression, and even
outlier detection. 

SVMs are particularly well suited for classification of complex but small- or medium-sized datasets.

# Linear SVM Classification

Large margin classification: fitting the widest possible street (represented by the parallel dashed lines) between the classes.

<img src="img1.png" width=720 height=720 />

Notice that adding more training instances “off the street” will not affect the decision
boundary at all: it is fully determined (or “supported”) by the instances located on the
edge of the street. These instances are called the support vectors.
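The claim that only the support vectors matter can be checked with a small sketch. It assumes the setosa and versicolor classes of the iris dataset (linearly separable on the petal features) and a large C to approximate a hard margin; the variable names are illustrative. The model is refit after dropping every instance that is not a support vector, and the two decision boundaries are compared.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
mask = iris["target"] < 2                      # keep setosa (0) and versicolor (1) only
X_sep = iris["data"][mask][:, (2, 3)]          # petal length, petal width
y_sep = iris["target"][mask]

# A large C approximates hard margin classification on this separable subset.
svc_full = SVC(kernel="linear", C=1000).fit(X_sep, y_sep)
sv_idx = svc_full.support_                     # indices of the support vectors

# Refit using only the support vectors: the "off the street" instances are gone.
svc_sv_only = SVC(kernel="linear", C=1000).fit(X_sep[sv_idx], y_sep[sv_idx])

print("number of support vectors:", len(sv_idx), "out of", len(X_sep))
print("full fit    w, b:", svc_full.coef_[0], svc_full.intercept_[0])
print("SV-only fit w, b:", svc_sv_only.coef_[0], svc_sv_only.intercept_[0])
```

Both fits should report essentially the same weights and bias, up to solver tolerance.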


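SVMs are sensitive to the feature scales. When one feature spans a much larger range than another, it dominates the orientation of the widest street, so scaling the features first usually gives a better decision boundary: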

<img src="img2.png" width=720 height=720 />
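A minimal sketch of feature scaling before fitting an SVM. The four-point toy dataset and the variable names are hypothetical, chosen only so that the second feature has a much larger range than the first.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: feature 2 spans a much larger range than feature 1.
X_toy = np.array([[1.0, 50.0], [5.0, 20.0], [3.0, 80.0], [5.0, 60.0]])
y_toy = np.array([0, 0, 1, 1])

# StandardScaler subtracts each feature's mean and divides by its standard deviation.
X_scaled = StandardScaler().fit_transform(X_toy)
print("means after scaling:", X_scaled.mean(axis=0))
print("stds after scaling: ", X_scaled.std(axis=0))

# Putting the scaler in a Pipeline applies the same transform at fit and predict time.
scaled_svm = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="linear", C=100)),
])
scaled_svm.fit(X_toy, y_toy)
toy_predictions = scaled_svm.predict(X_toy)
print("predictions:", toy_predictions)
```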

## Soft Margin Classification

Hard margin classification: all instances must be off the street and on the correct side.

There are two main issues with hard margin classification:
1. It only works if the data is linearly separable.
2. It is quite sensitive to outliers.

<img src="img3.png" width=720 height=720 />

##### Soft margin classification:

Find a good balance between keeping the street as large as possible and limiting the
margin violations (i.e., instances that end up in the middle of the street or even on the
wrong side). This gives a more flexible model.


In Scikit-Learn's SVM classes, you can control this balance using the C hyperparameter:
a smaller C value leads to a wider street but more margin violations.

<img src="img4.png" width=720 height=720 />

On the left, using a high C value the classifier makes fewer margin violations but ends up with a smaller margin.

On the right, using a low C value the margin is much larger, but many instances end up on the street.


However, it seems likely that the second classifier will generalize better: in fact even on this
training set it makes fewer prediction errors, since most of the margin violations are
actually on the correct side of the decision boundary.

`If your SVM model is overfitting, you can try regularizing it by reducing C`
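The effect of C on the street can be sketched numerically. This assumes the iris petal features with Iris virginica as the positive class, uses SVC(kernel="linear") rather than LinearSVC so that the bias term is not regularized, and picks two arbitrary C values. The margin width is computed as 2 / ||w|| in the scaled feature space, and an instance counts as a margin violation when t * decision_function(x) < 1, with t in {-1, +1}.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X_iris = iris["data"][:, (2, 3)]                    # petal length, petal width
y_iris = (iris["target"] == 2).astype(np.float64)   # 1.0 if Iris virginica
t = 2 * y_iris - 1                                  # labels mapped to -1 / +1

margin_widths = {}
for C in (0.01, 100):                               # arbitrary low and high values
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svc", SVC(kernel="linear", C=C)),
    ])
    clf.fit(X_iris, y_iris)
    w = clf.named_steps["svc"].coef_[0]
    margin_widths[C] = 2 / np.linalg.norm(w)        # width of the street
    violations = int(np.sum(t * clf.decision_function(X_iris) < 1))
    print(f"C={C}: margin width={margin_widths[C]:.3f}, margin violations={violations}")
```

The lower C value should report the wider street, which is the regularizing effect described above.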



In [16]:
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import numpy as np

In [17]:
data = load_iris()
X = data["data"][:, (2, 3)]  # petal length, petal width
y = (data["target"] == 2).astype(np.float64)  # 1.0 if Iris virginica, else 0.0

In [20]:
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", LinearSVC(C=0.1, loss="hinge"))])

In [21]:
svm_clf.fit(X, y)

In [22]:
svm_clf.predict([[5.5, 1.7]])


array([1.])

Unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class.
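What a linear SVM does provide is a signed score from decision_function, whose sign determines the predicted class. A minimal sketch, assuming the same iris setup as above; dual=True and random_state=42 are set explicitly here only to keep the behavior the same across Scikit-Learn versions, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X_iris = iris["data"][:, (2, 3)]                    # petal length, petal width
y_iris = (iris["target"] == 2).astype(np.float64)   # 1.0 if Iris virginica

clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", LinearSVC(C=0.1, loss="hinge", dual=True, random_state=42)),
])
clf.fit(X_iris, y_iris)

x_new = [[5.5, 1.7]]
score = clf.decision_function(x_new)   # signed score, not a probability
pred = clf.predict(x_new)              # class chosen from the sign of the score
has_proba = hasattr(clf.named_steps["svc"], "predict_proba")

print("decision score:", score)
print("prediction:", pred)
print("LinearSVC has predict_proba:", has_proba)
```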

Alternatively, you could use the SVC class, using SVC(kernel="linear", C=1), but it is much slower, especially with large training sets, so it is not recommended. Another option is to use the SGDClassifier class, with SGDClassifier(loss="hinge", alpha=1/(m*C)), where m is the number of training instances.
This applies regular Stochastic Gradient Descent (see Chapter 4) to train a linear SVM classifier. It does not converge as fast as the LinearSVC class, but it can be useful for handling huge datasets that do not fit in memory (out-of-core training) or for online classification tasks.
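The three options can be compared side by side. This is a sketch under assumptions: the iris petal features with Iris virginica as the positive class, C=1, and random_state=42 are illustrative choices, and training accuracy is used only as a quick sanity check that the three models behave similarly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

iris = load_iris()
X_iris = iris["data"][:, (2, 3)]                    # petal length, petal width
y_iris = (iris["target"] == 2).astype(np.float64)   # 1.0 if Iris virginica

C = 1
m = len(X_iris)                                     # number of training instances

models = {
    "LinearSVC": LinearSVC(C=C, loss="hinge", dual=True, random_state=42),
    "SVC": SVC(kernel="linear", C=C),
    "SGDClassifier": SGDClassifier(loss="hinge", alpha=1 / (m * C), random_state=42),
}

accuracies = {}
for name, model in models.items():
    clf = Pipeline([("scaler", StandardScaler()), ("model", model)])
    clf.fit(X_iris, y_iris)
    accuracies[name] = clf.score(X_iris, y_iris)    # training accuracy
    print(f"{name}: training accuracy = {accuracies[name]:.3f}")
```

For data that does not fit in memory, SGDClassifier also offers partial_fit for training on one batch at a time, which is not shown here.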

##### Note:
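1. The LinearSVC class regularizes the bias term, so center the training set first by subtracting its mean. This happens automatically if you scale the data using the StandardScaler.
2. Set loss="hinge", since it is not the default value (LinearSVC defaults to loss="squared_hinge").
3. Set dual=False for better performance when there are more training instances than features. Note that LinearSVC does not accept dual=False together with loss="hinge", so this setting applies with the default squared hinge loss.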
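The settings in this note can be checked with a short sketch, again assuming the iris petal features with Iris virginica as the positive class. It reads the default loss from get_params, fits the primal problem (dual=False) with the squared hinge loss, and tries dual=False with the hinge loss inside a try/except, since to my knowledge Scikit-Learn rejects that combination with a ValueError at fit time.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X_iris = iris["data"][:, (2, 3)]                    # petal length, petal width
y_iris = (iris["target"] == 2).astype(np.float64)   # 1.0 if Iris virginica

# StandardScaler centers the data, which matters because LinearSVC
# regularizes the bias term.
X_scaled = StandardScaler().fit_transform(X_iris)
print("feature means after scaling:", X_scaled.mean(axis=0))

# The hinge loss is not the default.
default_loss = LinearSVC().get_params()["loss"]
print("default loss:", default_loss)

# Primal formulation with the default squared hinge loss.
primal_clf = LinearSVC(C=1, loss="squared_hinge", dual=False).fit(X_scaled, y_iris)
primal_accuracy = primal_clf.score(X_scaled, y_iris)
print("primal squared-hinge training accuracy:", primal_accuracy)

# Assumption being checked: hinge loss with dual=False is not supported.
try:
    LinearSVC(C=1, loss="hinge", dual=False).fit(X_scaled, y_iris)
    print("hinge loss with dual=False was accepted by this version")
except ValueError as err:
    print("hinge loss with dual=False was rejected:", err)
```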