### Section 6 

### Support Vector Machines

A Support Vector Machine (SVM) is a very powerful Machine Learning model, capable of performing linear or nonlinear classification and even outlier detection.


### Hard Margin Classification
The following figure illustrates the idea behind support vector machines, using two classes from the iris dataset and only two features.

<img src="HardMargin.png">

The two classes can clearly be separated easily with a straight line (they are linearly separable). The left plot shows the decision boundaries of three possible linear classifiers. The
model whose decision boundary is represented by the dashed line is so bad that it does not even separate the classes properly. The other two models work perfectly on
this training set, but their decision boundaries come so close to the instances that these models will probably not perform as well on new instances. In contrast, the solid line in the plot on the right represents the decision boundary of an SVM classifier;
this line not only separates the two classes but also stays as far away from the closest training instances as possible. You can think of an SVM classifier as fitting the widest possible street (represented by the parallel dashed lines) between the classes.
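This is called large margin classification. Notice that adding more training instances “off the street” will not affect the decision boundary at all: it is fully determined by the instances located on the edge of the street, which are called the support vectors.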
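
The claim that instances off the street do not move the boundary can be checked numerically. The following is a minimal sketch, assuming the two classes are Iris-Setosa and Iris-Versicolor described by their petal features, and using a large `C` (an assumed value of 1000) to approximate a hard margin. The extra instances are hypothetical points placed well inside each class.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = iris["target"]

# Keep only two linearly separable classes: setosa (0) and versicolor (1)
mask = (y == 0) | (y == 1)
X, y = X[mask], y[mask]

# A large C approximates a hard margin (assumed value, not from the text)
svm_clf = SVC(kernel="linear", C=1000)
svm_clf.fit(X, y)
w_before, b_before = svm_clf.coef_[0].copy(), svm_clf.intercept_[0]

# Hypothetical extra instances, well inside each class and off the street
X_extra = np.array([[0.5, 0.1], [1.0, 0.2], [5.0, 1.8], [4.5, 1.5]])
y_extra = np.array([0, 0, 1, 1])

svm_clf_more = SVC(kernel="linear", C=1000)
svm_clf_more.fit(np.vstack([X, X_extra]), np.concatenate([y, y_extra]))
w_after, b_after = svm_clf_more.coef_[0], svm_clf_more.intercept_[0]

print("support vectors:\n", svm_clf.support_vectors_)
print("w before:", w_before, "b before:", b_before)
print("w after: ", w_after, "b after: ", b_after)
```

Since the boundary depends only on the instances on the edge of the street, the two fits should report nearly identical `w` and `b`, up to the solver tolerance.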

### Scale Sensitivity
SVMs are sensitive to the feature scales, as you can see in the following
Figure: on the left plot, the vertical scale is much larger than the
horizontal scale, so the widest possible street is close to horizontal.
After feature scaling (e.g., using Scikit-Learn’s StandardScaler),
the decision boundary looks much better (on the right plot).

<img src='scale_sensitivity.png'>

In [4]:
from sklearn.preprocessing import StandardScaler
data = [[5, 20], [1, 50], [3, 80], [5, 60]]
scaler = StandardScaler()
scaler.fit(data)
print(scaler.mean_)
print(scaler.transform(data))

[ 3.5 52.5]
[[ 0.90453403 -1.5011107 ]
 [-1.50755672 -0.11547005]
 [-0.30151134  1.27017059]
 [ 0.90453403  0.34641016]]


### Feature Scaling
Feature scaling is therefore important. For example, if one feature has a variance that is orders of magnitude larger than the others, it might dominate the objective function and prevent the estimator from learning from the other features as expected.

<img src='scale_sensitivity2.png'>

Since $X$ is a vector of $X_1$ and $X_2$:
$$X=(X_1,X_2)$$
$W$ is a vector of $W_1$ and $W_2$:
$$W=(W_1,W_2)$$
So if the range of $X_2$ is much larger than the range of $X_1$, $X_2$ will dominate the objective function.
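
To make the domination effect concrete, here is a small sketch that fits a linear SVM on the four points from the scaling example, before and after standardization. The class labels and `C=100` are assumptions made for illustration, and `contribution` is a hypothetical helper that measures how much each term $W_i X_i$ varies across the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same four points as the StandardScaler example; the labels are assumed
X = np.array([[1, 50], [5, 20], [3, 80], [5, 60]], dtype=float)
y = np.array([0, 0, 1, 1])

svm_raw = SVC(kernel="linear", C=100).fit(X, y)

X_scaled = StandardScaler().fit_transform(X)
svm_scaled = SVC(kernel="linear", C=100).fit(X_scaled, y)

def contribution(w, X):
    """Spread of each term w_i * x_i over the dataset (hypothetical helper)."""
    return np.abs(w) * (X.max(axis=0) - X.min(axis=0))

c_raw = contribution(svm_raw.coef_[0], X)
c_scaled = contribution(svm_scaled.coef_[0], X_scaled)

print("unscaled: w =", svm_raw.coef_[0], "term spreads =", c_raw)
print("scaled:   w =", svm_scaled.coef_[0], "term spreads =", c_scaled)
```

On the unscaled data, the spread of the second term is far larger than that of the first, so the decision function is driven almost entirely by $X_2$.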

### Soft Margin Classification
If we strictly impose that all instances be off the street and on the right side, this is
called hard margin classification. Hard margin classification has two main issues: it only
works if the data is linearly separable, and it is quite sensitive to outliers. The figure
below shows the iris dataset with just one additional outlier: on the left, it is impossible
to find a hard margin.

<img src='softvshard.png'>

To avoid these issues it is preferable to use a more flexible model. The objective is to
find a good balance between keeping the street as large as possible and limiting the
margin violations (i.e., instances that end up in the middle of the street or even on the
wrong side). This is called soft margin classification.
In Scikit-Learn’s SVM classes, you can control this balance using the C hyperparameter:
a smaller C value leads to a wider street but more margin violations. The following Figure
shows the decision boundaries and margins of two soft margin SVM classifiers on a
nonlinearly separable dataset. On the left, using a low C value the margin is quite
large, but many instances end up on the street. On the right, using a high C value the
classifier makes fewer margin violations but ends up with a smaller margin. However,
it seems likely that the first classifier will generalize better: in fact even on this training
set it makes fewer prediction errors, since most of the margin violations are
actually on the correct side of the decision boundary.
<img src='diffC.png'>

If your SVM model is overfitting, you can try regularizing it by
reducing C.

The following Scikit-Learn code loads the iris dataset, scales the features, and then trains a linear SVM model (using the SVC (Support Vector Classifier) class with C = 1 and a linear kernel) to detect Iris-Virginica flowers. The resulting model is the one shown in the right part of the previous figure.

In [16]:
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 if Iris-Virginica, else 0.0

# Pipeline steps are passed as a list of (name, estimator) tuples
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", SVC(C=1, kernel="linear")),
])
svm_clf.fit(X, y)
svm_clf.predict([[5.5, 1.7]])

array([1.])
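
The effect of C described above can also be observed directly. The sketch below refits the same pipeline for a few values of C and reports the street width, computed as $2/\|w\|$ in the scaled feature space, along with the number of support vectors. The C values and the `fit_linear_svc` helper are illustrative assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # Iris-Virginica

def fit_linear_svc(C):
    """Fit the scaler + linear SVC pipeline for a given C (hypothetical helper)."""
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("linear_svc", SVC(C=C, kernel="linear")),
    ])
    return clf.fit(X, y)

results = {}
for C in (0.01, 1, 100):  # illustrative values
    svc = fit_linear_svc(C).named_steps["linear_svc"]
    width = 2 / np.linalg.norm(svc.coef_)  # street width in scaled space
    results[C] = (width, len(svc.support_))
    print(f"C={C}: street width={width:.3f}, support vectors={len(svc.support_)}")
```

A smaller C should give a wider street, which is the regularizing effect that helps when the model is overfitting.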

### Nonlinear SVM Classification
Although linear SVM classifiers are efficient and work surprisingly well in many cases, many datasets are not even close to being linearly separable. One approach to handling nonlinear datasets is to add more features, such as polynomial features; in some cases this can result in a linearly separable dataset. Consider the left plot in the following figure: it represents a simple dataset with just one feature $x_1$. This dataset is not linearly separable, as you can see. But if you add a second feature $x_2 = (x_1)^2$, the resulting 2D dataset is perfectly linearly separable.
<img src='nonLinearSVM.png'>

To implement this idea using Scikit-Learn, you can create a Pipeline containing a PolynomialFeatures transformer, followed by a StandardScaler and a LinearSVC. Alternatively, you can use a StandardScaler followed by an SVC with a polynomial kernel.

Let’s test this on the moons
dataset: this is a toy dataset for binary classification in which the data points are shaped
as two interleaving half circles. You can generate this dataset
using the make_moons() function.
The following figure shows the features of the moons dataset:
<img src='moon_features.png'>
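
As a rough sketch of the first approach (PolynomialFeatures, then StandardScaler, then LinearSVC), the pipeline below is trained on a generated moons dataset. The values of `n_samples`, `noise`, `random_state`, `C`, `loss` and `max_iter` are assumptions chosen for illustration, not values taken from this section.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

# Assumed dataset settings, fixed for reproducibility
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

poly_features_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),  # add polynomial terms
    ("scaler", StandardScaler()),                     # SVMs are scale sensitive
    ("svm_clf", LinearSVC(C=10, loss="hinge", max_iter=10_000, random_state=42)),
])
poly_features_svm_clf.fit(X, y)

train_accuracy = poly_features_svm_clf.score(X, y)
print(f"training accuracy: {train_accuracy:.2f}")
```

The classifier itself is still linear; the curved boundary comes entirely from the added polynomial features.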

In [17]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scale the features, then train an SVC with a degree-3 polynomial kernel
polynomial_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=100, C=5)),
])



Applying the above code to the moons dataset and changing the polynomial degree from 3 to 100 gives the following
plots. You can see the code in the Not_required.ipynb file.
<img src='polynomials.png'>

Obviously, if your model is overfitting, you might want to reduce the polynomial degree. Conversely, if it is underfitting, you can try increasing it.
Note: the plot on the right is overfitting.
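
One way to judge whether a given degree underfits or overfits is to compare training accuracy with cross-validated accuracy. The sketch below does this on a generated moons dataset for a few degrees. The degrees, the dataset settings and `coef0=1` are assumptions chosen for illustration (the pipeline above uses `coef0=100`).

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed dataset settings, fixed for reproducibility
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

scores = {}
for degree in (1, 3, 10):  # illustrative degrees
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="poly", degree=degree, coef0=1, C=5)),
    ])
    train_acc = clf.fit(X, y).score(X, y)            # accuracy on training data
    cv_acc = cross_val_score(clf, X, y, cv=5).mean()  # accuracy on held-out folds
    scores[degree] = (train_acc, cv_acc)
    print(f"degree={degree}: train={train_acc:.2f}, cross-val={cv_acc:.2f}")
```

A large gap between the two numbers suggests overfitting, while low values for both suggest underfitting.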

#### Derivation of Linear SVM Functions
The linear SVM classifier model predicts the class of a new instance x by simply computing
the decision function
$$w^T x + b = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$$

If the result is positive, the predicted class ŷ is the positive class (target variable = 1);
otherwise it is the negative class.
<img src='equation.png'>
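
The prediction rule can be reproduced by hand from a fitted model. The following sketch, assuming the scaled petal features and a linear-kernel SVC with C = 1, computes $w^T x + b$ with NumPy and compares it with what Scikit-Learn returns.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(int)  # 1 = Iris-Virginica, 0 = other

X_scaled = StandardScaler().fit_transform(X)
svm_clf = SVC(kernel="linear", C=1).fit(X_scaled, y)

w = svm_clf.coef_[0]       # weight vector (w1, w2)
b = svm_clf.intercept_[0]  # bias term

scores = X_scaled @ w + b          # w^T x + b for every instance
y_pred = (scores > 0).astype(int)  # positive score -> positive class

print("w =", w, "b =", b)
print("matches decision_function:",
      np.allclose(scores, svm_clf.decision_function(X_scaled)))
print("matches predict:", np.array_equal(y_pred, svm_clf.predict(X_scaled)))
```

The `coef_` attribute is only available for a linear kernel, which is why this check uses `kernel="linear"`.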

The following figure shows the decision function of a linear SVM: it is a two-dimensional plane since this dataset has two features (petal width and petal length). The decision boundary is the set of points where the decision function is equal to 0: it is the intersection of two planes, which is a straight line (represented by the thick solid line). The dashed lines represent the points where the decision function is equal to 1 or –1: they are parallel to the decision boundary and at equal distance from it, forming a margin around it. Training a linear SVM classifier means finding the values of w and b that make this margin as wide as possible while avoiding margin violations (hard margin) or limiting them (soft margin).
<img src='decisionfunction.png'>

### Why Minimize ||w||
Consider the slope of the decision function: it is equal to the norm of the weight vector,
∥ w ∥. If we divide this slope by 2, the points where the decision function is equal
to ±1 are going to be twice as far away from the decision boundary. In other words,
dividing the slope by 2 will multiply the margin by 2. Perhaps this is easier to visualize
in 2D in the following figure. The smaller the weight vector w, the larger the margin.
<img src='weightsdecreased.png'>
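
The relationship can be checked with a few lines of arithmetic. The lines where the decision function equals +1 and –1 are each at distance $1/\|w\|$ from the decision boundary, so the street is $2/\|w\|$ wide. The weight vector below is a hypothetical example, and `margin_width` is a helper introduced only for this illustration.

```python
import numpy as np

def margin_width(w):
    """Distance between the lines w^T x + b = -1 and w^T x + b = +1."""
    return 2.0 / np.linalg.norm(w)

w = np.array([1.0, 2.0])  # hypothetical weight vector

for scale in (1.0, 0.5, 0.25):
    w_scaled = scale * w
    print(f"||w|| = {np.linalg.norm(w_scaled):.3f} "
          f"-> margin = {margin_width(w_scaled):.3f}")
```

Each time the norm of w is halved, the margin doubles, which is why training minimizes ∥ w ∥.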