# Linear SVM Classification
<br>
<img src="../img/large_mrg_cls.png" alt="Large Margin Classification" style="width: 500px;"/> <br>
The classes in both plots can be separated by a straight line (they are linearly separable). The left plot shows the decision boundaries of three possible linear classifiers. The model whose decision boundary is represented by the dashed line is so bad that it does not even separate the classes properly. The other two models work perfectly on the training set, but their decision boundaries come so close to the instances that these models will probably not perform as well on new instances. <br><br>
Large margin classification - fitting the widest possible street (represented by the parallel dashed lines) between the classes. <br><br>
Support Vectors - adding more training instances "off the street" will not affect the decision boundary at all: it is fully determined (or "supported") by the instances located on the edge of the street. These are called support vectors (they are circled in the image above).<br> <br>
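
The claim that off-street instances do not move the boundary can be checked numerically. The sketch below is only an illustration: the toy points are made up, and `SVC(kernel="linear")` with a large `C` is assumed here as a stand-in for a hard margin classifier.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (values chosen only for illustration).
X_toy = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates hard margin classification.
clf = SVC(kernel="linear", C=1000).fit(X_toy, y_toy)
print("support vectors:\n", clf.support_vectors_)

# Add two instances that lie well off the street, each on its own side.
X_more = np.vstack([X_toy, [[0, 0], [6, 6]]])
y_more = np.append(y_toy, [0, 1])
clf_more = SVC(kernel="linear", C=1000).fit(X_more, y_more)

# The boundary parameters should agree up to solver tolerance.
print("before:", clf.coef_, clf.intercept_)
print("after: ", clf_more.coef_, clf_more.intercept_)
```

Only the instances listed in `support_vectors_` determine the fitted boundary, which is why the two models should report matching coefficients.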
<img src="../img/sensitivity_to_feat_scale.png" alt="Sensitivity to feature scaling" style="width: 500px;"/> <br> 
SVMs are sensitive to feature scales. When the vertical scale is much larger than the horizontal scale, the widest possible street is close to horizontal. 
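
A minimal sketch of this effect, assuming a made-up four-point dataset whose second feature has a much larger scale than the first. It fits a linear-kernel `SVC` before and after `StandardScaler` and prints the learned weights so the two fits can be compared.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data: the second feature spans a much larger range (illustrative values).
X_raw = np.array([[1, 50], [5, 20], [3, 80], [5, 60]], dtype=float)
y_raw = np.array([0, 0, 1, 1])

# Standardize each feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_raw)

svm_raw = SVC(kernel="linear", C=100).fit(X_raw, y_raw)
svm_scaled = SVC(kernel="linear", C=100).fit(X_scaled, y_raw)

# Compare the weight vectors learned on raw and on scaled features.
print("weights on raw features:   ", svm_raw.coef_)
print("weights on scaled features:", svm_scaled.coef_)
```

Putting the scaler in a `Pipeline` in front of the SVM, as the code cells in this notebook do, applies the same transformation automatically at both training and prediction time.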

# Soft Margin Classification
Hard margin classification - strictly requiring that all instances be off the street and on the correct side. <br>
Hard margin classification has two main issues: first, it only works if the data is linearly separable; second, it is sensitive to outliers. The image below illustrates these issues.<br>
<img src="../img/hard_margin_sens.png" alt="Hard Margin Sensitivity" style="width: 500px;"/> <br>
To avoid these issues, use a more flexible model. The objective is to find a good balance between keeping the street as large as possible and limiting the margin violations. This is called soft margin classification.<br>
<br>
When creating an SVM model using Scikit-Learn, you can specify a number of hyperparameters, and C is one of them. If your SVM model is overfitting, you can try regularizing it by reducing C. If we set C to a low value, we end up with the model on the left in the image below; with a high value, we get the model on the right. Margin violations are bad, and it is usually better to have few of them. However, in this case the model on the left has a lot of margin violations but will probably generalize better.<br>
<img src="../img/c_hyperparameter.png" alt="Reducing C Hyperparameter" style="width: 500px;"/><br> 
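
The effect of C on the street can be sketched as follows. This is an illustration under assumptions: it uses `SVC(kernel="linear")` rather than `LinearSVC` so that the weight vector comes from the standard soft margin formulation, and it measures the street width as 2 / ||w||. The names `street_width`, `width_low_c` and `width_high_c` are introduced here for the example.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_iris = iris["data"][:, [2, 3]]                    # petal length, petal width
y_iris = (iris["target"] == 2).astype(np.float64)   # Iris Virginica or not

X_iris_scaled = StandardScaler().fit_transform(X_iris)

def street_width(C):
    """Fit a linear soft margin SVM and return the width of its street."""
    model = SVC(kernel="linear", C=C).fit(X_iris_scaled, y_iris)
    return 2 / np.linalg.norm(model.coef_)

width_low_c = street_width(0.01)
width_high_c = street_width(100)
print("street width with C=0.01:", width_low_c)
print("street width with C=100: ", width_high_c)
```

A lower C should give the wider street, at the cost of more instances ending up inside it.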

In [1]:
# Loads the Iris dataset, scales the features, and then trains a linear SVM model (using the LinearSVC class with C=1
# and the hinge loss function) to detect Iris Virginica flowers
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2,3)] # petal length, petal width
y = (iris["target"] == 2).astype(np.float64) #Iris Virginica

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge"))
])
svm_clf.fit(X, y)  # train the scaler and the linear SVM on the two petal features
# Unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class.
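
A short sketch of using such a pipeline for prediction. The names `demo_clf`, `X_demo`, `y_demo` and the flower measurements in `new_flower` are hypothetical, introduced only for this example. Instead of class probabilities, `LinearSVC` exposes a signed score through `decision_function`.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X_demo = iris["data"][:, [2, 3]]                    # petal length, petal width
y_demo = (iris["target"] == 2).astype(np.float64)   # Iris Virginica or not

demo_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", random_state=42)),
])
demo_clf.fit(X_demo, y_demo)

new_flower = [[5.5, 1.7]]  # hypothetical petal length and width, in cm
print("predicted class:", demo_clf.predict(new_flower))
print("signed score:   ", demo_clf.decision_function(new_flower))
print("has predict_proba:", hasattr(demo_clf, "predict_proba"))
```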




# Nonlinear SVM Classification
Linear SVM classifiers are efficient and work well with many datasets. However, a lot of datasets are not even close to being linearly separable. One way to handle this is to add more features: in the example below, adding a second feature to a one-feature dataset gives a 2D dataset that is perfectly linearly separable.<br>
<img src="../img/nonlinear_svm.png" alt="Nonlinear SVM" style="width: 500px;"/><br>
<img src="../img/linearsvm_using_poly_features.png" alt="LinearSVM using Polynomial Features" style="width: 500px;"/><br>
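
The idea of adding a feature can be sketched on a made-up one-feature dataset. The values, the labels, and the choice of the squared feature are assumptions for illustration; a linear-kernel `SVC` with a large `C` stands in for a linear classifier.

```python
import numpy as np
from sklearn.svm import SVC

# One feature; the positive class sits in the middle, so no single threshold works.
x1 = np.arange(-4, 5, dtype=float)          # -4, -3, ..., 4
labels = (np.abs(x1) <= 1).astype(int)      # 1 for -1, 0, 1 and 0 otherwise

X_1d = x1.reshape(-1, 1)
X_2d = np.c_[x1, x1 ** 2]                   # add a second feature: x1 squared

acc_1d = SVC(kernel="linear", C=1000).fit(X_1d, labels).score(X_1d, labels)
acc_2d = SVC(kernel="linear", C=1000).fit(X_2d, labels).score(X_2d, labels)

print("training accuracy with one feature: ", acc_1d)
print("training accuracy with two features:", acc_2d)
```

In the 2D space the classes are split by a horizontal line on the squared feature, which a linear classifier can find.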

In [4]:
# To implement a nonlinear SVM, create a pipeline containing a PolynomialFeatures transformer, followed by a
# StandardScaler and a LinearSVC.
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=100, noise=0.15)
polynomial_svm_clf = Pipeline([
    ("poly_Features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge"))
])

polynomial_svm_clf.fit(X, y)

Pipeline(steps=[('poly_Features', PolynomialFeatures(degree=3)),
                ('scaler', StandardScaler()),
                ('svm_clf', LinearSVC(C=10, loss='hinge'))])

# Polynomial Kernels
A low polynomial degree cannot deal with very complex datasets, while a high polynomial degree creates a huge number of features, making the model too slow. <br>
However, when using SVMs, you can apply a mathematical technique called the kernel trick.<br>
Kernel Trick - Makes it possible to get the same result as if you had added many polynomial features, even with very high-degree polynomials, without actually having to add them. It is implemented by the SVC class. <br>
<br>
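
A small numeric check of the idea behind the kernel trick, as a sketch under assumptions: it uses the second-degree polynomial kernel K(a, b) = (a · b)² on 2D vectors and the textbook explicit feature map `phi`, neither of which appears in the notes above. The vectors `a` and `b` are arbitrary.

```python
import numpy as np

def phi(x):
    """Explicit second-degree feature map for a 2D vector (no bias term)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Second-degree polynomial kernel computed in the original 2D space."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

explicit = np.dot(phi(a), phi(b))   # map to 3D first, then take the dot product
via_kernel = poly_kernel(a, b)      # same value without ever building phi

print(explicit, via_kernel)         # both equal 16 up to floating-point rounding
```

The kernel gives the dot product in the expanded feature space while working only with the original vectors, which is what lets an SVM behave as if the polynomial features had been added.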
The classifier trained by the code below is plotted here: <br>
<img src="../img/svm_clf_poly_kernel.png" alt="Polynomial clf with polynomial kernel" style="width: 500px;">

In [5]:
# Trains an SVM classifier using a third-degree polynomial kernel.
# coef0 controls how much the model is influenced by high-degree versus low-degree terms.
from sklearn.svm import SVC

poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
])
poly_kernel_svm_clf.fit(X, y)

Pipeline(steps=[('scaler', StandardScaler()),
                ('svm_clf', SVC(C=5, coef0=1, kernel='poly'))])

# Similarity Features
Another technique to tackle nonlinear problems is to add features computed using a similarity function.<br>
Similarity Function - Measures how much each instance resembles a particular landmark.<br>
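
A minimal sketch of similarity features. The notes above do not name a particular similarity function, so the Gaussian RBF, exp(-gamma * ||x - landmark||²), is assumed here; the landmarks, `gamma`, and the data values are also made up for illustration.

```python
import numpy as np

# Three 1D instances and two landmarks (illustrative values).
X_1d = np.array([[-1.0], [0.0], [2.0]])
landmarks = np.array([[-2.0], [1.0]])
gamma = 0.3

# Squared distance from every instance to every landmark,
# giving an array of shape (n_instances, n_landmarks).
sq_dists = ((X_1d[:, np.newaxis, :] - landmarks[np.newaxis, :, :]) ** 2).sum(axis=2)

# Gaussian RBF similarity: 1 at the landmark, decaying towards 0 far from it.
similarity_features = np.exp(-gamma * sq_dists)
print(similarity_features)
```

Each row is the new representation of one instance: one similarity feature per landmark. For the instance at -1, the two features are exp(-0.3) and exp(-1.2), roughly 0.74 and 0.30.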


# SVM Regression
To use SVM for regression instead of classification, the trick is to reverse the objective: instead of trying to fit the largest possible street between two classes while limiting margin violations, SVM regression tries to fit as many instances as possible on the street while limiting margin violations (i.e., instances off the street). <br>

<img src="../img/svm_reg.png" alt="SVM Regression" style="width: 500px;"> <br>
Adding more instances within the margin does not affect the model's predictions; thus, the model is said to be &epsilon;-insensitive.
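
One way to see where the insensitivity comes from is the &epsilon;-insensitive loss, max(0, |y - prediction| - &epsilon;), which is zero for any instance inside the street. The sketch below assumes this standard form of the loss; the function name and the numbers are hypothetical.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the street; grows linearly with the error outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.2, 2.0, 5.0, 3.0])

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5)
print(losses)  # errors of 0.2 and 0.0 cost nothing; errors of 2.0 and 1.0 cost 1.5 and 0.5
```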

In [6]:
# Tackle linear SVM regression
# The LinearSVR class is the regression equivalent of the LinearSVC class.
from sklearn.svm import LinearSVR

svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(X, y)  # X, y are still the make_moons data from In [4], reused here only to show the API

LinearSVR(epsilon=1.5)
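
For a regression target that is actually continuous, a sketch along the following lines may be more telling. The synthetic linear data, the random seed, and the names `X_reg`, `y_reg` and `svm_reg_demo` are assumptions introduced for this example.

```python
import numpy as np
from sklearn.svm import LinearSVR

# Synthetic, roughly linear data: y = 4 + 3x + Gaussian noise (made-up values).
rng = np.random.default_rng(42)
X_reg = 2 * rng.random((50, 1))
y_reg = 4 + 3 * X_reg.ravel() + rng.normal(size=50)

epsilon = 1.5
svm_reg_demo = LinearSVR(epsilon=epsilon, random_state=42)
svm_reg_demo.fit(X_reg, y_reg)

# Instances whose absolute error is at most epsilon lie on the street.
residuals = np.abs(y_reg - svm_reg_demo.predict(X_reg))
inside = residuals <= epsilon
print(inside.sum(), "of", len(y_reg), "training instances are on the street")
```

A larger `epsilon` widens the street, so more instances fall on it and contribute nothing to the loss.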

In [7]:
# Tackle non-linear SVM regression
# SVR class is the regression equivalent to SVC class.
from sklearn.svm import SVR

svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(X, y)

SVR(C=100, degree=2, kernel='poly')