# Support Vector Machines

## What is an SVM?

A support vector machine (SVM) is a supervised learning algorithm that can be used for both classification and regression, though it is mostly used for classification. SVMs classify data by finding a hyperplane (a "dividing line" in two dimensions, or more generally a flat boundary that splits the input space) between the classes in the training data. The hyperplane is chosen to **maximize the distance between the hyperplane and the closest data points of each class (the "margin")**.
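The margin idea can be written as an optimization problem. This uses the standard formulation, which is not spelled out in the original text. Class labels are $y_i \in \{-1, +1\}$ and the hyperplane is $w^T x + b = 0$. The two margin boundaries $w^T x + b = \pm 1$ are $2/\lVert w \rVert$ apart, so maximizing the margin is the same as minimizing $\lVert w \rVert$. The soft-margin version below adds slack variables $\xi_i$ for points that violate the margin, weighted by the regularization parameter $C$. Forcing all $\xi_i = 0$ gives the hard-margin SVM.

$$ \min_{w,\,b,\,\xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\,(w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 $$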

<img src="https://miro.medium.com/max/469/0*j6b6qNc-E0RfBxFj">

## How does the SVM draw the hyperplane?

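The hyperplane is chosen as the dividing line that separates the data points *as widely as possible*, which is why the margin is maximized. The **support vectors** are the training points closest to this boundary. Two parallel hyperplanes mark the edges of the margin: one passes through the closest point(s) of class A and the other through the closest point(s) of class B. The final hyperplane lies midway between them. In practice the SVM does not draw these lines one after another. It solves a single optimization problem that finds the widest gap directly, and the solution depends only on the support vectors.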
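The sketch below shows the geometry above with scikit-learn. It assumes two well-separated synthetic clusters from `make_blobs`, which are not part of the original notebook. With a linear kernel and a very large `C`, the fit approximates a hard margin. `coef_` holds $w$, `intercept_` holds $b$, and the margin width is $2/\lVert w \rVert$. The support vectors should sit on the boundary hyperplanes where $w^T x + b = \pm 1$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters in 2D (illustrative data only)
X, y = make_blobs(n_samples=60, centers=[[-3, -3], [3, 3]], cluster_std=0.8, random_state=0)

# A very large C approximates the hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2 / np.linalg.norm(w)

# Support vectors lie on the margin boundaries, where w.x + b = +1 or -1
sv_scores = clf.support_vectors_ @ w + b

print("support vectors:\n", clf.support_vectors_)
print("margin width:", round(margin_width, 3))
print("w.x + b at the support vectors:", np.round(sv_scores, 3))
```

Only the few points printed as support vectors determine `w` and `b`. Moving any other training point, without crossing the margin, leaves the hyperplane unchanged.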

## How are SVM algorithms implemented in practice?

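We use something called the *kernel trick*, which maps the lower-dimensional input data into a higher-dimensional feature space. The mapping is implicit. A kernel function returns the inner product of two points in that space directly from their original coordinates, so the transformed data never has to be computed.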

Why? *So it will be easier to find a hyperplane that can separate the data.* 

<img src="https://miro.medium.com/max/700/0*ZnINGVLyQZfrcZYG">
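This sketch makes "easier to separate" concrete by building a mapping by hand. It assumes scikit-learn's `make_circles` toy data, which is not from the original notebook. No straight line can split concentric circles in 2D. Adding a third feature $z = x_1^2 + x_2^2$, the squared distance from the origin, lets a flat plane separate them. A kernel performs a mapping like this implicitly; here it is done explicitly for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Outer circle (class 0) and inner circle (class 1), without noise
X, y = make_circles(n_samples=200, factor=0.5, noise=0.0, random_state=0)

# Linear SVM on the raw 2D inputs: no straight line separates the circles
acc_2d = SVC(kernel="linear").fit(X, y).score(X, y)

# Explicit feature map to 3D: (x1, x2) -> (x1, x2, x1^2 + x2^2)
X_3d = np.column_stack([X, (X ** 2).sum(axis=1)])

# In 3D a plane (roughly z = constant) separates the two circles
acc_3d = SVC(kernel="linear", C=1000).fit(X_3d, y).score(X_3d, y)

print(f"linear SVM in 2D: {acc_2d:.2f}, linear SVM after lifting to 3D: {acc_3d:.2f}")
```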

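How do we implement the kernel trick? In its dual form, the linear SVM uses the training data only through the **inner product** of pairs of observations. The inner product of two input vectors is the sum of each pair of corresponding input values multiplied together, $\langle x, z \rangle = \sum_j x_j z_j$. Replacing this inner product with a kernel function $K(x, z)$ turns the linear SVM into a non-linear one without changing the rest of the algorithm.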
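The sketch below checks both ideas from this paragraph numerically, using arbitrary example vectors. First, the inner product is a sum of element-wise products. Second, a kernel returns an inner product in a feature space without building that space. For the degree-2 polynomial kernel $K(x, z) = (x^T z)^2$ in two dimensions, one explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. This is a standard textbook example, not the only possible map. The helper names are illustrative.

```python
import numpy as np

def inner_product(a, b):
    # Sum of each pair of corresponding entries multiplied together
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(v):
    # Explicit degree-2 feature map for 2D inputs
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    # Kernel trick: computed from the 2D inputs only, never building phi
    return inner_product(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print(inner_product(x, z))            # 11.0
print(poly2_kernel(x, z))             # 121.0
print(inner_product(phi(x), phi(z)))  # approximately 121.0 (floating-point rounding)
```

For higher degrees and more features, the explicit map grows quickly. The kernel still costs only one inner product in the original space.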

A [kernel method](https://en.wikipedia.org/wiki/Kernel_method) uses kernel functions to transform the input data. Types of kernel functions used in SVM include:
- Linear kernel (the plain inner product, as described above)
- Polynomial kernel
- RBF (Radial Basis Function) kernel


We would typically use a polynomial or RBF kernel (RBF is the more common choice) when the data set is not linearly separable.
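In scikit-learn, the `kernel` argument of `SVC` selects the kernel. The sketch below assumes the `make_moons` toy data set, two interleaving half-circles that are not linearly separable. It fits one SVM for each kernel in the list above and reports held-out accuracy. The scores depend on the noise level and random seed, so no particular numbers are claimed here.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for kernel in ["linear", "poly", "rbf"]:
    # degree is only used by the polynomial kernel (3 is also sklearn's default)
    clf = SVC(kernel=kernel, degree=3, C=1.0).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
    print(f"{kernel:>6} kernel: test accuracy {scores[kernel]:.3f}")
```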

### Polynomial kernel
Instead of the plain inner product, we can use a polynomial kernel function of the input vectors $x_1$ and $x_2$, where $c \ge 0$ is a constant and $d$ is the degree of the polynomial:

$$ K(x_1, x_2) = (x_1^Tx_2 + c)^d $$
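The formula can be checked against scikit-learn's `polynomial_kernel`, which computes $(\gamma\, x_1^T x_2 + c_0)^d$. With `gamma=1.0` and `coef0=c`, it matches the expression above. The random data and the values $c = 1$, $d = 3$ are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
X1 = rng.normal(size=(4, 3))  # 4 samples, 3 features
X2 = rng.normal(size=(5, 3))  # 5 samples, 3 features
c, d = 1.0, 3

# Direct implementation of K(x1, x2) = (x1^T x2 + c)^d for every pair of rows
K_manual = (X1 @ X2.T + c) ** d

# scikit-learn version: (gamma * <x1, x2> + coef0) ** degree
K_sklearn = polynomial_kernel(X1, X2, degree=d, gamma=1.0, coef0=c)

print(K_manual.shape)                    # (4, 5): one entry per pair of samples
print(np.allclose(K_manual, K_sklearn))  # True
```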

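### RBF kernel
The RBF (Gaussian) kernel is defined as

$$ K(x_1, x_2) = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{2\sigma^2}\right) $$

where $\lVert x_1 - x_2 \rVert^2$ is the squared Euclidean distance between the two feature vectors and $\sigma$ is a free parameter that sets the width of the kernel. scikit-learn writes the same kernel as $K(x_1, x_2) = \exp(-\gamma \lVert x_1 - x_2 \rVert^2)$ with $\gamma = \frac{1}{2\sigma^2}$.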
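This sketch checks the RBF formula against scikit-learn's `rbf_kernel`, passing $\gamma = 1/(2\sigma^2)$. The value $\sigma = 1.5$ and the random data are arbitrary. Note that $K(x, x) = 1$ and the kernel decays towards 0 as points move apart, so it behaves like a similarity score.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X1 = rng.normal(size=(4, 3))
X2 = rng.normal(size=(5, 3))
sigma = 1.5

# Squared Euclidean distance between every row of X1 and every row of X2
sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-sq_dists / (2 * sigma ** 2))

# scikit-learn parameterization: exp(-gamma * ||x1 - x2||^2)
K_sklearn = rbf_kernel(X1, X2, gamma=1 / (2 * sigma ** 2))

print(np.allclose(K_manual, K_sklearn))           # True
print(np.allclose(np.diag(rbf_kernel(X1)), 1.0))  # True: K(x, x) = 1
```

A small $\sigma$ (large $\gamma$) makes the similarity drop off quickly, which gives very local, wiggly decision boundaries. A large $\sigma$ gives smoother ones.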

## When would you use SVM over Random Forest?
* When the data set is not linearly separable: the SVM can then use the kernel trick, for example with an RBF kernel.
* When the data is very high-dimensional, for example in text classification and other NLP problems.
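Which model performs better is an empirical question, so a fair comparison cross-validates both on the data at hand. This sketch uses a synthetic high-dimensional data set from `make_classification` (500 features, only 20 informative) as a stand-in for something like text features. It illustrates the procedure, not a general result.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=500, n_informative=20, random_state=0)

models = {
    # SVMs are scale-sensitive, so standardize inside the pipeline (refit on each fold)
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean CV accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```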


```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import accuracy_score
```

```python
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)
```

```python
# Standardize each feature to zero mean and unit variance (SVMs are sensitive to feature scale)
scaler = StandardScaler()
X = scaler.fit_transform(X)
```

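```python
# C is the regularization parameter: a smaller C tolerates more margin violations
# (wider, softer margin); a larger C penalizes misclassified training points more
C = 10

# Kernel for the optional Gaussian process classifier; an isotropic RBF
# (a single length scale) works for any number of features
kernel = 1.0 * RBF(length_scale=1.0)

models = {
    # probability=True enables predict_proba (calibrated with internal cross-validation)
    'Linear SVC': SVC(kernel='linear', C=C, probability=True, random_state=0),
    'RBF SVC': SVC(kernel='rbf', C=C, probability=True, random_state=0),
    # 'GaussianProcessClassifier': GaussianProcessClassifier(kernel),
}

n_models = len(models)
```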
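The sketch below shows what `C` does. It assumes overlapping synthetic clusters from `make_blobs`, not the notebook's data, and fits a linear SVM with a small and a large `C`. It reports the margin width $2/\lVert w \rVert$ and the number of support vectors for each. A smaller `C` tolerates more margin violations, which gives a margin at least as wide and typically more support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so the fit must trade margin width against violations
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

summary = {}
for C in [0.01, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])
    n_sv = clf.n_support_.sum()  # support vectors summed over both classes
    summary[C] = (margin, n_sv)
    print(f"C={C:>6}: margin width {margin:.3f}, support vectors {n_sv}")
```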

```python
def classify_data(X, y, models):
    """Fit each model, print its training accuracy and return its class probabilities."""
    model_probs = []
    for name, model in models.items():
        model.fit(X, y)
        y_pred = model.predict(X)
        accuracy = accuracy_score(y, y_pred)
        print(f"Training Accuracy for {name}: {accuracy:.1%}")

        # predict_proba returns one column per class, in the order of model.classes_
        probs = model.predict_proba(X)
        model_probs.append(probs)

        # Number of distinct classes the model actually predicted
        n_classes = np.unique(y_pred).size
        print(n_classes)

    return model_probs
```

```python
# models['Linear SVC'].classes_ gives the class order of the predict_proba columns
model_probs = classify_data(X, y, models)
```

```
Training Accuracy for Linear SVC: 73.2%
2
Training Accuracy for RBF SVC: 98.1%
2
```
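The accuracies above come from the same data the models were trained on, so they are optimistic; a flexible model such as the RBF SVC can partly fit noise in the training set. The sketch below does a held-out evaluation with `train_test_split` on the same `make_classification` setup. The scaler is fit on the training split only. The 25% test size and `random_state` are arbitrary choices, and no specific test scores are claimed.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           random_state=0, shuffle=False)

# Stratified split; train_test_split shuffles, so the class-ordered rows get mixed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Fit the scaler on training data only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

test_scores = {}
for name, kernel in [("Linear SVC", "linear"), ("RBF SVC", "rbf")]:
    model = SVC(kernel=kernel, C=10).fit(X_train, y_train)
    test_scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"Test accuracy for {name}: {test_scores[name]:.1%}")
```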


```python
linear_svc_probs = model_probs[0]
# One row per sample, one column per class (model.classes_ order, here [0, 1]).
# E.g. the first 3 rows have a higher probability for class 1 (the second column).
linear_svc_probs
```

```
array([[0.20415669, 0.79584331],
       [0.12557568, 0.87442432],
       [0.31298895, 0.68701105],
       ...,
       [0.26609342, 0.73390658],
       [0.0972355 , 0.9027645 ],
       [0.468214  , 0.531786  ]])
```
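One caveat when reading these probabilities: with `probability=True`, scikit-learn calibrates them with Platt scaling fitted by internal cross-validation. `SVC.predict`, by contrast, uses the decision function. The two can occasionally disagree, so the highest-probability column is not guaranteed to match the predicted class. The sketch below counts such disagreements on the notebook's data setup; no particular count is claimed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           random_state=0, shuffle=False)
X = StandardScaler().fit_transform(X)

model = SVC(kernel="linear", C=10, probability=True, random_state=0).fit(X, y)

probs = model.predict_proba(X)
# Map each row's highest-probability column back to a class label via classes_
argmax_labels = model.classes_[probs.argmax(axis=1)]
predicted = model.predict(X)

n_disagree = int((argmax_labels != predicted).sum())
print(f"{n_disagree} of {len(y)} samples: argmax(predict_proba) != predict")
```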

# Links

- [scikit-learn example: RBF SVM parameters](https://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html#sphx-glr-auto-examples-svm-plot-rbf-parameters-py)
- [scikit-learn example: Plot classification probability](https://scikit-learn.org/stable/auto_examples/classification/plot_classification_probability.html#sphx-glr-auto-examples-classification-plot-classification-probability-py)
- [Machine Learning Mastery: Support Vector Machines for Machine Learning](https://machinelearningmastery.com/support-vector-machines-for-machine-learning/)
- [Wikipedia: Support vector machine](https://en.wikipedia.org/wiki/Support-vector_machine)
- [Interview questions on SVM (Medium)](https://alekhyo.medium.com/interview-questions-on-svm-bf13e5fbcca8)


