# Support Vector Machine (SVM)
An SVM finds the best decision boundary for separating the classes in the data.

### Visual Example of Linear SVM
<img src="images/svm/svm_example.png" height="75%" width="75%"></img>

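Maximum Margin Hyperplane: The boundary (a line in 2D) that separates the two classes.
- It lies exactly halfway between the closest points of each class (the support vectors), so it is equidistant from both
    - The margin, i.e. the sum of the distances from the hyperplane to the nearest point of each class, must be maximized
- It's called a "hyperplane" because with more than two features the boundary is no longer a line: in an n-dimensional feature space it is an (n-1)-dimensional flat surface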
    
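The negative and positive hyperplanes are the two boundaries of the margin, parallel to the maximum margin hyperplane and passing through the closest point(s) of each class. The points lying on them are called the "support vectors."
- Support vectors are called "vectors" because, once there are more than two features, each data point is a vector of feature values rather than a point on a 2D plot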
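A minimal sketch of this geometry, assuming scikit-learn's `SVC` with a linear kernel and a small made-up dataset (not from this notebook). For a linear SVM, `coef_` holds the hyperplane's normal vector w and `intercept_` its offset b. The positive and negative hyperplanes are w·x + b = +1 and w·x + b = -1, so the margin between them is 2/‖w‖. A very large `C` is used here to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data: class 0 near the origin, class 1 further out
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM (no points allowed inside the margin)
clf = SVC(kernel="linear", C=1000)
clf.fit(X, y)

w = clf.coef_[0]       # normal vector of the maximum margin hyperplane w.x + b = 0
b = clf.intercept_[0]  # offset of that hyperplane

# The margin boundaries are w.x + b = +1 and w.x + b = -1, so their distance is 2 / ||w||
margin = 2 / np.linalg.norm(w)

# Support vectors lie on the margin boundaries, so |w.x + b| should be close to 1 for each
sv_scores = clf.support_vectors_ @ w + b

print("w =", w, "b =", b)
print("margin width:", margin)  # close to 5 / sqrt(2), about 3.54, for this data
print("|w.x + b| at support vectors:", np.abs(sv_scores))
```

Because the solver is iterative, the values are approximate; the support vectors should still score very close to ±1, which is what "lying on the positive and negative hyperplanes" means numerically.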

### Support Vector Regression (SVR) vs Support Vector Classification (SVC)
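SVR and SVC use support vectors differently. SVR fits a tube extending epsilon on either side of its prediction function: errors inside the tube are ignored, and the support vectors are the points on or outside it. SVC instead maximizes the margin between the hyperplane and the nearest points of each class.
- Therefore, SVC has no "epsilon" hyperparameter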
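A small sketch of the contrast. `epsilon_insensitive_loss` is a hypothetical helper written for illustration (it is not a scikit-learn function) that computes the kind of loss SVR uses, where residuals smaller than epsilon cost nothing. The last two lines check scikit-learn's parameter lists, where `epsilon` appears on `SVR` but not on `SVC`.

```python
import numpy as np
from sklearn.svm import SVC, SVR


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Hypothetical helper: SVR-style loss, zero for errors smaller than epsilon."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)


# Residuals of 0.05 and 0.3 with epsilon = 0.1: only the second one is penalized
losses = epsilon_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.3]), epsilon=0.1)
print(losses)  # approximately [0.0, 0.2]

# scikit-learn exposes epsilon on SVR but not on SVC
print("epsilon" in SVR().get_params())  # True
print("epsilon" in SVC().get_params())  # False
```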

### What's So Special About SVMs?
Imagine we're trying to classify apples and oranges.

<img src="images/svm/apple_orange_svm_example.png" height="75%" width="75%"></img>
- The most "apple-like" apples are circled; these are easy for a machine learning model to recognize
- The most "orange-like" oranges are circled; these are just as easy to recognize

A typical machine learning model would learn very well what an apple and an orange are.
- This is because it learns mostly from typical apples and typical oranges, nothing out of the ordinary

<hr>

SVMs, by contrast, use the hard-to-identify apples and hard-to-identify oranges, the ones closest to the other class, to construct the maximum margin hyperplane and determine the positive and negative hyperplanes.
- These hard-to-identify points are the "support vectors"
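In scikit-learn, a fitted `SVC` reports the support vectors it chose through `support_` (row indices), `support_vectors_` (the points themselves) and `n_support_` (count per class). The sketch below assumes a tiny hypothetical dataset with two made-up features; rows 1 and 3 play the role of the tricky apple and tricky orange, and for this data the model should select only those two rows.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-feature data: label 0 = "apple", label 1 = "orange".
# Row 1, (1, 1), is the most orange-like apple; row 3, (4, 1), is the most apple-like orange.
X = np.array([[0, 0], [1, 1], [0, 2],   # apples
              [4, 1], [5, 0], [5, 2]],  # oranges
             dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Large C approximates a hard margin on this separable data
clf = SVC(kernel="linear", C=1000).fit(X, y)

print(clf.support_)          # row indices of the support vectors in X
print(clf.support_vectors_)  # the support vectors themselves
print(clf.n_support_)        # number of support vectors per class
```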

<img src="images/svm/svm_tricky_example.png" height="75%" width="75%"></img>

So how was the SVM model in the picture constructed? It used the two support vectors:
- The hard-to-identify apple (the orange-colored apple that looks like an orange)
- The hard-to-identify orange (where the arrow points: the green orange that looks like an apple)

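In other words, Support Vector Machines rely on the "difficult to classify" vectors to construct the model: the boundary is determined by the support vectors alone, and the other training points do not change it.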
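One way to check this claim, sketched here with scikit-learn and a hypothetical toy dataset: fit a linear SVM, then refit it on only its support vectors. The hyperplane (`coef_` and `intercept_`) should come out essentially the same, because the points that are not support vectors carry zero weight in the solution.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical apple/orange data: label 0 = apple, label 1 = orange
X = np.array([[0, 0], [1, 1], [0, 2],
              [4, 1], [5, 0], [5, 2]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

full = SVC(kernel="linear", C=1000).fit(X, y)

# Refit using only the "difficult" points the first model chose as support vectors
sv_idx = full.support_
sv_only = SVC(kernel="linear", C=1000).fit(X[sv_idx], y[sv_idx])

# Both models should learn (approximately) the same hyperplane w.x + b = 0
print(full.coef_, full.intercept_)
print(sv_only.coef_, sv_only.intercept_)
```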

```python
# import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```

```python
# import the data set
ads_df = pd.read_csv("datasets/social_network_ads.csv")

ads_df.head()
```

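|   | User ID  | Gender | Age | EstimatedSalary | Purchased |
|---|----------|--------|-----|-----------------|-----------|
| 0 | 15624510 | Male   | 19  | 19000           | 0         |
| 1 | 15810944 | Male   | 35  | 20000           | 0         |
| 2 | 15668575 | Female | 26  | 43000           | 0         |
| 3 | 15603246 | Female | 27  | 57000           | 0         |
| 4 | 15804002 | Male   | 19  | 76000           | 0         |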


```python
# x is the Age and EstimatedSalary columns
x = ads_df.iloc[:, [2, 3]].values

# y is the Purchased column
y = ads_df.iloc[:, 4].values
```
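Selecting columns by position works as long as the CSV keeps this column order. A sketch of an alternative that selects by name, assuming the column names shown in the `head()` output; the DataFrame is rebuilt by hand from those five rows so the snippet runs without the CSV file.

```python
import pandas as pd

# The five rows from the head() preview, rebuilt by hand so the snippet is self-contained
ads_df = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575, 15603246, 15804002],
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 27, 19],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 76000],
    "Purchased": [0, 0, 0, 0, 0],
})

# Select by column name; equivalent to iloc[:, [2, 3]] and iloc[:, 4] for this column order
x = ads_df[["Age", "EstimatedSalary"]].values
y = ads_df["Purchased"].values

print(x.shape)  # (5, 2)
```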

```python
# split the data set into training and testing data sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
```
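A quick sketch of what `test_size` and `random_state` do, using a hypothetical 8-row array instead of the ads data: a quarter of the rows (rounded up) goes to the test set, and reusing the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 8 samples with 2 features each
x = np.arange(16).reshape(8, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# test_size=0.25 sends a quarter of the rows to the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
print(len(x_train), len(x_test))  # 6 2

# The same random_state reproduces exactly the same split
x_train2, x_test2, _, _ = train_test_split(x, y, test_size=0.25, random_state=0)
print(np.array_equal(x_test, x_test2))  # True
```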

```python
# import a Standardization Scaler for Feature Scaling
from sklearn.preprocessing import StandardScaler

# feature scale the training and testing sets
# (fit on the training set only, then apply the same scaling to the test set)
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
```
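Standardization rescales each feature to z = (x - mean) / std. The sketch below uses a few rows copied from the `head()` preview, split arbitrarily into train and test for illustration, and checks that `StandardScaler` matches this formula when the mean and standard deviation come from the training set only. That is why the notebook calls `fit_transform` on the training set but only `transform` on the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age / EstimatedSalary rows from the head() preview, split arbitrarily for illustration
x_train = np.array([[19, 19000], [35, 20000], [26, 43000]], dtype=float)
x_test = np.array([[27, 57000], [19, 76000]], dtype=float)

sc_x = StandardScaler()
x_train_scaled = sc_x.fit_transform(x_train)  # learns mean and std from the training set
x_test_scaled = sc_x.transform(x_test)        # reuses the training-set statistics

# Manual version: z = (x - mean) / std, with the population std (ddof=0) of the training set
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
manual_train = (x_train - mean) / std
manual_test = (x_test - mean) / std

print(np.allclose(x_train_scaled, manual_train))  # True
print(np.allclose(x_test_scaled, manual_test))    # True
```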



# Linear Support Vector Machine

```python
# import the support vector classifier class
from sklearn.svm import SVC
```

```python
# create a linear SVC classifier, then fit to the training set
classifier = SVC(kernel="linear", random_state=0)
classifier.fit(x_train, y_train)
```

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='linear', max_iter=-1, probability=False, random_state=0,
  shrinking=True, tol=0.001, verbose=False)
```
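With `kernel="linear"`, the fitted model is just the hyperplane w·x + b = 0. This sketch, on a hypothetical toy dataset rather than the ads data, checks that `decision_function` equals `X @ coef_[0] + intercept_[0]` and that, for two classes, `predict` returns `classes_[1]` where that score is positive.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data (not the social network ads set)
X = np.array([[0, 0], [1, 1], [0, 2], [4, 1], [5, 0], [5, 2]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", random_state=0).fit(X, y)

# For a linear kernel, decision_function(x) = w.x + b
scores = clf.decision_function(X)
manual = X @ clf.coef_[0] + clf.intercept_[0]
print(np.allclose(scores, manual))  # True

# Binary case: positive scores map to classes_[1], the rest to classes_[0]
from_scores = np.where(scores > 0, clf.classes_[1], clf.classes_[0])
print(np.array_equal(clf.predict(X), from_scores))  # True
```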

```python
# predict the test set results
y_pred = classifier.predict(x_test)

y_pred
```

```
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1])
```

# Confusion Matrix

```python
# import the confusion matrix function
from sklearn.metrics import confusion_matrix
```

```python
# create a confusion matrix that compares y_test (actual) to y_pred (predicted)
# rows are the actual classes, columns are the predicted classes
cm = confusion_matrix(y_test, y_pred)

# Read the confusion matrix diagonally:
# 66 + 24 = 90 correct predictions (main diagonal)
# 8 + 2 = 10 incorrect predictions (off-diagonal)
cm
```

```
array([[66,  2],
       [ 8, 24]])
```
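The matrix above also gives the usual summary metrics. A short sketch, assuming scikit-learn's layout (rows are actual classes, columns are predicted classes, labels ordered 0 then 1), so that `cm.ravel()` returns TN, FP, FN and TP in that order:

```python
import numpy as np

# The confusion matrix printed above: rows = actual class, columns = predicted class
cm = np.array([[66, 2],
               [8, 24]])

# For a 2x2 matrix with labels [0, 1], ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # 90 / 100
precision = tp / (tp + fp)       # 24 / 26: how many predicted purchases were real
recall = tp / (tp + fn)          # 24 / 32: how many real purchases were caught

print(accuracy, round(precision, 3), recall)  # 0.9 0.923 0.75
```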