# Lecture 10 Support Vector Classifiers
__MATH 3480__ - Dr. Michael Olson

Reading:
* Geron, Chapter 5

[Machine Learning Landscape](https://raw.githubusercontent.com/drolsonmi/math3480/main/Notes/Images/3480_05_ML_Landscape.png)

## The Concept behind Support Vector Classifiers
* Two classes of data have a gap between them
* Draw a line to separate the classes
  * The distance from the separator to the closest datapoint is known as the __margin__
  * When the separator is halfway between the closest points of the two classes, the margin is maximized for both classes. This is known as the __maximal margin classifier__ (MMC)
  * The maximal margin classifier has two problems:
    1. It only works if the data is linearly separable
    2. It is sensitive to outliers - if a datapoint from one class lies near the other class, the margin shrinks and the separator shifts toward the second class. New datapoints near the second class could then be assigned to the first class
* Bias/Variance tradeoff
  * If we force every training point to be classified correctly, we have *low bias*. However, this overfits the training data, so predictions on new data will often be incorrect, giving us *high variance*
  * If we allow some misclassifications, we have *higher bias*, but predictions on new data are more accurate, giving us *lower variance*
* Allow misclassifications (or *margin violations*)
  * When we allow misclassifications, then we call that margin a __soft margin__
* Determine best soft margin
  * Use cross validation
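
Choosing the soft margin by cross validation can be sketched as follows. This is a minimal sketch, not the lecture's own procedure: it assumes scikit-learn's `GridSearchCV`, where the softness of the margin is set by the `C` hyperparameter of `LinearSVC`. The grid of `C` values, `cv=5`, `max_iter`, and `random_state` are illustrative assumptions.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris['data'][:, (2, 3)]             # petal length, petal width
y = (iris['target'] == 2).astype(int)   # 1 = Iris virginica, 0 = other species

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('linear_svc', LinearSVC(loss='hinge', max_iter=100000, random_state=42)),
])

# Assumed candidate values; each one corresponds to a different soft margin
param_grid = {'linear_svc__C': [0.01, 0.1, 1, 10, 100]}

# 5-fold cross validation scores every candidate and keeps the best one
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The scaler sits inside the pipeline so that it is refit on each training fold, which keeps the validation folds out of the scaling statistics.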
  
A classifier that uses a soft margin is known as a __soft margin classifier__, more commonly called a __Support Vector Classifier__ (SVC)
* With 2-dimensional data, the SVC's decision boundary is a line
* With 3-dimensional data, the decision boundary is a plane
* ...
* With n dimensions, the decision boundary is a hyperplane with n-1 dimensions
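
To make the dimension count above concrete, here is a small sketch that fits a linear classifier on 2 and then 3 iris features. It assumes scikit-learn's `LinearSVC`, whose fitted `coef_` and `intercept_` hold the weights `w` and offset `b` of the boundary `w . x + b = 0`. The choice of columns, `max_iter`, and `random_state` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
y = (iris['target'] == 2).astype(int)  # 1 = Iris virginica, 0 = other species

coef_shapes = {}
matches = []
for n_features in (2, 3):
    X = iris['data'][:, -n_features:]  # last 2 or 3 measurement columns
    clf = LinearSVC(C=1, loss='hinge', max_iter=100000, random_state=42)
    clf.fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    coef_shapes[n_features] = w.shape  # one weight per feature

    # The boundary is the set of points where w . x + b = 0;
    # the sign of w . x + b decides the predicted class
    scores = X @ w + b
    matches.append(np.allclose(scores, clf.decision_function(X)))
    print(n_features, w.shape, matches[-1])
```

One linear equation in n unknowns removes one degree of freedom, which is why the boundary has n-1 dimensions.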

In [None]:
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
import matplotlib.pyplot as plt

In [None]:
iris = datasets.load_iris()

In [None]:
fig = plt.figure(figsize=(8,8))
#ax = fig.add_subplot(111, projection='3d')
ax = fig.add_subplot()
ax.scatter(#iris['data'][50:,0],
           iris['data'][50:,2],
           iris['data'][50:,3],
           c=iris['target'][50:])  # color by species; the slice must match the data rows (50-149)
          
ax.set_xlabel('Petal Length (cm)')
ax.set_ylabel('Petal Width (cm)')

## Preprocessing
1. Missing Data - No missing values in this example
2. Encode Categorical Variables - Using original data, no categorical variables
3. Split the data
4. Feature Scaling
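
Steps 3 and 4 can be sketched as follows. This is a minimal sketch under stated assumptions: the 80/20 split, the `random_state`, and the use of `stratify` are illustrative choices, not values given in the lecture.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris['data'][:, (2, 3)]             # petal length, petal width
y = (iris['target'] == 2).astype(int)   # 1 = Iris virginica, 0 = other species

# Step 3: hold out part of the data for evaluation (assumed 80/20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Step 4: learn the mean and standard deviation from the training data only,
# then apply the same transformation to the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
print(np.round(X_train_scaled.mean(axis=0), 6), np.round(X_train_scaled.std(axis=0), 6))
```

Placing the `StandardScaler` inside a `Pipeline` performs step 4 automatically whenever the pipeline is fit, so the manual scaling here is only for illustration.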

In [None]:
list(iris)

In [None]:
iris['feature_names']

In [None]:
X = iris['data'][:,(2,3)]                     # petal length, petal width
y = (iris['target'] == 2).astype(np.float64)  # 1.0 if Iris virginica, else 0.0

## SVC model

In [None]:
svc = Pipeline([
    ('scaler', StandardScaler()),
    ('linear_svc', LinearSVC(C=1, loss='hinge'))
])
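
As a rough illustration of how a pipeline like this is used, the sketch below fits it on the full petal dataset and classifies one new measurement. The flower's measurements and `random_state` are hypothetical values chosen for the example.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris['data'][:, (2, 3)]                    # petal length, petal width
y = (iris['target'] == 2).astype(np.float64)   # 1.0 if Iris virginica, else 0.0

svc = Pipeline([
    ('scaler', StandardScaler()),
    ('linear_svc', LinearSVC(C=1, loss='hinge', random_state=42)),
])

# fit() first fits the scaler, then trains the classifier on the scaled data
svc.fit(X, y)

# Hypothetical new flower: petal length 5.5 cm, petal width 1.7 cm
new_flower = np.array([[5.5, 1.7]])
prediction = svc.predict(new_flower)  # the pipeline scales the input before predicting
print(prediction)
```

Calling `predict` on the pipeline rather than on the classifier alone ensures that new data goes through the same scaling as the training data.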

Description of hyperparameters:
* `C` is the regularization parameter - it controls how heavily margin violations are penalized, and it is inversely proportional to the regularization strength
  * High C means we regularize less (allow fewer margin violations) - smaller margins
  * Low C means we regularize more (allow more margin violations) - larger margins
* `loss` is the loss function
  * The default for `LinearSVC` is `squared_hinge`, so we set `hinge` explicitly
  * `hinge` is the standard loss for a support vector classifier
## Evaluate the model

In [None]:
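from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumed split for step 3 of preprocessing; the 80/20 ratio and seed are illustrative
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
y_pred = svc.fit(X_train, y_train).predict(X_test)  # train on one part, predict the held-out part
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))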