# Support Vector Machine Classifiers 

So far, we have spent most of our time on simple regression algorithms.

Up to now, we have taken a regression model, found data whose distribution fits that model, and then used the model to classify the data.

In this example, we'll look at a model that approaches classification at a higher level of abstraction, using some more advanced linear algebra.

We'll be looking at the Support Vector Machine.

So far, we've only worked with the Iris dataset in scikit-learn. We'll keep using it here to show why SVMs are so powerful.

In [4]:
from sklearn.datasets import load_iris 
import pandas as pd 
import numpy as np
from sklearn.svm import SVC

You're probably wondering why we import SVC() rather than something called SVM().

Support Vector Machines are a whole family of machine learning models, used for more than data classification (regression and outlier detection, for example). scikit-learn has no single SVM() class. SVC() is the support vector classifier, and since all we want to do here is classify our data, that is the member of the family we import.
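As a rough illustration of that family, the sketch below lists some of the estimators that the `sklearn.svm` module exposes, grouped by task. The grouping dictionary and the `has_svm_class` variable are only for illustration and are not part of scikit-learn.

```python
from sklearn import svm

# Some of the estimators in sklearn.svm, grouped by the task they address.
# The grouping itself is ours, purely for illustration.
estimators = {
    "classification": [svm.SVC, svm.NuSVC, svm.LinearSVC],
    "regression": [svm.SVR, svm.NuSVR, svm.LinearSVR],
    "outlier detection": [svm.OneClassSVM],
}

for task, classes in estimators.items():
    print(task, [cls.__name__ for cls in classes])

# There is no class that is simply called SVM.
has_svm_class = hasattr(svm, "SVM")
print(has_svm_class)
```

Since we only care about classification in this notebook, SVC is the one we need.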

## Load in Our Data

In [5]:
iris_data = load_iris()
print(iris_data.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

## Split the Data

In [11]:
X, y = load_iris(return_X_y=True)

In [14]:
len(X), len(y)

(150, 150)

In [15]:
from sklearn.model_selection import train_test_split 

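# Hold out 25% of the samples for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# load_iris already returns y as a 1-D array, so np.ravel leaves it unchanged here.
y_train, y_test = np.ravel(y_train), np.ravel(y_test)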
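To see what the split produced, a quick check of the array shapes can help. The following is a minimal sketch that repeats the same split on its own. With 150 samples and test_size=0.25, scikit-learn rounds the test count up, so we would expect 38 test samples and 112 training samples.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each row is one flower; each of the 4 columns is one measurement.
print(X_train.shape, X_test.shape)
# The labels are 1-D arrays with one entry per flower.
print(y_train.shape, y_test.shape)
```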

## Instantiate and Fit Model 

In [16]:
# gamma only affects the "rbf", "poly", and "sigmoid" kernels; the linear kernel ignores it.
svc = SVC(kernel="linear", C=1.0, gamma="auto")
svc.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
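Once the model is fitted, we can look at what it learned. SVMs take their name from the support vectors: the training samples that end up defining the decision boundary. The sketch below refits the same linear model on its own and prints a few fitted attributes. It assumes the same split as above; the exact number of support vectors depends on the data and on C, so no values are quoted here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

svc = SVC(kernel="linear", C=1.0)
svc.fit(X_train, y_train)

# Number of support vectors found for each of the three classes.
print(svc.n_support_)
# The support vectors themselves: one row per vector, one column per feature.
print(svc.support_vectors_.shape)
# With a linear kernel, coef_ holds one weight vector per pair of classes.
print(svc.coef_.shape)
```

With three classes there are three pairs of classes, so coef_ has three rows of four weights each.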

## Make Predictions

In [None]:
# Accuracy: the fraction of test samples whose predicted class matches the true class.
accuracy = np.mean(svc.predict(X_test) == y_test)
accuracy
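The manual calculation above is the standard definition of accuracy, and scikit-learn offers two shortcuts for it. The sketch below, which refits the same model so that it runs on its own, compares the manual mean with accuracy_score and with the classifier's own score method. The variable names manual, via_metric, and via_score are ours.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

svc = SVC(kernel="linear", C=1.0)
svc.fit(X_train, y_train)
y_pred = svc.predict(X_test)

# Three ways of computing the same number.
manual = np.mean(y_pred == y_test)           # fraction of correct predictions
via_metric = accuracy_score(y_test, y_pred)  # metric function
via_score = svc.score(X_test, y_test)        # classifier's built-in scorer

print(manual, via_metric, via_score)
```

All three should agree, so which one to use is mostly a matter of taste.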