# Support Vector Machines

Support vector machines (SVMs), or, as I'll be using them here, support vector classifiers (SVCs), are supervised binary classifiers. They divide data points with a hyperplane chosen by the "widest road" method: the hyperplane that leaves the largest possible margin between the two classes. The data points that lie on the edges of that margin are the titular support vectors. Some data sets are not linearly separable as-is, so the *kernel trick* is employed, which implicitly maps the data into a higher-dimensional space where a hyperplane can do its work.
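To make the idea of moving to a higher dimension concrete, here is a minimal sketch on made-up 1-D data. It uses an explicit feature map `x -> (x, x**2)` purely for illustration; a real kernel achieves a similar effect implicitly, without ever building the extra columns. The data, the variable names, and the choice of `C=1000` are all assumptions of this example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 1-D data (hypothetical): class 1 sits in the middle, class 0 on both
# sides, so no single threshold on x can separate the classes.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])

# A linear classifier on the raw 1-D feature cannot classify every point.
X_1d = x.reshape(-1, 1)
acc_1d = SVC(kernel="linear", C=1000).fit(X_1d, y).score(X_1d, y)

# Lift each point to 2-D with the explicit map x -> (x, x**2).
X_2d = np.column_stack([x, x ** 2])
clf = SVC(kernel="linear", C=1000).fit(X_2d, y)
acc_2d = clf.score(X_2d, y)

# For a linear SVM, the "road" between the classes is 2 / ||w|| wide.
margin_width = 2.0 / np.linalg.norm(clf.coef_[0])

print(f"accuracy in 1-D: {acc_1d:.2f}")
print(f"accuracy after lifting: {acc_2d:.2f}")
print(f"margin width in lifted space: {margin_width:.2f}")
```

In the lifted space the classes are split by a horizontal line between `x**2 = 1` and `x**2 = 4`, which is why a plain linear classifier suddenly succeeds.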

## Import libraries.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Acquire the data.

The dataset we'll use for this lesson is included as part of Scikit-Learn, namely the breast cancer dataset.

In [2]:
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

Scikit-learn packages its datasets in dictionary-like `Bunch` objects.

In [3]:
cancer.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])

You can get a description of the dataset with the `DESCR` key.

In [4]:
print(cancer['DESCR'])

Breast Cancer Wisconsin (Diagnostic) Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry 
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, field
        13 is Radius SE, field 23 is Worst Radius.

        

We lack the domain knowledge to make real use of these data, but they'll do well enough for demonstration purposes. The target variable of this dataset tracks whether a patient's tumor was malignant (`0`) or benign (`1`).
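Label encodings are easy to misread, so it is worth checking which integer stands for which class. A minimal sketch, using the `target_names` and `target` keys listed earlier (the variable names `names` and `counts` are just for this example):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# target_names[i] is the label that the integer i stands for.
names = [str(name) for name in cancer["target_names"]]
# Count how many samples carry each integer label.
counts = np.bincount(cancer["target"])

for code, (name, count) in enumerate(zip(names, counts)):
    print(f"{code} = {name}: {count} samples")
```

The position of each name in `target_names` is the integer used in `target`, so the printout doubles as a legend for the confusion matrices later on.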

## Prepare a dataframe.

In [5]:
df_feat = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df_feat.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 30 columns):
mean radius                569 non-null float64
mean texture               569 non-null float64
mean perimeter             569 non-null float64
mean area                  569 non-null float64
mean smoothness            569 non-null float64
mean compactness           569 non-null float64
mean concavity             569 non-null float64
mean concave points        569 non-null float64
mean symmetry              569 non-null float64
mean fractal dimension     569 non-null float64
radius error               569 non-null float64
texture error              569 non-null float64
perimeter error            569 non-null float64
area error                 569 non-null float64
smoothness error           569 non-null float64
compactness error          569 non-null float64
concavity error            569 non-null float64
concave points error       569 non-null float64
symmetry error             569 

In [6]:
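# In scikit-learn's encoding, 0 = malignant and 1 = benign, so the column is named for what a 1 means.
df_target = pd.DataFrame(cancer['target'], columns=['benign'])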

## Poke around the data.

As I said before, without thorough domain knowledge, just throwing the data at some plots will be less than illuminating, so we'll eschew this for now. The later project will offer you more opportunities to make beautiful plots.

## Split the data.

In [7]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df_feat, np.ravel(df_target), test_size=0.3, random_state=101)
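As a quick sanity check on the split, the sketch below repeats it on the raw arrays and inspects the result. `test_size=0.3` holds out 30% of the rows and `random_state` fixes the shuffle; loading with `return_X_y=True` and the name `X_test_again` are conveniences of this example, not part of the lesson's code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Same settings as the lesson: hold out 30% of rows, fixed shuffle seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
# Repeat the call to confirm that the seed makes the split reproducible.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.3, random_state=101)

print(X_train.shape, X_test.shape)
print("same test rows both times:", (X_test == X_test_again).all())
```

The 171 held-out rows are the same 171 samples that appear in the `support` column of the classification reports.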

## Train the support vector classifier.

In [8]:
from sklearn.svm import SVC

svc = SVC()
svc.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

## Make predictions, perform evaluations.

This might not go the way you expect it to.

In [9]:
from sklearn.metrics import classification_report, confusion_matrix

predictions = svc.predict(X_test)
print(confusion_matrix(y_test, predictions))
print('\n')
print(classification_report(y_test, predictions))

[[  0  66]
 [  0 105]]


             precision    recall  f1-score   support

          0       0.00      0.00      0.00        66
          1       0.61      1.00      0.76       105

avg / total       0.38      0.61      0.47       171





What a mess! Everything was classified as a single class: `1` in this case, so class `0` was never predicted at all. The parameters of this model need to be adjusted. But how?
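The numbers in the report follow directly from the confusion matrix, where rows are true classes and columns are predicted classes. The sketch below recomputes precision and recall by hand. The helper `per_class_metrics` is a hypothetical name, and returning `0.0` for an undefined ratio is a choice made here to mirror what the report shows.

```python
# Confusion matrix from the default SVC run: rows are true classes,
# columns are predicted classes (scikit-learn's convention).
matrix = [[0, 66],
          [0, 105]]

def per_class_metrics(cm, k):
    """Return (precision, recall) for class k, using 0.0 when undefined."""
    true_pos = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: predicted as k
    actual_k = sum(cm[k])                    # row sum: truly class k
    precision = true_pos / predicted_k if predicted_k else 0.0
    recall = true_pos / actual_k if actual_k else 0.0
    return precision, recall

for k in range(len(matrix)):
    p, r = per_class_metrics(matrix, k)
    print(f"class {k}: precision={p:.2f} recall={r:.2f}")
```

The first column sums to zero, meaning nothing was ever predicted as class `0`, which is why that class scores zero across the board.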

## Perform a grid search.

The range of values that each parameter might usefully take is large, and the number of combinations of those values is larger still, so trying them all by hand is not practical. Fortunately, there is a tool made for just this kind of situation, called grid search.

Scikit-learn's `GridSearchCV` class (the CV stands for *cross-validation*) allows us to search for the best combination of parameters in a grid easily, if not exactly quickly.

One of the great things about `GridSearchCV` is that it is a meta-estimator. It takes an estimator like `SVC` and creates a new estimator that behaves the same way, in this case like a classifier. Pass `refit=True` (the default) so that the best model is refit at the end, and set `verbose` to whatever number you want: the higher the number, the more text output you get describing the process.

`GridSearchCV` fits every combination in the parameter grid with cross-validation and selects the one with the best mean validation score. Then, because `refit=True`, it fits once more on all of the training data without cross-validation, building a single model with the best combination of parameters found.
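The meta-estimator behavior is easiest to see on a small, fast example. The sketch below uses a synthetic dataset from `make_blobs` and a deliberately tiny grid; both are assumptions made for speed and are unrelated to the cancer data.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small synthetic two-class dataset (hypothetical, for speed only).
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

# A deliberately tiny grid: 2 values of C x 2 kernels = 4 candidates.
small_grid = {"C": [0.1, 1], "kernel": ["linear", "rbf"]}

search = GridSearchCV(SVC(), small_grid, cv=3, refit=True)
search.fit(X, y)

# After fitting, the search object acts like the refit best model.
print("best parameters:", search.best_params_)
print("mean CV score of best candidate:", search.best_score_)
print("candidates tried:", len(search.cv_results_["params"]))
same = bool((search.predict(X) == search.best_estimator_.predict(X)).all())
print("predict() delegates to best_estimator_:", same)
```

Because of the refit, calling `predict` on the search object gives the same answers as calling it on `best_estimator_` directly.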

In [10]:
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf', 'linear']}

grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=1)
grid.fit(X_train, y_train)

Fitting 3 folds for each of 60 candidates, totalling 180 fits


[Parallel(n_jobs=1)]: Done 180 out of 180 | elapsed:  2.8min finished


GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0, 0.1, 0.01, 0.001, 0.0001], 'kernel': ['rbf', 'linear']},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=1)
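The candidate and fit counts in the log can be reproduced by enumerating the grid by hand. This is only a sketch with `itertools`, not how scikit-learn builds its candidates internally. The fold count of 3 is read off the log above (it was the default in the scikit-learn version used here; newer releases default to 5).

```python
from itertools import product

# The grid from the lesson, copied verbatim.
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf', 'linear']}

# Every candidate is one pick from each list (a Cartesian product).
names = list(param_grid)
candidates = [dict(zip(names, values))
              for values in product(*(param_grid[n] for n in names))]

n_folds = 3  # taken from the log line "Fitting 3 folds"
print(len(candidates), "candidates")
print(len(candidates) * n_folds, "cross-validation fits")
```

The final refit on the full training set comes on top of these cross-validation fits and is not included in the logged total.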

In [17]:
grid.best_params_

{'C': 100, 'gamma': 1, 'kernel': 'linear'}

In [19]:
grid.best_estimator_

SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [20]:
grid_predictions = grid.predict(X_test)

print(confusion_matrix(y_test, grid_predictions))
print(classification_report(y_test, grid_predictions))

[[ 60   6]
 [  2 103]]
             precision    recall  f1-score   support

          0       0.97      0.91      0.94        66
          1       0.94      0.98      0.96       105

avg / total       0.95      0.95      0.95       171



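With the best parameters from the grid, this model performs very well: about 95% accuracy, with only 8 of the 171 test samples misclassified. Note that `gamma` has no effect with the linear kernel, so the `gamma` value reported among the best parameters is incidental.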