## Imports

In [None]:
from sklearn import datasets

## Dataset

For this SVM example, we will use the breast cancer dataset bundled with the `sklearn` package. It is a binary classification dataset with 569 samples and 30 features.

In [None]:
cancer = datasets.load_breast_cancer()
print(cancer.data.shape)

We can take a look at the features and our target variable.

In [None]:
print("Features: ", cancer.feature_names)
print("Target: ", cancer.target_names)

The first three rows of the feature matrix:

In [None]:
print(cancer.data[0:3])

And our target variable:
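
A minimal sketch of how the target could be inspected, assuming the dataset is loaded as above. The name `class_counts` is introduced here for illustration only. Each label is an index into `cancer.target_names`, so 0 means malignant and 1 means benign.

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# First 20 labels: each entry is an index into cancer.target_names
print(cancer.target[:20])

# Count how many samples fall into each class
labels, counts = np.unique(cancer.target, return_counts=True)
class_counts = {
    str(cancer.target_names[label]): int(count)
    for label, count in zip(labels, counts)
}
print(class_counts)
```

The classes are not balanced (212 malignant vs 357 benign), which is worth keeping in mind when reading the accuracy later on.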

We can also plot the features. Starting with the first two, `mean radius` and `mean texture`, the scatter plot suggests that the two classes are roughly linearly separable.

In [None]:
import matplotlib.pyplot as plt

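# Color each point by its class label (0 = malignant, 1 = benign).
# Passing the colormap by name avoids plt.cm.get_cmap, which newer matplotlib versions removed.
plt.scatter(cancer.data[:, 0], cancer.data[:, 1], c=cancer.target, cmap='RdBu_r')
plt.xlabel('mean radius', size=15)
plt.ylabel('mean texture', size=15)
plt.title("mean texture vs mean radius", size=15)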
plt.show()

## Splitting data

We will use `train_test_split` to split our dataset. We could also do this manually by shuffling the dataset and slicing it. We will use an 80/20 train/test split.

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=109
)
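
The manual alternative mentioned above could look roughly like the sketch below: shuffle the row indices, then slice at the 80% mark. The `*_manual` names and the NumPy seed are assumptions made for illustration, and this will not select the same rows as `train_test_split`, even with the same seed value.

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# Shuffle the row indices so the split does not depend on the stored order
rng = np.random.default_rng(109)
indices = rng.permutation(len(cancer.data))

# The first 80% of the shuffled indices go to train, the rest to test
n_train = int(0.8 * len(indices))
train_idx, test_idx = indices[:n_train], indices[n_train:]

X_train_manual, X_test_manual = cancer.data[train_idx], cancer.data[test_idx]
y_train_manual, y_test_manual = cancer.target[train_idx], cancer.target[test_idx]

print(X_train_manual.shape, X_test_manual.shape)
```

In practice `train_test_split` is usually the more convenient choice, since it also supports options such as `stratify` for keeping class proportions equal across the two sets.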

## Modeling

Scikit-learn has a built-in `svm` module. Since this is an easy binary dataset, we can just use a linear kernel. For more complex datasets, a polynomial kernel can be considered.

In [None]:
from sklearn import svm

classifier = svm.SVC(kernel='linear')

# Train the model on the training set
classifier.fit(X_train, y_train)

# Predict labels for the held-out test set
y_pred = classifier.predict(X_test)
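
For comparison, a polynomial kernel can be tried by changing the `kernel` argument. The following is only a sketch: the degree of 3 is scikit-learn's default rather than a tuned value, and the names `poly_classifier` and `poly_accuracy` are introduced here for illustration.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=109
)

# Same workflow as the linear model, only the kernel differs
poly_classifier = svm.SVC(kernel="poly", degree=3)
poly_classifier.fit(X_train, y_train)

poly_accuracy = metrics.accuracy_score(y_test, poly_classifier.predict(X_test))
print("Polynomial kernel accuracy:", poly_accuracy)
```

Whether the polynomial kernel helps depends on the data. On a dataset that is already close to linearly separable, it may not improve on the linear kernel.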

## Model Metrics

Our SVM reaches an accuracy of about 95% on the test set.

In [None]:
from sklearn import metrics

# Model metrics on the test set.
# Precision and recall are reported for the positive class, label 1 (benign).
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print("Precision:", metrics.precision_score(y_test, y_pred))
print("Recall:", metrics.recall_score(y_test, y_pred))
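
To make the three metrics above concrete, the sketch below recomputes them by hand from a confusion matrix. The labels in `y_true_toy` and `y_pred_toy` are made-up toy values chosen for illustration, not output from the model above.

```python
from sklearn import metrics

# Toy labels (hypothetical): 1 is the positive class, 0 the negative class
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred_toy = [1, 1, 0, 1, 0, 1, 0, 1]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = metrics.confusion_matrix(y_true_toy, y_pred_toy).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

These hand-computed values match what `metrics.accuracy_score`, `metrics.precision_score`, and `metrics.recall_score` return for the same toy labels.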
