## Build your own Machine Learning Model (SVM)

Assignment: Today we are going to predict whether a tumor is malignant or benign using a Support Vector Machine (SVM), based on features that describe the tumor's appearance.

Information about the dataset can be found here: https://scikit-learn.org/stable/datasets/index.html#breast-cancer-dataset

### Dataset

Here we import all the libraries you need :) Please feel free to import other libraries as necessary.

In [2]:
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm
import numpy as np

Normally, we would have to preprocess the dataset. For this example, let us assume that the data is clean and preprocessed.
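That assumption is fine for this exercise. One preprocessing step that often matters for SVMs, though, is feature scaling, because the 30 features in this dataset span very different numeric ranges. The sketch below is a minimal example, assuming scikit-learn's `StandardScaler` (not used elsewhere in this notebook) and a hypothetical `random_state=0` split. The scaler is fit on the training split only, so no statistics from the held-out split leak into training:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_breast_cancer(return_X_y=True)
# random_state=0 is an arbitrary, hypothetical choice for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler()
# Learn each feature's mean and standard deviation from the training split only
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same transformation to the held-out split
X_test_scaled = scaler.transform(X_test)

print("raw values range from", X_train.min(), "to", X_train.max())
print("scaled training means are ~0:", np.allclose(X_train_scaled.mean(axis=0), 0))
print("scaled training stds are ~1:", np.allclose(X_train_scaled.std(axis=0), 1))
```

If you adopt this, the scaled arrays would take the place of `train_features` and `test_features` in the cells that follow.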

Feel free to print the dataset if you would like. 

In [3]:
cancer = datasets.load_breast_cancer()
print(cancer)

{'data': array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
        1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
        8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
        8.758e-02],
       ...,
       [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
        7.820e-02],
       [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
        1.240e-01],
       [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
        7.039e-02]]), 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
 

In [4]:
X = cancer.data
y = cancer.target
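Before splitting, it helps to know what the labels mean: in scikit-learn's copy of this dataset, label 0 is malignant and label 1 is benign, which matters when reading the confusion matrix at the end. A short sketch using the dataset's `target_names` and `feature_names` attributes:

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target

print(X.shape)                   # (569, 30): samples, features
print(cancer.target_names)       # ['malignant' 'benign']: index i names label i
print(np.bincount(y))            # [212 357]: samples per label
print(cancer.feature_names[:3])  # first few feature names
```

The classes are imbalanced (about 37% malignant), so a model that always predicted "benign" would already reach roughly 63% accuracy.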

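Split X and y 80:20 into training and test sets. The 20% test split also serves as the validation set when choosing C below. Because no `random_state` is passed, the split, and therefore every accuracy reported below, can change from run to run.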

In [8]:
train_features, test_features, train_labels, test_labels = train_test_split(X, y, test_size = 0.2)
len(test_features)

114
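One variation on the split above passes `stratify=y`, so both splits keep the same malignant/benign proportions, and `random_state`, so the split is reproducible. This is a sketch, not the assignment's required split; the seed 42 is an arbitrary, hypothetical choice:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_breast_cancer(return_X_y=True)

train_features, test_features, train_labels, test_labels = train_test_split(
    X, y,
    test_size=0.2,     # 80:20 split, as above
    stratify=y,        # keep the malignant/benign ratio the same in both splits
    random_state=42,   # hypothetical seed; any fixed integer makes the split reproducible
)

print(len(test_features))  # 114, same size as the unstratified split
# Fraction of malignant (label 0) samples in the full data and in each split
print((y == 0).mean(), (train_labels == 0).mean(), (test_labels == 0).mean())
```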

### Model

Create an SVM classification model:

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC

Make sure to use a linear kernel. (Hint: it is a parameter of the SVM classification model; please check the documentation above.) All other parameters should keep their default values.

In [9]:
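# The instructions call for default parameters (C=1.0); this cell sets C=10, which the outputs below reflect.
model = svm.SVC(C = 10, kernel = "linear")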

Fit the model to our training data

In [10]:
model.fit(train_features, train_labels)

SVC(C=10, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
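With `kernel="linear"`, a fitted `SVC` exposes `coef_` (one weight per feature) and `intercept_`, and its decision function is simply $w \cdot x + b$: a positive value means label 1 (benign), a negative value label 0 (malignant). The sketch below checks that relationship, assuming the default C and a hypothetical fixed split:

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

X, y = datasets.load_breast_cancer(return_X_y=True)
# random_state=0 is a hypothetical seed used only to make this sketch reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Linear kernel, every other parameter left at its default (so C=1.0)
model = svm.SVC(kernel="linear")
print(model.get_params()["C"])  # 1.0
model.fit(X_train, y_train)

w = model.coef_[0]          # one weight per feature (30 in total)
b = model.intercept_[0]     # bias term
scores = X_test @ w + b     # w . x + b for every test sample

# The hand-computed scores match the model's own decision function...
print(np.allclose(scores, model.decision_function(X_test), atol=1e-6))
# ...and the sign of the score decides the predicted label
print(np.array_equal(np.where(scores > 0, 1, 0), model.predict(X_test)))
```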

Determine the model's accuracy (the fraction of correct predictions) on the test data

In [11]:
model.score(test_features, test_labels)

0.956140350877193
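For a classifier, `score` should agree with `sklearn.metrics.accuracy_score`, and the value above translates into a count: with 114 test samples, 0.956140350877193 is 109 correct predictions. A sketch confirming both, assuming a hypothetical fixed split and the default C:

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = datasets.load_breast_cancer(return_X_y=True)
# Hypothetical fixed seed so the sketch is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = svm.SVC(kernel="linear").fit(X_train, y_train)
pred = model.predict(X_test)

score = model.score(X_test, y_test)
manual = np.mean(pred == y_test)                    # fraction of correct predictions
print(score, accuracy_score(y_test, pred), manual)  # all three agree

# The accuracy reported above corresponds to 109 correct out of 114
print(round(0.956140350877193 * 114))  # 109
```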

### Obtaining the best parameters for the SVM

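We would like to choose the optimal value for C, the SVM's regularization parameter. The strength of the regularization is inversely proportional to C: small values tolerate more margin violations and give a simpler decision boundary, while large values penalize misclassified training points more heavily. Common values of C to try are 0.001, 0.01, 0.1, 1, and 10.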

Output the model's accuracy for each value of C, and then, in the text box below the code, write which value of C is best.

Hint: Use lists and for loops.

In [12]:
for c in [0.001, 0.01, 0.1, 1, 10]:
    model = svm.SVC(C=c, kernel="linear")
    model.fit(train_features, train_labels)
    print("c =", c, ":", model.score(test_features, test_labels))

c = 0.001 : 0.9473684210526315
c = 0.01 : 0.956140350877193
c = 0.1 : 0.9298245614035088
c = 1 : 0.9298245614035088
c = 10 : 0.956140350877193
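Following the hint, the accuracies can also be collected in a list so the best C is picked programmatically instead of by eye. The sketch below reuses the five accuracies printed above. Python's `max` returns the first of several tied maxima, so with the Cs in ascending order a tie goes to the smaller C:

```python
Cs = [0.001, 0.01, 0.1, 1, 10]
# Accuracies copied from the output above (one per value of C)
accuracies = [0.9473684210526315, 0.956140350877193, 0.9298245614035088,
              0.9298245614035088, 0.956140350877193]

# Index of the highest accuracy; max() keeps the first index on ties
best_index = max(range(len(Cs)), key=lambda i: accuracies[i])
best_C = Cs[best_index]
print("best C =", best_C, "with accuracy", accuracies[best_index])
```

In the notebook itself, `accuracies` would be built inside the loop with `accuracies.append(model.score(test_features, test_labels))`.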


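C = 0.01 (tied with C = 10 at 0.956, i.e. 109 of the 114 test samples classified correctly; the smaller C is a reasonable tie-break because it applies stronger regularization). With only 114 held-out samples, a single sample is worth about 0.9 percentage points of accuracy, so the differences between these values of C are small.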
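Choosing C from one 114-sample split is noisy, and using the same split both to pick C and to report accuracy gives an optimistic estimate. A common alternative, swapped in here and not part of the original assignment, is k-fold cross-validation on the training data with scikit-learn's `GridSearchCV`. This sketch assumes 5 folds, a `StandardScaler` inside a `Pipeline` (so scaling is refit within each fold), and a hypothetical seed; the C it selects may differ from the one above:

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_breast_cancer(return_X_y=True)
# Hypothetical seed; the test split is held back until the very end
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),        # refit on each training fold
    ("svc", svm.SVC(kernel="linear")),
])
# Step name "svc" plus "__C" addresses the SVC's C parameter
param_grid = {"svc__C": [0.001, 0.01, 0.1, 1, 10]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best C:", search.best_params_["svc__C"])
print("mean cross-validation accuracy:", search.best_score_)
# Only now touch the test split, once, using the refit best model
print("test accuracy:", search.score(X_test, y_test))
```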

### Confusion Matrix

In [13]:
from sklearn.metrics import confusion_matrix

In [17]:
# `model` still holds the last model fit in the loop above (C = 10), not the chosen C = 0.01.
pred_labels = model.predict(test_features)
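To get a confusion matrix for the C chosen above, the model has to be refit with that value before predicting. A hedged sketch: the name `best_model` and the seed are hypothetical, and because the split differs, its matrix may differ from the one printed below:

```python
from sklearn import datasets, svm
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = datasets.load_breast_cancer(return_X_y=True)
# Hypothetical seed; in the notebook, reuse the existing train/test split instead
train_features, test_features, train_labels, test_labels = train_test_split(
    X, y, test_size=0.2, random_state=0)

best_C = 0.01  # the value chosen in the text cell above
best_model = svm.SVC(C=best_C, kernel="linear")  # hypothetical name
best_model.fit(train_features, train_labels)

pred_labels = best_model.predict(test_features)
cm = confusion_matrix(test_labels, pred_labels)
print(cm)
```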

In [None]:
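# For binary labels [0, 1], confusion_matrix returns (rows = true labels, columns = predicted labels):
# [[True Negatives,  False Positives],
#  [False Negatives, True Positives]]
# Here label 1 (benign) is the "positive" class and label 0 (malignant) the "negative" class.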
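A tiny hand-made example (the five labels are invented for illustration) shows this layout and the `ravel()` idiom from the scikit-learn documentation for unpacking the four counts:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1]   # made-up true labels
y_pred = [0, 1, 0, 1, 1]   # made-up predictions: one 0 mistaken for 1

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1]
#  [0 2]]

# Flattening row by row gives the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 2 1 0 2
```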

In [18]:
confusion_matrix(test_labels, pred_labels)

array([[38,  5],
       [ 0, 71]])
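Reading the matrix above with label 0 = malignant and label 1 = benign: all 5 errors are malignant tumors predicted as benign, and no benign tumor was flagged as malignant. The per-class rates follow directly from those four numbers; this sketch hard-codes the printed matrix:

```python
import numpy as np

# Confusion matrix printed above: rows = true label, columns = predicted label
cm = np.array([[38, 5],
               [0, 71]])

accuracy = np.trace(cm) / cm.sum()          # (38 + 71) / 114
recall = np.diag(cm) / cm.sum(axis=1)       # per true class: 38/43 and 71/71
precision = np.diag(cm) / cm.sum(axis=0)    # per predicted class: 38/38 and 71/76

print("accuracy:", accuracy)
print("recall    (malignant, benign):", recall)
print("precision (malignant, benign):", precision)
```

The accuracy, 109/114, equals the C = 10 score printed by the loop, consistent with `model` being the C = 10 model at this point. `sklearn.metrics.classification_report(test_labels, pred_labels, target_names=cancer.target_names)` reports the same per-class precision and recall in one table.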