## Support Vector Machines (SVM)

* https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
* This SVM tutorial uses code from the "Evaluating a Classification Model" post, available at http://www.ritchieng.com/machine-learning-evaluate-classification-model/

## The Data

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

https://www.kaggle.com/uciml/pima-indians-diabetes-database/version/1#


In [1]:
# read the data into a Pandas DataFrame
import pandas as pd

df = pd.read_csv('pima_indians_diabetes.csv')
df.head()


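| Unnamed: 0 | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |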


In [2]:
# define X and y
X = df[['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction', 'Age']]

y = df['Outcome']

In [3]:
# split X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)
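
As a rough check on what `test_size=0.20` produces, the split sizes can be worked out by hand. This is a minimal sketch that assumes the CSV holds 768 rows (the notebook does not state the row count) and that the test-set size is rounded up, with the training set taking the remainder.

```python
import math

n_rows = 768       # assumption: row count of pima_indians_diabetes.csv
test_size = 0.20   # same fraction as in the split above

# The test-set size is rounded up; the training set gets the remaining rows.
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test

print(n_train, n_test)  # 614 154
```

Under that assumption the test set has 154 rows. The accuracy printed later in this notebook, 0.7792207792207793, equals 120/154, which is consistent with a test set of that size.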

In [16]:
from sklearn import svm

# instantiate a linear-kernel SVM classifier
model = svm.SVC(kernel='linear')
# model = svm.SVC()  # alternative: default RBF kernel

# fit model
model.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='linear', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

In [17]:
# make class predictions for the testing set
y_pred_class = model.predict(X_test)

**Classification accuracy**: percentage of correct predictions

In [18]:
# calculate accuracy and ROC AUC (the AUC here is computed from hard class predictions, not scores)
from sklearn import metrics
print(metrics.accuracy_score(y_test, y_pred_class))
print(metrics.roc_auc_score(y_test, y_pred_class))

0.7792207792207793
0.7313131313131312
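
The two numbers above measure different things. When `roc_auc_score` receives hard 0/1 predictions rather than continuous scores, the result reduces to the average of sensitivity and specificity. The sketch below illustrates this in plain Python on made-up labels; the helper names `confusion_counts` and `pairwise_auc` are hypothetical and not part of scikit-learn.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels for illustration only (not taken from the diabetes data).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fn, tn, fp = confusion_counts(toy_true, toy_pred)
accuracy = (tp + tn) / len(toy_true)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
balanced = (sensitivity + specificity) / 2
auc_from_labels = pairwise_auc(toy_true, toy_pred)

print(accuracy, balanced, auc_from_labels)
```

For a threshold-independent AUC, one option is to pass continuous scores such as `model.decision_function(X_test)` to `metrics.roc_auc_score` instead of `y_pred_class`.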


**How to tune the parameters of an SVM?**

* Tuning the hyperparameters of a machine learning algorithm can improve model performance. Let's look at the list of parameters available with SVM.
* sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma=0.0, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None). This signature comes from an older scikit-learn release; some defaults, notably gamma, differ between versions (the fitted model printed above reports gamma='auto_deprecated').
* kernel: selects the kernel function. Options include 'linear', 'rbf', 'poly', and others; the default is 'rbf'.
* 'rbf' and 'poly' are useful when the classes need a non-linear decision boundary.
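
To make the kernel options concrete, here is a minimal plain-Python sketch of the three kernel functions named above, written with the same `gamma`, `coef0` and `degree` parameter names that appear in the `SVC` signature. The function names and the default values used here are illustrative assumptions, not scikit-learn's internals.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def poly_kernel(x, z, gamma=1.0, coef0=0.0, degree=3):
    """K(x, z) = (gamma * x . z + coef0) ** degree"""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x = [1.0, 2.0]
z = [3.0, 4.0]

print(linear_kernel(x, z))  # 11.0
print(poly_kernel(x, z))    # 1331.0
print(rbf_kernel(x, x))     # 1.0, identical points have maximal similarity
```

The linear kernel is a plain dot product, so the decision boundary stays a hyperplane in the original features. The polynomial and RBF kernels compare points in an implicit higher-dimensional space, which is what allows curved boundaries.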

In [None]:
# Train and test the model with a linear kernel and C=100
model = svm.SVC(C=100, kernel='linear') 
model.fit(X_train, y_train)
y_pred_class = model.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred_class))
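
The cell above raises `C` from its default of 1.0 to 100. In the usual soft-margin formulation, `C` scales the penalty on training points that fall inside the margin or on the wrong side of it. The sketch below evaluates that objective for a fixed, hand-picked hyperplane on made-up points; the function name `soft_margin_objective` is hypothetical, and labels are coded as -1/+1 as the formulation requires.

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses; labels in y must be -1 or +1."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for x, yi in zip(X, y):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hinge_total += max(0.0, 1.0 - yi * score)
    return margin_term + C * hinge_total


# Made-up points: the second one lies inside the margin (score 0.5 < 1).
X_toy = [[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]]
y_toy = [1, 1, -1]
w_toy, b_toy = [1.0, 0.0], 0.0

print(soft_margin_objective(w_toy, b_toy, X_toy, y_toy, C=1))    # 1.0
print(soft_margin_objective(w_toy, b_toy, X_toy, y_toy, C=100))  # 50.5
```

With a large `C` the margin violation dominates the objective, so the optimizer is pushed to fit the training points more tightly, at the cost of a narrower margin.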

In [None]:
# Train and test the model with a polynomial kernel
model = svm.SVC(kernel='poly') 
model.fit(X_train, y_train)
y_pred_class = model.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred_class))
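
Rather than trying one kernel and one `C` at a time, the combinations can be compared with cross-validation. The following is a minimal sketch using scikit-learn's `GridSearchCV`; it runs on synthetic data from `make_classification` as a stand-in for the diabetes DataFrame, and the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the diabetes data: 8 numeric features, binary target.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=1
)

# Candidate settings (illustrative values, not recommendations).
param_grid = {"kernel": ["linear", "rbf", "poly"], "C": [0.1, 1, 10]}

# 5-fold cross-validation on the training split selects the best combination.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print(search.best_params_)
print(search.score(X_te, y_te))  # accuracy of the refit best model on held-out data
```

To apply this to the notebook's data, `X_tr`, `y_tr`, `X_te` and `y_te` would be replaced by `X_train`, `y_train`, `X_test` and `y_test`. The test set is used only once at the end, so the reported accuracy is not inflated by the search itself.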