#    Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification or regression challenges.
Support Vectors are simply the coordinates of individual observation.
The idea of SVM is simple: 
The algorithm creates a line or a hyperplane which separates the data into classes.
SVM algorithms use a set of mathematical functions that are defined as the kernel.
The function of kernel is to take data as input and transform it into the required form.
These functions can be different types.
For example linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid.

Pros: It works really well with a clear margin of separation. It is effective in high dimensional spaces.

Cons: It doesn't perform well when we have large data set because the required training time is higher.

In [1]:
import pandas as pd 
import numpy as np 
import seaborn as sns

In [2]:
Train = pd.read_csv('SalaryData_Train(1).csv')
Train.head(2)

Unnamed: 0,age,workclass,education,educationno,maritalstatus,occupation,relationship,race,sex,capitalgain,capitalloss,hoursperweek,native,Salary
0,39,State-gov,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K


In [3]:
Test = pd.read_csv('SalaryData_Test(1).csv')
Test.head(2)

Unnamed: 0,age,workclass,education,educationno,maritalstatus,occupation,relationship,race,sex,capitalgain,capitalloss,hoursperweek,native,Salary
0,25,Private,11th,7,Never-married,Machine-op-inspct,Own-child,Black,Male,0,0,40,United-States,<=50K
1,38,Private,HS-grad,9,Married-civ-spouse,Farming-fishing,Husband,White,Male,0,0,50,United-States,<=50K


In [4]:
string_columns = ["workclass","education","maritalstatus","occupation","relationship","race","sex","native"]

In [5]:
#Preprocessing the data. As, there are categorical variables
from sklearn.preprocessing import LabelEncoder
number = LabelEncoder()
for i in string_columns:
        Train[i]= number.fit_transform(Train[i])
        Test[i]=number.fit_transform(Test[i])

In [6]:
#Capturing the column names which can help in futher process
colnames = Train.columns
colnames

len(colnames)

14

In [7]:
x_train = Train[colnames[0:13]]
y_train = Train[colnames[13]]
x_test = Test[colnames[0:13]]
y_test = Test[colnames[13]]

In [8]:
#Normalmization
def norm_func(i):
    x = (i-i.min())/(i.max()-i.min())
    return (x)
x_train = norm_func(x_train)
x_test =  norm_func(x_test)

In [9]:
from sklearn.svm import SVC

In [10]:
model_linear = SVC(kernel = "linear")
model_linear.fit(x_train,y_train)

SVC(kernel='linear')

In [11]:
pred_test_linear = model_linear.predict(x_test)

In [12]:
np.mean(pred_test_linear==y_test)

0.8098273572377158

In [13]:
#Accuracy = 81%

In [14]:
#polynomial kernel
model_poly = SVC(kernel = "poly")
model_poly.fit(x_train,y_train)
pred_test_poly = model_poly.predict(x_test)

In [15]:
np.mean(pred_test_poly==y_test)

0.8435590969455511

In [16]:
#Accuracy = 84%

In [17]:
#radial basis function kernel

model_rbf = SVC(kernel = "rbf")
model_rbf.fit(x_train,y_train)
pred_test_rbf = model_rbf.predict(x_test)

In [18]:
np.mean(pred_test_rbf==y_test) 

0.8432934926958832

In [21]:
#Accuracy = 84%

In [19]:
#sigmoid kernel
model_sig = SVC(kernel = "sigmoid")
model_sig.fit(x_train,y_train)
pred_test_sig = model_rbf.predict(x_test)

In [20]:
np.mean(pred_test_sig==y_test) 

0.8432934926958832

In [None]:
#Accuracy = 84%