### ALGORITHM 4: SUPPORT VECTOR MACHINE (SVM) / SUPPORT VECTOR CLASSIFIER

This supervised classification algorithm plots each data point in n-dimensional space, where n is the number of features, so that every feature value becomes one coordinate. The points are then divided into the required classes by a hyperplane (a line when n = 2). The hyperplane is chosen so that its distance to the nearest point of each class is as large as possible, which makes it separate the groups as clearly as it can. A new data point is classified according to which side of this decision boundary it lands on.

![svm-1](../docs/svm1.jpg)
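A minimal sketch of the "which side" rule, assuming scikit-learn and a tiny made-up 2-D dataset (`X_toy`, `y_toy` and `new_points` are hypothetical). With a linear kernel, `SVC` exposes the hyperplane's normal vector as `coef_` and its offset as `intercept_`, so the sign of w · x + b decides the class:

```python
import numpy as np
from sklearn.svm import SVC

# hypothetical toy data: class 0 on the left, class 1 on the right
X_toy = np.array([[0, 0], [-1, 1], [-1, -1], [2, 0], [3, 1], [3, -1]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

# linear kernel so the decision boundary is a straight line (a hyperplane in 2-D)
clf = SVC(kernel="linear", C=1000.0).fit(X_toy, y_toy)
w, b = clf.coef_[0], clf.intercept_[0]   # boundary: w . x + b = 0

new_points = np.array([[-0.5, 0.5], [2.5, 0.5]])
side = new_points @ w + b                # the sign says which side of the boundary each point is on
manual_pred = np.where(side > 0, clf.classes_[1], clf.classes_[0])

print("signed scores :", side)
print("manual        :", manual_pred)
print("clf.predict   :", clf.predict(new_points))
```

The manual sign test and `clf.predict` should agree; `decision_function` returns the same signed scores.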

The following are important concepts in SVM −

**Support Vectors** − The data points that are closest to the hyperplane are called support vectors. The separating hyperplane is defined by these points alone; removing any other training point does not change it.
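A short sketch, assuming scikit-learn's `SVC` and the same kind of hypothetical toy data, showing how to inspect the support vectors after fitting. In this data only (0, 0) and (2, 0) should sit on the edges of the margin, and refitting on the support vectors alone should give (almost) the same boundary:

```python
import numpy as np
from sklearn.svm import SVC

# hypothetical toy data: only (0, 0) and (2, 0) are nearest to the other class
X_toy = np.array([[0, 0], [-1, 1], [-1, -1], [2, 0], [3, 1], [3, -1]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1000.0).fit(X_toy, y_toy)

print("indices  :", clf.support_)          # positions of the support vectors in X_toy
print("vectors  :", clf.support_vectors_)  # their coordinates
print("per class:", clf.n_support_)        # how many support vectors each class contributes

# refit on the support vectors only: the hyperplane should stay (almost) the same
clf_sv = SVC(kernel="linear", C=1000.0).fit(X_toy[clf.support_], y_toy[clf.support_])
print("w (all points)   :", clf.coef_[0])
print("w (support only) :", clf_sv.coef_[0])
```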

**Hyperplane** − As the diagram above shows, the hyperplane is the decision boundary that separates objects belonging to different classes. In two dimensions it is a line; in three dimensions it is a plane.
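In the usual notation (the symbols are introduced here, not in the text above: $\mathbf{w}$ is a vector normal to the hyperplane, $b$ shifts it away from the origin, and $\pm 1$ label the two classes, which the notebook encodes as 1 and 0), the hyperplane and the resulting decision rule can be written as:

$$
\mathbf{w}^\top \mathbf{x} + b = 0,
\qquad
\hat{y}(\mathbf{x}) =
\begin{cases}
+1 & \text{if } \mathbf{w}^\top \mathbf{x} + b > 0,\\
-1 & \text{otherwise.}
\end{cases}
$$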

**Margin** − The gap between the two parallel lines that pass through the closest data points of each class. It is measured as the perpendicular distance between these lines, i.e. twice the perpendicular distance from the hyperplane to the support vectors. A large margin is considered good and a small margin bad, because a wider gap usually leaves more room for new points to be classified correctly.
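If w and b are scaled so that the support vectors satisfy |w · x + b| = 1, the two margin lines are w · x + b = +1 and w · x + b = −1, and the perpendicular gap between them is 2 / ‖w‖. A sketch, assuming scikit-learn's linear `SVC` with a very large `C` to approximate a hard margin, on hypothetical toy data whose closest points lie on x = 0 and x = 2, so the margin should come out close to 2:

```python
import numpy as np
from sklearn.svm import SVC

# hypothetical separable toy data; the nearest points of the two classes are 2 apart
X_toy = np.array([[0, 0], [-1, 1], [-1, -1], [2, 0], [3, 1], [3, -1]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

# a very large C makes the soft-margin SVC behave (almost) like a hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X_toy, y_toy)
w = clf.coef_[0]

# width of the band between the lines w . x + b = +1 and w . x + b = -1
margin = 2.0 / np.linalg.norm(w)
print("w      :", w)
print("margin :", margin)
```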

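The main goal of SVM is to separate the dataset into classes by finding the maximum marginal hyperplane (MMH). Conceptually, this happens in the following two steps −

1. First, SVM considers candidate hyperplanes that segregate the classes correctly.
2. Then, it chooses the hyperplane with the largest margin, i.e. the one farthest from the nearest points of each class.

In practice, both steps are solved together as a single optimization problem rather than by enumerating hyperplanes one at a time.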
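The two steps can be mimicked literally with a brute-force search in two dimensions: try many hyperplane orientations, discard those that do not separate the classes (step 1), and keep the one with the widest gap (step 2). This is only a teaching sketch with a hypothetical helper `brute_force_max_margin` and made-up data; scikit-learn's `SVC` does not work this way, since it solves a quadratic program.

```python
import numpy as np

def brute_force_max_margin(X, y, n_angles=360):
    """Hypothetical toy helper for 2-D data with labels 0/1.

    Returns (margin, w, offset) for the widest separating hyperplane w . x = offset,
    with w a unit normal, or None if no tried direction separates the classes.
    """
    best = None
    for theta in np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False):
        w = np.array([np.cos(theta), np.sin(theta)])  # unit normal of a candidate hyperplane
        proj = X @ w                                   # position of each point along w
        neg_max = proj[y == 0].max()
        pos_min = proj[y == 1].min()
        if pos_min <= neg_max:
            continue                                   # step 1: this orientation cannot separate
        margin = pos_min - neg_max                     # gap between the closest points of each class
        if best is None or margin > best[0]:           # step 2: keep the widest gap
            best = (margin, w, (pos_min + neg_max) / 2)  # hyperplane in the middle of the gap
    return best

# made-up data: the closest points of the two classes are (0, 0) and (2, 0)
X_toy = np.array([[0, 0], [-1, 1], [-1, -1], [2, 0], [3, 1], [3, -1]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

margin, w, offset = brute_force_max_margin(X_toy, y_toy)
print(margin, w, offset)
```

For this data the search should pick the vertical line x = 1 (w = (1, 0), offset 1) with a margin of 2.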


```python
# importing required libraries
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# read the dataset and split it into train (80%) and test (20%) sets;
# shuffle=False keeps the original row order, so the last 20% of rows form the test set
dataset = pd.read_csv("../data/titanic.csv")
train_data, test_data = train_test_split(dataset, test_size=0.2, shuffle=False)

# separate the features (X) from the target column "Survived" (y)
train_X = train_data.drop("Survived", axis=1)
train_y = train_data["Survived"]

test_X = test_data.drop("Survived", axis=1)
test_y = test_data["Survived"]
```
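The test data displayed further below contains an `Unnamed: 0` column. pandas gives that name to an unlabeled first column, which is typically a row index written out by `DataFrame.to_csv`; if that is the case here, the code above passes it to the model as a feature. A sketch of one way to exclude it, assuming it really is a leftover index (the helper `split_features_target` and the `demo` frame are hypothetical):

```python
import pandas as pd

def split_features_target(df, target="Survived"):
    """Hypothetical helper: split X and y, dropping a leftover 'Unnamed: 0' column if present."""
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")  # no-op when the column is absent
    return df.drop(columns=[target]), df[target]

# tiny made-up frame standing in for the Titanic data
demo = pd.DataFrame({"Unnamed: 0": [0, 1, 2], "Survived": [0, 1, 1], "Age": [22.0, 38.0, 26.0]})
X_demo, y_demo = split_features_target(demo)
print(list(X_demo.columns))  # ['Age']
```

Alternatively, `pd.read_csv(path, index_col=0)` reads that first column as the DataFrame index instead of as a feature.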


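```python
# create the model and train it on the training data
# (SVC() defaults to the RBF kernel with C=1.0 and gamma="scale")
model = SVC()
model.fit(train_X, train_y)

print("test data :")
display(test_data.head())  # display() is provided by IPython/Jupyter
```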


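test data :

| Unnamed: 0 | Survived | Age | Fare | Pclass_1 | Pclass_2 | Pclass_3 | Sex_female | Sex_male | SibSp_0 | SibSp_1 | ... | Parch_0 | Parch_1 | Parch_2 | Parch_3 | Parch_4 | Parch_5 | Parch_6 | Embarked_C | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 712 | 0 | 35.0 | 7.125 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 713 | 0 | 20.0 | 7.05 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 714 | 0 | 26.0 | 7.8958 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 715 | 1 | 58.0 | 146.5208 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 716 | 1 | 35.0 | 83.475 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |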
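The model above is trained on unscaled features. The RBF kernel works with distances between points, so columns on large scales (here `Fare` and `Age`) can dominate the 0/1 dummy columns. A common remedy is to standardize the features inside a pipeline, so the scaler is fit on the training split only. A sketch, assuming scikit-learn's `make_pipeline`, with synthetic data from `make_classification` standing in for the Titanic features (its score is not comparable to the one reported below):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# synthetic stand-in data; inflate one feature's scale to mimic a column like Fare
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X[:, 0] *= 100.0

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# StandardScaler is fit on X_train only, then reused to transform X_test inside the pipeline
scaled_svc = make_pipeline(StandardScaler(), SVC())
scaled_svc.fit(X_train, y_train)
scaled_score = scaled_svc.score(X_test, y_test)
print("test accuracy with scaling:", scaled_score)
```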


```python
# predict the classes of the test set
pred_y = model.predict(test_X)
print("predicted survivors : ", pred_y[:5])

# accuracy of the model on the test set
score = accuracy_score(test_y, pred_y)
print("score of model      : ", score * 100, "%")
```


```
predicted survivors :  [0 0 0 1 1]
score of model      :  72.62569832402235 %
```
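`accuracy_score` is the fraction of predictions that match the true labels. Assuming the standard 891-row Titanic training set, `test_size=0.2` yields 179 test rows (consistent with the first test index 712 shown above), and the reported 72.6257 % then corresponds to 130 correct predictions, since 130 / 179 ≈ 0.7263. A small sketch of the same computation on made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# made-up labels for illustration only
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

acc = accuracy_score(y_true, y_pred)   # fraction of matching labels
manual = np.mean(y_true == y_pred)     # the same fraction computed by hand
print(acc, manual)                     # both 0.75 (6 of 8 correct)

# the notebook's reported score, under the 179-row assumption above
print(130 / 179 * 100)
```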
