<h2 style='color:red' align="center">Support Vector Machine</h2>

* Support Vector Machine (SVM) is usually considered a ``classification approach``, but it can be employed for both ``classification and regression problems``.
* It can ``handle multiple continuous and categorical variables`` (categorical variables must first be encoded as numbers, for example with one-hot encoding).
* SVM constructs a ``hyperplane in multidimensional space to separate different classes``.
* SVM finds the ``optimal hyperplane in an iterative manner``, by ``minimizing an error`` (loss) function.
* The core idea of ``SVM is to find a maximum marginal hyperplane (MMH) that best divides the dataset into classes``.
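Because the first bullet says SVM also covers regression, here is a minimal sketch using scikit-learn's `SVC` (classification) and `SVR` (regression) classes. The toy points and the `C`/`epsilon` values are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Hypothetical toy classification data: two well-separated 2-D clusters
X_cls = np.array([[0.0, 0.0], [0.5, 0.5], [0.0, 1.0],
                  [3.0, 3.0], [3.5, 2.5], [4.0, 3.0]])
y_cls = np.array([0, 0, 0, 1, 1, 1])

# Classification: a linear-kernel SVC learns a separating hyperplane
clf = SVC(kernel="linear").fit(X_cls, y_cls)
cls_pred = clf.predict([[0.2, 0.3], [3.8, 3.2]])

# Hypothetical toy regression data: points on the line y = 2x + 1
X_reg = np.arange(10, dtype=float).reshape(-1, 1)
y_reg = 2 * X_reg.ravel() + 1

# Regression: SVR fits a function that keeps the points inside an epsilon-tube
reg = SVR(kernel="linear", C=10.0, epsilon=0.1).fit(X_reg, y_reg)
reg_pred = reg.predict([[4.5]])

print(cls_pred, reg_pred)
```

The same fit/predict interface is used for both tasks; only the estimator class changes.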

## The SVM algorithm can be used for
* Face detection
* Image classification
* Text categorization, etc.

## Support Vectors
* ``Support vectors are the data points`` ``closest to the hyperplane``.
* These points ``define the separating line, because the margin is calculated from them``.
* These points are the ``most relevant to the construction of the classifier``; the other training points do not affect where the hyperplane lies.
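A fitted scikit-learn `SVC` exposes the support vectors it found through `support_vectors_` and their per-class counts through `n_support_`. A minimal sketch on hypothetical, linearly separable toy data, where a very large `C` (an arbitrary choice) approximates a hard-margin SVM:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],   # class 0
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM (almost no violations allowed)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.support_vectors_)  # the training points closest to the hyperplane
print(clf.n_support_)        # number of support vectors in each class
```

Working it out by hand, the closest points of the two classes are (2, 1) and (1, 2) on one side and (4, 4) on the other, so those three should be reported as the support vectors.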

## Hyperplane
* A hyperplane is a ``decision plane that separates a set of objects having different class memberships.``
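In $n$ dimensions, a hyperplane is the set of points $\mathbf{x}$ with $\mathbf{w}^\top \mathbf{x} + b = 0$, and the predicted class depends on which side of it a point falls. For a linear-kernel `SVC`, scikit-learn exposes $\mathbf{w}$ as `coef_` and $b$ as `intercept_`. The sketch below (toy data is hypothetical) checks that `decision_function` is exactly this expression.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]       # normal vector of the hyperplane
b = clf.intercept_[0]  # offset of the hyperplane

# Signed score w . x + b: positive on one side of the hyperplane, negative on the other
scores = X @ w + b
manual_pred = (scores > 0).astype(int)

print(np.allclose(scores, clf.decision_function(X)))
print(manual_pred, clf.predict(X))
```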

## Margin
* The margin is ``the gap between the two lines that pass through the closest points of each class``.
* It is calculated as ``the perpendicular distance from the separating line to the support vectors (the closest points)``.
* A ``larger margin between the classes`` is **considered a good margin**; a ``smaller margin`` is a **bad margin**.
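When the hyperplane is scaled so that $\mathbf{w}^\top \mathbf{x} + b = \pm 1$ on the support vectors, the perpendicular distance between the two margin lines is $2 / \lVert \mathbf{w} \rVert$. A sketch that computes this from a near-hard-margin linear SVM on hypothetical toy data:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Width of the margin: distance between the lines w . x + b = -1 and w . x + b = +1
w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

# Support vectors lie on the margin lines, where |w . x + b| is (approximately) 1
sv_scores = clf.decision_function(clf.support_vectors_)

print(margin_width)
print(sv_scores)
```

For these points the margin can also be worked out by hand: the closest points of the two classes lie on the lines $x + y = 3$ and $x + y = 8$, which are $5/\sqrt{2} \approx 3.54$ apart.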

## How does SVM work?
* The main objective is to ``segregate the given dataset in the best possible way.``
* The ``distance between the hyperplane and the nearest points of either class is known as the margin.``
* The objective is to ``select a hyperplane with the maximum possible margin between support vectors in the given dataset.``
* ``SVM searches`` for the ``maximum marginal hyperplane`` in the following steps:

> * Generate ``hyperplanes that segregate the classes in the best way``.
> * ``The left-hand side figure shows three hyperplanes: black, blue and orange.``
> * Here, the ``blue and orange hyperplanes have higher classification error``, but the **black one separates the two classes correctly**.

> * ``Select the right hyperplane with the maximum segregation from the nearest data points of either class, as shown in the right-hand side figure.``

![](http://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1526288454/index2_ub1uzd.png)
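The two steps can be imitated with plain NumPy on hypothetical data: discard candidate lines that misclassify points, then keep the one whose nearest point is farthest away. This is only an illustration with made-up candidates; real SVM solvers optimize $\mathbf{w}$ and $b$ directly instead of enumerating lines.

```python
import numpy as np

# Hypothetical 2-D points with labels -1 / +1
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Hypothetical candidate hyperplanes w . x + b = 0, given as (w, b)
candidates = {
    "A": (np.array([1.0, 0.0]), -1.5),  # vertical line x = 1.5
    "B": (np.array([1.0, 1.0]), -4.0),  # line x + y = 4
    "C": (np.array([1.0, 1.0]), -5.5),  # line x + y = 5.5
}

def errors(w, b):
    # Number of points on the wrong side of the hyperplane
    return int(np.sum(np.sign(X @ w + b) != y))

def margin(w, b):
    # Smallest perpendicular distance from any point to the hyperplane
    return float(np.min(np.abs(X @ w + b)) / np.linalg.norm(w))

# Step 1: keep only the hyperplanes that separate the classes correctly
separating = {name: wb for name, wb in candidates.items() if errors(*wb) == 0}

# Step 2: among those, pick the one with the largest margin
best = max(separating, key=lambda name: margin(*separating[name]))
print(best, {name: round(margin(*wb), 3) for name, wb in separating.items()})
```

Line A misclassifies the point (2, 1), so it is dropped; B and C both separate the classes, and C wins because its nearest point is farther away.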

## Dealing with non-linear and inseparable planes
Some problems ``can’t be solved using a linear hyperplane``, as shown in the figure below (left-hand side).

* In such a situation, ``SVM uses a kernel trick to map the input space into a higher-dimensional space``, as shown on the right. The kernel computes similarities in that space without constructing the new coordinates explicitly.
* The data points are plotted on the ``x-axis and z-axis``, where $z$ is the squared sum of both $x$ and $y$: $z = x^2 + y^2$.
* Now you ``can easily segregate these points using linear separation.``

![](http://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1526288453/index_bnr4rx.png)
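The lifting in the figure can be reproduced on synthetic data. A sketch assuming scikit-learn's `make_circles` generator (two concentric rings; the sample count, noise and radius factor are arbitrary): a linear SVM fails on the raw coordinates, reaches high training accuracy once the explicit feature $z = x^2 + y^2$ is added, and an RBF-kernel SVM gets a similar result without any hand-made feature.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic data: an inner ring (one class) inside an outer ring (the other class)
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

# A straight line in the original (x, y) plane cannot separate the rings
linear_raw = SVC(kernel="linear").fit(X, y)
acc_raw = linear_raw.score(X, y)

# Add the explicit feature z = x^2 + y^2 and fit a linear SVM again
z = (X ** 2).sum(axis=1, keepdims=True)
X_lifted = np.hstack([X, z])
linear_lifted = SVC(kernel="linear").fit(X_lifted, y)
acc_lifted = linear_lifted.score(X_lifted, y)

# The RBF kernel performs a comparable mapping implicitly
rbf = SVC(kernel="rbf").fit(X, y)
acc_rbf = rbf.score(X, y)

print(acc_raw, acc_lifted, acc_rbf)
```

All three scores are training accuracies, used here only to show separability, not generalization.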

## Classifier Building in Scikit-learn

### Loading Data

```python
# Import scikit-learn dataset library
from sklearn import datasets

# Load dataset
cancer = datasets.load_breast_cancer()
```


```python
cancer
```

```
{'data': array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
         1.189e-01],
        [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
         8.902e-02],
        [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
         8.758e-02],
        ...,
        [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
         7.820e-02],
        [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
         1.240e-01],
        [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
         7.039e-02]]),
 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
        1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
        1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
        1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0
```

### Exploring Data

```python
# print the names of the 30 features
print("Features: ", cancer.feature_names)

# print the label type of cancer ('malignant' 'benign')
print("Labels: ", cancer.target_names)
```

```
Features:  ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Labels:  ['malignant' 'benign']
```


```python
# print data (feature) shape
cancer.data.shape
```

```
(569, 30)
```
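Before splitting, it helps to know how balanced the two classes are. A short sketch using NumPy's `bincount` on the 0/1 target (0 is `malignant` and 1 is `benign`, in the order of `target_names`):

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# Count how many samples carry each label (index 0 = malignant, 1 = benign)
counts = np.bincount(cancer.target)
for name, count in zip(cancer.target_names, counts):
    print(name, count)  # malignant 212, then benign 357
```

The classes are moderately imbalanced (about 37% malignant), which is worth keeping in mind when reading a single accuracy number.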

### Splitting Data

```python
# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set: 70% training and 30% test
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=109)
```
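With `test_size=0.3`, scikit-learn rounds the test set size up, so the 569 samples split into 398 training and 171 test rows. One optional variation, shown here as a sketch, passes `stratify=cancer.target` so both splits keep roughly the same malignant/benign ratio; it changes which rows land in each split, so downstream scores will differ somewhat from the numbers in this notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()

# Same 70/30 split, but preserving the class proportions in both parts
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=109,
    stratify=cancer.target)

print(X_train.shape, X_test.shape)  # (398, 30) (171, 30)
print(np.mean(y_train), np.mean(y_test))  # fraction of benign samples in each split
```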


### Generating Model

```python
# Import svm model
from sklearn import svm

# Create an SVM classifier with a linear kernel (use kernel='rbf' for a non-linear kernel)
clf = svm.SVC(kernel='linear')

# Train the model using the training set
clf.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = clf.predict(X_test)
```
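The comment above mentions `rbf` as the non-linear alternative. A sketch of that variant, assuming the features are standardized first: RBF-kernel SVMs are sensitive to feature scale, and the breast cancer features span very different ranges. `StandardScaler` and `make_pipeline` are standard scikit-learn utilities; the resulting accuracy is not guaranteed to match the linear model's.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=109)

# Standardize each feature (fitted on the training set only), then apply an RBF-kernel SVM
rbf_clf = make_pipeline(StandardScaler(), svm.SVC(kernel="rbf"))
rbf_clf.fit(X_train, y_train)
rbf_pred = rbf_clf.predict(X_test)

print("RBF accuracy:", metrics.accuracy_score(y_test, rbf_pred))
```

Putting the scaler inside the pipeline means its mean and variance are learned from the training data alone, so no information from the test set leaks into the model.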


### Evaluating the Model

```python
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Model accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
```

```
Accuracy: 0.9649122807017544
```


#### Well, you got a classification rate of 96.49% on the test set, which is considered very good accuracy.
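Accuracy alone hides which kind of mistake the model makes. A sketch that reproduces the linear model above and adds precision, recall and the confusion matrix from `sklearn.metrics`. For 0/1 labels scikit-learn treats class 1 (`benign`) as the positive class by default, so precision and recall here refer to benign predictions.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=109)

# Same linear-kernel model as above
clf = svm.SVC(kernel="linear").fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Precision: of the samples predicted benign, how many really are benign?
print("Precision:", metrics.precision_score(y_test, y_pred))
# Recall: of the truly benign samples, how many were predicted benign?
print("Recall:", metrics.recall_score(y_test, y_pred))
# Rows are true classes (malignant, benign); columns are predicted classes
print(metrics.confusion_matrix(y_test, y_pred))
```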

## Advantages
* SVM ``classifiers offer good accuracy and perform faster prediction compared to the Naïve Bayes algorithm``.
* They also ``use less memory because they use only a subset of training points (the support vectors) in the decision phase.``
* SVM ``works well with a clear margin of separation and in high-dimensional spaces.``
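The memory point can be checked on a fitted model: an `SVC` keeps only its support vectors (`support_vectors_`, counted per class in `n_support_`) for prediction. A sketch, assuming a standardized RBF model on the same split as above; the exact count depends on the kernel and on `C`.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=109)

model = make_pipeline(StandardScaler(), svm.SVC(kernel="rbf"))
model.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name
svc = model.named_steps["svc"]
print("Training points:", len(X_train))
print("Support vectors kept for prediction:", svc.n_support_.sum())
```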

## Disadvantages
* ``SVM is not suitable for large datasets because of its high training time``, and it also ``takes more time to train than Naïve Bayes.``
* It ``works poorly with overlapping classes and is also sensitive to the type of kernel used.``
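Sensitivity to the kernel can be seen by cross-validating the same data with each built-in kernel that `SVC` accepts. A sketch that assumes standardized features, default hyperparameters and 5-fold cross-validation. The scores are untuned and only show that the choice of kernel matters.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

cancer = datasets.load_breast_cancer()

# Mean 5-fold cross-validated accuracy for each kernel
scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = make_pipeline(StandardScaler(), svm.SVC(kernel=kernel))
    scores[kernel] = cross_val_score(model, cancer.data, cancer.target, cv=5).mean()

for kernel, score in scores.items():
    print(f"{kernel:8s} {score:.3f}")
```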