# Support Vector Machines with Scikit-learn
### One of the most popular and widely used supervised machine learning algorithms.

* SVM often achieves high accuracy compared with other classifiers such as logistic regression and decision trees, although results depend on the dataset and on tuning.


* It is known for its kernel trick to handle nonlinear input spaces. 


* It is used in a variety of applications such as face detection, intrusion detection, classification of emails, news articles and web pages, classification of genes, and handwriting recognition.


##### The classifier separates data points using a hyperplane with the largest possible margin.

##### Because it models the decision boundary between classes directly, an SVM classifier is also known as a discriminative classifier.

##### SVM finds an optimal hyperplane, which helps in classifying new data points.



<img src='img/svm1.JPG'>

## Generally, Support Vector Machines are considered a classification approach, but they can be employed in both classification and regression problems.

* It can handle multiple continuous and categorical variables, although categorical variables must first be encoded as numbers.


* SVM constructs a hyperplane in multidimensional space to separate different classes.


* SVM finds the optimal hyperplane iteratively, minimizing the classification error.


* The core idea of SVM is to find a maximum marginal hyperplane (MMH) that best divides the dataset into classes.

### Support Vectors
* Support vectors are the data points closest to the hyperplane. They define the separating line, because the margin is calculated from them. This makes them more relevant to the construction of the classifier than the remaining points.

### Hyperplane
* A hyperplane is a decision boundary that separates sets of objects with different class memberships.

### Margin
* A margin is the gap between the two lines that pass through the closest points of each class. It is calculated as the perpendicular distance from the separating line to the support vectors (the closest points). A larger margin between the classes is considered a good margin; a smaller margin is a bad one.
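
The three terms above can be inspected on a fitted model. The following is a minimal sketch, assuming a made-up, linearly separable toy dataset and a very large `C` to approximate a hard margin; neither comes from the original notebook. For a linear SVM whose hyperplane is `w.x + b = 0`, the margin width is `2 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-D data (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Hyperplane parameters: w.x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]

# Distance between the two margin lines w.x + b = -1 and w.x + b = +1.
margin_width = 2.0 / np.linalg.norm(w)

print("support vectors:\n", clf.support_vectors_)
print("hyperplane: w =", w, "b =", b)
print("margin width:", margin_width)
```

Only the points returned in `support_vectors_` influence the fitted hyperplane; moving any other point slightly (without crossing the margin) would leave `w` and `b` unchanged.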




## How does SVM work?

#### The main objective is to segregate the given dataset in the best possible way. 

* The distance between the nearest points of the two classes is known as the margin.
* The objective is to select a hyperplane with the maximum possible margin between support vectors in the given dataset. 

### SVM searches for the maximum marginal hyperplane in the following steps:

1. Generate hyperplanes that segregate the classes in the best way. The left-hand side figure shows three hyperplanes: black, blue and orange. Here, the blue and orange hyperplanes have a higher classification error, while the black one separates the two classes correctly.



2. Select the hyperplane with the maximum separation from the nearest data points of either class, as shown in the right-hand side figure.

<img src='img/svm2.JPG'>
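
The two steps above can be sketched with a handful of candidate hyperplanes. This is only an illustration: the toy data and the three candidates `A`, `B` and `C` are hypothetical, and a real SVM solves an optimization problem rather than enumerating candidates.

```python
import numpy as np

# Toy 2-D data (made up for illustration); labels are -1 and +1.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Hypothetical candidate hyperplanes, each given as (w, b) with w.x + b = 0.
candidates = {
    "A": (np.array([1.0, 0.0]), -1.5),  # vertical line x = 1.5
    "B": (np.array([1.0, 0.0]), -3.0),  # vertical line x = 3
    "C": (np.array([1.0, 1.0]), -5.5),  # diagonal line x + y = 5.5
}

def evaluate(w, b):
    """Return (number of misclassified points, distance to the nearest point)."""
    scores = X @ w + b
    errors = int(np.sum(np.sign(scores) != y))
    margin = np.min(np.abs(scores)) / np.linalg.norm(w)
    return errors, margin

results = {name: evaluate(w, b) for name, (w, b) in candidates.items()}

# Step 1: keep only the hyperplanes that separate the classes without error.
separating = {name: m for name, (e, m) in results.items() if e == 0}

# Step 2: among those, select the one farthest from its nearest data point.
best = max(separating, key=separating.get)

for name, (errors, margin) in results.items():
    print(name, "errors:", errors, "nearest distance:", round(margin, 3))
print("selected:", best)
```

Candidate `A` misclassifies one point and is discarded in step 1; of the two remaining candidates, `C` lies farther from its nearest point than `B`, so it is selected in step 2.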

# Dealing with non-linear and inseparable planes


Some problems can’t be solved using a linear hyperplane, as shown in the figure below (left-hand side).

<br><br>
<img src='img/svm3.JPG'>

* In such situations, SVM uses a **kernel trick** to transform the input space to a higher dimensional space, as shown on the right.


* The data points are plotted on the x-axis and z-axis, where z is the sum of the squares of x and y: z = x^2 + y^2.


* Now you can easily segregate these points using linear separation.
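
The transformation described above can be tried on synthetic data. The sketch below assumes two made-up classes, an inner disc and an outer ring, which no straight line in the (x, y) plane can separate. After adding z = x^2 + y^2, a single threshold on z is enough. The radii and the threshold value are illustrative choices, not values from the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # points per class

# Class 0: inside a disc of radius 1. Class 1: a ring with radius between 2 and 3.
theta = rng.uniform(0.0, 2.0 * np.pi, size=2 * n)
radius = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
x = radius * np.cos(theta)
y = radius * np.sin(theta)
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# New feature: the sum of the squares of x and y.
z = x**2 + y**2

# In the (x, z) view the classes are split by a horizontal line z = threshold.
# Class 0 has z <= 1 and class 1 has z >= 4, so any value in between works.
threshold = 2.5
predicted = (z > threshold).astype(int)
accuracy = np.mean(predicted == labels)

print("accuracy with a linear threshold on z:", accuracy)
```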

# SVM Kernels

The SVM algorithm is implemented in practice using a kernel. 


A kernel transforms an input data space into the required form. 

### Kernel Trick
* SVM uses a technique called the kernel trick. 
    * Here, the kernel takes a low-dimensional input space and transforms it into a higher dimensional space. 

* In other words, it converts a nonseparable problem into a separable one by adding more dimensions to it.

* It is most useful in non-linear separation problems.

* The kernel trick can help you build a more accurate classifier.
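
A small worked example may make the idea more concrete. The sketch below assumes a degree-2 polynomial kernel without a constant term, `k(a, b) = (a.b)^2`, and a hypothetical explicit feature map `phi` into three dimensions. It checks that the kernel, computed in the original 2-D space, gives the same number as a dot product in the higher dimensional space, so the transformation never has to be carried out explicitly.

```python
import numpy as np

def phi(v):
    """Explicit feature map from 2-D to 3-D for the kernel k(a, b) = (a.b)^2."""
    return np.array([v[0] ** 2, np.sqrt(2.0) * v[0] * v[1], v[1] ** 2])

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

# Dot product after explicitly mapping both points to the 3-D space.
explicit = phi(a) @ phi(b)

# The same quantity computed directly in the original 2-D space.
kernel = (a @ b) ** 2

print("explicit mapping:", explicit)
print("kernel shortcut: ", kernel)
```

Both values are approximately 121 here, since `a.b = 11`. The saving grows with the dimension of the mapped space, which is why kernels are used instead of explicit transformations.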

### Types of Kernels
<br>

* Linear Kernel 
    * A linear kernel is the ordinary dot product of any two given observations.
    * The dot product of two vectors is the sum of the products of each pair of corresponding input values.

<br><br>

* Polynomial Kernel 
    * A polynomial kernel is a more generalized form of the linear kernel. 
    * The polynomial kernel can distinguish curved or nonlinear input space.

<br><br>

* Radial Basis Function Kernel 
    * The Radial basis function kernel is a popular kernel function commonly used in support vector machine classification. 
    * The RBF kernel corresponds to mapping the input space into an infinite-dimensional space.
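
The three kernels listed above can be computed by hand and compared with the helpers in `sklearn.metrics.pairwise`. This is a minimal sketch: the two vectors and the values of `degree`, `gamma` and `coef0` are arbitrary choices for illustration, and the formulas follow scikit-learn's parameterization.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

# Illustrative hyperparameters (not tuned for any dataset).
degree, gamma, coef0 = 2, 0.5, 1.0

# Manual formulas.
k_linear = u @ v                                   # u.v
k_poly = (gamma * (u @ v) + coef0) ** degree       # (gamma * u.v + coef0)^degree
k_rbf = np.exp(-gamma * np.sum((u - v) ** 2))      # exp(-gamma * ||u - v||^2)

# scikit-learn expects 2-D arrays of shape (n_samples, n_features).
U, V = u.reshape(1, -1), v.reshape(1, -1)
sk_linear = linear_kernel(U, V)[0, 0]
sk_poly = polynomial_kernel(U, V, degree=degree, gamma=gamma, coef0=coef0)[0, 0]
sk_rbf = rbf_kernel(U, V, gamma=gamma)[0, 0]

print("linear:    ", k_linear, sk_linear)
print("polynomial:", k_poly, sk_poly)
print("rbf:       ", k_rbf, sk_rbf)
```

With `degree=1`, `gamma=1` and `coef0=0` the polynomial kernel reduces to the linear kernel, which is the sense in which it is a more generalized form.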

# SVM Classifier Building in Scikit-learn


In the model-building part, you can use the breast cancer dataset, which is a well-known binary classification problem.

The features in this dataset are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

The dataset comprises 30 features (mean radius, mean texture, mean perimeter, mean area, mean smoothness, mean compactness, mean concavity, mean concave points, mean symmetry, mean fractal dimension, radius error, texture error, perimeter error, area error, smoothness error, compactness error, concavity error, concave points error, symmetry error, fractal dimension error, worst radius, worst texture, worst perimeter, worst area, worst smoothness, worst compactness, worst concavity, worst concave points, worst symmetry, and worst fractal dimension) and a target (type of cancer).

This data has two types of cancer classes: malignant (harmful) and benign (not harmful). Here, you can build a model to classify the type of cancer. 

The dataset is available in the scikit-learn library, or you can download it from the UCI Machine Learning Repository.




In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

### Import Dataset

In [2]:
from sklearn.datasets import load_breast_cancer

In [3]:
data = load_breast_cancer()

In [4]:
data

{'data': array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
         1.189e-01],
        [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
         8.902e-02],
        [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
         8.758e-02],
        ...,
        [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
         7.820e-02],
        [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
         1.240e-01],
        [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
         7.039e-02]]),
 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
        1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
        1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
        1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0

In [5]:
data['data']

array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
        1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
        8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
        8.758e-02],
       ...,
       [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
        7.820e-02],
       [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
        1.240e-01],
       [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
        7.039e-02]])

In [6]:
data['target_names']

array(['malignant', 'benign'], dtype='<U9')

In [7]:
data['target']

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,

In [10]:
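# Combine the feature matrix and the target vector into a single DataFrame,
# using the feature names plus 'target' as column names.
df = pd.DataFrame(np.c_[data['data'], data['target']],
                  columns=np.append(data['feature_names'], ['target']))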

In [12]:
df.columns

Index(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error', 'fractal dimension error',
       'worst radius', 'worst texture', 'worst perimeter', 'worst area',
       'worst smoothness', 'worst compactness', 'worst concavity',
       'worst concave points', 'worst symmetry', 'worst fractal dimension',
       'target'],
      dtype='object')
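
Before building a model, it can help to confirm the size of the DataFrame and the balance of the two classes. The sketch below rebuilds `df` in the same way as above so that it runs on its own; the `class_counts` name is introduced here only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(np.c_[data['data'], data['target']],
                  columns=np.append(data['feature_names'], ['target']))

# Number of rows (samples) and columns (30 features plus the target).
print(df.shape)

# np.c_ stores the target as float, so cast it back to int before
# translating the codes 0 and 1 into their class names.
target_codes = df['target'].astype(int).to_numpy()
class_counts = pd.Series(data['target_names'][target_codes]).value_counts()
print(class_counts)
```

On the copy of the dataset shipped with scikit-learn this reports 569 rows and 31 columns, with more benign than malignant samples, so the classes are moderately imbalanced.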