# Support Vector Machines

Support Vector Machines (SVMs) are a powerful and flexible class of supervised algorithms for both classification and regression. They are popular because they often achieve good accuracy at modest computational cost. In this section, we develop the intuition behind support vector machines and their use in classification problems.

## What is a Support Vector Machine?

The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (where N is the number of features) that separates the data points of the two classes.
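As a minimal sketch of this idea, a hyperplane can be written as `w . x + b = 0`, and a point is assigned to a class according to the side it falls on. The function name `classify` and the values of `w`, `b`, and `X` below are made up for illustration and are not part of the implementation later in this section.

```python
import numpy as np

def classify(X, w, b):
    """Label each row of X as +1 or -1 by its side of the hyperplane w . x + b = 0."""
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1)

# Hypothetical 2-D hyperplane: x1 + x2 - 5 = 0
w = np.array([1.0, 1.0])
b = -5.0

# Four illustrative points, two on each side of the line
X = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [6.0, 1.0]])
labels = classify(X, w, b)
print(labels)
```

With two features the hyperplane is simply a line, so the sign of `x1 + x2 - 5` decides the class.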

<img src='files/img/hyperplane.png'>

Many hyperplanes could separate the two classes of data points. Our objective is to find the one with the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of each class. Maximizing the margin leaves room for error, so that future data points can be classified with more confidence.
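To make the comparison between candidate hyperplanes concrete, the sketch below computes the distance from a hyperplane to its nearest training point, `min(yi * (w . xi + b)) / ||w||`, for two separating hyperplanes. The helper name `geometric_margin`, the toy data, and both candidate hyperplanes are assumptions chosen for illustration.

```python
import numpy as np

def geometric_margin(X, y, w, b):
    """Distance from the hyperplane w . x + b = 0 to the nearest point.

    The result is positive only if every point lies on its correct side.
    """
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# Toy data: two points per class
X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])

# Two hypothetical hyperplanes that both separate the classes
margin_a = geometric_margin(X, y, np.array([1.0, 1.0]), -5.5)  # x1 + x2 = 5.5
margin_b = geometric_margin(X, y, np.array([1.0, 0.0]), -3.0)  # x1 = 3

print(margin_a, margin_b)
```

Both hyperplanes classify the toy data correctly, but the first one keeps the nearest points farther away, so it is the better choice of the two under the maximum-margin criterion.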

## Hyperplanes and Support Vectors

<img src='files/img/hyperplane3d.png'>

Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. The dimension of the hyperplane depends on the number of features: with N input features, the hyperplane has N - 1 dimensions. If the number of input features is 2, the hyperplane is just a line. If the number of input features is 3, the hyperplane becomes a two-dimensional plane. It becomes difficult to visualize when the number of features exceeds 3.

<img src='files/img/supportvectors.jpg'>

Support vectors are the data points closest to the hyperplane, and they determine its position and orientation. Using these support vectors, we maximize the margin of the classifier. Deleting a support vector will change the position of the hyperplane. These are the points that help us build our SVM.
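A small sketch can show how support vectors are picked out once a hyperplane is known. It assumes `w` and `b` have already been found and scaled so that the nearest points satisfy `yi * (w . xi + b) = 1`. The helper name `support_vector_mask`, the toy data, and the hyperplane are illustrative assumptions.

```python
import numpy as np

def support_vector_mask(X, y, w, b, tol=1e-6):
    """Mark the points that lie exactly on the margin, i.e. yi * (w . xi + b) == 1."""
    margins = y * (X @ w + b)
    return np.isclose(margins, 1.0, atol=tol)

# Toy data lying on the diagonal, two points per class
X = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]])
y = np.array([-1, -1, 1, 1])

# Assumed maximum-margin hyperplane for this data: 0.5*x1 + 0.5*x2 - 3 = 0
w = np.array([0.5, 0.5])
b = -3.0

mask = support_vector_mask(X, y, w, b)
print(X[mask])
```

Only the two inner points sit on the margin. The outer points could be moved farther away without affecting this hyperplane.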


In [None]:
import matplotlib.pyplot as plt
import numpy as np

class SVM:
    def __init__(self, visualization=True):
        self.visualization = visualization
        self.colors = {1: 'r', -1: 'b'}
        if self.visualization:
            self.fig = plt.figure()
            self.ax = self.fig.add_subplot()
            
    def fit(self, X, y):
        self.X = X
        self.y = y

        # Candidate solutions keyed by magnitude: {||w||: (w, b)}
        opt_dict = {}
        transforms = [[1, 1], [-1, 1], [-1, -1], [1, -1]]
        
        self.max_feature_value = np.max(X)
        self.min_feature_value = np.min(X)
        
        # Smaller step sizes give a more precise margin and decision boundary
        learning_rates = [self.max_feature_value * 0.1,
                          self.max_feature_value * 0.001,
                          self.max_feature_value * 0.0001]
        
        b_range_multiple = 5
        b_step_multiple = 5
        
        latest_optimum = self.max_feature_value * 10
        
        # Objective: find the (w, b) with the smallest ||w|| that satisfies
        # yi * (w . xi + b) >= 1 for every training point. Each pass starts
        # from a large w and scans a range of b values for each candidate w.
        # Step sizes shrink from pass to pass to refine the result.
        for lrate in learning_rates:
            w = np.array([latest_optimum, latest_optimum])
            
            optimized = False
            while not optimized:
                # Scan b from -max_feature_value * b_range_multiple up to
                # max_feature_value * b_range_multiple, in steps of lrate * b_step_multiple
                for b in np.arange(-1 * self.max_feature_value * b_range_multiple,
                                   self.max_feature_value * b_range_multiple,
                                   lrate * b_step_multiple):
                    
                    for transformation in transforms:
                        w_t = w * transformation
                        
                        # Every data point should be correct
                        correctly_classified = True
                        for yi in 