Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression. Given a dataset, the algorithm tries to separate the data using hyperplanes and then makes predictions based on which side of the hyperplane a point falls. SVM is a non-probabilistic linear classifier: while many other classifiers predict the probability that a data point belongs to one group or another, SVM directly states which group the data point belongs to, without any probability calculation.

## Understanding the Mathematics involved
Let’s take the example of the following dataset and see how we can divide the data into appropriate groups.
<img src='svr/SVM_intution.png'  width="300">

We can see that there are two groups of data. The question is how to divide these points into two groups. Any of the three lines would do; in fact, an infinite number of straight lines can divide these points into two classes. So which line should we choose?
SVM solves this problem by choosing the line with the maximum margin, as shown below:
<img src='svr/SVM_hyperplane.png' width="600">


The black line in the middle is the optimum classifier. It is drawn to maximise its distance from the nearest points in the two classes. In SVM terms, it is called a __hyperplane__.
A _hyperplane_ is an (n-1)-dimensional subspace that divides an n-dimensional space into two parts. Here the data is 2-D, so the hyperplane has only one dimension; hence, the hyperplane is a line.
The two points (highlighted with circles) that lie on the yellow lines are called the __support vectors__. In a 2-D figure they are points; in a multi-dimensional space they are vectors. Hence the name support vector machine: the algorithm creates the optimum classification line by maximising its distance from the support vectors.
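The idea of a margin can be made concrete with a few lines of plain Python. The sketch below is only an illustration: the four toy points and the line `x1 + x2 - 4 = 0` (written as `w = (0.5, 0.5)`, `b = -2.0`) are hypothetical values chosen by hand, not taken from the figure, and nothing here is fitted by an SVM solver. It shows how the sign of `w.x + b` decides the class, how the distance to the hyperplane is computed, and that the closest points play the role of support vectors.

```python
import math

def decision_value(w, b, x):
    """Return w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm_w

# Hypothetical 2-D toy data: (point, class label), two points per class.
points = [((1.0, 1.0), -1), ((0.0, 0.0), -1), ((3.0, 3.0), 1), ((4.0, 4.0), 1)]

# Hypothetical separating line x1 + x2 - 4 = 0, scaled so that the closest
# points satisfy |w.x + b| = 1 (the usual SVM normalisation).
w, b = (0.5, 0.5), -2.0

# The class is decided by the sign of w.x + b, with no probability involved.
predictions = [1 if decision_value(w, b, x) >= 0 else -1 for x, _ in points]

# The points closest to the line act as the support vectors.
distances = [distance_to_hyperplane(w, b, x) for x, _ in points]
closest = min(distances)
support_vectors = [x for (x, _), d in zip(points, distances)
                   if math.isclose(d, closest)]

# With the normalisation above, the margin width is 2 / ||w||.
margin_width = 2 / math.sqrt(sum(wi * wi for wi in w))

print(predictions)
print(support_vectors)
print(margin_width)
```

With this scaling, the margin width equals the distance between the two closest points of opposite classes, which is why moving any other point (without crossing the margin) would not change the line.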

When the data is not linearly separable, the SVM algorithm needs to work in a higher-dimensional space in order to find a hyperplane that separates the data into groups. But introducing new dimensions makes the computations more intensive, which hurts the algorithm's performance. To address this, mathematicians came up with kernel methods.
Kernel methods use kernel functions. The key feature of a kernel function is that it returns the result of an inner product in a higher-dimensional space without ever calculating the coordinates of the points in that space. It operates only on the existing points, yet the result mimics a computation in the higher-dimensional space, which avoids the cost of computing and storing the new coordinates. This is called the kernel trick.
<img src= "svr/SVM_3D_Hyperplane.png" width="600">
Image: bogotobogo.com


In the left diagram above, the data is distributed non-linearly: we cannot separate the two classes using a linear equation. To solve this problem, we can project the points into a 3-dimensional space and then find a plane that divides the data into two parts. In theory, that’s what a kernel function does, without computing the additional coordinates for the higher dimension.
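A small worked example can show why no new coordinates are needed. The sketch below assumes a simple degree-2 polynomial kernel, `(x . z)^2`, and a hand-written 2-D to 3-D mapping; both the function names and the two sample points are hypothetical and chosen only for illustration. It computes the same inner product in two ways: once by explicitly projecting the points into 3-D, and once with the kernel working directly on the original 2-D points.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def explicit_map(x):
    """Project a 2-D point into 3-D: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Degree-2 polynomial kernel (x . z)^2, computed entirely in 2-D."""
    return dot(x, z) ** 2

# Two hypothetical 2-D points.
x, z = (1.0, 2.0), (3.0, -1.0)

# Route 1: compute the 3-D coordinates, then take the inner product there.
via_mapping = dot(explicit_map(x), explicit_map(z))

# Route 2: kernel trick, the 3-D coordinates are never computed.
via_kernel = poly_kernel(x, z)

print(via_mapping, via_kernel)
```

Both routes agree up to floating-point rounding. The `'poly'` kernel in scikit-learn has the more general form `(gamma * <x, z> + coef0) ** degree`; the version here is the special case with `gamma = 1`, `coef0 = 0` and `degree = 2`.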

## Support Vector Regression

Let’s talk about Linear Regression first. How do we determine the best-fit line? The idea is to find the line that minimises the total residual error. The SVR approach is a bit different: instead of minimising the error directly, SVR tries to keep the error within a fixed range, whose width is controlled by the `epsilon` parameter. This approach can be explained using three lines. The first line is the best-fit regressor line, and the other two are the boundary lines that mark the allowed range of error.
<img src="svr/SVR.png" width="700">
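The "fixed range" idea corresponds to what is usually called the epsilon-insensitive loss: errors smaller than `epsilon` cost nothing, and larger errors are penalised only by the amount they exceed `epsilon`. The sketch below compares it with the squared error used in ordinary least squares. The function names and the sample numbers are hypothetical, and this is only the loss term; the full SVR objective also contains a regularisation term that is not shown here.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube |y_true - y_pred| <= epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def squared_error(y_true, y_pred):
    """Loss used by ordinary least-squares regression, for comparison."""
    return (y_true - y_pred) ** 2

# Hypothetical target and two predictions, with a tube half-width of 0.5.
y_true = 10.0
epsilon = 0.5

inside_tube = epsilon_insensitive_loss(y_true, 10.25, epsilon)   # error 0.25 < epsilon
outside_tube = epsilon_insensitive_loss(y_true, 11.0, epsilon)   # error 1.0 > epsilon

print(inside_tube, squared_error(y_true, 10.25))
print(outside_tube, squared_error(y_true, 11.0))
```

A prediction inside the tube contributes no loss at all, whereas squared error always penalises it a little. That is the sense in which SVR "keeps the error in a range" rather than minimising every residual.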

In [1]:
import pandas as pd

In [2]:
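# Note: the 'Unnamed: 0' column in the output below appears to be a saved row index;
# it is kept as a feature in X in the cells that follow.
df = pd.read_csv('ML_df.csv')
df.head()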

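| Unnamed: 0 | carat | depth | table | x | y | z | Good | Ideal | Premium | Very Good | ... | I | J | IF | SI1 | SI2 | VS1 | VS2 | VVS1 | VVS2 | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.198168 | -0.174092 | -1.099672 | -1.587837 | -1.536196 | -1.571129 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 326 |
| 1 | -1.240361 | -1.360738 | 1.585529 | -1.641325 | -1.658774 | -1.741175 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 326 |
| 2 | -1.198168 | -3.385019 | 3.375663 | -1.498691 | -1.457395 | -1.741175 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 327 |
| 3 | -1.071587 | 0.454133 | 0.242928 | -1.364971 | -1.317305 | -1.28772 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 334 |
| 4 | -1.029394 | 1.082358 | 0.242928 | -1.240167 | -1.212238 | -1.117674 | 1 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 335 |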


In [3]:
X = df.drop('price',axis=1)
y = df.price

In [4]:
from sklearn.model_selection import train_test_split

In [5]:
x_train,x_test,y_train,y_test = train_test_split(X,y,test_size=0.25,random_state=33)

In [6]:
from sklearn.svm import SVR

In [7]:
svr = SVR()
svr.fit(x_train,y_train)

SVR()

In [8]:
svr.score(x_train,y_train)

0.501142374605235

In [9]:
svr.score(x_test,y_test)

0.5025588159165701
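For scikit-learn regressors such as `SVR`, `score` returns the coefficient of determination (R²), so the values above mean the default model explains roughly half of the variance in price. To make that number less abstract, here is a minimal pure-Python sketch of the R² formula; the helper name `r2_manual` and the toy values are hypothetical and not taken from the diamonds data.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical targets.
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_manual(y_true, [1.0, 2.0, 3.0, 4.0])      # every prediction exact
mean_only = r2_manual(y_true, [2.5, 2.5, 2.5, 2.5])    # always predict the mean
one_miss = r2_manual(y_true, [1.0, 2.0, 3.0, 5.0])     # one prediction off by 1

print(perfect, mean_only, one_miss)
```

A score of 1 means perfect predictions and a score of 0 means the model does no better than always predicting the mean of the targets.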

In [14]:
param_grid = {
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'degree': [3,5,7],
    'gamma' : ['scale', 'auto'],
    'epsilon':[0.01,0.1,0.15,0.2]
    
}

In [15]:
from sklearn.model_selection import RandomizedSearchCV

In [16]:
rand_search = RandomizedSearchCV(estimator=svr,param_distributions=param_grid,cv=5,n_jobs=-1,verbose=3,random_state=33)
rand_search.fit(x_train,y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


RandomizedSearchCV(cv=5, estimator=SVR(), n_jobs=-1,
                   param_distributions={'degree': [3, 5, 7],
                                        'epsilon': [0.01, 0.1, 0.15, 0.2],
                                        'gamma': ['scale', 'auto'],
                                        'kernel': ['linear', 'poly', 'rbf',
                                                   'sigmoid']},
                   random_state=33, verbose=3)
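The log line "Fitting 5 folds for each of 10 candidates, totalling 50 fits" follows from simple arithmetic. The sketch below reproduces it with the standard library only, assuming the default `n_iter=10` of `RandomizedSearchCV` (the call above does not set it) and the `cv=5` that it does set. The variable names are illustrative.

```python
from itertools import product

# Same search space as in the notebook cell above.
param_grid = {
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'degree': [3, 5, 7],
    'gamma': ['scale', 'auto'],
    'epsilon': [0.01, 0.1, 0.15, 0.2],
}

# Every combination an exhaustive grid search would have to try.
all_combinations = list(product(*param_grid.values()))
n_combinations = len(all_combinations)

# Assumption: n_iter is left at its default of 10; cv=5 comes from the call above.
n_iter = 10
cv = 5
total_fits = n_iter * cv

print(n_combinations, total_fits)
```

So the randomized search tries only 10 of the 96 possible combinations, which is why it finishes much sooner than an exhaustive search over the same grid would, at the price of possibly missing the best combination.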

In [17]:
rand_search.best_params_

{'kernel': 'linear', 'gamma': 'auto', 'epsilon': 0.15, 'degree': 7}
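When reading this result, it helps to remember that not every hyperparameter is used by every kernel. According to the scikit-learn documentation, `degree` is only used by the `'poly'` kernel and `gamma` only by `'rbf'`, `'poly'` and `'sigmoid'`. The small helper below is a hypothetical illustration (it is not part of scikit-learn) that filters a parameter dictionary down to the entries that actually affect the chosen kernel, limited to the four parameters searched here.

```python
# Which of the searched hyperparameters each kernel actually reads.
KERNEL_SPECIFIC = {
    'linear': set(),
    'poly': {'degree', 'gamma'},
    'rbf': {'gamma'},
    'sigmoid': {'gamma'},
}
# Parameters that matter regardless of the kernel.
ALWAYS_USED = {'kernel', 'epsilon'}

def effective_params(params):
    """Keep only the parameters that influence the model for params['kernel']."""
    used = ALWAYS_USED | KERNEL_SPECIFIC[params['kernel']]
    return {name: value for name, value in params.items() if name in used}

# The best parameters reported by the search above.
best_params = {'kernel': 'linear', 'gamma': 'auto', 'epsilon': 0.15, 'degree': 7}
effective = effective_params(best_params)

print(effective)
```

For the linear kernel found here, only `kernel` and `epsilon` matter; the reported `gamma` and `degree` values are just whatever happened to be sampled alongside them.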

In [18]:
best_random_grid = rand_search.best_estimator_

In [19]:
y_pred = best_random_grid.predict(x_test)

In [20]:
rand_search.score(x_train,y_train)

0.8578717721738864

In [21]:
rand_search.score(x_test,y_test)

0.853593213445033

In [22]:
import pickle 

In [23]:
filename = 'svr.pickle'
# Use a context manager so the file handle is closed after writing
with open(filename, 'wb') as f:
    pickle.dump(rand_search, f)
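To use the saved model later, the file has to be read back with `pickle.load`. The sketch below shows the save-and-load round trip with the standard library only. To keep it self-contained it uses a plain dictionary as a hypothetical stand-in for the fitted search object and writes to a temporary directory; with the real file you would open `'svr.pickle'` in `'rb'` mode in the same way.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the fitted RandomizedSearchCV object.
stand_in_model = {'kernel': 'linear', 'epsilon': 0.15}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'svr.pickle')

    # Save: binary write mode, file closed automatically by the context manager.
    with open(path, 'wb') as f:
        pickle.dump(stand_in_model, f)

    # Load: binary read mode returns an equivalent Python object.
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored)
```

Two practical cautions: only unpickle files you trust, since loading a pickle can execute arbitrary code, and load a scikit-learn model with the same library version that was used to save it.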