# Support Vector Regression (SVR) in Python using scikit-learn

In this program, we predict salaries based on position level.

**Dataset Description**

The dataset for this model consists of three columns and 10 rows. There is one feature, Position Level, in the second column. The response, Salary, is in the last column. Based on the level, we build a support vector regression model to predict the salary for a given position level.

## Importing Libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Importing Dataset

In [2]:

dataset = pd.read_csv('Position_Salaries.csv') # Import the dataset and create the DataFrame
X = dataset.iloc[:, 1:-1].values # Define the independent variable (2D array)
y = dataset.iloc[:, -1].values # Define the dependent variable (1D array)
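
The slicing above determines the array shapes that the later steps rely on. A minimal sketch, using a small made-up DataFrame in place of `Position_Salaries.csv` (the column names and values here are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-in for the CSV: three columns, the last one is the response.
demo = pd.DataFrame({
    "Position": ["Analyst", "Manager", "Director"],
    "Level": [1, 2, 3],
    "Salary": [45000, 80000, 150000],
})

X_demo = demo.iloc[:, 1:-1].values  # a slice of columns gives a 2D array
y_demo = demo.iloc[:, -1].values    # a single column gives a 1D array

print(X_demo.shape)  # (3, 1)
print(y_demo.shape)  # (3,)
```

Slicing with `1:-1` keeps `X` two-dimensional even with a single feature, which is the shape scikit-learn estimators expect for inputs.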

## Data Preprocessing

### Feature Scaling

SVR needs feature scaling because, unlike linear regression, it has no coefficients that compensate for the different magnitudes of the variables. The model relies on a kernel computed from distances between samples, so a variable with a much larger range would dominate the fit. Here the salaries are on a much larger scale than the position levels, so we scale both $X$ and $y$. When the model is an explicit linear combination of the features, as in linear regression, the coefficients absorb the scale and feature scaling is not required.

In [3]:
y = y.reshape(len(y),1) # Reshape y from a 1D array into a 2D array, as StandardScaler expects

In [4]:
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)
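
To see what the scaler does, here is a minimal sketch on made-up position levels (the values are an assumption, not taken from the dataset). It checks that the scaled data has mean 0 and standard deviation 1, and that `inverse_transform` recovers the original values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values, for illustration only. StandardScaler expects a 2D array.
levels_demo = np.arange(1, 11, dtype=float).reshape(-1, 1)  # shape (10, 1)

scaler_demo = StandardScaler()
scaled_demo = scaler_demo.fit_transform(levels_demo)

# After scaling, the column has (approximately) zero mean and unit variance.
print(scaled_demo.mean(), scaled_demo.std())

# inverse_transform maps the scaled values back to the original scale.
restored_demo = scaler_demo.inverse_transform(scaled_demo)
print(np.allclose(restored_demo, levels_demo))  # True
```

Using separate scaler objects for `X` and `y`, as above, matters because each scaler stores the mean and standard deviation of the data it was fitted on.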

## Building the Model - Support Vector Regression (SVR)

**Definition from the scikit-learn website**

A support vector machine constructs a hyper-plane or set of hyper-planes in a high or infinite dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyper-plane that has the largest distance to the nearest training data points of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier. In a linearly separable problem, the samples that lie on the margin boundaries are called “support vectors”.
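
For regression, scikit-learn's `SVR` adapts this idea: instead of a margin between classes, it fits a function with a tube of half-width `epsilon` around it, and the training samples on or outside the tube act as support vectors. A minimal sketch on synthetic data (the sine curve and the `epsilon` values are assumptions chosen only for illustration) that counts the support vectors for a narrow and a wide tube:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data, for illustration only: a noise-free sine curve.
x_demo = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)
y_demo = np.sin(x_demo).ravel()

narrow = SVR(kernel="rbf", epsilon=0.01).fit(x_demo, y_demo)
wide = SVR(kernel="rbf", epsilon=0.5).fit(x_demo, y_demo)

# support_ holds the indices of the training samples kept as support vectors.
print(len(narrow.support_), len(wide.support_))
```

A wider tube tolerates larger errors, so fewer samples typically end up as support vectors.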

In [None]:

# Note: y was reshaped into a 2D array before feature scaling, because
# StandardScaler only accepts 2D arrays.

# Training the SVR model
# The dataset has only ten rows, so we do not split it into training and test
# sets. Instead, we train the model on the whole dataset. We import the SVR
# class from the scikit-learn module sklearn.svm.
# SVR takes several arguments. The most important one is the kernel, which
# can be linear, polynomial, sigmoid (hyperbolic tangent), or Gaussian radial
# basis function (RBF). The RBF kernel is a common default for relationships
# that may be non-linear, so we use it here.
# The degree argument only applies to the polynomial kernel, so it is omitted.
# fit() expects a 1D target, hence y.ravel().

from sklearn.svm import SVR
regressor = SVR(kernel='rbf')
regressor.fit(X, y.ravel())

# Now we can make predictions; the model is trained on the whole dataset.
# Heads-up! We cannot predict directly, because the model was trained on
# scaled data. The input must be scaled with sc_X, and the output must be
# transformed back to the original scale with sc_y.

# A single prediction.
# predict() returns a 1D array, so we reshape it to 2D for inverse_transform.
print(sc_y.inverse_transform(regressor.predict(sc_X.transform([[10]])).reshape(-1, 1)))
# Salary is the dependent variable being predicted, so this single prediction
# only needs the inverse transformation of y. To visualize the results,
# however, you must apply the inverse transformation to all variables.

# Visualizing the results
X_os = sc_X.inverse_transform(X)  # X in the original scale
y_os = sc_y.inverse_transform(y)  # y in the original scale

plt.title('Truth or Bluff (Support Vector Regression)')
plt.scatter(X_os, y_os, color = 'red')
# Plot the model predictions, transformed back to the original scale
plt.plot(X_os, sc_y.inverse_transform(regressor.predict(X).reshape(-1, 1)), color = 'blue')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

# Visualizing the SVR results (for higher resolution and smoother curve)

X_grid = np.arange(X_os.min(), X_os.max(), 0.001)  # fine grid in the original scale
X_grid = X_grid.reshape((len(X_grid), 1))
X_grid_scaled = sc_X.transform(X_grid)  # scale the grid before predicting

plt.scatter(X_os, y_os, color = 'red')
plt.plot(X_grid, sc_y.inverse_transform(regressor.predict(X_grid_scaled).reshape(-1, 1)), color = 'blue')
plt.title('Truth or Bluff (SVR)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

# =============================================================================
# This program is a simple example that demonstrates how to use an SVR model.
# The model differs from linear regression, and the theory behind the method
# is very interesting and helps in understanding it better. I suggest a quick
# read of it. You should also try to fit a better model using the arguments of
# SVR(). Enjoy machine learning.
# =============================================================================
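
As a starting point for experimenting with the arguments of `SVR()`, here is a minimal sketch that tries a few values of `C`. It is not the approach used above: it swaps the manual scaling for scikit-learn's `make_pipeline` and `TransformedTargetRegressor`, which scale `X` and `y` and invert the scaling of the predictions automatically. The levels and salaries are made up for illustration and are not the values in `Position_Salaries.csv`.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Made-up levels and salaries, for illustration only.
levels = np.arange(1, 11, dtype=float).reshape(-1, 1)
salaries = 40000.0 * 1.4 ** levels.ravel()

for C in (1.0, 10.0, 100.0):
    model = TransformedTargetRegressor(
        # The pipeline scales X before it reaches the SVR.
        regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C)),
        # The transformer scales y for training and un-scales the predictions.
        transformer=StandardScaler(),
    )
    model.fit(levels, salaries)
    prediction = model.predict([[6.5]])
    print(f"C={C}: predicted salary at level 6.5 = {prediction[0]:.0f}")
```

Because the scaling lives inside the model object, predictions come back in the original salary scale without any explicit `inverse_transform` call. The same loop can be repeated for `epsilon` or `gamma`.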