### SVM for Non-Linear Data Sets

An example of non-linear data is:

![SVM's for Non-Linear Data Sets](./img/non_linear_svm.png)

In this case we cannot find a straight line that separates the apples from the lemons. So how can we solve this problem? We will use the kernel trick!

The basic idea is this: when a data set is not separable in its current dimensions, we add another dimension, and in the higher-dimensional space the data may become separable.

The example above is in 2D and the classes cannot be separated there. In 3D, however, there might be a gap between the apples and the lemons: a level difference, with one fruit on level one and the other on level two. In that case we can easily draw a separating hyperplane (in 3D a hyperplane is a plane) between levels 1 and 2.

Let's assume that we add a third dimension, x3, and that each point's value in this new dimension is given by the formula x3 = x1² + x2².

If we plot the surface defined by x3 = x1² + x2² (a paraboloid, not a plane), we get something like this:

![3d_SVM](./img/3d_svm.png)

Now we have to map the apples and lemons (which are just simple points) to this new space. 

What did we do? We used a transformation that assigns each point a level based on its distance from the origin.

Points at the origin are on the lowest level. As we move away from the origin we climb the hill (moving from the center of the surface towards its edges), so the level of the points gets higher.
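The lift described above can be sketched in a few lines of Python. This is a minimal sketch, assuming scikit-learn's `make_circles` toy data as a stand-in for the apples and lemons (the data set, sample count, and noise level are illustrative choices, not taken from the figures). A linear SVM struggles on the raw 2D points but separates them once x3 = x1² + x2² is appended as a third coordinate:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Toy stand-in for the figure: two concentric rings (inner ring ~ lemons, outer ring ~ apples).
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM in the original 2D space cannot separate the rings.
flat_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# Add a third coordinate x3 = x1^2 + x2^2 (the squared distance from the origin).
x3 = (X ** 2).sum(axis=1)
X_lifted = np.column_stack([X, x3])

# In 3D the two rings sit on different "levels", so a plane separates them.
lifted_acc = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

print(f"linear SVM on 2D points: {flat_acc:.2f}")
print(f"linear SVM on 3D points: {lifted_acc:.2f}")
```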

If we place the origin at the lemon in the center, we get something like this:

![Transformed SVM](./img/transformed_svm.png)

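Now we can easily separate the two classes. In practice, the SVM never needs to build the transformed points explicitly: a kernel function computes the inner products between points in the higher-dimensional space directly from the original coordinates, and this shortcut is what the kernel trick refers to.
Popular kernels include the polynomial kernel, the Gaussian kernel (also called the radial basis function, or RBF, kernel), the Laplace RBF kernel, the sigmoid kernel, and the ANOVA RBF kernel.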
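To see why the SVM never needs the transformed points, here is a small NumPy check. It assumes, as an example, the homogeneous degree-2 polynomial kernel k(a, b) = (a · b)², whose explicit feature map is φ(x1, x2) = (x1², √2·x1·x2, x2²). The kernel returns the same inner product without ever computing φ:

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D point (x1, x2) -> 3D."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2, computed in 2D."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = np.dot(phi(a), phi(b))  # inner product after mapping to 3D
implicit = poly2_kernel(a, b)      # same value, never leaving 2D
print(explicit, implicit)
```

Because x1² + x2² is the sum of the first and third coordinates of φ, a linear boundary in this feature space can reproduce the level-based separation described above.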

Another example would be:

![](./img/1d_svm.png)

After applying the kernel transformation we get:

![](./img/transformed_1d_kernel.png)

After the transformation, we can separate the two classes with a single straight line.
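A sketch of the one-dimensional case, assuming (hypothetically) that the figure shows one class in the middle of the line and the other class on both sides; the points below are invented for illustration. Appending x² as a second coordinate makes the classes linearly separable, and a degree-2 polynomial kernel reaches the same result implicitly on the raw 1D values:

```python
import numpy as np
from sklearn.svm import SVC

# Invented 1D data: class 1 in the middle of the line, class 0 on both sides.
x = np.array([-4.0, -3.5, -3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0])

# Explicit lift: x -> (x, x^2). The outer points end up "higher" than the middle ones.
X_lifted = np.column_stack([x, x ** 2])
explicit = SVC(kernel="linear", C=100).fit(X_lifted, y)

# Implicit lift: a degree-2 polynomial kernel applied to the raw 1D values.
X_raw = x.reshape(-1, 1)
implicit = SVC(kernel="poly", degree=2, coef0=1, C=100).fit(X_raw, y)

print(explicit.score(X_lifted, y), implicit.score(X_raw, y))
```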

In real-life applications the decision boundary won't be a simple straight line: there will be many curves and many dimensions. In some cases no two hyperplanes separate the data without leaving points between them, so we need a trade-off: some tolerance for outliers.

Fortunately, the SVM algorithm has a regularization parameter that configures this trade-off and controls how much it tolerates outliers.

#### Regularization

The regularization parameter (called `C` in scikit-learn) tells the SVM optimization how much you want to avoid misclassifying each training example.

If `C` is high, the optimization will choose a smaller-margin hyperplane, so the misclassification rate on the training data will be lower.

On the other hand, if `C` is low, the margin will be larger, even if some training examples end up misclassified. This is shown in the following two diagrams:

![](./img/reg_svm.png)

As you can see in the image, when `C` is low the margin is wider (so implicitly the boundary has fewer curves and doesn't strictly follow the data points), even though two apples are classified as lemons. When `C` is high, the boundary is full of curves and all the training data is classified correctly.


**Note:** even when all the training data is classified correctly, increasing `C` does not always increase accuracy on unseen data, because the model can overfit.
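A minimal experiment with this trade-off, assuming scikit-learn's `SVC` with an RBF kernel and the `make_moons` toy data (the noise level, the split, and the three `C` values are illustrative choices). It prints training accuracy, test accuracy, and the number of support vectors for each `C`; the exact numbers depend on the data, so none are claimed here:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy, overlapping toy data standing in for the apples and lemons.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for C in (0.01, 1, 1000):
    clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    results[C] = (
        clf.score(X_train, y_train),  # accuracy on the training data
        clf.score(X_test, y_test),    # accuracy on unseen data
        clf.n_support_.sum(),         # total number of support vectors
    )
    train_acc, test_acc, n_sv = results[C]
    print(f"C={C:>7}: train={train_acc:.2f} test={test_acc:.2f} support vectors={n_sv}")
```

Typically a larger `C` raises training accuracy and shrinks the set of support vectors, while test accuracy need not improve, which is the overfitting risk from the note above.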

#### Examples of SVM kernels

- **Polynomial kernel**: popular in image processing. Its equation is:

  ![](./img/polynomial-kernel.png)

  where d is the degree of the polynomial.

- **Gaussian kernel**: a general-purpose kernel, used when there is no prior knowledge about the data. Its equation is:

  ![](./img/gaussian-kernel.png)

- **Sigmoid kernel**: can be used as a proxy for neural networks. Its equation is:

  ![](./img/sigmoid-kernel.png)
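The equations above are only available as images, so the following sketch assumes scikit-learn's parameterization of the same three kernels: polynomial (γ⟨x, y⟩ + r)^d, Gaussian/RBF exp(-γ‖x - y‖²) with γ = 1/(2σ²), and sigmoid tanh(γ⟨x, y⟩ + r). It checks hand-written NumPy versions against `sklearn.metrics.pairwise`:

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # 4 points with 3 features
Y = rng.normal(size=(5, 3))  # 5 points with 3 features
gamma, coef0, degree = 0.5, 1.0, 3

# Polynomial: (gamma * <x, y> + coef0) ** degree
poly_manual = (gamma * X @ Y.T + coef0) ** degree

# Gaussian / RBF: exp(-gamma * ||x - y||^2), where gamma = 1 / (2 * sigma^2)
sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
rbf_manual = np.exp(-gamma * sq_dists)

# Sigmoid: tanh(gamma * <x, y> + coef0)
sigmoid_manual = np.tanh(gamma * X @ Y.T + coef0)

poly_lib = polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0)
rbf_lib = rbf_kernel(X, Y, gamma=gamma)
sigmoid_lib = sigmoid_kernel(X, Y, gamma=gamma, coef0=coef0)

print(np.allclose(poly_manual, poly_lib),
      np.allclose(rbf_manual, rbf_lib),
      np.allclose(sigmoid_manual, sigmoid_lib))
```

In `SVC` and `SVR` the same kernels are selected with `kernel="poly"`, `kernel="rbf"`, and `kernel="sigmoid"`, with `degree`, `gamma`, and `coef0` as the corresponding parameters.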

#### Exercise:

- Load the wine-quality dataset.
- Split the data into training and test sets with an 80/20 ratio.
- Build a linear regression model and a support vector machine to predict the dependent column (`quality`).

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR
```

```python
df = pd.read_csv('./data/winequality-red.csv')
df.head()
```

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


```python
df.describe()  # get an overview of the data
```

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 |
| mean | 8.319637 | 0.527821 | 0.270976 | 2.538806 | 0.087467 | 15.874922 | 46.467792 | 0.996747 | 3.311113 | 0.658149 | 10.422983 | 5.636023 |
| std | 1.741096 | 0.17906 | 0.194801 | 1.409928 | 0.047065 | 10.460157 | 32.895324 | 0.001887 | 0.154386 | 0.169507 | 1.065668 | 0.807569 |
| min | 4.6 | 0.12 | 0.0 | 0.9 | 0.012 | 1.0 | 6.0 | 0.99007 | 2.74 | 0.33 | 8.4 | 3.0 |
| 25% | 7.1 | 0.39 | 0.09 | 1.9 | 0.07 | 7.0 | 22.0 | 0.9956 | 3.21 | 0.55 | 9.5 | 5.0 |
| 50% | 7.9 | 0.52 | 0.26 | 2.2 | 0.079 | 14.0 | 38.0 | 0.99675 | 3.31 | 0.62 | 10.2 | 6.0 |
| 75% | 9.2 | 0.64 | 0.42 | 2.6 | 0.09 | 21.0 | 62.0 | 0.997835 | 3.4 | 0.73 | 11.1 | 6.0 |
| max | 15.9 | 1.58 | 1.0 | 15.5 | 0.611 | 72.0 | 289.0 | 1.00369 | 4.01 | 2.0 | 14.9 | 8.0 |


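```python
# quality is the target; every other column is a feature
y = df.quality
x = df.drop('quality', axis=1)

# 80/20 train/test split with a fixed seed for reproducibility
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
```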
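The 80/20 ratio cannot be exact for 1599 rows (the count reported by `df.describe()`). A quick check of how `train_test_split` rounds, using row indices in place of the actual data:

```python
import math
from sklearn.model_selection import train_test_split

n_rows = 1599  # row count reported by df.describe()
train_idx, test_idx = train_test_split(list(range(n_rows)), test_size=0.2, random_state=42)

# A float test_size is rounded up for the test set: ceil(0.2 * 1599) = ceil(319.8) = 320
expected_test = math.ceil(0.2 * n_rows)
print(len(train_idx), len(test_idx), expected_test)
```

This matches the `Length: 1279` reported for `y_train_s` further down.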

```python
# build a linear regression model
lr = LinearRegression()
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)
mean_squared_error(y_test, y_pred)
```

```
0.39002514396395416
```

```python
# R^2 (coefficient of determination) on the test set
lr.score(x_test, y_test)
```

```
0.4031803412796231
```
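`LinearRegression.score` returns the coefficient of determination R² = 1 - SS_res / SS_tot, which equals 1 - MSE / Var(y) when the variance uses the population formula. The sketch below checks this on invented synthetic data; it does not reproduce the wine results. Applied to the two numbers above, the identity implies a test-set variance of about 0.390 / (1 - 0.403) ≈ 0.65:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic regression data: y is a noisy linear function of 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = ((y - y_pred) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r2_manual = 1 - ss_res / ss_tot

# The same quantity via MSE and the population variance of y (np.var uses ddof=0)
r2_from_mse = 1 - mean_squared_error(y, y_pred) / np.var(y)

print(r2_manual, r2_from_mse, model.score(X, y), r2_score(y, y_pred))
```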

```python
# support vector regression with the default RBF kernel, on unscaled features
svr = SVR()
svr.fit(x_train, y_train)
```

```
SVR()
```

```python
y_pred_svr = svr.predict(x_test)
mean_squared_error(y_test, y_pred_svr)
```

```
0.532501080260072
```
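The SVR above runs on unscaled features, and RBF-kernel SVMs are sensitive to feature scale (here `total sulfur dioxide` reaches 289 while `density` stays near 1). The `Pipeline` class imported at the top, but not used, can chain a `StandardScaler` with the `SVR` so that the scaler is fit on the training split only. A minimal sketch: the helper name `scaled_svr_mse` is hypothetical, and the demo uses synthetic data, so no result for the wine data is claimed:

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def scaled_svr_mse(x_train, y_train, x_test, y_test):
    """Hypothetical helper: standardize features, fit an RBF SVR, return the test MSE."""
    model = Pipeline([
        ("scaler", StandardScaler()),  # mean/std are learned from x_train only
        ("svr", SVR()),                # default RBF kernel, as in the cell above
    ])
    model.fit(x_train, y_train)
    return mean_squared_error(y_test, model.predict(x_test))

# Synthetic demo data; on the wine data you would pass x_train, y_train, x_test, y_test.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
mse = scaled_svr_mse(X_tr, y_tr, X_te, y_te)
print(mse)
```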

```python
def prepared_data(df):
    # features: every column except the last; target: the last column (quality)
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]
    return train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
df.iloc[:, -1]
```

```
0       5
1       5
2       5
3       6
4       5
       ..
1594    5
1595    6
1596    6
1597    5
1598    6
Name: quality, Length: 1599, dtype: int64
```

```python
x_train, x_test, y_train, y_test = prepared_data(df)
```

```python
# quality is an integer score, so it can also be treated as a class label
lc = LogisticRegression()
lc.fit(x_train, y_train)
y_pred = lc.predict(x_test)
# MSE of the predicted classes, comparable with the regression results above
mean_squared_error(y_test, y_pred)
```

```
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

0.553125
```


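```python
def prepared_data_std(df):
    # standardize the features (zero mean, unit variance); note that fitting the
    # scaler on the full data set before splitting leaks test-set statistics
    scaler = StandardScaler()
    scaler.fit(df.iloc[:, :-1])
    X = pd.DataFrame(scaler.transform(df.iloc[:, :-1]),
                     columns=df.columns[:-1], index=df.index)
    y = df.iloc[:, -1]
    return train_test_split(X, y, test_size=0.2, random_state=42)
```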

```python
x_train_s, x_test_s, y_train_s, y_test_s = prepared_data_std(df)
```

```python
y_train_s
```

```
493     6
354     6
342     6
834     5
705     5
       ..
1130    6
1294    6
860     5
1459    7
1126    6
Name: quality, Length: 1279, dtype: int64
```

```python
# max_iter=10 is below the default of 100, so the solver stops before converging;
# a larger value (e.g. max_iter=1000) should let it converge on the scaled data
lc = LogisticRegression(max_iter=10)
lc.fit(x_train_s, y_train_s)
y_pred_s = lc.predict(x_test_s)
mean_squared_error(y_test_s, y_pred_s)
```

```
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

0.490625
```
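`prepared_data_std` fits the scaler on the whole data set before splitting, so the test rows influence the means and standard deviations used to scale the training rows. A leakage-free variant splits first and fits the scaler on the training features only. This is a sketch; the function name `prepared_data_std_no_leak` and the demo DataFrame are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def prepared_data_std_no_leak(df, test_size=0.2, random_state=42):
    """Hypothetical variant of prepared_data_std: split first, then scale."""
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scaler = StandardScaler().fit(X_train)  # statistics come from training rows only
    X_train_s = pd.DataFrame(scaler.transform(X_train), columns=X.columns, index=X_train.index)
    X_test_s = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)
    return X_train_s, X_test_s, y_train, y_test

# Invented demo frame: two features on very different scales plus a "quality" target.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "a": rng.normal(10, 2, 100),
    "b": rng.normal(0, 50, 100),
    "quality": rng.integers(3, 9, 100),
})
Xtr, Xte, ytr, yte = prepared_data_std_no_leak(demo)
print(Xtr.mean().round(6).tolist(), len(Xtr), len(Xte))
```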