# <font color=#6942f5><center>Support Vector Machines</center></font>

The support vector machine (SVM) is a very popular machine learning algorithm that can be used effectively on classification problems as well as regression ones. The basic concept is to divide binary-labeled data into two separate regions; the side on which a new data point lies decides which class it belongs to. There is more than one way to do the division, which we will look into in a moment. But whichever we use, we are still trying to find the best-fitting hyperplane (or multiple hyperplanes) that classifies our data most effectively.

And what are those support vectors? When the algorithm fits the ideal hyperplane to the training data, the support vectors are the data points closest to the hyperplane. The distance between the hyperplane and the closest data point on each side is called the __margin__, so we want to fit the hyperplane with the largest margin possible. This works ideally on cleanly separated data, which is extremely rare in real life. In most cases you have to reckon with outliers and some degree of misclassification, so we would like to find the sweet spot in how sensitive the model is to our dataset. For example, a __maximal margin classifier__ would perform badly on a dataset with a higher number of outliers. That is true of models in general, of course, but because of how it works, it applies twice as much here. With this in mind, another very important part is the kernel function used.
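To make support vectors and the margin a bit more tangible, here is a minimal sketch on a tiny made-up dataset (the points in `X_toy` are my own invention, not part of the glass data used later). It assumes a linear kernel and a very large `C`, which roughly approximates a maximal margin classifier:

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two linearly separable clusters in 2D
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                  [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard (maximal) margin classifier
toy_model = svm.SVC(kernel="linear", C=1e6)
toy_model.fit(X_toy, y_toy)

# The support vectors are the training points closest to the hyperplane
print("Support vectors:\n", toy_model.support_vectors_)

# For a linear kernel the hyperplane is w.x + b = 0, and the distance
# from the hyperplane to the closest point on each side is 1 / ||w||
w = toy_model.coef_[0]
margin = 1.0 / np.linalg.norm(w)
print("Margin:", margin)
```

Only the few points nearest to the boundary end up in `support_vectors_`; the rest of the training data could be removed without moving the hyperplane.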


### <font color=#6942f5>Kernel functions</font>

Kernel functions decide how our data is processed. A kernel function translates the dataset into a form that will yield the result we seek. Such mathematical transformations are costly, especially when working with bigger datasets and when moving into more dimensions. So these transformations are done through the __kernel trick__, which lets us work in the original feature space and relieves us of the calculations in the higher dimensions. Here I will introduce only a few frequently used kernel functions, but there are more and you can even create one yourself:

* __Linear Kernel__ - the basic option, for datasets that are linearly separable
* __Polynomial Kernel__ - adds further transformation options on top of the linear kernel function
* __Radial Basis Function Kernel__ - general-purpose Gaussian kernel, a good choice for data we have no prior knowledge about
* __Sigmoid Kernel__ - this kernel is described as equivalent to a two-layer perceptron neural network
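The kernel trick can be checked by hand on a small case. The sketch below uses a simplified degree-2 polynomial kernel (no gamma or constant term, which is my own simplification) and a hand-written feature map `explicit_map`; both function names are just for illustration. The kernel computed in the original 2D space should give the same number as a dot product in the mapped 3D space:

```python
import numpy as np

def poly_kernel(a, b):
    """Degree-2 polynomial kernel computed in the original 2D space."""
    return np.dot(a, b) ** 2

def explicit_map(a):
    """Explicit feature map into 3D that corresponds to poly_kernel."""
    return np.array([a[0] ** 2, np.sqrt(2) * a[0] * a[1], a[1] ** 2])

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

# Kernel trick: one dot product in 2D, then squared
kernel_value = poly_kernel(a, b)

# The long way: transform both points to 3D first, then take the dot product
explicit_value = np.dot(explicit_map(a), explicit_map(b))

print(kernel_value, explicit_value)
```

Both values agree, yet the kernel version never builds the higher-dimensional vectors. For kernels like RBF the mapped space is infinite-dimensional, so the shortcut is the only practical route.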


### <font color=#6942f5>Parameters</font>

There are many parameters, some of them specific to an individual kernel. Here we will look into 2 basic ones, and as always, I would suggest looking into the other parameters when you start using a kernel you have never used before.

* __Regularization / C (in Python)__ - controls the sensitivity to misclassification.
 * A lower C yields higher generalization - it allows more misclassification in the training dataset and creates a smoother division
 * A higher C tries to minimize misclassification - visually it creates a more complex division, trying to separate almost every data point (this can cause overfitting)


* __Gamma__ - defines how far the influence of a single data point reaches, and so how many data points are considered when learning the model
 * A lower gamma allows data points further from the hyperplane to participate in the learning of the model
 * A higher gamma allows only the closest data points to participate in the learning of the model
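A quick way to get a feel for both parameters is to fit the same kernel with a few extreme values and compare. This is only a rough sketch: it assumes a synthetic dataset from `make_moons` (not the glass data), an RBF kernel, and a helper `fit_and_report` that I made up for the demo. Note that it reports accuracy on the training data itself, so a higher number here hints at a tighter fit, not necessarily a better model:

```python
from sklearn import svm
from sklearn.datasets import make_moons

# Hypothetical toy data: two noisy, interleaving half circles
X_demo, y_demo = make_moons(n_samples=200, noise=0.3, random_state=0)

def fit_and_report(C, gamma):
    """Fit an RBF SVC and return (training accuracy, number of support vectors)."""
    model = svm.SVC(kernel="rbf", C=C, gamma=gamma)
    model.fit(X_demo, y_demo)
    return model.score(X_demo, y_demo), int(model.n_support_.sum())

results = {}
for C in [0.1, 1, 100]:
    for gamma in [0.01, 1, 100]:
        results[(C, gamma)] = fit_and_report(C, gamma)
        accuracy, n_sv = results[(C, gamma)]
        print(f"C={C}\t gamma={gamma}\t train accuracy: {accuracy:.3f}\t support vectors: {n_sv}")
```

With a very high gamma the model tends to hug individual training points, which is the overfitting risk described above.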

##### <font color=#6942f5><center>let's dive into code - classifier</center></font>

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm

In [2]:
datasets_path = ".jupyter\\datasets\\raw\\"
glass_df = pd.read_csv(datasets_path + "glass.csv")

glass_df.head()

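| | RI | Na | Mg | Al | Si | K | Ca | Ba | Fe | Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.52101 | 13.64 | 4.49 | 1.1 | 71.78 | 0.06 | 8.75 | 0.0 | 0.0 | 1 |
| 1 | 1.51761 | 13.89 | 3.6 | 1.36 | 72.73 | 0.48 | 7.83 | 0.0 | 0.0 | 1 |
| 2 | 1.51618 | 13.53 | 3.55 | 1.54 | 72.99 | 0.39 | 7.78 | 0.0 | 0.0 | 1 |
| 3 | 1.51766 | 13.21 | 3.69 | 1.29 | 72.61 | 0.57 | 8.22 | 0.0 | 0.0 | 1 |
| 4 | 1.51742 | 13.27 | 3.62 | 1.24 | 73.08 | 0.55 | 8.07 | 0.0 | 0.0 | 1 |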


In [3]:
glass_df["Type"].unique()

array([1, 2, 3, 5, 6, 7], dtype=int64)

In [4]:
glass_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   RI      214 non-null    float64
 1   Na      214 non-null    float64
 2   Mg      214 non-null    float64
 3   Al      214 non-null    float64
 4   Si      214 non-null    float64
 5   K       214 non-null    float64
 6   Ca      214 non-null    float64
 7   Ba      214 non-null    float64
 8   Fe      214 non-null    float64
 9   Type    214 non-null    int64  
dtypes: float64(9), int64(1)
memory usage: 16.8 KB


Here we have a dataset with 6 distinct glass types (we just want to build an SVM model, so we don't need to know which number represents which type; plain numbers will do in our case) and the composition of each sample. We will test each kernel function with small tweaks to the parameters we discussed above and see what we can do here. This is a very small dataset, on which our SVM model could perform well.

In [5]:
X = glass_df[["RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe"]] # I prefer explicit column selection over slicing, to avoid mistakes
y = glass_df["Type"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
print("X_train size : " + str(X_train.shape) + ", X_test size: " + str(X_test.shape))

X_train size : (171, 9), X_test size: (43, 9)


Alright, we have prepared our training and testing data. Let's create a loop in which we test kernels and parameters. It will produce a lengthy list, but this kind of testing is a good start on our way to understanding how the parameters impact the model's overall performance.

In [6]:
# Loop over the kernels and parameters mentioned above, fitting and scoring an SVM model for each combination
kernels = ["linear", "poly", "rbf", "sigmoid"]
param = [0.1, 0.5, 1, 5] # I will use the same scale for C and for gamma

model = svm.SVC()
for kernel in kernels:
    model.kernel = kernel
    print("\nTesting SVC with " + model.kernel + " kernel:")
    for C in param:
        model.C = C
        print("   Parameter C set to " + str(model.C))
        for gamma in param:
            model.gamma = gamma  # note: gamma is ignored by the linear kernel
            model.fit(X_train, y_train)
            score = model.score(X_test, y_test)
            print(f"      gamma set to:\t {model.gamma}\t score: {score:10.3f}")


Testing SVC with linear kernel:
   Parameter C set to 0.1
      gamma set to:	 0.1	 score:      0.488
      gamma set to:	 0.5	 score:      0.488
      gamma set to:	 1	 score:      0.488
      gamma set to:	 5	 score:      0.488
   Parameter C set to 0.5
      gamma set to:	 0.1	 score:      0.674
      gamma set to:	 0.5	 score:      0.674
      gamma set to:	 1	 score:      0.674
      gamma set to:	 5	 score:      0.674
   Parameter C set to 1
      gamma set to:	 0.1	 score:      0.628
      gamma set to:	 0.5	 score:      0.628
      gamma set to:	 1	 score:      0.628
      gamma set to:	 5	 score:      0.628
   Parameter C set to 5
      gamma set to:	 0.1	 score:      0.651
      gamma set to:	 0.5	 score:      0.651
      gamma set to:	 1	 score:      0.651
      gamma set to:	 5	 score:      0.651

Testing SVC with poly kernel:
   Parameter C set to 0.1
      gamma set to:	 0.1	 score:      0.628
      gamma set to:	 0.5	 score:      0.674
      gamma set to:	 1	 score:    

---
If we ran this notebook multiple times, we would get a variety of results depending on the train/test split, especially on small datasets like this one. Even though this is a valid way to test your model, some data is always lost to training with this way of testing. That's where cross validation can help.
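You can see the effect of the split without rerunning the whole notebook by repeating the split with different seeds. The sketch below is just an illustration under a few assumptions of mine: it uses scikit-learn's built-in wine dataset as a stand-in (since it needs no CSV file), an RBF kernel with default settings, and 10 arbitrary seeds:

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Stand-in dataset (an assumption for this demo, not the glass data)
X_demo, y_demo = load_wine(return_X_y=True)

scores = []
for seed in range(10):
    # A different random_state gives a different train/test split
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=0.2, random_state=seed
    )
    model = svm.SVC(kernel="rbf", C=1)
    model.fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

print(f"min: {min(scores):.3f}, max: {max(scores):.3f}, std: {np.std(scores):.3f}")
```

The gap between the minimum and maximum score gives a rough idea of how much a single split can mislead you.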

##### <font color=#6942f5>cross validation</font>

Cross validation is similar to the train_test_split function, but it divides the dataset into n parts and then loops over them, training on the rest and testing on each individual part in turn. It returns the scores as a list, and we can take their mean to get the final (average) score. This way we are using every data point in the DataFrame. As a drawback, it is much more computationally demanding than the split function used above.
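To show what happens under the hood, here is a hand-written version of that loop next to the library call. It is a sketch under some assumptions: the built-in iris dataset stands in for our data, and the folds are built with `StratifiedKFold`, which is what `cross_val_score` uses for a classifier when `cv` is an integer:

```python
import numpy as np
from sklearn import svm
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in dataset (an assumption for this demo, not the glass data)
X_demo, y_demo = load_iris(return_X_y=True)
model = svm.SVC(kernel="rbf", C=1, gamma=0.1)

# Manual loop: each of the 4 parts is used once as the testing data
folds = StratifiedKFold(n_splits=4)
manual_scores = []
for train_idx, test_idx in folds.split(X_demo, y_demo):
    fold_model = clone(model)  # fresh, unfitted copy for every fold
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(fold_model.score(X_demo[test_idx], y_demo[test_idx]))

# The same thing in one call
library_scores = cross_val_score(model, X_demo, y_demo, cv=4)

print("manual: ", np.round(manual_scores, 3), "mean:", round(float(np.mean(manual_scores)), 3))
print("library:", np.round(library_scores, 3), "mean:", round(float(library_scores.mean()), 3))
```

Since the folds are not shuffled, both versions produce the same scores on every run.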

In [7]:
from sklearn.model_selection import cross_val_score

Now, let's create a function that runs the cross validation on a model with a specified kernel function and lists of parameters to loop over, then we will see how different our scores will be. I will pick the parameters for each kernel based on the results above, to see if we can get a higher score.

In [8]:
def test_model(model, params_C, params_gamma, X, y):
    """Print the mean 4-fold cross validation score for every C/gamma combination."""
    print("Testing SVC with " + model.kernel + " kernel:")
    for C in params_C:
        model.C = C
        print("   Parameter C set to " + str(model.C))
        for gamma in params_gamma:
            model.gamma = gamma
            score = cross_val_score(model, X, y, cv=4)
            print(f"      gamma set to:\t {model.gamma}\t score: {score.mean():10.3f}")
    print()

# linear kernel
param_lin_C = [0.5, 1, 3, 5, 10]
param_lin_gamma = [0.5, 1, 5, 15]
lin_model = svm.SVC(kernel="linear")
test_model(lin_model, param_lin_C, param_lin_gamma, X, y)

# polynomial kernel
param_poly_C = [0.01, 0.1, 0.3, 0.7, 1, 10, 20]
param_poly_gamma = [0.1, 0.3, 0.7, 1, 3]
poly_model = svm.SVC(kernel="poly")
test_model(poly_model, param_poly_C, param_poly_gamma, X, y)

# rbf kernel
param_rbf_C = [0.5, 1, 5, 10, 20, 40]
param_rbf_gamma = [0.05, 0.2, 0.5, 1, 3]
rbf_model = svm.SVC(kernel="rbf")
test_model(rbf_model, param_rbf_C, param_rbf_gamma, X, y)


Testing SVC with linear kernel:
   Parameter C set to 0.5
      gamma set to:	 0.5	 score:      0.585
      gamma set to:	 1	 score:      0.585
      gamma set to:	 5	 score:      0.585
      gamma set to:	 15	 score:      0.585
   Parameter C set to 1
      gamma set to:	 0.5	 score:      0.589
      gamma set to:	 1	 score:      0.589
      gamma set to:	 5	 score:      0.589
      gamma set to:	 15	 score:      0.589
   Parameter C set to 3
      gamma set to:	 0.5	 score:      0.622
      gamma set to:	 1	 score:      0.622
      gamma set to:	 5	 score:      0.622
      gamma set to:	 15	 score:      0.622
   Parameter C set to 5
      gamma set to:	 0.5	 score:      0.608
      gamma set to:	 1	 score:      0.608
      gamma set to:	 5	 score:      0.608
      gamma set to:	 15	 score:      0.608
   Parameter C set to 10
      gamma set to:	 0.5	 score:      0.603
      gamma set to:	 1	 score:      0.603
      gamma set to:	 5	 score:      0.603
      gamma set to:	 15	 score:  

Ok, just by glancing through both cells you can see that cross validation is smoother. I tested my patience and ran the notebook a few times just to confirm it for myself: the standard split gave varying results, sometimes with pretty big differences, but cross validation returned the same scores every time. It's obvious given the math behind it, but you know that feeling: you know it, but you still have to test it anyway! This way you can test your way to the best-fitting model you can find. You are never experienced enough to just rely on the first model you create and move on.
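If the nested loops get tedious, scikit-learn can run the same kind of search for you with `GridSearchCV`. This is not what the cells above do, just a possible shortcut. The sketch assumes the built-in iris dataset as a stand-in and a small parameter grid that I picked arbitrarily:

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV

# Stand-in dataset (an assumption for this demo, not the glass data)
X_demo, y_demo = load_iris(return_X_y=True)

# One dictionary per kernel, so gamma is only searched where it matters
param_grid = [
    {"kernel": ["linear"], "C": [0.5, 1, 5]},
    {"kernel": ["rbf"], "C": [0.5, 1, 5], "gamma": [0.05, 0.2, 1]},
]

# Every combination is scored with 4-fold cross validation
search = GridSearchCV(svm.SVC(), param_grid, cv=4)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best mean score: {search.best_score_:.3f}")
```

The full table of scores is available in `search.cv_results_` if you want to inspect every combination the way the printed lists above do.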

### <font color=#6942f5><center>Final words</center></font>
Already? Yeah, already, just 8 code cells, that's my new record! ... I just wanted to point out the basic concept behind the SVM; we used the classifier version and tried 2 ways to validate our model. When I find the time, I will come back and add some more models, other than the classifier. But for now that's all and, as always, big thanks to anybody who took the time to read through this notebook! Have a great day!