# Info
Name: Seyed Ali Mirferdos

Student ID: 99201465

# 0. Importing the necessary modules

In [33]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.svm import SVC

# 1. Creating the x and y datasets

In [34]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
colnames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
iris_data = pd.read_csv(url, names=colnames)

In [35]:
X = iris_data.drop(['Class'], axis=1)
y = iris_data['Class']

# 2. Splitting the data

In [36]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=20)

# 3, 4, 5. Creating the models

In [37]:
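def create_svc_model(X_train, X_test, y_train, y_test, kernel, **kwargs):
  """Fit an SVC with the given kernel and print its test-set metrics."""
  # Extra keyword arguments (e.g. degree) are passed straight to SVC.
  clf = SVC(kernel=kernel, **kwargs)
  clf.fit(X_train, y_train)

  y_pred = clf.predict(X_test)

  print(f'{kernel}:')
  print('Classification Report:')
  print(classification_report(y_test, y_pred))
  print('Confusion Matrix:')
  # normalize='true' divides each row by the number of true samples of that class.
  print(confusion_matrix(y_test, y_pred, normalize='true'))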

## Linear

The first class is predicted with perfect precision, recall and f1-score, but the confusion matrix shows that about 9% of the points belonging to the 3rd class (1 of the 11 Iris-virginica samples) are incorrectly predicted as the 2nd class (Iris-versicolor).

In [38]:
create_svc_model(X_train, X_test, y_train, y_test, kernel='linear')

linear:
Classification Report:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       0.92      1.00      0.96        11
 Iris-virginica       1.00      0.91      0.95        11

       accuracy                           0.97        30
      macro avg       0.97      0.97      0.97        30
   weighted avg       0.97      0.97      0.97        30

Confusion Matrix:
[[1.         0.         0.        ]
 [0.         1.         0.        ]
 [0.         0.09090909 0.90909091]]
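
To read the normalized confusion matrix as raw counts, we can multiply each row by its class support. The following is a minimal sketch with the values copied by hand from the linear-kernel output above; the variable names are only for illustration.

```python
# Values copied from the linear-kernel output above.
support = [8, 11, 11]  # test samples per class: setosa, versicolor, virginica
normalized_cm = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.09090909, 0.90909091],
]

# Each row was divided by its class support, so multiplying back gives counts.
counts = [[round(p * n) for p in row] for row, n in zip(normalized_cm, support)]

# Accuracy is the number of correct predictions (the diagonal) over all samples.
correct = sum(counts[i][i] for i in range(len(counts)))
accuracy = correct / sum(support)

print(counts)              # [[8, 0, 0], [0, 11, 0], [0, 1, 10]]
print(round(accuracy, 2))  # 0.97
```

So the 0.97 accuracy in the report corresponds to 29 correct predictions out of 30.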


## RBF

All the classes are predicted perfectly, and we have reached the maximum accuracy.

In [39]:
create_svc_model(X_train, X_test, y_train, y_test, kernel='rbf')

rbf:
Classification Report:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       1.00      1.00      1.00        11
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30

Confusion Matrix:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


## Sigmoid

This classifier predicts the first class (Iris-setosa) for every test point, so the precision and recall for the 2nd and 3rd classes are zero. The accuracy is also very low: 0.27, which is simply the share of Iris-setosa samples in the test set (8 of 30).

In [40]:
create_svc_model(X_train, X_test, y_train, y_test, kernel='sigmoid')

sigmoid:
Classification Report:
                 precision    recall  f1-score   support

    Iris-setosa       0.27      1.00      0.42         8
Iris-versicolor       0.00      0.00      0.00        11
 Iris-virginica       0.00      0.00      0.00        11

       accuracy                           0.27        30
      macro avg       0.09      0.33      0.14        30
   weighted avg       0.07      0.27      0.11        30

Confusion Matrix:
[[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]]




## Polynomial

By looping over degree values from 1 to 10, we can see that the best result is obtained with degree=2, where every test point is classified correctly.

In [41]:
for d in range(1, 11):
  print(d)
  create_svc_model(X_train, X_test, y_train, y_test, kernel='poly', degree=d)
  print('--------------')

1
poly:
Classification Report:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       0.92      1.00      0.96        11
 Iris-virginica       1.00      0.91      0.95        11

       accuracy                           0.97        30
      macro avg       0.97      0.97      0.97        30
   weighted avg       0.97      0.97      0.97        30

Confusion Matrix:
[[1.         0.         0.        ]
 [0.         1.         0.        ]
 [0.         0.09090909 0.90909091]]
--------------
2
poly:
Classification Report:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       1.00      1.00      1.00        11
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00    

# 6. Comparison of different kernels

First, let's have another look at all the obtained results in one place:

**Linear:**

Classification Report:

```

                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       0.92      1.00      0.96        11
 Iris-virginica       1.00      0.91      0.95        11

       accuracy                           0.97        30
      macro avg       0.97      0.97      0.97        30
   weighted avg       0.97      0.97      0.97        30
```

Confusion Matrix:
```
[[1.         0.         0.        ]
 [0.         1.         0.        ]
 [0.         0.09090909 0.90909091]]
 ```


---



**RBF:**

Classification Report:

```

                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       1.00      1.00      1.00        11
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30
```

Confusion Matrix:
```
[[1.         0.         0.]
 [0.         1.         0.]
 [0.         0.         1.]]
 ```


---


**Sigmoid:**

Classification Report:

```

                 precision    recall  f1-score   support

    Iris-setosa       0.27      1.00      0.42         8
Iris-versicolor       0.00      0.00      0.00        11
 Iris-virginica       0.00      0.00      0.00        11

       accuracy                           0.27        30
      macro avg       0.09      0.33      0.14        30
   weighted avg       0.07      0.27      0.11        30
```

Confusion Matrix:
```
[[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]]
 ```


---



**Polynomial with degree=2:**

Classification Report:

```

                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         8
Iris-versicolor       1.00      1.00      1.00        11
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30
```

Confusion Matrix:
```
[[1.         0.         0.]
 [0.         1.         0.]
 [0.         0.         1.]]
 ```


---

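First, the sigmoid kernel performs very poorly on this example: it assigns every test point to Iris-setosa and reaches only 27% accuracy.

The linear kernel does well (97% accuracy), but the RBF and degree-2 polynomial kernels work best, both reaching 100% accuracy on the test set. Since these two are tied on accuracy, we can also consider the amount of calculation and the complexity of the models. In this case, the polynomial kernel is a better choice than the RBF kernel.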
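
The side-by-side comparison above can also be produced in a single loop. This is only a sketch under a few assumptions: it uses scikit-learn's bundled copy of Iris (`load_iris`) instead of the UCI CSV, and the `compare_kernels` helper and `settings` dictionary are hypothetical names that are not part of the cells above. The bundled copy may differ slightly from the UCI file, so the accuracies may not match the reports exactly.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def compare_kernels(kernel_settings, test_size=0.2, random_state=20):
    """Fit one SVC per setting and return {label: test accuracy}."""
    # Assumption: the bundled Iris copy stands in for the UCI CSV used earlier.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)

    scores = {}
    for label, params in kernel_settings.items():
        clf = SVC(**params).fit(X_train, y_train)
        scores[label] = accuracy_score(y_test, clf.predict(X_test))
    return scores


# Same kernels as in the sections above; the labels are only for printing.
settings = {
    'linear': {'kernel': 'linear'},
    'rbf': {'kernel': 'rbf'},
    'sigmoid': {'kernel': 'sigmoid'},
    'poly (degree=2)': {'kernel': 'poly', 'degree': 2},
}

scores = compare_kernels(settings)
for label, acc in scores.items():
    print(f'{label:>16}: {acc:.2f}')
```

Returning the scores as a dictionary, instead of printing full reports, keeps the comparison compact when only the accuracy matters.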

# 7. When to use each kernel

In this section, I'll discuss the different kernels and describe when each one is preferred.

I'll be using the following resources:
*   [SEVEN MOST POPULAR SVM KERNELS](https://dataaspirant.com/svm-kernels/#t-1608054630726)
*   [Kernel functions](https://scikit-learn.org/stable/modules/svm.html#kernel-functions)
*   [How to Select Support Vector Machine Kernels](https://www.kdnuggets.com/2016/06/select-support-vector-machine-kernels.html)
*   [Support Vector Machine — Simply Explained](https://towardsdatascience.com/support-vector-machine-simply-explained-fee28eba5496)
*   [SVM Kernels: What Do They Actually Do?](https://towardsdatascience.com/svm-kernels-what-do-they-actually-do-56ce36f4f7b8)
*   [How to select kernel for SVM?](https://stats.stackexchange.com/questions/18030/how-to-select-kernel-for-svm?rq=1)

## Linear kernel

*   The formula used for this kernel is $\langle x, x'\rangle$, which can also be written as $\sum_i x_i x'_i$.

*   Usually it is used as a baseline to check if the data is linearly separable.

*   When other kernels give similar results, this kernel is preferred because it is a simpler model that is less prone to overfitting.

*   It usually performs better when there are lots of features. The linear kernel is mostly preferred for text-classification problems, as most of these problems can be linearly separated.

*   Linear kernel functions are also faster to compute than the other kernel functions.
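
The linear kernel formula above can be written directly as a dot product. A minimal sketch in plain Python; the `linear_kernel` function and the sample vectors are made up for illustration and are not how scikit-learn computes it internally.

```python
def linear_kernel(x, x_prime):
    """Return <x, x'> = sum_i x_i * x'_i for two equal-length sequences."""
    if len(x) != len(x_prime):
        raise ValueError('vectors must have the same length')
    return sum(a * b for a, b in zip(x, x_prime))


# Made-up vectors, not taken from the dataset
a = [1, 2, 3]
b = [4, 5, 6]
print(linear_kernel(a, b))  # 1*4 + 2*5 + 3*6 = 32
```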

## RBF Kernel

*   The formula used for this kernel is $\exp(-\gamma \|x-x'\|^2)$.

*   It is usually chosen for non-linear data.

*   It is a reasonable default when there is no prior knowledge about the data.

*   Generally, this kernel and the linear kernel are the best first choices.
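
To see what the RBF formula measures, here is a small sketch in plain Python. The `rbf_kernel` function, the value `gamma=0.5` and the sample points are assumptions chosen for illustration. The kernel value is 1 for identical points and shrinks toward 0 as the points move apart.

```python
import math


def rbf_kernel(x, x_prime, gamma):
    """Return exp(-gamma * ||x - x'||^2) for two equal-length sequences."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * squared_distance)


# Made-up points, not taken from the dataset
p = [1.0, 2.0]
near = [1.0, 2.5]
far = [4.0, 6.0]

same = rbf_kernel(p, p, gamma=0.5)       # squared distance 0
close = rbf_kernel(p, near, gamma=0.5)   # squared distance 0.25
distant = rbf_kernel(p, far, gamma=0.5)  # squared distance 25
print(same, close, distant)
```

A larger gamma makes the value drop faster with distance, so each training point influences a smaller neighbourhood.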

## Sigmoid Kernel

*   The formula used for this kernel is $\tanh(\gamma \langle x,x'\rangle + r)$.

*   It is mostly associated with neural networks: an SVM with this kernel behaves like a two-layer perceptron, with $\tanh$ playing the role of the neurons' activation function.

*   I haven't really seen this kernel used in practice. It adds complexity and is not preferred for simple SVM tasks.
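
The sigmoid formula can be sketched the same way. The `sigmoid_kernel` function, the values `gamma=0.1` and `r=0.0`, and the sample points are assumptions made for illustration. The example shows that $\tanh$ saturates near 1 when the inner products are large and positive, so very different pairs of points get almost the same kernel value. This may be related to the poor sigmoid result in this notebook, but that is a guess and not something verified here.

```python
import math


def sigmoid_kernel(x, x_prime, gamma, r=0.0):
    """Return tanh(gamma * <x, x'> + r) for two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, x_prime))
    return math.tanh(gamma * dot + r)


# Made-up points on the same scale as the Iris measurements (in cm)
a = [5.0, 3.5, 1.5, 0.2]
b = [6.0, 3.0, 4.5, 1.5]
c = [7.0, 3.2, 6.0, 2.0]

k_ab = sigmoid_kernel(a, b, gamma=0.1)
k_ac = sigmoid_kernel(a, c, gamma=0.1)
print(k_ab, k_ac)  # both values are very close to 1
```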

## Polynomial Kernel

*   The formula used for this kernel is $(\gamma \langle x, x'\rangle + r)^d$.

*   It is actually a more generalized version of the linear kernel: with $d=1$, $\gamma=1$ and $r=0$ it reduces to $\langle x, x'\rangle$.

*   It is not usually preferred, as it tends to be less efficient and less accurate than the other kernels.

*   It can be used for both linear and non-linear tasks, but as it needs a lot of tuning to give a good result, it's usually discarded.
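
A short sketch of the polynomial formula in plain Python. The `poly_kernel` function, its default values `gamma=1.0` and `r=0.0`, and the sample vectors are assumptions for illustration; they are not scikit-learn's defaults. With `degree=1` the value equals the plain inner product, which is the linear kernel.

```python
def poly_kernel(x, x_prime, degree, gamma=1.0, r=0.0):
    """Return (gamma * <x, x'> + r) ** degree for two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, x_prime))
    return (gamma * dot + r) ** degree


# Made-up vectors, not taken from the dataset
a = [1, 2, 3]
b = [4, 5, 6]

print(poly_kernel(a, b, degree=1))  # 32.0, same as the inner product <a, b>
print(poly_kernel(a, b, degree=2))  # 1024.0, i.e. 32 squared
```

The three parameters (degree, gamma and r) are what make this kernel flexible, and also what makes it need more tuning than the linear kernel.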