# Understanding Support Vector Machines (SVM) and the Kernel Trick 🍇

**Support Vector Machines (SVMs)** are powerful and versatile supervised machine learning models used for both classification and regression. For classification, the objective of an SVM is to find an optimal **hyperplane** that cleanly separates the data points of different classes in a high-dimensional space.

The "best" hyperplane is the one that has the **maximum margin**—the largest possible distance between the hyperplane and the nearest data points from any class. These closest data points are called **support vectors**, as they are the critical points that "support" the hyperplane.
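To make the margin concrete, here is a minimal sketch in plain Python. The weight vector `w`, the bias `b`, and the two sample points are made-up values chosen for illustration; they are not derived from the raisin data. For a hyperplane defined by `w . x + b = 0`, the distance of a point `x` from it is `|w . x + b| / ||w||`, and under the usual convention that support vectors satisfy `|w . x + b| = 1`, the margin width is `2 / ||w||`.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w . x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

# Hypothetical 2-D hyperplane: 3*x1 + 4*x2 - 5 = 0 (illustrative values only).
w = [3.0, 4.0]
b = -5.0

# Hypothetical points, one on each side of the hyperplane.
points = {"class_a": [3.0, 4.0], "class_b": [0.0, 0.0]}

distances = {name: distance_to_hyperplane(w, b, x) for name, x in points.items()}

# Margin width under the canonical scaling |w . x + b| = 1 at the support vectors.
margin_width = 2.0 / math.sqrt(sum(wi * wi for wi in w))

print(distances)
print(margin_width)
```

Training an SVM amounts to choosing `w` and `b` so that this margin is as wide as possible while still classifying the training points correctly (or nearly so, for a soft margin).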


### The Kernel Trick

What makes SVMs especially powerful is their ability to classify data that is not linearly separable. They achieve this using the **kernel trick**. A kernel is a function that computes the inner product of two data points as if they had been mapped into a higher-dimensional space, where a linear separator *can* be found. The "trick" is that it does this without ever explicitly computing the coordinates of the data in that space, which keeps the computation efficient.
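The idea can be checked by hand on a small case. The sketch below assumes a degree-2 polynomial kernel `K(x, z) = (x . z) ** 2` on 2-D inputs (an illustrative choice, not one of the kernels configured later in this notebook) and compares it with the dot product of an explicit 3-D feature map. The points `x` and `z` are arbitrary.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel: K(x, z) = (x . z) ** 2."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2

def phi(x):
    """Explicit feature map for 2-D input whose inner product equals poly2_kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

# Arbitrary illustrative points.
x = [1.0, 2.0]
z = [3.0, 0.5]

# Kernel trick: one dot product in the original 2-D space, then a square.
kernel_value = poly2_kernel(x, z)

# Explicit route: map both points to 3-D first, then take the dot product.
explicit_value = sum(a * b for a, b in zip(phi(x), phi(z)))

print(kernel_value, explicit_value)
```

Both routes give the same number, but the kernel never builds the 3-D vectors. For kernels such as RBF the implicit space is infinite-dimensional, so the explicit route is not even available.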

This notebook demonstrates how to use SVMs for classification and compares the performance of four different kernel functions on a dataset of raisins.

---

## 1. The Dataset: Classifying Types of Raisins

We will use a dataset that contains seven physical features (like Area, Perimeter, etc.) of two different types of raisins: 'Kecimen' and 'Besni'. Our goal is to train an SVM to classify the raisin type based on these features.


In [1]:
import pandas as pd

df = pd.read_excel('Raisin_Dataset.xlsx')
df.head()

|   | Area | MajorAxisLength | MinorAxisLength | Eccentricity | ConvexArea | Extent | Perimeter | Class |
|---|---|---|---|---|---|---|---|---|
| 0 | 87524 | 442.246011 | 253.291155 | 0.819738 | 90546 | 0.758651 | 1184.04 | Kecimen |
| 1 | 75166 | 406.690687 | 243.032436 | 0.801805 | 78789 | 0.68413 | 1121.786 | Kecimen |
| 2 | 90856 | 442.267048 | 266.328318 | 0.798354 | 93717 | 0.637613 | 1208.575 | Kecimen |
| 3 | 45928 | 286.540559 | 208.760042 | 0.684989 | 47336 | 0.699599 | 844.162 | Kecimen |
| 4 | 79408 | 352.19077 | 290.827533 | 0.564011 | 81463 | 0.792772 | 1073.251 | Kecimen |


First, we'll prepare our data by separating it into features (`X`) and the target (`y`), and then splitting it into training and testing sets.


In [2]:
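from sklearn.model_selection import train_test_split

# Features: the seven numeric measurements. Target: the raisin variety.
X = df[['Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'ConvexArea', 'Extent', 'Perimeter']]
y = df['Class']

# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)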

## 2. Comparing Different SVM Kernels

The choice of kernel is a critical hyperparameter that determines how the SVM will separate the data. We will test four common kernels available in `scikit-learn`.

### a) The RBF Kernel (Radial Basis Function)

The RBF kernel is the default kernel of scikit-learn's `SVC` and a popular first choice, as it can handle complex, non-linear relationships by implicitly mapping the data to an infinite-dimensional space.
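As a rough illustration of what the RBF kernel computes, the sketch below evaluates `K(x, z) = exp(-gamma * ||x - z||^2)` in plain Python. The `gamma` value and the points are arbitrary assumptions for demonstration; scikit-learn chooses `gamma` from the data by default.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel: similarity decays exponentially with squared distance."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Arbitrary illustrative points at increasing distance from [1.0, 2.0].
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)  # identical points
near = rbf_kernel([1.0, 2.0], [1.5, 2.0], gamma=0.5)  # close neighbour
far = rbf_kernel([1.0, 2.0], [6.0, 2.0], gamma=0.5)   # distant point

print(same, near, far)
```

The kernel equals 1 for identical points and falls towards 0 as points move apart, so each support vector mainly influences predictions in its own neighbourhood. A larger `gamma` makes that neighbourhood smaller.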


In [3]:
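from sklearn.metrics import classification_report
from sklearn.svm import SVC

# Fit an SVM with the RBF kernel on the training split.
model = SVC(kernel='rbf')
model.fit(X_train, y_train)

# Evaluate on the held-out test split.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# Number of optimizer iterations needed to fit the model.
model.n_iter_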

              precision    recall  f1-score   support

       Besni       0.86      0.75      0.80        83
     Kecimen       0.81      0.90      0.85        97

    accuracy                           0.83       180
   macro avg       0.83      0.82      0.82       180
weighted avg       0.83      0.83      0.83       180



array([229], dtype=int32)

The RBF kernel achieves an accuracy of **83%**.
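The columns of a classification report can be reproduced by hand. The sketch below uses hypothetical confusion counts for a single class (they are not the actual counts behind the report above) to show how precision, recall, and F1 relate to one another.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 for one class from raw counts."""
    precision = tp / (tp + fp)  # of everything predicted as this class, how much was right
    recall = tp / (tp + fn)     # of everything truly in this class, how much was found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Hypothetical counts: 60 true positives, 10 false positives, 20 false negatives.
precision, recall, f1 = precision_recall_f1(tp=60, fp=10, fn=20)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The `support` column is simply the number of true test samples in each class, and the `accuracy` row is the fraction of all test samples that were classified correctly.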

### b) The Linear Kernel

A linear kernel is the simplest kernel. It computes the plain dot product of the inputs without any transformation and works well when the classes are expected to be approximately linearly separable.


In [4]:
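from sklearn.metrics import classification_report
from sklearn.svm import SVC

# Fit an SVM with a linear kernel on the same training split.
model = SVC(kernel='linear')
model.fit(X_train, y_train)

# Evaluate on the held-out test split.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# Number of optimizer iterations needed to fit the model.
model.n_iter_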

              precision    recall  f1-score   support

       Besni       0.86      0.87      0.86        83
     Kecimen       0.89      0.88      0.88        97

    accuracy                           0.87       180
   macro avg       0.87      0.87      0.87       180
weighted avg       0.87      0.87      0.87       180



array([117812859], dtype=int32)

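The linear kernel performs the best, with an accuracy of **87%**. This suggests that the two classes of raisins are largely linearly separable based on their physical features. Note that the solver needed about 118 million iterations here, compared with a few hundred for the other kernels, which often indicates that the features would benefit from scaling.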


### c) The Polynomial Kernel

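The polynomial kernel measures the similarity of data points in a polynomial feature space, so it can capture interactions between features up to a chosen degree (scikit-learn defaults to degree 3). It's often used in natural language processing.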


In [5]:
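from sklearn.metrics import classification_report
from sklearn.svm import SVC

# Fit an SVM with a polynomial kernel (default degree 3) on the same training split.
model = SVC(kernel='poly')
model.fit(X_train, y_train)

# Evaluate on the held-out test split.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# Number of optimizer iterations needed to fit the model.
model.n_iter_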

              precision    recall  f1-score   support

       Besni       0.88      0.72      0.79        83
     Kecimen       0.79      0.92      0.85        97

    accuracy                           0.83       180
   macro avg       0.84      0.82      0.82       180
weighted avg       0.84      0.83      0.83       180



array([277], dtype=int32)

The polynomial kernel's accuracy is **83%**, similar to the RBF kernel.

### d) The Sigmoid Kernel

The sigmoid kernel is based on the hyperbolic tangent (`tanh`), a function also used as an activation in neural networks. It can be effective in certain scenarios.


In [7]:
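from sklearn.metrics import classification_report
from sklearn.svm import SVC

# Fit an SVM with a sigmoid (tanh) kernel on the same training split.
model = SVC(kernel='sigmoid')
model.fit(X_train, y_train)

# Evaluate on the held-out test split.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

# Number of optimizer iterations needed to fit the model.
model.n_iter_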

              precision    recall  f1-score   support

       Besni       0.20      0.24      0.22        83
     Kecimen       0.21      0.18      0.19        97

    accuracy                           0.21       180
   macro avg       0.21      0.21      0.21       180
weighted avg       0.21      0.21      0.20       180



array([305], dtype=int32)

The sigmoid kernel performs very poorly on this dataset, with an accuracy of only **21%**, well below the roughly 50% that random guessing would achieve on two nearly balanced classes.
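One possible contributor to a result like this is the shape of the kernel itself. The sketch below evaluates `K(x, z) = tanh(gamma * (x . z) + coef0)` in plain Python on made-up points with a fixed `gamma=1.0`. These values are assumptions for illustration only; scikit-learn picks `gamma` from the data by default, so this is not a diagnosis of the run above.

```python
import math

def sigmoid_kernel(x, z, gamma, coef0=0.0):
    """Sigmoid kernel: tanh of a scaled dot product plus an offset."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return math.tanh(gamma * dot + coef0)

# Small-magnitude inputs: the kernel distinguishes between pairs of points.
small_a = sigmoid_kernel([0.2, -0.1], [0.3, 0.4], gamma=1.0)
small_b = sigmoid_kernel([0.2, -0.1], [-0.5, 0.1], gamma=1.0)

# Large-magnitude inputs: tanh saturates, so different pairs look identical.
large_a = sigmoid_kernel([200.0, 100.0], [300.0, 400.0], gamma=1.0)
large_b = sigmoid_kernel([200.0, 100.0], [50.0, 900.0], gamma=1.0)

print(small_a, small_b)
print(large_a, large_b)
```

When the scaled dot products are large, the kernel returns almost the same value for every pair and carries little information. The sigmoid kernel is also not guaranteed to be positive semi-definite, which is one reason it tends to be less reliable than the other three kernels.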


## 3. Conclusion

This experiment demonstrates the importance of choosing the right kernel for an SVM model. For this particular dataset of raisins:
* The **linear kernel** provided the best performance (87% accuracy), indicating that the features are largely sufficient to separate the classes with a simple linear boundary.
* The **RBF and polynomial kernels** also performed reasonably well (83% accuracy each).
* The **sigmoid kernel** was not suitable for this problem (21% accuracy).

This highlights that there is no single "best" kernel for all problems; experimentation is key to finding the optimal model for your specific data.
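For further experimentation, the four runs can be condensed into a single loop. The sketch below makes two assumptions that are not part of the notebook above: it uses synthetic data from `make_classification` as a stand-in for the raisin file so that it runs on its own, and it standardizes the features with `StandardScaler` before fitting, a common preprocessing step for SVMs. The accuracies it prints therefore say nothing about the raisin dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption): 900 samples, 7 features, 2 classes.
X, y = make_classification(
    n_samples=900, n_features=7, n_informative=4, n_redundant=2, random_state=10
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

scores = {}
for kernel in ("rbf", "linear", "poly", "sigmoid"):
    # The scaler is fitted on the training split only, then applied to the test split.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)  # mean accuracy on the test split

for kernel, accuracy in scores.items():
    print(f"{kernel:8s} accuracy = {accuracy:.2f}")
```

To try this on the raisin data, replace the `make_classification` call with the `X` and `y` built from `df` earlier. Putting the scaler inside the pipeline keeps test-set statistics from leaking into training.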