<h2 align="center">Support Vector Machine Tutorial</h2>

### Problem Statement: Classify raisins into one of two categories
1. Kecimen
2. Besni

### Dataset Citation
This dataset is used under the citation guidelines of the original authors. For the full study and a description of the dataset, see the following references:

- **Citation**: Cinar, I., Koklu, M., & Tasdemir, S. (2020). Classification of Raisin Grains Using Machine Vision and Artificial Intelligence Methods. *Gazi Journal of Engineering Sciences, 6*(3), 200-209. DOI: [10.30855/gmbd.2020.03.03](https://doi.org/10.30855/gmbd.2020.03.03)
- **Dataset available at**: [Murat Koklu's Dataset Page](https://www.muratkoklu.com/datasets/)
- **Article download**: [DergiPark](https://dergipark.org.tr/tr/download/article-file/1227592)


In [4]:
!pip install openpyxl



In [5]:
import pandas as pd

df = pd.read_excel(r"D:\Coding\Machine Learning\dataset\Raisin_Dataset.xlsx")
df.sample(5)

| | Area | MajorAxisLength | MinorAxisLength | Eccentricity | ConvexArea | Extent | Perimeter | Class |
|---|---|---|---|---|---|---|---|---|
| 828 | 84855 | 420.350624 | 275.881012 | 0.75449 | 90768 | 0.673805 | 1172.642 | Besni |
| 182 | 63968 | 333.012421 | 247.888546 | 0.667754 | 65403 | 0.757 | 953.445 | Kecimen |
| 729 | 80274 | 404.302495 | 256.062476 | 0.773871 | 84523 | 0.663542 | 1153.618 | Besni |
| 312 | 65062 | 344.309794 | 242.683633 | 0.709365 | 65825 | 0.769218 | 948.889 | Kecimen |
| 372 | 87937 | 365.836992 | 307.911698 | 0.540002 | 89581 | 0.71101 | 1099.568 | Kecimen |


In [6]:
df.shape

(900, 8)

The dataset has 900 records and 8 columns: 7 numeric features and the `Class` label. Using all 7 features, we will build a classification model with a support vector machine.

### Train Test Split

In [7]:
X = df[["Area", "MajorAxisLength", "MinorAxisLength", "Eccentricity", "ConvexArea", "Extent", "Perimeter"]]
y = df["Class"]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

### Scale the Data

In [8]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
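
The scaler above is fit on the training split only and then applied to both splits, which keeps test-set statistics out of training. One way to bundle that pattern is scikit-learn's `make_pipeline`. The sketch below is a minimal illustration under stated assumptions: it uses synthetic data from `make_classification` as a stand-in for the raisin features, and the names `X_demo`, `pipe`, and `demo_accuracy` are hypothetical, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the raisin data (assumption: 900 rows, 7 features).
X_demo, y_demo = make_classification(
    n_samples=900, n_features=7, n_informative=5, n_redundant=2, random_state=10
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=10
)

# fit() learns the scaling statistics from the training split only,
# then trains the SVC on the scaled training data.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
pipe.fit(Xd_train, yd_train)

# score() applies the already-fitted scaler to the test split before predicting.
demo_accuracy = pipe.score(Xd_test, yd_test)
print(f"accuracy on the synthetic test split: {demo_accuracy:.2f}")
```

With a pipeline, the same object can be passed to cross-validation utilities without refitting the scaler by hand for each fold.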

### Model Training Using SVM: RBF Kernel: No Scaling

In [9]:
from sklearn.svm import SVC

model = SVC(kernel="rbf")
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.86      0.75      0.80        83
     Kecimen       0.81      0.90      0.85        97

    accuracy                           0.83       180
   macro avg       0.83      0.82      0.82       180
weighted avg       0.83      0.83      0.83       180



array([229], dtype=int32)

### Model Training Using SVM: RBF Kernel: With Scaling

In [10]:
from sklearn.svm import SVC

model = SVC(kernel="rbf")
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.91      0.83      0.87        83
     Kecimen       0.87      0.93      0.90        97

    accuracy                           0.88       180
   macro avg       0.89      0.88      0.88       180
weighted avg       0.88      0.88      0.88       180



array([382], dtype=int32)

As shown above, scaling improves model performance: accuracy rises from 0.83 to 0.88, and precision and recall improve for both classes. Training on scaled data takes more iterations (382 versus 229), but the increase is small, while the gains in precision, recall, and accuracy can be very valuable.
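
A likely reason scaling helps here is that the RBF kernel depends on the squared Euclidean distance between samples, `exp(-gamma * ||x - x'||^2)`, so features with large numeric ranges dominate that distance. The sketch below illustrates this with two rows copied from the `df.sample(5)` output earlier in this notebook (rows 828 and 182); the variable names are hypothetical and the calculation is only an illustration, not part of the model.

```python
# Two rows copied from the df.sample(5) output (row 828: Besni, row 182: Kecimen).
features = ["Area", "MajorAxisLength", "MinorAxisLength", "Eccentricity",
            "ConvexArea", "Extent", "Perimeter"]
besni = [84855, 420.350624, 275.881012, 0.75449, 90768, 0.673805, 1172.642]
kecimen = [63968, 333.012421, 247.888546, 0.667754, 65403, 0.757, 953.445]

# Contribution of each feature to the squared Euclidean distance.
squared_diffs = {name: (a - b) ** 2 for name, a, b in zip(features, besni, kecimen)}
total = sum(squared_diffs.values())
shares = {name: value / total for name, value in squared_diffs.items()}

# Print features from largest to smallest share of the distance.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:16s} {share:.6%}")

# Combined share of the two area-like features.
area_share = shares["Area"] + shares["ConvexArea"]
```

For this pair of rows, `Area` and `ConvexArea` together account for more than 99.9% of the unscaled squared distance, so features such as `Eccentricity` and `Extent` barely influence the kernel until the data is standardized.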

### Model Training Using SVM: Linear Kernel: No Scaling

In [11]:
from sklearn.svm import SVC

model = SVC(kernel="linear")
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.91      0.88      0.90        83
     Kecimen       0.90      0.93      0.91        97

    accuracy                           0.91       180
   macro avg       0.91      0.90      0.90       180
weighted avg       0.91      0.91      0.91       180



array([85005907], dtype=int32)

### Model Training Using SVM: Linear Kernel: With Scaling

In [12]:
from sklearn.svm import SVC

model = SVC(kernel="linear")
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.90      0.84      0.87        83
     Kecimen       0.87      0.92      0.89        97

    accuracy                           0.88       180
   macro avg       0.88      0.88      0.88       180
weighted avg       0.88      0.88      0.88       180



array([1214], dtype=int32)

For the linear kernel, scaling the data reduced the number of iterations drastically (from 85005907 down to 1214). Model performance, on the other hand, is broadly similar: accuracy is 0.88 with scaling versus 0.91 without, a small drop on this test split rather than a major change.