# What is SVM?

SVM (Support Vector Machine) is a type of machine learning algorithm used mainly for classification. Classification means categorizing data into different classes or groups.

Imagine you have a set of points on a 2D plane, and you want to separate them into two different categories (for example, red points and blue points). SVM helps you find the best line (or boundary) that divides these two groups of points.

## How does it work?

### Separating the data:
SVM tries to find a line (in 2D) or a plane (in higher dimensions) that best divides the two classes. This line is called a **hyperplane**.

### Maximizing the margin:
The key idea behind SVM is to make sure that the line (hyperplane) is placed in such a way that it has the largest possible distance (**margin**) between the closest points from each class. These closest points are called **support vectors**. So, SVM focuses on the points that are hardest to classify, and it uses these points to decide where to place the boundary.

### Linear or Non-linear:

- **Linear SVM**: If the data can be easily separated by a straight line (or flat surface in higher dimensions), SVM will find that.
- **Non-linear SVM**: If the data cannot be separated by a straight line (for example, if the points form a circle or other complex shape), SVM can use something called a **kernel trick** to transform the data into a higher-dimensional space where it can be separated by a straight line.

## How does SVM choose the best line?
SVM’s goal is to place the line in the middle of the gap between the two classes, ensuring that it maximizes the margin (the distance between the closest data points of each class and the line). This way, the classifier is more likely to correctly classify new data points in the future.

## What is a support vector?
The **support vectors** are the data points that are closest to the hyperplane. These points are important because they are the ones that "support" or determine the position of the dividing line.

## What if the data isn't perfectly separable?
In real life, the data might not always be perfectly separated by a line (or plane). SVM can still handle this by allowing some points to be on the wrong side of the line. This is done by introducing a **soft margin**, which allows some mistakes but still tries to keep the margin as wide as possible. The **penalty parameter** \( C \) helps control how many mistakes the algorithm is allowed to make.

## Advantages of SVM:
- **Effective in high-dimensional spaces**: SVM works well when there are many features or dimensions in your data.
- **Good at handling complex data**: It can handle non-linear data using kernels.
- **Works well with small to medium-sized datasets**.

## Disadvantages of SVM:
- **Computationally expensive**: It can take a long time to train the model on large datasets.
- **Difficult to interpret**: The results aren’t as easy to interpret as some other algorithms (like decision trees).

## Summary:
SVM is a powerful tool for classifying data by finding the best boundary (line or plane) that separates different classes. It works by maximizing the margin between the classes and can handle both simple linear cases and more complex non-linear cases using kernels.


In [13]:
import pandas as pd

df = pd.read_excel("D:\\utils\\DataSets\\Raisin_Dataset.xlsx")
df.head(3)

Unnamed: 0,Area,MajorAxisLength,MinorAxisLength,Eccentricity,ConvexArea,Extent,Perimeter,Class
0,87524,442.246011,253.291155,0.819738,90546,0.758651,1184.04,Kecimen
1,75166,406.690687,243.032436,0.801805,78789,0.68413,1121.786,Kecimen
2,90856,442.267048,266.328318,0.798354,93717,0.637613,1208.575,Kecimen


In [14]:
X = df[["Area", "MajorAxisLength", "MinorAxisLength", "Eccentricity", "ConvexArea", "Extent", "Perimeter"]]
y = df["Class"]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

In [15]:
from sklearn.svm import SVC

model = SVC(kernel="rbf")
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.86      0.75      0.80        83
     Kecimen       0.81      0.90      0.85        97

    accuracy                           0.83       180
   macro avg       0.83      0.82      0.82       180
weighted avg       0.83      0.83      0.83       180



array([229], dtype=int32)

In [16]:
from sklearn.svm import SVC
#You can notice that when you use RBF kernel, the number of iterations is less but linear takes more iterations to converge.
#make sure before running you have that computation power to run the linear kernel.

model = SVC(kernel="linear")
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.91      0.88      0.90        83
     Kecimen       0.90      0.93      0.91        97

    accuracy                           0.91       180
   macro avg       0.91      0.90      0.90       180
weighted avg       0.91      0.91      0.91       180



array([85005907], dtype=int32)

# Feature Scaling


## Min-Max Scaling (Normalization)

Min-max scaling, also known as min-max normalization, is a data preprocessing technique used to transform numerical data within a specific range.

**Purpose:**

* It's primarily used to bring all numerical features in a dataset to a common scale. This is important because many machine learning algorithms perform better when numerical input variables are scaled to a standard range.
* This is especially helpful for algorithms that are sensitive to the magnitude of features, such as:
    * K-nearest neighbors (KNN)
    * Support vector machines (SVM)
    * Neural networks

**How it Works:**

* The process involves linearly transforming the original data.
* Typically, the data is scaled to a range between 0 and 1.
* The formula used is:
    * `X_scaled = (X - X_min) / (X_max - X_min)`
        * Where:
            * `X_scaled` is the scaled value.
            * `X` is the original value.
            * `X_min` is the minimum value of the feature.
            * `X_max` is the maximum value of the feature.

**Key Considerations:**

* **Sensitivity to Outliers:** Min-max scaling is sensitive to outliers. If your data contains extreme values, they can significantly affect the scaling, leading to a compressed range for the majority of the data.
* **Preservation of Distribution:** It's important to understand that min-max scaling does not change the distribution of the data; it simply rescales it.

In essence, min-max scaling is a valuable tool for ensuring that all features contribute equally to the learning process of a machine learning model.

## Standard Scaler (Standardization)

Standard scaler, or standardization, is another common data preprocessing technique used to transform numerical data. Unlike min-max scaling, standard scaler focuses on transforming data to have a mean of 0 and a standard deviation of 1.

**Purpose:**

* It aims to center the data around zero and scale it to unit variance.
* This is crucial for algorithms that assume data is normally distributed or those sensitive to feature variances (e.g., support vector machines, linear regression with regularization).
* It can help improve the stability and performance of machine learning models.

**How it Works:**

* Standard scaler transforms the data by subtracting the mean and dividing by the standard deviation.
* The formula is:

    $$
    X_{scaled} = \frac{X - \mu}{\sigma}
    $$

    * Where:
        * $X_{scaled}$ is the scaled value.
        * $X$ is the original value.
        * $\mu$ is the mean of the feature.
        * $\sigma$ is the standard deviation of the feature.

**Key Considerations:**

* **Less Sensitive to Outliers:** Standard scaler is generally less sensitive to outliers than min-max scaling because it uses the mean and standard deviation, which are less affected by extreme values. However, very large outliers can still have some influence.
* **Preserves Distribution (Approximately):** While it doesn't guarantee a perfect normal distribution, it attempts to center the data and scale it to a standard deviation of 1, which can make the data more closely resemble a standard normal distribution.
* **Unbounded Range:** The scaled values from standard scaler are not confined to a specific range (like 0 to 1 in min-max scaling). They can be positive or negative and have values beyond the typical -3 to +3 range, particularly if there are outliers.
* **When to Use:** Use standard scaler when your data is approximately normally distributed, or when your model assumes normality. It is also good practice when you are unsure which scaling method to use.



In [17]:


from sklearn.preprocessing import MinMaxScaler,StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)


min_max_scaler = MinMaxScaler()
min_max_scaler.fit(X)


standard_scaler = StandardScaler()
standard_scaler.fit(X)


X_train_mm = min_max_scaler.fit_transform(X_train)
X_test_mm = min_max_scaler.fit_transform(X_test)

X_train_ss = standard_scaler.fit_transform(X_train)
X_test_ss = standard_scaler.fit_transform(X_test)


In [18]:
#scaling using min max scaler
from sklearn.svm import SVC

model = SVC(kernel="rbf")
model.fit(X_train_mm, y_train)

y_pred = model.predict(X_test_mm)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.82      0.87      0.84        83
     Kecimen       0.88      0.84      0.86        97

    accuracy                           0.85       180
   macro avg       0.85      0.85      0.85       180
weighted avg       0.85      0.85      0.85       180



array([219], dtype=int32)

In [19]:
#scaling using min max scaler
from sklearn.svm import SVC

model = SVC(kernel="linear")
model.fit(X_train_mm, y_train)

y_pred = model.predict(X_test_mm)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.78      0.87      0.82        83
     Kecimen       0.88      0.79      0.83        97

    accuracy                           0.83       180
   macro avg       0.83      0.83      0.83       180
weighted avg       0.83      0.83      0.83       180



array([196], dtype=int32)

In [20]:
#scaling using standard scaler
from sklearn.svm import SVC

model = SVC(kernel="rbf")
model.fit(X_train_ss, y_train)

y_pred = model.predict(X_test_ss)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.88      0.86      0.87        83
     Kecimen       0.88      0.90      0.89        97

    accuracy                           0.88       180
   macro avg       0.88      0.88      0.88       180
weighted avg       0.88      0.88      0.88       180



array([382], dtype=int32)

In [21]:
#scaling using standard scaler
from sklearn.svm import SVC

model = SVC(kernel="linear")
model.fit(X_train_ss, y_train)

y_pred = model.predict(X_test_ss)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.85      0.84      0.85        83
     Kecimen       0.87      0.88      0.87        97

    accuracy                           0.86       180
   macro avg       0.86      0.86      0.86       180
weighted avg       0.86      0.86      0.86       180



array([1214], dtype=int32)

In [22]:
# Easy process
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", SVC())
])

pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

model.n_iter_

              precision    recall  f1-score   support

       Besni       0.91      0.83      0.87        83
     Kecimen       0.87      0.93      0.90        97

    accuracy                           0.88       180
   macro avg       0.89      0.88      0.88       180
weighted avg       0.88      0.88      0.88       180



array([1214], dtype=int32)