# The Importance of Feature Scaling in Machine Learning ⚖️

**Feature Scaling** is a critical preprocessing step in many machine learning workflows. It involves standardizing the range of the independent variables or features of the data.

### Why is Scaling Important?

Many machine learning algorithms are sensitive to the scale of the input features. This is especially true of **distance-based** methods like Support Vector Machines (SVMs) and K-Nearest Neighbors (KNN), and of variance-based methods like Principal Component Analysis (PCA).

Consider our Raisin dataset, where `Area` can be in the tens of thousands, while `Eccentricity` is a value between 0 and 1. Without scaling, the `Area` feature would completely dominate any distance-based calculation, and the model might incorrectly assume it's a much more important feature than `Eccentricity`, simply because its numbers are larger.

Scaling brings all features to a similar magnitude, ensuring that each one contributes fairly to the model's learning process.
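To make the dominance effect concrete, here is a minimal sketch that measures how much each feature contributes to a Euclidean distance. It assumes the `Area` and `Eccentricity` values of two raisins taken from the `df.sample(5)` output in this notebook; the variable names are illustrative.

```python
import numpy as np

# (Area, Eccentricity) for two raisins from the sample output in this notebook
a = np.array([91464.0, 0.775982])
b = np.array([33615.0, 0.740566])

# Per-feature squared differences are the terms summed inside a Euclidean distance
squared_diffs = (a - b) ** 2
distance = np.sqrt(squared_diffs.sum())

# Fraction of the squared distance that each feature is responsible for
share = squared_diffs / squared_diffs.sum()

print(f"Euclidean distance: {distance:.3f}")
print(f"Share from Area: {share[0]:.12f}")
print(f"Share from Eccentricity: {share[1]:.2e}")
```

On the raw values, practically the entire distance comes from `Area`, so the difference in `Eccentricity` has almost no influence on which points count as "close".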

In this notebook, we will use **`StandardScaler`**, a common technique that transforms each feature so that it has a **mean of 0** and a **standard deviation of 1**. The formula for this is:
$$ z = \frac{x - \mu}{\sigma} $$
where $\mu$ is the mean and $\sigma$ is the standard deviation of the feature.
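The formula can be checked by hand. The sketch below is a small illustration, assuming the five `Area` values from the `df.sample(5)` output in this notebook as input. It applies the formula with NumPy and compares the result with `StandardScaler`, which uses the population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Area values from the five sampled rows shown in this notebook, as one column
area = np.array([[91464.0], [33615.0], [51941.0], [57127.0], [61463.0]])

mu = area.mean(axis=0)
sigma = area.std(axis=0)  # population standard deviation (ddof=0)
z_manual = (area - mu) / sigma

# The same transformation done by scikit-learn
z_sklearn = StandardScaler().fit_transform(area)

print(np.allclose(z_manual, z_sklearn))
print(z_manual.mean(), z_manual.std())
```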

---

## 1. Preparing the Raisin Dataset

First, we load the Raisin dataset and perform our standard train-test split. The data has features with very different scales.


In [1]:
import pandas as pd

df = pd.read_excel('Raisin_Dataset.xlsx')
df.sample(5)

|     | Area | MajorAxisLength | MinorAxisLength | Eccentricity | ConvexArea | Extent | Perimeter | Class |
|-----|------|-----------------|-----------------|--------------|------------|--------|-----------|-------|
| 696 | 91464 | 433.219793 | 273.255461 | 0.775982 | 93852 | 0.717702 | 1182.21 | Besni |
| 416 | 33615 | 254.47223 | 171.00105 | 0.740566 | 35376 | 0.788049 | 719.935 | Kecimen |
| 174 | 51941 | 349.22617 | 191.817334 | 0.835649 | 53893 | 0.708995 | 912.259 | Kecimen |
| 56 | 57127 | 311.644578 | 238.641921 | 0.643138 | 59943 | 0.693626 | 952.023 | Kecimen |
| 256 | 61463 | 369.399745 | 213.61962 | 0.815832 | 63117 | 0.786777 | 966.493 | Kecimen |


### Train-Test Split

In [2]:
X = df[['Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'ConvexArea', 'Extent', 'Perimeter']]
y = df['Class']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

## 2. Applying StandardScaler

Now, we'll use `StandardScaler` from `scikit-learn` to scale our feature data.

> **Note on Best Practice:** In this example, the scaler is fit on the entire dataset (`X`). The ideal practice is to **fit the scaler only on the training data (`X_train`)** and then use that same fitted scaler to transform both `X_train` and `X_test`. This prevents any information from the test set from "leaking" into the training process.



In [3]:
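from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Fit on the full dataset (X), as in this example; best practice is to fit on X_train only
scaler.fit(X)

# Apply the same fitted scaler to both splits
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)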
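For comparison, a leak-free variant of this step could look like the sketch below. It is only an illustration: it assumes a small synthetic two-feature dataset (`X_demo`, `y_demo`, hypothetical names) in place of the Raisin data so that it runs without the Excel file, and it uses separate variable names so it does not overwrite the arrays used in the rest of this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two Raisin-like features (an assumption, not the real data)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[60000.0, 0.8], scale=[20000.0, 0.05], size=(200, 2))
y_demo = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=10
)

scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)  # mean and std learned from training rows only
X_te_scaled = scaler_demo.transform(X_te)      # test rows reuse the training statistics

print(scaler_demo.mean_)         # matches X_tr.mean(axis=0)
print(X_te_scaled.mean(axis=0))  # near 0, but not exactly 0
```

The scaled test set no longer has a mean of exactly zero, which is expected: its statistics played no part in fitting the scaler.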

Here's a look at our scaled test data. Notice how the values are now centered around zero.


In [4]:
X_test_scaled

array([[-0.0426657 , -0.14506073,  0.24858118, ..., -0.08106412,
         0.35989352, -0.07215554],
       [-0.43652993, -0.58876265, -0.10176729, ..., -0.47199211,
         0.87296583, -0.59258193],
       [-0.85804906, -0.58924598, -1.23173306, ..., -0.8633864 ,
         0.00766612, -0.80434516],
       ...,
       [-0.91228139, -0.85779301, -0.95531349, ..., -0.86908013,
         0.27931669, -0.84557505],
       [ 0.49085832,  0.58207752,  0.4023557 , ...,  0.46462512,
         1.3787713 ,  0.43838459],
       [-0.89270747, -0.84114895, -0.95288942, ..., -0.87688445,
        -0.14619709, -0.83890868]])

## 3. Training SVMs on Scaled Data

We will now re-train our SVM models using this new scaled data to see how it affects their performance.

### a) RBF Kernel with Scaled Data

The RBF kernel is highly sensitive to the distance between points, so we expect scaling to have a positive impact.
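The RBF kernel value for two points is $\exp(-\gamma \lVert x - y \rVert^2)$, so it depends directly on the squared distance. The sketch below illustrates this under stated assumptions: it uses the `Area` and `Eccentricity` columns of the five sampled rows shown in this notebook, fits a scaler on just those five rows for demonstration, and uses a fixed `gamma = 0.5` chosen only for illustration (`SVC` picks its own default). The helper `rbf` is hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def rbf(x, y, gamma):
    """RBF kernel value for two vectors: exp(-gamma * squared Euclidean distance)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

# (Area, Eccentricity) for the five sampled rows shown in this notebook
rows = np.array([
    [91464.0, 0.775982],
    [33615.0, 0.740566],
    [51941.0, 0.835649],
    [57127.0, 0.643138],
    [61463.0, 0.815832],
])
rows_scaled = StandardScaler().fit_transform(rows)

gamma = 0.5  # illustrative value, an assumption
k_raw = rbf(rows[0], rows[1], gamma)
k_scaled = rbf(rows_scaled[0], rows_scaled[1], gamma)

print(k_raw)     # underflows to 0.0 because the raw squared distance is enormous
print(k_scaled)  # a usable similarity strictly between 0 and 1
```

With a fixed `gamma`, raw features make almost every pair of points look completely dissimilar, while scaled features yield graded similarities the model can learn from.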


In [5]:
from sklearn.svm import SVC

model = SVC(kernel='rbf')
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

from sklearn.metrics import classification_report
report = classification_report(y_test, y_pred)
print(report)

# Number of iterations the optimizer ran to fit the model
model.n_iter_

              precision    recall  f1-score   support

       Besni       0.91      0.83      0.87        83
     Kecimen       0.87      0.93      0.90        97

    accuracy                           0.88       180
   macro avg       0.89      0.88      0.88       180
weighted avg       0.88      0.88      0.88       180



array([419], dtype=int32)

**Result:** The accuracy improved from **83% (unscaled) to 88% (scaled)**. Scaling the features allowed the RBF kernel to better distinguish between the classes.


### b) Linear Kernel with Scaled Data

Let's see how scaling affects the linear kernel, which was our best performer on the unscaled data.


In [6]:
from sklearn.svm import SVC

model = SVC(kernel='linear')
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

report = classification_report(y_test, y_pred)
print(report)

# Number of iterations the optimizer ran to fit the model
model.n_iter_

              precision    recall  f1-score   support

       Besni       0.90      0.84      0.87        83
     Kecimen       0.87      0.92      0.89        97

    accuracy                           0.88       180
   macro avg       0.88      0.88      0.88       180
weighted avg       0.88      0.88      0.88       180



array([2164], dtype=int32)

**Result:** The accuracy decreased slightly, from **91% (unscaled) to 88% (scaled)**. This is an interesting outcome. While scaling is generally a best practice, it doesn't *guarantee* a better score for every model. One plausible explanation is that the unscaled data was already well separated by a linear hyperplane, and scaling changed the geometry (and with it the effective strength of the regularization parameter `C`), making the separation marginally less effective. With only 180 test samples, a 3-point gap also corresponds to roughly five predictions, so the difference should not be over-interpreted. The performance is still strong.


## 4. Conclusion

* Feature scaling is a critical preprocessing step, especially for distance-based algorithms like SVM with an RBF kernel.
* For the **RBF kernel**, scaling improved the model's accuracy from 83% to 88%.
* For the **linear kernel**, accuracy dropped slightly on this specific dataset (91% to 88%), but scaling is still recommended to ensure all features are treated equally.
* Always **fit your scaler on the training data only** to prevent data leakage and ensure a robust evaluation of your model.
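One way to make the last point hard to get wrong is to wrap the scaler and the model in a scikit-learn pipeline, so the scaler is refit on the training portion of every split. The sketch below is a minimal illustration, assuming synthetic data from `make_classification` (`X_syn`, `y_syn` are hypothetical names) in place of the Raisin features; the scores it prints say nothing about the Raisin dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data with seven features (an assumption, not the real data)
X_syn, y_syn = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=10
)

# Within each fold, StandardScaler is fit on the training part only,
# then applied to the held-out part before SVC predicts
pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
scores = cross_val_score(pipe, X_syn, y_syn, cv=5)

print(scores)
print(scores.mean())
```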