# Standard Scaler

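StandardScaler is a preprocessing technique that standardizes each feature by subtracting its mean and dividing by its standard deviation, z = (x - μ) / σ, so every feature ends up with zero mean and unit variance. Putting features on a common scale can improve the performance of algorithms that rely on distances between samples, such as KNN and SVM, because otherwise the feature with the largest numeric range dominates the distance. Tree-based models such as RandomForest split on one feature at a time using thresholds, so they usually benefit much less from scaling.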
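
As a quick check of the formula, the sketch below standardizes a small toy array by hand and compares it with `StandardScaler`. The data values are made up for illustration. Note that `StandardScaler` divides by the population standard deviation (NumPy's default `ddof=0`), which is what the manual version uses as well.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative toy data: two features on very different numeric ranges
X_toy = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0],
                  [4.0, 800.0]])

# Manual z-score: subtract each column's mean, divide by its
# population standard deviation (ddof=0, as StandardScaler does)
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# The same transformation through scikit-learn
X_sklearn = StandardScaler().fit_transform(X_toy)

print(np.allclose(X_manual, X_sklearn))  # True
print(X_sklearn.mean(axis=0))            # approximately [0, 0]
print(X_sklearn.std(axis=0))             # approximately [1, 1]
```

Before scaling, the second column's values are hundreds of times larger than the first's, so it would dominate any Euclidean distance; after scaling, both columns contribute on the same footing.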

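A drawback is its sensitivity to outliers. Both the mean and the standard deviation are pulled by extreme values, so a few anomalous high values can shift the mean and inflate the standard deviation, squeezing the remaining values into a narrow range after scaling.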
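
A small sketch of that effect, assuming a single illustrative feature with ordinary values 1 to 9 and one extreme value of 1000 (numbers chosen only for demonstration). It measures how wide a range the ordinary values cover after standardization, with and without the outlier present.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative single feature: ordinary values 1..9, then the same values plus one extreme 1000
clean = np.arange(1.0, 10.0).reshape(-1, 1)
with_outlier = np.vstack([clean, [[1000.0]]])

scaled_clean = StandardScaler().fit_transform(clean)
scaled_with_outlier = StandardScaler().fit_transform(with_outlier)

# Range (max - min) covered by the ordinary values 1..9 after scaling
spread_clean = scaled_clean.max() - scaled_clean.min()
ordinary = scaled_with_outlier[:-1]  # drop the outlier row
spread_with_outlier = ordinary.max() - ordinary.min()

print(f"spread of 1..9 without outlier: {spread_clean:.3f}")
print(f"spread of 1..9 with outlier:    {spread_with_outlier:.3f}")
```

With these numbers the ordinary values span about 3.1 standard units without the outlier and under 0.03 with it, so after scaling they become nearly indistinguishable from one another.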

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
```

```python
iris = load_iris()
X = iris.data
y = iris.target
scaler = StandardScaler()
```
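
To see how different the Iris features' scales actually are, the sketch below prints each feature's mean and population standard deviation, then confirms that a fitted scaler stores the same statistics in its `mean_` and `scale_` attributes. The names `iris_data`, `X_iris`, and `iris_scaler` are illustrative and chosen so they do not overwrite the notebook's `X` and `scaler`. All four Iris features are measured in centimetres, with standard deviations of roughly 0.4 to 1.8, so they start out on fairly comparable scales; that may be one reason scaling changes KNN results only modestly on this dataset.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris_data = load_iris()
X_iris = iris_data.data

# Per-feature mean and population standard deviation of the raw measurements
for name, mean, std in zip(iris_data.feature_names, X_iris.mean(axis=0), X_iris.std(axis=0)):
    print(f"{name:<18} mean={mean:.3f}  std={std:.3f}")

# A fitted scaler stores those same statistics in mean_ and scale_
iris_scaler = StandardScaler().fit(X_iris)
X_iris_scaled = iris_scaler.transform(X_iris)

# After transform, every column has mean ~0 and standard deviation ~1
print(np.round(X_iris_scaled.mean(axis=0), 6))
print(np.round(X_iris_scaled.std(axis=0), 6))
```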

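```python
def predict(X, y, scaling=False, random_state=None):
    # Split first, so the test set stays unseen by every fitted step
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state
    )

    if scaling:
        # Fit the scaler on the training data only, then apply the same
        # mean and standard deviation to the test data (avoids data leakage)
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)

    model = KNeighborsClassifier()
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)

    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))
```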

```python
predict(X, y, scaling=False)
```

```
[[14  0  0]
 [ 0 11  1]
 [ 0  0 19]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       1.00      0.92      0.96        12
           2       0.95      1.00      0.97        19

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.98        45
weighted avg       0.98      0.98      0.98        45
```
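
Because `train_test_split` is called here without a fixed `random_state`, this report comes from a single random 70/30 split and will differ from run to run, which makes it a noisy basis for comparing scaled and unscaled models. The sketch below is one alternative, not the notebook's own method: it compares the two with 5-fold cross-validation, using scikit-learn's `make_pipeline` so that the scaler is refit on the training folds only. The fold count and `random_state=0` are arbitrary choices for illustration, and no particular outcome is implied.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Same shuffled, stratified folds for both models so the comparison is paired
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "unscaled": KNeighborsClassifier(),
    # Inside a pipeline, StandardScaler is fit on each training fold only,
    # so the held-out fold never influences the mean and standard deviation
    "scaled": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

results = {}
for label, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    results[label] = scores
    print(f"{label:<9} mean accuracy={scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean and spread across folds gives a rough sense of whether any difference between the two settings is larger than the split-to-split noise.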

