# The Real Difference Between Scaling, Normalization and Log Transformations with Sklearn
## Become a pro in preprocessing your numeric features
![](./images/pexels.jpg)
<figcaption style="text-align: center;">
    <strong>
        Photo by 
        <a href='https://www.pexels.com/@eye4dtail?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>George Becker</a>
        on 
        <a href='https://www.pexels.com/photo/grayscale-photography-of-three-wise-monkey-figurines-134402/?utm_content=attributionCopyText&utm_medium=referral&utm_source=pexels'>Pexels</a>
    </strong>
</figcaption>

### Setup

In [4]:
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(1121218)
warnings.filterwarnings("ignore")

### What does it mean to conform to statistical assumptions?

You have probably come across this in courses or articles:

> The features in the dataset should conform to the statistical assumptions of the models.

What does it mean to conform to statistical assumptions? Many models implemented in Sklearn might perform poorly if the numeric features do not more or less follow a standard Gaussian (normal) distribution, that is, one with zero mean and unit variance. With the exception of tree-based models, many Sklearn algorithms have objective functions with elements (such as the RBF kernel of support vector machines or the L1 and L2 regularizers of linear models) that *assume* the features are centered around zero and have variances of the same order.

Actually, using the word *assume* would be putting things lightly. For distance-based models like K-Nearest-Neighbors, feature transformation is a requirement for the algorithm to perform as expected:

In [5]:
from sklearn.model_selection import train_test_split

ansur_num = (
    pd.read_csv("data/ansur_male.csv", encoding="latin")
    .select_dtypes(include="number")
    .drop("weightkg", axis=1)
)

X, y = ansur_num.iloc[:, :-1], ansur_num.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y)

In [6]:
from sklearn.neighbors import KNeighborsRegressor

# Before feature transforming
knn = KNeighborsRegressor()
knn.fit(X_train, y_train)
print(
    "Test score of KNN before feature transformation: {}".format(
        knn.score(X_test, y_test)
    )
)

Test score of KNN before feature transformation: 0.8663552265106172


In [7]:
from sklearn.preprocessing import StandardScaler

# Transform
ss = StandardScaler()
ss.fit(X_train)
X_train_scaled = ss.transform(X_train)
X_test_scaled = ss.transform(X_test)

# After feature transforming
knn = KNeighborsRegressor()
knn.fit(X_train_scaled, y_train)
print(
    "Test score of KNN after feature transformation: {}".format(
        knn.score(X_test_scaled, y_test)
    )
)

Test score of KNN after feature transformation: 0.9166671893134017
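
One likely reason for the gap is how KNN measures similarity: Euclidean distance is dominated by whichever feature has the largest numeric range. The sketch below illustrates this with made-up values (the `X_toy` array and the `nearest_to_first` helper are hypothetical and not taken from the ANSUR data). After standardizing, a different row becomes the nearest neighbor of the first row.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column 0 has a large range (e.g. millimetres),
# column 1 has a small range (e.g. metres).
X_toy = np.array(
    [
        [1700.0, 0.50],
        [1710.0, 0.90],
        [1750.0, 0.52],
        [1900.0, 0.70],
    ]
)


def nearest_to_first(X):
    """Return the index of the row closest (Euclidean) to row 0, excluding row 0."""
    distances = np.linalg.norm(X[1:] - X[0], axis=1)
    return int(np.argmin(distances)) + 1


# On the raw values, only column 0 effectively matters.
nearest_raw = nearest_to_first(X_toy)

# After standardizing, both columns contribute on a comparable scale.
X_toy_scaled = StandardScaler().fit_transform(X_toy)
nearest_scaled = nearest_to_first(X_toy_scaled)

print("Nearest neighbor of row 0 on raw data:", nearest_raw)
print("Nearest neighbor of row 0 on scaled data:", nearest_scaled)
```

On the raw values, row 1 is closest because it differs from row 0 by only 10 units in the first column, even though the two rows are far apart in the second column. Once both columns are on the same scale, row 2 becomes the closest.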


In practice, feature transformations can have an even larger effect than the roughly 5-point increase in test score seen here (from about 0.87 to 0.92).

There are many techniques you can apply to make your features more or less follow a normal distribution. Which one to use depends on the underlying distribution of each feature.
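
As a rough illustration of how features can differ in distribution, the sketch below compares the skewness of two synthetic features, one drawn from a normal distribution and one from a log-normal distribution. The column names and the data are made up for this example, and skewness is only one of several possible diagnostics.

```python
import numpy as np
import pandas as pd

# Local random generator so the example does not depend on global state.
rng = np.random.default_rng(0)

# Hypothetical features: one roughly symmetric, one with a long right tail.
toy = pd.DataFrame(
    {
        "symmetric_feature": rng.normal(loc=50, scale=5, size=5000),
        "skewed_feature": rng.lognormal(mean=0, sigma=1, size=5000),
    }
)

# Sample skewness of each column.
skewness = toy.skew()
print(skewness.round(2))
```

A skewness near zero suggests a roughly symmetric feature, while a large positive value points to a long right tail.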

In this article, you will learn about four such techniques: scaling, normalization, logarithmic transformations and outlier-based scaling. You will develop a practical understanding of their differences and when to apply them in your own workflow.