### Standardization (Z-score normalization):

In standardization, also known as Z-score normalization, each feature is scaled such that it has a mean of 0 and a standard deviation of 1.


$$z := \frac {X-\mu} {\sigma}$$ 

where $X$ is the original value, $\mu$ is the mean of the feature, and $\sigma$ is its standard deviation.

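This is useful when features are measured on different scales and you want to bring them to a common scale, for example before fitting a distance-based or gradient-based model that is sensitive to feature magnitude.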

In [2]:
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Create DataFrame
data = {'total_spent': [337.1, 128.9, 132.4, 251.3, 250.],
        'sales': [22.1, 10.4, 9.3, 18.5, 12.9]}
df = pd.DataFrame(data)

# Extract features
X = df[['total_spent']]

# Standardization
scaler_standard = StandardScaler()
X_standardized = scaler_standard.fit_transform(X)

# Display the results
print("Original Data:")
print(df)
print("\nStandardized Data:")
print(pd.DataFrame(X_standardized, columns=['total_spent']))

Original Data:
   total_spent  sales
0        337.1   22.1
1        128.9   10.4
2        132.4    9.3
3        251.3   18.5
4        250.0   12.9

Standardized Data:
   total_spent
0     1.474555
1    -1.145814
2    -1.101763
3     0.394692
4     0.378330
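
As a sanity check on the formula, the same values can be reproduced by hand with NumPy. This is a minimal sketch rather than part of the original notebook, and the names `values`, `z_manual` and `z_sklearn` are illustrative. It relies on `StandardScaler` using the population standard deviation (`ddof=0`), whereas pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`) and would give slightly different numbers.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same 'total_spent' values as in the cell above
values = np.array([337.1, 128.9, 132.4, 251.3, 250.0])

# z = (X - mu) / sigma, with sigma the population standard deviation (ddof=0)
mu = values.mean()
sigma = values.std(ddof=0)
z_manual = (values - mu) / sigma

# StandardScaler expects a 2-D array of shape (n_samples, n_features)
z_sklearn = StandardScaler().fit_transform(values.reshape(-1, 1)).ravel()

# Both routes should agree up to floating-point error
print(np.allclose(z_manual, z_sklearn))
print(z_manual.round(6))
```

After the transformation the column has mean 0 and standard deviation 1, up to floating-point error.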


### Scaling:

Scaling transforms the values of a feature into a specific range. The most common method is Min-Max scaling, which maps each feature linearly onto a chosen range, often [0, 1].

The formula for Min-Max scaling is: $X_{scaled} := \frac{X - \min(X)}{\max(X) - \min(X)}$

In [3]:
from sklearn.preprocessing import MinMaxScaler

# Create DataFrame
data = {'total_spent': [337.1, 128.9, 132.4, 251.3, 250.],
        'sales': [22.1, 10.4, 9.3, 18.5, 12.9]}
df = pd.DataFrame(data)

# Extract features
X = df[['total_spent']]

# Min-Max scaling
scaler_minmax = MinMaxScaler()
X_minmax = scaler_minmax.fit_transform(X)

# Display the results
print("Original Data:")
print(df)
print("\nMin-Max Scaled Data:")
print(pd.DataFrame(X_minmax, columns=['total_spent']))

Original Data:
   total_spent  sales
0        337.1   22.1
1        128.9   10.4
2        132.4    9.3
3        251.3   18.5
4        250.0   12.9

Min-Max Scaled Data:
   total_spent
0     1.000000
1     0.000000
2     0.016811
3     0.587896
4     0.581652
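
The formula can also be applied by hand, and the target range can be changed through the `feature_range` parameter of `MinMaxScaler`. The following is a small sketch under the same data; the variable names (`manual`, `default_scaled`, `wide`) and the choice of [-1, 1] as an alternative range are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same 'total_spent' values, as a single-column 2-D array
values = np.array([[337.1], [128.9], [132.4], [251.3], [250.0]])

# Manual Min-Max scaling to [0, 1]: (X - min) / (max - min), per column
x_min = values.min(axis=0)
x_max = values.max(axis=0)
manual = (values - x_min) / (x_max - x_min)

# The default MinMaxScaler should match the manual result
default_scaled = MinMaxScaler().fit_transform(values)
print(np.allclose(manual, default_scaled))

# A different target range, here [-1, 1]
wide = MinMaxScaler(feature_range=(-1, 1)).fit_transform(values)
print(wide.ravel().round(6))
```

Because the minimum and maximum define the mapping, a single extreme value compresses every other value into a narrow part of the range.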


### Normalization:

Normalization is a broader term that can encompass various techniques to scale or transform features. Min-Max scaling is one form of normalization.
In a more general sense, normalization can also refer to transforming features to have a unit norm (e.g., L1 or L2 normalization).

L1 normalization scales features such that the sum of the absolute values is 1: $X_{Normalized} := \frac {X} {\sum|X|} $


L2 normalization scales features such that the Euclidean norm (magnitude) is 1: $X_{Normalized} := \frac{X}{\sqrt{\sum X^2}}$

In [4]:
from sklearn.preprocessing import normalize

# Create DataFrame
data = {'total_spent': [337.1, 128.9, 132.4, 251.3, 250.],
        'sales': [22.1, 10.4, 9.3, 18.5, 12.9]}
df = pd.DataFrame(data)

# Extract features
X = df[['total_spent']]

# L1 normalization
# axis=0 normalizes each feature (column); the default axis=1 normalizes each
# sample (row), which turns every value of a single-column input into 1.0
X_normalized_l1 = normalize(X, norm='l1', axis=0)

# L2 normalization (also per column)
X_normalized_l2 = normalize(X, norm='l2', axis=0)

# Display the results
print("Original Data:")
print(df)
print("\nL1 Normalized Data:")
print(pd.DataFrame(X_normalized_l1, columns=['total_spent']))
print("\nL2 Normalized Data:")
print(pd.DataFrame(X_normalized_l2, columns=['total_spent']))

Original Data:
   total_spent  sales
0        337.1   22.1
1        128.9   10.4
2        132.4    9.3
3        251.3   18.5
4        250.0   12.9

L1 Normalized Data:
   total_spent
0     0.306538
1     0.117214
2     0.120396
3     0.228517
4     0.227335

L2 Normalized Data:
   total_spent
0     0.644664
1     0.246506
2     0.253199
3     0.480581
4     0.478095
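
Keep in mind that `normalize` works on rows (samples) by default (`axis=1`), so a single-column input normalizes every row to 1.0, while `axis=0` normalizes each column (feature) instead. The sketch below contrasts the two on both columns of the data; the array `X_both` and the result names are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Both columns of the example data: total_spent, sales
X_both = np.array([[337.1, 22.1],
                   [128.9, 10.4],
                   [132.4,  9.3],
                   [251.3, 18.5],
                   [250.0, 12.9]])

# Default axis=1: each row (sample) is scaled to unit L1 norm
rows_l1 = normalize(X_both, norm='l1')

# axis=0: each column (feature) is scaled to unit norm
cols_l1 = normalize(X_both, norm='l1', axis=0)
cols_l2 = normalize(X_both, norm='l2', axis=0)

print(rows_l1.sum(axis=1))               # one sum per row, each close to 1
print(cols_l1.sum(axis=0))               # one sum per column, each close to 1
print(np.linalg.norm(cols_l2, axis=0))   # Euclidean norm per column, close to 1
```

Row-wise normalization is the usual choice when each sample is treated as a vector (for example, in text or similarity tasks), whereas column-wise normalization matches the per-feature description given above.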
