# Feature Scaling: Standardization, Normalization, & Others

In this notebook, we discuss feature scaling: standardization, normalization, and min-max scaling.

Ref: https://scikit-learn.org/stable/modules/preprocessing.html


Feature scaling is an important, and often essential, step before training many machine learning models. Why? Many models are based on mathematical formulas that may not work well with variables on different scales. For example, an attribute such as age is measured on a very different scale than an attribute such as salary; the latter is typically orders of magnitude larger than the former. As a result, any aggregate function computed over the features (e.g., Euclidean distance) will be dominated by the attribute of larger magnitude.
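To make the age/salary point concrete, here is a minimal sketch using made-up numbers. The three people and their values are hypothetical and chosen only for illustration; the standardization is done by hand with NumPy.

```python
import numpy as np

# Hypothetical data: columns are [age in years, salary in dollars].
people = np.array([[25.0, 50_000.0],
                   [60.0, 51_000.0],
                   [26.0, 58_000.0]])

# Euclidean distance from the first person to the other two, on raw features.
raw_dist = np.linalg.norm(people[1:] - people[0], axis=1)
print(raw_dist)

# Standardize each column: subtract its mean, divide by its standard deviation.
standardized = (people - people.mean(axis=0)) / people.std(axis=0)
scaled_dist = np.linalg.norm(standardized[1:] - standardized[0], axis=1)
print(scaled_dist)
```

On the raw features the distances are essentially the salary differences, so the 35-year age gap between the first two people barely registers. After standardization, both features contribute on a comparable scale.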

Read the following article to learn more about feature scaling. 

https://sebastianraschka.com/Articles/2014_about_feature_scaling.html


## Standardization

Standardization means rescaling each feature so that it has zero mean and unit variance, by subtracting the feature's mean and dividing by its standard deviation.

In [1]:
from sklearn import preprocessing
import numpy as np

x_train  = np.array([[1.0, -1.0, 2.0],
                     [2.0, 0.0, 0.0],
                     [0.0, 1.0, -1.0]])


scaler = preprocessing.StandardScaler().fit(x_train)

print("Mean of the dataset features")
print(scaler.mean_)
print("Standard deviation of the dataset features")
print(scaler.scale_)
print("-" * 25)
x_scaled = scaler.transform(x_train)
print("# Scaled data:")
print(x_scaled)


print("# Mean of scaled data")
print(x_scaled.mean(axis=0))
print("# Standard deviation of scaled data")
print(x_scaled.std(axis=0))

Mean of the dataset features
[1.         0.         0.33333333]
Standard deviation of the dataset features
[0.81649658 0.81649658 1.24721913]
-------------------------
# Scaled data:
[[ 0.         -1.22474487  1.33630621]
 [ 1.22474487  0.         -0.26726124]
 [-1.22474487  1.22474487 -1.06904497]]
# Mean of scaled data
[0. 0. 0.]
# Standard deviation of scaled data
[1. 1. 1.]
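As a sanity check, the scaled values printed above can be reproduced by hand. This is a minimal sketch using only NumPy; it assumes the population standard deviation (`ddof=0`, NumPy's default), which is what `StandardScaler` uses. The `expected` array is copied from the output above.

```python
import numpy as np

x_train = np.array([[1.0, -1.0, 2.0],
                    [2.0, 0.0, 0.0],
                    [0.0, 1.0, -1.0]])

# z = (x - mean) / std, computed independently for each column.
manual = (x_train - x_train.mean(axis=0)) / x_train.std(axis=0)

# Scaled data as printed by the StandardScaler cell above.
expected = np.array([[0.0, -1.22474487, 1.33630621],
                     [1.22474487, 0.0, -0.26726124],
                     [-1.22474487, 1.22474487, -1.06904497]])

print(np.allclose(manual, expected))
```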


## Normalization

Normalization means scaling data to have unit norm. By default, the L2-norm is used. What is a norm? A norm is a function of a vector or matrix that measures its size, and it often has a geometric interpretation. E.g., the L2-norm of a vector is the Euclidean distance of the vector from the origin, and the L1-norm is the sum of the magnitudes of the vector's elements (aka Manhattan distance or taxicab norm). Note that `normalize` in scikit-learn works on each sample (row) by default (`axis=1`); the example below passes `axis=0` to normalize each feature (column) instead.
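The two norms described above can be computed directly with `np.linalg.norm`. This is a small sketch on an arbitrary example vector (the vector `v` is chosen only because its norms are easy to verify by hand).

```python
import numpy as np

v = np.array([3.0, -4.0])

l2 = np.linalg.norm(v)         # sqrt(3^2 + (-4)^2) = 5
l1 = np.linalg.norm(v, ord=1)  # |3| + |-4| = 7
print(l2, l1)

# Dividing a vector by its L2-norm gives a vector of unit L2-norm.
unit = v / l2
print(np.linalg.norm(unit))
```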

In [2]:
from sklearn import preprocessing
import numpy as np

x_train  = np.array([[1.0, -1.0, 2.0],
                     [2.0, 0.0, 0.0],
                     [0.0, 1.0, -1.0]])

x_normalized = preprocessing.normalize(x_train, axis = 0, norm='l2')  # axis = 0 means column-wise

print('# Scaled & normalized values:')
print(x_normalized)

print("# Each column has unit norm")
print(np.linalg.norm(x_normalized, axis=0))

# Scaled & normalized values:
[[ 0.4472136  -0.70710678  0.89442719]
 [ 0.89442719  0.          0.        ]
 [ 0.          0.70710678 -0.4472136 ]]
# Each column has unit norm
[1. 1. 1.]


In [3]:
0.4472136 ** 2 + 0.89442719 ** 2

1.000000002236256
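The cells above normalize each column (`axis=0`). For comparison, here is a minimal sketch of the row-wise variant, where each sample is scaled to unit L2-norm. It does the arithmetic directly in NumPy, and the variable names are illustrative only.

```python
import numpy as np

x_train = np.array([[1.0, -1.0, 2.0],
                    [2.0, 0.0, 0.0],
                    [0.0, 1.0, -1.0]])

# L2-norm of each row, kept as a column vector so it broadcasts across the row.
row_norms = np.linalg.norm(x_train, axis=1, keepdims=True)
x_row_normalized = x_train / row_norms

print(x_row_normalized)
print(np.linalg.norm(x_row_normalized, axis=1))
```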

## MinMax Scaler

MinMaxScaler rescales each feature to a given range, [0, 1] by default. We can also use **MaxAbsScaler**, which divides each feature by its maximum absolute value so that the data falls in the range [-1, 1].

In [4]:
from sklearn import preprocessing
import numpy as np

x_train  = np.array([[1.0, -1.0, 2.0],
                     [2.0, 0.0, 0.0],
                     [0.0, 1.0, -1.0]])


processor = preprocessing.MinMaxScaler()

range_scaled = processor.fit_transform(x_train)

print("Min-max scaled version of x_train data")
print(range_scaled)

Min-max scaled version of x_train data
[[0.5        0.         1.        ]
 [1.         0.5        0.33333333]
 [0.         1.         0.        ]]
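The arithmetic behind both scalers is short enough to write out by hand. The sketch below uses only NumPy: the min-max part reproduces the output above, and the max-abs part shows the division by each column's largest absolute value, which is the operation MaxAbsScaler is described as performing. Variable names are illustrative.

```python
import numpy as np

x_train = np.array([[1.0, -1.0, 2.0],
                    [2.0, 0.0, 0.0],
                    [0.0, 1.0, -1.0]])

# Min-max scaling: (x - min) / (max - min), per column, giving values in [0, 1].
col_min = x_train.min(axis=0)
col_max = x_train.max(axis=0)
min_max_scaled = (x_train - col_min) / (col_max - col_min)
print(min_max_scaled)

# Max-abs scaling: x / max(|x|), per column, giving values in [-1, 1].
max_abs_scaled = x_train / np.abs(x_train).max(axis=0)
print(max_abs_scaled)
```

Unlike min-max scaling, max-abs scaling does not shift the data, so zeros stay zeros.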
