# Standardization and Scaling in Data Preprocessing

## Standardization

Standardization is a basic scaling technique that puts every feature on a common scale. After standardization, each feature (column) has a mean of zero and a variance of one. The transformation only shifts and rescales the values; it does not change the shape of their distribution, so it does not turn non-normal data into normally distributed data.

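The standardization formula, applied to each column separately, is:

z = (x - mean) / standard deviation

where x is a value in the column, and the mean and standard deviation are computed over that column.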

<img src="pic/1_ooobvod71cCc4m4tma3rmA.webp">

In [1]:
from sklearn import preprocessing
import numpy as np
# Create the training data: 3 samples (rows), 3 features (columns)
X_train = np.array([[ 4., -3.,  2.],
                    [ 2.,  2.,  0.],
                    [ 0., -6.,  7.]])
# Fit the scaler: learn each column's mean and standard deviation
scaler = preprocessing.StandardScaler().fit(X_train)

In [2]:
scaler.mean_

array([ 2.        , -2.33333333,  3.        ])

In [3]:
scaler.scale_

array([1.63299316, 3.29983165, 2.94392029])

In [5]:
X_scaled = scaler.transform(X_train)
X_scaled

array([[ 1.22474487, -0.20203051, -0.33968311],
       [ 0.        ,  1.31319831, -1.01904933],
       [-1.22474487, -1.1111678 ,  1.35873244]])

In [6]:
X_scaled.mean(axis=0)

array([0., 0., 0.])

In [7]:
X_scaled.std(axis=0)

array([1., 1., 1.])
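As a cross-check, the fitted values above can be reproduced with plain NumPy. This is a minimal sketch, relying on the fact that `StandardScaler` uses the population standard deviation (`ddof=0`, which is also NumPy's default). The names `mean`, `std`, `X_manual`, and `X_new` are illustrative and do not appear in the original notebook, and `X_new` is a hypothetical sample.

```python
import numpy as np

# Same training data as in the cells above
X_train = np.array([[4., -3., 2.],
                    [2.,  2., 0.],
                    [0., -6., 7.]])

# Per-column statistics (axis=0 aggregates down the rows)
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # ddof=0: population standard deviation

# z = (x - mean) / standard deviation, applied column by column
X_manual = (X_train - mean) / std
print(X_manual)

# Hypothetical new sample, scaled with the statistics learned from X_train
X_new = np.array([[1., 0., 3.]])
print((X_new - mean) / std)
```

The last two lines mirror what `scaler.transform` does with new data: it reuses the mean and standard deviation learned from the training set rather than recomputing them.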

## MinMaxScaler

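MinMaxScaler is another scaling method. By default it rescales each feature to the range [0, 1] using x' = (x - min) / (max - min), where min and max are taken over that column. Because this is a linear transformation, it preserves the shape of each feature's distribution; only the scale changes.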

In [8]:
from sklearn import preprocessing
import numpy as np
# Create the training data: 3 samples (rows), 3 features (columns)
X_train = np.array([[ 4., -3.,  2.],
                    [ 2.,  2.,  0.],
                    [ 0., -6.,  7.]])
# Fit the scaler and transform the data in one step
min_max_scaler = preprocessing.MinMaxScaler()
X_train_minmax = min_max_scaler.fit_transform(X_train)
X_train_minmax

array([[1.        , 0.375     , 0.28571429],
       [0.5       , 1.        , 0.        ],
       [0.        , 0.        , 1.        ]])

After scaling with MinMaxScaler, every column lies in the range 0 to 1: the smallest value in each column maps to 0 and the largest to 1.
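The same result can be computed by hand, which makes the min-max formula concrete. This is a minimal sketch assuming the default target range of [0, 1]. The names `col_min`, `col_max`, and `X_minmax` are illustrative, and the sketch does not handle a constant column, where the denominator would be zero.

```python
import numpy as np

# Same training data as in the cell above
X_train = np.array([[4., -3., 2.],
                    [2.,  2., 0.],
                    [0., -6., 7.]])

# Per-column minimum and maximum
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)

# x' = (x - min) / (max - min), applied column by column
X_minmax = (X_train - col_min) / (col_max - col_min)
print(X_minmax)
```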

## MaxAbsScaler

MaxAbsScaler divides each feature by its maximum absolute value, so the scaled values lie in the range [-1, 1]. Because it does not shift or center the data, zero entries stay zero, which preserves the sparsity of the data.

In [9]:
from sklearn import preprocessing
import numpy as np
# Create the training data: 3 samples (rows), 3 features (columns)
X_train = np.array([[ 4., -3.,  2.],
                    [ 2.,  2.,  0.],
                    [ 0., -6.,  7.]])
# Fit the scaler and transform the data in one step
max_abs_scaler = preprocessing.MaxAbsScaler()
X_train_maxabs = max_abs_scaler.fit_transform(X_train)
X_train_maxabs

array([[ 1.        , -0.5       ,  0.28571429],
       [ 0.5       ,  0.33333333,  0.        ],
       [ 0.        , -1.        ,  1.        ]])

After scaling with MaxAbsScaler, every column lies in the range -1 to 1, and the zeros in the original data are still zeros.
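A hand-rolled version shows why sparsity survives: each column is only divided by a constant, so nothing is added or subtracted. This is a minimal sketch; the names `max_abs`, `X_maxabs`, and `zeros_preserved` are illustrative and not part of the original notebook.

```python
import numpy as np

# Same training data as in the cell above
X_train = np.array([[4., -3., 2.],
                    [2.,  2., 0.],
                    [0., -6., 7.]])

# Largest absolute value in each column
max_abs = np.abs(X_train).max(axis=0)

# x' = x / max(|x|): pure rescaling, no shifting or centering
X_maxabs = X_train / max_abs
print(X_maxabs)

# Zero entries are in exactly the same positions before and after scaling
zeros_preserved = np.array_equal(X_train == 0, X_maxabs == 0)
print(zeros_preserved)
```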