# Normalize data


## Common modules 
* `MinMaxScaler`: rescales each feature (column) to $[0, 1]$: $$\frac{x - x_{min}}{x_{max} - x_{min}}$$
* `StandardScaler`: standardizes each feature (column) to zero mean and unit variance: $$\frac{x - \mu}{\sigma}$$
* `Normalizer(norm='l2')`: normalizes each sample (row) individually to unit norm. For a sample $(x_1, x_2, ..., x_n)$, each component $x_i$ becomes:
    - $l_1$: $$\frac{x_i}{|x_1|+|x_2|+...+|x_n|}$$
    - $l_2$: $$\frac{x_i}{\sqrt{x_1^2+x_2^2+...+x_n^2}}$$
    - max: $$\frac{x_i}{\max(|x_1|,|x_2|,...,|x_n|)}$$

## Methods
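* `fit(X[, y])`: Compute the statistics used for later scaling: the per-column minimum and maximum for `MinMaxScaler`, the per-column mean and standard deviation for `StandardScaler`. `Normalizer` is stateless, so its `fit` learns nothing.
* `transform(X)`: Scale `X` using the fitted statistics. For `MinMaxScaler` the result is mapped into `feature_range`, which defaults to `(0, 1)`.
* `fit_transform(X[, y])`: Fit to data, then transform it.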
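
The split between `fit` and `transform` matters when new data arrives after fitting. The sketch below illustrates this with `MinMaxScaler`; the arrays `X_train` and `X_new` are made-up values for illustration only and do not appear elsewhere in this notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data, invented for this illustration
X_train = np.array([[1.0, 10.0],
                    [3.0, 20.0],
                    [5.0, 30.0]])
X_new = np.array([[2.0, 40.0]])

scaler = MinMaxScaler()
scaler.fit(X_train)                         # learns per-column min and max from X_train only
X_train_scaled = scaler.transform(X_train)  # same result as fit_transform(X_train)
X_new_scaled = scaler.transform(X_new)      # reuses the min/max learned from X_train

print(scaler.data_min_, scaler.data_max_)
print(X_new_scaled)
```

Because `X_new` is scaled with the training minimum and maximum, its second value (40, above the training maximum of 30) maps outside the $[0, 1]$ range.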


In [1]:
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler


In [29]:
import numpy as np
X = np.array([[4, 1, 2, 2],
              [1, 3, 9, 3],
              [5, 7, 5, 1]])
X

array([[4, 1, 2, 2],
       [1, 3, 9, 3],
       [5, 7, 5, 1]])

In [28]:
X[:,0]

array([4, 1, 5])

### MinMaxScaler

In [20]:
# First column of X (index 0): minimum and maximum
xmin, xmax = 1, 5

In [30]:
(X[:,0]-xmin)/(xmax-xmin)

array([0.75, 0.  , 1.  ])

In [39]:
# Second column of X (index 1): minimum and maximum
xmin, xmax = 1, 7

In [40]:
(X[:,1]-xmin)/(xmax-xmin)

array([0.        , 0.33333333, 1.        ])

In [14]:
MinMaxScaler().fit_transform(X)

array([[0.75      , 0.        , 0.        , 0.5       ],
       [0.        , 0.33333333, 1.        , 1.        ],
       [1.        , 1.        , 0.42857143, 0.        ]])
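
The column-by-column calculation above can be done for all columns at once with NumPy broadcasting. This is a minimal sketch rather than how `MinMaxScaler` is implemented internally; the names `col_min`, `col_max` and `X_minmax` are chosen here for illustration, and the sketch assumes no column is constant (otherwise the denominator would be zero).

```python
import numpy as np

X = np.array([[4, 1, 2, 2],
              [1, 3, 9, 3],
              [5, 7, 5, 1]])

col_min = X.min(axis=0)   # per-column minimum: [1, 1, 2, 1]
col_max = X.max(axis=0)   # per-column maximum: [5, 7, 9, 3]

# Broadcasting subtracts and divides column-wise across every row
X_minmax = (X - col_min) / (col_max - col_min)
print(X_minmax)
```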

### StandardScaler

In [42]:
(X[:,0]-np.mean(X[:,0]))/np.std(X[:,0])

array([ 0.39223227, -1.37281295,  0.98058068])

In [41]:
StandardScaler().fit_transform(X)

array([[ 0.39223227, -1.06904497, -1.16247639,  0.        ],
       [-1.37281295, -0.26726124,  1.27872403,  1.22474487],
       [ 0.98058068,  1.33630621, -0.11624764, -1.22474487]])
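
The same standardization can be sketched for all columns at once. Note that `np.std` defaults to the population standard deviation (`ddof=0`), which is what the matching outputs above rely on; the names `mu`, `sigma` and `X_std` are illustrative choices, and the sketch assumes no column has zero variance.

```python
import numpy as np

X = np.array([[4, 1, 2, 2],
              [1, 3, 9, 3],
              [5, 7, 5, 1]])

mu = X.mean(axis=0)      # per-column mean
sigma = X.std(axis=0)    # per-column population standard deviation (ddof=0)

X_std = (X - mu) / sigma
print(X_std)

# After standardization each column has mean 0 and standard deviation 1
print(X_std.mean(axis=0), X_std.std(axis=0))
```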

### Normalizer

In [43]:
# L2 norm of the first sample (row 0 of X; X.T[:, 0] is the same vector)
np.sqrt(np.sum(np.power(X[0], 2)))

5.0

In [44]:
# Divide the first sample (row 0 of X) by its L2 norm
X[0] / np.sqrt(np.sum(np.power(X[0], 2)))

array([0.8, 0.2, 0.4, 0.4])

In [46]:
Normalizer(norm='l2').fit_transform(X)

array([[0.8, 0.2, 0.4, 0.4],
       [0.1, 0.3, 0.9, 0.3],
       [0.5, 0.7, 0.5, 0.1]])

In [47]:
Normalizer(norm='l1').fit_transform(X)

array([[0.44444444, 0.11111111, 0.22222222, 0.22222222],
       [0.0625    , 0.1875    , 0.5625    , 0.1875    ],
       [0.27777778, 0.38888889, 0.27777778, 0.05555556]])

In [48]:
Normalizer(norm='max').fit_transform(X)

array([[1.        , 0.25      , 0.5       , 0.5       ],
       [0.11111111, 0.33333333, 1.        , 0.33333333],
       [0.71428571, 1.        , 0.71428571, 0.14285714]])
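
All three norms can be reproduced row by row with `np.linalg.norm`, which may help when checking the `Normalizer` outputs above by hand. This is a sketch under the assumption that no row is all zeros; the variable names are illustrative.

```python
import numpy as np

X = np.array([[4, 1, 2, 2],
              [1, 3, 9, 3],
              [5, 7, 5, 1]])

# axis=1 computes one norm per row; keepdims=True keeps shape (3, 1) for broadcasting
norm_l1 = np.linalg.norm(X, ord=1, axis=1, keepdims=True)        # sum of absolute values
norm_l2 = np.linalg.norm(X, ord=2, axis=1, keepdims=True)        # square root of sum of squares
norm_max = np.linalg.norm(X, ord=np.inf, axis=1, keepdims=True)  # largest absolute value

X_l1 = X / norm_l1
X_l2 = X / norm_l2
X_max = X / norm_max

print(norm_l1.ravel(), norm_l2.ravel(), norm_max.ravel())
print(X_l2)
```

For this `X` the row norms are 9, 16, 18 for $l_1$; 5, 10, 10 for $l_2$; and 4, 9, 7 for max.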