## Preprocessing

In [1]:
from sklearn import preprocessing

import numpy as np
import pandas as pd

In [2]:
data = np.array([[ 1., -1.,  2.],
                 [ 2.,  0.,  0.],
                 [ 0.,  1., -1.]])
data

array([[ 1., -1.,  2.],
       [ 2.,  0.,  0.],
       [ 0.,  1., -1.]])

### A) Feature Preprocessing

In [86]:
scaler = preprocessing.StandardScaler()
scaler.fit_transform(data)

array([[ 0.        , -1.22474487,  1.33630621],
       [ 1.22474487,  0.        , -0.26726124],
       [-1.22474487,  1.22474487, -1.06904497]])

### 1. Scaling

In [11]:
# help(preprocessing.scale)

In [94]:
preprocessing.scale(data)

array([[ 0.        , -1.22474487,  1.33630621],
       [ 1.22474487,  0.        , -0.26726124],
       [-1.22474487,  1.22474487, -1.06904497]])

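`preprocessing.scale` gives the same result as `StandardScaler().fit_transform`, but it does not keep the learned statistics, so they cannot be reused on new data. A scaler object splits the work into steps:

- **`fit`** computes the per-feature mean and standard deviation used for later scaling. It only learns these statistics and returns the fitted scaler, not transformed data.
- **`transform`** uses the previously computed mean and std to standardize the data (subtract the mean from each value, then divide by the std).
- **`fit_transform`** does both in one call.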
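A minimal sketch of why the split matters, assuming the usual workflow where statistics are learned once and then reused on new rows. `new_sample` is a hypothetical extra row, not part of the original notebook; `mean_` and `scale_` are the attributes where `StandardScaler` stores what `fit` learned.

```python
import numpy as np
from sklearn import preprocessing

# Same 3x3 matrix as above, treated as the "training" data
data = np.array([[1., -1., 2.],
                 [2., 0., 0.],
                 [0., 1., -1.]])

scaler = preprocessing.StandardScaler()
scaler.fit(data)  # learns per-column mean_ and scale_; returns the scaler itself

learned_means = scaler.mean_   # per-column means
learned_stds = scaler.scale_   # per-column standard deviations (ddof=0)

# Hypothetical new row, standardized with the statistics learned from `data`
new_sample = np.array([[-1., 1., 0.]])
scaled_new = scaler.transform(new_sample)

# fit_transform on `data` matches fit followed by transform
both = preprocessing.StandardScaler().fit_transform(data)
assert np.allclose(both, scaler.transform(data))
```

Note that `new_sample` is not re-centered on its own values; it is shifted and scaled with the training statistics, which is what keeps training and new data on the same scale.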


### 2. MinMaxScaler

In [6]:
# help(preprocessing.MinMaxScaler)

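$$x_i' = \text{MinRange} + \frac{(\text{MaxRange} - \text{MinRange})\,(x_i - x_{min})}{x_{max} - x_{min}}$$

With the default range $[0, 1]$ this reduces to $\frac{x_i - x_{min}}{x_{max} - x_{min}}$, computed per column: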

In [12]:
(data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

array([[0.5       , 0.        , 1.        ],
       [1.        , 0.5       , 0.33333333],
       [0.        , 1.        , 0.        ]])

In [13]:
# map each feature into [1, 5] instead of the default [0, 1]
scaler = preprocessing.MinMaxScaler(feature_range=(1, 5))
scaler.fit_transform(data)

array([[3.        , 1.        , 5.        ],
       [5.        , 3.        , 2.33333333],
       [1.        , 5.        , 1.        ]])
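To check the formula against the library, here is a plain-NumPy sketch. `min_max_scale` is a hypothetical helper written for illustration, and it assumes no column is constant (a constant column would make `x_max - x_min` zero; `MinMaxScaler` handles that case internally, this sketch does not).

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[1., -1., 2.],
                 [2., 0., 0.],
                 [0., 1., -1.]])

def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Column-wise min-max scaling into [low, high] (hypothetical helper)."""
    low, high = feature_range
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    unit = (x - col_min) / (col_max - col_min)  # first map each column to [0, 1]
    return low + (high - low) * unit            # then stretch and shift into [low, high]

manual = min_max_scale(data, feature_range=(1, 5))
library = preprocessing.MinMaxScaler(feature_range=(1, 5)).fit_transform(data)
assert np.allclose(manual, library)
```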

### 3. MaxAbsScaler
$$x_i' = \frac{x_i}{\max_j |x_j|}$$

Each column is divided by its largest absolute value, so every feature ends up in $[-1, 1]$.

In [14]:
sc = preprocessing.MaxAbsScaler()
sc.fit_transform(data)

array([[ 0.5, -1. ,  1. ],
       [ 1. ,  0. ,  0. ],
       [ 0. ,  1. , -0.5]])
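A short NumPy sketch of the same computation. The extra column `skewed` is a made-up example (not from the notebook) showing why the divisor is the maximum *absolute* value rather than the absolute value of the maximum: with `[-3, 1]`, dividing by `abs(max) = 1` would leave `-3` outside `[-1, 1]`.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[1., -1., 2.],
                 [2., 0., 0.],
                 [0., 1., -1.]])

# Divide each column by its largest absolute value
max_abs = np.abs(data).max(axis=0)
manual = data / max_abs
library = preprocessing.MaxAbsScaler().fit_transform(data)
assert np.allclose(manual, library)

# Hypothetical column whose largest magnitude is negative
skewed = np.array([[-3.], [1.]])
skewed_scaled = preprocessing.MaxAbsScaler().fit_transform(skewed)  # divides by 3, not by 1
```

Because nothing is subtracted, zeros stay zero, which is why this scaler is often chosen for sparse data.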

### 4. StandardScaler
$$z_i = \frac{x_i - \mu}{\sigma}$$

where $\mu$ is the feature (column) mean and $\sigma$ its standard deviation, computed as a population std (`ddof=0`).

In [15]:
sc = preprocessing.StandardScaler()
sc.fit_transform(data)

array([[ 0.        , -1.22474487,  1.33630621],
       [ 1.22474487,  0.        , -0.26726124],
       [-1.22474487,  1.22474487, -1.06904497]])
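The same standardization written out in NumPy, as a sketch for checking the formula. It also shows that the match depends on the population std: `np.std` defaults to `ddof=0`, while the sample std (`ddof=1`) gives different numbers.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[1., -1., 2.],
                 [2., 0., 0.],
                 [0., 1., -1.]])

# z = (x - mean) / std, column by column; np.std uses ddof=0 by default
manual = (data - data.mean(axis=0)) / data.std(axis=0)
library = preprocessing.StandardScaler().fit_transform(data)
assert np.allclose(manual, library)

# Using the sample standard deviation (ddof=1) does NOT reproduce StandardScaler
with_sample_std = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
assert not np.allclose(with_sample_std, library)
```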

### 5. Normalizer

**It works on rows, not columns:** each sample (row) is rescaled independently to unit norm (L2 by default).

$$x_i' = \frac{x_i}{\sqrt{\sum_j x_j^{2}}}$$

where the sum runs over the elements of the same row.


In [16]:
sc = preprocessing.Normalizer()
sc.fit_transform(data)

array([[ 0.40824829, -0.40824829,  0.81649658],
       [ 1.        ,  0.        ,  0.        ],
       [ 0.        ,  0.70710678, -0.70710678]])
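A NumPy sketch of row-wise L2 normalization. The zero-norm guard is an addition for safety: it leaves all-zero rows unchanged instead of dividing by zero, which is also how scikit-learn treats such rows. The `norm="l1"` variant shown at the end divides each row by the sum of its absolute values instead.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[1., -1., 2.],
                 [2., 0., 0.],
                 [0., 1., -1.]])

# L2 norm of each row, kept as shape (3, 1) so it broadcasts across columns
norms = np.linalg.norm(data, axis=1, keepdims=True)
norms[norms == 0] = 1.0  # guard: leave all-zero rows as they are
manual = data / norms

library = preprocessing.Normalizer().fit_transform(data)  # norm="l2" by default
assert np.allclose(manual, library)

# L1 variant: each row's absolute values sum to 1
l1_rows = preprocessing.Normalizer(norm="l1").fit_transform(data)
```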

### 6. Binarizer
Values strictly greater than `threshold` (default `0.0`) become 1; values less than or equal to the threshold become 0.

In [17]:
sc = preprocessing.Binarizer()
sc.fit_transform(data)

array([[1., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.]])
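A small sketch with a non-default threshold, plus the equivalent NumPy comparison. The value `1.0` is an arbitrary choice for illustration; note that the entry equal to `1.0` maps to 0 because the comparison is strict.

```python
import numpy as np
from sklearn import preprocessing

data = np.array([[1., -1., 2.],
                 [2., 0., 0.],
                 [0., 1., -1.]])

# Only values strictly greater than 1.0 become 1
library = preprocessing.Binarizer(threshold=1.0).fit_transform(data)

# Same rule as a boolean comparison, cast back to float
manual = (data > 1.0).astype(data.dtype)
assert np.array_equal(library, manual)
```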

![](mg/img-FeatureScalaingPreprocessing/FS.png)