# Numeric Transformers

1. Feature Scaling
2. Function Transformer
3. Polynomial Transformations
4. Discretization

## Feature Scaling

Numerical features with different scales slow the convergence of iterative optimisation procedures, so it is good practice to scale them before training.

1. `StandardScaler`
2. `MinMaxScaler`
3. `MaxAbsScaler`

### StandardScaler
 
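Standardization: `(x - mu) / sigma`, where `mu` is the feature's mean and `sigma` its standard deviation. The result has zero mean and unit variance.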

In [6]:
import numpy as np
data = np.array([[4],[3],[2],[5],[6]])
print(data)

[[4]
 [3]
 [2]
 [5]
 [6]]


In [7]:
# standardization
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
ss.fit_transform(data)

array([[ 0.        ],
       [-0.70710678],
       [-1.41421356],
       [ 0.70710678],
       [ 1.41421356]])
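
The formula can be checked by hand. The sketch below assumes the population standard deviation (`ddof=0`, NumPy's default), which is what `StandardScaler` uses, and compares the manual result with the scaler's fitted `mean_` and `scale_` attributes. The variable names `mu`, `sigma`, and `manual` are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[4], [3], [2], [5], [6]], dtype=float)

# Manual standardization: subtract the column mean, divide by the column std (ddof=0)
mu = data.mean(axis=0)
sigma = data.std(axis=0)
manual = (data - mu) / sigma

# The fitted scaler stores the same statistics in mean_ and scale_
ss = StandardScaler().fit(data)
print(ss.mean_, ss.scale_)

# Both routes give the same standardized values
print(np.allclose(manual, ss.transform(data)))
```

Because the statistics are stored at `fit` time, the same fitted scaler can later be applied to new data with `transform`.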

### MinMaxScaler

`(x - min) / (max - min)`, which maps each feature to the interval [0, 1]

In [8]:
import numpy as np
data = np.array([[15],[2],[5],[-2],[-5]])
print(data)

[[15]
 [ 2]
 [ 5]
 [-2]
 [-5]]


In [10]:
from sklearn.preprocessing import MinMaxScaler
mms = MinMaxScaler()
mms.fit_transform(data)

# the largest value is transformed to 1 and the smallest to 0

array([[1.  ],
       [0.35],
       [0.5 ],
       [0.15],
       [0.  ]])
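
As a rough check of the min-max formula, the sketch below recomputes the scaled values with plain NumPy and then uses the `feature_range` parameter of `MinMaxScaler` to map the same data into [-1, 1] instead of the default [0, 1]. The names `x_min`, `x_max`, `manual`, and `scaled` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[15], [2], [5], [-2], [-5]], dtype=float)

# Manual min-max scaling, column by column
x_min = data.min(axis=0)
x_max = data.max(axis=0)
manual = (data - x_min) / (x_max - x_min)
print(np.allclose(manual, MinMaxScaler().fit_transform(data)))

# feature_range selects a target interval other than the default (0, 1)
scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(data)
print(scaled)
```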

### MaxAbsScaler

Scales each feature so that all values lie in the interval [-1, 1]: `x / MaxAbsValue`

`MaxAbsValue = max(|x.max|, |x.min|)`

In [14]:
import numpy as np
data = np.array([[4],[2],[5],[-2],[-100]])
print(data)

[[   4]
 [   2]
 [   5]
 [  -2]
 [-100]]


In [16]:
from sklearn.preprocessing import MaxAbsScaler
ma = MaxAbsScaler()
ma.fit_transform(data)


array([[ 0.04],
       [ 0.02],
       [ 0.05],
       [-0.02],
       [-1.  ]])
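
The same result can be reproduced by dividing by the largest absolute value in the column. This is a minimal sketch with illustrative names (`max_abs`, `manual`). It also shows that the scaler keeps the divisor in its `max_abs_` attribute. Since the data is only divided and never shifted, signs and zeros are preserved.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

data = np.array([[4], [2], [5], [-2], [-100]], dtype=float)

# Largest absolute value per column: max(|x.max|, |x.min|)
max_abs = np.abs(data).max(axis=0)
manual = data / max_abs

ma = MaxAbsScaler().fit(data)
print(ma.max_abs_)

# The manual division matches the scaler's output
print(np.allclose(manual, ma.transform(data)))
```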

## Function Transformer

Applies a user-defined function to the data, so that the function can be used like any other transformer.

In [21]:
import numpy as np
data = np.array([[128,2],[2,256],[4,1],[512,64]])
print(data)

[[128   2]
 [  2 256]
 [  4   1]
 [512  64]]


In [27]:
# log2 transformation
from sklearn.preprocessing import FunctionTransformer
ft = FunctionTransformer(np.log2)
ft.fit_transform(data)

array([[7., 1.],
       [1., 8.],
       [2., 0.],
       [9., 6.]])
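
`FunctionTransformer` also accepts an `inverse_func`, which makes the transformation reversible through `inverse_transform`. The sketch below pairs `np.log2` with `np.exp2`; choosing `np.exp2` as the inverse is an assumption made for this example, not something taken from the notes above.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

data = np.array([[128, 2], [2, 256], [4, 1], [512, 64]], dtype=float)

# Forward: log2, inverse: 2**x
ft = FunctionTransformer(func=np.log2, inverse_func=np.exp2)
logged = ft.fit_transform(data)

# inverse_transform applies inverse_func and recovers the original values
restored = ft.inverse_transform(logged)
print(np.allclose(restored, data))
```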

## Polynomial Transformations

In [36]:
# generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree

# create feature matrix X = [x1 x2] with two features

import numpy as np
x = np.array([[1,2], [3,4]])
print(x)

[[1 2]
 [3 4]]


In [37]:
from sklearn.preprocessing import PolynomialFeatures
pf = PolynomialFeatures(degree = 2)
pf.fit_transform(x)


array([[ 1.,  1.,  2.,  1.,  2.,  4.],
       [ 1.,  3.,  4.,  9., 12., 16.]])

Each output row has the form `[1, x1, x2, x1^2, x1*x2, x2^2]`.

## KBinsDiscretizer

1. Divides a continuous variable into bins
2. One-hot or ordinal encoding is then applied to the bin indices

In [38]:
import numpy as np
x = np.array([[0],[0.125],[0.25],[0.375],[0.5],[0.675],[0.75],[0.875],[1]])
print(x)

[[0.   ]
 [0.125]
 [0.25 ]
 [0.375]
 [0.5  ]
 [0.675]
 [0.75 ]
 [0.875]
 [1.   ]]


In [42]:
from sklearn.preprocessing import KBinsDiscretizer
kbd = KBinsDiscretizer(n_bins = 5, 
                      strategy = 'uniform',
                      encode = 'ordinal'
                      )
kbd.fit_transform(x)

array([[0.],
       [0.],
       [1.],
       [1.],
       [2.],
       [3.],
       [3.],
       [4.],
       [4.]])
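
The bin assignments follow from the bin edges, which the fitted discretizer exposes as `bin_edges_`. With `strategy='uniform'` and 5 bins over [0, 1], the edges should be 0, 0.2, 0.4, 0.6, 0.8, 1. The sketch below prints them and also shows the one-hot variant through `encode='onehot-dense'`; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

x = np.array([[0], [0.125], [0.25], [0.375], [0.5], [0.675], [0.75], [0.875], [1]])

# Ordinal encoding: each value is replaced by the index of its bin
kbd = KBinsDiscretizer(n_bins=5, strategy='uniform', encode='ordinal')
ordinal = kbd.fit_transform(x)

# One array of edges per feature; equal-width bins for the 'uniform' strategy
edges = kbd.bin_edges_[0]
print(edges)

# One-hot encoding: one column per bin, a single 1 in each row
kbd_onehot = KBinsDiscretizer(n_bins=5, strategy='uniform', encode='onehot-dense')
onehot = kbd_onehot.fit_transform(x)
print(onehot.shape)
```

The position of the 1 in each one-hot row is the same bin index that the ordinal encoding returns.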