# FEATURE SCALING USING SKLEARN

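Feature scaling is a technique for bringing numerical features with very different ranges onto a comparable scale. Without it, features with large values can dominate features with small values in models that compare or combine features numerically, such as distance-based methods or gradient descent.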
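To see why this matters, here is a small sketch with two made-up features: age in years and income. The data and the `euclidean` helper are hypothetical and only for illustration. Without scaling, the Euclidean distance between samples is driven almost entirely by the income column.

```python
import numpy as np

# Hypothetical data for illustration: column 0 = age (years), column 1 = income
X = np.array([[25.0, 50_000.0],
              [60.0, 51_000.0],
              [26.0, 80_000.0]])

def euclidean(a, b):
    # Plain Euclidean distance between two samples (rows)
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Unscaled: the income differences (1,000 and 30,000) swamp the age differences
# (35 and 1 years), so distance(0, 2) comes out roughly 30x distance(0, 1)
d_raw_01 = euclidean(X[0], X[1])
d_raw_02 = euclidean(X[0], X[2])

# Standardize each column to zero mean and unit standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Scaled: both features now contribute on a comparable scale,
# so the two distances end up of similar size
d_std_01 = euclidean(X_std[0], X_std[1])
d_std_02 = euclidean(X_std[0], X_std[2])

print(d_raw_01, d_raw_02)
print(d_std_01, d_std_02)
```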

In [1]:
import numpy as np
from sklearn import preprocessing  #for feature scaling

In [3]:
X = np.array([[1, -1, 2],
              [2, 0, 0],
              [0, 1, -1]])

In [9]:
x_scaled = preprocessing.scale(X)  # standardize each column (feature) to mean 0 and standard deviation 1
x_scaled.mean(axis=0).round()  # round away tiny floating-point residue

array([0., 0., 0.])

In [8]:
x_scaled.std(axis=0)

array([1., 1., 1.])
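As a sanity check, `preprocessing.scale` computes z = (x - column mean) / column standard deviation, using the population standard deviation (ddof=0). A minimal sketch that reproduces it by hand with NumPy:

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[1, -1, 2],
              [2, 0, 0],
              [0, 1, -1]], dtype=float)

# z = (x - column mean) / column standard deviation (population std, ddof=0).
# This manual version assumes no column has zero variance; preprocessing.scale
# uses a scale of 1 for such columns instead of dividing by zero.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# Should agree with preprocessing.scale up to floating-point error
print(np.allclose(X_manual, preprocessing.scale(X)))  # True
```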

In [10]:
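# A drawback of preprocessing.scale is that it does not store the mean and standard deviation it used,
# so the same transformation cannot be reapplied to test data. Training and test data must be scaled
# with the same (training) statistics; otherwise the scaled test data are inconsistent with the training data.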
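The sketch below makes the problem concrete. `X_test` is a hypothetical test set made up for illustration. Scaling it on its own with `preprocessing.scale` uses the test set's own mean and standard deviation, which gives different numbers than scaling it with the statistics learned from the training data.

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1, -1, 2],
                    [2, 0, 0],
                    [0, 1, -1]], dtype=float)
# Hypothetical test data, made up for illustration
X_test = np.array([[1, 1, 0],
                   [3, -1, 2]], dtype=float)

# Inconsistent: scaling the test set on its own uses the test set's mean and std
test_scaled_alone = preprocessing.scale(X_test)

# Consistent: reuse the mean and std computed on the training data
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
test_scaled_with_train_stats = (X_test - mu) / sigma

print(test_scaled_alone)
print(test_scaled_with_train_stats)
```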

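# Standard scaling with StandardScaler

The preprocessing module provides the StandardScaler utility class, which solves this problem: `fit` computes the mean and standard deviation of each column on a training set, and `transform` can later reapply exactly the same transformation to test data.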

In [11]:
scaler = preprocessing.StandardScaler()  # standard scaling: zero mean, unit variance per column
scaler.fit(X)  # learn the per-column mean and standard deviation from X

StandardScaler()

In [12]:
scaler.transform(X)

array([[ 0.        , -1.22474487,  1.33630621],
       [ 1.22474487,  0.        , -0.26726124],
       [-1.22474487,  1.22474487, -1.06904497]])

In [13]:
# apply the scaler fitted on X to a new, unseen sample
x_test = [[1, 1, 0]]
scaler.transform(x_test)

array([[ 0.        ,  1.22474487, -0.26726124]])
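The fitted scaler stores its statistics in the `mean_` and `scale_` attributes, and `fit_transform` combines fitting and transforming the training data in one call. A minimal sketch showing that `transform` is just (x - mean_) / scale_:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1, -1, 2],
              [2, 0, 0],
              [0, 1, -1]], dtype=float)
x_test = np.array([[1, 1, 0]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # fit and transform the training data in one call

# Per-column statistics learned from X and stored on the fitted scaler
print(scaler.mean_)   # column means of X
print(scaler.scale_)  # column standard deviations of X

# transform() applies (x - mean_) / scale_ using the stored statistics
manual = (x_test - scaler.mean_) / scaler.scale_
print(np.allclose(manual, scaler.transform(x_test)))  # True
```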

# Scaling features to a range

An alternative standardization is scaling features to lie between a given minimum and maximum value, often between zero and one, or so that the maximum absolute value of each feature is scaled to unit size. This can be achieved using MinMaxScaler or MaxAbsScaler, respectively.
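For reference, the two transformations can be written per column (feature) as follows, where $\min(x)$, $\max(x)$ and $\max|x|$ are taken over the training data. For MinMaxScaler with a `feature_range` of $(a, b)$, the $[0, 1]$ result is stretched to that interval; the default $(a, b) = (0, 1)$ reduces it to the plain min-max formula.

$$
x'_{\text{minmax}} = \frac{x - \min(x)}{\max(x) - \min(x)} \cdot (b - a) + a,
\qquad
x'_{\text{maxabs}} = \frac{x}{\max |x|}
$$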

# Min-max scaling

In [15]:
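from sklearn.preprocessing import MinMaxScaler
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()  # default feature_range=(0, 1)
print(scaler.fit(data))  # learn the per-column minimum and maximum
print(scaler.data_max_)  # per-column maximum seen during fit
print(scaler.transform(data))  # every column of the training data now lies in [0, 1]
print(scaler.transform([[2, 2]]))  # values outside the training range map outside [0, 1]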


MinMaxScaler()
[ 1. 18.]
[[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [1.   1.  ]]
[[1.5 0. ]]
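A short sketch that checks the min-max formula by hand using the fitted `data_min_` and `data_range_` attributes, shows the `feature_range` parameter, and demonstrates MaxAbsScaler, which the text mentions but does not use. MaxAbsScaler divides each column by its largest absolute value and does not shift the data, so zero entries stay zero.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler

data = np.array([[-1.0, 2.0],
                 [-0.5, 6.0],
                 [0.0, 10.0],
                 [1.0, 18.0]])

# Min-max: x' = (x - data_min_) / data_range_, using statistics learned in fit()
mm = MinMaxScaler().fit(data)
manual = (data - mm.data_min_) / mm.data_range_
print(np.allclose(manual, mm.transform(data)))  # True

# feature_range changes the target interval, here [-1, 1] instead of [0, 1]
mm_sym = MinMaxScaler(feature_range=(-1, 1)).fit(data)
print(mm_sym.transform(data))

# Max-abs: divide each column by its largest absolute value, no shifting
ma = MaxAbsScaler().fit(data)
print(ma.max_abs_)         # [ 1. 18.]
print(ma.transform(data))  # values lie in [-1, 1]
```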
