# When to perform feature scaling? - Machine Learning

Implementing Feature Scaling using Sklearn

Feature scaling is a technique used to bring all the features to the same scale.

Let's say we have 2 features:

Money
Quantity of Milk

The quantity of milk is measured in litres and will mostly be in the range of 1–20.
For the same quantity of milk, the money will be in the hundreds.

As humans, we understand the difference in scale between these quantities, but how do we make the machine understand it? The model treats all the values as if they were on the same scale, so it gives more weight to features with larger values. In the example above, the quantity of milk might be all but ignored because its values are so small compared with the money.

So in machine learning, we bring both features to the same scale. This process is called feature scaling.
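One way to see this effect is with a plain Euclidean distance, which distance-based models rely on. The sketch below uses made-up purchases (the numbers are assumptions, not data from this article) and shows the money column dominating the comparison.

```python
import numpy as np

# Hypothetical purchases: [litres of milk, money paid]
a = np.array([2.0, 100.0])
b = np.array([18.0, 110.0])  # very different quantity, similar money
c = np.array([2.0, 160.0])   # same quantity, different money

dist_ab = np.linalg.norm(a - b)
dist_ac = np.linalg.norm(a - c)

# Without scaling, c looks much farther from a than b does,
# even though b differs by almost the whole range of milk quantities.
print(dist_ab, dist_ac)
```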

## Why feature scaling?

ML algorithms such as linear regression and logistic regression are commonly trained with gradient descent, an optimization technique. When the features are on different scales, the gradient, and therefore the step size, differs greatly from one feature to another.

To make sure gradient descent moves smoothly towards the minimum, with all features updated at a similar rate, we should scale the data.
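As a rough illustration, the sketch below computes the mean-squared-error gradient of a linear model on synthetic milk and money data, before and after standardizing the columns. The data, the coefficients and the helper `mse_gradient` are all invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: milk in litres (1-20) and money in the hundreds
milk = rng.uniform(1, 20, size=200)
money = rng.uniform(100, 900, size=200)
X = np.column_stack([milk, money])
y = 3.0 * milk + 0.05 * money + rng.normal(0, 1, size=200)

def mse_gradient(X, y, w):
    """Gradient of the mean squared error for a linear model y ~ X @ w."""
    residual = X @ w - y
    return 2 * X.T @ residual / len(y)

w0 = np.zeros(2)
grad_raw = mse_gradient(X, y, w0)

# Standardize each column: subtract its mean, divide by its standard deviation
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
grad_scaled = mse_gradient(X_scaled, y, w0)

print("raw gradient:   ", grad_raw)     # money component is far larger
print("scaled gradient:", grad_scaled)  # components of similar magnitude
```

With the raw data, a single learning rate is either too large for the money weight or too small for the milk weight. After scaling, the two components are comparable.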

Using sklearn it can be done in 3 ways:

Normal Scaling
Standard Scaler
Min-Max Scaler

### **Normal Scaling**

In [1]:
import pandas as pd
import numpy as np

In [2]:
from sklearn import datasets
wine = datasets.load_wine()

In [3]:
columnNames = ['Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 'Magnesium',
               'Total phenols', 'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins',
               'Color intensity', 'Hue', 'OD280/OD315 of diluted wines', 'Proline']

In [4]:
X = wine.data
X

array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
        1.065e+03],
       [1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
        1.050e+03],
       [1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
        1.185e+03],
       ...,
       [1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
        8.350e+02],
       [1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
        8.400e+02],
       [1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
        5.600e+02]])

In [5]:
wine_df = pd.DataFrame(X, columns = columnNames)
wine_df.head()

| | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 of diluted wines | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |


The wine dataset is a very simple classification dataset. There are 3 types of wine (0, 1, 2), and 13 features on the basis of which they are classified. You can view the names of the features with:


In [6]:
wine.feature_names

['alcohol',
 'malic_acid',
 'ash',
 'alcalinity_of_ash',
 'magnesium',
 'total_phenols',
 'flavanoids',
 'nonflavanoid_phenols',
 'proanthocyanins',
 'color_intensity',
 'hue',
 'od280/od315_of_diluted_wines',
 'proline']

In [7]:
from sklearn import preprocessing
wine_scaled = preprocessing.scale(X)
wine_scaled

array([[ 1.51861254, -0.5622498 ,  0.23205254, ...,  0.36217728,
         1.84791957,  1.01300893],
       [ 0.24628963, -0.49941338, -0.82799632, ...,  0.40605066,
         1.1134493 ,  0.96524152],
       [ 0.19687903,  0.02123125,  1.10933436, ...,  0.31830389,
         0.78858745,  1.39514818],
       ...,
       [ 0.33275817,  1.74474449, -0.38935541, ..., -1.61212515,
        -1.48544548,  0.28057537],
       [ 0.20923168,  0.22769377,  0.01273209, ..., -1.56825176,
        -1.40069891,  0.29649784],
       [ 1.39508604,  1.58316512,  1.36520822, ..., -1.52437837,
        -1.42894777, -0.59516041]])

In [8]:
wine_df_scaled = pd.DataFrame(wine_scaled, columns = columnNames)
wine_df_scaled.head()

| | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 of diluted wines | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.518613 | -0.56225 | 0.232053 | -1.169593 | 1.913905 | 0.808997 | 1.034819 | -0.659563 | 1.224884 | 0.251717 | 0.362177 | 1.84792 | 1.013009 |
| 1 | 0.24629 | -0.499413 | -0.827996 | -2.490847 | 0.018145 | 0.568648 | 0.733629 | -0.820719 | -0.544721 | -0.293321 | 0.406051 | 1.113449 | 0.965242 |
| 2 | 0.196879 | 0.021231 | 1.109334 | -0.268738 | 0.088358 | 0.808997 | 1.215533 | -0.498407 | 2.135968 | 0.26902 | 0.318304 | 0.788587 | 1.395148 |
| 3 | 1.69155 | -0.346811 | 0.487926 | -0.809251 | 0.930918 | 2.491446 | 1.466525 | -0.981875 | 1.032155 | 1.186068 | -0.427544 | 1.184071 | 2.334574 |
| 4 | 0.2957 | 0.227694 | 1.840403 | 0.451946 | 1.281985 | 0.808997 | 0.663351 | 0.226796 | 0.401404 | -0.319276 | 0.362177 | 0.449601 | -0.037874 |


This is the simplest scaling solution in sklearn.
Features:
Each feature (column) is scaled so that its mean is 0.
The standard deviation of each feature becomes 1.

In [9]:
wine_scaled.std(axis = 0)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
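The same result can be reproduced by hand, which makes the two properties above concrete. This is a small sketch, assuming the population standard deviation (NumPy's default, `ddof=0`), which is what `preprocessing.scale` uses.

```python
import numpy as np
from sklearn import datasets, preprocessing

X = datasets.load_wine().data

# Standardize by hand: per-column mean and (population) standard deviation
manual = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = preprocessing.scale(X)

matches_sklearn = np.allclose(manual, scaled)
means_are_zero = np.allclose(scaled.mean(axis=0), 0)
print(matches_sklearn, means_are_zero)
```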

The problem with this method is that it does not remember the mean and standard deviation it used. To scale the training and testing data consistently, we would have to combine them and scale them together. This biases the model evaluation, because information leaks from the test set into the training set.

Also, if we receive new data points in the future, they will be scaled according to their own mean and standard deviation, irrespective of the values in the old data. This will lead to serious problems in your ML model. We need some way to store the parameters with which the scaling was done.

Thus, we usually do not use this.
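A sketch of what storing the scaling looks like in practice, using the StandardScaler estimator covered later in this article. The split ratio and `random_state` are arbitrary choices made for the example.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the stored training statistics

print(X_train_scaled.mean(axis=0).round(2))  # ~0 for every feature
print(X_test_scaled.mean(axis=0).round(2))   # close to 0, but not exactly 0
```

The test set never influences the stored mean and standard deviation, so no information leaks into training.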

### **Min-Max Scaler**

Min-max scaling (normalization) requires that you know, or are able to accurately estimate, the minimum and maximum observable values of each feature.

The MinMaxScaler estimator scales and translates each feature individually so that it lies in a given range on the training set, e.g. between zero and one.

In [10]:
min_max_scaler_object = preprocessing.MinMaxScaler()

In [11]:
min_max_scaler_object.fit(X)
wine_min_max = min_max_scaler_object.transform(X)
wine_min_max

array([[0.84210526, 0.1916996 , 0.57219251, ..., 0.45528455, 0.97069597,
        0.56134094],
       [0.57105263, 0.2055336 , 0.4171123 , ..., 0.46341463, 0.78021978,
        0.55064194],
       [0.56052632, 0.3201581 , 0.70053476, ..., 0.44715447, 0.6959707 ,
        0.64693295],
       ...,
       [0.58947368, 0.69960474, 0.48128342, ..., 0.08943089, 0.10622711,
        0.39728959],
       [0.56315789, 0.36561265, 0.54010695, ..., 0.09756098, 0.12820513,
        0.40085592],
       [0.81578947, 0.66403162, 0.73796791, ..., 0.10569106, 0.12087912,
        0.20114123]])

In [12]:
wine1 = pd.DataFrame(wine_min_max , columns = columnNames)
wine1.head()

| | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 of diluted wines | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.842105 | 0.1917 | 0.572193 | 0.257732 | 0.619565 | 0.627586 | 0.57384 | 0.283019 | 0.59306 | 0.372014 | 0.455285 | 0.970696 | 0.561341 |
| 1 | 0.571053 | 0.205534 | 0.417112 | 0.030928 | 0.326087 | 0.575862 | 0.510549 | 0.245283 | 0.274448 | 0.264505 | 0.463415 | 0.78022 | 0.550642 |
| 2 | 0.560526 | 0.320158 | 0.700535 | 0.412371 | 0.336957 | 0.627586 | 0.611814 | 0.320755 | 0.757098 | 0.375427 | 0.447154 | 0.695971 | 0.646933 |
| 3 | 0.878947 | 0.23913 | 0.609626 | 0.319588 | 0.467391 | 0.989655 | 0.664557 | 0.207547 | 0.55836 | 0.556314 | 0.308943 | 0.798535 | 0.857347 |
| 4 | 0.581579 | 0.365613 | 0.807487 | 0.536082 | 0.521739 | 0.627586 | 0.495781 | 0.490566 | 0.444795 | 0.259386 | 0.455285 | 0.608059 | 0.325963 |


You can now use the same fitted object to transform any other data in the same way.
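To make the reuse concrete, the sketch below checks the fitted object against the min-max formula, `(x - min) / (max - min)`, and then transforms a new sample. The new sample is invented for illustration: every feature is set 10% above the largest value seen during fitting.

```python
import numpy as np
from sklearn import datasets, preprocessing

X = datasets.load_wine().data
scaler = preprocessing.MinMaxScaler().fit(X)

# The fitted object keeps the per-feature minimum and maximum it saw
manual = (X - scaler.data_min_) / (scaler.data_max_ - scaler.data_min_)
formula_matches = np.allclose(manual, scaler.transform(X))
print(formula_matches)

# A made-up new sample: every feature 10% above the largest value seen so far
new_sample = (scaler.data_max_ * 1.1).reshape(1, -1)
new_scaled = scaler.transform(new_sample)
print(new_scaled)  # values above 1: outside the range seen during fitting
```

New data is mapped with the stored minimum and maximum, so it can land outside [0, 1] when it falls outside the range of the data used for fitting.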

### **Standard Scaler**

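StandardScaler standardizes features by removing the mean and scaling to unit variance, so each feature ends up with mean 0 and standard deviation 1, like a standard normal distribution (SND). It does not change the shape of the distribution, so data that was not normally distributed before scaling will not be afterwards.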



In [13]:
standard_scaler_object = preprocessing.StandardScaler()

In [14]:
standard_scaler_object.fit(X)
wine_standard = standard_scaler_object.transform(X)
wine_standard

array([[ 1.51861254, -0.5622498 ,  0.23205254, ...,  0.36217728,
         1.84791957,  1.01300893],
       [ 0.24628963, -0.49941338, -0.82799632, ...,  0.40605066,
         1.1134493 ,  0.96524152],
       [ 0.19687903,  0.02123125,  1.10933436, ...,  0.31830389,
         0.78858745,  1.39514818],
       ...,
       [ 0.33275817,  1.74474449, -0.38935541, ..., -1.61212515,
        -1.48544548,  0.28057537],
       [ 0.20923168,  0.22769377,  0.01273209, ..., -1.56825176,
        -1.40069891,  0.29649784],
       [ 1.39508604,  1.58316512,  1.36520822, ..., -1.52437837,
        -1.42894777, -0.59516041]])

In [15]:
wine2 = pd.DataFrame(wine_standard , columns = columnNames)

In [16]:
wine_standard.std(axis = 0)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
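Unlike the plain `scale` function, the fitted StandardScaler keeps the statistics it learned, so the scaling can be inspected and undone. A short sketch using the estimator's `mean_` and `scale_` attributes and `inverse_transform`:

```python
import numpy as np
from sklearn import datasets, preprocessing

X = datasets.load_wine().data
scaler = preprocessing.StandardScaler().fit(X)

print(scaler.mean_[:3])   # per-feature means learned from X
print(scaler.scale_[:3])  # per-feature standard deviations learned from X

X_scaled = scaler.transform(X)
X_back = scaler.inverse_transform(X_scaled)  # undo the scaling
round_trip_ok = np.allclose(X_back, X)
print(round_trip_ok)
```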

**Most of the time, we use the standard scaler.**


However, both StandardScaler and MinMaxScaler are sensitive to outliers.
There is another scaler, RobustScaler, which gives better results when the data has outliers.
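A minimal comparison on a made-up feature with one extreme value (the numbers are invented for illustration). By default, RobustScaler centers on the median and scales by the interquartile range, so a single outlier has little effect on how the remaining points are spread.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Made-up single feature with one extreme outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

standard = StandardScaler().fit_transform(x)
robust = RobustScaler().fit_transform(x)  # centers on the median, scales by the IQR

print(standard.ravel().round(2))  # the five ordinary points are squeezed together
print(robust.ravel().round(2))    # the five ordinary points keep a usable spread
```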