# Feature Scaling

<b>Feature scaling</b> is a technique for bringing the independent features in a dataset onto a common scale. It is performed during data pre-processing to handle features whose magnitudes, ranges, or units vary widely. Without feature scaling, many machine learning algorithms treat numerically larger values as more important and smaller values as less important, regardless of the units in which they are measured.

Example: Without feature scaling, an algorithm that sees only the raw numbers can treat 3000 (meters) as greater than 5 (kilometers), which is not true, and this can lead to wrong predictions. Feature scaling brings all values to comparable magnitudes and so avoids this issue.
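The effect can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the variable names are hypothetical, and the two (age, salary) pairs are borrowed from the dataset used later in this notebook. It shows the unit mix-up from the example and how a Euclidean distance over unscaled features is almost entirely determined by the feature with the larger magnitude.

```python
import math

# The unit example from the text: raw numbers compare the wrong way round
# until both lengths are expressed in the same unit.
length_a_m = 3000   # 3000 metres
length_b_km = 5     # 5 kilometres
raw_comparison = length_a_m > length_b_km            # compares 3000 with 5
same_unit_comparison = length_a_m > length_b_km * 1000  # compares 3000 m with 5000 m
print(raw_comparison, same_unit_comparison)

# Two people described by (age in years, salary), taken from the dataset.
person_a = (44, 72000)
person_b = (27, 48000)

# Euclidean distance over the unscaled features.
distance = math.dist(person_a, person_b)
salary_gap = abs(person_a[1] - person_b[1])
age_gap = abs(person_a[0] - person_b[0])

# The distance is practically the salary gap; the 17-year age gap barely registers.
print(distance, salary_gap, age_gap)
```

The raw comparison is misleading only because the units differ, and the distance differs from the salary gap by less than 0.01, which is why distance-based methods effectively ignore Age until the features are scaled.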

### Techniques to Perform Feature Scaling

The two most widely used techniques are:

<b>Min-Max Normalization</b>: This technique rescales each feature so that its values lie between 0 and 1. The smallest value of the feature maps to 0 and the largest maps to 1.

<img src="min-max-normalisation.jpg"  width="200" height="100" >
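To make the formula concrete, here is a minimal plain-Python sketch of min-max normalization applied to the Age column of the dataset used in this notebook. The function name `min_max_normalize` is hypothetical, and raising an error for a constant feature is a choice made for this sketch, not necessarily how a library handles that case.

```python
# Age column from the dataset used in this notebook.
ages = [44, 27, 30, 38, 40, 35, 78, 48, 50, 37]


def min_max_normalize(values):
    """Rescale values to [0, 1] with x' = (x - min) / (max - min)."""
    low, high = min(values), max(values)
    if high == low:
        # Sketch-only choice: a constant feature has zero range.
        raise ValueError("all values are identical; the range is zero")
    return [(v - low) / (high - low) for v in values]


scaled_ages = min_max_normalize(ages)
print([round(v, 8) for v in scaled_ages])
```

The youngest age (27) maps to 0, the oldest (78) maps to 1, and every other age falls proportionally in between.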

<b>Standardization</b>: This technique rescales each feature so that its values have a mean of 0 and a variance of 1.

<img src="standardisation.jpg"  width="200" height="100" >
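Standardization can be sketched in the same way using only the standard library. The function name `standardize` is hypothetical. The sketch uses the population standard deviation (dividing by n), which is also what `StandardScaler` uses.

```python
import statistics

# Age column from the dataset used in this notebook.
ages = [44, 27, 30, 38, 40, 35, 78, 48, 50, 37]


def standardize(values):
    """Rescale values to zero mean and unit variance: z = (x - mean) / std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (divides by n)
    if std == 0:
        # Sketch-only choice: a constant feature cannot be standardized.
        raise ValueError("all values are identical; the standard deviation is zero")
    return [(v - mean) / std for v in values]


z_ages = standardize(ages)

# After standardization the mean is (numerically) 0 and the standard deviation 1.
print(statistics.fmean(z_ages), statistics.pstdev(z_ages))
```

Unlike min-max normalization, the result is not confined to a fixed range: an unusually large value such as age 78 ends up well above 2.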

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Load the example dataset into a DataFrame
data = pd.read_csv('Feature scaling.csv')

In [3]:
data

Unnamed: 0,Country,Age,Salary,Purchased
0,France,44,72000,No
1,Spain,27,48000,Yes
2,Germany,30,54000,No
3,Spain,38,61000,No
4,Germany,40,1000,Yes
5,France,35,58000,Yes
6,Spain,78,52000,No
7,France,48,79000,Yes
8,Germany,50,83000,No
9,France,37,67000,Yes


In [4]:
# Select the Age and Salary columns as a NumPy array
x = data.iloc[:, 1:3].values
x

array([[   44, 72000],
       [   27, 48000],
       [   30, 54000],
       [   38, 61000],
       [   40,  1000],
       [   35, 58000],
       [   78, 52000],
       [   48, 79000],
       [   50, 83000],
       [   37, 67000]], dtype=int64)

### Feature Scaling using MinMaxScaler

In [5]:
from sklearn.preprocessing import MinMaxScaler

In [6]:
# Default feature range is (0, 1)
min_max_scaler = MinMaxScaler()

In [7]:
# Learn each column's min and max, then rescale the data
x_after_min_max_scaler = min_max_scaler.fit_transform(x)

In [8]:
print("\nAfter min max Scaling : \n", x_after_min_max_scaler)


After min max Scaling : 
 [[0.33333333 0.86585366]
 [0.         0.57317073]
 [0.05882353 0.64634146]
 [0.21568627 0.73170732]
 [0.25490196 0.        ]
 [0.15686275 0.69512195]
 [1.         0.62195122]
 [0.41176471 0.95121951]
 [0.45098039 1.        ]
 [0.19607843 0.80487805]]


### Feature Scaling using StandardScaler

In [9]:
from sklearn.preprocessing import StandardScaler

In [10]:
standard_scaler = StandardScaler()

In [11]:
# Learn each column's mean and standard deviation, then rescale the data
x_after_standard_scaler = standard_scaler.fit_transform(x)

In [12]:
print("\nAfter Standardisation : \n", x_after_standard_scaler)


After Standardisation : 
 [[ 0.09536935  0.66527061]
 [-1.15176827 -0.43586695]
 [-0.93168516 -0.16058256]
 [-0.34479687  0.16058256]
 [-0.1980748  -2.59226136]
 [-0.56487998  0.02294037]
 [ 2.58964459 -0.25234403]
 [ 0.38881349  0.98643574]
 [ 0.53553557  1.16995867]
 [-0.41815791  0.43586695]]


<b>Conclusion:</b><br>

1. K-Means uses the Euclidean distance measure, so feature scaling matters here.
2. K-Nearest Neighbors also requires feature scaling.
3. PCA looks for the directions of maximum variance, so feature scaling is required here too; otherwise features with large magnitudes dominate the components.
4. Any problem that involves gradient descent or Euclidean distance generally requires feature scaling.
5. Tree-based techniques such as Decision Trees, Random Forest, and XGBoost do not need feature scaling.
6. In a nutshell, any algorithm that is not based on <b>distance</b> or gradient descent is largely unaffected by feature scaling.
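The contrast between distance-based and tree-based methods can be sketched on the Age and Salary rows from this notebook. This is an illustrative plain-Python sketch, not a full K-Nearest Neighbors or decision tree implementation: the helper names and the salary threshold of 60000 are assumptions chosen for the example.

```python
import math

# Age and Salary rows from the dataset above.
rows = [(44, 72000), (27, 48000), (30, 54000), (38, 61000), (40, 1000),
        (35, 58000), (78, 52000), (48, 79000), (50, 83000), (37, 67000)]


def min_max_scale(rows):
    """Rescale every column to [0, 1] using (x - min) / (max - min)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
            for row in rows]


def nearest_neighbor(points, query_index):
    """Index of the point closest (Euclidean) to points[query_index]."""
    others = [i for i in range(len(points)) if i != query_index]
    return min(others, key=lambda i: math.dist(points[i], points[query_index]))


scaled = min_max_scale(rows)

# Distance-based: the nearest neighbor of row 0 changes once Age can contribute.
raw_neighbor = nearest_neighbor(rows, 0)
scaled_neighbor = nearest_neighbor(scaled, 0)
print(raw_neighbor, scaled_neighbor)

# Threshold-based (a single tree-style split on Salary): scaling is monotonic,
# so splitting at the scaled threshold groups the rows exactly as before.
threshold = 60000  # hypothetical split point
scaled_threshold = (threshold - 1000) / (83000 - 1000)
raw_split = [salary > threshold for _, salary in rows]
scaled_split = [salary > scaled_threshold for _, salary in scaled]
print(raw_split == scaled_split)
```

On the raw data the nearest neighbor of row 0 is row 9, chosen almost purely by salary. After min-max scaling it becomes row 7. The salary split produces the same two groups in both cases.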

<b>References:</b>

1. https://www.youtube.com/watch?v=goMoUHl8q6c
2. https://www.geeksforgeeks.org/ml-feature-scaling-part-2/?ref=rp