# Feature Scaling in Machine Learning

In data mining, some problems remain even after the data has been extracted and cleaned. These problems can cause trouble when a machine learning algorithm is implemented or used to predict an output.
One of them is the difference in scale between the values of the features.
For example, consider a real estate dataset used to predict the price of a house from its area and its number of bedrooms.


The two features differ greatly in magnitude. Area runs into the thousands, while the number of bedrooms is a small integer. During training, the larger feature masks the importance of the smaller one. Many machine learning algorithms use Euclidean distance as the similarity metric, for example k-nearest neighbors and k-means. These algorithms effectively ignore the smaller feature, because its differences contribute almost nothing to the distance.
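Here is a minimal sketch that makes the problem concrete. It uses three hypothetical houses, each described by area and number of bedrooms. The values are made up for illustration and are not taken from a real dataset. The code computes plain Euclidean distances on the unscaled features.

```python
import numpy as np

# Hypothetical houses described by [area in sq ft, number of bedrooms]
house_a = np.array([1600.0, 2.0])
house_b = np.array([1600.0, 5.0])   # same area, 3 more bedrooms
house_c = np.array([1610.0, 2.0])   # 10 sq ft larger, same bedrooms

def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

d_ab = euclidean(house_a, house_b)
d_ac = euclidean(house_a, house_c)

# On raw values, a 10 sq ft difference outweighs 3 extra bedrooms
print(d_ab, d_ac)
```

On the raw values, `house_c` is only 10 square feet larger than `house_a`, yet it ends up farther away than `house_b`, which has three extra bedrooms. Feature scaling is meant to remove this distortion.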

Feature scaling overcomes this problem. It is a preprocessing step that transforms the data into a more usable form. It brings the features of the dataset into a comparable, finite range.

There are several ways to do feature scaling:
1. Absolute Maximum Scaling
2. Min-Max Scaling
3. Mean Normalization
4. Standardization
5. Robust Scaling


## Absolute Maximum Scaling

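Each value is divided by the largest absolute value of the feature, $x' = x / \max(|x|)$, so the result lies in $[-1, 1]$.

```python
import numpy as np

# Create a sample feature: the area of five houses
area = np.array([1600, 1800, 2500, 1200, 1200])

# Find the largest absolute value in the feature
max_abs_in_area = np.abs(area).max()

# Divide the whole feature by that value
area = area / max_abs_in_area

# The values now lie in [-1, 1]; since every area is positive, here they fall in (0, 1]
print(area)
```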
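Dividing by the largest *absolute* value matters once a feature can be negative. The sketch below wraps the step in a hypothetical helper, `max_abs_scale`, and applies it to a made-up signed feature. How an all-zero feature is handled is an assumption of this sketch.

```python
import numpy as np

def max_abs_scale(x):
    """Divide by the largest absolute value so results lie in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    max_abs = np.abs(x).max()
    if max_abs == 0:
        # All-zero feature (assumption): nothing to scale, return zeros
        return x.copy()
    return x / max_abs

# Hypothetical signed feature, e.g. yearly change in house price (in thousands)
price_change = np.array([-40.0, 10.0, 25.0, -5.0, 20.0])
scaled = max_abs_scale(price_change)
print(scaled)
```

The feature is only divided by a constant, so zeros stay zero and signs are preserved.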

## Min-Max Scaling

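Each value is shifted by the minimum and divided by the range, $x' = (x - \min(x)) / (\max(x) - \min(x))$, so the result lies in $[0, 1]$.

```python
import numpy as np

# Create a sample feature: the area of five houses
area = np.array([1600, 1800, 2500, 1200, 1200])

# Find the maximum and the minimum of the feature
max_in_area = area.max()
min_in_area = area.min()

# Apply the min-max scaling formula
area = (area - min_in_area) / (max_in_area - min_in_area)

# The values now lie in [0, 1]: the minimum maps to 0 and the maximum to 1
print(area)
```

Output:

```
[0.30769231 0.46153846 1.         0.         0.        ]
```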
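The same formula generalizes to any target interval $[a, b]$ via $x' = a + (x - \min(x))(b - a) / (\max(x) - \min(x))$. The sketch below implements this with a hypothetical helper, `min_max_scale`. Its handling of a constant feature, which maps every value to `new_min`, is an assumption.

```python
import numpy as np

def min_max_scale(x, new_min=0.0, new_max=1.0):
    """Rescale x linearly so its minimum maps to new_min and its maximum to new_max."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Constant feature (assumption): avoid dividing by zero, map all to new_min
        return np.full_like(x, new_min)
    return new_min + (x - x_min) * (new_max - new_min) / (x_max - x_min)

area = np.array([1600, 1800, 2500, 1200, 1200])
print(min_max_scale(area))              # default target range [0, 1]
print(min_max_scale(area, -1.0, 1.0))   # target range [-1, 1]
```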


## Mean Normalization

Each value is centered on the mean $\mu$ and divided by the range, $x' = (x - \mu) / (\max(x) - \min(x))$, so the result lies in $[-1, 1]$ and has a mean of 0.

```python
import numpy as np

# Create a sample feature: the area of five houses
area = np.array([1600, 1800, 2500, 1200, 1200])

# Find the maximum, the minimum and the mean of the feature
max_in_area = area.max()
min_in_area = area.min()
mean_in_area = area.mean()

# Apply the mean normalization formula
area = (area - mean_in_area) / (max_in_area - min_in_area)

# The values now lie in [-1, 1] and are centered at 0
print(area)
```

Output:

```
[-0.04615385  0.10769231  0.64615385 -0.35384615 -0.35384615]
```
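One way to check the claimed properties of mean normalization is to compute them directly. This sketch uses a hypothetical helper, `mean_normalize`, on the same `area` values. It checks that the result is centered at 0 and spans an interval of width 1. Returning zeros for a constant feature is an assumption.

```python
import numpy as np

def mean_normalize(x):
    """Center on the mean and divide by the range (max - min)."""
    x = np.asarray(x, dtype=float)
    value_range = x.max() - x.min()
    if value_range == 0:
        # Constant feature (assumption): every value equals the mean
        return np.zeros_like(x)
    return (x - x.mean()) / value_range

area = np.array([1600, 1800, 2500, 1200, 1200])
scaled = mean_normalize(area)

# Mean (up to floating-point error) and width of the scaled interval
print(scaled.mean(), scaled.max() - scaled.min())
```

The mean always lies between the minimum and the maximum, so the scaled interval contains 0. An interval of width 1 that contains 0 must sit inside $[-1, 1]$.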


## Standardization

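Each value is centered on the mean $\mu$ and divided by the standard deviation $\sigma$, $z = (x - \mu) / \sigma$, so the result has a mean of 0 and a standard deviation of 1.

```python
import numpy as np

# Create a sample feature: the area of five houses
area = np.array([1600, 1800, 2500, 1200, 1200])

# Find the mean of the feature
mean_in_area = area.mean()

# Find the standard deviation of the feature (the square root of the variance)
std_of_area = np.std(area)

# Apply the standardization (z-score) formula
area = (area - mean_in_area) / std_of_area

# The result has mean 0 and standard deviation 1; it is not bounded to a fixed range
print(area)
```

Output:

```
[-0.125       0.29166667  1.75       -0.95833333 -0.95833333]
```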
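In practice, each feature is standardized separately. The sketch below applies z-scores column by column to a small hypothetical matrix of areas and bedroom counts. The bedroom values are invented for illustration. It uses the population standard deviation (`ddof=0`), which is what `np.std` computes by default. Constant columns are divided by 1 instead of 0, which is an assumption of this sketch.

```python
import numpy as np

def standardize_columns(X):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)          # population std (ddof=0), as np.std computes by default
    std[std == 0] = 1.0          # constant columns (assumption): avoid dividing by zero
    return (X - mean) / std

# Hypothetical houses: column 0 = area (sq ft), column 1 = bedrooms
X = np.array([
    [1600, 3],
    [1800, 3],
    [2500, 4],
    [1200, 2],
    [1200, 2],
])
Z = standardize_columns(X)
print(Z.mean(axis=0))  # approximately [0, 0]
print(Z.std(axis=0))   # approximately [1, 1]
```

After this step both columns are on the same scale, so neither one dominates a Euclidean distance.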


## Robust Scaling

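Each value is centered on the median and divided by the interquartile range (IQR), $x' = (x - \operatorname{median}(x)) / (Q_3 - Q_1)$. The median and the IQR are barely affected by extreme values, so this method is robust to outliers.

```python
import numpy as np

# Create a sample feature: the area of five houses
area = np.array([1600, 1800, 2500, 1200, 1200])

# Find the median of the feature
median_in_area = np.median(area)

# Find the first and third quartiles and the interquartile range
q1, q3 = np.percentile(area, [25, 75])
iqr_of_area = q3 - q1

# Apply the robust scaling formula
area = (area - median_in_area) / iqr_of_area

# The median maps to 0; the values are not bounded to a fixed range
print(area)
```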
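The sketch below shows why the median and the IQR are preferred when outliers are present. It adds one hypothetical extreme value, a 20,000 sq ft property, to the `area` feature. It then compares min-max scaling with robust scaling. `robust_scale` is a hypothetical helper. `np.percentile` uses linear interpolation by default.

```python
import numpy as np

def robust_scale(x):
    """Center on the median and divide by the interquartile range (IQR)."""
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate spread (assumption): only center the data
        return x - median
    return (x - median) / iqr

# Hypothetical areas: the original five houses plus one extreme outlier
area = np.array([1600, 1800, 2500, 1200, 1200, 20000])

# Min-max scaling for comparison
mm = (area - area.min()) / (area.max() - area.min())
rb = robust_scale(area)

# Spread of the five ordinary houses under each method
print("min-max spread:", mm[:5].max() - mm[:5].min())
print("robust spread: ", rb[:5].max() - rb[:5].min())
print("outlier after robust scaling:", rb[-1])
```

With min-max scaling, the outlier squeezes the five ordinary houses into a narrow band near 0. Robust scaling keeps them spread out and pushes only the outlier far from the rest.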