<div align="justify">

Feature scaling is a technique for standardizing the range of the independent features present in the data. It is performed during data pre-processing to handle features whose magnitudes vary widely. If feature scaling is not done, then a __machine learning algorithm tends to weigh greater values higher and consider smaller values lower, regardless of the unit of the values__. For example, it will treat 10 m and 10 cm as the same, because it sees only the number 10 and not the unit. In this article we will learn about different techniques used to perform feature scaling.

</div>
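To make the problem concrete, here is a minimal sketch of how a feature with a large numeric range can dominate a Euclidean distance. The three samples and their values are made up for illustration and are not taken from any dataset used in this article.

```python
import numpy as np

# Hypothetical samples: [lot area in square feet, number of rooms]
a = np.array([8450.0, 3.0])
b = np.array([8460.0, 9.0])   # almost the same area, three times the rooms
c = np.array([9000.0, 3.0])   # moderately larger area, same number of rooms

# Euclidean distances computed on the raw, unscaled values
dist_ab = np.linalg.norm(a - b)
dist_ac = np.linalg.norm(a - c)

# The area column has much larger numbers, so it decides the result:
# c ends up far from a, while b looks close despite the room difference.
print(dist_ab, dist_ac)
```

A distance-based model working on these raw values would mostly ignore the number of rooms, which is the kind of imbalance that feature scaling is meant to reduce.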

### __1. Absolute Maximum Scaling__

<div align="justify">

This method of scaling involves two steps:

1. First, we select the maximum absolute value out of all the entries of a particular feature (column).
2. Then we divide each entry of the column by this maximum value.

<div align="center">

![](../images/absolute-maximum-scaling.png)

</div>

After performing these two steps, each entry of the column lies in the range -1 to 1. However, this method is not used very often because it is too sensitive to outliers: a single extreme value sets the divisor for the whole column. And when dealing with real-world data, the presence of outliers is very common.

For demonstration purposes we will use the __SampleFile dataset__, which is stored in the datasets folder as `SampleFile.csv`. This dataset is a simpler version of the original house price prediction dataset, with only two columns taken from it. The first five rows of the data are shown below:

</div>

In [2]:
import pandas as pd

df = pd.read_csv('../datasets/SampleFile.csv')
print(df.head())

   LotArea  MSSubClass
0     8450          60
1     9600          20
2    11250          60
3     9550          70
4    14260          60


<div align="justify">

Now let's apply the first method, absolute maximum scaling. For this, we first need to compute the maximum absolute value of each column.

</div>

In [6]:
import numpy as np

max_vals = np.max(np.abs(df), axis=0)
max_vals

LotArea       215245
MSSubClass       190
dtype: int64

<div align="justify">

Now we divide each entry by the maximum absolute value of its column.

</div>

In [7]:
print(df / max_vals)

       LotArea  MSSubClass
0     0.039258    0.315789
1     0.044600    0.105263
2     0.052266    0.315789
3     0.044368    0.368421
4     0.066250    0.315789
...        ...         ...
1455  0.036781    0.315789
1456  0.061209    0.105263
1457  0.042008    0.368421
1458  0.045144    0.105263
1459  0.046166    0.105263

[1460 rows x 2 columns]
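
The outlier sensitivity mentioned above can be seen on a tiny example. The sketch below uses a made-up column with one extreme value (it is not part of `SampleFile.csv`) and also checks the manual division against scikit-learn's `MaxAbsScaler`, which scales each column by its maximum absolute value.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical single-column data with one extreme value (1000)
values = np.array([[10.0], [20.0], [30.0], [1000.0]])

# Manual absolute maximum scaling: divide by the column's largest |value|
manual = values / np.max(np.abs(values), axis=0)

# The same scaling done with scikit-learn
scaled = MaxAbsScaler().fit_transform(values)

# The outlier maps to 1.0 and squeezes the ordinary values near 0
print(manual.ravel())
print(np.allclose(manual, scaled))
```

Because the single outlier becomes the divisor, the three ordinary values are compressed into a very narrow band near zero, which makes them hard to tell apart.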


### __2. Min-Max Scaling__

<div align="justify">

This method of scaling involves the following two steps:

1. First, we find the minimum and the maximum value of the column.
2. Then we subtract the minimum value from each entry and divide the result by the difference between the maximum and the minimum value.

<div align="center">

![](../images/min-max-scaling.png)

</div>

Because it relies on the maximum and the minimum value, this method is also sensitive to outliers. After performing the above two steps, the scaled data lies in the range 0 to 1.

</div>
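
The two steps above can be sketched directly in pandas. This is only an illustration: the toy column reuses the five `LotArea` values printed earlier, so the minimum and maximum are taken over these five rows only and the results will differ from scaling the full dataset.

```python
import pandas as pd

# Toy column: the five LotArea values from df.head(), used for illustration
toy = pd.DataFrame({"LotArea": [8450, 9600, 11250, 9550, 14260]})

# Step 1: find the minimum and maximum of each column
col_min = toy.min()
col_max = toy.max()

# Step 2: subtract the minimum and divide by the range (max - min)
toy_scaled = (toy - col_min) / (col_max - col_min)
print(toy_scaled)
```

Note that a constant column has a range of zero, so this manual version would divide by zero in that case; a real pipeline needs to handle such columns separately.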

In [8]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

scaled_df.head()

    LotArea  MSSubClass
0  0.033420    0.235294
1  0.038795    0.000000
2  0.046507    0.235294
3  0.038561    0.294118
4  0.060576    0.235294


### __3. Normalization__

<div align="justify">

Normalization is the process of rescaling each data point (each row, treated as a vector) so that it has a length of 1. This is done by dividing every value in the data point by the "length" (called the Euclidean norm) of that data point. Think of it like adjusting the size of a vector so that it fits a standard length of 1 while keeping its direction. Unlike the two previous methods, which work column by column, scikit-learn's `Normalizer` works row by row.

The formula for Normalization looks like this:

<div align="center">

![](../images/normalization.png)

</div>

Where:

- $X_i$ is each individual value.
- $\|X\|$ represents the Euclidean norm (or length) of the vector $X$.

</div>
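
As a rough sketch of the formula, the same computation can be written with NumPy. The two rows below reuse the first two rows of the data printed earlier; the variable names are chosen only for this illustration.

```python
import numpy as np

# First two rows of the data: [LotArea, MSSubClass]
X = np.array([[8450.0, 60.0],
              [9600.0, 20.0]])

# Euclidean norm of each row (axis=1), kept as a column for broadcasting
row_norms = np.linalg.norm(X, axis=1, keepdims=True)

# Divide every value by the norm of its own row
X_unit = X / row_norms

print(X_unit)
# After normalization, every row has length 1
print(np.linalg.norm(X_unit, axis=1))
```

Since `LotArea` is much larger than `MSSubClass` in every row, the normalized `LotArea` values all end up very close to 1, which shows that row-wise normalization does not balance the scales of different columns.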

In [9]:
from sklearn.preprocessing import Normalizer

scaler = Normalizer()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

scaled_df.head()

    LotArea  MSSubClass
0  0.999975    0.007100
1  0.999998    0.002083
2  0.999986    0.005333
3  0.999973    0.007330
4  0.999991    0.004208
