# Feature Scaling

## What is **Feature Scaling**?
Feature scaling is a method used to normalize the range of the independent variables (features) of a dataset. [2]
The two most discussed scaling methods are Normalization and Standardization. [1]
+ Normalization typically means rescaling the values into the range [0, 1].
+ Standardization typically means rescaling the data to have a mean of 0 and a standard deviation of 1 (unit variance).
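
In symbols, the two definitions above are commonly written as follows. The notation is chosen here for illustration: $x$ is a single feature value, $x_{\min}$ and $x_{\max}$ are the feature's minimum and maximum, and $\mu$ and $\sigma$ are its mean and standard deviation.

$$
x'_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad
x'_{\text{std}} = \frac{x - \mu}{\sigma}
$$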

## Why **Feature Scaling**?


## Different Scaling Methods
1. Standard Scaling
+ Explanation: Subtract the mean from each data point and divide by the standard deviation.
+ Main Effect: Removes the effect of mean and scales the data to unit variance. 
+ The scaling shrinks the range of feature values whose standard deviation is larger than 1, but the resulting range is not bounded.
+ Influence by Outliers: Outliers influence the empirical mean and standard deviation used for scaling. Because the outliers on each feature have different magnitudes, the spread of the transformed data can differ considerably from feature to feature. StandardScaler therefore **cannot guarantee balanced feature scales in the presence of outliers**.
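
The computation above can be sketched with the Python standard library alone. The function name `standard_scale` and the sample data are illustrative assumptions. The sketch divides by the population standard deviation, the convention `StandardScaler` also follows.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    # Fall back to 1.0 for a constant feature to avoid dividing by zero.
    sigma = pstdev(values) or 1.0
    return [(x - mu) / sigma for x in values]


# Hypothetical feature with one outlier (100.0).
data = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = standard_scale(data)

# The outlier inflates the standard deviation, so the four ordinary
# values end up squeezed into a very narrow band.
print(scaled)
```

The scaled values have mean 0 and standard deviation 1, but the four inliers occupy only a small fraction of the resulting range.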

2. Min-max Scaling
+ Explanation: Subtract the minimum value from each data point and divide by the range (maximum minus minimum).
+ Main Effect: It is the simplest method and rescales each feature into a fixed interval, usually [0, 1] or [−1, 1].
+ The range of the scaled data is **certain**: it is fixed by the chosen target interval.
+ Influence by Outliers: Very sensitive to outliers. For example, when a single extreme value makes the range large, the remaining values are compressed into a much narrower interval than expected.
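
A minimal sketch of min-max scaling in plain Python follows. The function name `min_max_scale`, its `feature_range` parameter, and the sample data are assumptions made for illustration.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly map values so that min -> low and max -> high."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant feature: map everything to the lower bound.
        return [low for _ in values]
    return [low + (x - v_min) * (high - low) / span for x in values]


# Hypothetical feature with one outlier (100.0).
data = [1.0, 2.0, 3.0, 4.0, 100.0]
unit_scaled = min_max_scale(data)                    # target range [0, 1]
symmetric_scaled = min_max_scale(data, (-1.0, 1.0))  # target range [-1, 1]

# The bounds are guaranteed, but the outlier pushes the four ordinary
# values into a thin slice near the lower bound.
print(unit_scaled)
```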

3. Max-abs Scaling
+ Explanation: Divide each data point by the maximum absolute value of its feature.
+ Main Effect: Scales each feature individually so that its maximum absolute value in the training set is 1.0. It does not shift or center the data, and thus does not destroy any sparsity.
+ Similar to min-max scaling, the range of the scaled data is **certain**: all values fall within [−1, 1].
+ Influence by Outliers: Very sensitive to outliers. A single value with a large magnitude sets the divisor, so the remaining values shrink far more than expected.
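
The sparsity-preserving property can be seen in a small sketch. The function name `max_abs_scale` and the sample data are illustrative assumptions.

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the feature."""
    peak = max(abs(x) for x in values)
    if peak == 0:
        # All-zero feature: nothing to scale.
        return list(values)
    return [x / peak for x in values]


# Hypothetical sparse feature: most entries are zero.
sparse_feature = [0.0, 0.0, -2.0, 0.0, 4.0]
scaled = max_abs_scale(sparse_feature)

# No offset is subtracted, so zeros stay zero and signs are preserved.
print(scaled)
```

Because the data are only divided and never shifted, every zero entry remains zero after scaling.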

4. Robust Scaling
+ Explanation: Subtract the median from the data and scale according to the **quantile range** (defaults to the IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
+ Main Effect: This normalization is based on the fact that, for a normal distribution, the interquartile range is approximately 1.349 times the standard deviation. After scaling, the median is 0 and the IQR is 1; the mean is generally not 0 and the variance is generally not 1.
+ The range of the scaled data is **uncertain**: it is not bounded.
+ Influence by Outliers: The interquartile range is less affected by extreme values than the standard deviation, so outliers have little influence on the centering and scaling statistics. The outliers themselves remain in the scaled data and can still lie far from the bulk of the values.
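
A minimal sketch of robust scaling with the standard library follows. The function name `robust_scale` and the sample data are assumptions, and the quartiles are computed with linear interpolation (`method="inclusive"`), which is one of several common quantile conventions.

```python
from statistics import quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    # Fall back to 1.0 when the IQR is zero to avoid dividing by zero.
    iqr = (q3 - q1) or 1.0
    return [(x - median) / iqr for x in values]


# Hypothetical feature with one outlier (100.0).
data = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(data)

# The median and IQR ignore the outlier, so the ordinary values keep a
# sensible spread; the outlier itself remains far away after scaling.
print(scaled)
```

Compared with standard scaling on the same data, the inliers are spread over roughly [-1, 0.5] instead of being squeezed together, while the outlier stays clearly separated.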

5. Power Transformation
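+ Explanation: Apply a monotonic power function, such as the Box-Cox or Yeo-Johnson transform, to each feature, with the exponent parameter chosen per feature (typically by maximum likelihood).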
+ Main Effect: Applies a power transformation to each feature to make the data more Gaussian-like in order to stabilize variance and minimize skewness.
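
As an illustration, the sketch below applies the Box-Cox transform, one common power transformation, with a hand-picked parameter. The choice of Box-Cox, the function names `box_cox` and `skewness`, the fixed `lam` value, and the sample data are all assumptions; library implementations such as scikit-learn's `PowerTransformer` estimate the parameter from the data instead.

```python
import math
from statistics import fmean, pstdev


def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda (strictly positive input only)."""
    if any(x <= 0 for x in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(x) for x in values]
    return [(x ** lam - 1.0) / lam for x in values]


def skewness(values):
    """Population skewness: mean of the cubed z-scores."""
    mu, sigma = fmean(values), pstdev(values)
    return fmean([((x - mu) / sigma) ** 3 for x in values])


# Hypothetical right-skewed feature (powers of two).
skewed = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
transformed = box_cox(skewed, lam=0.0)  # lambda = 0 is the log transform

print(skewness(skewed), skewness(transformed))
```

For this particular data the log transform yields evenly spaced values, so the skewness drops from clearly positive to essentially zero.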


## Reference

1. [Normalization vs Standardization — Quantitative analysis](https://towardsdatascience.com/normalization-vs-standardization-quantitative-analysis-a91e8a79cebf)
2. [Wikipedia - Feature Scaling](https://en.wikipedia.org/wiki/Feature_scaling)
3. [Compare the effect of different scalers on data with outliers](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py)
