# Feature Scaling
Numeric features can be completely unbounded, like the view count of a video or the number of hits on a web page. Using such raw values as input features can bias a model toward the features with the largest magnitudes. Models such as linear or logistic regression are typically sensitive to the magnitude or scale of their features. Other models, like tree-based methods, can still work without feature scaling. It is nevertheless recommended to normalize and scale features, especially if you want to try out multiple Machine Learning algorithms on the same input features.

### Import Packages

In [2]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
import numpy as np
import pandas as pd
np.set_printoptions(suppress=True)

### Load Data

In [3]:
views = pd.DataFrame([1295., 25., 19000., 5., 1., 300.], columns=['views'])
views

Unnamed: 0,views
0,1295.0
1,25.0
2,19000.0
3,5.0
4,1.0
5,300.0


### Standardized Scaling
The standard scaler standardizes each value in a feature column by removing the mean and scaling the values to unit variance. This is also known as centering and scaling and can be denoted mathematically as

$$SS(X_i)=\frac{X_i - \mu_X}{\sigma_X}$$

where the mean $\mu_X$ is subtracted from each value in the feature $X$ and the result is divided by the standard deviation $\sigma_X$. This is also popularly known as Z-score scaling. Some variants divide by the variance instead of the standard deviation, although the result is then no longer a Z-score with unit variance.

In [5]:
ss = StandardScaler()
views['zscore'] = ss.fit_transform(views[['views']])
views

Unnamed: 0,views,zscore
0,1295.0,-0.307214
1,25.0,-0.489306
2,19000.0,2.231317
3,5.0,-0.492173
4,1.0,-0.492747
5,300.0,-0.449877
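
As a sanity check, the Z-score formula can be applied by hand. The following is a minimal sketch that uses only NumPy and redefines the same toy `views` values as a plain array; the variable names `mu`, `sigma` and `zscore` are chosen here for illustration. It assumes the population standard deviation (`ddof=0`), which is the convention `StandardScaler` follows.

```python
import numpy as np

# Same toy data as the `views` column above.
views = np.array([1295., 25., 19000., 5., 1., 300.])

mu = views.mean()
# ddof=0 is the population standard deviation; pandas' Series.std()
# defaults to ddof=1 and would give slightly different scores.
sigma = views.std(ddof=0)

# Centre each value on the mean, then scale to unit variance.
zscore = (views - mu) / sigma
print(np.round(zscore, 6))
```

The printed values should agree with the `zscore` column above up to rounding, and the scaled column has a mean of 0 and a standard deviation of 1.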


### Min-Max Scaling
With min-max scaling, we can transform and scale our feature values such that each value is within the range of [0,1]. The `MinMaxScaler` also allows you to specify your own lower and upper bounds for the scaled value range using the `feature_range` parameter. Mathematically we can represent this scaler as:

$$MMS(X_i) = \frac{X_i - min(X)}{max(X)-min(X)}$$

where we scale each value in the feature $X$ by subtracting the minimum value $min(X)$ from it and dividing the result by $max(X) - min(X)$, the difference between the maximum and minimum values in the feature.

In [7]:
mms = MinMaxScaler()
views['minmax'] = mms.fit_transform(views[['views']])
views

Unnamed: 0,views,zscore,minmax
0,1295.0,-0.307214,0.068109
1,25.0,-0.489306,0.001263
2,19000.0,2.231317,1.0
3,5.0,-0.492173,0.000211
4,1.0,-0.492747,0.0
5,300.0,-0.449877,0.015738
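
The min-max formula, and the effect of a custom target range, can be sketched by hand with NumPy. The bounds `lo = -1` and `hi = 1` below are arbitrary values picked for illustration; the rescaling step `unit * (hi - lo) + lo` is the transformation that `feature_range=(lo, hi)` corresponds to.

```python
import numpy as np

# Same toy data as the `views` column above.
views = np.array([1295., 25., 19000., 5., 1., 300.])

# Plain min-max scaling to the unit interval [0, 1].
unit = (views - views.min()) / (views.max() - views.min())

# Stretch the unit interval to a custom range [lo, hi].
# -1 and 1 are illustrative choices, not values from the text above.
lo, hi = -1.0, 1.0
custom = unit * (hi - lo) + lo

print(np.round(unit, 6))
print(np.round(custom, 6))
```

The first printed array should agree with the `minmax` column above up to rounding. In the second, the smallest value maps to `lo` and the largest to `hi`.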


### Robust Scaling
The disadvantage of min-max scaling is that outliers strongly affect the scaled values of a feature. In the example above, the single value 19000 pushes every other scaled value below 0.07. Robust scaling uses statistical measures that are not affected by outliers to scale features. Mathematically this scaler can be represented as

$$RS(X_i) = \frac{X_i - median(X)}{IQR_{1,3}(X)}$$

where we scale each value of feature $X$ by subtracting the median of $X$ and dividing the result by the IQR, also known as the Inter-Quartile Range of $X$, which is the range (difference) between the first quartile (25th percentile) and the third quartile (75th percentile).

In [8]:
rs = RobustScaler()
views['robust'] = rs.fit_transform(views[['views']])
views

Unnamed: 0,views,zscore,minmax,robust
0,1295.0,-0.307214,0.068109,1.092883
1,25.0,-0.489306,0.001263,-0.13269
2,19000.0,2.231317,1.0,18.178528
3,5.0,-0.492173,0.000211,-0.15199
4,1.0,-0.492747,0.0,-0.15585
5,300.0,-0.449877,0.015738,0.13269
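
The robust scaling formula can also be checked by hand. This is a minimal NumPy sketch on the same toy values; it assumes the quartiles are computed with NumPy's default linear interpolation, and the names `q1`, `q3` and `iqr` are chosen here for illustration.

```python
import numpy as np

# Same toy data as the `views` column above.
views = np.array([1295., 25., 19000., 5., 1., 300.])

median = np.median(views)
# First and third quartiles (25th and 75th percentiles),
# using NumPy's default linear interpolation.
q1, q3 = np.percentile(views, [25, 75])
iqr = q3 - q1

# Centre on the median and scale by the inter-quartile range.
robust = (views - median) / iqr

print(median, q1, q3, iqr)
print(np.round(robust, 6))
```

The scaled values should agree with the `robust` column above up to rounding. Because the median and IQR barely react to the single extreme value, 19000 stays a clear outlier after scaling while the remaining values keep a usable spread instead of being squeezed toward zero.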
