### What is Feature Scaling?

__Feature scaling__ is a method used to standardize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step.

### Why is Feature Scaling needed?
The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges. Scaling ensures that features are not treated as the __main predictors__ just because they are __big__ (i.e. have greater numeric ranges).

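_We need Feature Scaling for **all techniques that use distances in any way**. We **must** also perform feature scaling in **any technique that uses SGD (Stochastic Gradient Descent)**, because gradient descent tends to converge slowly or unstably when features have very different ranges._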
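
As a rough illustration of the dominance problem, the sketch below compares Euclidean distances before and after scaling. The four samples (age, income) and the helper name `euclidean` are made up for this example and are not part of the original notebook.

```python
import numpy as np

# Hypothetical samples: [age in years, income in dollars]
X = np.array([
    [25.0, 50_000.0],
    [60.0, 50_500.0],
    [26.0, 58_000.0],
    [40.0, 90_000.0],
])

def euclidean(a, b):
    """Euclidean distance between two 1-D arrays."""
    return float(np.linalg.norm(a - b))

# Unscaled: the income column dominates, so sample 1 looks closer to
# sample 0 than sample 2 does, despite the 35-year age gap.
d01_raw = euclidean(X[0], X[1])
d02_raw = euclidean(X[0], X[2])

# Min-max scale each column to [0, 1], then measure again.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
d01_scaled = euclidean(X_scaled[0], X_scaled[1])
d02_scaled = euclidean(X_scaled[0], X_scaled[2])

print(d01_raw < d02_raw)        # True: income decides the ordering
print(d02_scaled < d01_scaled)  # True: after scaling, age matters too
```

With these numbers the nearest neighbour of sample 0 changes once both features are on a comparable scale, which is the effect that distance-based methods such as kNN are sensitive to.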

### Which techniques need Feature Scaling?
1. SVM (Support Vector Machines)
2. kNN (k-Nearest Neighbors)
3. PCA (Principal Component Analysis)
4. Neural Networks **(must)**
5. Logistic Regression **(must)**

### What are different ways of Feature Scaling?
1. Simple Feature Rescaling aka Rescaling aka min-max normalization aka min-max scaling
2. Mean Normalization
3. Standardization
4. Normalization
5. Robust Scaling

# IMPORTANT
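### Fit the scaler on the training data only, then apply the same fitted scaler to both train and test data. Fitting on the test data (or on train and test together) will result in leaking data!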
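
One way to avoid the leak is to compute the scaling statistics from the training split only and reuse them on the test split. A minimal NumPy sketch with made-up data (the arrays and variable names below are hypothetical):

```python
import numpy as np

# Hypothetical single-feature data, already split into train and test
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[0.0], [5.0]])

# Learn the statistics (here: min and max) from the training split only
train_min = X_train.min(axis=0)
train_max = X_train.max(axis=0)

# Reuse the same training statistics for both splits
X_train_scaled = (X_train - train_min) / (train_max - train_min)
X_test_scaled = (X_test - train_min) / (train_max - train_min)

print(X_train_scaled.ravel())  # stays within [0, 1]
print(X_test_scaled.ravel())   # may fall outside [0, 1], which is expected
```

Test values outside the training range simply map outside [0, 1]; that is a sign the test set was not used to fit the scaler.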

## Simple Feature Rescaling aka Rescaling aka min-max normalization aka min-max scaling

It is the simplest method and consists of rescaling each feature to the range [0, 1] or [−1, 1]. The choice of target range depends on the nature of the data. This method is **heavily influenced by outliers**, because a single extreme value sets $\min(x)$ or $\max(x)$ and squeezes all other values into a narrow band. The formula for the [0, 1] range is as follows - 
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
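
To make the formula concrete, here is a small NumPy sketch that applies it column by column. The function name `min_max_scale` and the sample data are assumptions for illustration; the sketch also assumes no column is constant (otherwise the denominator is zero).

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Rescale each column of X to feature_range using that column's min and max."""
    lo, hi = feature_range
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    unit = (X - col_min) / (col_max - col_min)  # each column now spans [0, 1]
    return unit * (hi - lo) + lo                # stretch/shift to the target range

# Second column contains an outlier (1000)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 1000.0]])

scaled = min_max_scale(X)
print(scaled[:, 0])  # evenly spread: 0, 0.5, 1
print(scaled[:, 1])  # the outlier pushes 10 and 20 close together near 0

scaled_sym = min_max_scale(X, feature_range=(-1.0, 1.0))
```

The second column shows the outlier sensitivity: the two ordinary values end up almost indistinguishable after scaling.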

# Mean Normalization
In this method, $\operatorname{avg}(x)$ is subtracted instead of $\min(x)$. The formula is as follows - 
$$x' = \frac{x - \operatorname{avg}(x)}{\max(x) - \min(x)}$$
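
A minimal NumPy sketch of this formula, applied per column. The function name `mean_normalize` and the data are hypothetical, and the sketch assumes no column is constant.

```python
import numpy as np

def mean_normalize(X):
    """Subtract each column's mean, then divide by that column's range (max - min)."""
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

X = np.array([[1.0], [2.0], [6.0]])  # mean = 3, range = 5
result = mean_normalize(X)
print(result.ravel())  # -0.4, -0.2, 0.6
```

The result is centred on zero and spans a total width of 1, but unlike min-max scaling its endpoints are not fixed at 0 and 1.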

# Standardization

In machine learning, we can handle various types of data, e.g. audio signals and pixel values for image data, and this data can include multiple dimensions. **Feature standardization makes the values of each feature in the data have zero-mean (when subtracting the mean in the numerator) and unit-variance.** This method is widely used for normalization in many machine learning algorithms (e.g., **support vector machines, logistic regression, and artificial neural networks**). The general method of calculation is to determine the distribution mean and standard deviation for each feature. Next we subtract the mean from each feature. Then we divide the values (mean is already subtracted) of each feature by its standard deviation.

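This method is used when we need the data to have zero mean. It is less distorted by outliers than min-max scaling because it does not squeeze values into a fixed range, but the mean and standard deviation are still affected by extreme values.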

The formula is as follows - 
$$x' = \frac{x - \bar{x}}{\sigma}$$

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
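
The calculation described above can be sketched by hand in NumPy. The data is made up; the sketch uses the population standard deviation (`ddof=0`), which is also what `StandardScaler` uses.

```python
import numpy as np

X = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])

mean = X.mean(axis=0)  # per-feature mean (5.0 here)
std = X.std(axis=0)    # per-feature population standard deviation (2.0 here)

# Subtract the mean, then divide by the standard deviation
X_std = (X - mean) / std

print(X_std.ravel())                           # first value is (2 - 5) / 2 = -1.5
print(X_std.mean(axis=0), X_std.std(axis=0))   # approximately 0 and 1
```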

# Normalization
In this method, every sample (i.e. each row's feature vector, not each column) is scaled so that it has norm = 1. Usually we use the L2 (Euclidean) norm but we can also use others. This method is usually used when we are going to apply operations such as dot products on the feature vectors. Because this transformation does not depend on other points in your dataset, calling `.fit()` has no effect.

In [None]:
from sklearn.preprocessing import Normalizer

normalizer = Normalizer()

# this does nothing because this method doesn't 'train' on your data
normalizer.fit(X_train)

X_train = normalizer.transform(X_train)
X_test = normalizer.transform(X_test)
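
To show that normalization works row by row, here is a small NumPy sketch with hypothetical data. It assumes no row is all zeros (such a row would have norm 0 and cause a division by zero here).

```python
import numpy as np

X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [6.0, 8.0]])

# L2 norm of each row, kept as a column so it broadcasts across the features
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

print(X_unit)                          # rows 0 and 2 both become [0.6, 0.8]
print(np.linalg.norm(X_unit, axis=1))  # every row now has norm 1
```

Rows 0 and 2 point in the same direction, so they map to the same unit vector: only direction survives, not magnitude.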

# Robust Scaling
The RobustScaler uses a similar method to the Min-Max scaler, but it uses the interquartile range rather than the min-max range, so that **it is robust to outliers**.

This scaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).

The formula is as follows - 
$$x' = \frac{x - \operatorname{median}(x)}{Q_3(x) - Q_1(x)}$$

The median and the quartiles are barely affected by a few extreme values, so this method is more suitable when there are outliers in the data.

In [None]:
from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
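
The median-and-IQR idea described above can be sketched by hand in NumPy. The function name `robust_scale` and the data are hypothetical, and the sketch assumes the IQR of each column is non-zero.

```python
import numpy as np

def robust_scale(X):
    """Subtract each column's median and divide by its interquartile range."""
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    return (X - median) / (q3 - q1)

# 100 is an outlier; median = 3, Q1 = 2, Q3 = 4, so IQR = 2
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
scaled = robust_scale(X)
print(scaled.ravel())  # -1, -0.5, 0, 0.5, 48.5
```

The ordinary values keep a sensible spread around zero, and the outlier stays far away instead of compressing everything else.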