# Scalers

Many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance of the same order. If a feature has a variance that is orders of magnitude larger than the others, it might dominate the objective function and make the estimator unable to learn from the other features as expected.
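
To make the "dominating feature" effect concrete, here is a minimal sketch using NumPy. The two sample points are made up for illustration. It breaks the squared Euclidean distance, which an RBF kernel depends on, into per-feature contributions.

```python
import numpy as np

# Two hypothetical samples: feature 0 spans roughly [0, 1], feature 1 spans thousands.
a = np.array([0.1, 1000.0])
b = np.array([0.9, 1010.0])

# Per-feature contribution to the squared Euclidean distance ||a - b||^2,
# the quantity an RBF kernel exp(-gamma * ||a - b||^2) depends on.
contributions = (a - b) ** 2
share = contributions / contributions.sum()

print(contributions)  # feature 1 contributes far more than feature 0
print(share)          # fraction of the distance explained by each feature
```

Without scaling, almost all of the distance comes from the second feature, even though the first feature differs by most of its range.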

In addition to aiding convergence, scalers differ in how they handle outliers.

Centering and scaling happen independently on each feature.

# Standard Scaler

StandardScaler removes the mean and scales each feature to unit variance. However, outliers influence the empirical mean and standard deviation, which shrinks the range of the remaining feature values. StandardScaler therefore cannot guarantee balanced feature scales in the presence of outliers.

$z = \frac{x - u}{s}$

where $u$ is the mean of the training samples and $s$ is the standard deviation of the training samples.
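
As a quick check of the formula, the sketch below compares scikit-learn's `StandardScaler` with a manual computation of $z = (x - u) / s$. The toy matrix is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: two features on very different scales (values are made up).
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0],
              [4.0, 400.0]])

X_scaled = StandardScaler().fit_transform(X)

# The same result computed by hand, one column (feature) at a time.
u = X.mean(axis=0)
s = X.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses
X_manual = (X - u) / s

print(X_scaled)
```

After scaling, both columns have mean 0 and standard deviation 1, regardless of their original units.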

# Min Max Scaler

MinMaxScaler rescales each feature so that, by default, all values lie in the range [0, 1]. However, a single extreme value sets the minimum or maximum, which compresses all inliers into a narrow part of that range. MinMaxScaler is therefore very sensitive to the presence of outliers.

$\frac{x_i - \min(x)}{\max(x) - \min(x)}$
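
The sensitivity to outliers can be seen on a small made-up column with one extreme value. This is a minimal sketch using scikit-learn's `MinMaxScaler` with its default range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One feature with four inliers and a single large outlier (made-up values).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

X_scaled = MinMaxScaler().fit_transform(X)

# The same result computed by hand from the formula.
X_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Largest scaled value among the four inliers.
inlier_max = X_scaled[:4].max()

print(X_scaled.ravel())
print(inlier_max)
```

The outlier maps to 1, while the four inliers are squeezed into a small interval just above 0.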


# Max Abs Scaler

MaxAbsScaler differs from MinMaxScaler in that each feature is divided by its maximum absolute value, so the absolute values are mapped into the range [0, 1] and the data is not shifted. On positive-only data, this scaler behaves similarly to MinMaxScaler and therefore also suffers from the presence of large outliers.

$\frac{x_i}{\max(|x|)}$
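
A minimal sketch with scikit-learn's `MaxAbsScaler` on made-up data, including a column with mixed signs, shows that values are only divided and never shifted.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Column 0 has mixed signs, column 1 is positive only (made-up values).
X = np.array([[-2.0, 1.0],
              [0.0, 5.0],
              [4.0, 10.0]])

X_scaled = MaxAbsScaler().fit_transform(X)

# The same result computed by hand: divide each column by its largest absolute value.
X_manual = X / np.abs(X).max(axis=0)

print(X_scaled)
```

Because nothing is subtracted, zeros stay zeros and signs are preserved. With mixed-sign data the scaled values fall in [-1, 1].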


# Robust Scaler (IQR)

The centering and scaling statistics of RobustScaler are based on percentiles and are therefore not influenced by a small number of very large marginal outliers. Consequently, the resulting range of the transformed feature values is larger than for the previous scalers and, more importantly, the ranges of the different features are approximately similar.

$\frac{x_i - \mathrm{median}(x)}{p_{75}(x) - p_{25}(x)}$
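
The sketch below applies scikit-learn's `RobustScaler` with its default quantile range to a made-up column containing one outlier, and compares it with the formula computed by hand.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# One feature with four inliers and a single large outlier (made-up values).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

X_scaled = RobustScaler().fit_transform(X)

# The same result computed by hand: center on the median, divide by the IQR.
median = np.median(X, axis=0)
p25, p75 = np.percentile(X, [25, 75], axis=0)
X_manual = (X - median) / (p75 - p25)

print(X_scaled.ravel())
```

The outlier is still present after scaling, but it no longer determines how the inliers are spread: they keep a spacing set by the interquartile range.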


# PowerTransformer

https://medium.com/@patricklcavins/using-scipys-powertransformer-3e2b792fd712

PowerTransformer applies a power transformation to each feature to make the data more Gaussian-like. By default, PowerTransformer implements the Yeo-Johnson transform. The optimal power parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood. By default, PowerTransformer also applies zero-mean, unit-variance normalization to the transformed output.
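
As an illustration, the sketch below fits scikit-learn's `PowerTransformer` with its defaults to a synthetic right-skewed feature. The log-normal sample and the random seed are assumptions chosen only for the demonstration.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Synthetic right-skewed feature (log-normal draw; seed chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.lognormal(size=(1000, 1))

pt = PowerTransformer()  # defaults: method="yeo-johnson", standardize=True
X_t = pt.fit_transform(X)

skew_before = skew(X.ravel())
skew_after = skew(X_t.ravel())

print(pt.lambdas_)              # power parameter estimated for each feature
print(skew_before, skew_after)  # skewness before and after the transform
```

The fitted `lambdas_` attribute holds one estimated power parameter per feature, and the transformed output is standardized to zero mean and unit variance.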

More specifically, PowerTransformer can reduce heteroskedasticity that results from a variable's skewed distribution. A probability plot will sometimes reveal larger deviations at the high or low end of the dependent variable. The goal of PowerTransformer is to bring that variable closer to a normal distribution. This helps avoid violating the assumptions of, for example, linear regression, such as normally distributed and homoskedastic errors.

To decide whether to apply this transformation, check how close the variable is to a normal distribution. To automate this check, compute its skewness and kurtosis.
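
One way to automate that check is sketched below. The helper name `looks_non_normal` and both thresholds are hypothetical choices for illustration, not established cutoffs. It relies on `scipy.stats.skew` and `scipy.stats.kurtosis`.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def looks_non_normal(x, skew_threshold=0.5, kurtosis_threshold=1.0):
    """Hypothetical helper: flag a 1-D variable as a candidate for a power transform.

    The thresholds are illustrative assumptions and should be tuned per dataset.
    """
    x = np.asarray(x, dtype=float)
    sample_skew = skew(x)
    excess_kurtosis = kurtosis(x)  # Fisher definition: 0 for a normal distribution
    return bool(
        abs(sample_skew) > skew_threshold
        or abs(excess_kurtosis) > kurtosis_threshold
    )


# Synthetic samples for the demonstration (seed chosen arbitrarily).
rng = np.random.default_rng(0)
normal_sample = rng.normal(size=5000)
skewed_sample = rng.lognormal(size=5000)

print(looks_non_normal(normal_sample))
print(looks_non_normal(skewed_sample))
```

A rule like this is only a screening step. A probability plot or a formal normality test may still be worth checking before transforming a variable.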

# Uniform QuantileTransformer

# Normal QuantileTransformer

# L1 Normalizer

# L2 Normalizer