# Standardization, or mean removal and variance scaling

Following the [scikit-learn User Guide](https://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling)

Standardization is a common requirement for many machine learning estimators in scikit-learn: they may perform poorly if the individual features do not look more or less like standard normally distributed data (Gaussian with zero mean and unit variance). In practice, the shape of the distribution is often ignored, and the data is simply centered by removing each feature's mean and scaled by dividing by each feature's standard deviation. If a feature has a much larger variance than the others, it can dominate the learning process and hinder the model's ability to learn from the other features. The `StandardScaler` utility class in the `preprocessing` module performs this standardization; for roughly Gaussian features, most standardized values then fall between -3 and +3. An alternative is scaling features to a known range.

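Examples of scaling to a known range:
1. `MinMaxScaler` (scales each feature to a given range, 0 to 1 by default)
2. `MaxAbsScaler` (divides each feature by its maximum absolute value, so values lie between -1 and 1)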
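
As a minimal sketch of both range scalers, the snippet below applies them to a small made-up array (the values are illustrative and not taken from the guide):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler

# Illustrative data: 3 samples (rows), 3 features (columns)
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# Each column is mapped linearly so that its minimum becomes 0 and its maximum 1
X_minmax = MinMaxScaler().fit_transform(X)

# Each column is divided by its largest absolute value, so values lie in [-1, 1]
X_maxabs = MaxAbsScaler().fit_transform(X)

print(X_minmax)
print(X_maxabs)
```

Unlike `MinMaxScaler`, `MaxAbsScaler` does not shift the data, so zero entries stay zero and sparse inputs stay sparse.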

## Standard Scaler

### Overview

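`StandardScaler` is a preprocessing tool in scikit-learn that standardizes features by removing the mean and scaling them to unit variance. The transformed features have zero mean and unit variance, but the shape of each distribution is unchanged: skewed data stays skewed. Many machine learning algorithms, especially those that rely on distances or gradient-based optimization, tend to work better on features scaled this way.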

### How It Works

The standard score of a sample $x$ is calculated using the formula:

$$ z = \frac{x - \mu}{\sigma} $$

where:
- $\mu$ is the mean of the training samples.
- $\sigma$ is the standard deviation of the training samples.
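
The formula can be checked by hand. The sketch below, on made-up training data, computes $z$ with NumPy and compares it with `StandardScaler`. It relies on the scaler using the population standard deviation (`ddof=0`), which is also NumPy's default:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative training data and one new sample (values are made up)
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
X_new = np.array([[2.0, 30.0]])

# Per-feature statistics estimated from the training samples only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population standard deviation (ddof=0)

# z = (x - mu) / sigma, applied column by column
z_manual = (X_train - mu) / sigma

# StandardScaler stores the same statistics in mean_ and scale_
scaler = StandardScaler().fit(X_train)
z_sklearn = scaler.transform(X_train)

# New data is scaled with the training statistics, not its own
z_new = scaler.transform(X_new)

print(np.allclose(z_manual, z_sklearn))
print(z_new)
```

Here `X_new` happens to equal the training mean, so it maps to zero in both features.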

### Key Features

- **Centering and Scaling**: Centers the data by subtracting the mean and scales it by the standard deviation, so that no feature dominates merely because of its units or magnitude.
- **Independence**: The centering and scaling statistics are computed independently for each feature from the training set, then stored and reused when transforming new data.
- **Handling Sparse Data**: Can be applied to sparse matrices by setting `with_mean=False` to maintain the sparsity of the dataset.

### Parameters

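- `copy`: If `True` (the default), the data is copied before scaling. If `False`, the scaler tries to perform the operation in place, although a copy may still be made for some input types.
- `with_mean`: If `True` (the default), it centers the data before scaling. Centering a sparse matrix raises an error, because it would make the matrix dense; use `with_mean=False` in that case.
- `with_std`: If `True` (the default), it scales the data to unit variance.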
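
The interaction between `with_mean` and sparse input can be sketched as follows. The matrix is a made-up example, and the `centering_failed` flag is only an illustrative helper:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Illustrative sparse matrix: 3 samples, 2 features, mostly zeros
X_sparse = sparse.csr_matrix([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])

# Scaling without centering keeps zeros at zero, so the result stays sparse
scaled = StandardScaler(with_mean=False).fit_transform(X_sparse)

# Asking for centering on sparse input is rejected with a ValueError
try:
    StandardScaler(with_mean=True).fit_transform(X_sparse)
    centering_failed = False
except ValueError:
    centering_failed = True

print(sparse.issparse(scaled), centering_failed)
```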

### Usage Example

Here’s a simple example of how to use `StandardScaler`:

In [1]:
from sklearn.preprocessing import StandardScaler

# Sample data
data = [[0, 0], [0, 0], [1, 1], [1, 1]]

# Create a StandardScaler instance
scaler = StandardScaler()

# Fit and transform the data
scaled_data = scaler.fit_transform(data)

print(scaled_data)

[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]


## Quantile Transformation

Quantile transformation is a method used in data preprocessing to transform the features to follow a uniform or normal distribution. This non-linear transformation technique is particularly useful when dealing with data that has outliers or when the distribution of the data is not Gaussian.

### Key Points:

1. **Purpose**: 
   - Reduces the impact of outliers
   - Makes the data distribution more Gaussian-like or uniform
   - Can improve the performance of many machine learning algorithms

2. **Process**:
   - Ranks the data
   - Transforms the features to follow a specified distribution (usually uniform or normal)

3. **Types**:
   - Uniform quantile transformation
   - Gaussian quantile transformation

4. **Mathematical Representation**:
   For a feature $X$ with cumulative distribution function $F_X$, the quantile transformation $Q$ is:
   
   $Q(X) = G^{-1}(F_X(X))$

   where $G^{-1}$ is the inverse cumulative distribution function (quantile function) of the desired output distribution.

5. **Advantages**:
   - Robust to outliers
   - Preserves the rank order of values within each feature, but distorts distances and linear correlations
   - Is a non-linear mapping, so it can reshape heavily skewed or multi-modal features

6. **Considerations**:
   - May change the interpretability of the features
   - Can be computationally expensive for large datasets
   - May not be suitable if the original scale of the data is important

7. **Implementation in Python**:\
   Scikit-learn provides `QuantileTransformer` for this purpose:

In [None]:
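import numpy as np
from sklearn.preprocessing import QuantileTransformer
# Skewed sample data (illustrative): 1000 samples, 1 feature
X = np.random.RandomState(0).lognormal(size=(1000, 1))
qt = QuantileTransformer(output_distribution='normal', random_state=0)
X_transformed = qt.fit_transform(X)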

Quantile transformation can be a powerful tool in your data preprocessing toolkit, especially when dealing with skewed or non-Gaussian data distributions.
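
To make the mathematical representation concrete, here is a rough sketch of the same idea done by hand: estimate the empirical CDF of a feature from its ranks, then apply the inverse CDF of the target normal distribution. This is a simplification for illustration, not how `QuantileTransformer` is implemented internally (that class interpolates between a fixed number of quantiles), and the sample data is made up:

```python
import numpy as np
from scipy.stats import norm, rankdata

# Illustrative skewed feature
rng = np.random.RandomState(0)
x = rng.lognormal(size=1000)

# Empirical CDF from ranks; dividing by n + 1 keeps values strictly inside (0, 1)
u = rankdata(x) / (len(x) + 1)

# Inverse CDF of the standard normal maps the uniform values to Gaussian ones
z = norm.ppf(u)

print(round(float(z.mean()), 3), round(float(z.std()), 3))
```

Because the mapping depends only on ranks, an extreme outlier in `x` ends up no further out than the largest normal quantile, which is why the method is robust to outliers.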

## Mapping to Gaussian Distribution

Mapping data to a Gaussian (or normal) distribution is a common preprocessing technique in machine learning and statistics. Power transforms such as Box-Cox and Yeo-Johnson are parametric, monotonic transformations that make data more Gaussian-like, which can benefit algorithms that assume normally distributed input features.

### Why Map to Gaussian?

- Many statistical methods and machine learning algorithms assume Gaussian distributed input data.
- Combined with standardization (which `PowerTransformer` applies by default), it puts features on a common scale.
- Can help in reducing the impact of outliers.
- Can improve the performance of models that are sensitive to skewed inputs.

### Methods for Gaussian Mapping

#### 1. Box-Cox Transformation

The Box-Cox transformation is defined as:

$$
y(\lambda) = 
\begin{cases} 
\frac{x^\lambda - 1}{\lambda}, & \text{if } \lambda \neq 0 \\
\ln(x), & \text{if } \lambda = 0 
\end{cases}
$$

where $x$ is the original data, which must be strictly positive, and $\lambda$ is the transformation parameter, usually estimated by maximum likelihood.

In [None]:
from scipy.stats import boxcox
# Box-Cox requires strictly positive, one-dimensional data (illustrative values)
transformed_data, lambda_param = boxcox([1.0, 2.0, 4.0, 8.0, 16.0])

#### 2. Yeo-Johnson Transformation

The Yeo-Johnson transformation is similar to Box-Cox but can handle negative values. It is defined as follows:

$$
y(\lambda) =
\begin{cases}
\frac{(x+1)^\lambda - 1}{\lambda}, & \text{if } \lambda \neq 0, \, x \geq 0 \\
\ln(x+1), & \text{if } \lambda = 0, \, x \geq 0 \\
-\frac{(-x+1)^{2-\lambda} - 1}{2-\lambda}, & \text{if } \lambda \neq 2, \, x < 0 \\
-\ln(-x+1), & \text{if } \lambda = 2, \, x < 0 
\end{cases}
$$

In [None]:
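from sklearn.preprocessing import PowerTransformer
# Yeo-Johnson accepts zero and negative values; input is 2-D (samples x features)
data = [[-2.0], [-0.5], [0.0], [1.0], [3.0], [10.0]]
pt = PowerTransformer(method='yeo-johnson')
transformed_data = pt.fit_transform(data)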
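
The piecewise Yeo-Johnson definition above can also be written out directly. The function below is a hypothetical helper (`yeo_johnson` is not a library function) that applies the four cases for a fixed $\lambda$. It does not estimate $\lambda$ or standardize the result, both of which `PowerTransformer` does:

```python
import numpy as np

def yeo_johnson(x, lmbda):
    """Apply the Yeo-Johnson transform element-wise for a fixed lambda."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0

    # Cases for x >= 0
    if lmbda != 0:
        out[pos] = ((x[pos] + 1) ** lmbda - 1) / lmbda
    else:
        out[pos] = np.log1p(x[pos])

    # Cases for x < 0
    if lmbda != 2:
        out[~pos] = -(((-x[~pos] + 1) ** (2 - lmbda) - 1) / (2 - lmbda))
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

# Illustrative values covering negative, zero, and positive inputs
sample = np.array([-3.0, -0.5, 0.0, 1.0, 3.0])
print(yeo_johnson(sample, 0.5))
print(yeo_johnson(sample, 1.0))  # lambda = 1 leaves the data unchanged
```

With $\lambda = 1$ both branches reduce to the identity, which is a quick sanity check on the signs in the negative branch.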

# Normalization

**`What is Normalization?`**

Normalization is the process of scaling individual samples to have unit norm. This can be useful if you plan to use a quadratic form such as the dot product or any other kernel to quantify the similarity of any pair of samples. Bringing all samples to the same norm means that similarity reflects the direction of each sample vector rather than its magnitude. For example, after L2 normalization the dot product of two samples equals their cosine similarity, whereas on unnormalized data the dot product or the Euclidean distance can be dominated by the samples with the largest values rather than by the actual difference between the points.

**`How to Normalize Data?`**

The term is used loosely for several related techniques. L1 and L2 normalization operate on each sample (row), while standardization and min-max scaling operate on each feature (column):
1.  **Standardization**: Subtracts the mean and divides by the standard deviation of each feature. This is also known as z-scoring.
2.  **L1 Normalization**: Divides each sample by the sum of the absolute values of its components, so that the absolute values sum to 1. This is also known as sum-normalization.
3.  **L2 Normalization**: Divides each sample by its Euclidean norm (the square root of the sum of its squared components), so that each sample has unit length.
4.  **Min-Max Scaling**: Scales each feature to a common range, usually between 0 and 1.
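
A short sketch of sample-wise L1 and L2 normalization with scikit-learn's `Normalizer`, on a small made-up array:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Illustrative data: each row is one sample
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])

# L1: each row is divided by the sum of its absolute values
X_l1 = Normalizer(norm="l1").fit_transform(X)

# L2: each row is divided by its Euclidean norm
X_l2 = Normalizer(norm="l2").fit_transform(X)

# Row-wise checks: absolute values sum to 1 (L1), Euclidean length is 1 (L2)
print(np.abs(X_l1).sum(axis=1))
print(np.linalg.norm(X_l2, axis=1))
```

`Normalizer` is stateless: `fit` learns nothing from the data, because each row is rescaled using only its own values.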

**`Why is Normalization Important?`**

Normalization and feature scaling matter mainly for algorithms that depend on distances, dot products, or gradient-based optimization:
1.  **Improved Performance**: Algorithms such as k-means clustering, k-nearest neighbors, and support vector machines often perform better on scaled data.
2.  **Balanced Feature Influence**: Features with large ranges no longer dominate distance computations or the objective function, which also lets regularization penalties act more evenly across features.
3.  **Improved Interpretability**: When all features are on the same scale, quantities such as linear model coefficients are easier to compare.
