# Quantile Normalization
Quantile normalization is a technique for making two or more distributions identical in their statistical properties. It is often used in the analysis of high-throughput biology data when the distributions of measured quantities are expected to be the same. This notebook explains the concept and walks through the procedure.

## What is Quantile Normalization?
The technique sorts the values within each sample, averages the values that share the same rank across samples, and replaces each original value with the average for its rank. After this replacement, every sample contains the same set of values, so the distributions are identical.
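The description above can also be written in symbols. The notation here is an assumption made for illustration and is not taken from the references: $x_{ij}$ is the value for gene $i$ in sample $j$, there are $n$ samples, $x_{(k),j}$ is the $k$-th smallest value in sample $j$, and $r_{ij}$ is the rank of $x_{ij}$ within sample $j$. Ties are ignored in this sketch.

$$
m_k = \frac{1}{n} \sum_{j=1}^{n} x_{(k),j}, \qquad x'_{ij} = m_{r_{ij}}
$$

Here $m_k$ is the mean of the $k$-th order statistic across samples, and $x'_{ij}$ is the normalized value.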

## Why use Quantile Normalization?
Quantile normalization is often used in the analysis of high-throughput biology data. For example, in a gene expression study where the distributions of measured quantities are expected to be the same across experiments, it can be used to correct for variations in experimental conditions.

A common example is microarray data, where the measurement for each gene is converted into an intensity value. When such data is plotted, each gene can be given its own color, and the value on the y-axis is the intensity measured for that gene on a given microarray.

In this case, the mean values for each sample might be different, suggesting the need to compensate for different overall intensities of light. Quantile normalization corrects for these technical artifacts.

## How does Quantile Normalization work?
Here is a step-by-step process of how quantile normalization works:
1. Start with the most highly expressed gene in each sample. This may be a different gene in each sample.
2. Calculate the mean of these highest values across the samples.
3. Replace the original intensity value of each of these genes with the calculated mean. This is the quantile normalized value for the gene with the highest expression in each sample.
4. Repeat the process for the next most highly expressed gene in each sample, and so on, until all genes have been processed.

At the end of this process, every sample contains the same set of values, but within each sample the original ordering of the genes is preserved. Because the normalized samples share the same values, they have identical quantiles, hence the name 'quantile normalization'.
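The steps above can be sketched on a small matrix. The data below is hypothetical (rows are genes, columns are samples), and the sketch assumes there are no tied values within a sample. It processes all ranks at once rather than one rank at a time, which gives the same result as working down from the most highly expressed gene.

```python
import numpy as np

# Hypothetical toy data: 4 genes (rows) measured in 3 samples (columns).
data = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 8.0],
    [4.0, 2.0, 6.0],
])

# Sort each sample independently, then average across samples at each
# rank position (smallest values together, ..., largest values together).
sorted_cols = np.sort(data, axis=0)
rank_means = sorted_cols.mean(axis=1)

# Rank of each value within its own sample (0 = smallest).
# The double argsort assumes no ties within a column.
ranks = data.argsort(axis=0).argsort(axis=0)

# Replace every value with the mean for its rank.
normalized = rank_means[ranks]

print(rank_means)
print(normalized)
```

After normalization, sorting any column of `normalized` gives `rank_means`, while the ordering of genes within each column matches the ordering in `data`.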

## Example of Quantile Normalization
Below is a Python code example of quantile normalization using the `numpy` and `scipy` packages:

In [None]:
import numpy as np
from scipy import stats

def quantile_normalize(data):
    # Expects a 2-D array: rows are genes, columns are samples
    data = np.asarray(data, dtype=float)

    # Sort each sample (column) independently
    sorted_data = np.sort(data, axis=0)

    # Mean of each order statistic across samples
    rank_mean = sorted_data.mean(axis=1)

    # Rank each value within its own sample (1 = smallest);
    # tied values share their average rank
    ranks = stats.rankdata(data, method='average', axis=0)

    # Replace each rank with the rank mean; fractional ranks from ties
    # are linearly interpolated (one possible convention for ties)
    positions = np.arange(1, data.shape[0] + 1)
    normalized_data = np.interp(ranks, positions, rank_mean)

    return normalized_data

## References
- Bolstad et al. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2), 185-193. [DOI: 10.1093/bioinformatics/19.2.185](https://doi.org/10.1093/bioinformatics/19.2.185)
- StatQuest by Josh Starmer. (2018). Quantile Normalization, Clearly Explained!!! [Video](https://www.youtube.com/watch?v=ecjN6Xpv6SE).