# Whitening Data

Whitening data refers to decorrelating it and scaling each component to unit variance (a variance of 1), so that the covariance matrix of the result is the identity. It is a common preprocessing step in data analysis, for example before independent component analysis (ICA).

## How to whiten data

Before whitening the data, it is important to center it. This is done by subtracting the mean from the data.

$X \leftarrow X - \bar{X}$, where $\bar{X}$ is the mean of the data
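As a minimal sketch of the centering step, assuming the data is stored as a NumPy array with one variable per row and one sample per column (a layout chosen here to match the formula $\tilde{X} = ED^{-\frac{1}{2}}E^TX$ used later, not something the text fixes), the mean of each row can be subtracted like this. The array `X` and its shape are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3 variables (rows) by 1000 samples (columns),
# deliberately generated with a non-zero mean.
X = rng.normal(loc=5.0, scale=2.0, size=(3, 1000))

# Subtract each variable's mean, taken across the samples.
# keepdims=True keeps the mean as a column so it broadcasts over samples.
X_centered = X - X.mean(axis=1, keepdims=True)
```

If the data is stored the other way round (samples in rows), the mean would be taken over `axis=0` instead.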

With the data centered, the next step is to obtain the eigenvalues and eigenvectors of its covariance matrix $E[XX^T]$. In Python this can be done either by computing them directly, for example with NumPy's `numpy.linalg.eig()` function, or by taking a singular value decomposition of the data, which can be less computationally expensive.
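The two routes can be compared with a short sketch. It assumes variables in rows and samples in columns, estimates $E[XX^T]$ with the sample covariance, and uses `numpy.linalg.eigh` rather than `eig` because the covariance matrix is symmetric. The data is randomly generated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical centered data: 3 variables (rows) by 1000 samples (columns).
X = rng.normal(size=(3, 1000))
X = X - X.mean(axis=1, keepdims=True)
n_samples = X.shape[1]

# Route 1: eigendecomposition of the sample covariance matrix.
cov = X @ X.T / n_samples
eigvals, E = np.linalg.eigh(cov)  # eigenvalues in ascending order

# Route 2: singular value decomposition of the data itself.
# The left singular vectors are the eigenvectors of the covariance,
# and the squared singular values, divided by the sample count,
# are its eigenvalues (returned in descending order).
U, s, _ = np.linalg.svd(X, full_matrices=False)
eigvals_svd = s**2 / n_samples
```

Both routes describe the same decomposition; the eigenvectors may differ in sign and ordering between them.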

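We'll refer to the whitened data as $\tilde{X}$, where $\tilde{X} = ED^{-\frac{1}{2}}E^TX$ and
- $E[XX^T]=EDE^T$ is the eigendecomposition of the covariance matrix of the centered data ($E[\cdot]$ denotes the expectation)
- $E$ is the orthogonal matrix whose columns are the eigenvectors
- $D$ is the diagonal matrix of eigenvalues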

$EDE^T = 
\begin{bmatrix}
| & | & \dots & | \\ 
e_1 & e_2 & \dots & e_n \\
| & | & \dots & | \\
\end{bmatrix}
\begin{bmatrix}
\lambda_1 & 0 & \dots & 0 \\ 
0 & \lambda_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \lambda_n \\
\end{bmatrix}
\begin{bmatrix}
- & e_1^T & - \\ 
- & e_2^T & - \\ 
\vdots & \vdots & \vdots \\ 
- & e_n^T & - \\ 
\end{bmatrix}$

How do we know if this worked? We can check that the covariance of the whitened data is the identity, $E[\tilde{X}\tilde{X}^T]=I$.
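As a numerical version of this check, the sketch below whitens a small correlated dataset and computes the covariance of the result. It assumes variables in rows and samples in columns, and estimates the expectation with the sample covariance. The mixing matrix `A` is a hypothetical one, used only to produce correlated data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated data: mix two independent signals with A.
A = np.array([[2.0, 0.5],
              [1.0, 3.0]])
X = A @ rng.normal(size=(2, 5000))

# Center the data.
X = X - X.mean(axis=1, keepdims=True)
n_samples = X.shape[1]

# Eigendecomposition of the sample covariance: cov = E D E^T.
cov = X @ X.T / n_samples
eigvals, E = np.linalg.eigh(cov)

# Whitening matrix E D^{-1/2} E^T, applied to the data.
W = E @ np.diag(eigvals ** -0.5) @ E.T
X_white = W @ X

# Covariance of the whitened data; this should be the identity.
cov_white = X_white @ X_white.T / n_samples
print(np.round(cov_white, 6))
```

The formula divides by the square roots of the eigenvalues, so it only works when none of them is zero (or close to zero), that is, when the covariance matrix is well conditioned.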

## Proof that $E[\tilde{X}\tilde{X}^T]=I$

$$E[\tilde{X}\tilde{X}^T] = E[(ED^{-\frac{1}{2}}E^TX)(ED^{-\frac{1}{2}}E^TX)^T]$$
$$ = E[ED^{-\frac{1}{2}}E^TXX^TED^{-\frac{1}{2}}E^T]$$
$$ = ED^{-\frac{1}{2}}E^TE[XX^T]ED^{-\frac{1}{2}}E^T$$

Substituting $E[XX^T]=EDE^T$ and noting that $E^TE = I$ and $D^{-\frac{1}{2}}DD^{-\frac{1}{2}}=I$, we can simplify this to

$$ = ED^{-\frac{1}{2}}DD^{-\frac{1}{2}}E^T = EE^T = I$$

## What is a benefit of whitening?

With regard to ICA, whitening reduces the number of free parameters to estimate in the mixing matrix from $n^2$ to $\frac{n(n-1)}{2}$, because the mixing matrix of whitened data is orthogonal.
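The reduction comes from the whitened mixing matrix being orthogonal, which can be illustrated with a small sketch. It assumes the usual ICA model $X = AS$ with uncorrelated, unit-variance sources, so that $E[XX^T] = AA^T$. The matrix `A` below is a hypothetical example, not taken from any dataset.

```python
import numpy as np

# Hypothetical 3x3 mixing matrix (invertible, but not orthogonal).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
n = A.shape[0]

# Assumption: the sources are uncorrelated with unit variance,
# so the covariance of the mixed data is A A^T.
cov = A @ A.T
eigvals, E = np.linalg.eigh(cov)

# Whitening matrix E D^{-1/2} E^T, and the mixing matrix seen after whitening.
V = E @ np.diag(eigvals ** -0.5) @ E.T
A_white = V @ A

# After whitening, the effective mixing matrix is orthogonal.
is_orthogonal = np.allclose(A_white @ A_white.T, np.eye(n))

# Free parameters: a general n x n matrix versus an orthogonal one.
free_before = n * n
free_after = n * (n - 1) // 2
print(is_orthogonal, free_before, free_after)
```

For $n = 3$ this is a drop from 9 free parameters to 3, and the gap widens as $n$ grows.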