## Robust Principal Component Analysis (RPCA)

> Robust PCA is a technique for decomposing a data matrix into two terms: a low-rank matrix capturing the characteristics of interest and a sparse matrix containing noise that has previously corrupted the data.

Standard PCA can be thought of as a maximization problem (maximizing the preserved variance of the data) or, equivalently, a minimization problem (minimizing the squared error between the data and its projection onto the principal axes). Because both views rely on a squared ($L_2$) error, a few large outliers can dominate the objective and strongly skew the solution, so the result is not robust. That is to say, PCA is a fragile approach that can fail when your data is either grossly corrupted or incomplete. An answer to this dilemma was developed by Candes, Li, Ma, and Wright in their 2009 paper _"Robust Principal Component Analysis?"_ (see __[1]__ in __References__).
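To make the fragility concrete, here is a minimal sketch (not from the original paper) in which a single gross outlier rotates the first principal axis by roughly 90 degrees. The helper `first_pc` and the toy data are assumptions made purely for illustration.

```python
import numpy as np


def first_pc(data):
    """Return the leading principal axis of `data` (rows are samples)."""
    centered = data - data.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[0]


# 50 points lying exactly on the x-axis: the first axis should be (+-1, 0).
t = np.linspace(-1.0, 1.0, 50)
clean = np.column_stack([t, np.zeros_like(t)])

# Same data plus one grossly corrupted measurement far off the line.
corrupted = np.vstack([clean, [0.0, 100.0]])

pc_clean = first_pc(clean)
pc_corrupted = first_pc(corrupted)

print("first axis, clean data:    ", np.round(pc_clean, 3))
print("first axis, with 1 outlier:", np.round(pc_corrupted, 3))
```

With the outlier included, the squared error is dominated by that one point, so the leading axis points at it instead of along the line the other 50 points lie on.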

Like Sparse PCA (see my [notebook on Sparse PCA](../sparsePCA/sparsePCA.ipynb)), there are multiple approaches to solving this problem. This notebook only sets up the problem and points the reader to the references for the theory and derivations.

Given a data matrix $X \in \mathbb{R}^{n \times m}$, we wish to decompose it as the sum of a low-rank matrix $L$ and a sparse matrix $S$: $$X = L+S$$

Here, $X$ contains our corrupted data, $L$ is of low rank and contains our corrected data, and $S$ holds the outlier measurements. Given this decomposition, $L$ is a cleaned dataset that can be analyzed with standard PCA or other machine learning algorithms. Mathematically, this (idealized) problem can be written as $$\underset{L, S}{\min}\;\;\{\operatorname{rank}(L)+\|S\|_0\}\;\;\text{s.t.}\;\;X = L+S$$ where $\|S\|_0$ counts the nonzero entries of $S$.
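As a rough illustration of the setup, the sketch below builds a synthetic $X = L + S$ from a known low-rank part and a known sparse part. The sizes, the rank, the 5% corruption rate, and the names `L_true` and `S_true` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 100, 80, 3  # matrix size and target rank (illustrative values)

# Low-rank part: a product of two thin Gaussian factors has rank r.
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# Sparse part: about 5% of the entries carry large corruptions.
S_true = np.zeros((n, m))
mask = rng.random((n, m)) < 0.05
S_true[mask] = rng.uniform(-10.0, 10.0, size=int(mask.sum()))

# Observed (corrupted) data.
X = L_true + S_true

rank_L = np.linalg.matrix_rank(L_true)
rank_X = np.linalg.matrix_rank(X)
nnz_S = np.count_nonzero(S_true)

print("rank(L) =", rank_L)
print("rank(X) =", rank_X)
print("nonzeros in S =", nnz_S, "of", n * m)
```

Even though only a small fraction of entries is corrupted, the observed matrix is no longer low rank, which is why the two parts have to be separated before standard PCA becomes useful again.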

This problem is intractable: neither the rank nor the $0$-norm is convex, and the resulting combinatorial problem cannot, in general, be solved efficiently. To get around this, a convex relaxation is introduced using proxies for the actual operators of interest. The rank is replaced by the _nuclear norm_ (the sum of the singular values; since the rank is the number of nonzero singular values, driving more of them to zero lowers the rank) and the $0$-norm is replaced by the $1$-norm (the sum of the absolute values of the entries).

$$\underset{L, S}{\min}\;\;\{\|L\|_{*}+\lambda\|S\|_1\}\;\;\text{s.t.}\;\;X = L+S$$
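The quantities in the idealized and relaxed objectives can be computed directly with NumPy. The short sketch below uses two small hand-made matrices (assumed here only for illustration) to show each operator next to its convex surrogate.

```python
import numpy as np

# A rank-1 matrix: the outer product of two vectors.
A = np.outer([1.0, 2.0], [3.0, 4.0])

singular_values = np.linalg.svd(A, compute_uv=False)
rank_A = np.linalg.matrix_rank(A)   # number of nonzero singular values
nuclear_A = singular_values.sum()   # nuclear norm: sum of singular values

# A sparse matrix with two nonzero entries.
B = np.array([[0.0, -3.0, 0.0],
              [2.0, 0.0, 0.0]])

l0_B = np.count_nonzero(B)          # "0-norm": number of nonzero entries
l1_B = np.abs(B).sum()              # 1-norm: sum of absolute values

print("rank(A) =", rank_A, " nuclear norm =", nuclear_A)
print("||B||_0 =", l0_B, " ||B||_1 =", l1_B)
```

`np.linalg.norm(A, 'nuc')` returns the same nuclear norm; the explicit SVD is used here only to make the definition visible.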


With high probability, the solution of this relaxed problem coincides with that of the idealized problem, provided $\lambda = \frac{1}{\sqrt{\max(n,m)}}$ and, in addition, $L$ is not itself sparse and $S$ is not itself low-rank (see __[2]__).

From this point, the method of Augmented Lagrange Multipliers can be applied (see __[3]__) to find the desired decomposition.
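One common way to carry this out is an alternating scheme on the augmented Lagrangian $\|L\|_* + \lambda\|S\|_1 + \langle Y, X-L-S\rangle + \frac{\mu}{2}\|X-L-S\|_F^2$: update $L$ by singular value thresholding, update $S$ by entrywise soft-thresholding, then update the multiplier $Y$. The code below is a minimal sketch of that idea, not a definitive implementation. The function names (`shrink`, `svt`, `rpca_alm`), the default penalty `mu`, and the stopping rule are assumptions (common heuristics) rather than anything specified in this notebook.

```python
import numpy as np


def shrink(M, tau):
    """Entrywise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale each column of U by its thresholded singular value.
    return (U * shrink(s, tau)) @ Vt


def rpca_alm(X, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split X into low-rank L and sparse S by alternating ALM updates."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    abs_sum = np.abs(X).sum()
    if abs_sum == 0.0:
        # Nothing to decompose in an all-zero matrix.
        return np.zeros_like(X), np.zeros_like(X)
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, m))   # weight stated in the text
    if mu is None:
        mu = n * m / (4.0 * abs_sum)     # heuristic penalty (assumption)

    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)                 # Lagrange multiplier for X = L + S
    thresh = tol * np.linalg.norm(X)     # Frobenius-norm stopping threshold

    for _ in range(max_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)      # minimize over L
        S = shrink(X - L + Y / mu, lam / mu)   # minimize over S
        residual = X - L - S
        Y = Y + mu * residual                  # multiplier (dual) update
        if np.linalg.norm(residual) <= thresh:
            break
    return L, S
```

Calling `L, S = rpca_alm(X)` on a data matrix returns the two parts; `lam` and `mu` can be passed explicitly when the defaults converge slowly or separate the parts poorly on a given dataset.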



__Applications__

<u>__References__</u>

__[1]__ Emmanuel J. Candes; Xiaodong Li; Yi Ma; John Wright (2011; preprint 2009). "Robust Principal Component Analysis?". Journal of the ACM. 58 (3): 1–37.

__[2]__ Steven L. Brunton; J. Nathan Kutz (2019). "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control". Section 3.7.

__[3]__ Magnus R. Hestenes (1969). "Multiplier and gradient methods". Journal of Optimization Theory and Applications. 4 (5): 303–320.