Implementation of robust principal component analysis and stable principal component pursuit based on the following references:
- Candes, Emmanuel J. et al. "Robust Principal Component Analysis?" Journal of the ACM, Vol. 58, No. 3, Article 11, 2011.
- Zhou, Zihan, et al. "Stable principal component pursuit." Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on. IEEE, 2010.
The classical Principal Component Analysis (PCA) is widely used for high-dimensional analysis and dimensionality reduction. Mathematically, if all the data points are stacked as column vectors of a (n, m)matrix
where
given that rank(
To resolve this issue, Candes, Emmanuel J. et al proposed Robust Principal Component Analysis (Robust PCA or RPCA). The objective is still trying to decompose
$$ \min_{L,S} ||L||{*} + \lambda||S||{1}$$
subject to
Minimizing the
Also, Zhou et al. further proposed a "stable" version of Robust PCA, which is called Stable Principal Component Pursuit (Stable PCP or SPCP), which allows a non-sparse Gaussian noise term
Stable PCP is intuitively more practical since it combines the strength of classical PCA and Robust PCA. However, depending on the exact problem, the proper method should be selected.
The drawback of Robust PCA and Stable PCP is their scalability. They are generally slow since the implementation do SVD (singular value decomposition) in the converging iterations. Recently, a new algorithm was proposed: "Grassmann Averages" for Scalable Robust PCA.
To install the package:
pip install git+https://github.com/ShunChi100/RobustPCA
To use
from RobustPCA.rpca import RobustPCA
from RobustPCA.spcp import StablePCP
rpca = RobustPCA()
spcp = StablePCP()
rpca.fit(M)
L = rpca.get_low_rank()
S = rpca.get_sparse()
spcp.fit(M)
L = spcp.get_low_rank()
S = spcp.get_sparse()
Here L
and S
are desired low rank matrix and sparse matrix.
For more options of these functions, please see the documentation and source codes.
Feel free to fork and develop this project. It is under MIT license.