Travis CI Build Status AppVeyor Build Status Codecov test coverage Project Status: Active – The project has reached a stable, usable state and is being actively developed. BioC status Bioc Time status MIT license

Sparse Contrastive Principal Component Analysis for Computational Biology

Authors: Philippe Boileau, Nima Hejazi, Sandrine Dudoit

What’s scPCA?

The exploration and analysis of modern high-dimensional biological data regularly involve dimension reduction techniques to tease out meaningful and interpretable information from complex experimental data, which are often subject to batch effects and other noise. In tandem with the development of sequencing technology (e.g., RNA-seq, scRNA-seq), many variants of principal component analysis (PCA) have been developed in attempts to remedy deficiencies in interpretability and stability that plague vanilla PCA.

Such developments have included various forms of sparse PCA (SPCA) (Zou, Hastie, and Tibshirani 2006; Erichson et al. 2018), which increase the stability and interpretability of principal component loadings in high dimensions, and, more recently, contrastive PCA (cPCA) (Abid et al. 2018), which captures relevant information in the target (experimental) data set by eliminating technical noise through comparison to a so-called background data set. While SPCA and cPCA have each proven useful in resolving distinct shortcomings of PCA, neither can tackle interpretability, stability, and relevance simultaneously. The scPCA package implements sparse contrastive PCA (Boileau, Hejazi, and Dudoit 2020) to accomplish these tasks in the context of high-dimensional biological data. In addition to implementing this newly developed technique, the scPCA package implements cPCA and generalizations thereof.
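To make the contrastive idea concrete, here is a minimal base R sketch of cPCA as described by Abid et al. (2018): the loadings are the leading eigenvectors of the target covariance matrix minus a scaled background covariance matrix. This is an illustration only, not the scPCA package's implementation. The function name cpca_sketch, its arguments, and the simulated data are all assumptions made up for this example.

```r
# Contrastive PCA sketch in base R; NOT the scPCA package implementation.
# All names below (cpca_sketch, target, background) are hypothetical.
set.seed(1)
n <- 1000
p <- 10

# Background data: noise with one high-variance nuisance variable (column 1).
background <- matrix(rnorm(n * p), nrow = n, ncol = p)
background[, 1] <- 3 * background[, 1]

# Target data: the same nuisance variation plus a two-group signal in column 2.
target <- matrix(rnorm(n * p), nrow = n, ncol = p)
target[, 1] <- 3 * target[, 1]
target[, 2] <- target[, 2] + rep(c(-2, 2), each = n / 2)

cpca_sketch <- function(target, background, contrast, n_eigen = 2) {
  # Column-center the target so scores are computed on centered data.
  target_centered <- scale(target, center = TRUE, scale = FALSE)
  # Contrastive covariance: target covariance minus scaled background covariance.
  # cov() centers internally, so no explicit centering is needed here.
  c_contrast <- cov(target) - contrast * cov(background)
  # eigen() returns eigenvalues in decreasing order for symmetric input,
  # so the first columns are the leading contrastive loadings.
  eig <- eigen(c_contrast, symmetric = TRUE)
  loadings <- eig$vectors[, seq_len(n_eigen), drop = FALSE]
  list(loadings = loadings, scores = target_centered %*% loadings)
}

pca_fit <- cpca_sketch(target, background, contrast = 0)   # ordinary PCA
cpca_fit <- cpca_sketch(target, background, contrast = 1)  # contrastive PCA

# Index of the variable with the largest absolute loading on component 1.
pca_top <- which.max(abs(pca_fit$loadings[, 1]))
cpca_top <- which.max(abs(cpca_fit$loadings[, 1]))
```

With the contrast set to 0 the sketch reduces to ordinary PCA, which in this simulation should be dominated by the nuisance variable. With a positive contrast the background variation is subtracted, so the group signal is expected to lead instead. Sparse contrastive PCA additionally sparsifies the loadings, which this sketch omits.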

Installation

For standard use, install from Bioconductor using BiocManager:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("scPCA")

To contribute, install the bleeding-edge development version from GitHub via remotes:


Current and prior Bioconductor releases are available under branches with numbers prefixed by “RELEASE_”. For example, to install the version of this package available via Bioconductor 3.10, use the RELEASE_3_10 branch.


Example

For details on how to best use the scPCA R package, please consult the most recent package vignette available through the Bioconductor project.

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

Contributions

Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.

Citation

Please cite the first paper below after using the scPCA R software package. Please also make sure to cite the article describing the statistical methodology when using scPCA or cross-validated cPCA as part of an analysis.

@article{boileau2020scpca,
  doi = {10.21105/joss.02079},
  year = {2020},
  publisher = {The Open Journal},
  volume = {5},
  number = {46},
  pages = {2079},
  author = {Philippe Boileau and Nima Hejazi and Sandrine Dudoit},
  title = {scPCA: A toolbox for sparse contrastive principal component analysis in R},
  journal = {Journal of Open Source Software}
}

@article{boileau2020exploring,
  author = {Boileau, Philippe and Hejazi, Nima S and Dudoit, Sandrine},
  title = "{Exploring High-Dimensional Biological Data with Sparse Contrastive Principal Component Analysis}",
  journal = {Bioinformatics},
  year = {2020},
  month = {03},
  issn = {1367-4803},
  doi = {10.1093/bioinformatics/btaa176},
  note = {btaa176}
}

License

© 2019-2022 Philippe Boileau

The contents of this repository are distributed under the MIT license. See file LICENSE for details.

References

Abid, Abubakar, Martin J Zhang, Vivek K Bagaria, and James Zou. 2018. “Exploring Patterns Enriched in a Dataset with Contrastive Principal Component Analysis.” Nature Communications 9 (1): 2134.

Boileau, Philippe, Nima S Hejazi, and Sandrine Dudoit. 2020. “Exploring High-Dimensional Biological Data with Sparse Contrastive Principal Component Analysis.” Bioinformatics, March.

Erichson, N. Benjamin, Peng Zeng, Krithika Manohar, Steven L. Brunton, J. Nathan Kutz, and Aleksandr Y. Aravkin. 2018. “Sparse Principal Component Analysis via Variable Projection.” ArXiv abs/1804.00341.

Zou, Hui, Trevor Hastie, and Robert Tibshirani. 2006. “Sparse Principal Component Analysis.” Journal of Computational and Graphical Statistics 15 (2): 265–86.