GfellerLab/SuperCell

Coarse-graining of large single-cell RNA-seq data into metacells

SuperCell is an R package for coarse-graining large single-cell RNA-seq data into metacells and performing downstream analysis at the metacell level.

The exponential growth in the size of scRNA-seq datasets is an important hurdle for downstream analyses. One way to facilitate the analysis of large-scale and noisy scRNA-seq data is to merge transcriptionally highly similar cells into metacells. This concept was first introduced by Baran et al., 2019 (MetaCell) and by Iacono et al., 2018 (bigSCale). More recent methods to build metacells have been described in Ben-Kiki et al., 2022 (MetaCell2), Bilous et al., 2022 (SuperCell) and Persad et al., 2022 (SEACells). Despite some differences in implementation, all of these methods are network-based and can be summarized as follows:

1. A single-cell network is computed based on cell-to-cell similarity (in transcriptomic space).

2. Highly similar cells are identified as those forming dense regions in the single-cell network and are merged together into metacells (coarse-graining).

3. Transcriptomic information within each metacell is combined (average or sum).

4. Metacell data are used for the downstream analyses instead of the large-scale single-cell data.
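
The four steps above can be illustrated with a toy sketch in base R. This is not SuperCell's implementation: the data are simulated, all variable names are made up for illustration, and hierarchical clustering on PCA distances stands in for the network-based merging of step 2, so that the example runs without any extra packages.

```r
# Toy illustration of the four steps, using base R only.
set.seed(1)

# Simulated log-expression matrix (genes x cells) with two cell populations.
n_genes <- 50
n_cells <- 200
gamma   <- 20   # graining level: average number of cells per metacell
pop     <- rep(c("A", "B"), each = n_cells / 2)
ge <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
             dimnames = list(paste0("gene", seq_len(n_genes)),
                             paste0("cell", seq_len(n_cells))))
ge[1:10, pop == "B"] <- ge[1:10, pop == "B"] + 3   # genes 1-10 are higher in B

# Step 1: cell-to-cell similarity in a reduced transcriptomic space (PCA).
pca       <- prcomp(t(ge), center = TRUE, scale. = FALSE, rank. = 10)
cell_dist <- dist(pca$x)

# Step 2: merge highly similar cells into n_cells / gamma groups.
# ASSUMPTION: hierarchical clustering is a stand-in for a network-based method.
n_metacells <- round(n_cells / gamma)
membership  <- cutree(hclust(cell_dist, method = "average"), k = n_metacells)

# Step 3: combine expression within each metacell (here: average).
metacell_sum  <- rowsum(t(ge), group = membership)   # metacells x genes
metacell_size <- as.vector(table(membership))        # cells per metacell
metacell_ge   <- t(metacell_sum / metacell_size)     # genes x metacells

# Step 4: downstream analyses would now use metacell_ge instead of ge.
dim(metacell_ge)
```

The vector `membership` assigns each cell to a metacell, and `metacell_ge` is the reduced genes-by-metacells matrix that would be passed to downstream tools.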

Unlike clustering, the aim of metacells is not to identify large groups of cells that comprehensively capture biological concepts, such as cell types, but to merge cells that share highly similar profiles and may carry repetitive information. Therefore, metacells represent a compromise structure that optimally removes redundant information in scRNA-seq data while preserving the biologically relevant heterogeneity.

An important concept when building metacells is the graining level (γ), which we define as the ratio between the number of single cells in the initial data and the number of metacells. We suggest using γ between 10 and 50, which significantly reduces the computational resources needed to perform the downstream analyses while preserving most of the results of the initial (i.e., single-cell) analyses.
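
As a quick worked example of this definition, the snippet below computes how many metacells result from a few values of γ. The dataset size of 100,000 cells is a hypothetical number chosen only for illustration.

```r
# Number of metacells implied by the graining level: N_metacells = N_cells / gamma.
n_cells <- 100000          # hypothetical dataset size
gammas  <- c(10, 20, 50)   # suggested range of graining levels

n_metacells <- round(n_cells / gammas)
data.frame(gamma = gammas, n_metacells = n_metacells)
```

With these numbers, γ = 10, 20 and 50 correspond to 10,000, 5,000 and 2,000 metacells, respectively.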

Installation

SuperCell requires igraph, RANN, WeightedCluster, corpcor, weights, Hmisc, Matrix, matrixStats, plyr, irlba, grDevices, patchwork, ggplot2. SuperCell uses velocyto.R for RNA velocity.

install.packages("igraph")
install.packages("RANN")
install.packages("WeightedCluster")
install.packages("corpcor")
install.packages("weights")
install.packages("Hmisc")
install.packages("Matrix")
install.packages("matrixStats")
install.packages("patchwork")
install.packages("plyr")
install.packages("irlba")
install.packages("ggplot2")
# grDevices ships with base R and does not need to be installed separately

Installing the SuperCell package from GitHub

if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("GfellerLab/SuperCell")

library(SuperCell)

Examples

  1. Building and analyzing metacells with SuperCell
  2. RNA velocity applied to SuperCell object
  3. Building metacells with SuperCell and analyzing them with a standard Seurat pipeline
  4. Data integration of metacells built with SuperCell

SuperCell is developed by the group of David Gfeller at the University of Lausanne.

SuperCell can be used freely by academic groups for non-commercial purposes (see license). The product is provided free of charge, and, therefore, on an “as is” basis, without warranty of any kind.

FOR-PROFIT USERS

If you plan to use SuperCell or any data provided with the script in any for-profit application, you are required to obtain a separate license. To do so, please contact eauffarth@licr.org at the Ludwig Institute for Cancer Research Ltd.

If required, FOR-PROFIT USERS are also expected to have proper licenses for the tools used in SuperCell, including the R packages igraph, RANN, WeightedCluster, corpcor, weights, Hmisc, Matrix, plyr, irlba, grDevices, patchwork, ggplot2 and velocyto.R.

For scientific questions, please contact Mariia Bilous (mariia.bilous@unil.ch) or David Gfeller (David.Gfeller@unil.ch).

How to cite

If you use SuperCell in a publication, please cite: Bilous et al. Metacells untangle large and complex single-cell transcriptome networks, BMC Bioinformatics (2022).