Jan. 12, 2024: scCancer2 was accepted by Bioinformatics!

Jan. 18, 2024: scCancer2 was published online:


We updated our R toolkit, scCancer, based on massive single-cell transcriptome and spatial transcriptome data.

  1. Cell subtype annotation and cross-dataset label similarity: Our analysis mainly focused on cell subtype annotation by training multiple lightweight machine-learning models on scRNA-seq data. We proposed a method for quantitatively evaluating the similarity of cell subtype labels originating from different published datasets. We fully preserved the original labeling in cell atlases and analyzed the relationship between cell subtypes across datasets.

  2. Malignant cell identification: We constructed a reference dataset combining scRNA-seq and bulk RNA-seq data across multiple cancer types to identify malignant cells in the tumor microenvironment (TME). We trained a model that identifies malignant cells with high generalization ability and computational efficiency.

  3. Spatial transcriptome analysis: Finally, we integrated a spatial transcriptome analysis pipeline, which analyzes the TME from a spatial perspective systematically and automatically.

With scCancer2, researchers can understand the composition of the TME more accurately from multiple dimensions.

For the old version of scCancer, see

Overview of scCancer2


Package installation and quick start for scRNA-seq analysis

System requirements

We have tested scCancer2 on:

R version 4.0.0 (2020-04-24), Platform: x86_64-w64-mingw32/x64 (64-bit), Running under: Windows 10 x64 (build 22621)

R version 4.0.5 (2021-03-31), Platform: x86_64-w64-mingw32/x64 (64-bit), Running under: Windows 10 x64 (build 22621).

R version 4.1.1 (2021-08-10), Platform: x86_64-conda-linux-gnu (64-bit), Running under: CentOS Linux 7 (Core).

On Windows, Rtools must be installed beforehand:

To avoid version conflicts between R packages, we recommend installing a fresh R environment and switching to it in Anaconda or RStudio. However, if you have already installed an old version of scCancer successfully, there is no need to create a new environment; the upgrade can be completed in the existing environment by following the steps below.

Important: scCancer2 is not compatible with Seurat 5 because of significant changes to the data structure in that version of Seurat. Users need to install Seurat 4 or Seurat 3 manually before installing scCancer2 (see Notice3 for details).
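The compatibility rule above can be checked before installation. The following is a minimal sketch in base R; `isSupportedSeurat` is a hypothetical helper written for illustration and is not part of scCancer2.

```r
# Hypothetical helper (not part of scCancer2): returns TRUE when a Seurat
# version string belongs to the 3.x or 4.x series named in this README.
isSupportedSeurat <- function(version) {
    v <- package_version(version)
    v >= "3.0.0" && v < "5.0.0"
}

# Inspect the Seurat installation in the current R library, if there is one.
if (requireNamespace("Seurat", quietly = TRUE)) {
    installed <- as.character(utils::packageVersion("Seurat"))
    if (!isSupportedSeurat(installed)) {
        warning("Seurat ", installed, " is installed; scCancer2 expects Seurat 3 or Seurat 4.")
    }
} else {
    message("Seurat is not installed yet.")
}
```

If the warning appears, install Seurat 4 or Seurat 3 as described in Notice3 before installing scCancer2.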

Quick start of scCancer2

Dependency installation

You can install the dependencies with the following steps. Once they are installed successfully, you can run the demos.

# Return TRUE if the package is already installed
checkPkg <- function(pkg){
    return(requireNamespace(pkg, quietly = TRUE))
}

# Some frequently used packages
if(!checkPkg("BiocManager")) install.packages("BiocManager")
if(!checkPkg("devtools")) install.packages("devtools")

if(!checkPkg("NNLM")) devtools::install_github("linxihui/NNLM")
if(!checkPkg("monocle")) BiocManager::install(c("monocle"))
if(!checkPkg("edgeR")) BiocManager::install(c("edgeR"))
BiocManager::install(c('DelayedArray', 'DelayedMatrixStats'))
if(!checkPkg("garnett")) devtools::install_github("cole-trapnell-lab/garnett")
# Install Seurat with specific version
# See Notice3 or try the tutorial from
if(!checkPkg("remotes")) install.packages("remotes")
if(!checkPkg("Seurat")) remotes::install_version(package = "Seurat", version = package_version('4.3.0'))
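After running the installation steps, it can help to confirm that nothing was skipped. The sketch below is an assumption-level convenience, not part of scCancer2: `missingPkgs` is a hypothetical helper, and the package list simply repeats the names used in the steps above.

```r
# Hypothetical helper (not part of scCancer2): return the subset of `pkgs`
# that cannot be loaded from the current R library.
missingPkgs <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

# Package names taken from the installation steps above.
deps <- c("BiocManager", "devtools", "NNLM", "monocle", "edgeR",
          "garnett", "remotes", "Seurat")

stillMissing <- missingPkgs(deps)
if (length(stillMissing) > 0) {
    message("Not installed yet: ", paste(stillMissing, collapse = ", "))
} else {
    message("All listed dependencies are installed.")
}
```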

Run scCancer2

If you have already installed the above dependencies, you have 2 ways to run scCancer2:

(a) Recommended (if you want to completely update scCancer to the next version):

# install scCancer2
## Make sure you have already installed Seurat4 or Seurat3.
## Remember to skip all updates in the following step.

See scCancer2.rmd for demos.

(b) Download .zip file of R package. Open scCancer.rproj and run temporary installation in scCancer2.rmd.

# Check the dependencies
# Load all files in the folder

Notice1: If errors occur when installing dependencies ("usethis", "hdf5r", "pbkrtest", "locfit", ...), you may install them from CRAN or from source. Check that the release date of each package version is consistent with the release date of your R-base version.

Notice2: Installing harmony directly from CRAN may trigger this bug when running scCombination: github/harmony/issues/. You may instead download and install the harmony source package to run scCombination with the harmony method smoothly.

Notice3: We recommend installing specific versions of the Seurat and Matrix packages.

We have tried: Seurat 4.1.1 and Matrix 1.4.1; Seurat 4.3.0 and Matrix 1.6.1

Download the source packages and install them locally:

# An example
install.packages("spatstat.core_1.65-0.tar.gz", repos = NULL, type = "source")
install.packages("Seurat_4.1.1.tar.gz", repos = NULL, type = "source")
install.packages("Matrix_1.4-1.tar.gz", repos = NULL, type = "source")
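To see whether an installed Seurat and Matrix pair matches one of the combinations listed in Notice3, a small check such as the following can be used. It is only a sketch: `testedPairs` and `isTestedPair` are hypothetical names, and a pair missing from the list is untested rather than known to fail.

```r
# Version pairs reported as tried in Notice3.
testedPairs <- list(
    Seurat = c("4.1.1", "4.3.0"),
    Matrix = c("1.4.1", "1.6.1")
)

# Hypothetical helper (not part of scCancer2): TRUE when the given pair
# of versions matches one of the tested combinations.
isTestedPair <- function(seurat, matrix) {
    any(package_version(testedPairs$Seurat) == package_version(seurat) &
        package_version(testedPairs$Matrix) == package_version(matrix))
}

# Compare the installed versions, if both packages are present.
if (requireNamespace("Seurat", quietly = TRUE) &&
    requireNamespace("Matrix", quietly = TRUE)) {
    ok <- isTestedPair(as.character(utils::packageVersion("Seurat")),
                       as.character(utils::packageVersion("Matrix")))
    message("Installed Seurat/Matrix pair is a tested combination: ", ok)
}
```

Note that `package_version` treats "1.4-1" and "1.4.1" as the same version, so the tarball naming used in the example above matches the list.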

Data sets

We have uploaded 5 recommended data sets: 3 unpublished single-sample data sets and 2 large-scale published multi-sample data sets. The first 3 are recommended for reproducing the whole pipeline because they contain fewer samples and run faster. The last 2 are recommended for the cell subtype annotation task because they contain more cell types and more cells.




CRC-example-immune (Source: GSE146771)

PAC-example-tumor (Source: CRA001160)

R module for newly implemented scRNA-seq analysis modules

If you have a processed dataset (matrix or Seurat object), you can use the cell subtype annotation and malignant cell identification modules on their own.

In that case, scStatistics and scAnnotation are not needed. See cellSubtypeAnno.Rmd and malignantCellIden.Rmd in the vignettes folder for tutorials.

Python module for malignant cell identification

If you are only interested in identifying malignant cells in your own samples, or want to reproduce Figure 4 in our manuscript, we highly recommend using the pipeline malig-pred.ipynb.

The constructed reference data set will be uploaded later.

The 5 recommended data sets above can serve as query data sets.

The trained model, sc_xgboost.model, has been integrated into the R package.

The basic processing steps rely on the scanpy, sklearn, and xgboost packages.

Because Seurat and scanpy differ, and because the parameter settings in the preprocessing steps differ, the results of malignant cell identification are slightly different in R and Python.

Report generation for scRNA-seq analysis

  1. The results of cell subtype annotation are stored in folder: cellSubtypeAnno/

  2. The results of malignant cell identification by the machine learning method are inserted directly into the original report (report-scAnno.html).

Package installation and quick start for spatial transcriptome analysis

See for details.

Most of the dependencies are the same; only copyKAT needs to be installed additionally.

# Return TRUE if the package is already installed
checkPkg <- function(pkg){
    return(requireNamespace(pkg, quietly = TRUE))
}
if(!checkPkg("copykat")) devtools::install_github("navinlabcode/copykat")

See stCancer.rmd for demos.


[1] Zeyu Chen, Yuxin Miao, Zhiyuan Tan, Qifan Hu, Yanhong Wu, Xinqi Li, Wenbo Guo, Jin Gu, scCancer2: data-driven in-depth annotations of the tumor microenvironment at single-level resolution, Bioinformatics, Volume 40, Issue 2, February 2024, btae028,

[2] Wenbo Guo, Dongfang Wang, Shicheng Wang, Yiran Shan, Changyi Liu, Jin Gu, scCancer: a package for automated processing of single-cell RNA-seq data in cancer, Briefings in Bioinformatics, Volume 22, Issue 3, May 2021, bbaa127,

