Skip to content

WangX-Lab/PreCanCell

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

69 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PreCanCell

PreCanCell is a simple and effective ensemble learning algorithm for predicting cancer and non-cancer cells from single-cell transcriptomes. PreCanCell first identified the differentially expressed genes (DEGs) between malignant and non-malignant cells commonly in five common cancer-associated single-cell transcriptome datasets. With each of the five datasets as the training set and the DEGs as the features, a single cell is classified as malignant or non-malignant by k-NN (k = 5). Finally, the single cell is classified by the majority vote of the five k-NN classification results.

image

 

Details

  • The function PreCanCell_data() is used to data preprocessing. Its input should be normalized expression matrix with rownames being genes and colnames being cells. The input data can be any library-depth normalization (e.g. TPM, CPM).
  • The function PreCanCell_classifier() is used to identify malignant and non-malignant cells from single-cell transcriptomes, containing 2 parameters: testdata and cores.
    • "testdata" is a output matrix of the function PreCanCell_data().
    • "cores" is the number of threads.

 

Installation

  • The Seurat package (version >= 4.3.0) is used for data preprocessing.

  • To install PreCanCell , first install the fastknn package, which can be installed as follows:  

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("davpinto/fastknn")
  • Finally, users can install the released version of PreCanCell with:  
devtools::install_github("WangX-Lab/PreCanCell")

 

Examples

Prepare data:  

library(PreCanCell)
path <- system.file("extdata", "example.txt", package = "PreCanCell", mustWork = TRUE)
input <- read.table(path, stringsAsFactors = FALSE, header = TRUE, check.names = FALSE, sep = "\t", row.names = 1)
input[1:5,1:5]
0 1 2 3 4
SERINC2 0.00 0.00 0.00 0.00 0.00
PTPRF 0.00 0.00 0.00 0.00 0.00
S100A1 0.00 0.00 0.00 0.00 0.00
EFNA1 0.00 0.00 0.00 6.33 0.00
SOX4 2.59 0.00 0.00 5.23 0.00

Data preprocessing (select matched genes and [0,1]-scaled gene expression values):  

testdata <- PreCanCell_data(input)
testdata[1:5,1:5]
SERINC2 PTPRF S100A16 EFNA1 SOX4
0 0.00 0.00 0.00 0.00 0.46
1 0.00 0.00 0.00 0.00 0.00
2 0.00 0.00 0.00 0.00 0.00
3 0.00 0.00 0.00 1.00 0.94
4 0.00 0.00 0.00 0.00 0.00

Prediction of malignant and non-malignant cells: cores represents the number of cores to use for parallel execution.  

results <- PreCanCell_classifier(testdata, cores = 2)
head(results)
Sample freq_cancer freq_non_cancer pred_label
0 1 0 cancer
1 0 1 non_cancer
2 0.8 0.2 cancer
3 1 0 cancer
4 0 1 non_cancer
5 0 1 non_cancer

Vignettes

Predicting malignant and non-malignant cells from single-cell transcriptomes

 

Contact

E-mail any questions to Xiaosheng Wang (xiaosheng.wang@cpu.edu.cn)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages