scCertify is an R package for evaluating the reliability of single-cell RNA-seq annotations using explainable confidence scoring.
The framework integrates:
- UCell-based marker enrichment
- Neighborhood agreement scoring
- Entropy-based uncertainty estimation
- Doublet-aware confidence modeling
- Ontology-aware label matching
- Confidence calibration
- Explainable confidence attribution
scCertify aims to provide a biologically interpretable framework for identifying reliable and uncertain cell annotations in single-cell datasets.
Most annotation tools assign labels without estimating how trustworthy those labels are.
In real single-cell datasets, uncertainty can arise from:
- Transitional cellular states
- Technical doublets
- Sparse transcriptomic profiles
- Weak marker enrichment
- Reference atlas mismatch
- Ambiguous neighborhood structure
- Batch effects
scCertify quantifies annotation reliability and explains why cells are considered uncertain.
- Confidence scoring for single-cell annotations
- UCell-based marker enrichment scoring
- kNN neighborhood agreement analysis
- Entropy-based uncertainty estimation
- Doublet-aware confidence scoring
- Ontology-aware label matching
- Confidence calibration
- Confidence classification
- Explainable confidence attribution
- Seurat integration
- Publication-ready visualization support
Single-cell RNA-seq Data
↓
SingleR Annotation
↓
Marker Enrichment
(UCell)
↓
Neighborhood Agreement
↓
Entropy Uncertainty
↓
Doublet Detection
(scDblFinder)
↓
Confidence Calibration
↓
Explainable Confidence
↓
Final Confidence Score
+ Confidence Class
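The neighborhood agreement step in the workflow above can be illustrated with a small base-R sketch. It is not the package's implementation: `neighbor_agreement()` is a hypothetical helper, and Euclidean distance on a low-dimensional embedding is an assumption made for illustration.

```r
# Hypothetical helper (not the scCertify API): for each cell, the fraction of
# its k nearest neighbours in an embedding that carry the same label.
neighbor_agreement <- function(embedding, labels, k = 5) {
  stopifnot(nrow(embedding) == length(labels), k < nrow(embedding))
  d <- as.matrix(dist(embedding))    # pairwise Euclidean distances
  diag(d) <- Inf                     # a cell is not its own neighbour
  vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]  # indices of the k closest cells
    mean(labels[nn] == labels[i])    # share of neighbours with the same label
  }, numeric(1))
}

# Toy embedding: a "T" cluster, a "B" cluster, and one "B"-labelled cell
# that sits in the middle of the "T" cluster.
emb <- rbind(
  c(0, 0), c(0.1, 0), c(0, 0.1), c(0.1, 0.1),
  c(5, 5), c(5.1, 5), c(5, 5.1),
  c(0.05, 0.05)
)
labs <- c("T", "T", "T", "T", "B", "B", "B", "B")
agreement <- neighbor_agreement(emb, labs, k = 3)
print(round(agreement, 2))
```

A cell whose label disagrees with all of its neighbours scores 0, which is the kind of signal that would lower its confidence.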
install.packages(c(
"Seurat",
"ggplot2",
"FNN",
"entropy",
"devtools"
))
install.packages("BiocManager")
BiocManager::install(c(
"SingleR",
"celldex",
"SingleCellExperiment",
"UCell",
"scDblFinder"
))

devtools::install_github(
"Jaya-Surya-dev/scCertify"
)

library(scCertify)
library(Seurat)
library(SingleR)
library(celldex)
library(UCell)
library(scDblFinder)

data("pbmc_small")

pbmc_small <- NormalizeData(pbmc_small)
pbmc_small <- FindVariableFeatures(pbmc_small)
pbmc_small <- ScaleData(pbmc_small)
pbmc_small <- RunPCA(pbmc_small)
pbmc_small <- RunUMAP(
pbmc_small,
dims = 1:10
)

sce <- as.SingleCellExperiment(
pbmc_small
)
ref <- HumanPrimaryCellAtlasData()
pred <- SingleR(
test = sce,
ref = ref,
labels = ref$label.main
)
pbmc_small$predicted_label <- pred$labels

sce <- scDblFinder(sce)
pbmc_small$doublet_score <-
colData(sce)$scDblFinder.score
pbmc_small$doublet_class <-
colData(sce)$scDblFinder.class

markers <- list(
"B_cell" = c(
"MS4A1",
"CD79A"
),
"T_cells" = c(
"CD3D",
"IL7R"
),
"Monocyte" = c(
"LYZ",
"S100A8"
),
"NK_cell" = c(
"NKG7",
"GNLY"
),
"DC" = c(
"FCER1A",
"CST3"
),
"Platelets" = c(
"PPBP",
"PF4"
)
)

pbmc_small$entropy_score <-
entropy_score(
pred$scores
)
pbmc_small$entropy_norm <- (
pbmc_small$entropy_score -
min(pbmc_small$entropy_score)
) / (
max(pbmc_small$entropy_score) -
min(pbmc_small$entropy_score)
)

pbmc_small <- cell_certify(
pbmc_small,
markers
)

FeaturePlot(
pbmc_small,
features = "confidence_score"
)

DimPlot(
pbmc_small,
group.by = "confidence_class"
)

FeaturePlot(
pbmc_small,
features = "entropy_norm"
)

cell_id <- colnames(
pbmc_small
)[1]
explain_confidence(
pbmc_small,
cell_id
)

Example output:
[1] "Weak marker enrichment"
[2] "High annotation uncertainty"
[3] "Possible doublet contamination"
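Messages like these could plausibly be produced by simple rules over the per-cell component scores. The sketch below is only an illustration: `explain_components()`, the single `cutoff` of 0.5, and the "Low neighborhood agreement" and "No uncertainty flags raised" messages are all hypothetical and are not taken from the scCertify source.

```r
# Hypothetical rule-based explanation (not the scCertify API).
# All inputs are assumed to be single values scaled to [0, 1].
explain_components <- function(marker, neighbor, entropy_norm, doublet_prob,
                               cutoff = 0.5) {
  reasons <- character(0)
  if (marker < cutoff)       reasons <- c(reasons, "Weak marker enrichment")
  if (neighbor < cutoff)     reasons <- c(reasons, "Low neighborhood agreement")
  if (entropy_norm > cutoff) reasons <- c(reasons, "High annotation uncertainty")
  if (doublet_prob > cutoff) reasons <- c(reasons, "Possible doublet contamination")
  if (length(reasons) == 0)  reasons <- "No uncertainty flags raised"
  reasons
}

# A cell with weak markers, good neighbourhood support, high entropy,
# and an elevated doublet probability.
reasons <- explain_components(
  marker = 0.2, neighbor = 0.9, entropy_norm = 0.8, doublet_prob = 0.7
)
print(reasons)
```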
The current confidence model integrates:
- Marker enrichment
- Neighborhood agreement
- Entropy certainty
- Doublet probability
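Of the components listed above, entropy certainty is the least self-explanatory, so a minimal sketch may help. It assumes, purely for illustration, that per-label scores are turned into probabilities with a softmax and that certainty is one minus the normalized Shannon entropy. `entropy_certainty()` is a hypothetical helper; the package's own `entropy_score()` may work differently.

```r
# Hypothetical helper (not the scCertify API).
# scores: cells x labels matrix of annotation scores.
entropy_certainty <- function(scores) {
  scores <- as.matrix(scores)
  # Softmax per row (an assumption: raw classifier scores are not probabilities).
  shifted <- scores - apply(scores, 1, max)    # subtract row max for stability
  p <- exp(shifted) / rowSums(exp(shifted))
  # Shannon entropy per cell, treating 0 * log(0) as 0.
  h <- -rowSums(ifelse(p > 0, p * log(p), 0))
  # Divide by the maximum entropy log(K): 1 = one label dominates,
  # 0 = all labels equally likely.
  1 - h / log(ncol(p))
}

toy_scores <- rbind(
  confident = c(10, 0, 0),   # one label clearly wins
  ambiguous = c(1, 1, 1)     # all labels tie
)
certainty <- entropy_certainty(toy_scores)
print(round(certainty, 3))
```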
Current scoring framework:
Confidence = 0.35(Marker Score) + 0.35(Neighborhood Agreement) + 0.20(Entropy Certainty) - 0.10(Doublet Probability)
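The weighted sum above can be written out directly in R. In this sketch the weights come from the formula, while the function names, the clamping to [0, 1], and the class thresholds (0.3 and 0.6) are assumptions for illustration, not the behaviour of `cell_certify()` or `classify_confidence.R`.

```r
# Hypothetical helper (not the scCertify API).
# All four inputs are assumed to be numeric vectors scaled to [0, 1].
combine_confidence <- function(marker, neighbor, entropy_certainty, doublet_prob) {
  raw <- 0.35 * marker +
    0.35 * neighbor +
    0.20 * entropy_certainty -
    0.10 * doublet_prob
  pmin(pmax(raw, 0), 1)   # clamping is an assumption; raw lies in [-0.10, 0.90]
}

# Hypothetical thresholds for turning a score into a class.
classify_score <- function(score, low = 0.3, high = 0.6) {
  cut(score, breaks = c(-Inf, low, high, Inf),
      labels = c("Low", "Medium", "High"))
}

# Two toy cells: one well supported, one weakly supported and doublet-like.
score <- combine_confidence(
  marker            = c(0.9, 0.2),
  neighbor          = c(1.0, 0.3),
  entropy_certainty = c(0.8, 0.1),
  doublet_prob      = c(0.05, 0.9)
)
confidence_class <- classify_score(score)
print(data.frame(score, confidence_class))
```

Because the positive weights sum to 0.90, a cell cannot reach a score of 1 under this formula even with perfect inputs, which is worth keeping in mind when choosing class thresholds.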
scCertify/
├── R/
│ ├── calibrate_confidence.R
│ ├── cell_certify.R
│ ├── classify_confidence.R
│ ├── entropy_score.R
│ ├── explain_cell.R
│ ├── explain_confidence.R
│ ├── marker_score.R
│ ├── match_labels.R
│ └── neighbor_score.R
│
├── man/
├── DESCRIPTION
├── NAMESPACE
├── README.md
└── LICENSE
- Trajectory-aware confidence scoring
- Multimodal confidence integration
- Spatial transcriptomics support
- Automatic marker retrieval
- Cell ontology integration
- Batch-aware confidence estimation
- Calibration models
- Benchmarking framework
- Explainable AI visualization
- Atlas-scale optimization
If you use scCertify in your work, please cite:
Doddetipalli JS.
scCertify: Explainable confidence scoring for
single-cell RNA-seq annotations.
Jaya Surya Doddetipalli
MIT License


