This project aims to evaluate the robustness and reliability of cell type annotations in human and mouse white adipose tissue (WAT) single-cell RNA sequencing atlases. Using the comprehensive dataset generated by Emont et al., we will assess how stable and reproducible the annotated cell types are within and across species. To this end, we will apply scTypeEval, a framework that evaluates the quality of cell type annotations by quantifying inter-sample consistency and the robustness of assigned labels. This approach provides an objective measure of how consistently each annotated cell type is represented across individuals within a species and how well the annotations generalize between human and mouse.
- How consistent are annotated cell types across individuals within each species?
- To what extent do annotated cell types generalize across species?
- How does annotation resolution (broad vs. fine-grained) affect consistency?
Alterations in adiposity are associated with dyslipidemia, insulin resistance, and type 2 diabetes. Understanding how WAT changes in these conditions, and determining whether mouse models accurately reflect human biology, can help identify the cell populations or pathways that drive disease progression. scRNA-seq atlases, such as the dataset generated by Emont et al., provide unprecedented resolution to characterize adipose tissue heterogeneity across species and physiological states. However, the reliability of biological conclusions drawn from these atlases depends critically on the robustness and consistency of their cell type annotations. Differences in annotation strategy, marker selection, and resolution can limit reproducibility and hinder meaningful cross-species comparisons. A systematic evaluation of annotation quality is therefore needed to ensure that observed similarities or differences between human and mouse WAT reflect biology rather than methodological inconsistencies.
scTypeEval is a framework for evaluating the quality of cell type annotations in single-cell RNA sequencing (scRNA-seq) data. Because ground-truth reference labels are often unavailable, it relies on internal validation metrics that measure how consistent cell type labels are across samples. The tool processes scRNA-seq data, selects informative genes (such as highly variable genes or marker genes), computes dissimilarities between cell types, and calculates consistency metrics to flag misclassified or ambiguous cell populations. scTypeEval can therefore be used to benchmark and compare manual annotations, automated classifiers, and clustering results without ground-truth labels.
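To make the idea of inter-sample consistency concrete, here is a minimal Python sketch. It is not scTypeEval's implementation or API, and the function names are hypothetical. It assumes a dense cells × genes matrix that has already been normalized and restricted to informative genes, averages expression per (cell type, sample) pair to form pseudobulk profiles, uses Euclidean distance as the dissimilarity, and scores each cell type with a silhouette-style width computed over its per-sample profiles.

```python
import numpy as np


def pseudobulk_profiles(expr, cell_types, samples):
    """Average expression for every (cell type, sample) pair present in the data.

    expr: cells x genes array; cell_types, samples: 1-D numpy arrays of labels.
    Returns the cell-type label of each profile and a (profiles x genes) matrix.
    """
    labels, profiles = [], []
    for ct in np.unique(cell_types):
        for s in np.unique(samples):
            mask = (cell_types == ct) & (samples == s)
            if mask.any():  # skip cell types absent from this sample
                labels.append(ct)
                profiles.append(expr[mask].mean(axis=0))
    return np.array(labels), np.vstack(profiles)


def consistency_scores(expr, cell_types, samples):
    """Mean silhouette width of each cell type over its per-sample profiles.

    Near 1: profiles of a type from different samples are closer to each other
    than to any other type. Near or below 0: the type overlaps another type.
    """
    labels, profiles = pseudobulk_profiles(expr, cell_types, samples)
    types = np.unique(labels)
    if types.size < 2:
        raise ValueError("at least two cell types are needed")
    # Pairwise Euclidean distances between all pseudobulk profiles.
    diff = profiles[:, None, :] - profiles[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    scores = {}
    for ct in types:
        widths = []
        for i in np.flatnonzero(labels == ct):
            same = labels == ct
            same[i] = False  # exclude the profile itself
            if not same.any():
                widths.append(0.0)  # type seen in a single sample: no within-type distance
                continue
            a = dist[i, same].mean()  # mean distance to the same type in other samples
            # Mean distance to the closest other cell type.
            b = min(dist[i, labels == other].mean() for other in types if other != ct)
            denom = max(a, b)
            widths.append((b - a) / denom if denom > 0 else 0.0)
        scores[str(ct)] = float(np.mean(widths))
    return scores
```

Pseudobulking per sample makes the individual sample, rather than the single cell, the unit of comparison, which matches the inter-sample notion of consistency described above. A label that is split from, or merged with, a neighbouring population tends to score near or below 0.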
The human lite objects are available at https://uchicago.box.com/s/bmhkw0j2qkkgnpmpz33bw583pppoib0y and the mouse lite objects at https://uchicago.box.com/s/p7r6cdbcbwqxh8lxm7frqlcx88zb0mjp. The human data were generated with single-nucleus sequencing (sNuc-seq), which enables the capture of adipocytes, on subcutaneous adipose tissue (SAT) and visceral adipose tissue (VAT). In addition, whole-cell Drop-seq (scRNA-seq) was performed on subcutaneous WAT. For this approach, single cells were isolated from the non-adipocyte stromal vascular fraction (SVF) after collagenase digestion. This method cannot capture mature adipocytes, which are too fragile to withstand the procedure.
For the mouse data, mice were fed either a chow diet or a high-fat diet (HFD) for 13 weeks. Nuclei were then profiled by sNuc-seq from two depots: inguinal adipose tissue (ING), corresponding to human SAT, and perigonadal adipose tissue (PG), corresponding to human VAT (epididymal (EPI) in males and periovarian (POV) in females).
The human metadata contain 166,149 observations (cells or nuclei) and the mouse metadata 197,721. These numbers are obtained with the script data/WAT_BroadSingleCellPortal_SCP1376/Curate_metadata.Rmd, which also writes a summary file (dataset_metadata_summary.csv).
In the human metadata:
- Technology (the two techniques used to generate the dataset):
  - Chromium-v3 (sNuc-seq): 137,684 nuclei
  - Drop-Seq: 28,465 whole cells
- Number of replicates (independent repetitions of the experiment, which support reliability and allow variability to be measured):
  - 32 samples
  - 22 individuals
- Number of genes detected per cell (nFeature_RNA variable):
  - Min.: 249 genes/cell
  - Median: 1,524 genes/cell
  - Mean: 1,753 genes/cell
  - Max.: 14,442 genes/cell
- Number of cells per replicate (sample):
  - Mean: 5,192.156 cells/replicate
  - Median: 5,090 cells/replicate
- Mitochondrial transcript percentage (mt.percent variable), which is often high in low-quality or dying cells:
  - Min.: 0.00%
  - Median: 1.34%
  - Mean: 2.00%
  - Max.: 10.00%
- Granularity (number of annotated cell types):
  - 45 cell types
In the mouse metadata:
- Technology (a single technique was used to generate the dataset):
  - Chromium-v3 (sNuc-seq): 197,721 nuclei
- Number of replicates:
  - 24 samples
  - 14 individuals (animal variable)
- Number of genes detected per cell (nFeature_RNA variable):
  - Min.: 21 genes/cell
  - Median: 1,369 genes/cell
  - Mean: 1,614 genes/cell
  - Max.: 11,061 genes/cell
- Number of cells per replicate (sample):
  - Mean: 8,238.375 cells/replicate
  - Median: 7,766 cells/replicate
- Mitochondrial transcript percentage (mt.percent variable), which is often high in low-quality or dying cells:
  - Min.: 0.00%
  - Median: 0.00%
  - Mean: 0.18%
  - Max.: 9.95%
- Granularity (number of annotated cell types):
  - 48 cell types
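
The statistics listed above can be recomputed from a per-cell metadata table. The original computation is done in R by Curate_metadata.Rmd; the following is a minimal Python re-expression using only the standard library, not that script. It assumes the metadata were exported to a CSV with one row per cell; apart from nFeature_RNA, mt.percent and animal, the column names (sample, technology, individual, cell_type) are assumptions. "Cells per replicate" is computed per sample, which is consistent with the reported means: 166,149 / 32 = 5,192.156 and 197,721 / 24 = 8,238.375.

```python
import statistics
from collections import Counter


def summarize_metadata(rows, individual_key="individual", cell_type_key="cell_type"):
    """Summarize a per-cell metadata table given as a list of dicts (one per cell).

    Column names other than nFeature_RNA and mt.percent are assumptions.
    """
    def describe(values):
        return {
            "min": min(values),
            "median": statistics.median(values),
            "mean": statistics.mean(values),
            "max": max(values),
        }

    # Number of cells (rows) belonging to each sample.
    cells_per_sample = list(Counter(r["sample"] for r in rows).values())
    return {
        "n_obs": len(rows),
        "technology": dict(Counter(r["technology"] for r in rows)),
        "n_samples": len(cells_per_sample),
        "n_individuals": len({r[individual_key] for r in rows}),
        "nFeature_RNA": describe([float(r["nFeature_RNA"]) for r in rows]),
        "cells_per_sample": {
            "mean": statistics.mean(cells_per_sample),
            "median": statistics.median(cells_per_sample),
        },
        "mt.percent": describe([float(r["mt.percent"]) for r in rows]),
        "n_cell_types": len({r[cell_type_key] for r in rows}),
    }
```

The rows can come from csv.DictReader over the exported file; for the mouse table, pass individual_key="animal".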
The preprocessing strategy standardized the single-cell RNA-seq dataset and applied quality control to ensure comparability with the other datasets in the atlas. First, the raw Seurat object was loaded, and the cell metadata were cleaned and standardized to include consistent variables such as sample, individual, tissue, and technology. Cell identifiers were reformatted into a unified structure that combines study information with the original barcode, ensuring unique cell IDs across datasets. Gene symbols were then harmonized with STACAS, which maps them to a reference gene annotation (Ensembl GRCh38) so that gene names are consistent across studies. The data were normalized with the Seurat NormalizeData function, which applies log-normalization to correct for differences in sequencing depth between cells. Quality control metrics were then evaluated, including the number of detected genes, total UMI counts, mitochondrial and ribosomal transcript percentages, and sequencing complexity, and cells that did not meet predefined thresholds were removed to exclude low-quality or stressed cells. Finally, samples with very few cells were discarded, and large samples were downsampled to balance representation across individuals.
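Several of these steps can be sketched in a few lines. The pipeline itself runs in R with Seurat and STACAS; the following is a minimal NumPy re-expression for illustration only, and gene-symbol harmonization is not reproduced. The "<study>_<barcode>" ID format, the QC thresholds (200 detected genes, 10% mitochondrial transcripts) and the sample-size bounds are hypothetical placeholders, since the actual predefined values are not stated here. log_normalize follows the LogNormalize method documented for NormalizeData: counts are divided by the cell's total, multiplied by a scale factor (10,000 by default) and transformed with log1p.

```python
import numpy as np


def make_cell_ids(study, barcodes):
    """Hypothetical unified cell ID format: '<study>_<barcode>'."""
    ids = np.array([f"{study}_{bc}" for bc in barcodes])
    if len(set(ids)) != len(ids):
        raise ValueError("cell IDs must be unique")
    return ids


def qc_mask(counts, mito_genes, min_genes=200, max_mito_pct=10.0):
    """Boolean mask of cells passing (hypothetical) QC thresholds.

    counts: raw cells x genes count matrix; mito_genes: column indices of
    mitochondrial genes.
    """
    n_genes = (counts > 0).sum(axis=1)          # detected genes per cell
    totals = np.maximum(counts.sum(axis=1), 1)  # avoid division by zero for empty cells
    mito_pct = 100 * counts[:, mito_genes].sum(axis=1) / totals
    return (n_genes >= min_genes) & (mito_pct <= max_mito_pct)


def balance_samples(samples, min_cells=50, max_cells=5000, seed=0):
    """Indices of cells kept after dropping small samples and downsampling large ones."""
    rng = np.random.default_rng(seed)
    kept = []
    for s in np.unique(samples):
        idx = np.flatnonzero(samples == s)
        if idx.size < min_cells:
            continue  # discard samples with too few cells
        if idx.size > max_cells:
            idx = rng.choice(idx, size=max_cells, replace=False)  # random downsampling
        kept.append(idx)
    return np.sort(np.concatenate(kept)) if kept else np.array([], dtype=int)


def log_normalize(counts, scale_factor=1e4):
    """LogNormalize: divide by the cell's total counts, scale, then log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * scale_factor)
```

Because LogNormalize operates on each cell independently, normalizing before or after QC filtering and downsampling gives the same values for the cells that are kept, so the order described above does not change the result.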
