Cell type is lost during annotateNhoods #327

carlacohen · 2024-05-16T13:07:52Z

Describe the bug
I have used miloR before and have got it to work fine, thanks it is a great tool!

Now, I have a single nuc RNAseq dataset comprising 6 subtypes of fibroblasts. I have performed the miloR workflow and when I use the function "annotateNhoods" I only see 5 subtypes in the annotation column of da_results. The smallest subtype (myofibroblasts) has disappeared.

Is there a reason why the annotation isn't working properly, and could I alter some parameters to be able to rectify this?
Any help greatly appreciated! Thanks

Minimum code example
Minimum example to reproduce the error

# Seurat object
so.fibroblasts
# NB the idents are: 
[1] NEGR1hi_tendon_Fibroblasts         ITGA10hi_COMPhi_Tendon_fibroblasts Pericytes                         
[4] NEGR1hi_Muscle_Fibroblasts         Myofibroblasts                     ACAN1hi_Chondrocytes              
6 Levels: NEGR1hi_tendon_Fibroblasts ITGA10hi_COMPhi_Tendon_fibroblasts NEGR1hi_Muscle_Fibroblasts ... Pericytes

# Convert to single cell experiment object  
sce <- as.SingleCellExperiment(so.fibroblasts)

# Create a milo object
achilles_milo <- Milo(sce)

# construct KNN graph
achilles_milo <- buildGraph(achilles_milo, k = 30, d = 30, 
                            reduced.dim = ini$dim_reduction)

# define representative neighbourhoods
achilles_milo <- makeNhoods(achilles_milo, 
                                    prop = 0.1, #if ncells<30k use prop=0.1 
                                    # in the paper they use 0.3, and have 90k cells
                            k = 30, d=30, refined = TRUE, reduced_dims = ini$dim_reduction) # paper uses "PCA"

# compute neighbourhood connectivity
achilles_milo <- calcNhoodDistance(achilles_milo, d=30, 
                                   reduced.dim = ini$dim_reduction)

# Count how many cells are in each neighbourhood per sample
achilles_milo <- countCells(achilles_milo, 
                            meta.data = as.data.frame(colData(achilles_milo)),
                            sample="sample")

# Set up experimental design for differential abundance testing

achilles_design <- data.frame(colData(achilles_milo))[,c("sample", "patient", "microanatomical_site")]

## Convert batch info from integer to factor
achilles_design$patient <- as.factor(achilles_design$patient) 
achilles_design <- distinct(achilles_design) # keep unique rows
rownames(achilles_design) <- achilles_design$sample

# add a column to represent microanatomical site as an integer 
# 1 = enth, 2 = MB, 3 = MTJ, 4 = muscle
m.df <- data.frame(microanatomical_site = c("Enth", "MB", "MTJ", "muscle"),
                   microanatomy_integer = c(1, 2, 3, 4))

achilles_design <- left_join(achilles_design, m.df)

# add rownames (sample names)
rownames(achilles_design) <- achilles_design$sample

# make sure the design table is in the same order as the nhoodCounts table
achilles_design <- achilles_design[colnames(nhoodCounts(achilles_milo)), , drop=FALSE]

# Perform the differential abundance test  
da_results <- testNhoods(achilles_milo, 
                         design = as.formula(ini$design_formula),
                         design.df=achilles_design, 
                         fdr.weighting="k-distance", # default
                         reduced.dim = ini$dim_reduction
                         )

da_results$Diff <- sign(da_results$logFC) # add a column to show up or down regulation
da_results$Diff[da_results$SpatialFDR > 0.1] <- 0 # made that column 0 if the result is not significant
table(da_results$Diff)

# Build neihbourhood graph
achilles_milo <- buildNhoodGraph(achilles_milo, overlap=5)

# plot nhood graph by DA
plotNhoodGraphDA(achilles_milo, layout="UMAP", milo_res=da_results, alpha=0.1)+
    scale_colour_gradient2(low = "#00466B", mid = "#FFFFFF", high = "#A91400", midpoint = 0, name = "logFC")

# plot nhood graph by original cluster
plotNhoodGraph(achilles_milo, layout="UMAP", colour_by="cell_annotation_fibroblasts") +   
    scale_fill_manual(values=achilles_fibroblasts.colours) +
    guides(fill=guide_legend(title="Cell type", override.aes=list(size=4)))

# Annotate the neighbourhoods & plot the homogeneity  
da_results <- annotateNhoods(achilles_milo, da_results, coldata_col = "cell_annotation_fibroblasts")
ggplot(da_results, aes(cell_annotation_fibroblasts_fraction)) + geom_histogram(bins=50)

# Annotate as "Mixed" if there are a mixture of cell types
da_results$cell_annotation_fibroblasts <- ifelse(da_results$cell_annotation_fibroblasts_fraction < 0.7, "Mixed", da_results$cell_annotation_fibroblasts)

# Visualise the fold changes in different cell types
plotDAbeeswarm(da_results, group.by = "cell_annotation_fibroblasts")+
    theme(axis.title.y=element_blank())+
    scale_fill_gradient2(low = "#00466B", mid = "#FFFFFF", high = "#A91400", midpoint = 0, name = "logFC")

Full error traceback

## the error
No error message but the output of unique(da_results$cell_annotation_fibroblasts) is 
[1] "NEGR1hi_tendon_Fibroblasts"         "ITGA10hi_COMPhi_Tendon_fibroblasts" "ACAN1hi_Chondrocytes"              
[4] "NEGR1hi_Muscle_Fibroblasts"         "Pericytes"              "Mixed"

I have checked 
- all 6 cell types are present in the sce and milo objects
- the myofibroblasts are missing before the "Mixed" category is added.

Session info
Output of sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS: /ceph/package/u22/R-base/4.3.0/lib/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3; LAPACK version 3.10.0

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] pheatmap_1.0.12 DEGreport_1.36.0 DESeq2_1.40.1 scran_1.28.1
[5] scater_1.28.0 scuttle_1.10.1 SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.2
[9] Biobase_2.60.0 GenomicRanges_1.52.0 GenomeInfoDb_1.36.1 IRanges_2.34.1
[13] S4Vectors_0.38.1 BiocGenerics_0.46.0 MatrixGenerics_1.12.2 matrixStats_1.0.0
[17] miloR_1.8.1 edgeR_3.42.2 limma_3.56.1 clustree_0.5.0
[21] ggraph_2.1.0 svglite_2.1.1 RColorBrewer_1.1-3 ggVennDiagram_1.2.2
[25] EnhancedVolcano_1.18.0 ggrepel_0.9.3 viridis_0.6.3 viridisLite_0.4.2
[29] yaml_2.3.7 cowplot_1.1.1 SeuratObject_4.1.3 Seurat_4.3.0.1
[33] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2
[37] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
[41] ggplot2_3.4.2 tidyverse_2.0.0 later_1.3.2

loaded via [1] spatstat.sparse_3.0-2 [5] backports_1.4.1 [9] R6_2.5.1 [13] withr_2.5.0 [17] cli_3.6.2 [21] labeling_0.4.2 [25] systemfonts_1.0.4 [29] shape_1.4.6 [33] Matrix_1.5-4.1 [37] lifecycle_1.0.3 [41] dqrng_0.3.0 [45] beachmat_2.16.0 [49] metapod_1.8.0 [53] leiden_0.4.3 [57] png_0.1-8 [61] mime_0.12 [65] survival_3.5-5 [69] ellipsis_0.3.2 [73] RcppAnnoy_0.0.20 [77] colorspace_2.1-0 [81] compiler_4.3.0 [85] plotly_4.10.2 [89] digest_0.6.32 [93] XVector_0.40.0 [97] fastmap_1.1.1 [101] shiny_1.7.4 [105] jsonlite_1.8.5 [109] magrittr_2.0.3 [113] Rcpp_1.0.10 [117] MASS_7.3-60 [121] deldir_1.0-9 [125] hms_1.1.3 [129] spatstat.geom_3.2-1 [133] tzdb_0.4.0 [137] RANN_2.6.1 [141] clue_0.3-64 [145] broom_1.0.4 [149] cluster_2.1.4 a namespace (and not attached):
bitops_1.0-7 httr_1.4.6 doParallel_1.0.17
tools_4.3.0 sctransform_0.3.5 utf8_1.2.3
lazyeval_0.2.2 uwot_0.1.15 GetoptLong_1.0.5
sp_2.1-1 gridExtra_2.3 progressr_0.13.0
textshaping_0.3.6 logging_0.10-108 spatstat.explore_3.2-1
spatstat.data_3.0-1 ggridges_0.5.4 pbapply_1.7-2
parallelly_1.36.0 rstudioapi_0.16.0 generics_0.1.3
gtools_3.9.4 ica_1.0-3 spatstat.random_3.1-5
ggbeeswarm_0.7.2 fansi_1.0.4 abind_1.4-5
Rtsne_0.16 grid_4.3.0 promises_1.2.0.1
crayon_1.5.2 miniUI_0.1.1.1 lattice_0.21-8
pillar_1.9.0 knitr_1.43 ComplexHeatmap_2.16.0
rjson_0.2.21 future.apply_1.11.0 codetools_0.2-19
glue_1.6.2 data.table_1.14.8 vctrs_0.6.3
gtable_0.3.3 xfun_0.39 S4Arrays_1.2.0
tidygraph_1.2.3 ConsensusClusterPlus_1.64.0 RVenn_1.1.0
iterators_1.0.14 statmod_1.5.0 bluster_1.10.0
fitdistrplus_1.1-11 ROCR_1.0-11 nlme_3.1-162
irlba_2.3.5.1 vipor_0.4.5 KernSmooth_2.23-21
mnormt_2.1.1 ggrastr_1.0.1 tidyselect_1.2.0
BiocNeighbors_1.18.0 ggdendro_0.1.23 DelayedArray_0.26.3
scales_1.2.1 psych_2.3.3 lmtest_0.9-40
goftest_1.2-3 spatstat.utils_3.0-3 rmarkdown_2.22
htmltools_0.5.5 pkgconfig_2.0.3 sparseMatrixStats_1.12.1
rlang_1.1.3 GlobalOptions_0.1.2 htmlwidgets_1.6.2
DelayedMatrixStats_1.22.1 farver_2.1.1 zoo_1.8-12
BiocParallel_1.34.2 BiocSingular_1.16.0 RCurl_1.98-1.12
GenomeInfoDbData_1.2.10 patchwork_1.1.2 munsell_0.5.0
reticulate_1.34.0 stringi_1.7.12 zlibbioc_1.46.0
plyr_1.8.8 parallel_4.3.0 listenv_0.9.0
graphlayouts_1.0.0 splines_4.3.0 tensor_1.5
circlize_0.4.15 locfit_1.5-9.7 igraph_1.5.0
reshape2_1.4.4 ScaledMatrix_1.8.1 evaluate_0.21
foreach_1.5.2 tweenr_2.0.2 httpuv_1.6.11
polyclip_1.10-4 reshape_0.8.9 future_1.32.0
scattermore_1.2 ggforce_0.4.1 rsvd_1.0.5
xtable_1.8-4 ragg_1.2.5 beeswarm_0.4.0
timechange_0.2.0 globals_0.16.2

The text was updated successfully, but these errors were encountered:

MikeDMorgan · 2024-05-16T13:19:51Z

Hi @carlacohen - the neighbourhood annotation is based on a majority vote of cells in each neighbourhood - therefore, if there if the missing cell types are in nhoods with other cell types, but don't reach a majority, then they won't be observed at the nhood level. You should be able to see this in the output of annotateNhood in the {variable}_col_fraction column.

You can double-check this yourself, but computing for each nhood the proportion of each cell type, and see that the missing cell type is never in the majority.

carlacohen · 2024-05-16T13:29:40Z

Thanks for the quick reply! That makes sense, I guess this is a caveat of having a very rare population of cells in the analysis.
Do you think reducing the size of neighbourhoods would help or will I never really be able to pull them out (the myofibroblast cluster only comprises 341 cells, and there are ~25k cells in the object)?

MikeDMorgan · 2024-05-16T13:54:10Z

You would need to reduce k if these cells are not well separated in the NN-graph, bearing in mind the need to balance sufficient counts in each nhood for the DA testing.

MikeDMorgan · 2024-06-03T15:17:30Z

Hi @carlacohen did this fix your issue?

carlacohen · 2024-06-04T10:18:15Z

Hi @MikeDMorgan yes I was able to tweak the parameters and create neighbourhoods that represented the missing cell type. Thanks for your help

MikeDMorgan closed this as completed Jun 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cell type is lost during annotateNhoods #327

Cell type is lost during annotateNhoods #327

carlacohen commented May 16, 2024

MikeDMorgan commented May 16, 2024

carlacohen commented May 16, 2024

MikeDMorgan commented May 16, 2024

MikeDMorgan commented Jun 3, 2024

carlacohen commented Jun 4, 2024

Cell type is lost during annotateNhoods #327

Cell type is lost during annotateNhoods #327

Comments

carlacohen commented May 16, 2024

MikeDMorgan commented May 16, 2024

carlacohen commented May 16, 2024

MikeDMorgan commented May 16, 2024

MikeDMorgan commented Jun 3, 2024

carlacohen commented Jun 4, 2024