Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cell type is lost during annotateNhoods #327

Closed
carlacohen opened this issue May 16, 2024 · 5 comments
Closed

Cell type is lost during annotateNhoods #327

carlacohen opened this issue May 16, 2024 · 5 comments

Comments

@carlacohen
Copy link

Describe the bug
I have used miloR before and have got it to work fine, thanks it is a great tool!

Now, I have a single nuc RNAseq dataset comprising 6 subtypes of fibroblasts. I have performed the miloR workflow and when I use the function "annotateNhoods" I only see 5 subtypes in the annotation column of da_results. The smallest subtype (myofibroblasts) has disappeared.

Is there a reason why the annotation isn't working properly, and could I alter some parameters to be able to rectify this?
Any help greatly appreciated! Thanks

Minimum code example
Minimum example to reproduce the error

# Seurat object
so.fibroblasts
# NB the idents are: 
[1] NEGR1hi_tendon_Fibroblasts         ITGA10hi_COMPhi_Tendon_fibroblasts Pericytes                         
[4] NEGR1hi_Muscle_Fibroblasts         Myofibroblasts                     ACAN1hi_Chondrocytes              
6 Levels: NEGR1hi_tendon_Fibroblasts ITGA10hi_COMPhi_Tendon_fibroblasts NEGR1hi_Muscle_Fibroblasts ... Pericytes

# Convert to single cell experiment object  
sce <- as.SingleCellExperiment(so.fibroblasts)

# Create a milo object
achilles_milo <- Milo(sce)

# construct KNN graph
achilles_milo <- buildGraph(achilles_milo, k = 30, d = 30, 
                            reduced.dim = ini$dim_reduction)

# define representative neighbourhoods
achilles_milo <- makeNhoods(achilles_milo, 
                                    prop = 0.1, #if ncells<30k use prop=0.1 
                                    # in the paper they use 0.3, and have 90k cells
                            k = 30, d=30, refined = TRUE, reduced_dims = ini$dim_reduction) # paper uses "PCA"

# compute neighbourhood connectivity
achilles_milo <- calcNhoodDistance(achilles_milo, d=30, 
                                   reduced.dim = ini$dim_reduction)

# Count how many cells are in each neighbourhood per sample
achilles_milo <- countCells(achilles_milo, 
                            meta.data = as.data.frame(colData(achilles_milo)),
                            sample="sample")

# Set up experimental design for differential abundance testing

achilles_design <- data.frame(colData(achilles_milo))[,c("sample", "patient", "microanatomical_site")]

## Convert batch info from integer to factor
achilles_design$patient <- as.factor(achilles_design$patient) 
achilles_design <- distinct(achilles_design) # keep unique rows
rownames(achilles_design) <- achilles_design$sample

# add a column to represent microanatomical site as an integer 
# 1 = enth, 2 = MB, 3 = MTJ, 4 = muscle
m.df <- data.frame(microanatomical_site = c("Enth", "MB", "MTJ", "muscle"),
                   microanatomy_integer = c(1, 2, 3, 4))

achilles_design <- left_join(achilles_design, m.df)

# add rownames (sample names)
rownames(achilles_design) <- achilles_design$sample

# make sure the design table is in the same order as the nhoodCounts table
achilles_design <- achilles_design[colnames(nhoodCounts(achilles_milo)), , drop=FALSE]

# Perform the differential abundance test  
da_results <- testNhoods(achilles_milo, 
                         design = as.formula(ini$design_formula),
                         design.df=achilles_design, 
                         fdr.weighting="k-distance", # default
                         reduced.dim = ini$dim_reduction
                         )

da_results$Diff <- sign(da_results$logFC) # add a column to show up or down regulation
da_results$Diff[da_results$SpatialFDR > 0.1] <- 0 # made that column 0 if the result is not significant
table(da_results$Diff)

# Build neihbourhood graph
achilles_milo <- buildNhoodGraph(achilles_milo, overlap=5)

# plot nhood graph by DA
plotNhoodGraphDA(achilles_milo, layout="UMAP", milo_res=da_results, alpha=0.1)+
    scale_colour_gradient2(low = "#00466B", mid = "#FFFFFF", high = "#A91400", midpoint = 0, name = "logFC")

# plot nhood graph by original cluster
plotNhoodGraph(achilles_milo, layout="UMAP", colour_by="cell_annotation_fibroblasts") +   
    scale_fill_manual(values=achilles_fibroblasts.colours) +
    guides(fill=guide_legend(title="Cell type", override.aes=list(size=4)))

# Annotate the neighbourhoods & plot the homogeneity  
da_results <- annotateNhoods(achilles_milo, da_results, coldata_col = "cell_annotation_fibroblasts")
ggplot(da_results, aes(cell_annotation_fibroblasts_fraction)) + geom_histogram(bins=50)

# Annotate as "Mixed" if there are a mixture of cell types
da_results$cell_annotation_fibroblasts <- ifelse(da_results$cell_annotation_fibroblasts_fraction < 0.7, "Mixed", da_results$cell_annotation_fibroblasts)

# Visualise the fold changes in different cell types
plotDAbeeswarm(da_results, group.by = "cell_annotation_fibroblasts")+
    theme(axis.title.y=element_blank())+
    scale_fill_gradient2(low = "#00466B", mid = "#FFFFFF", high = "#A91400", midpoint = 0, name = "logFC")

Full error traceback

## the error
No error message but the output of unique(da_results$cell_annotation_fibroblasts) is 
[1] "NEGR1hi_tendon_Fibroblasts"         "ITGA10hi_COMPhi_Tendon_fibroblasts" "ACAN1hi_Chondrocytes"              
[4] "NEGR1hi_Muscle_Fibroblasts"         "Pericytes"              "Mixed"

I have checked 
- all 6 cell types are present in the sce and milo objects
- the myofibroblasts are missing before the "Mixed" category is added. 

Session info
Output of sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS: /ceph/package/u22/R-base/4.3.0/lib/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3; LAPACK version 3.10.0

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] pheatmap_1.0.12 DEGreport_1.36.0 DESeq2_1.40.1 scran_1.28.1
[5] scater_1.28.0 scuttle_1.10.1 SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.2
[9] Biobase_2.60.0 GenomicRanges_1.52.0 GenomeInfoDb_1.36.1 IRanges_2.34.1
[13] S4Vectors_0.38.1 BiocGenerics_0.46.0 MatrixGenerics_1.12.2 matrixStats_1.0.0
[17] miloR_1.8.1 edgeR_3.42.2 limma_3.56.1 clustree_0.5.0
[21] ggraph_2.1.0 svglite_2.1.1 RColorBrewer_1.1-3 ggVennDiagram_1.2.2
[25] EnhancedVolcano_1.18.0 ggrepel_0.9.3 viridis_0.6.3 viridisLite_0.4.2
[29] yaml_2.3.7 cowplot_1.1.1 SeuratObject_4.1.3 Seurat_4.3.0.1
[33] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2
[37] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
[41] ggplot2_3.4.2 tidyverse_2.0.0 later_1.3.2

loaded via a namespace (and not attached):
[1] spatstat.sparse_3.0-2 bitops_1.0-7 httr_1.4.6 doParallel_1.0.17
[5] backports_1.4.1 tools_4.3.0 sctransform_0.3.5 utf8_1.2.3
[9] R6_2.5.1 lazyeval_0.2.2 uwot_0.1.15 GetoptLong_1.0.5
[13] withr_2.5.0 sp_2.1-1 gridExtra_2.3 progressr_0.13.0
[17] cli_3.6.2 textshaping_0.3.6 logging_0.10-108 spatstat.explore_3.2-1
[21] labeling_0.4.2 spatstat.data_3.0-1 ggridges_0.5.4 pbapply_1.7-2
[25] systemfonts_1.0.4 parallelly_1.36.0 rstudioapi_0.16.0 generics_0.1.3
[29] shape_1.4.6 gtools_3.9.4 ica_1.0-3 spatstat.random_3.1-5
[33] Matrix_1.5-4.1 ggbeeswarm_0.7.2 fansi_1.0.4 abind_1.4-5
[37] lifecycle_1.0.3 Rtsne_0.16 grid_4.3.0 promises_1.2.0.1
[41] dqrng_0.3.0 crayon_1.5.2 miniUI_0.1.1.1 lattice_0.21-8
[45] beachmat_2.16.0 pillar_1.9.0 knitr_1.43 ComplexHeatmap_2.16.0
[49] metapod_1.8.0 rjson_0.2.21 future.apply_1.11.0 codetools_0.2-19
[53] leiden_0.4.3 glue_1.6.2 data.table_1.14.8 vctrs_0.6.3
[57] png_0.1-8 gtable_0.3.3 xfun_0.39 S4Arrays_1.2.0
[61] mime_0.12 tidygraph_1.2.3 ConsensusClusterPlus_1.64.0 RVenn_1.1.0
[65] survival_3.5-5 iterators_1.0.14 statmod_1.5.0 bluster_1.10.0
[69] ellipsis_0.3.2 fitdistrplus_1.1-11 ROCR_1.0-11 nlme_3.1-162
[73] RcppAnnoy_0.0.20 irlba_2.3.5.1 vipor_0.4.5 KernSmooth_2.23-21
[77] colorspace_2.1-0 mnormt_2.1.1 ggrastr_1.0.1 tidyselect_1.2.0
[81] compiler_4.3.0 BiocNeighbors_1.18.0 ggdendro_0.1.23 DelayedArray_0.26.3
[85] plotly_4.10.2 scales_1.2.1 psych_2.3.3 lmtest_0.9-40
[89] digest_0.6.32 goftest_1.2-3 spatstat.utils_3.0-3 rmarkdown_2.22
[93] XVector_0.40.0 htmltools_0.5.5 pkgconfig_2.0.3 sparseMatrixStats_1.12.1
[97] fastmap_1.1.1 rlang_1.1.3 GlobalOptions_0.1.2 htmlwidgets_1.6.2
[101] shiny_1.7.4 DelayedMatrixStats_1.22.1 farver_2.1.1 zoo_1.8-12
[105] jsonlite_1.8.5 BiocParallel_1.34.2 BiocSingular_1.16.0 RCurl_1.98-1.12
[109] magrittr_2.0.3 GenomeInfoDbData_1.2.10 patchwork_1.1.2 munsell_0.5.0
[113] Rcpp_1.0.10 reticulate_1.34.0 stringi_1.7.12 zlibbioc_1.46.0
[117] MASS_7.3-60 plyr_1.8.8 parallel_4.3.0 listenv_0.9.0
[121] deldir_1.0-9 graphlayouts_1.0.0 splines_4.3.0 tensor_1.5
[125] hms_1.1.3 circlize_0.4.15 locfit_1.5-9.7 igraph_1.5.0
[129] spatstat.geom_3.2-1 reshape2_1.4.4 ScaledMatrix_1.8.1 evaluate_0.21
[133] tzdb_0.4.0 foreach_1.5.2 tweenr_2.0.2 httpuv_1.6.11
[137] RANN_2.6.1 polyclip_1.10-4 reshape_0.8.9 future_1.32.0
[141] clue_0.3-64 scattermore_1.2 ggforce_0.4.1 rsvd_1.0.5
[145] broom_1.0.4 xtable_1.8-4 ragg_1.2.5 beeswarm_0.4.0
[149] cluster_2.1.4 timechange_0.2.0 globals_0.16.2

@MikeDMorgan
Copy link
Member

Hi @carlacohen - the neighbourhood annotation is based on a majority vote of cells in each neighbourhood - therefore, if there if the missing cell types are in nhoods with other cell types, but don't reach a majority, then they won't be observed at the nhood level. You should be able to see this in the output of annotateNhood in the {variable}_col_fraction column.

You can double-check this yourself, but computing for each nhood the proportion of each cell type, and see that the missing cell type is never in the majority.

@carlacohen
Copy link
Author

Thanks for the quick reply! That makes sense, I guess this is a caveat of having a very rare population of cells in the analysis.
Do you think reducing the size of neighbourhoods would help or will I never really be able to pull them out (the myofibroblast cluster only comprises 341 cells, and there are ~25k cells in the object)?

@MikeDMorgan
Copy link
Member

You would need to reduce k if these cells are not well separated in the NN-graph, bearing in mind the need to balance sufficient counts in each nhood for the DA testing.

@MikeDMorgan
Copy link
Member

Hi @carlacohen did this fix your issue?

@carlacohen
Copy link
Author

Hi @MikeDMorgan yes I was able to tweak the parameters and create neighbourhoods that represented the missing cell type. Thanks for your help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants