Error in scDEED #2
Dear Chit Tong,
Thank you for your interest in scDEED. The error messages are due to the update of Seurat from V4 to V5. We are fixing this issue, and we will reply to you as soon as we fix it. Thanks.
On Tue, Nov 7, 2023 at 7:39 AM Chit Tong Lio ***@***.***> wrote:
Hi scDeed authors,
Thank you for the cool tool, but I couldn't get it to run. I tried using the example data and data from 10x:
1. When I run umap_example <- scDEED(input_counts, num_pc = 16, use_method = "umap", visualization = TRUE), it returned an error:
Error in identical(input_data, input_counts) && perplexity == c(seq(from = 20, : 'length = 22' in coercion to 'logical(1)'
Traceback:
1. scDEED(input_counts, num_pc = 16, use_method = "umap", visualization = TRUE)
2. When I use data from 10x and run scDEED, it gives me this error, which is encountered after the message 'Optimization finished':
Error in Permuted(input_data): no slot of name "scale.data" for this object of class "Assay5"
Traceback:
1. scDEED(brain_sce, num_pc = 15, use_method = "umap", visualization = TRUE)
2. Permuted(input_data)
I was actually running this on a spatial data matrix (filtered):
https://www.10xgenomics.com/resources/datasets/mouse-brain-serial-section-2-sagittal-anterior-1-standard-1-1-0
Thank you!
Best,
Chit Tong
This is the R environment I am using:
R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS
Matrix products: default
BLAS/LAPACK: /nfs/home/students/chit/.conda/envs/nease/envs/sc/lib/libopenblasp-r0.3.24.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Berlin
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] devtools_2.4.5 usethis_2.2.2
loaded via a namespace (and not attached):
[1] miniUI_0.1.1.1 compiler_4.3.1 crayon_1.5.2 promises_1.2.1
[5] Rcpp_1.0.11 stringr_1.5.0 callr_3.7.3 later_1.3.1
[9] fastmap_1.1.1 mime_0.12 R6_2.5.1 curl_5.1.0
[13] htmlwidgets_1.6.2 desc_1.4.2 profvis_0.3.8 rprojroot_2.0.4
[17] shiny_1.7.5.1 rlang_1.1.2 cachem_1.0.8 stringi_1.7.12
[21] httpuv_1.6.12 fs_1.6.3 pkgload_1.3.3 memoise_2.0.1
[25] cli_3.6.1 magrittr_2.0.3 ps_1.7.5 digest_0.6.33
[29] processx_3.8.2 xtable_1.8-4 remotes_2.4.2.1 lifecycle_1.0.4
[33] prettyunits_1.2.0 vctrs_0.6.4 glue_1.6.2 urlchecker_1.0.1
[37] sessioninfo_1.2.2 pkgbuild_1.4.2 purrr_1.0.2 tools_4.3.1
[41] ellipsis_0.3.2 htmltools_0.5.7
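The first error above appears to come from R's length check on &&: since R 4.3.0 (the session info shows R 4.3.1), && and || raise an error when either operand has length greater than one, and perplexity == c(seq(from = 20, ...)) compares element-wise, giving a length-22 logical vector. The base-R sketch below reproduces this. The perplexity grid is a hypothetical stand-in (the package's real default is cut off in the error message), TRUE stands in for identical(input_data, input_counts), and the helper uses_default_grid is hypothetical. It is one length-safe way to write such a check, not the scDEED fix.

```r
# Hypothetical stand-in for a default perplexity grid of length 22;
# the real scDEED default is truncated in the error message above.
default_perplexity <- seq(from = 20, by = 10, length.out = 22)
perplexity <- default_perplexity

# TRUE stands in for identical(input_data, input_counts).
# `==` compares element-wise, so the right-hand side of && has length 22;
# R >= 4.3.0 turns this into an error instead of using the first element.
old_check <- tryCatch(
  TRUE && perplexity == default_perplexity,
  error = function(e) conditionMessage(e)
)
print(old_check)

# Length-safe alternative (hypothetical helper): reduce the comparison
# to a single TRUE/FALSE before it reaches &&.
uses_default_grid <- function(perplexity, default_perplexity) {
  length(perplexity) == length(default_perplexity) &&
    all(perplexity == default_perplexity)
}

print(uses_default_grid(perplexity, default_perplexity))
print(uses_default_grid(c(30, 50), default_perplexity))
```

The length comparison comes first so that && short-circuits before all() sees vectors of different lengths.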
--
Best,
Jessica
-----------------------------------------------
Jingyi Jessica Li (李婧翌), Ph.D.
Professor
Department of Statistics and Data Science (Primary)
Departments of Biostatistics, Computational Medicine, and Human Genetics (Secondary)
University of California, Los Angeles
http://jsb.ucla.edu
Twitter: @jsb_ucla
Hello,
Best,
Hi Christy,
Also, the runtime using the umap defaults for n_neighbours and min.dist is around 8 hours. I realise that this explores more parameter combinations than the example above, but I still wonder whether several hours is 'expected behaviour' or whether it signals something going wrong. Unfortunately, using the umap defaults also returns more than 6000 dubious cells at best. Many thanks in advance!!
Hello,
The number of dubious cells still does not match the tutorial exactly; it turns out that using the default Seurat RunPCA(object, npcs = 50) and then subsetting to K dimensions is not the same as RunPCA(object, npcs = K). This seems to be an implementation error, since they should be the same. We have chosen to use RunPCA(object, npcs = K), which results in a slightly different number of dubious cells than before.
I have also made several modifications to the tutorial. Thank you!
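For an exact PCA, computing more components and keeping the first K should give the same scores as requesting K components directly. The base-R sketch below checks this with prcomp on a random toy matrix (an assumption standing in for scaled expression data; no Seurat object is involved). One plausible reason RunPCA behaves differently is that, by default, it uses an approximate truncated SVD (irlba) whose result can depend on how many components are requested; that is our assumption, not something confirmed in this thread.

```r
set.seed(1)
# Toy data: 200 "cells" x 40 "features", standing in for scaled expression data.
x <- matrix(rnorm(200 * 40), nrow = 200, ncol = 40)
K <- 10

# Exact PCA (LAPACK SVD): compute all 40 PCs, then keep the first K ...
scores_subset <- prcomp(x)$x[, 1:K]

# ... versus requesting only K PCs up front via the rank. argument.
scores_direct <- prcomp(x, rank. = K)$x

# With an exact decomposition the two should agree up to floating-point tolerance.
print(isTRUE(all.equal(scores_subset, scores_direct)))
```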
Hi Christy, Many thanks for the revised code; I can confirm that it identifies more realistic counts of 'dubious cells'. However, I ran scDEED with visualization = T, but no UMAP plot highlighting the dubious cells was produced; I can see that the code blocks to generate it are not included in the (revised) scDEED.zip. Finally, can you confirm that the values in the column 'dubious cells' of the 'results' df represent the row numbers of the input data? Meanwhile, many thanks for all your efforts!! PS: Don't worry about run time; it's fine as long as it completes OK. It just makes experimenting with the code a little slow...
Hello, Yes, the values in the column 'dubious cells' list the dubious cells at that hyperparameter setting, as a single string separated by commas. These are actually the column names rather than row numbers (the original data and Seurat objects are gene x cell). You can plot the dubious cells this way:
result_umap = scDEED(data, K = K, reduction.method = 'umap', n_neighbors = c(5, 20, 30, 40, 50), min.dist = c(0.2, 0.6))
dubious_cells = result_umap$full_results$
Best,
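Because each setting's dubious cells are stored as one comma-separated string of cell names, the string has to be split before the cells can be highlighted. The base-R sketch below does this on a small hypothetical results table; the column names (n_neighbors, min.dist, dubious_cells), the helper parse_dubious, and the cell barcodes are made up for illustration and may not match scDEED's actual output.

```r
# Hypothetical results table: one row per hyperparameter setting, with the
# dubious cells stored as a comma-separated string of cell (column) names.
results <- data.frame(
  n_neighbors   = c(5, 20),
  min.dist      = c(0.2, 0.2),
  dubious_cells = c("AAACCTG-1,AAAGATG-1", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one setting's string into a character vector,
# dropping empty entries so a setting with no dubious cells gives character(0).
parse_dubious <- function(s) {
  cells <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])
  cells[nzchar(cells)]
}

best <- results[1, ]
cells <- parse_dubious(best$dubious_cells)
print(cells)
```

The resulting vector could then be highlighted with Seurat's DimPlot, for example DimPlot(seurat_obj, reduction = "umap", cells.highlight = cells), where seurat_obj is a hypothetical Seurat object holding the UMAP embedding that was evaluated.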
Hi K,
Best,
Hi Christy, No worries, and thanks for attending to it so quickly. Re 1), do I need to re-run the scDEED function (all 8 hours of it...) to obtain the correct cells? Kind regards
Hi Klaus,
Best,
Hi Christy, Sorry to trouble you with bad news. I'm using scDEED.R from the zip archive you shared most recently (13 hours ago as I write this). I've run my data twice, and each time it produces this error:
Independent of the error, can you please comment on my percentage of trustworthy cells? (Note this refers to results obtained with the second scDEED.zip archive you shared, before the most recent one.) Curiously, I obtain 76 dubious cells with the Seurat RunUMAP default parameters (min.dist 0.3, n_neighbours 30), while the (so far) best parameters suggested by scDEED result in 68 dubious cells but at a much larger n_neighbours (min.dist 0.3, n_neighbours 150). I have explored around both parameter pairs, and the pair identified by scDEED is a more robust minimum (varying scDEED's n.neighbours in small increments results in substantially lower dubious cell counts than varying the Seurat defaults by the same amount). To me, these observations suggest that the scDEED approach works even if the Seurat defaults were close for this data set (I have another where they produced a much higher dubious cell count than scDEED's parameters after just one round of optimisation). BTW, have you ever succeeded in optimising umap parameters to drive the dubious cell count to zero? Are min.dist and n_neighbours rounded prior to your calculations? Sorry for the long post!
Hi Klaus, The cutoffs for dubious and trustworthy cells are based on the null distribution; thus the number of dubious and trustworthy cells may be much larger than 5%. Actually, in our simulation results, we found that often there are not that many intermediate cells; the large majority of cells are trustworthy. I think this makes sense, because if UMAP and t-SNE were not able to produce low-dimensional embeddings that reasonably represented the pre-embedding space, then no one would use them. We are just trying to improve their low-dimensional embeddings via hyperparameter optimization and identification of distortions. Below is a copy of Supplementary Fig. S17c from our paper, showing the number of intermediate cells (I have edited it here so that only the similarity percent = 0.5, the default, is shown). At many perplexities, the number of intermediate cells is very low, actually 0. And this is what is causing the problem in the scDEED code, because it cannot make the data frame when there are 0 intermediate cells. I didn't think about that until you mentioned the error; I am sorry about that! I have updated this here. Yes, we also noted that around the minimum number of dubious cells, hyperparameter settings close to each other tend to be more stable. This is particularly true for t-SNE as perplexity increases (there are a few supplementary figures that show this). For UMAP it can be more random: the number of dubious cells can change drastically across hyperparameter settings. It is harder because there are two hyperparameters. Also, yes to reaching 0 dubious cells: for the marrow dataset (Supplementary Fig. S13) we obtained 0 dubious cells. I think this was also the case for the PBMC dataset, on both UMAP and t-SNE. There were some other examples as well, and we actually found that through hyperparameter optimization, it is possible for both t-SNE and UMAP to have the same number of dubious cells, and they may both be 0.
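The cutoff logic described above can be sketched in base R. This assumes, consistent with the 5% mentioned above, that a cell is dubious when its reliability score falls below the 5th percentile of the null distribution (scores computed on permuted data) and trustworthy when it exceeds the 95th percentile, with everything in between intermediate. The scores are simulated and the helpers classify_cells and summarise_classes are hypothetical, not scDEED's API. The naive data.frame() call shows one common way an empty class can break table construction; we have not checked that this is exactly what scDEED does.

```r
# Hypothetical helper: split cells into dubious / intermediate / trustworthy
# using percentile cutoffs of a null distribution of reliability scores.
classify_cells <- function(scores, null_scores, lower = 0.05, upper = 0.95) {
  cut <- quantile(null_scores, probs = c(lower, upper), names = FALSE)
  cells <- names(scores)
  list(
    dubious      = cells[scores < cut[1]],
    intermediate = cells[scores >= cut[1] & scores <= cut[2]],
    trustworthy  = cells[scores > cut[2]]
  )
}

# Naive construction fails when a class is empty:
# data.frame() cannot combine a 0-length column with a length-1 column.
naive <- tryCatch(
  data.frame(cell = character(0), class = "intermediate"),
  error = function(e) conditionMessage(e)
)
print(naive)

# Robust summary: counts plus comma-separated cell names. paste() with
# collapse turns an empty vector into "", so 0 intermediate cells is fine.
summarise_classes <- function(cls) {
  data.frame(
    n_dubious          = length(cls$dubious),
    n_intermediate     = length(cls$intermediate),
    n_trustworthy      = length(cls$trustworthy),
    intermediate_cells = paste(cls$intermediate, collapse = ","),
    stringsAsFactors   = FALSE
  )
}

set.seed(42)
null_scores <- rnorm(1000)  # simulated null reliability scores
scores <- setNames(c(-3, 3, 4, 5), paste0("cell", 1:4))
summary_row <- summarise_classes(classify_cells(scores, null_scores))
print(summary_row)
```

Because the dubious and trustworthy cutoffs come from the null distribution rather than from the observed scores, nothing forces 5% of the real cells into either class, which is why the counts can be far from 5%.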
So if you are not sure which one to use, or UMAP has more dubious cells than you would like, you could try t-SNE and see if it's better. Although it does sound like you have a good result from UMAP already! I am not totally sure I understand the last question: the n.neighbors and min.dist parameters are input directly into the RunUMAP function from Seurat. Per their website, n.neighbors should be between 5 and 50 and min.dist should be between 0.001 and 0.5 (although other websites have recommended differently, like this: https://pair-code.github.io/understanding-umap/). Our suggested UMAP parameters try to span the range of min.dist and n.neighbors without being too computationally expensive (we have changed the default to min.dist = c(0.1, 0.4) and n_neighbors = c(5, 20, 30, 40, 50), so there are only 10 pairs). Thank you very much for all your interest! This has been very helpful!
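The default grid mentioned above is the Cartesian product of the two vectors, so the number of UMAP runs is the product of their lengths. A minimal base-R sketch with expand.grid follows; the finer grid is a hypothetical example, included only to show how quickly the number of settings (and, assuming runtime scales roughly with it, the runtime) grows.

```r
# Default grid described above: 5 n_neighbors values x 2 min.dist values.
n_neighbors <- c(5, 20, 30, 40, 50)
min_dist <- c(0.1, 0.4)
default_grid <- expand.grid(n_neighbors = n_neighbors, min.dist = min_dist)
print(nrow(default_grid))  # 10 hyperparameter pairs
print(default_grid)

# Hypothetical finer grid: 10 x 10 values gives 100 pairs, i.e. ten times
# as many settings to embed and score as the default grid.
finer_grid <- expand.grid(
  n_neighbors = seq(5, 50, by = 5),
  min.dist    = seq(0.05, 0.5, length.out = 10)
)
print(nrow(finer_grid))  # 100
```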
Hi Christy, Overall, I found 5 very different combinations with similarly minimal counts of 'dubious' cells. 2* of the 5 combinations resulted in 'moving' cells that were 'dubious' in the Seurat default UMAP and were located far away from the majority of the cells in the same cluster. Satisfyingly, these optimised hyperparameters 'moved' these cells into the UMAP area occupied by their cluster.
*One of these combinations was
I'm keen to see what happens with a more diverse data set ;-) Thanks also for the link to the umap explanation page - very nice!
Thank you very much for all your help and guidance - I have enjoyed it very much. Kind regards
Hi K,
devtools::install_github("JSB-UCLA/scDEED")
For datasets with a complex topology, which may be what you have, it may not be possible to obtain 0 dubious cells, so in this case the goal is just to minimize the number of dubious cells so that the overall visualization is more trustworthy (like you said, the decrease from 243 dubious cell embeddings could be a reasonable gain). Similar to you, we have also found that some hyperparameter settings can result in similar visualizations. Overall, for UMAP it can be much harder to predict, due to the algorithm and the combination of two hyperparameters. Thank you for sharing your experience with scDEED and helping debug the package!