sequence depth issues in algorithms of LSI, LSA, LDA, PCA for dimension reduction #60
Hi,
This is a very interesting and complex question. I will try to give an answer/discussion. Let me break your question into smaller ones. Technical low depth barcodes/cells can be caused by 2 things: About a): About b):
Usually, if you use peaks as features, the peaks are called mostly from highly covered cells, so low-depth cells look noisy (more than x percent of their reads fall outside peaks) and are lost. You can try an annotation-based feature space to keep some of the biological signal. You can also focus on the lowly covered cells and try a different feature space, such as promoter regions or small windows, to see whether some regions are enriched in the lowly covered cells and might be cell type specific. Once you have done that, you can decide on a feature space containing the cell-type-specific regions and look at all the cells together. So, to some extent, you can separate the low-depth cells worth salvaging from the technically lowly covered ones.
However, you will still have the library size effect. It is a big technical artefact, and it does not disappear even after excluding PC1 and/or doing library size correction. You can check the relationship between library size (or any other technical artefact) and the principal components using the function correlation_pc. This is very useful for identifying artefacts in the data. However, we would not recommend removing the first four PCs, as you would be removing a lot of the biological variation present in the data that way (as you can see, library size is mainly correlated with PC1; to check how much library size explains the other PCs you can use correlation_pc).
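To illustrate the kind of check described above, here is a minimal sketch in plain NumPy. It is not the episcanpy correlation_pc implementation. The function name `pc_depth_correlation`, the simulated matrix, and all parameter values are assumptions made only for this example. It computes PCA scores with an SVD and reports the Pearson correlation of each component with log10 library size.

```python
import numpy as np

def pc_depth_correlation(counts, n_pcs=10):
    """Pearson correlation of each principal component with log10 library size.

    counts: (n_cells, n_features) non-negative array (hypothetical input).
    """
    counts = np.asarray(counts, dtype=float)
    log_depth = np.log10(counts.sum(axis=1) + 1.0)
    # Center each feature, then take PC scores from a thin SVD.
    centered = counts - counts.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]
    return np.array(
        [np.corrcoef(scores[:, k], log_depth)[0, 1] for k in range(scores.shape[1])]
    )

# Simulated binary cell-by-region matrix (illustrative only):
# two cell types with slightly different accessibility profiles,
# and a per-cell capture rate that plays the role of sequencing depth.
rng = np.random.default_rng(0)
n_cells, n_features = 300, 400
cell_type = rng.integers(0, 2, size=n_cells)
profile = np.ones((2, n_features))
profile[0, n_features // 2:] = 0.6   # type 0 is less open in the second half
profile[1, :n_features // 2] = 0.6   # type 1 is less open in the first half
capture = rng.uniform(0.05, 0.5, size=n_cells)
prob = capture[:, None] * profile[cell_type]
counts = (rng.random((n_cells, n_features)) < prob).astype(float)

corr = pc_depth_correlation(counts, n_pcs=10)
for k, r in enumerate(corr, start=1):
    print(f"PC{k}: r = {r:+.2f}")
```

A component with a large absolute correlation is a candidate depth artefact. Components with a small correlation are probably better left in, which is the argument against dropping a fixed number of leading PCs.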
Hi all,
The sequencing depth of single cells is an important factor that may hinder true discoveries such as cell type identification and pseudo-time path calculation. As far as I know, many scATAC tools (cistopic with LDA, signac with LSI, cellranger-atac with LSA, episcanpy with PCA) have difficulty deriving a faithful dimension-reduced clustering space unless low-depth cells are filtered out first (correct me if I am missing something).
However, in some cases, cells may genuinely have fewer ATAC fragments (or low UMI counts) for biological reasons. Precisely distinguishing those cells from broken cells is therefore a real challenge. There is very little information about this issue (it is mentioned in stuart-lab/signac#106), and I think this is an important question that many researchers will be interested in.
In my case, I compared the UMAP plots before and after removing the first four dimensions (the first dimension is indeed correlated with sequencing depth; I excluded the first four to be safe). The shape of the scatterplot looks similar, and the positions of the low-depth cell clusters (not too low, at least 3k fragments per cell after prefiltering) do not change much.
To summarize my question: how should cells with low depth be handled to avoid false-positive results while keeping real cells? Any suggestions are welcome.
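A comparison like this can also be made without relying on the UMAP layout, for example by measuring how many nearest neighbours each cell keeps when the leading dimensions are dropped. The following is only a rough sketch with random stand-in data. The names `knn_indices` and `neighbor_overlap`, the choice of k = 15, and the simulated scores are assumptions for illustration, not part of any of the tools mentioned.

```python
import numpy as np

def knn_indices(x, k):
    """Indices of the k nearest neighbours of each row (Euclidean, self excluded)."""
    sq = (x ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def neighbor_overlap(emb_a, emb_b, k=15):
    """Per-cell fraction of k nearest neighbours shared between two embeddings."""
    nn_a = knn_indices(emb_a, k)
    nn_b = knn_indices(emb_b, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_a, nn_b)]
    return np.array(shared) / k

# Stand-in for reduced dimensions (e.g. LSI or PCA scores); values are random.
rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 20)) * np.linspace(5.0, 1.0, 20)

full = scores            # all 20 dimensions
trimmed = scores[:, 4:]  # first four dimensions removed

overlap = neighbor_overlap(full, trimmed, k=15)
print("mean neighbour overlap:", overlap.mean())
```

Plotting the per-cell overlap against fragments per cell would show whether the low-depth cells are the ones whose neighbourhoods change the most.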