
Number of topics and peak using questions. Thank you. #31

Closed
helenhuangmath opened this issue Oct 9, 2019 · 2 comments

Comments

@helenhuangmath

Hi cisTopic team,

Thank you for developing the cisTopic software. We're trying it on our scATAC-seq data and have found some promising results, but we have several concerns about them. It would be great if you could provide some suggestions.

  1. We tested different numbers of topics, but the results showed that the more topics we used, the more stable the model was on our data (attached figure 1). Do you have any idea why this happens?
    We also noticed that some of the topics are similar to each other. Is there a good way to merge such similar topics? Is it OK to average the z-scores or probabilities for these topics? Or do you think we should manually select a lower number of topics in the selectModel() step?

  2. Do you have any idea how many topics each peak/region meaningfully contributes to in general? I noticed that when the algorithm builds region scores, almost all peaks seem to be used, yet some peaks have very limited contributions. After running binarizecisTopics() to binarize the topics, only about 20% of the peaks passed the cutoff, were saved in object@binarized.cisTopics, and were used for downstream functional and pathway analysis; the remaining 80% of the peaks do not contribute meaningfully to any topic (attached figure 2). Meanwhile, some peaks contribute to more than 15 different topics. Is that normal? How should we interpret this result?

Thank you so much!!

cisTopic_issue_figure1
cisTopic_issue_figure2

@cbravo93
Member

Hi @helenhuangmath !

I think I just answered this by email, but here it goes:

0- Based on the top-left plot (likelihood per iteration), I would increase the number of burn-in iterations. For the models with a higher number of topics (from 35 on), your likelihood has not stabilized by the time sampling starts (this can affect your model selection and results). Maybe 300 burn-in iterations, or even a bit more, would be better.

1a- We have seen that in some datasets the curve reaches a stable likelihood rather than going down again after a maximum. In that case, you can pick the simplest model that has a similar likelihood. For your data, I would maybe add some models (10, 20, 30 topics) to make sure the optimum is not around there.
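The "simplest model with a similar likelihood" heuristic is easy to automate. A minimal sketch in Python (not cisTopic's R API; the log-likelihood values below are made up for illustration, in a real run you would take the final log-likelihood of each fitted model):

```python
import numpy as np

# Hypothetical final log-likelihoods for models with different topic numbers
topic_counts = np.array([10, 20, 30, 35, 40, 45, 50])
log_likelihoods = np.array([-9.80e6, -9.55e6, -9.42e6, -9.40e6,
                            -9.39e6, -9.39e6, -9.38e6])

def simplest_within_tolerance(topics, loglik, tol=0.005):
    """Pick the smallest topic number whose log-likelihood is within
    a relative tolerance `tol` of the best model's log-likelihood."""
    best = loglik.max()
    # relative gap to the best model (log-likelihoods are negative)
    gap = (best - loglik) / abs(best)
    candidates = topics[gap <= tol]
    return int(candidates.min())

print(simplest_within_tolerance(topic_counts, log_likelihoods))  # → 30
```

The tolerance is a judgment call; the point is simply to stop paying for extra topics once the likelihood curve has flattened.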

1b- I would check the correlation between the scores (topic-cell, region-topic) before merging, and would be careful with downstream analyses (for these correlated topics, how much do their regions overlap?). If you opt for merging them, do it on the assignment matrices [for topic-cell and region-topic, respectively: object@selected.model$document_expects & object@selected.model$topics], and the rest of the functions/normalizations should still work. I can help with the code for this, just let me know :).
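Merging on the assignment matrices amounts to summing the counts of the two topics and dropping one row. A minimal sketch in Python/NumPy with random stand-in matrices (cisTopic itself is R; `document_expects` and `topics` are the real slot names from the comment above, everything else here is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the assignment-count matrices:
# topic_cell   ~ object@selected.model$document_expects (topics x cells)
# topic_region ~ object@selected.model$topics           (topics x regions)
n_topics, n_cells, n_regions = 5, 100, 2000
topic_cell = rng.poisson(5, size=(n_topics, n_cells)).astype(float)
topic_region = rng.poisson(2, size=(n_topics, n_regions)).astype(float)

# Check topic-topic correlation on the cell scores before deciding to merge
corr = np.corrcoef(topic_cell)

def merge_topics(mat, i, j):
    """Merge topic j into topic i by summing their assignment counts,
    then drop row j. Works for both topic-cell and topic-region matrices."""
    mat = mat.copy()
    mat[i] += mat[j]
    return np.delete(mat, j, axis=0)

merged_tc = merge_topics(topic_cell, 0, 1)
merged_tr = merge_topics(topic_region, 0, 1)
print(merged_tc.shape, merged_tr.shape)  # one topic fewer in each
```

Summing the raw counts (rather than averaging normalized scores) keeps the column totals intact, so downstream normalizations still behave as expected.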

2- Normally, binarised topics contain a couple of thousand regions, but this depends on the binarisation thresholds you choose. I don't like thresholds and prefer to work with the probabilities themselves when possible. Regions that do not end up in any topic are usually lowly accessible (can you check the number of cells in which topic regions vs. non-topic regions are accessible? And also check the binarisation plots). If you prefer to work with differentially accessible regions, you can also use the predictive distribution matrix (with the probabilities of the regions in each cell) and run e.g. a Wilcoxon test between whatever groups. With this matrix you can also look at the accessibility probability of any regions of interest (whether they are in a topic or not, which is quite interesting :P).
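The per-region Wilcoxon test between two cell groups can be sketched like this in Python (a generic illustration, not cisTopic code; the matrix here is simulated, in practice it would come from the predictive distribution, and the unpaired Wilcoxon rank-sum test is `mannwhitneyu` in SciPy):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)

# Simulated stand-in for the predictive distribution matrix:
# accessibility probability of each region in each cell (regions x cells)
n_regions, n_cells = 50, 60
prob = rng.beta(2, 20, size=(n_regions, n_cells))
groups = np.array([0] * 30 + [1] * 30)  # two cell groups of interest
# make the first 10 regions more accessible in group 1
prob[:10, groups == 1] = np.clip(prob[:10, groups == 1] + 0.3, 0.0, 1.0)

# Unpaired Wilcoxon (Mann-Whitney U) test per region between the two groups
pvals = np.array([
    mannwhitneyu(prob[r, groups == 0], prob[r, groups == 1]).pvalue
    for r in range(n_regions)
])
print((pvals < 0.01).sum())  # regions called differentially accessible
```

For a real analysis you would also correct the p-values for multiple testing (e.g. Benjamini-Hochberg) across all regions.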

I hope this is useful, and let me know if you have more questions :)!

C

@helenhuangmath
Author

Thank you so much again! It's quite helpful!
