pyLDAvis topic IDs don't correspond to gensim topic IDs #127
Comments
I noticed the same issue when preparing an analysis on a gensim LDA model. Any insight on this topic is greatly appreciated. For ease of searching and additional analysis, it would be awesome if the visualization used the same model indexing as the underlying LDA model (index starting at zero).
You can use
The same thing happens when using an sklearn model. This shuffling of topic IDs without warning is a very, very confusing behavior, and I struggle to comprehend why it occurs by default. If topics are to be sorted by prevalence, could it be helpful to assign different topic IDs that include a mapping back to the original topic ID? For example, if topic 9 is the most prevalent, why not call it "Topic A-09"? If the purpose of this visualization tool is to facilitate comprehension and labeling of topics discovered through unsupervised learning, how is it helpful to create labels that can't be mapped back to the model? Unfortunately, the docstrings and method signatures that users are most likely to read (
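For anyone who wants to try the labeling idea from the comment above, here is a minimal sketch. It does not use pyLDAvis at all: `prevalence_labels` is a hypothetical helper, the topic proportions are made-up inputs, and the label format is only in the spirit of the "Topic A-09" suggestion.

```python
def prevalence_labels(topic_proportions):
    """Label topics by prevalence rank while keeping the original model ID visible.

    `topic_proportions[k]` is the share of the corpus assigned to model
    topic k (0-based, as in the underlying model). Hypothetical helper,
    not part of pyLDAvis.
    """
    # Original topic IDs, most prevalent first; ties keep model order.
    order = sorted(
        range(len(topic_proportions)),
        key=lambda k: topic_proportions[k],
        reverse=True,
    )
    # Rank is 1-based for display, model ID stays 0-based.
    return [
        f"Topic {rank + 1:02d} (model ID {original_id:02d})"
        for rank, original_id in enumerate(order)
    ]


if __name__ == "__main__":
    # Model topic 2 is the most prevalent, so it is displayed first.
    for label in prevalence_labels([0.2, 0.1, 0.7]):
        print(label)
    # Topic 01 (model ID 02)
    # Topic 02 (model ID 00)
    # Topic 03 (model ID 01)
```

With labels like these, sorting by prevalence no longer hides which model topic is being shown.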
This issue had me tied up for hours. The crux of the issue is exactly what @mileserickson highlights - there's no way to easily discover that this exists. Perhaps an easy fix would be to change the
I had the same issue and thought there was a problem during topic inference. Sorting topics by relevance shouldn't be the default.
The same issue happened to me. This is very confusing.
I would also like to admit that this is not something I expected, and it led me to wrong results because no tutorial mentions it. I would really appreciate changing the default to False.
Making this change will break the unit tests, which rely on R output data. Users who produce the data with the R package but visualize it with pyLDAvis would then have the problem instead.
@bmabey Thoughts?
Override non-intuitive parameters with more appropriate values to better match the expectations of tomotopy users who are not familiar with the internals of pyLDAvis. A similar issue is discussed here: bmabey/pyLDAvis#127
When used with a gensim model, pyLDAvis' topic 1 is not the same as gensim's topic 1, pyLDAvis' topic 2 is not the same as gensim's topic 2, and so on.
Is there any way to find out the gensim ID of pyLDAvis' topic 15?
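One possible way to answer this, sketched under assumptions: as far as I can tell, the object returned by pyLDAvis' `prepare` carries a `topic_order` list that gives the original topic numbers (offset by one) in display order. Please verify that against your installed version. The helper `original_topic_id` below is hypothetical and takes such a list as plain Python data, so it runs without pyLDAvis installed.

```python
def original_topic_id(topic_order, vis_topic_id):
    """Map a 1-based visualization topic number to a 0-based model topic ID.

    Assumes `topic_order[i]` is the original topic number (1-based) of the
    topic shown in position i + 1 of the visualization. That layout is an
    assumption about the prepared data, not something confirmed in this thread.
    """
    if not 1 <= vis_topic_id <= len(topic_order):
        raise ValueError(f"no visualization topic {vis_topic_id}")
    # Undo both 1-based offsets: the display position and the stored ID.
    return topic_order[vis_topic_id - 1] - 1


if __name__ == "__main__":
    # Made-up order for a 3-topic model: model topic 2 is displayed first.
    hypothetical_topic_order = [3, 1, 2]
    print(original_topic_id(hypothetical_topic_order, 1))  # 2
```

Alternatively, I believe `prepare` accepts `sort_topics=False`, which should keep the model's own ordering (the visualization still numbers topics from 1 while gensim numbers them from 0).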