I trained a correlated topic model on a 4,500-document corpus to learn which topics occur and how often. The results look very good, but one of the topics (#14) has a count more than double the number of documents, which should be impossible.

This library is easy to use and very fast, and I feel lucky to have found it, but I can't rely on the results when a topic I know to be common has an impossible count.
I also tried HDPModel and got a similar result: one topic (#6) had a count almost 4x the number of documents.

What caused the large counts? Did I make a mistake? Is there a way for me to get the topic distributions for each individual document?
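
One possibility I have not confirmed is that the reported counts are per word (token) rather than per document. Here is a minimal sketch in plain Python, using made-up toy data and a hypothetical `topic_distribution` helper (none of this is the library's API). It shows how a token-level count can exceed the number of documents while a document-level count cannot:

```python
from collections import Counter

# Hypothetical toy data: each document is a list of per-token topic ids.
# These values are invented for illustration, not taken from my corpus.
doc_token_topics = [
    [0, 0, 0, 1],
    [2, 2, 0],
    [1, 0, 0, 0, 0],
]

# Token-level count: every token assigned to a topic adds one,
# so a topic's count can be larger than the number of documents.
token_counts = Counter(t for doc in doc_token_topics for t in doc)

# Document-level count: each document contributes once, to its most
# frequent topic, so the counts sum to the number of documents.
doc_counts = Counter(Counter(doc).most_common(1)[0][0] for doc in doc_token_topics)


def topic_distribution(doc, num_topics):
    """Share of a document's tokens assigned to each topic (hypothetical helper)."""
    counts = Counter(doc)
    return [counts[k] / len(doc) for k in range(num_topics)]


print("documents:", len(doc_token_topics))
print("token-level counts:", dict(token_counts))
print("document-level counts:", dict(doc_counts))
print("distribution of first document:", topic_distribution(doc_token_topics[0], 3))
```

If the counts I am seeing are of the first kind, the per-document distributions would be the numbers I actually need.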
Thank you!