CTM: Topic Count Impossibly Large

I fit a correlated topic model (CTM) to a 4,500-document corpus to learn which topics occur and how often. The results look very good, but one topic (#14) has an impossible count, more than double the number of documents:
![image](https://github.com/bab2min/tomotopy/assets/133242553/4a5154ab-282e-4585-8840-6cbf493d44db)

This library is easy to use and very fast/performant and I feel lucky to have found it, but I can't use the results when a known-to-be-common topic has an impossible count.

I tried HDPModel and got a similar result, where one topic (#6) had a count of almost 4x the number of documents:
![image](https://github.com/bab2min/tomotopy/assets/133242553/bb4dd989-c5b2-4e41-9b90-1603f4169a47)

What caused the large counts? Did I make a mistake? Is there a way for me to get the topic distributions for each individual document?

Thank you!
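A possible explanation for the counts above, offered as an assumption rather than a confirmed diagnosis: in tomotopy, `get_count_by_topics()` reports the number of words (tokens) allocated to each topic, not the number of documents, so a common topic can easily exceed the corpus size. Per-document topic distributions are available by calling `get_topic_dist()` on each item of `model.docs`. The sketch below uses plain Python with made-up toy data (the `docs` list and all helper names are hypothetical, not tomotopy API) to show how token counts per topic differ from counts of documents by dominant topic.

```python
from collections import Counter

# Hypothetical toy data: each document is a list of per-token topic ids,
# standing in for the assignments a fitted topic model would produce.
docs = [
    [0, 0, 1, 0],        # doc 0: mostly topic 0
    [1, 1, 1, 0, 1],     # doc 1: mostly topic 1
    [0, 0, 0, 0, 0, 2],  # doc 2: mostly topic 0
]
num_topics = 3


def token_counts(docs, k):
    """Count tokens assigned to each topic across the whole corpus."""
    counts = [0] * k
    for doc in docs:
        for topic in doc:
            counts[topic] += 1
    return counts


def topic_dist(doc, k):
    """Normalized topic distribution for one document."""
    n = len(doc)
    if n == 0:
        return [0.0] * k
    c = Counter(doc)
    return [c[t] / n for t in range(k)]


def dominant_topic_counts(docs, k):
    """Count documents by their single most probable topic."""
    counts = [0] * k
    for doc in docs:
        if not doc:
            continue  # skip empty documents
        dist = topic_dist(doc, k)
        counts[dist.index(max(dist))] += 1
    return counts


per_topic_tokens = token_counts(docs, num_topics)
per_topic_docs = dominant_topic_counts(docs, num_topics)

print(per_topic_tokens)  # token totals; can exceed the number of documents
print(per_topic_docs)    # document totals; sum to the number of documents
```

With this toy data, topic 0 receives 9 tokens even though there are only 3 documents, while the dominant-topic tally assigns it 2 documents. If the numbers in the screenshots came from a token-level count, the same effect would account for them.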
