CTM: Topic Count Impossibly Large #202

@tau-241

I fit a correlated topic model (CTM) to a 4,500-document corpus to learn which topics occur and how often. The results looked very good, but one of the topics (#14) has an impossible count, more than double the number of documents:
image

This library is easy to use and very fast, and I feel lucky to have found it, but I can't use the results when a known-to-be-common topic has an impossible count.

I tried HDPModel and got a similar result, where one topic (#6) had a count of almost 4x the number of documents:
image

What caused the large counts? Did I make a mistake? Is there a way for me to get the topic distributions for each individual document?

Thank you!
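
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the per-topic count is a tally of word tokens assigned to each topic rather than a tally of documents, it can easily exceed the number of documents. The sketch below uses hypothetical toy data and helper names (`assignments`, `topic_token_counts`, `doc_topic_dist`) that do not come from any library. It only illustrates the arithmetic, and how a per-document topic distribution could be derived from token-level assignments.

```python
from collections import Counter

# Hypothetical toy data: each inner list holds the topic id assigned to
# every word token in one document (3 documents, 2 topics).
assignments = [
    [0, 0, 1, 0],
    [0, 1, 0],
    [0, 0, 0, 1, 1],
]

def topic_token_counts(docs):
    """Count word tokens per topic across the whole corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts

def doc_topic_dist(doc, num_topics):
    """Normalize one document's token counts into a topic distribution."""
    counts = Counter(doc)
    total = len(doc)
    return [counts[k] / total for k in range(num_topics)]

counts = topic_token_counts(assignments)
dists = [doc_topic_dist(doc, 2) for doc in assignments]

# Topic 0 holds 8 tokens even though the corpus has only 3 documents.
print(counts[0], len(assignments))
print(dists[0])
```

If this assumption holds, the large count for a common topic would be expected behavior, and document-level questions would be better answered from the per-document distributions than from the corpus-wide tally.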
