Coherence crashing for 50 topics LDA model / 40k+ long documents (~20M total tokens) #191

@Dijkie85

Computing c_v coherence for a 50-topic LDA model trained on 40k long documents (around 20M tokens in total) runs for about 15 minutes and then crashes the kernel. Computing the same score with gensim (via the great snippet provided in another issue) works just fine and takes about 2.5 minutes.
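For context, the gensim route looks roughly like the sketch below. This is a minimal reconstruction rather than the exact snippet from the other issue: the helper names (`topics_as_word_lists`, `docs_as_token_lists`, `gensim_c_v`) are made up for illustration, and it assumes the tomotopy model exposes `k`, `vocabs`, `docs` (each with a `words` list of vocabulary ids) and `get_topic_words`.

```python
def topics_as_word_lists(model, top_n=10):
    """Top-n words of every topic, as plain lists of strings."""
    return [
        [word for word, _prob in model.get_topic_words(k, top_n=top_n)]
        for k in range(model.k)
    ]


def docs_as_token_lists(model):
    """Rebuild tokenized documents by mapping word ids back to strings."""
    vocabs = model.vocabs
    return [[vocabs[word_id] for word_id in doc.words] for doc in model.docs]


def gensim_c_v(model, top_n=10, processes=1):
    """Average c_v coherence computed by gensim for a tomotopy model.

    gensim is imported lazily so the helpers above work without it.
    """
    from gensim.corpora import Dictionary
    from gensim.models.coherencemodel import CoherenceModel

    texts = docs_as_token_lists(model)
    dictionary = Dictionary(texts)
    coherence_model = CoherenceModel(
        topics=topics_as_word_lists(model, top_n=top_n),
        texts=texts,
        dictionary=dictionary,
        coherence="c_v",
        topn=top_n,
        processes=processes,
    )
    return coherence_model.get_coherence()
```

With the model from this report, the call would be `gensim_c_v(lda_model_50k)`.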

I'm running the following code, adapted from the examples repo, on tomotopy 0.12.3 / Python 3.10.8:

from tomotopy.coherence import Coherence
# lda_model_50k is the trained 50-topic LDA model
coh_model = Coherence(lda_model_50k, coherence='c_v')
average_coherence = coh_model.get_score()
print(average_coherence)

Any thoughts?

Labels: bug
