Coherence crashing for 50 topics LDA model / 40k+ long documents (~20M total tokens) #191

Computing `c_v` coherence for a 50-topic LDA model trained on 40k long documents (around 20M total tokens) runs for about 15 minutes and then crashes the kernel. Computing the same metric with `gensim` (via the great snippet provided in another issue) works just fine and takes about 2.5 minutes.
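For reference, the gensim route looks roughly like the sketch below. This is not necessarily the exact snippet from the other issue. The helper names (`topics_as_word_lists`, `docs_as_token_lists`, `gensim_c_v`) are made up for illustration, and it assumes that `doc.words` holds word ids that index into `model.vocabs`.

```python
def topics_as_word_lists(model, top_n=10):
    """Top-n words of every topic as plain lists of strings."""
    return [
        [word for word, _prob in model.get_topic_words(k, top_n=top_n)]
        for k in range(model.k)
    ]


def docs_as_token_lists(model):
    """Training documents as lists of token strings.

    Assumption: doc.words contains word ids that index into model.vocabs.
    """
    vocabs = model.vocabs
    return [[vocabs[w] for w in doc.words] for doc in model.docs]


def gensim_c_v(model, top_n=10, processes=1):
    """Average c_v coherence computed by gensim instead of tomotopy."""
    # Imported lazily so the two helpers above work without gensim installed.
    from gensim.corpora import Dictionary
    from gensim.models.coherencemodel import CoherenceModel

    texts = docs_as_token_lists(model)
    cm = CoherenceModel(
        topics=topics_as_word_lists(model, top_n),
        texts=texts,
        dictionary=Dictionary(texts),
        coherence="c_v",
        topn=top_n,
        processes=processes,
    )
    return cm.get_coherence()
```

Called as `gensim_c_v(lda_model_50k)`, this materializes all documents as token lists in memory, which is acceptable at ~20M tokens on my machine but may not be elsewhere.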

I'm running the following code on `tomotopy 0.12.3` / `python 3.10.8`, adapted from the examples repo:

```
from tomotopy.coherence import Coherence

# lda_model_50k is the trained 50-topic LDA model
coh_model = Coherence(lda_model_50k, coherence='c_v')
average_coherence = coh_model.get_score()
print(average_coherence)
```
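To narrow down where it stalls, one option is to score the topics one at a time and log the elapsed time for each. The sketch below assumes `get_score` accepts a `topic_id` argument. `time_per_topic_scores` is a hypothetical helper, not part of tomotopy.

```python
import time


def time_per_topic_scores(coh_model, num_topics):
    """Score each topic separately and record how long each call takes."""
    results = []
    for k in range(num_topics):
        start = time.perf_counter()
        # Assumption: get_score(topic_id=k) returns the score of topic k only.
        score = coh_model.get_score(topic_id=k)
        elapsed = time.perf_counter() - start
        # flush=True so the last line printed survives a kernel crash.
        print(f"topic {k}: score={score:.4f} elapsed={elapsed:.1f}s", flush=True)
        results.append((k, score, elapsed))
    return results
```

It would be called as `time_per_topic_scores(coh_model, lda_model_50k.k)`. If the crash happens before the first topic is printed, the problem is more likely in building the `Coherence` object or in the first pass over the corpus than in any particular topic.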

Any thoughts?
