Most important topics interpretation #16

cuent · 2020-04-23T02:14:14Z

Maybe this question is dumb, but I don't understand why the average of the weighted document-topic proportions is a metric for the most important topics.

thetaWeightedAvg = sums * theta
thetaWeightedAvg = thetaWeightedAvg / num_docs
print('\nThe 10 most used topics are {}'.format(thetaWeightedAvg.argsort()[::-1][:10]))

From my understanding, multiplying the document-topic probabilities theta by each document's frequency (sums) amplifies or reduces each probability according to that document's weight, and the average then gives some insight into which topics are important across the whole corpus. Is that right? Also, what would be the difference if we only averaged the document-topic proportions (no weighting)?
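To make the comparison concrete, here is a minimal NumPy sketch of the two averages. It rests on assumptions that are not confirmed by the snippet above: that `theta` has one row per document, that `sums` holds one weight per document with shape `(num_docs, 1)`, and that the product is summed over documents before dividing by `num_docs`. All values are toy numbers invented for illustration.

```python
import numpy as np

# Toy values, invented for illustration only (not taken from the repository).
# theta: document-topic proportions, one row per document, each row sums to 1.
theta = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
])
# sums: assumed to be one weight per document (e.g. its total word count).
sums = np.array([[100.0], [10.0], [10.0]])
num_docs = theta.shape[0]

# Weighted average: scale each document's row by its weight, then
# sum over documents (assumed step) and divide by the number of documents.
weighted_avg = (sums * theta).sum(axis=0) / num_docs

# Unweighted average: every document counts equally.
unweighted_avg = theta.mean(axis=0)

# Topic indices sorted from most to least used under each metric.
weighted_rank = weighted_avg.argsort()[::-1]
unweighted_rank = unweighted_avg.argsort()[::-1]

print("weighted ranking:  ", weighted_rank)
print("unweighted ranking:", unweighted_rank)
```

With these toy numbers the heavily weighted first document dominates, so the weighted ranking puts topic 0 first, while the unweighted mean puts topic 1 first. Under the stated assumptions, the weighting shifts importance toward topics that appear in documents with larger `sums`.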


460176980 · 2020-07-20T01:38:05Z

I think the best topics can be selected according to the task requirements. For example, if you want the topics that are easiest to explain, you can use the topic consistency index; if you want to better fit the data, you can use the confusion index.

cuent · 2020-07-21T17:34:04Z

Could you please elaborate on the consistency/confusion index? I thought this was a way of selecting the most used topics by doc_frequency and topic proportion.
