Most important topics interpretation #16

Open
cuent opened this issue Apr 23, 2020 · 2 comments


cuent commented Apr 23, 2020

Maybe this question is dumb, but I don't understand why the average of the weighted document-topic proportions is a metric for the most important topics.

thetaWeightedAvg = sums * theta
thetaWeightedAvg = thetaWeightedAvg  /  num_docs
print('\nThe 10 most used topics are {}'.format(thetaWeightedAvg.argsort()[::-1][:10]))

From my understanding, multiplying each document's frequency term (sums) by its document-topic probabilities (theta) scales those probabilities up or down, so documents with a larger sums value contribute more. The average then gives some insight into which topics are important across the whole corpus. Is that right? Also, what would be different if we only averaged the document-topic proportions (no weighting)?
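To make the weighted-versus-unweighted question concrete, here is a minimal sketch on toy data in plain Python. It assumes `sums` holds one positive weight per document (for example its total word count, which is an assumption and not confirmed in this thread), and the helper names `unweighted_avg`, `weighted_avg`, and `rank_topics` are hypothetical. It normalizes by the total weight instead of `num_docs`; dividing by any positive constant changes only the scale, not the ranking.

```python
# Toy data: theta[d][k] is the proportion of topic k in document d.
# sums[d] is the weight of document d (assumed: its total word count).
theta = [
    [0.9, 0.1],  # heavily weighted document, mostly topic 0
    [0.2, 0.8],  # lightly weighted document, mostly topic 1
    [0.3, 0.7],  # lightly weighted document, mostly topic 1
]
sums = [1000, 10, 10]


def unweighted_avg(theta):
    """Plain mean of topic proportions: every document counts equally."""
    num_docs = len(theta)
    num_topics = len(theta[0])
    return [sum(row[k] for row in theta) / num_docs for k in range(num_topics)]


def weighted_avg(theta, sums):
    """Mean of topic proportions where document d counts sums[d] times."""
    total = sum(sums)
    num_topics = len(theta[0])
    return [
        sum(w * row[k] for w, row in zip(sums, theta)) / total
        for k in range(num_topics)
    ]


def rank_topics(scores):
    """Topic indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


print("unweighted ranking:", rank_topics(unweighted_avg(theta)))
print("weighted ranking:  ", rank_topics(weighted_avg(theta, sums)))
```

In this toy corpus the unweighted average favors topic 1, because two of the three documents are mostly about it, while the weighted average favors topic 0, because the single heavily weighted document dominates. In other words, the unweighted version answers "which topic do most documents lean toward?" and the weighted version answers "which topic accounts for most of the weighted mass?".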

@460176980

I think the best topics can be selected according to the requirements of the task. For example, if you want the topics that are easiest to explain, you can choose by the topic consistency (coherence) index; if you want the best fit to the data, you can choose by the confusion (perplexity) index.
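Assuming the "confusion index" above refers to perplexity (this thread does not confirm that reading), a minimal sketch of the computation could look like the following. The function name `perplexity` and the toy probabilities are hypothetical; in practice the per-token probabilities would come from a trained topic model evaluated on held-out documents.

```python
import math


def perplexity(token_probs):
    """exp of the negative mean log-probability the model gives each token."""
    log_likelihood = sum(math.log(p) for p in token_probs)
    return math.exp(-log_likelihood / len(token_probs))


# A model that spreads probability uniformly over 4 words scores close to 4.
uniform_model = [0.25] * 8
# A model that assigns higher probability to the observed words scores lower.
better_model = [0.5] * 8

print(perplexity(uniform_model))
print(perplexity(better_model))
```

Lower perplexity means the model fits the held-out text better. Note that it scores a whole model, not an individual topic, so it does not by itself rank topics the way the weighted average in the question does.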


cuent commented Jul 21, 2020

Could you please elaborate on the consistency/confusion index? I thought the snippet was a way of selecting the most used topics based on doc_frequency and topic proportion.
