Is there a way to retrieve the words used to generate the tf-idf? #331
Comments
The Hopefully, this helps!
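As a rough illustration of what "retrieving the words" amounts to, the sketch below pairs one row of a c-TF-IDF-style matrix with the vocabulary that defines its columns. Everything in it is a toy stand-in: `vocabulary`, `c_tf_idf_rows`, and `top_words` are hypothetical names, and the numbers are made up rather than produced by BERTopic. In practice the vocabulary would presumably come from the fitted vectorizer, in column order.

```python
# Toy stand-ins (hypothetical data, not produced by BERTopic).
# `vocabulary` plays the role of the vectorizer's feature names, in column order.
vocabulary = ["apple", "banana", "car", "engine"]

# `c_tf_idf_rows` plays the role of a dense c-TF-IDF matrix, one row per topic.
c_tf_idf_rows = [
    [0.1, 0.1, 0.1, 0.1],
    [0.7, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.9],
]


def top_words(row, vocabulary, n=2):
    """Return the n highest-scoring (word, score) pairs for one topic row."""
    pairs = zip(vocabulary, row)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


print(top_words(c_tf_idf_rows[2], vocabulary))
# → [('engine', 0.9), ('car', 0.6)]
```

The key point is only that column `j` of the matrix corresponds to word `j` of the vocabulary, so the two have to be kept in the same order.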
That works, thanks Maarten!
No problem, glad to hear that it works!
No, the first row is related to topic -1, then 0, then 1, etc. So if you want to access the c-TF-IDF representations for topic 23, you will have to access
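The row ordering described above can be sketched as a small helper. This is a minimal sketch, assuming the topics are stored in sorted order starting at the outlier topic -1 with no gaps; `topic_to_row` and `has_outlier_topic` are hypothetical names for illustration, not part of BERTopic.

```python
def topic_to_row(topic_id: int, has_outlier_topic: bool = True) -> int:
    """Map a topic id to its row index in the c-TF-IDF matrix.

    Assumes rows are ordered -1, 0, 1, ... when an outlier topic exists,
    and 0, 1, 2, ... otherwise (an assumption, not a documented guarantee).
    """
    offset = 1 if has_outlier_topic else 0
    return topic_id + offset


print(topic_to_row(-1))  # → 0
print(topic_to_row(0))   # → 1
print(topic_to_row(23))  # → 24
```

If the model happened to produce no outlier topic, the offset would disappear, which is why the helper takes it as a parameter instead of hard-coding it.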
In general, it is not necessary to clean the text. However, like in your case, that does not mean that it will never be helpful. In your use case, it seems that stopwords are finding their way into the topic representations and I can definitely imagine not wanting them there. There are two ways of approaching this. First, you can indeed clean the text up a bit. It might negatively influence the clustering quality but I would not be too worried about that. This way, you are focusing on the text directly which influences both clustering and topic representations. Second, you can focus on only changing the topic representation through using a custom
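For the first approach, cleaning the text before fitting, a minimal sketch could look like the following. The stopword list is deliberately tiny and hypothetical, and `remove_stopwords` is an illustrative helper, not a BERTopic function. Note that it also lowercases the text and drops punctuation, which is a further assumption about how much cleaning is wanted.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}


def remove_stopwords(doc: str, stopwords=STOPWORDS) -> str:
    """Lowercase the document, keep word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", doc.lower())
    return " ".join(token for token in tokens if token not in stopwords)


docs = ["The engine of the car is loud.", "A banana and an apple."]
cleaned = [remove_stopwords(doc) for doc in docs]
print(cleaned)
# → ['engine car loud', 'banana apple']
```

Because the cleaned documents are what the model would then see, this affects both the clustering and the topic representations, as noted above.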
awesome, thanks for the suggestions!
Hey, I saw this issue (#144) and I wanted to get the P(word|topic). You suggested accessing it using model.c_tf_idf, but I still need the words that were used to generate the sparse matrix. By looking at the source code, I saw where that's defined, but it doesn't seem easy to access.
Is there a "standard" way to get it?
Thanks!