Topic-word matrix #144

gouthamgithub · 2021-06-14T14:26:02Z

Thank you for BERTopic! I have recently started experimenting with this model.

I am trying to evaluate the diversity of topics using the Kullback-Leibler divergence metric. Is there a way to get the topic-word distribution, i.e. the distribution over the vocabulary words for each topic, with dimensions |num topics| x |vocabulary|?

MaartenGr · 2021-06-16T04:59:38Z

Sorry for the late response!

Although there are topic-word distributions, they do not fully represent the quality of the generated topic representations. What happens is the following: after generating topic clusters, c-TF-IDF is used to find the most interesting words per topic compared to all other topics. The resulting c-TF-IDF matrix can then be used to get the topic-word distribution.

However, after generating the c-TF-IDF matrix, MMR (Maximal Marginal Relevance) is used not only to diversify the topic representations but also to increase their coherence. This means that the c-TF-IDF matrix is then not fully representative of the topic representation. Moreover, we only extract the top n words from the c-TF-IDF matrix, since using all of them would make little sense in the context of MMR.

Having said all that, if you want to extract the topic-word distribution purely based on c-TF-IDF (which does not reflect the true quality of BERTopic), then I would advise you to access it with model.c_tf_idf.
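
For the KL-divergence use case in the original question, a minimal sketch of one possible approach follows. It assumes the c-TF-IDF weights have already been pulled out of the model as a dense list of rows, for example via something like model.c_tf_idf.toarray().tolist() (the attribute name may differ between BERTopic releases). The helper names to_distributions and kl_divergence, the smoothing constant eps, and the toy weights are all hypothetical and not part of BERTopic. Row-normalizing c-TF-IDF scores is only one way to obtain something distribution-like, and per the caveat above it ignores the MMR step.

```python
import math


def to_distributions(weights, eps=1e-12):
    """Row-normalize non-negative topic-word weights so each row sums to 1.

    A small eps is added to every entry so that no probability is exactly
    zero, which keeps the KL divergence finite.
    """
    distributions = []
    for row in weights:
        smoothed = [w + eps for w in row]
        total = sum(smoothed)
        distributions.append([w / total for w in smoothed])
    return distributions


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Toy stand-in for a |num topics| x |vocabulary| c-TF-IDF matrix.
# With a fitted model this could (assumption) come from
# model.c_tf_idf.toarray().tolist().
toy_weights = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.45, 0.05, 0.25, 0.25],
]

topic_word = to_distributions(toy_weights)

# Pairwise divergences: kl_matrix[i][j] = KL(topic i || topic j).
kl_matrix = [[kl_divergence(p, q) for q in topic_word] for p in topic_word]
```

Note that KL divergence is asymmetric, so kl_matrix[i][j] and kl_matrix[j][i] generally differ; a symmetric alternative such as Jensen-Shannon divergence may be easier to interpret as a diversity score.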

gouthamgithub · 2021-06-16T12:29:20Z

Understood, thank you so much!

firmai · 2021-06-16T16:13:08Z

I am having a hard time mapping the probabilities to the topics. There is one less row, but I don't understand the ordering. Are the probabilities ordered by topic frequency or by topic number? For example, in the below I am trying to find the covid documents, but fail to do so. If you have no quick advice, I can produce a sample.

firmai · 2021-06-16T17:08:03Z

I should specify my purpose here: unlike Top2Vec, your solution allows me to search for any keyword, which returns a topic. It would be great if I could use that topic outcome to bring back a list of documents, and if some makeshift piece of code could be developed for that purpose.
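
A makeshift sketch of the lookup described above, under stated assumptions: docs is the list of documents passed to the model, topics is the per-document topic assignment returned when fitting (with -1 for outliers), and the topic id for the keyword has already been obtained, for example from the first result of model.find_topics("covid"). The helper documents_for_topic and the toy data are hypothetical and not part of BERTopic.

```python
def documents_for_topic(docs, topics, topic_id):
    """Return the documents whose assigned topic equals topic_id.

    docs and topics are assumed to be aligned: topics[i] is the topic
    assigned to docs[i].
    """
    return [doc for doc, topic in zip(docs, topics) if topic == topic_id]


# Toy stand-ins; with a fitted model these would be the training documents
# and the topic assignments produced during fitting.
docs = [
    "covid cases are rising again",
    "the stock market closed higher",
    "new covid vaccine approved",
    "an unrelated outlier document",
]
topics = [0, 1, 0, -1]

# Hypothetical: topic id returned by a keyword search such as
# model.find_topics("covid").
covid_topic = 0

covid_docs = documents_for_topic(docs, topics, covid_topic)
```

This relies only on the hard topic assignments, so it sidesteps the question of how the probability columns are ordered.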

firmai mentioned this issue Jun 17, 2021

Probabilities FAQ (and topic outcomes) #146

Closed

MaartenGr closed this as completed Jun 23, 2021

sgdantas mentioned this issue Nov 17, 2021

Is there a way to retrieve the words used to generate the tf-idf? #331

Closed

juli-sch mentioned this issue Mar 1, 2022

About Coherence of topic models #90

Open
