Error calculating coherence score when using ngram range in vectorizor #441
Comments
Your code is quite difficult to read because only parts of it are rendered as code blocks. Next time, I would advise wrapping your code in a fenced markdown code block so that the entire snippet is displayed as code.
Having said that, it might be interesting to look at the issue here where you can find a working version of calculating the coherence score with a larger n-gram value. From what I can see, you should use the
Apologies...posting again for clarity. I will update you after taking your suggestion for build_analyzer. Thanks!
'''
ValueError Traceback (most recent call last)
Input In [16], in compute_coherence_values(start, step, limit, coherence_df)
File ~/.local/lib/python3.8/site-packages/gensim/models/coherencemodel.py:215, in CoherenceModel.__init__(self, model, topics, texts, corpus, dictionary, window_size, keyed_vectors, coherence, topn, processes)
File ~/.local/lib/python3.8/site-packages/gensim/models/coherencemodel.py:430, in CoherenceModel.topics(self, topics)
File ~/.local/lib/python3.8/site-packages/gensim/models/coherencemodel.py:454, in CoherenceModel._ensure_elements_are_ids(self, topic)
ValueError: unable to interpret topic as either a list of tokens or a list of ids
'''
That worked for me! Thanks.
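For readers landing here, one plausible reading of the ValueError is that none of the n-gram topic words could be found in a dictionary built from single-word tokens. The sketch below illustrates that mismatch in plain Python. It is not BERTopic or gensim code: `ngram_tokens` and `count_known` are hypothetical helpers, with `ngram_tokens` standing in (very roughly, and without stop-word removal) for what the analyzer returned by `CountVectorizer.build_analyzer()` produces.

```python
from typing import List, Set, Tuple


def ngram_tokens(text: str, ngram_range: Tuple[int, int] = (1, 3)) -> List[str]:
    """Hypothetical, simplified stand-in for a vectorizer's analyzer.

    Lowercases, splits on whitespace, and emits every word n-gram whose
    length falls inside ngram_range. No stop-word removal is applied.
    """
    words = text.lower().split()
    low, high = ngram_range
    tokens = []
    for n in range(low, high + 1):
        for i in range(len(words) - n + 1):
            tokens.append(" ".join(words[i:i + n]))
    return tokens


def count_known(topic: List[str], vocabulary: Set[str]) -> int:
    """Count how many topic words appear in the vocabulary."""
    return sum(1 for word in topic if word in vocabulary)


# Toy documents and a topic made of bigrams (illustrative data only)
docs = [
    "topic models group similar documents",
    "coherence scores rate topic models",
]
topic = ["topic models", "coherence scores"]

# Vocabulary from single words only versus from the same n-gram range
unigram_vocab = {tok for doc in docs for tok in ngram_tokens(doc, (1, 1))}
ngram_vocab = {tok for doc in docs for tok in ngram_tokens(doc, (1, 3))}

print(count_known(topic, unigram_vocab))  # 0: no bigram is a known token
print(count_known(topic, ngram_vocab))    # 2: both bigrams are found
```

Under that assumption, tokenizing the documents with the same analyzer as the vectorizer, before building the dictionary and corpus, keeps the topic words and the dictionary entries in the same n-gram vocabulary.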
Error Message:
ValueError Traceback (most recent call last)
Input In [17], in
----> 1 coherence_values, model_topic_list, model_probls_list = compute_coherence_values(start=10,step=5,limit=100)
Input In [16], in compute_coherence_values(start, step, limit, coherence_df)
51 topic_words = [[words for words, _ in topic_model.get_topic(topic)]
52 for topic in range(len(set(topics))-1)]
54 # Evaluate
---> 55 coherence_model = CoherenceModel(topics=topic_words,
56 texts=tokens,
57 corpus=corpus,
58 dictionary=dictionary,
59 coherence='c_v')
61 coherence = coherence_model.get_coherence()
64 #coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')
File ~/.local/lib/python3.8/site-packages/gensim/models/coherencemodel.py:215, in CoherenceModel.__init__(self, model, topics, texts, corpus, dictionary, window_size, keyed_vectors, coherence, topn, processes)
213 self._accumulator = None
214 self._topics = None
--> 215 self.topics = topics
217 self.processes = processes if processes >= 1 else max(1, mp.cpu_count() - 1)
File ~/.local/lib/python3.8/site-packages/gensim/models/coherencemodel.py:430, in CoherenceModel.topics(self, topics)
428 new_topics = []
429 for topic in topics:
--> 430 topic_token_ids = self._ensure_elements_are_ids(topic)
431 new_topics.append(topic_token_ids)
433 if self.model is not None:
File ~/.local/lib/python3.8/site-packages/gensim/models/coherencemodel.py:454, in CoherenceModel._ensure_elements_are_ids(self, topic)
452 return np.array(ids_from_ids)
453 else:
--> 454 raise ValueError('unable to interpret topic as either a list of tokens or a list of ids')
ValueError: unable to interpret topic as either a list of tokens or a list of ids
Code:
import csv
import pandas as pd
from hdbscan import HDBSCAN
from bertopic import BERTopic
import gensim.corpora as corpora
from sklearn.feature_extraction.text import CountVectorizer
from gensim.models.coherencemodel import CoherenceModel

# Count unigrams through trigrams and drop English stop words
vectorizer_model = CountVectorizer(ngram_range=(1, 3), stop_words="english")
# Results table: cluster size, coherence score, and topic count
coherence_df = pd.DataFrame(columns=['min_cluster_size', 'coherence_score', 'num_of_topics'])

def compute_coherence_values(start=15, step=10, limit=205, coherence_df=coherence_df):