Exception: `cands` must be an iterable of `tomotopy.label.Candidate`
#40
Comments
I don't know if it is relevant, but I sometimes get a segmentation fault from the
I tried to debug more with the EDIT:
|
Hello @eyseman, thank you for your interest. |
I'm having the same issue on a Mac, with version 0.7.0 and Python 3.7. I'm running exactly this code (slightly adapted from the example):

import tomotopy as tp
import tqdm
import nltk
nltk.download("stopwords")
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords
input_file = "shakespeare_sonnets.txt"
stemmer = PorterStemmer()
stopwords = set(stopwords.words('english'))
corpus = tp.utils.Corpus(tokenizer=tp.utils.SimpleTokenizer(stemmer=stemmer.stem),
stopwords=lambda x: len(x) <= 2 or x in stopwords)
# data_feeder yields a tuple of (raw string, user data) or a str (raw string)
corpus.process(open(input_file, encoding='utf-8'))
# make LDA model and train
mdl = tp.LDAModel(k=20, min_cf=10, min_df=5, corpus=corpus)
mdl.train(0)
print('Num docs:', len(mdl.docs), ', Vocab size:', mdl.num_vocabs, ', Num words:', mdl.num_words)
print('Removed top words:', mdl.removed_top_words)
bar = tqdm.trange(0, 10000, 10)
for i in bar:
    mdl.train(10)
    bar.set_description('Log-likelihood: {:5.3f}'.format(mdl.ll_per_word))

Up to here it runs without issues. Here is where I have problems:

extractor = tp.label.PMIExtractor(min_cf=10, min_df=5, max_len=5, max_cand=10000)
cands = extractor.extract(mdl)
labeler = tp.label.FoRelevance(mdl, cands, min_df=5, smoothing=1e-2, mu=0.25)

The exception is the same as the one in the title:

---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-10-d3b5d2594c0c> in <module>
3 cands = extractor.extract(mdl)
4
----> 5 labeler = tp.label.FoRelevance(mdl, cands, min_df=5, smoothing=1e-2, mu=0.25)
6 # for k in range(mdl.k):
7 # print("== Topic #{} ==".format(k))
Exception: `cands` must be an iterable of `tomotopy.label.Candidate`

Then a very strange thing happens. If I print cands, all items in the list look alike:

> print(cands)
[<tomotopy.label.Candidate object at 0x1325bbcb0>, <tomotopy.label.Candidate object at 0x1325bbda0>, <tomotopy.label.Candidate object at 0x1325bbad0>, ...]

If I ask Python for their types, the answer is:

> {type(cand) for cand in cands}
{tomotopy.label.Candidate}

But if I test whether they are instances of tp.label.Candidate:

> {isinstance(cand, tp.label.Candidate) for cand in cands}
{False}

Looks like a strange bug involving inheritance. |
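For anyone hitting the same symptom: the output above is what you would expect if the objects were created from one class object while the check runs against a different class object with the same name. The sketch below reproduces that pattern in pure Python. The two classes are hypothetical stand-ins built with `type()`, not tomotopy's real classes, and it does not claim to show how tomotopy itself ends up in this state.

```python
# Hypothetical stand-ins: build two separate class objects that share one
# module name and class name, so they print identically.
def make_candidate_class():
    return type("Candidate", (), {"__module__": "tomotopy.label"})

CandidateA = make_candidate_class()  # class used to create the objects
CandidateB = make_candidate_class()  # class used by the isinstance check

cand = CandidateA()

# Both classes report the same module and name...
same_name = (CandidateA.__module__, CandidateA.__qualname__) == (
    CandidateB.__module__,
    CandidateB.__qualname__,
)
# ...but they are different objects, so the instance check fails.
same_object = CandidateA is CandidateB
passes_check = isinstance(cand, CandidateB)

print(type(cand))
print(same_name, same_object, passes_check)
```

Comparing `type(cand) is tp.label.Candidate` (identity) rather than the printed name is probably the quickest way to confirm whether this is what is happening in a real session.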
@rcalsaverini Thank you for providing detailed information. Apparently there is a problem with the type check. I'll investigate the cause. |
The issue seems to be caused by dynamic loading of binary for instruction set architecture in macOS. The issue will be fixed in the next version, and until then, the following solutions are available:
|
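As a rough analogy for the explanation above, here is a minimal pure-Python sketch of how loading the same module code twice yields two distinct `Candidate` classes. It is only an illustration under assumptions: `label_demo` and `load_copy` are hypothetical names, and tomotopy's actual binary loading is done differently and is not reproduced here.

```python
import importlib.util
import os
import tempfile

# Hypothetical module source containing a single class.
SOURCE = "class Candidate:\n    pass\n"

def load_copy(path, name):
    """Execute the file at `path` as a fresh module object (no caching)."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "label_demo.py")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SOURCE)
    first = load_copy(path, "label_demo")   # first loaded copy
    second = load_copy(path, "label_demo")  # second, independent copy

cand = first.Candidate()

# The object is an instance of the class from the copy that created it,
# but not of the identically named class from the other copy.
accepted_by_first = isinstance(cand, first.Candidate)
accepted_by_second = isinstance(cand, second.Candidate)
print(accepted_by_first, accepted_by_second)
```

If the code that creates the candidates and the code that validates them end up holding classes from different loaded copies, a type check like the one in `FoRelevance` would reject objects that look perfectly valid when printed.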
@bab2min
and then reloading the package |
This issue was fixed in version 0.8.0. |
The example for LDA labelling (def corpus_and_labeling_example()) is not working properly.