Exception: `cands` must be an iterable of `tomotopy.label.Candidate` #40

eyseman · 2020-04-11T10:35:37Z

The LDA labelling example (`corpus_and_labeling_example()`) is not working properly: it raises the exception in the title.

g3rfx · 2020-04-13T09:19:33Z

I don't know if it is relevant, but I sometimes get a segmentation fault from the `extract` method with a trained HDPModel and CTModel:

    extractor = tp.label.PMIExtractor(min_cf=10, min_df=5, max_len=5, max_cand=10000)
    cands = extractor.extract(mdl)

Fatal Python error: Segmentation fault
Current thread 0x00007fbd1009b740 (most recent call first)
...
Segmentation fault (core dumped)

I tried to debug further with Python 3's faulthandler module, but I could not get more detailed output.
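For reference, here is a minimal sketch of turning on `faulthandler` from inside a script, which is equivalent to running `python -X faulthandler` or setting `PYTHONFAULTHANDLER`. In the Python versions used in this thread it dumps only the Python-level traceback on fatal signals such as SIGSEGV, so frames inside the compiled `_tomotopy` extension do not show up. That limitation is why a native debugger like gdb is needed to see them. The helper name `enable_crash_traces` and its log-file option are illustrative and are not part of tomotopy.

```python
import faulthandler


def enable_crash_traces(log_path=None):
    """Enable faulthandler (hypothetical helper, not a tomotopy API).

    Returns the open log file, or None when tracebacks go to stderr.
    The caller must keep that file object alive, because faulthandler
    writes to its file descriptor at crash time.
    """
    if log_path is None:
        # Default target is sys.stderr; all threads are dumped on a fatal signal.
        faulthandler.enable()
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Call it once at the top of the script, before `extractor.extract(mdl)`, and keep the returned file open until the script ends.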

EDIT:
Here is the stack trace from gdb. I hope it helps:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffc74cec6d in tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >* tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >::makeNext<tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&>(unsigned int const&, tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&) () from /home/henry/anaconda3/envs/gdpr/lib/python3.6/site-packages/_tomotopy.cpython-36m-x86_64-linux-gnu.so
(gdb) backtrace
#0  0x00007fffc74cec6d in tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >* tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >::makeNext<tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&>(unsigned int const&, tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&) () from /home/henry/anaconda3/envs/gdpr/lib/python3.6/site-packages/_tomotopy.cpython-36m-x86_64-linux-gnu.so
#1  0x00007fffc74cec8b in tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >* tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >::makeNext<tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&>(unsigned int const&, tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&) () from /home/henry/anaconda3/envs/gdpr/lib/python3.6/site-packages/_tomotopy.cpython-36m-x86_64-linux-gnu.so
#2  0x00007fffc74cfbcf in tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const () from /home/henry/anaconda3/envs/gdpr/lib/python3.6/site-packages/_tomotopy.cpython-36m-x86_64-linux-gnu.so
#3  0x00007fffc7079dd8 in ExtractorObject::extract(ExtractorObject*, _object*, _object*) () from /home/henry/anaconda3/envs/gdpr/lib/python3.6/site-packages/_tomotopy.cpython-36m-x86_64-linux-gnu.so
#4  0x00005555556654f4 in _PyCFunction_FastCallDict () at /tmp/build/80754af9/python_1578429706181/work/Objects/methodobject.c:231
#5  0x00005555556ecdac in call_function () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4851
#6  0x000055555570f66a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:3335
#7  0x00005555556e6ebb in _PyFunction_FastCall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>) at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4933
#8  fast_function () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4968
#9  0x00005555556ece85 in call_function () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4872
#10 0x000055555570f66a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:3335
#11 0x00005555556e7c09 in _PyEval_EvalCodeWithName (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, kwcount=<optimized out>, kwargs=0x0, kwnames=0x0, argcount=0, args=0x0,
    locals=0x7ffff7f55120, globals=0x7ffff7f55120, _co=0x7ffff6aaba50) at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4166
#12 PyEval_EvalCodeEx () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4187
#13 0x00005555556e89ac in PyEval_EvalCode (co=co@entry=0x7ffff6aaba50, globals=globals@entry=0x7ffff7f55120, locals=locals@entry=0x7ffff7f55120) at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:731
#14 0x0000555555768c64 in run_mod () at /tmp/build/80754af9/python_1578429706181/work/Python/pythonrun.c:1025
#15 0x0000555555769061 in PyRun_FileExFlags () at /tmp/build/80754af9/python_1578429706181/work/Python/pythonrun.c:978
#16 0x0000555555769263 in PyRun_SimpleFileExFlags () at /tmp/build/80754af9/python_1578429706181/work/Python/pythonrun.c:419
#17 0x000055555576936d in PyRun_AnyFileExFlags () at /tmp/build/80754af9/python_1578429706181/work/Python/pythonrun.c:81
#18 0x000055555576cd53 in run_file (p_cf=0x7fffffffdddc, filename=0x5555558a76c0 L"gdpr_topic_modelling.py", fp=0x5555558f5110) at /tmp/build/80754af9/python_1578429706181/work/Modules/main.c:340
#19 Py_Main () at /tmp/build/80754af9/python_1578429706181/work/Modules/main.c:811
#20 0x00005555556373be in main () at /tmp/build/80754af9/python_1578429706181/work/Programs/python.c:69
#21 0x00007ffff77e6b97 in __libc_start_main (main=0x5555556372d0 <main>, argc=2, argv=0x7fffffffdfe8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdfd8) at ../csu/libc-start.c:310
#22 0x0000555555716084 in _start () at ../sysdeps/x86_64/elf/start.S:103

bab2min · 2020-04-13T16:05:38Z

@g3rfx Thanks for reporting the bug and sharing the details. I think the problem you reported is different from this issue, so I opened a separate issue (#41) for it.

bab2min · 2020-04-13T16:12:52Z

Hello @eyseman, thank you for your interest.
I've tested `corpus_and_labeling_example()` in several environments, including Windows and Linux, but I haven't been able to reproduce the exception. Could you give more details about your problem?
I suspect the variable `cands` holds an unexpected value. Could you print `cands` before creating the `FoRelevance` instance?
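As a sketch of the kind of check being asked for here, the hypothetical helper below summarizes what `cands` actually contains. It reports how many elements there are, which fully qualified type names they carry, and whether each one passes `isinstance` against the class you pass in (for tomotopy that would be `tp.label.Candidate`). It only uses the standard library, so it runs without tomotopy.

```python
from collections import Counter


def summarize_candidates(cands, expected_cls):
    """Summarize the contents of `cands` (hypothetical debugging helper).

    Returns a dict with the element count, a Counter of fully qualified
    type names, and a Counter of isinstance(elem, expected_cls) results.
    """
    items = list(cands)  # materialize, in case `cands` is a generator
    type_names = Counter(
        f"{type(c).__module__}.{type(c).__qualname__}" for c in items
    )
    isinstance_results = Counter(isinstance(c, expected_cls) for c in items)
    return {
        "count": len(items),
        "type_names": type_names,
        "isinstance": isinstance_results,
    }


# Usage with tomotopy (not executed here):
#   cands = extractor.extract(mdl)
#   print(summarize_candidates(cands, tp.label.Candidate))
```

If every element reports the expected type name but `isinstance` comes back `False`, the problem lies in the class identity rather than in the values themselves.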

rcalsaverini · 2020-04-21T20:56:52Z

I'm having the same issue on a Mac, with version 0.7.0 and Python 3.7.

I'm running this code (slightly adapted from the example):

    import tomotopy as tp
    import tqdm

    import nltk
    nltk.download("stopwords")
    from nltk.stem.porter import PorterStemmer
    from nltk.corpus import stopwords

    input_file = "shakespeare_sonnets.txt"

    stemmer = PorterStemmer()
    # use a separate name so the nltk `stopwords` module is not shadowed
    stop_words = set(stopwords.words('english'))
    corpus = tp.utils.Corpus(tokenizer=tp.utils.SimpleTokenizer(stemmer=stemmer.stem),
        stopwords=lambda x: len(x) <= 2 or x in stop_words)
    # data_feeder yields a tuple of (raw string, user data) or a str (raw string)
    with open(input_file, encoding='utf-8') as f:
        corpus.process(f)
    # make LDA model and train
    mdl = tp.LDAModel(k=20, min_cf=10, min_df=5, corpus=corpus)
    mdl.train(0)
    print('Num docs:', len(mdl.docs), ', Vocab size:', mdl.num_vocabs, ', Num words:', mdl.num_words)
    print('Removed top words:', mdl.removed_top_words)
    bar = tqdm.trange(0, 10000, 10)
    for i in bar:
        mdl.train(10)
        # show the current per-word log-likelihood in the progress bar
        bar.set_description('Log-likelihood: {:5.3f}'.format(mdl.ll_per_word))

Up to this point everything runs without issues. The problem appears here:

    extractor = tp.label.PMIExtractor(min_cf=10, min_df=5, max_len=5, max_cand=10000)
    cands = extractor.extract(mdl)

    labeler = tp.label.FoRelevance(mdl, cands, min_df=5, smoothing=1e-2, mu=0.25)

The exception is the same as the one in the title:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-10-d3b5d2594c0c> in <module>
      3 cands = extractor.extract(mdl)
      4 
----> 5 labeler = tp.label.FoRelevance(mdl, cands, min_df=5, smoothing=1e-2, mu=0.25)
      6 # for k in range(mdl.k):
      7 #     print("== Topic #{} ==".format(k))

Exception: `cands` must be an iterable of `tomotopy.label.Candidate`

Then something very strange happens. If I print `cands`, I get:

> print(cands)
[<tomotopy.label.Candidate object at 0x1325bbcb0>, <tomotopy.label.Candidate object at 0x1325bbda0>, <tomotopy.label.Candidate object at 0x1325bbad0>, ...]

All items in the list look the same. If I ask Python for their types, the answer is `tomotopy.label.Candidate` for every one of them:

> {type(cand) for cand in cands}
{tomotopy.label.Candidate}

But if I check whether they are instances of `tomotopy.label.Candidate`, the check fails for every one of them:

> {isinstance(cand, tp.label.Candidate) for cand in cands}
{False}

This looks like a strange bug involving inheritance.
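The symptom can be reproduced in plain Python whenever two distinct class objects share the same module and class name, for example when the same extension code ends up being loaded twice. The sketch below builds two such classes with `type()`. They print identically, yet `isinstance` fails across them because it checks class objects (identity and the MRO), not names. This is an illustrative analogy under that assumption, not tomotopy's actual code path.

```python
def make_candidate_class():
    """Create a fresh class that reports itself as tomotopy.label.Candidate.

    Each call returns a *new* class object, mimicking a module whose
    types were created twice (illustrative assumption only).
    """
    return type("Candidate", (), {"__module__": "tomotopy.label"})


CandidateA = make_candidate_class()  # e.g. the class the objects were built from
CandidateB = make_candidate_class()  # e.g. the class the type check refers to

cand = CandidateA()

print(type(cand))                    # <class 'tomotopy.label.Candidate'>
print(CandidateB)                    # <class 'tomotopy.label.Candidate'>
print(type(cand) is CandidateB)      # False: two different class objects
print(isinstance(cand, CandidateB))  # False, despite the identical names
print(isinstance(cand, CandidateA))  # True
```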

bab2min · 2020-04-26T09:02:07Z

@rcalsaverini Thank you for providing detailed information. Apparently there is a problem with the type check. I'll investigate the cause.

bab2min · 2020-05-07T03:15:32Z

The issue seems to be caused by the dynamic loading of instruction-set-specific (ISA) binaries on macOS. It will be fixed in the next version; until then, the following workarounds are available:

Run Python with the environment variable `TOMOTOPY_ISA=none` set (e.g. `export TOMOTOPY_ISA=none`).
Setting `TOMOTOPY_ISA=none` disables tomotopy's dynamic binary loading, but it may slow tomotopy down, since it forces it to run without SIMD instruction sets.
Alternatively, you can install tomotopy by compiling it from source with pip:

    pip install --no-binary tomotopy tomotopy

Compiling from source takes a long time, but a tomotopy built from source loads its ISA-specific binary statically, so it avoids this slowdown.
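A minimal sketch of applying the `TOMOTOPY_ISA=none` workaround from inside Python instead of the shell. It assumes, consistent with the advice above to run Python with the variable set, that tomotopy reads the variable when it is first imported. Under that assumption the variable must be set before the first `import tomotopy` in the process. The helper name `force_generic_isa` is hypothetical.

```python
import os
import sys
import warnings


def force_generic_isa():
    """Set TOMOTOPY_ISA=none before tomotopy is imported (hypothetical helper).

    Returns True if tomotopy has not been imported yet in this process,
    and False (with a warning) if it already has, in which case the
    setting may not take effect until tomotopy is reloaded or the
    interpreter is restarted.
    """
    os.environ["TOMOTOPY_ISA"] = "none"
    if "tomotopy" in sys.modules:
        warnings.warn(
            "tomotopy is already imported; TOMOTOPY_ISA=none may not take "
            "effect until it is reloaded or the interpreter is restarted."
        )
        return False
    return True


# Usage: call this at the very top of the script, then import tomotopy.
#   force_generic_isa()
#   import tomotopy as tp
```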

ecoronado92 · 2020-05-20T00:39:29Z

@bab2min
If you're in a Jupyter Notebook, I found that the following also works, using the `%env` (or `%set_env`) magic:

    %env TOMOTOPY_ISA=none

and then reloading the package.

bab2min · 2020-06-06T10:19:33Z

This issue was fixed in version 0.8.0.
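Since the fix landed in 0.8.0, one quick way to tell whether an installation still needs one of the workarounds above is to compare its version against 0.8.0. The sketch below assumes plain `X.Y[.Z]` version strings and ignores any trailing pre-release suffix, so `0.8.0rc1` counts as 0.8.0. `parse_version` and `predates_fix` are hypothetical names. The usage comment reads the installed version with `importlib.metadata` (Python 3.8+).

```python
import re

FIXED_IN = (0, 8, 0)  # version that fixed issue #40, per the comment above


def parse_version(version):
    """Turn '0.7.0' (or '0.8.0rc1', or '1.2') into a tuple like (0, 7, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")  # missing patch -> "0"
    return int(major), int(minor), int(patch)


def predates_fix(version):
    """True if this tomotopy version is older than the 0.8.0 fix."""
    return parse_version(version) < FIXED_IN


# Usage (requires tomotopy to be installed):
#   from importlib import metadata
#   if predates_fix(metadata.version("tomotopy")):
#       print("Upgrade with: pip install -U tomotopy")
```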

bab2min added the bug label ("Something isn't working") on Apr 11, 2020

bab2min mentioned this issue Apr 13, 2020

segmentation fault from the extract method with trained HDPModel and CTModel : #41

Closed

bab2min added this to In Progress in Future development plans May 31, 2020

bab2min closed this as completed Jun 6, 2020

Future development plans automation moved this from In Progress to Done Jun 6, 2020
