Exception: cands must be an iterable of tomotopy.label.Candidate #40

Closed
eyseman opened this issue Apr 11, 2020 · 8 comments
Labels
bug Something isn't working

Comments

@eyseman

eyseman commented Apr 11, 2020

The example for LDA labelling (def corpus_and_labeling_example()) is not working properly.

@bab2min bab2min added the bug Something isn't working label Apr 11, 2020
@g3rfx

g3rfx commented Apr 13, 2020

I don't know if it is relevant, but I sometimes get a segmentation fault from the extract method with a trained HDPModel or CTModel:

    extractor = tp.label.PMIExtractor(min_cf=10, min_df=5, max_len=5, max_cand=10000)
    cands = extractor.extract(mdl)
Fatal Python error: Segmentation fault
Current thread 0x00007fbd1009b740 (most recent call first)
...
Segmentation fault (core dumped)

I tried to debug more with the faulthandler module of Python 3, but I cannot get a more detailed output.

EDIT:
Here is the stacktrace using gdb. I hope it helps:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffc74cec6d in tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >* tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >::makeNext<tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&>(unsigned int const&, tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&) () from /home/henry/anaconda3/envs/gdpr/lib/python3.6/site-packages/_tomotopy.cpython-36m-x86_64-linux-gnu.so
(gdb) backtrace
#0  0x00007fffc74cec6d in tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >* tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >::makeNext<tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&>(unsigned int const&, tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&) () from /home/henry/anaconda3/envs/gdpr/lib/python3.6/site-packages/_tomotopy.cpython-36m-x86_64-linux-gnu.so
#1  0x00007fffc74cec8b in tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >* tomoto::TrieEx<unsigned int, unsigned long, tomoto::ConstAccess<std::map<unsigned int, int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, int> > > > >::makeNext<tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&>(unsigned int const&, tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const::{lambda()#1}&) () from /home/henry/anaconda3/envs/gdpr/lib/python3.6/site-packages/_tomotopy.cpython-36m-x86_64-linux-gnu.so
#2  0x00007fffc74cfbcf in tomoto::label::PMIExtractor::extract(tomoto::ITopicModel const*) const () from /home/henry/anaconda3/envs/gdpr/lib/python3.6/site-packages/_tomotopy.cpython-36m-x86_64-linux-gnu.so
#3  0x00007fffc7079dd8 in ExtractorObject::extract(ExtractorObject*, _object*, _object*) () from /home/henry/anaconda3/envs/gdpr/lib/python3.6/site-packages/_tomotopy.cpython-36m-x86_64-linux-gnu.so
#4  0x00005555556654f4 in _PyCFunction_FastCallDict () at /tmp/build/80754af9/python_1578429706181/work/Objects/methodobject.c:231
#5  0x00005555556ecdac in call_function () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4851
#6  0x000055555570f66a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:3335
#7  0x00005555556e6ebb in _PyFunction_FastCall (globals=<optimized out>, nargs=2, args=<optimized out>, co=<optimized out>) at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4933
#8  fast_function () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4968
#9  0x00005555556ece85 in call_function () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4872
#10 0x000055555570f66a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:3335
#11 0x00005555556e7c09 in _PyEval_EvalCodeWithName (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, kwcount=<optimized out>, kwargs=0x0, kwnames=0x0, argcount=0, args=0x0,
    locals=0x7ffff7f55120, globals=0x7ffff7f55120, _co=0x7ffff6aaba50) at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4166
#12 PyEval_EvalCodeEx () at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:4187
#13 0x00005555556e89ac in PyEval_EvalCode (co=co@entry=0x7ffff6aaba50, globals=globals@entry=0x7ffff7f55120, locals=locals@entry=0x7ffff7f55120) at /tmp/build/80754af9/python_1578429706181/work/Python/ceval.c:731
#14 0x0000555555768c64 in run_mod () at /tmp/build/80754af9/python_1578429706181/work/Python/pythonrun.c:1025
#15 0x0000555555769061 in PyRun_FileExFlags () at /tmp/build/80754af9/python_1578429706181/work/Python/pythonrun.c:978
#16 0x0000555555769263 in PyRun_SimpleFileExFlags () at /tmp/build/80754af9/python_1578429706181/work/Python/pythonrun.c:419
#17 0x000055555576936d in PyRun_AnyFileExFlags () at /tmp/build/80754af9/python_1578429706181/work/Python/pythonrun.c:81
#18 0x000055555576cd53 in run_file (p_cf=0x7fffffffdddc, filename=0x5555558a76c0 L"gdpr_topic_modelling.py", fp=0x5555558f5110) at /tmp/build/80754af9/python_1578429706181/work/Modules/main.c:340
#19 Py_Main () at /tmp/build/80754af9/python_1578429706181/work/Modules/main.c:811
#20 0x00005555556373be in main () at /tmp/build/80754af9/python_1578429706181/work/Programs/python.c:69
#21 0x00007ffff77e6b97 in __libc_start_main (main=0x5555556372d0 <main>, argc=2, argv=0x7fffffffdfe8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdfd8) at ../csu/libc-start.c:310
#22 0x0000555555716084 in _start () at ../sysdeps/x86_64/elf/start.S:103

@bab2min
Owner

bab2min commented Apr 13, 2020

@g3rfx Thanks for reporting the bug and sharing the details, but I think the problem you reported is different from this issue, so I have opened a separate issue (#41) for it.

@bab2min
Owner

bab2min commented Apr 13, 2020

Hello @eyseman, thank you for your interest.
I've tested corpus_and_labeling_example() in several environments, including Windows and Linux, but I have not been able to reproduce the exception. Could you give more details about your problem?
I suspect the variable cands holds an unexpected value. Can you print cands before initializing the FoRelevance instance?

@rcalsaverini

rcalsaverini commented Apr 21, 2020

I'm having the same issue on a Mac, with tomotopy 0.7.0 and Python 3.7.

I'm running exactly this code (slightly adapted from the example):

import tomotopy as tp
import tqdm  # needed for the progress bar used in the training loop

import nltk
nltk.download("stopwords")
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords

input_file = "shakespeare_sonnets.txt"

stemmer = PorterStemmer()
stopwords = set(stopwords.words('english'))
corpus = tp.utils.Corpus(tokenizer=tp.utils.SimpleTokenizer(stemmer=stemmer.stem), 
    stopwords=lambda x: len(x) <= 2 or x in stopwords)
# data_feeder yields a tuple of (raw string, user data) or a str (raw string)
corpus.process(open(input_file, encoding='utf-8'))
# make LDA model and train
mdl = tp.LDAModel(k=20, min_cf=10, min_df=5, corpus=corpus)
mdl.train(0)
print('Num docs:', len(mdl.docs), ', Vocab size:', mdl.num_vocabs, ', Num words:', mdl.num_words)
print('Removed top words:', mdl.removed_top_words)
bar = tqdm.trange(0, 10000, 10)
for i in bar:
    mdl.train(10)
    # format the log-likelihood itself (the original passed i as the first argument)
    bar.set_description('Log-likelihood: {:5.3f}'.format(mdl.ll_per_word))

Up to here it runs without issues. Here is where I have problems:

extractor = tp.label.PMIExtractor(min_cf=10, min_df=5, max_len=5, max_cand=10000)
cands = extractor.extract(mdl)

labeler = tp.label.FoRelevance(mdl, cands, min_df=5, smoothing=1e-2, mu=0.25)

The exception is the same as the one in the title:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-10-d3b5d2594c0c> in <module>
      3 cands = extractor.extract(mdl)
      4 
----> 5 labeler = tp.label.FoRelevance(mdl, cands, min_df=5, smoothing=1e-2, mu=0.25)
      6 # for k in range(mdl.k):
      7 #     print("== Topic #{} ==".format(k))

Exception: `cands` must be an iterable of `tomotopy.label.Candidate`

Then a very strange thing happens. If I print cands, I get:

> print(cands)
[<tomotopy.label.Candidate object at 0x1325bbcb0>, <tomotopy.label.Candidate object at 0x1325bbda0>, <tomotopy.label.Candidate object at 0x1325bbad0>, ...]

All items in the list look alike. If I ask Python for their types, the answer is tomotopy.label.Candidate for every one of them:

> {type(cand) for cand in cands}
{tomotopy.label.Candidate}

But if I test whether they are instances of tomotopy.label.Candidate, the check fails for every one of them:

> {isinstance(cand, tp.label.Candidate) for cand in cands}
{False}

Looks like a strange bug involving inheritance.
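
The symptom above (matching type name, failing isinstance) can be reproduced in pure Python whenever two distinct class objects share the same name and module. The sketch below is only an illustration of that general mechanism, not a confirmed account of what happens inside tomotopy; make_candidate_class is a hypothetical helper, and the classes it builds merely borrow the name tomotopy.label.Candidate.

```python
def make_candidate_class():
    # Each call builds a brand-new class object. This stands in (as an
    # assumption) for a type that gets defined twice, e.g. by two separately
    # loaded copies of a compiled module.
    return type("Candidate", (), {"__module__": "tomotopy.label"})


CandidateA = make_candidate_class()
CandidateB = make_candidate_class()

cand = CandidateA()

# The printed name and module are identical for both classes ...
same_name = type(cand).__qualname__ == CandidateB.__qualname__
same_module = type(cand).__module__ == CandidateB.__module__

# ... but isinstance compares class identity, not names.
is_instance = isinstance(cand, CandidateB)
same_object = type(cand) is CandidateB

print(same_name, same_module, is_instance, same_object)
```

Comparing type(cand) is tp.label.Candidate is a quick way to check whether the two class objects are actually the same one.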

@bab2min
Owner

bab2min commented Apr 26, 2020

@rcalsaverini Thank you for providing detailed information. Apparently there is a problem with the type check. I'll investigate the cause.

@bab2min
Owner

bab2min commented May 7, 2020

The issue seems to be caused by the dynamic loading of instruction-set-specific binaries on macOS. It will be fixed in the next version; until then, the following workarounds are available:

  1. Run Python with the environment variable set: export TOMOTOPY_ISA=none
    Setting TOMOTOPY_ISA=none disables tomotopy's dynamic binary loading, but it may slow execution, since it forces tomotopy to run without SIMD instruction sets.

  2. Or install tomotopy by compiling it from source with pip:
    pip install --no-binary tomotopy tomotopy
    Compiling from source takes a long time. However, a tomotopy built from source loads the ISA-specific binary statically, so it avoids the slowdown.
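
The first workaround can also be applied from inside a script rather than the shell. The sketch below assumes that tomotopy reads TOMOTOPY_ISA when the package is first imported, which this thread does not state outright, so the variable is set before any import; disable_isa_dispatch is a hypothetical helper name, not part of tomotopy.

```python
import os
import sys


def disable_isa_dispatch(environ=os.environ, modules=sys.modules):
    """Set TOMOTOPY_ISA=none and report whether that happened before import.

    Returns True if tomotopy had not been imported yet, i.e. the setting is
    in place early enough under the assumption stated above.
    """
    imported_already = "tomotopy" in modules
    environ["TOMOTOPY_ISA"] = "none"
    return not imported_already


set_in_time = disable_isa_dispatch()
# import tomotopy as tp  # import only after the variable has been set
```

If set_in_time is False, the package was already loaded, and restarting the interpreter is the safest way to make the setting take effect.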

@ecoronado92

@bab2min
If you're in a Jupyter Notebook, I found that the following also works, using the %env (or %set_env) magic:

%env TOMOTOPY_ISA=none

and then reloading the package.

@bab2min bab2min added this to In Progress in Future development plans May 31, 2020
@bab2min
Owner

bab2min commented Jun 6, 2020

This issue was fixed in version 0.8.0.
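
A script that has to run on both older and newer installations could decide from the version string whether the workaround is still needed. This is a minimal sketch; parse_version and needs_workaround are hypothetical helpers, and the only fact taken from this thread is that the fix shipped in 0.8.0.

```python
import re

FIXED_IN = (0, 8, 0)  # version named in this thread as containing the fix


def parse_version(version):
    """Turn a string such as '0.7.0' into a tuple of three integers."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '0.7'
    return tuple(parts)


def needs_workaround(version):
    """True if the given version predates the release containing the fix."""
    return parse_version(version) < FIXED_IN


print(needs_workaround("0.7.0"), needs_workaround("0.8.0"))
```

The installed version string can be obtained with importlib.metadata.version("tomotopy") on Python 3.8 or later.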

@bab2min bab2min closed this as completed Jun 6, 2020
Future development plans automation moved this from In Progress to Done Jun 6, 2020
5 participants