"RuntimeError: release unlocked lock" in numpy mtrand.RandomState.randint while training doc2vec #1311

Closed
yangkky opened this issue May 9, 2017 · 9 comments

Comments

@yangkky

yangkky commented May 9, 2017

Description

I'm trying to train a doc2vec model on a set of protein sequences divided into k-mers. With some datasets/divisions into k-mers, the training completes successfully. However, sometimes, I get a RuntimeError:

2017-05-09 13:37:56,183 : INFO : PROGRESS: at 96.88% examples, 409235 words/s, in_qsize 15, out_qsize 0
Exception in thread Thread-34:
Traceback (most recent call last):
  File "/Users/kevinyang/anaconda/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/Users/kevinyang/anaconda/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/kevinyang/anaconda/lib/python3.5/site-packages/gensim/models/word2vec.py", line 822, in worker_loop
    tally, raw_tally = self._do_train_job(sentences, alpha, (work, neu1))
  File "/Users/kevinyang/anaconda/lib/python3.5/site-packages/gensim/models/doc2vec.py", line 717, in _do_train_job
    doctag_vectors=doctag_vectors, doctag_locks=doctag_locks)
  File "gensim/models/doc2vec_inner.pyx", line 428, in gensim.models.doc2vec_inner.train_document_dm (./gensim/models/doc2vec_inner.c:5444)
  File "mtrand.pyx", line 1266, in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:15836)
RuntimeError: release unlocked lock

Steps/Code/Corpus to Reproduce

Unfortunately, I don't know how to reproduce this on a smaller corpus. I've attached the training code (in a Jupyter notebook).
bug_report.zip

What would be the best way to attach the corpus? It is around 170 MB.

Versions

Darwin-16.5.0-x86_64-i386-64bit
Python 3.5.3 |Anaconda custom (x86_64)| (default, Mar  6 2017, 12:15:08) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.11.3
SciPy 0.18.1
gensim 1.0.1
FAST_VERSION 1
@menshikh-iv
Contributor

Thank you @yangkky. You can use an external file storage service such as Dropbox or Google Drive to share your corpus.

@gojomo
Collaborator

gojomo commented May 14, 2017

Note that the actual error occurs inside numpy code, and gensim's use of numpy on that doc2vec_inner.pyx line (https://github.com/RaRe-Technologies/gensim/blob/fdc01ab1ee350ce223ab9209e911352fba5d4290/gensim/models/doc2vec_inner.pyx#L428) is pretty straightforward. Even though it's in cython, we haven't yet entered a nogil section. That makes me suspect a numpy/anaconda bug rather than a gensim one. (Gensim isn't managing any locking/unlocking of the random object and is calling it in an acceptable way, so it seems whatever is going wrong with the locking must be caused by non-gensim code.)

Does the problem recur with a later Numpy (0.12+), or with non-Anaconda Python 3.5.x?
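
One way to probe the suspicion above without gensim in the picture is to hammer a single shared `RandomState` from several plain Python threads and record any exception. This is only a sketch: the helper names, thread count, call count, and `randint` bounds are illustrative assumptions, not gensim's actual call, and a clean run does not prove the bug is absent.

```python
import threading

import numpy as np


def hammer_randint(rng, n_calls, errors):
    """Call rng.randint repeatedly, recording any exception that is raised."""
    try:
        for _ in range(n_calls):
            rng.randint(0, 2 ** 24)  # bounds are arbitrary, chosen for illustration
    except Exception as exc:
        errors.append(exc)


def stress_shared_random_state(n_threads=8, n_calls=20000, seed=0):
    """Share one RandomState across threads and return the exceptions seen."""
    rng = np.random.RandomState(seed)
    errors = []
    threads = [
        threading.Thread(target=hammer_randint, args=(rng, n_calls, errors))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    print("exceptions raised:", stress_shared_random_state())
```

If this reports a `RuntimeError` on the affected machine, that would point at numpy or the Python build rather than at gensim's training loop.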

@yangkky
Author

yangkky commented May 15, 2017

@gojomo Do you mean Numpy 1.12+?

I'll try updating jupyter and see if it still fails. I'll also upload the corpus soon.

@gojomo
Collaborator

gojomo commented May 15, 2017

Yes, sorry, I meant Numpy 1.12 or later. (Besides a non-Anaconda Python, a later Anaconda release could also be worth trying while the problem is still recurring.)

@yangkky
Author

yangkky commented May 15, 2017

@yangkky
Author

yangkky commented May 15, 2017

Upgrading to numpy 1.12.1 did not resolve the issue.

@gojomo changed the title from "RuntimeError while training doc2vec" to "RuntimeError: release unlocked lock" in numpy mtrand.RandomState.randint while training doc2vec, May 15, 2017
@gojomo
Collaborator

gojomo commented May 15, 2017

I recommend separately reporting to Numpy's issues; I don't see any error in how gensim is calling this method, and the error arises in that code, with regard to locks it manages.
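
For context on the message itself: in CPython, "release unlocked lock" is the text of the `RuntimeError` raised when `release()` is called on a `threading.Lock` that is not currently held. The sketch below reproduces only that message with the standard library. It assumes the lock numpy uses around `RandomState` is a plain `threading.Lock`, which the error text suggests but this thread does not confirm, and the helper name is hypothetical.

```python
import threading


def release_unheld(lock):
    """Release a lock that was never acquired; return the error text, if any."""
    try:
        lock.release()
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(release_unheld(threading.Lock()))
```

Seeing this error from inside `randint` therefore implies that the lock was released by code that did not hold it at that moment, which is consistent with the fault lying in whatever manages that lock.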

@yangkky
Author

yangkky commented May 15, 2017

@gojomo ok I will do that.

@piskvorky
Owner

From the discussion in that numpy ticket, the error seems to be somehow related to multiprocessing and numpy.

Closing here as unrelated -- please let us know if there's anything we could do on our side, @yangkky.
