Experiments seem stuck when acquiring lock #273

Closed
kmkurn opened this issue Apr 26, 2018 · 5 comments

kmkurn commented Apr 26, 2018

Hi,

Firstly, I want to thank you for your amazing work. Sacred really, really helps me organize and analyze my experiments. Love it sooo much!

When I run several experiments in a row, though, Sacred sometimes gets stuck after an experiment finishes. Last time I waited for hours and it was still stuck. I had to press CTRL-C manually for the next experiment to start, and when I did, this traceback appeared:

^CException ignored in: <module 'threading' from '/Users/kemal/.pyenv/versions/miniconda3-latest/envs/id-pos-tagging/lib/python3.6/threading.py'>
Traceback (most recent call last):
  File "/Users/kemal/.pyenv/versions/miniconda3-latest/envs/id-pos-tagging/lib/python3.6/threading.py", line 1294, in _shutdown
    t.join()
  File "/Users/kemal/.pyenv/versions/miniconda3-latest/envs/id-pos-tagging/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/Users/kemal/.pyenv/versions/miniconda3-latest/envs/id-pos-tagging/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

Also, the metrics for that run weren't saved. This always happens after ~7 experiments in a row. I am running and storing my experiments locally on macOS 10.13. I am using Sacred 0.7.2, PyMongo 3.4.0, and MongoDB 3.6.4. I'm probably wrong, but this might have something to do with macOS because I had no such issues when running on Ubuntu.
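
The traceback above shows the interpreter blocked in `threading._shutdown`, which joins every remaining non-daemon thread at exit, so a hang there usually means some background thread never finished. A minimal sketch for listing such threads at the end of a run follows. The helper name `lingering_threads` and the `demo-heartbeat` thread are hypothetical and only serve the illustration; nothing here is part of Sacred.

```python
import threading


def lingering_threads():
    """Return live non-daemon threads other than the main thread.

    These are the threads the interpreter waits for at shutdown.
    """
    main = threading.main_thread()
    return [
        t for t in threading.enumerate()
        if t is not main and t.is_alive() and not t.daemon
    ]


# Demo: a stand-in background thread that blocks until it is told to stop.
stop = threading.Event()
worker = threading.Thread(target=stop.wait, name="demo-heartbeat")
worker.start()

# While the worker is blocked, it shows up as a lingering thread.
names_before = [t.name for t in lingering_threads()]

# Once it is released and joined, it no longer does.
stop.set()
worker.join()
names_after = [t.name for t in lingering_threads()]

print("before:", names_before)
print("after:", names_after)
```

Calling such a helper right after an experiment finishes would show which thread, if any, is keeping the process alive.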

Author

kmkurn commented Apr 30, 2018

UPDATE:

I was wrong, the metrics were saved. And it has nothing to do with macOS, because now I am having the same problem when running on Ubuntu (both the experiments and MongoDB run on Ubuntu, but on different machines). Also, I am using Python 3.6.

Collaborator

Qwlouse commented Apr 30, 2018

Ugh, this is an ugly one, and it is going to be difficult to track down. Just to be sure, could you check if the problem persists with the current master? Because I've since changed the threading behavior to fix some other problem.

pip install git+https://github.com/IDSIA/Sacred.git@master

Qwlouse added the bug label Apr 30, 2018
Author

kmkurn commented May 1, 2018

Interesting. It seems the issue is fixed in master. Do you have any guess what the cause might be? Also, if I switch to master from 0.7.2 from PyPI, what are the major differences? I'm in the middle of experimentation, so I don't want to have potentially unfair comparisons among my experiments.

Qwlouse added this to the Bugfix Release 0.7.3 milestone May 6, 2018
Collaborator

Qwlouse commented May 6, 2018

Yes. I had a bad way of dealing with the heartbeat thread, such that it would sometimes not exit when the experiment finished. This is probably also where it got stuck in your case.
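
For illustration only, here is one common way to make a periodic background thread exit reliably when the work is done. This is a generic sketch, not Sacred's actual code and not necessarily the change made for 0.7.3; the `Heartbeat` class, its parameters, and the `beat` callback are all hypothetical names.

```python
import threading


class Heartbeat:
    """Hypothetical periodic heartbeat (an assumption, not Sacred's implementation)."""

    def __init__(self, interval, beat):
        self._interval = interval
        self._beat = beat
        self._stop_event = threading.Event()
        # daemon=True means the interpreter will not wait for this thread at exit.
        self._thread = threading.Thread(
            target=self._run, name="heartbeat", daemon=True
        )

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait() returns True as soon as stop() sets the event, so the
        # loop ends promptly instead of sleeping through a full interval.
        while not self._stop_event.wait(self._interval):
            self._beat()

    def stop(self, timeout=5.0):
        """Signal the thread to exit; return True if it exited within timeout."""
        self._stop_event.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()


# Usage: record beats and wait until at least one has happened.
beats = []
first_beat = threading.Event()


def on_beat():
    beats.append(1)
    first_beat.set()


hb = Heartbeat(interval=0.01, beat=on_beat)
hb.start()
got_beat = first_beat.wait(timeout=2.0)
stopped = hb.stop()
print("beats:", len(beats), "stopped:", stopped)
```

The stop event covers the normal path, the bounded `join` keeps the caller from blocking forever, and the daemon flag is a last resort so that a stuck heartbeat cannot hold the process open at shutdown.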

I just released 0.7.3 on PyPI, so you can now just pip install the new version. It is mostly a bugfix release (release notes), so it shouldn't affect your experiments much ;-).

Qwlouse closed this as completed May 6, 2018

talesa commented May 10, 2018

I have 0.7.3 and am sadly running into the same issue.

kmkurn added a commit to kmkurn/id-pos-tagging that referenced this issue May 12, 2018