
Error releasing un-acquired lock in dask #22

Closed
ghost opened this issue Aug 15, 2017 · 10 comments · Fixed by #50

@ghost

ghost commented Aug 15, 2017

I was on distributed 1.18.0 when this error occurred. I have since changed to distributed 1.16.3.

  File "/Applications/anaconda/envs/my_pycalphad/bin/espei", line 11, in <module>
    sys.exit(main())
  File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/run_espei.py", line 135, in main
    mcmc_steps=args.mcmc_steps, save_interval=args.save_interval)
  File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/paramselect.py", line 754, in fit
    for i, result in enumerate(sampler.sample(walkers, iterations=mcmc_steps)):
  File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/emcee/ensemble.py", line 259, in sample
    lnprob[S0])
  File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/emcee/ensemble.py", line 332, in _propose_stretch
    newlnprob, blob = self._get_lnprob(q)
  File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/emcee/ensemble.py", line 382, in _get_lnprob
    results = list(M(self.lnprobfn, [p[i] for i in range(len(p))]))
  File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/utils.py", line 39, in map
    result = [x.result() for x in result]
  File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/utils.py", line 39, in <listcomp>
    result = [x.result() for x in result]
  File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/distributed/client.py", line 155, in result
    six.reraise(*result)
  File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
RuntimeError: cannot release un-acquired lock
```
@bocklund
Member

@Olivia-Higgins hasn't tested 1.16.3, but I am successfully using this on my machine with dask 0.15.1.

I think something may have changed between 1.16.3 and 1.18.0, and our ImmediateClient might need to be less Immediate.
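
For context, the relevant piece of ImmediateClient, reconstructed from the traceback above (espei/utils.py), looks roughly like the following sketch; treat the exact signature as an assumption:

```python
from distributed import Client

class ImmediateClient(Client):
    """Sketch of ESPEI's ImmediateClient, reconstructed from the
    traceback above; the real code in espei/utils.py may differ."""

    def map(self, func, *iterables, **kwargs):
        # Client.map returns futures; resolve them immediately so
        # callers (like emcee) receive plain values.
        result = super().map(func, *iterables, **kwargs)
        result = [x.result() for x in result]  # the line in the traceback
        return result
```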

@bocklund
Member

cloudpickle is also reportedly seeing some issues (dask/distributed#1254).

I am running cloudpickle 0.2.2 on my machine, and @Olivia-Higgins was running 0.3.1.

@bocklund added the bug label Aug 15, 2017
@bocklund
Member

bocklund commented Sep 8, 2017

I have been able to reproduce this with a fresh install on Windows. A known workaround is to run ESPEI in MPI mode. For local machines, it is easiest to install MPI binaries; several known-working methods for installing them have been suggested.

@bocklund changed the title from "Distributed Version" to "Error releasing un-acquired lock in dask" Sep 8, 2017
@bocklund
Member

bocklund commented Sep 8, 2017

Using a pip freeze, I matched the requirements on another machine exactly to the known-working set on mine and installed from those. The issue still came up.

I'm thinking it must be something in the lazy ImmediateClient. If a simpler solution is not apparent, an alternative might be to allow the locks to be serialized and pickled.
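
If we go the serialization route, dask already ships a candidate in dask.utils.SerializableLock; here is a minimal sketch of the idea, assuming the lock-holding code is ours to change:

```python
import pickle

from dask.utils import SerializableLock

# A plain threading.Lock cannot be pickled. SerializableLock pickles
# to a token and is re-created on the other side, so objects holding
# one can travel between the scheduler and workers.
lock = SerializableLock()

with lock:
    pass  # critical section, same usage as threading.Lock

restored = pickle.loads(pickle.dumps(lock))  # round-trips cleanly
```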

bocklund added a commit that referenced this issue Dec 9, 2017
…hem with x.result()

This attempts to fix the lock error in gh-22
@bocklund
Member

I finally hit this issue on my Mac.

```
Traceback (most recent call last):
  File "/Users/brandon/anaconda3/envs/espei/bin/espei", line 11, in <module>
    load_entry_point('espei', 'console_scripts', 'espei')()
  File "/Users/brandon/Projects/espei/espei/espei_script.py", line 237, in main
    run_espei(input_settings)
  File "/Users/brandon/Projects/espei/espei/espei_script.py", line 189, in run_espei
    deterministic=deterministic,
  File "/Users/brandon/Projects/espei/espei/mcmc.py", line 567, in mcmc_fit
    for i, result in enumerate(sampler.sample(walkers, iterations=mcmc_steps)):
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/emcee/ensemble.py", line 259, in sample
    lnprob[S0])
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/emcee/ensemble.py", line 332, in _propose_stretch
    newlnprob, blob = self._get_lnprob(q)
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/emcee/ensemble.py", line 382, in _get_lnprob
    results = list(M(self.lnprobfn, [p[i] for i in range(len(p))]))
  File "/Users/brandon/Projects/espei/espei/utils.py", line 43, in map
    result = [x.result() for x in result]
  File "/Users/brandon/Projects/espei/espei/utils.py", line 43, in <listcomp>
    result = [x.result() for x in result]
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/distributed/client.py", line 158, in result
    six.reraise(*result)
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
RuntimeError: cannot release un-acquired lock
```

This is on ESPEI commit 5acb606 (on the single-phase lnprob branch), running on 7 cores. I can try merging #41 into that branch and see if I can reproduce the issue with the same settings.

@richardotis
Collaborator

Can ImmediateClient be removed?

@richardotis
Collaborator

Okay, so emcee needs a client which implements a map function. What if the implementation of ImmediateClient were changed to this:

```python
class ImmediateClient(Client):
    map = Client.gather
```

@bocklund
Member

bocklund commented Mar 2, 2018

emcee needs a map function and expects that map to be blocking (i.e., returning values, not futures).

gather collects the results from the async map, so using it is essentially the change suggested by #41. I thought I was able to reproduce this error with #41, which is why I didn't pursue it further, but I didn't leave a comment there to remind future me.
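
For concreteness, the #41-style version would look something like this sketch, which is equivalent in effect to the x.result() list comprehension we have now:

```python
from distributed import Client

class ImmediateClient(Client):
    def map(self, func, *iterables, **kwargs):
        # Submit asynchronously, then block: gather turns the list
        # of futures into the list of values that emcee expects.
        futures = super().map(func, *iterables, **kwargs)
        return self.gather(futures)
```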

@bocklund
Member

bocklund commented Mar 2, 2018

Per dask/distributed#780 and dask/dask#2096, this could also be an issue with the logging module. That might make the most sense, because logging is global, and I could imagine there is a lock that prevents simultaneous access.

A possible solution, then, is to figure out how to get around this (possibly with serializable locks? I'm not sure how logging is implemented under the hood), or to work around it by implementing our own 'logging' levels with print statements and a verbosity kwarg passed around; a sketch of that workaround follows.
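
That print-based workaround could be as small as the following sketch; vprint and the lnprob signature below are hypothetical, not anything in ESPEI today:

```python
def vprint(message, level=1, verbosity=0):
    """Print `message` when the caller-supplied `verbosity` is at
    least `level`. No module-level state and no locks, so there is
    nothing here that can fail to pickle on the workers."""
    if verbosity >= level:
        print(message)

# Thread verbosity through as a kwarg instead of configuring a
# global logger on each worker (hypothetical usage):
def lnprob(params, verbosity=0):
    vprint("evaluating lnprob", level=2, verbosity=verbosity)
```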

@bocklund
Member

bocklund commented Mar 18, 2018

@richardotis suggested that locks in SymPy possibly contribute to this, and that removing the fastcache package may be effective.

bocklund added a commit that referenced this issue Mar 20, 2018
* Results in a significant performance boost (30-50% in Cu-Mg) and the ability to fit arbitrary models with MCMC.
* We explicitly compile and pass callables around, which also improves performance, especially in tight equilibrium for-loops.
* Use pycalphad Species, so pycalphad 0.7 or later is a hard requirement.
* Non-dask/distributed schedulers are deprecated due to requirements for pickling SymPy objects. Work stealing must also be turned off by users in dask; however, removing the fastcache package may be an alternate solution. This leads to breaking changes in the input files, where we only support 'dask', 'None', and a path to a distributed 'scheduler.json' file.

Closes #40, #22, #53 

* First pass at utilities for using the Model class

* Cleanup of model class utilities

* Move eqcallables above building phase models

* Update eq_callables to not use the default dicts when not needed, pass models dict around

* WIP debugging with time

* FIX: Don't pass real symbol values to Model or PhaseRecord

* WIP: Update lnprob test

* WIP: Turn off parameter printing

* ENH: Add message that building phase models may take time

* MAINT: Remove timing code from lnprob

* ENH: Remove dask delayed dbf and phase models

* FIX: Make get_prop_samples compatible with Species

* ENH: automatically wrap param_symbols in eq_callables_dict as Symbols from strings

* FIX: Fix passing callables to calculate for single phase error

* WIP: comment out some test code illustrating the multiprocessing hack

* WIP: Debug logging distributed client

* WIP: force dask work stealing extension to be turned off.

Work-stealing breaks SymPy code.

Closes #53

* Add back the logging to dask workers

* Add lnprob times to debug

* Add some documentation of multi phase fit, remove dask delayed

* Add basic docs to estimate_hyperplane

* ENH: Enable passing scheduler.json files as inputs

Deprecates emcee and MPIPool.
Everything runs through dask because cloudpickle is used, and emcee multiprocessing and MPIPool cannot use cloudpickle or custom serializers.

* ENH: evaluate=False for Piecewise in refdata

Huge performance bump, especially on SymPy 1.1.2dev versions where there were regressions.

Will make the espei script much faster.

* MAINT: Add work-stealing checks to dask scheduler with nice error message

* MAINT: Bump pycalphad requirement to 0.7

* DOC: Update documentation for MPI schedulers

* DOC: Update Mocking in docs