Error releasing un-acquired lock in dask #22
@Olivia-Higgins hasn't tested 1.16.3, but I am successfully using this on my machine. I think that possibly something between 1.16.3 and 1.18.0 changed, and our ImmediateClient might need to be less Immediate.
cloudpickle is also reportedly seeing some issues (dask/distributed#1254). I am running 0.2.2 on my machine and @Olivia-Higgins was running 0.3.1.
I have been able to reproduce this with a fresh install on Windows. A known workaround is to run ESPEI in MPI mode. For local machines, it is easiest to install MPI binaries.
I matched the requirements on another machine exactly to mine, which are known to be working. I'm thinking it must be something in the lazy ImmediateClient. If a less complicated solution is not apparent, then an alternative might be to allow locks to be serialized and pickled.
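dask does ship a serializable lock (`dask.utils.SerializableLock`) built on exactly this idea. A minimal, dask-independent sketch of the technique (the class name and registry here are illustrative, not ESPEI's or dask's actual implementation):

```python
import pickle
import threading

class PicklableLock:
    """Sketch of a lock that survives pickling: instead of serializing the
    OS-level lock (which is impossible), serialize only a token and look the
    real lock back up in a process-local registry on unpickle."""
    _registry = {}  # token -> threading.Lock, shared within one process

    def __init__(self, token="default"):
        self.token = token
        self._lock = PicklableLock._registry.setdefault(token, threading.Lock())

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, *exc_info):
        self._lock.release()

    def __getstate__(self):
        # Only the token crosses the pickle boundary.
        return self.token

    def __setstate__(self, token):
        self.__init__(token)

# Round-trips through pickle, unlike a bare threading.Lock.
clone = pickle.loads(pickle.dumps(PicklableLock("db-write")))
with clone:
    pass  # critical section
```

Two copies with the same token share the same underlying lock within a process, which is the property that makes it safe to ship through dask's pickle-based serialization.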
…hem with x.result() This attempts to fix the lock error in gh-22
I finally hit this issue on my Mac.

```
Traceback (most recent call last):
  File "/Users/brandon/anaconda3/envs/espei/bin/espei", line 11, in <module>
    load_entry_point('espei', 'console_scripts', 'espei')()
  File "/Users/brandon/Projects/espei/espei/espei_script.py", line 237, in main
    run_espei(input_settings)
  File "/Users/brandon/Projects/espei/espei/espei_script.py", line 189, in run_espei
    deterministic=deterministic,
  File "/Users/brandon/Projects/espei/espei/mcmc.py", line 567, in mcmc_fit
    for i, result in enumerate(sampler.sample(walkers, iterations=mcmc_steps)):
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/emcee/ensemble.py", line 259, in sample
    lnprob[S0])
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/emcee/ensemble.py", line 332, in _propose_stretch
    newlnprob, blob = self._get_lnprob(q)
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/emcee/ensemble.py", line 382, in _get_lnprob
    results = list(M(self.lnprobfn, [p[i] for i in range(len(p))]))
  File "/Users/brandon/Projects/espei/espei/utils.py", line 43, in map
    result = [x.result() for x in result]
  File "/Users/brandon/Projects/espei/espei/utils.py", line 43, in <listcomp>
    result = [x.result() for x in result]
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/distributed/client.py", line 158, in result
    six.reraise(*result)
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/Users/brandon/anaconda3/envs/espei/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
RuntimeError: cannot release un-acquired lock
```

This is on ESPEI commit 5acb606 (on the single-phase lnprob branch) running on 7 cores. I can try to merge #41 into that branch and see if I can reproduce the issue with the same settings.
Okay, so:

```python
class ImmediateClient(Client):
    map = Client.gather
```
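For context on what an "immediate" map does: the traceback above shows ESPEI's `map` submitting work and then resolving each future with `x.result()`. A rough stand-in using `concurrent.futures` instead of `distributed.Client` (names and structure are illustrative, not ESPEI's actual code):

```python
from concurrent.futures import ThreadPoolExecutor

class ImmediateMap:
    """Stand-in sketch for an 'immediate' client: map() blocks until
    every result is ready, rather than returning lazy futures."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def map(self, fn, iterable):
        futures = [self._pool.submit(fn, item) for item in iterable]
        # Resolving futures here is where a deserialization error inside a
        # task payload (e.g. an unpicklable lock) would surface.
        return [f.result() for f in futures]

print(ImmediateMap().map(lambda x: x * x, range(4)))  # [0, 1, 4, 9]
```

With `distributed.Client` the same pattern ships each task through (cloud)pickle to a worker, which is why an unpicklable lock captured in a task blows up at the `x.result()` call.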
Per dask/distributed#780 and dask/dask#2096, this could also be an issue with the logging module. This might make the most sense, because logging is global and I could imagine there is a lock that prevents simultaneous access. A possible solution is to figure out how to get around this (possibly with serializable locks? I'm not sure what the implementation of logging is), or to work around it by implementing our own 'logging' levels with print statements and verbosity passed around as a kwarg.
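A minimal sketch of that print-based workaround (the `vprint` helper and its verbosity levels are hypothetical, not ESPEI's API):

```python
import sys

def vprint(message, level=1, verbosity=0):
    """Hypothetical lock-free 'logging': a plain print gated by a
    verbosity kwarg, sidestepping the logging module's global lock."""
    if verbosity >= level:
        print(message, file=sys.stderr)

# Verbosity is threaded through as a kwarg instead of global state:
vprint("starting MCMC run", level=1, verbosity=1)        # shown
vprint("per-step timing details", level=2, verbosity=1)  # suppressed
```

Since each call is just a `print`, nothing here needs to acquire or release a shared lock object that might be shipped between workers.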
@richardotis suggested that locks in SymPy possibly contribute to this, and that removing the fastcache package may be effective.
* Results in a significant performance boost (30-50% in Cu-Mg) and the ability to fit arbitrary models with MCMC.
* We explicitly compile and pass callables around, which also improves performance, especially in tight equilibrium for-loops.
* Use pycalphad Species, so pycalphad 0.7 or later is a hard requirement.
* Non-dask/distributed schedulers are deprecated due to requirements in pickling SymPy objects. Work stealing must also be turned off by users in dask; however, removing the fastcache package may be an alternate solution. This leads to breaking changes in the input files, where we only support 'dask', 'None', and a path to a distributed 'scheduler.json' file. Closes #40, #22, #53
* First pass at utilities for using the Model class
* Cleanup of Model class utilities
* Move eq_callables above building phase models
* Update eq_callables to not use the default dicts when not needed; pass the models dict around
* WIP: debugging with time
* FIX: Don't pass real symbol values to Model or PhaseRecord
* WIP: Update lnprob test
* WIP: Turn off parameter printing
* ENH: Add a message that building phase models may take time
* MAINT: Remove timing code from lnprob
* ENH: Remove dask delayed dbf and phase models
* FIX: Make get_prop_samples compatible with Species
* ENH: Automatically wrap param_symbols in eq_callables_dict as Symbols from strings
* FIX: Fix passing callables to calculate for single-phase error
* WIP: Comment out some test code illustrating the multiprocessing hack
* WIP: Debug logging of the distributed client
* WIP: Force the dask work-stealing extension to be turned off. Work stealing breaks SymPy code. Closes #53
* Add back the logging to dask workers
* Add lnprob times to debug
* Add some documentation of multi-phase fit; remove dask delayed
* Add basic docs to estimate_hyperplane
* ENH: Enable passing scheduler.json files as inputs. Deprecates emcee multiprocessing and MPIPool. Everything runs through dask because cloudpickle is used, and emcee multiprocessing and MPIPool cannot use cloudpickle or custom serializers.
* ENH: evaluate=False for Piecewise in refdata. Huge performance bump, especially on SymPy 1.1.2dev versions where there were regressions. Will make the espei script much faster.
* MAINT: Add work-stealing checks to the dask scheduler with a nice error message
* MAINT: Bump the pycalphad requirement to 0.7
* DOC: Update documentation for MPI schedulers
* DOC: Update mocking in docs
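A sketch of what such a work-stealing check could look like (the config key mirrors dask's `distributed.scheduler.work-stealing` setting; the function and dict-based config here are illustrative, not ESPEI's actual code):

```python
def check_work_stealing_disabled(dask_config):
    """Fail fast with a clear message when work stealing is enabled;
    `dask_config` stands in for dask's configuration mapping."""
    if dask_config.get("distributed.scheduler.work-stealing", True):
        raise ValueError(
            "Dask work stealing must be disabled (it breaks pickled SymPy "
            "objects); set distributed.scheduler.work-stealing: false"
        )

check_work_stealing_disabled({"distributed.scheduler.work-stealing": False})  # ok
```

Failing at startup with an actionable message is much friendlier than letting SymPy objects break mid-run when a task is stolen between workers.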
I was on distributed 1.18.0 when this error occurred; I have changed to distributed 1.16.3.