dask distributed: fails to start worker #2446
I've run into issues when using …
There is a library named billiard. Would you consider using it to solve this issue?
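For context, billiard is Celery's maintained fork of multiprocessing and aims to mirror its API. A minimal sketch of what adopting it would look like; the drop-in compatibility is an assumption here, not something verified against distributed:

```python
# Hypothetical sketch: billiard mirrors the multiprocessing namespace,
# so in principle it can be swapped in via an aliased import.
import billiard as multiprocessing  # assumption: drop-in compatible

def work():
    print("hello from a billiard child process")

if __name__ == "__main__":
    p = multiprocessing.Process(target=work)
    p.start()
    p.join()
```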
I run the following code, test1.py, in a loop, like this:
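The snippet and the loop command themselves were not preserved in this thread. A minimal sketch of what the repro plausibly looked like, based on the details reported elsewhere in this issue (four forked workers, dask 1.25.1, Python 2.7); every name here is a reconstruction, not the reporter's actual code:

```python
# test1.py -- hypothetical reconstruction of the repro; the original
# attachment was lost. Run it repeatedly from a shell loop, e.g.
#   while true; do python test1.py; done
from distributed import Client, LocalCluster

if __name__ == "__main__":
    # Four forked worker processes, matching the report in this issue.
    cluster = LocalCluster(n_workers=4, processes=True)
    client = Client(cluster)
    print(client.scheduler_info())
    client.close()
    cluster.close()
```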
After fewer than 10 iterations the code hangs in an endless loop. All the child processes were created, but they didn't complete the worker initialisation. Process 20741 is a dask-worker that didn't complete initialisation and hasn't changed its process name yet. Process 20741 is the worker that is stuck in a deadlock, due to a lock that was forked from the parent.
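To make that forked-lock failure mode concrete, here is a small self-contained demonstration of the general mechanism (illustrative only, not distributed's code): if another thread holds a logging handler's lock at the moment of fork, the child inherits the lock in its already-acquired state and blocks forever on its first log call. The race is timing-dependent, so it may take a few runs to reproduce:

```python
import logging
import multiprocessing
import os
import threading
import time

# Log to /dev/null so the handler lock is contended without console spam.
log = logging.getLogger("demo")
log.addHandler(logging.FileHandler(os.devnull))
log.setLevel(logging.INFO)

def spam():
    # Keep the handler's lock busy from a background thread.
    while True:
        log.info("spam")

def child():
    # If the fork happened while spam() held the lock, this call blocks
    # forever on the inherited, already-acquired lock.
    log.info("child logged successfully")

if __name__ == "__main__":
    threading.Thread(target=spam, daemon=True).start()
    time.sleep(0.1)
    ctx = multiprocessing.get_context("fork")  # fork is the failure mode; unavailable on Windows
    p = ctx.Process(target=child)
    p.start()
    p.join(timeout=2)
    print("child still alive (likely deadlocked):", p.is_alive())
```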
I think it would be reasonable to allow people to come up with their own …

Thank you for the analysis above. If you are able to track down the source of the issue and provide a fix, that would be very welcome.
Here is the stack trace of the process when it is deadlocked. It is trying to acquire the lock of the logger:
```
Traceback (most recent call first):
  Waiting for the GIL
  File "/usr/lib/python2.7/threading.py", line 174, in acquire
    rc = self.__block.acquire(blocking)
  File "/usr/lib/python2.7/logging/__init__.py", line 212, in _acquireLock
    _lock.acquire()
  File "/usr/lib/python2.7/logging/__init__.py", line 1041, in getLogger
    _acquireLock()
  File "/usr/lib/python2.7/logging/__init__.py", line 1574, in getLogger
    return Logger.manager.getLogger(name)
  File "/usr/local/lib/python2.7/dist-packages/distributed/process.py", line 156, in reset_logger_locks
    for handler in logging.getLogger(name).handlers:
  File "/usr/local/lib/python2.7/dist-packages/distributed/process.py", line 165, in _run
    cls.reset_logger_locks()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
```
Hrm, interesting. Any suggestions?
It may be fixed for this one lock just to fail on the next lock.
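For illustration, here is a rough sketch of the direction a fix could take: re-create the logging module's locks in the child immediately after fork, before any code tries to log. This pokes at private CPython internals (`logging._lock`, `logging._handlerList`) and uses `os.register_at_fork`, which only exists on Python 3.7+, so treat it purely as a sketch of the idea rather than a patch for the Python 2.7 setup in this report:

```python
import logging
import os
import threading

def _reset_logging_locks():
    # Replace the module-level lock and every handler's lock with fresh
    # ones, so the child never waits on a lock that was snapshotted in
    # the acquired state at fork time.
    logging._lock = threading.RLock()          # private CPython detail
    for weak_handler in logging._handlerList:  # private CPython detail
        handler = weak_handler()
        if handler is not None:
            handler.createLock()

# Python 3.7+ only: run in the child right after every fork.
os.register_at_fork(after_in_child=_reset_logging_locks)
```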
A workaround was to start with one worker and scale up one by one.
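A sketch of that workaround; the `scale` and `wait_for_workers` calls follow the current distributed API, which may postdate the 1.25.1 release in this report:

```python
from distributed import Client, LocalCluster

# Workaround sketch: fork a single worker first, then grow the cluster
# one process at a time instead of forking four workers at once.
cluster = LocalCluster(n_workers=1, processes=True)
client = Client(cluster)
for n in range(2, 5):
    cluster.scale(n)            # request one more worker per step
    client.wait_for_workers(n)  # assumption: present in recent distributed
```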
I get a timeout error when trying to start a LocalCluster with 4 processes or more. I'm using dask 1.25.1 with Python 2.7, running on macOS.

This also happens during tests. I modified the dask distributed test test_procs, found in distributed/deploy/tests/test_local.py, like this: I set n_workers to 4 instead of 2, and I get this error (a sketch of the modification follows).
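The modified test body isn't reproduced in the thread; a hedged sketch of what the change might look like (keyword arguments and assertions are assumptions based on typical LocalCluster tests, not the actual contents of test_local.py):

```python
from distributed import LocalCluster

def test_procs():
    # As described above: n_workers raised from 2 to 4.
    with LocalCluster(n_workers=4, processes=True,
                      threads_per_worker=1) as cluster:
        assert len(cluster.workers) == 4
```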