Respect timeouts when restarting #1304

mrocklin · 2017-08-02T15:00:45Z

Previously, restarting a cluster that had long-running tasks would sometimes
hang. There were two causes:

1. The nanny's restart timeout was longer than the scheduler's restart
   timeout.
   Now we pass a fraction of the scheduler's timeout down to the nanny.
2. The workers used to wait for the executor to finish all currently
   running tasks.
   Now they shut down without waiting.

Fixes #1303. Either of these changes is enough to fix the issue independently.
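
The two changes can be illustrated with a small standalone sketch. This is not the code from this PR: `nanny_timeout`, `nanny_restart`, `scheduler_restart`, and `close_executor_without_waiting` are hypothetical names, and the 0.8 fraction is an assumed value chosen only for illustration. The sketch shows an inner deadline that is a fraction of the outer one, so the inner layer can force-kill and report back before the outer layer gives up, and an executor shutdown that does not block on running tasks.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

NANNY_FRACTION = 0.8  # assumed value, for illustration only


def nanny_timeout(scheduler_timeout, fraction=NANNY_FRACTION):
    """Give the nanny a strictly shorter deadline than the scheduler."""
    return scheduler_timeout * fraction


async def nanny_restart(timeout, worker_shutdown_time):
    """Hypothetical nanny: wait for a graceful close, else force-kill."""
    try:
        # asyncio.sleep stands in for the worker process shutting down.
        await asyncio.wait_for(asyncio.sleep(worker_shutdown_time), timeout)
        return "closed gracefully"
    except asyncio.TimeoutError:
        return "killed after nanny timeout"


async def scheduler_restart(timeout, worker_shutdown_time):
    """Hypothetical scheduler: the outer deadline wraps the nanny's inner one."""
    return await asyncio.wait_for(
        nanny_restart(nanny_timeout(timeout), worker_shutdown_time), timeout
    )


def close_executor_without_waiting(task_seconds):
    """Shut down an executor while a task is running; return elapsed seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    executor.submit(time.sleep, task_seconds)
    start = time.monotonic()
    executor.shutdown(wait=False)  # returns immediately; the task is not awaited
    return time.monotonic() - start
```

With a scheduler timeout of 0.5 seconds and a worker that would take 5 seconds to close, `asyncio.run(scheduler_restart(0.5, 5))` returns from the nanny's force-kill branch rather than raising the scheduler's own timeout, because the inner deadline expires first.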

mrocklin added 2 commits August 2, 2017 10:56

pin pytest to 3.1

974554e

3.2 is raising SkipErrors

mrocklin merged commit 7985689 into dask:master Aug 2, 2017

mrocklin deleted the restart-timeout branch August 2, 2017 20:09

mrocklin mentioned this pull request Aug 2, 2017

Forced termination of workers by nannies not working properly #1303

Closed

mrocklin restored the restart-timeout branch October 6, 2017 15:25
