Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

local: extend started_cbs AFTER state is initialized #4460

Merged
merged 1 commit into from Feb 9, 2019

Conversation

Projects
None yet
4 participants
@crepererum
Copy link
Contributor

commented Feb 5, 2019

  • Tests added / passed
  • Passes flake8 dask

I have observed a case where an exception seems to be raised very early and therefore state is not initialized in the finally block, leading to an UnboundLocalError and swallowing the original exception (which would have been helpful to have). I think in general, the state here should be initialized first, no matter what the root cause of my local problem (which I will report once I have found it).

I don't know if this needs a test or what a good test would be to trigger this issue.

@mrocklin

This comment has been minimized.

Copy link
Member

commented Feb 5, 2019

Thanks for catching this @crepererum . Do you happen to have a minimal example for this that could serve as a test? Otherwise I'm afraid that future maintainers might reverse this decision without realizing it.

Also cc @jcrist for review

@crepererum crepererum force-pushed the crepererum:fix/local_race_condition branch from 79f45f5 to 9fd528d Feb 5, 2019

@crepererum

This comment has been minimized.

Copy link
Contributor Author

commented Feb 5, 2019

@mrocklin I've managed to design a regression test :)

Show resolved Hide resolved dask/tests/test_threaded.py Outdated

@crepererum crepererum force-pushed the crepererum:fix/local_race_condition branch from 9fd528d to 531bcf4 Feb 6, 2019

@mrocklin

This comment has been minimized.

Copy link
Member

commented Feb 6, 2019

Merging this in tomorrow if there are no further comments (cc @jcrist )

@mrocklin mrocklin merged commit 2376da2 into dask:master Feb 9, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@mrocklin

This comment has been minimized.

Copy link
Member

commented Feb 9, 2019

Thanks for the fix @crepererum ! This is in.

@rainwoodman

This comment has been minimized.

Copy link
Contributor

commented Mar 4, 2019

After this exact PR is merged, we start to see errors like this raised from cachey:

dask/cachey#15

I suspect it is because we are now registering the same key twice into cachey, which it doesn't currently support. But how could this happen?

@@ -405,15 +405,16 @@ def get_async(apply_async, num_workers, dsk, result, cache=None,
started_cbs = []
succeeded = False
try:
keyorder = order(dsk)

state = start_state_from_dask(dsk, cache=cache, sortkey=keyorder.get)

This comment has been minimized.

Copy link
@jcrist

jcrist Mar 4, 2019

Member

Sorry for the late review. This change breaks caching because the start callback needs to run before we initialize state, otherwise we don't replace tasks with computed keys properly, resulting in an invalid state (we'll compute dependencies for cached task x, then the callback will replace task x with its computed value, invalidating the computed dependencies).

@jcrist

This comment has been minimized.

Copy link
Member

commented Mar 4, 2019

I have observed a case where an exception seems to be raised very early and therefore state is not initialized in the finally block, leading to an UnboundLocalError and swallowing the original exception (which would have been helpful to have). I think in general, the state here should be initialized first, no matter what the root cause of my local problem (which I will report once I have found it).

I'm not sure I fully understand the issue here, but a different fix would just be to set state = {} before the try block, and go on from there. No unbound local, and keeps things working as is.

rainwoodman added a commit to rainwoodman/dask that referenced this pull request Mar 4, 2019

@rainwoodman rainwoodman referenced this pull request Mar 4, 2019

Merged

BUG: caching broken due to changed callback. #4542

0 of 2 tasks complete

mrocklin added a commit that referenced this pull request Mar 5, 2019

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019

jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.