Improved error messages for P2P shuffling #7979

Merged

hendrikmakait merged 8 commits into dask:main from the improved-p2p-errors branch on Jul 10, 2023

hendrikmakait
Member

  • Tests added / passed
  • Passes pre-commit run --all-files

Contributor

wence- left a comment

A few minor cleanups

@@ -119,15 +122,24 @@ def shuffle_ids(self) -> set[ShuffleId]:

async def barrier(self, id: ShuffleId, run_id: int) -> None:
    shuffle = self.states[id]
    assert shuffle.run_id == run_id
Contributor

Suggested change
-    assert shuffle.run_id == run_id
+    assert shuffle.run_id == run_id, "Shuffle barrier ID does not match requested run_id"

Or something like that?
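A minimal sketch of what the suggested message buys: when the assertion fails, the string after the comma becomes the `AssertionError` message. `FakeShuffleState` and `barrier_check` are hypothetical stand-ins, not the classes or methods in distributed.

```python
from dataclasses import dataclass


@dataclass
class FakeShuffleState:
    # Hypothetical stand-in for the scheduler-side shuffle state; only run_id matters here.
    run_id: int


def barrier_check(shuffle: FakeShuffleState, run_id: int) -> None:
    # Mirrors the suggested assertion: the string becomes the AssertionError message.
    assert shuffle.run_id == run_id, "Shuffle barrier ID does not match requested run_id"


try:
    barrier_check(FakeShuffleState(run_id=2), run_id=1)
except AssertionError as e:
    print(e)  # prints: Shuffle barrier ID does not match requested run_id
```

Assertions are skipped entirely when Python runs with `-O`, so a message like this helps with debugging but does not replace an explicit check.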

Member Author

Done!

Two resolved (outdated) review threads on distributed/shuffle/_scheduler_extension.py
@@ -696,10 +703,10 @@ async def _get_shuffle_run(
         shuffle_id=shuffle_id,
     )
 if run_id < shuffle.run_id:
-    raise RuntimeError("Stale shuffle")
+    raise RuntimeError(f"{shuffle} stale, expected run_id=={run_id}")
Contributor

Suggested change
-    raise RuntimeError(f"{shuffle} stale, expected run_id=={run_id}")
+    raise RuntimeError(f"{shuffle} stale, expected {run_id=}")
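The suggestion relies on the f-string `=` specifier (Python 3.8+), which renders the expression text, an `=`, and then `repr()` of the value. The sketch below compares both spellings; `FakeShuffleRun` and its `__str__` are hypothetical stand-ins for the real shuffle run object. Note that the visible output changes from `run_id==1` to `run_id=1`.

```python
from dataclasses import dataclass


@dataclass
class FakeShuffleRun:
    # Hypothetical stand-in; the real shuffle run class has its own string form.
    id: str
    run_id: int

    def __str__(self) -> str:
        return f"FakeShuffleRun<{self.id}[{self.run_id}]>"


shuffle = FakeShuffleRun(id="abc", run_id=2)
run_id = 1

# Spelling from the PR: the name and operator are written out by hand.
original = f"{shuffle} stale, expected run_id=={run_id}"
# Suggested spelling: {run_id=} expands to "run_id=" followed by repr(run_id).
suggested = f"{shuffle} stale, expected {run_id=}"

print(original)   # FakeShuffleRun<abc[2]> stale, expected run_id==1
print(suggested)  # FakeShuffleRun<abc[2]> stale, expected run_id=1
```

Because `=` uses `repr()` by default, a string-valued variable would be rendered with quotes, which is usually what you want in an error message.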

 elif run_id > shuffle.run_id:
     # This should never happen
-    raise RuntimeError("Invalid shuffle state")
+    raise RuntimeError(f"{shuffle} invalid, expected run_id=={run_id}")
Contributor

Suggested change
-    raise RuntimeError(f"{shuffle} invalid, expected run_id=={run_id}")
+    raise RuntimeError(f"{shuffle} invalid, expected {run_id=}")

@@ -696,10 +703,10 @@ async def _get_shuffle_run(
        shuffle_id=shuffle_id,
    )
if run_id < shuffle.run_id:
Contributor

Minor: use the same ordering for the conditions (shuffle.run_id > run_id) as in _restrict_task in the scheduler extension?

Member Author

Done

Resolved (outdated) review thread on distributed/shuffle/_worker_extension.py
elif run_id > shuffle.run_id:
    # This should never happen
    raise RuntimeError("Invalid shuffle state")
if run_id > shuffle.run_id:
Contributor

Minor: flip the order of operands to align with _restrict_task in the scheduler extension?

Member Author

Good point, done.
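One way the worker-side check could read after both review points, with `shuffle.run_id` on the left of each comparison and the new error messages. This is a sketch under assumptions, not the merged code: `FakeShuffleRun` and `check_run_id` are hypothetical names, and the real `_get_shuffle_run` also refreshes the shuffle from the scheduler before checking.

```python
from dataclasses import dataclass


@dataclass
class FakeShuffleRun:
    # Hypothetical stand-in for a worker-side shuffle run; not the distributed class.
    id: str
    run_id: int

    def __str__(self) -> str:
        return f"FakeShuffleRun<{self.id}[{self.run_id}]>"


def check_run_id(shuffle: FakeShuffleRun, run_id: int) -> FakeShuffleRun:
    """Hypothetical helper: validate a requested run_id against the known run."""
    # shuffle.run_id sits on the left of both comparisons, as the review suggests.
    if shuffle.run_id > run_id:
        # The request refers to an older run that has since been superseded.
        raise RuntimeError(f"{shuffle} stale, expected {run_id=}")
    # A plain `if` is equivalent to `elif` here because the branch above raises.
    if shuffle.run_id < run_id:
        # The request refers to a newer run than the one known here.
        raise RuntimeError(f"{shuffle} invalid, expected {run_id=}")
    return shuffle


current = FakeShuffleRun(id="abc", run_id=2)
for requested in (1, 2, 3):
    try:
        check_run_id(current, requested)
        print(f"run_id={requested}: ok")
    except RuntimeError as e:
        print(f"run_id={requested}: {e}")
```

Keeping the same operand order in every comparison lets a reader see at a glance which branch handles the older run and which handles the newer one.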

hendrikmakait and others added 2 commits July 10, 2023 12:44
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
@hendrikmakait hendrikmakait merged commit 9beab9a into dask:main Jul 10, 2023
19 of 26 checks passed
@hendrikmakait hendrikmakait deleted the improved-p2p-errors branch July 10, 2023 11:52