New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved error messages for P2P shuffling #7979
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor cleanups
@@ -119,15 +122,24 @@ def shuffle_ids(self) -> set[ShuffleId]: | |||
|
|||
async def barrier(self, id: ShuffleId, run_id: int) -> None: | |||
shuffle = self.states[id] | |||
assert shuffle.run_id == run_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert shuffle.run_id == run_id | |
assert shuffle.run_id == run_id, "Shuffle barrier ID does not match requested run_id" |
? Or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@@ -696,10 +703,10 @@ async def _get_shuffle_run( | |||
shuffle_id=shuffle_id, | |||
) | |||
if run_id < shuffle.run_id: | |||
raise RuntimeError("Stale shuffle") | |||
raise RuntimeError(f"{shuffle} stale, expected run_id=={run_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise RuntimeError(f"{shuffle} stale, expected run_id=={run_id}") | |
raise RuntimeError(f"{shuffle} stale, expected {run_id=}") |
elif run_id > shuffle.run_id: | ||
# This should never happen | ||
raise RuntimeError("Invalid shuffle state") | ||
raise RuntimeError(f"{shuffle} invalid, expected run_id=={run_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise RuntimeError(f"{shuffle} invalid, expected run_id=={run_id}") | |
raise RuntimeError(f"{shuffle} invalid, expected {run_id=}") |
@@ -696,10 +703,10 @@ async def _get_shuffle_run( | |||
shuffle_id=shuffle_id, | |||
) | |||
if run_id < shuffle.run_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: use the same ordering for the conditions (shuffle.run_id > run_id
) as in _restrict_task
in the scheduler extensions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
elif run_id > shuffle.run_id: | ||
# This should never happen | ||
raise RuntimeError("Invalid shuffle state") | ||
if run_id > shuffle.run_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: flip order of operands to align with restrict_task
in the scheduler extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done.
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
pre-commit run --all-files