Shuffle metrics 4/4: Remove bespoke diagnostics #8367
Conversation
@@ -4326,16 +4326,12 @@ def __init__(self, scheduler, **kwargs):
         "comm_memory": [],
         "comm_memory_limit": [],
         "comm_buckets": [],
-        "comm_avg_duration": [],
-        "comm_avg_size": [],
We're losing a little bit of functionality here.
IMHO it's not a big deal. Worth noting that we still have the information under the fine performance metrics (you'll have to calculate seconds/count and bytes/count yourself).
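For illustration, here is a minimal sketch of recovering those averages from cumulative fine performance metrics. The dict layout and key names below are assumptions for the sake of the example, not the actual metric keys:

```python
# Hypothetical cumulative fine performance metrics, keyed by tuples
# ending in a unit string (layout assumed for illustration only).
cumulative = {
    ("p2p", "comm", "seconds"): 12.5,          # total time spent in comms
    ("p2p", "comm", "bytes"): 4_200_000_000,   # total bytes transferred
    ("p2p", "comm", "count"): 350,             # number of comm operations
}

count = cumulative[("p2p", "comm", "count")]
avg_duration = cumulative[("p2p", "comm", "seconds")] / count  # seconds/count
avg_size = cumulative[("p2p", "comm", "bytes")] / count        # bytes/count
print(f"avg duration: {avg_duration:.3f}s, avg size: {avg_size / 2**20:.1f} MiB")
```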
I don't think I agree that this is not a big deal. The decaying averages of `comm_avg_*` gave an (admittedly very crude) way of understanding distributions over time, which is helpful for understanding performance. (See also #8364 (comment).) For end-user analytics, total averages should be enough to hint at problems, but I'm wondering if we should have a second set of metrics, focused on debugging/performance optimization, that goes into more detail.
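To make the distinction concrete, here is a generic sketch of an exponentially decaying average (not the removed implementation, just an illustration): unlike a total average, it weights recent samples more heavily, so a shift in the distribution late in a run stays visible:

```python
# Generic exponentially weighted average; an illustration of why decaying
# averages reveal distribution shifts over time, not the actual code.
def decaying_average(samples, alpha=0.1):
    avg = None
    for x in samples:
        avg = x if avg is None else alpha * x + (1 - alpha) * avg
    return avg

# 90 fast comms followed by 10 slow ones:
samples = [0.1] * 90 + [1.0] * 10
print(sum(samples) / len(samples))  # 0.19 -- total average hides the slowdown
print(decaying_average(samples))    # ~0.69 -- decaying average surfaces it
```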
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
27 files ±0, 27 suites ±0, 11h 45m 9s ⏱️ +8m 29s
For more details on these failures, see this check.
Results for commit 5311963. ± Comparison against base commit 9273186.
♻️ This comment has been updated with latest results.
Force-pushed d74aa0a to 4bd63e2.
Force-pushed cff1192 to b5a6821.
Force-pushed ed27fa6 to 77beba9.
I'd prefer to keep the existing bespoke metrics around for now. Some of them are more detailed and give us per-worker information, which has been extremely helpful in the past, in particular when viewed in real time. Keeping them will also help us compare the information we get from the new approach and iterate if we identify gaps.
distributed/shuffle/_core.py
Outdated
# Normalize the label and drop the "shuffle-" prefix before building the metric name
label = (label,)
if isinstance(label[0], str) and label[0].startswith("shuffle-"):
    label = (label[0][len("shuffle-") :], *label[1:])
name = ("shuffle", self.span_id, where, *label, unit)
General nit: I'd store these metrics under `p2p`, not `shuffle`. IMO this should be clearer, as there are other shuffle implementations and some P2P-based algorithms are not necessarily what would be called a shuffle by the respective end users (e.g., `rechunk`). This is a general grievance I have with the P2P codebase, but this feels like a good starting point to change things.
Renamed "shuffle" tag to "p2p" everywhere.
Force-pushed 4fc4fe4 to 4993e5a.
Force-pushed 4993e5a to 982a6e6.
@hendrikmakait I've reinstated all metrics that are visible from the dashboard, as discussed. This is ready for review again.
Thanks, @crusaderky, this entire series of changes looks great!
Please read: #7943 (comment)
There are four commits in this PR. All but the last are the previous PRs in the chain.