Shuffle metrics 4/4: Remove bespoke diagnostics #8367

Merged
merged 2 commits into dask:main from shuffle/metrics4 on Dec 21, 2023

Conversation

@crusaderky (Collaborator) commented Nov 19, 2023

Please read: #7943 (comment)
There are four commits in this PR; all but the last come from the previous PRs in the chain.

@@ -4326,16 +4326,12 @@ def __init__(self, scheduler, **kwargs):
     "comm_memory": [],
     "comm_memory_limit": [],
     "comm_buckets": [],
-    "comm_avg_duration": [],
-    "comm_avg_size": [],
@crusaderky (Collaborator, Author) commented:

We're losing a little bit of functionality here.
IMHO it's not a big deal. Worth noting that we still have the information under the fine performance metrics (you'll have to calculate seconds/count and bytes/count yourself).
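
For example, a minimal sketch of deriving those averages from the cumulative fine performance metrics (the key layout and the numbers below are invented for illustration, not the exact keys used by distributed):

# Hypothetical cumulative fine performance metrics, keyed by tuples
# that end in a unit; keys and values here are made up.
metrics = {
    ("p2p", "background-comms", "send", "seconds"): 12.5,
    ("p2p", "background-comms", "send", "count"): 250,
    ("p2p", "background-comms", "send", "bytes"): 1_250_000_000,
}

prefix = ("p2p", "background-comms", "send")
avg_duration = metrics[prefix + ("seconds",)] / metrics[prefix + ("count",)]
avg_size = metrics[prefix + ("bytes",)] / metrics[prefix + ("count",)]
print(f"{avg_duration * 1000:.0f} ms/op, {avg_size / 2**20:.1f} MiB/op")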

@hendrikmakait (Member) commented:

I don't think I agree that this is not a big deal. The decaying averages of comm_avg_* gave an (admittedly very crude) way of understanding distributions over time, which is helpful for understanding performance. (See also #8364 (comment).) For end-user analytics, total averages should be enough to hint at problems, but I'm wondering if we should have a second set of metrics, focused on debugging/performance optimization, that goes into more detail.
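
For context, a decaying average in this spirit (a sketch of the general technique, not the code being removed here) reacts to recent shifts where a total average does not:

class DecayingAverage:
    """Exponentially-weighted moving average: recent samples dominate,
    so the value tracks shifts in the distribution over time."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha
        self.value = 0.0

    def add(self, sample: float) -> None:
        self.value += self.alpha * (sample - self.value)


samples = [0.1] * 50 + [2.0] * 5  # comm durations suddenly degrade
ewma = DecayingAverage()
for s in samples:
    ewma.add(s)
print(f"decaying: {ewma.value:.2f}s vs total: {sum(samples) / len(samples):.2f}s")
# decaying (~0.88s) flags the recent slowdown; total (~0.27s) hides it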

@crusaderky marked this pull request as ready for review on November 19, 2023 23:38
github-actions bot commented Nov 20, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files ±0     27 suites ±0    11h 45m 9s ⏱️ +8m 29s
 3 949 tests  +1     3 835 ✔️  +1    110 💤 ±0    4 ❌ ±0
49 672 runs  +31    47 367 ✔️ +39  2 301 💤 −6    4 ❌ −2

For more details on these failures, see this check.

Results for commit 5311963. Comparison against base commit 9273186.

♻️ This comment has been updated with the latest results.

@crusaderky force-pushed the shuffle/metrics4 branch 3 times, most recently from d74aa0a to 4bd63e2, on November 30, 2023 16:35
@crusaderky force-pushed the shuffle/metrics4 branch 3 times, most recently from cff1192 to b5a6821, on December 12, 2023 18:33
@crusaderky force-pushed the shuffle/metrics4 branch 2 times, most recently from ed27fa6 to 77beba9, on December 18, 2023 14:07
@hendrikmakait (Member) left a comment:

I'd prefer to keep the existing bespoke metrics around for now. Some of them are more detailed and give us information per worker, which has been extremely helpful in the past, in particular when viewed in real time. Keeping them will also help us compare the information we get from the new approach and iterate if we identify gaps.

if not isinstance(label, tuple):
    label = (label,)
if isinstance(label[0], str) and label[0].startswith("shuffle-"):
    # Strip the redundant "shuffle-" prefix from the activity label
    label = (label[0][len("shuffle-") :], *label[1:])
name = ("shuffle", self.span_id, where, *label, unit)
@hendrikmakait (Member) commented:

General nit: I'd store these metrics under p2p, not shuffle. IMO this would be clearer, as there are other shuffle implementations, and some P2P-based algorithms are not necessarily what the respective end users would call a shuffle (e.g., rechunk). This is a general grievance I have with the P2P codebase, but this feels like a good starting point to change things.

@crusaderky (Collaborator, Author) commented:

Renamed "shuffle" tag to "p2p" everywhere.

@crusaderky force-pushed the shuffle/metrics4 branch 7 times, most recently from 4fc4fe4 to 4993e5a, on December 20, 2023 00:10
@crusaderky (Collaborator, Author) commented:

@hendrikmakait I've reinstated all metrics that are visible from the dashboard, as discussed. This is ready for review again.

@hendrikmakait (Member) left a comment:


Thanks, @crusaderky, this entire series of changes looks great!

@hendrikmakait merged commit 81774d4 into dask:main on Dec 21, 2023
29 of 34 checks passed
@crusaderky deleted the shuffle/metrics4 branch on December 22, 2023 14:20
Successfully merging this pull request may close these issues:

Use metering for P2P shuffling instrumentation