Spans skeleton #7862

crusaderky · 2023-05-26T17:58:53Z

XREF User-defined spans #7860

In scope

Define spans on the client
Define a default span when no spans are explicitly set by the user
Third-party extensions can navigate spans top-down or bottom-up

Out of scope

As of this PR, you can trace a task back to its span, but not yet the other way around (without a full scan of Scheduler.tasks)
Spans don't contain any useful aggregated information of their own
Once a task is forgotten, all of its span information is lost
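
To illustrate the asymmetry described above, here is a minimal sketch. It assumes only what the test in this PR shows, namely that a task's span id is stored under `annotations["span"]`. The `Task` class and the `span_of` and `tasks_in_span` helpers are hypothetical stand-ins, not part of distributed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a scheduler-side task state."""

    key: str
    annotations: dict = field(default_factory=dict)


def span_of(tasks: dict[str, Task], key: str) -> tuple[str, ...] | None:
    """Task -> span: a single dict lookup."""
    return tasks[key].annotations.get("span")


def tasks_in_span(tasks: dict[str, Task], span_id: tuple[str, ...]) -> list[str]:
    """Span -> tasks: without a reverse index, every task must be visited."""
    return [k for k, ts in tasks.items() if ts.annotations.get("span") == span_id]


# Stand-in for Scheduler.tasks
tasks = {
    "x": Task("x", {"span": ("workflow",)}),
    "y": Task("y", {"span": ("workflow", "step 1")}),
    "z": Task("z"),  # no span annotation
}
```

The reverse direction is linear in the number of tasks the scheduler knows about, which is why it is listed as out of scope here.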

crusaderky · 2023-05-26T18:00:09Z

distributed/spans.py

+
+@contextmanager
+def span(*tags: str) -> Iterator[None]:
+    """Tag group of tasks to be part of a certain group, called a span.


This docstring is not rendered anywhere at the moment.
We should wait until we have something actually useful before we advertise it in sphinx.
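
For readers who want a feel for the intended semantics, the following is a rough standard-library sketch of a nesting context manager with the same signature as the one in the diff. The storage mechanism (`ContextVar`) and the `current_span` helper are assumptions made for illustration; the real code in distributed/spans.py may work differently.

```python
from __future__ import annotations

from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator

# Hypothetical storage for the active span id (a tuple of tags).
_current_span: ContextVar[tuple[str, ...]] = ContextVar("current_span", default=())


@contextmanager
def span(*tags: str) -> Iterator[None]:
    """Append ``tags`` to the active span for the duration of the block."""
    token = _current_span.set(_current_span.get() + tags)
    try:
        yield
    finally:
        # Restore the enclosing span, even if the block raised.
        _current_span.reset(token)


def current_span() -> tuple[str, ...]:
    """Return the span id a task submitted right now would carry (hypothetical)."""
    return _current_span.get()


with span("workflow"):
    outer = current_span()
    with span("step 1"):
        inner = current_span()
after = current_span()
```

In this sketch, entering a nested `span()` only appends elements to the enclosing tuple, and leaving the block restores the previous value.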

crusaderky · 2023-05-26T18:10:49Z

CC @fjetter @hendrikmakait @ntabris

github-actions · 2023-05-26T20:03:11Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

Files:    20 (±0)
Suites:   20 (±0)
Duration: 11h 29m 15s ⏱️ (+11m 52s)
Tests:    3 648 (+5): 3 537 ✔️ (+3), 108 💤 (±0), 3 ❌ (+2)
Runs:     35 270 (+50): 33 504 ✔️ (+48), 1 763 💤 (±0), 3 ❌ (+2)

For more details on these failures, see this check.

Results for commit 2098b9a. ± Comparison against base commit 3286d2f.

♻️ This comment has been updated with latest results.

mrocklin · 2023-05-30T13:54:35Z

distributed/tests/test_spans.py

+    def assert_span(d, *expect):
+        span_id = (c.id, *expect)
+        assert s.tasks[d.key].annotations["span"] == span_id
+        assert a.state.tasks[d.key].annotations["span"] == span_id
+        assert ext.spans[span_id].id == span_id
+
+    assert_span(x)


Is it the case that every task gets a span defined by the client ID?

If so, my sense is that multi-client workloads are rare, and we're probably already able to attribute the client ID. My inclination would be to drop Client ID from spans, which leaves the common case of no spans unchanged / clean and without annotations, which seems good to me.

> Is it the case that every task gets a span defined by the client ID?

Yes, there's always a root span if the user does not explicitly call span() in the client workflow, and it's identified by the client ID.

Last week, @fjetter, @hendrikmakait, @ntabris and I reached consensus in this direction during a high-bandwidth design discussion. The conclusion was that adding an annotation to all tasks is cheap and is a small cost to pay to support multi-tenancy. The overhead becomes even smaller if the user actually uses the span() context manager, as this adds just a single element to a tuple.

> The conclusion was that adding an annotation to all tasks is cheap and is a small cost to pay to support multi-tenancy.

Can I ask to learn more about the benefits of multi-tenancy that were discussed? I appreciate that the cost is small, but I'm confused about the benefit, which seems very small to me.

Adding an annotation to every task that includes the client ID doesn't make much sense to me, especially not in support of a new feature that hasn't demonstrated that it will get much use. I'd love to learn more.

Alternatively, I'd like to propose that we drop the client id from the span annotations. I suspect that that won't have any bad effects (please correct me if I'm not thinking of something) and will also isolate the effects of the proposed change to only when the new feature is used.

As an example, I can imagine changes in the future which turn off certain optimizations if any annotations are present, just because we don't know how to handle them. In this case (which seems not unlikely to me) it would have low impact if annotations are rare (as they are today) but would have very high impact if this change goes through.

I'm not against turning the implicit span creation by clients into a config value that's off by default and assuming a global span instead. If we found a way to attach tasks to an implicit client span without annotations, even better.

The main point is that we want to support overlapping spans to tag independent workloads. This also means that we want to handle workloads from different clients independently.

Setting one default span per client was almost an afterthought to me: When discussing how to pick default spans, we decided to drop any heuristics and simply use static defaults. (This is the important decision.) Using one default span per client felt like a better default on a multi-tenant setup as it would allow users to always differentiate between independent work triggered by different clients (scheduled jobs/users) without any/much perceived cost. I still think having such a feature would be nice, but I personally don't care if that's hidden behind some flag to avoid some cost.

Multi-client workloads are exceedingly rare. I recommend that we don't worry about them until they come up (I'll bet lots of beers that they don't come up). If they're the only motivation behind adding annotations to all tasks then, to me, the choice is clear that we don't think much about multi-client workloads and also get to not add annotations in the common case.

That works for me. Spans can still be set explicitly in multi-client environments to handle workloads independently, and whether or not we add annotations in the common case is essentially an implementation detail to me.

Removed client ID.

Thanks @crusaderky. I appreciate it.

crusaderky · 2023-05-30T20:07:32Z

This is again ready for review and merge.

hendrikmakait

Thanks @crusaderky for getting this off the ground! The code looks good to me.

Outside the scope of this PR, I'd like to ideate on the representation of spans and how we communicate them to the scheduler. I think we should move our representation closer to something like the trace/span specification of OpenTelemetry (https://opentelemetry.io/docs/concepts/signals/traces/). In particular, I see value in relying on UUIDs to differentiate between different "runs" of the same span (in particular, now that we've decided against creating a default span per client) and using a richer representation that would allow us to add additional attributes. This is probably best done in a high-bandwidth conversation.

crusaderky · 2023-05-31T12:02:05Z

> In particular, I see value in relying on UUIDs to differentiate between different "runs" of the same span (in particular, now that we've decided against creating a default span per client)

Sounds like update_graph should prepend a unique ID to the span annotations.
...which is almost exactly what I just removed with the client id...

mrocklin · 2023-05-31T12:32:50Z

> Sounds like update_graph should prepend a unique ID to the span annotations.
> ...which is almost exactly what I just removed with the client id...

Two possible differences:

We wouldn't add the ID to tasks without a span (thus leaving the common case the same as it was before this PR)
We might want to differentiate between similarly named spans within the same cluster (this is what I took from Hendrik's comment above)
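
The idea floated above can be sketched as follows. This is only an illustration of the two differences listed, not what update_graph does: `tag_run` is a hypothetical helper, and the use of `uuid4` for the unique ID is an assumption.

```python
from __future__ import annotations

import uuid


def tag_run(annotations_by_key: dict[str, dict]) -> str:
    """Prepend one fresh ID to every existing span annotation (hypothetical).

    Tasks without a span annotation are left untouched, so the common
    case of no spans stays free of annotations.
    """
    run_id = uuid.uuid4().hex
    for annotations in annotations_by_key.values():
        span_id = annotations.get("span")
        if span_id is not None:
            annotations["span"] = (run_id, *span_id)
    return run_id


# Two submissions that reuse the same span name end up with distinct ids.
first = {"a": {"span": ("workflow",)}, "b": {}}
second = {"c": {"span": ("workflow",)}}
first_id = tag_run(first)
second_id = tag_run(second)
```

Here `first["b"]` stays empty, while `first["a"]` and `second["c"]` share the tag "workflow" but differ in their leading ID.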

crusaderky requested a review from fjetter as a code owner May 26, 2023 17:58

crusaderky force-pushed the spans branch from f77ce0b to bc52af1 Compare May 26, 2023 18:01

crusaderky self-assigned this May 26, 2023

crusaderky removed the request for review from fjetter May 26, 2023 18:13

crusaderky marked this pull request as draft May 26, 2023 21:31

crusaderky marked this pull request as ready for review May 27, 2023 00:04

crusaderky force-pushed the spans branch 2 times, most recently from 7d31b76 to c9398a7 Compare May 30, 2023 11:18

mrocklin reviewed May 30, 2023

crusaderky mentioned this pull request May 30, 2023

Link TaskGroups to Spans #7869

Merged

spans

33f0f1e

crusaderky force-pushed the spans branch from 41915e3 to 33f0f1e Compare May 30, 2023 20:20

hendrikmakait self-requested a review May 31, 2023 08:48

hendrikmakait approved these changes May 31, 2023

Update distributed/spans.py

2098b9a

hendrikmakait merged commit 7926ea6 into dask:main May 31, 2023
23 of 28 checks passed

crusaderky deleted the spans branch May 31, 2023 12:00
