
Rework some tests related to gather_dep #6472

Merged

merged 7 commits into dask:main from WSMR/clustered_transfers_tests on Jun 15, 2022

Conversation

crusaderky
Collaborator

@crusaderky crusaderky commented May 30, 2022

Cosmetic tweaks to a few tests.

This has been salvaged from #6462 following controversy.

for name, color in zip(lists["name"], lists["color"]):
-    if name == "transfer":
-        assert color == "red"
+    assert (name == "transfer-sum") == (color == "red")
Collaborator Author

Tightened tested conditions
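The tightened condition turns a one-way check (transfer implies red) into a biconditional. A toy illustration of the pattern (names and rows hypothetical, not the PR's actual fixture):

```python
# Toy data, not the PR's dashboard fixture: each row pairs a task-stream
# name with its plot color.
rows = [
    ("transfer-sum", "red"),
    ("compute", "blue"),
]

for name, color in rows:
    # Fails if a transfer-sum row is not red, OR if a red row is not
    # transfer-sum -- stricter than only checking the first direction.
    assert (name == "transfer-sum") == (color == "red")
print("ok")
```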

assert ev[0] == "request-dep"
assert len(ev[2]) == 5
for ev in story[20:]:
    assert ev[0] == "receive-dep"
Collaborator Author

Tighten tested condition + clarifications

-@gen_cluster(client=True, Worker=Nanny)
-async def test_acquire_replicas_already_in_flight(c, s, *nannies):
+@gen_cluster(client=True, nthreads=[("", 1)])
+async def test_acquire_replicas_already_in_flight(c, s, a):
Collaborator Author

Test is now deterministic and much faster

@gen_cluster(client=True, nthreads=[("", 1)])
async def test_gather_dep_cancelled_rescheduled(c, s, a):
"""A task transitions flight->cancelled->fetch->flight, all while gather_dep is
waiting for the data of the initial flight.
Collaborator Author

This test can now be drastically simplified after #6371.
Remove assumption that both dependencies of a task will be gathered in a single fetch (this will no longer be the case in #6462).

Member

This test can now be drastically simplified after #6371.

I don't think we should simplify tests just because we know that internals changed.

Remove assumption that both dependencies of a task will be gathered in a single fetch

Where do we assume this in the previous test?

Collaborator Author

I don't think we should simplify tests just because we know that internals changed.

The previous test was so complicated because it had to have two checkpoints:

  1. after you enter gather_dep, but before you run the preamble
  2. while you're in the comms

I think that being able to say "it doesn't really matter in which point of gather_dep you pinch it; nothing fancy will happen before comms" is a very reasonable thing to say?

Where do we assume this in the previous test?

fut4 = c.submit(sum, fut1, fut2, workers=[b.address], key="f4")

This sends a compute-task event to b, which in turn sends f1 and f2 into fetch at the same time.
As they're dependencies of the same task, they have the exact same priority.

After #6462, there are two separate fetches. You'll get that f1 is fetched first 50% of the time, at random. The two keys most likely go through a set at some point during the process, and the ordering of sets changes every time you restart the Python interpreter.
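The per-interpreter randomness described here is easy to reproduce outside distributed; a small sketch (hypothetical keys f1/f2) showing that set iteration order for strings depends on CPython's randomized hash seed:

```python
import os
import subprocess
import sys

# Iterate a two-element set of strings in a fresh interpreter per seed;
# with hash randomization, different seeds can yield different orderings.
snippet = "print(list({'f1', 'f2'}))"
orders = set()
for seed in ("0", "1", "2", "3", "4"):
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, env=env,
    ).stdout.strip()
    orders.add(out)

# Only two orderings are possible; which ones appear depends on the seeds.
print(orders)
```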

Member

I think that being able to say "it doesn't really matter in which point of gather_dep you pinch it; nothing fancy will happen before comms" is a very reasonable thing to say?

Will this be true forever? How do we ensure that this stays true? This test asks the question
"Is it harmful to cancel a task after a fetch is scheduled (gather-dep) but before we're actually stepping into this coroutine?"
The only way to answer this question with certainty is to have this test, or to ensure there is not a single await before the get_data message is sent, which is hard/impossible.

While this is a seemingly artificial question to ask, this was causing a deadlock in the past #5525 and I don't think the pre-filtering optimization was entirely unreasonable.

Tests are also there to protect us from reintroducing regressions, and this test is written at a high enough abstraction level (we only rely on task state names (e.g. flight, cancelled) and the fact that the gather_dep/get_data methods exist) that I have confidence it won't bother us a lot. The test is not known to be flaky, it runs very fast (0.1s), and it still works on your proposed change of #6462 (at least for me locally).

I don't see a reason why we should change anything about this test
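For context, the window described in this thread — a coroutine that has been scheduled but that the event loop has not yet stepped into — also exists in plain asyncio; a minimal sketch (not distributed's actual gather_dep):

```python
import asyncio

async def gather_dep_like():
    # This line only runs once the event loop steps into the coroutine.
    print("entered coroutine")
    await asyncio.sleep(0)

async def main():
    task = asyncio.ensure_future(gather_dep_like())
    # Cancel before the loop has had a chance to step into the task:
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        print("cancelled before the body ever ran")

asyncio.run(main())
```

Because the cancellation lands before the first step, the coroutine body never executes, which is exactly the "cancel after scheduling, before stepping" scenario the test probes.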

"""A task transitions from flight to cancelled while gather_dep is waiting for the
data.

See also test_gather_dep_cancelled_rescheduled
Collaborator Author

Remove assumption that both dependencies of a task will be gathered in a single fetch (this will no longer be the case in #6462).

Member

Where do we assume this in the previous test?

Collaborator Author

same as above

Member

The original test still passes for me w/ your changes in #6462

Collaborator Author

It does not.
50% of the time you get:

>           assert b.tasks[fut2.key].state == "flight"
E           AssertionError: assert 'fetch' == 'flight'

@crusaderky crusaderky marked this pull request as ready for review May 30, 2022 14:38
@crusaderky crusaderky self-assigned this May 30, 2022
@github-actions
Contributor

github-actions bot commented May 30, 2022

Unit Test Results

15 files (+12)    15 suites (+12)    6h 16m 7s ⏱️ (+5h 28m 21s)
2 832 tests (+1 636): 2 748 passed (+1 586), 81 skipped (+47), 2 failed (+2), 1 errored (+1)
20 987 runs (+17 402): 20 038 passed (+16 555), 946 skipped (+844), 2 failed (+2), 1 errored (+1)

For more details on these failures and errors, see this check.

Results for commit 8d5d50e. ± Comparison against base commit 69b798d.

♻️ This comment has been updated with latest results.

@crusaderky crusaderky linked an issue May 30, 2022 that may be closed by this pull request
@crusaderky crusaderky force-pushed the WSMR/clustered_transfers_tests branch from d0d03ee to e59a138 Compare June 1, 2022 12:13
@crusaderky crusaderky force-pushed the WSMR/clustered_transfers_tests branch from e59a138 to efb2dd9 Compare June 1, 2022 14:32
Member

@fjetter fjetter left a comment

I would like us to use stories in tests only if necessary and would always prefer to use something more high level.
Most of these tests were written in a way that stories were not necessary before and I do not see added value in having a redundant, low level assert. I left comments everywhere so we can discuss individually if this is necessary

distributed/tests/test_scheduler.py (thread resolved)
"""
a.total_out_connections = 2
futures = await c.scatter(
    {f"x{i}": i for i in range(100)},
    workers=[w.address for w in workers],
)
assert all(w.data for w in workers)
assert all(len(w.data) == 5 for w in workers)
Member

  1. I feel this is too granular for this test. Does this test require the distribution to be homogeneous? I think we should not test a "decide_worker" logic in this test.
  2. How is 5 even a correct number when we're scattering 100 tasks to 21 workers?

Collaborator Author

I feel this is too granular for this test.

Ok, relaxing it.

How is 5 even a correct number when we're scattering 100 tasks to 21 workers?

No, we're scattering 100 tasks to 20 workers.
See function signature: c, s, a, *workers
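The arithmetic can be sanity-checked with a plain function mirroring the fixture signature (toy stand-ins, not the real @gen_cluster fixture):

```python
# Toy stand-in for the test signature `async def test(c, s, a, *workers)`:
# the client, scheduler, and one worker are bound by name, so the
# remaining positional arguments land in the star-arg.
def count_workers(c, s, a, *workers):
    return len(workers)

args = ["client", "scheduler", "extra"] + [f"worker-{i}" for i in range(20)]
n = count_workers(*args)
print(n)         # 20 scatter targets, not 21
print(100 // n)  # 5 keys per worker, matching the assert under discussion
```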

Collaborator Author

Renamed workers to snd_workers for clarity

Comment on lines 2760 to 2761
while a.log[-1][:5] != ("x", "flight", "fetch", "flight", {}):
    await asyncio.sleep(0.01)
Member

Isn't there an event we can listen to? I consider events much more robust than the transition logs and much easier to read.

I would even prefer having a plain sleep in here than listening to the transition log

Collaborator Author

Changed to trigger the event directly on the worker so that no wait is involved
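For contrast, the polling-vs-event trade-off raised in this thread can be sketched generically in asyncio (all names hypothetical):

```python
import asyncio

async def poll_style(log):
    # Polling: wake up every 10 ms and inspect shared state; racy and noisy.
    while not log or log[-1] != "flight":
        await asyncio.sleep(0.01)

async def event_style(reached_flight):
    # Event: a single await, woken exactly when the producer fires it.
    await reached_flight.wait()

async def main():
    log = []
    reached_flight = asyncio.Event()

    async def producer():
        await asyncio.sleep(0.05)
        log.append("flight")
        reached_flight.set()

    await asyncio.gather(producer(), poll_style(log), event_style(reached_flight))
    print("both waiters woke up")

asyncio.run(main())
```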


Comment on lines 3167 to 3183
assert_story(
    a.story(fut1.key),
    [
        (fut1.key, "fetch", "flight", "flight", {}),
        (fut1.key, "flight", "released", "cancelled", {}),
        (fut1.key, "cancelled", "fetch", "flight", {}),
        (fut1.key, "flight", "memory", "memory", {"f2": "ready"}),
    ],
)
# Test that the data transfer only happens once
assert_story(
    a.story("request-dep"),
    [
        ("request-dep", b.address, {fut1.key}),
    ],
    strict=True,
)
Member

I don't think these stories add a lot of value to the test

Collaborator Author

What would you propose? I think it's important to test that we sent out only one transfer request.

Member

There is outgoing_transfer_log and incoming_transfer_log or even more coarse, there are counters for both.

e.g.

Suggested change
-assert_story(
-    a.story(fut1.key),
-    [
-        (fut1.key, "fetch", "flight", "flight", {}),
-        (fut1.key, "flight", "released", "cancelled", {}),
-        (fut1.key, "cancelled", "fetch", "flight", {}),
-        (fut1.key, "flight", "memory", "memory", {"f2": "ready"}),
-    ],
-)
-# Test that the data transfer only happens once
-assert_story(
-    a.story("request-dep"),
-    [
-        ("request-dep", b.address, {fut1.key}),
-    ],
-    strict=True,
-)
+assert a.incoming_count == 1

Comment on lines 3212 to 3219
assert_story(
    b.story(fut1.key),
    [
        (fut1.key, "flight", "released", "cancelled", {}),
        (fut1.key, "cancelled", "memory", "released", {fut1.key: "forgotten"}),
        (fut1.key, "released", "forgotten", "forgotten", {}),
    ],
)
Member

I think this story is redundant. We're asserting all of this above already

Collaborator Author

removed


Member

@fjetter fjetter left a comment

I'm struggling to reproduce issues that are supposedly introduced with #6462.
If there are any issues that connect to that PR specifically, I suggest discussing them over there. I appreciate the effort of breaking up larger changes, but if the changes are directly connected, I actually have a harder time reviewing them when they are split up.



@jrbourbeau jrbourbeau mentioned this pull request Jun 7, 2022
9 tasks
@crusaderky crusaderky marked this pull request as draft June 9, 2022 15:56
@crusaderky
Collaborator Author

Salvaged PR. This is again ready for review and merge.

@crusaderky crusaderky marked this pull request as ready for review June 9, 2022 16:54
@github-actions
Contributor

github-actions bot commented Jun 9, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files (±0)    15 suites (±0)    6h 28m 20s ⏱️ (−21m 18s)
2 865 tests (±0): 2 784 passed (+33), 80 skipped (−1), 1 failed (−29)
21 224 runs (±0): 20 287 passed (+38), 936 skipped (−3), 1 failed (−32)

For more details on these failures, see this check.

Results for commit 51a3ba9. ± Comparison against base commit 344868a.

♻️ This comment has been updated with latest results.

@fjetter fjetter merged commit cb88e3b into dask:main Jun 15, 2022
@crusaderky crusaderky deleted the WSMR/clustered_transfers_tests branch June 16, 2022 09:48