
Harden test_abort_execution_to_fetch and more #6026

Merged: crusaderky merged 1 commit into dask:main from test_abort_execution_to_fetch on Apr 4, 2022


@crusaderky (Collaborator) commented Mar 30, 2022

  • Make test_abort_execution_to_fetch and test_reschedule_concurrent_requests_deadlock faster and no longer reliant on timing.
  • Tweak a few tests to use hardcoded keys, which make the output of assert_worker_story much more readable.

@crusaderky crusaderky self-assigned this Mar 30, 2022
("f1", "released", "forgotten", "forgotten", {}),
],
strict=True,
)
Member

I suspect that you and @fjetter are aligned on this, so I'm happy to step back.

However, my experience is that tests like these become expensive and annoying if you ever want to change the state machine system. Every time a dev changes the system, they will need to look at this test, understand its intent, and change it accordingly. If this output is at all likely to change over the next few years, this could add a lot of inertia.

Dask tests used to look a lot like this in the early days. Getting rid of them and replacing them with tests that use the user API was a painful process.

Again, I'm happy if this is the direction that you all want to go (please do not block progress on this comment).

Collaborator Author

During the worker state machine refactoring, I ran into a few tests that failed because they were testing a story that changed in the refactoring. Yes, it cost me an extra 10 minutes to go through them and update the stories, but it was valuable to see and validate exactly what had changed in each story.

Member

Yes, this makes sense if it's one or two tests, and if the same person writes both the tests and the system.

However, if you make dozens of these tests, then if someone follows you in a year they will either not be able to make any changes, or more likely, they will delete all of these tests.

Anyway, I think that you and Florian have this covered. I'm raising this as a general concern. It may be that you aren't planning to repeat this many times, or that you are planning to repeat it but have made a reasoned decision to proceed down this path because the benefits outweigh the costs. Again, not blocking anything here.

Member

> However, if you make dozens of these tests, then if someone follows you in a year they will either not be able to make any changes, or more likely, they will delete all of these tests.

If somebody wants to change the state machine without understanding this, they shouldn't change the state machine at all.
Deleting these tests would be rather reckless. They have been incredibly helpful in debugging deadlocks. If these stories change, it should be absolutely intentional and not by chance.
We should not introduce these all over the place, but for tests of sophisticated race conditions in particular, this approach has proven quite useful.

One outcome of the current refactoring will be the ability to log state machine events at a coarser level than this. This should make tests more readable while keeping them similarly expressive.

github-actions bot (Contributor) commented Mar 31, 2022

Unit Test Results

18 files ±0, 18 suites ±0, 8h 51m 56s ⏱️ -19m 4s
2 697 tests ±0: 2 612 passed ✔️ -2, 82 skipped 💤 +1, 3 failed +1
24 121 runs ±0: 22 900 passed ✔️ -5, 1 218 skipped 💤 +4, 3 failed +1

For more details on these failures, see this check.

Results for commit 276a2c0. ± Comparison against base commit ccb0362.

♻️ This comment has been updated with latest results.

@crusaderky (Collaborator Author)

@fjetter FYI, this is blocking the _ensure_computing refactoring.

@crusaderky crusaderky changed the title Harden test_abort_execution_to_fetch Harden test_abort_execution_to_fetch and more Mar 31, 2022
@crusaderky crusaderky merged commit 0f2674b into dask:main Apr 4, 2022
@crusaderky crusaderky deleted the test_abort_execution_to_fetch branch April 4, 2022 16:54