Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR: Enable GIL monitoring when gilknocker installed #7730

Merged
merged 7 commits into from Apr 26, 2023

Conversation

milesgranger
Copy link
Contributor

@milesgranger milesgranger commented Mar 31, 2023

After benchmarking with latest gilknocker (0.4.0), it appears to have negligible performance impact (see screenshot of A/B test).

This patch only changes the config to default to enable, but would still require gilknocker to be installed.

  • Tests added / passed
  • Passes pre-commit run --all-files

image

Link to full A/B run: https://github.com/coiled/coiled-runtime/actions/runs/4437855278

xref #7290

@github-actions
Copy link
Contributor

github-actions bot commented Mar 31, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       22 files   -        4         22 suites   - 4   13h 52m 16s ⏱️ + 24m 2s
  3 527 tests  -      25    3 442 ✔️ ±       0       82 💤  -   23  3  - 2 
38 717 runs   - 6 235  37 166 ✔️  - 5 655  1 547 💤  - 579  4  - 1 

For more details on these failures, see this check.

Results for commit c942dc9. ± Comparison against base commit 76bbfaf.

This pull request removes 25 tests.
distributed.cli.tests.test_dask_scheduler
distributed.cli.tests.test_dask_ssh
distributed.dashboard.tests.test_components
distributed.dashboard.tests.test_scheduler_bokeh
distributed.dashboard.tests.test_worker_bokeh
distributed.deploy.tests.test_old_ssh
distributed.deploy.tests.test_ssh
distributed.diagnostics.tests.test_progress_stream
distributed.diagnostics.tests.test_widgets
distributed.protocol.tests.test_arrow
…
This pull request skips 1 test.
distributed.tests.test_nanny ‑ test_no_unnecessary_imports_on_worker[pandas]

♻️ This comment has been updated with latest results.

@milesgranger milesgranger force-pushed the MINOR-default-gil-monitoring-on branch 3 times, most recently from f8c3e21 to 42e1d5e Compare April 3, 2023 09:43
Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @milesgranger. Looks like this PR is being hit by unrelated failures that have been resolved on main. Merging main here should fix things

@milesgranger
Copy link
Contributor Author

I think that's right for most, but also the gilknocker sampling thread is being left open since (somewhere) the SystemMonitor.close() is missing which then calls gilknocker.KnockKnock.stop(). This results in:

FATAL: exception not rethrown
Fatal Python error: Aborted

Current thread 0x00007fbae56bc640 (most recent call first):
<no Python frame>

I spent most of yesterday trying to solve it from gilknocker's perspective but it appears to be a niche and difficult if not impossible problem to solve from gilknocker's/c-extension side as the Python main thread is being sigabrt'd from underneath the sampling thread which is trying to reacquire the (non-existent) GIL at that point.

I'll try a few more things from that end, but otherwise I'll hunt to see where in the tests it's not being stopped.

@milesgranger milesgranger marked this pull request as draft April 4, 2023 06:31
@milesgranger milesgranger changed the title MINOR: Enable GIL monitoring when gilknocker installed [WIP] MINOR: Enable GIL monitoring when gilknocker installed Apr 4, 2023
@milesgranger milesgranger force-pushed the MINOR-default-gil-monitoring-on branch 3 times, most recently from d1634da to 1c560df Compare April 4, 2023 12:29
@milesgranger milesgranger marked this pull request as ready for review April 4, 2023 13:20
@milesgranger milesgranger changed the title [WIP] MINOR: Enable GIL monitoring when gilknocker installed MINOR: Enable GIL monitoring when gilknocker installed Apr 4, 2023
@milesgranger
Copy link
Contributor Author

milesgranger commented Apr 4, 2023

Okay, I think this is ready for another look. From what I can see, the remaining errors are not related.

I tried a few different things from gilknocker's perspective but it's a tricky and rare bug to catch. Went for the easier solution for now, perhaps later I can devote some time to saving it from gilknocker's side.


Edit: Seems there is actually 2 that I could see which were related... :( will poke at it a bit more and ping you when it's ready.

@milesgranger
Copy link
Contributor Author

milesgranger commented Apr 5, 2023

@jrbourbeau @fjetter Apologies for the noise, turns out the MINOR wasn't so minor after all. 😅

I'm confident now the remaining failures are unrelated to this patch. And after spending more time that I'd like to admit, found a not-so-awful way to stop the GIL monitoring thread in gilknocker to avoid the niche no Python frame core dumps that were happening when the test suite exited.

For the remaining failures, I can open a PR to try out re-running failed tests using pytest-rerunfailures if you think that may be helpful? EDIT: I see now we're already using it, marked on specific tests.

@milesgranger milesgranger force-pushed the MINOR-default-gil-monitoring-on branch 3 times, most recently from 3e75a70 to f6ec3b5 Compare April 11, 2023 12:33
@milesgranger milesgranger force-pushed the MINOR-default-gil-monitoring-on branch from ad42cd3 to c942dc9 Compare April 25, 2023 08:59
Comment on lines 120 to 121
sm = SystemMonitor()
a = sm.update()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to assert "gil_contention" in a here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hendrikmakait, after your comment I realized it could be refactored a bit since there is checking the default ON behavior at the top and then OFF behavior at the end of the test. Let me know if you think it should be changed at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, thanks, @milesgranger. At a glance, the test failures appear to be unrelated. I have one minor clarification question that's still open.

@hendrikmakait hendrikmakait merged commit afe6491 into dask:main Apr 26, 2023
25 of 32 checks passed
@milesgranger milesgranger deleted the MINOR-default-gil-monitoring-on branch April 26, 2023 13:21
@phofl
Copy link
Collaborator

phofl commented Apr 26, 2023

Looks like this broke the dask testsuite. It looks like stuff breaks when it isn't installed.

https://github.com/dask/dask/actions/runs/4810571135/jobs/8563382185?pr=10227

@jrbourbeau
Copy link
Member

Thanks for flagging @phofl. It looks like the mindeps builds here also have a bunch of failures (those builds don't include gilknocker). Will look into this...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants