MINOR: Enable GIL monitoring when gilknocker installed #7730
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
22 files -4, 22 suites -4, 13h 52m 16s ⏱️ +24m 2s
For more details on these failures, see this check.
Results for commit c942dc9. ± Comparison against base commit 76bbfaf.
This pull request removes 25 tests.
This pull request skips 1 test.
♻️ This comment has been updated with latest results.
f8c3e21 to 42e1d5e
Thanks @milesgranger. Looks like this PR is being hit by unrelated failures that have been resolved on `main`. Merging `main` here should fix things.
I think that's right for most, but also the gilknocker sampling thread is being left open since (somewhere) the
I spent most of yesterday trying to solve it from gilknocker's perspective, but it appears to be a niche and difficult, if not impossible, problem to solve from gilknocker's/C-extension side: the Python main thread is being sigabrt'd from underneath the sampling thread, which is trying to reacquire the (non-existent) GIL at that point. I'll try a few more things from that end, but otherwise I'll hunt to see where in the tests it's not being stopped.
d1634da to 1c560df
Okay, I think this is ready for another look. From what I can see, the remaining errors are not related. I tried a few different things from gilknocker's perspective, but it's a tricky and rare bug to catch. Went for the easier solution for now; perhaps later I can devote some time to solving it from gilknocker's side.

Edit: Seems there are actually 2 that I could see which were related... :( will poke at it a bit more and ping you when it's ready.
@jrbourbeau @fjetter Apologies for the noise, turns out the I'm confident now the remaining failures are unrelated to this patch. And after spending more time than I'd like to admit, found a not-so-awful way to stop the GIL monitoring thread in gilknocker to avoid the niche For the remaining failures, I can open a PR to try out re-running failed tests using
3e75a70 to f6ec3b5
ad42cd3 to c942dc9
sm = SystemMonitor()
a = sm.update()
Do you want to assert `"gil_contention" in a` here as well?
Thanks @hendrikmakait. After your comment I realized the test could be refactored a bit, since it checks the default ON behavior at the top and then the OFF behavior at the end. Let me know if you think it should be changed at all.
Looks great!
Generally LGTM, thanks, @milesgranger. At a glance, the test failures appear to be unrelated. I have one minor clarification question that's still open.
Looks like this broke the dask test suite. It looks like things break when gilknocker isn't installed. https://github.com/dask/dask/actions/runs/4810571135/jobs/8563382185?pr=10227
Thanks for flagging @phofl. It looks like the
After benchmarking with the latest gilknocker (0.4.0), it appears to have negligible performance impact (see screenshot of A/B test).
This patch only changes the config default to enabled; `gilknocker` would still need to be installed.
`pre-commit run --all-files`
Link to full A/B run: https://github.com/coiled/coiled-runtime/actions/runs/4437855278
xref #7290
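
To illustrate the "enabled by default, but only when `gilknocker` is installed" behavior described above, here is a minimal sketch. It is not the code from this patch: the helper name `gil_monitoring_active` and its parameters are hypothetical, and it only shows the general pattern of gating an optional feature on both a config flag and the importability of an optional dependency.

```python
from importlib.util import find_spec


def gil_monitoring_active(config_enabled: bool, module_name: str = "gilknocker") -> bool:
    """Hypothetical helper: True only if the config flag is on AND the
    optional module can be imported in the current environment."""
    if not config_enabled:
        # Explicitly disabled in config: never monitor, installed or not.
        return False
    # find_spec returns None when a top-level module cannot be found,
    # so a missing optional dependency quietly turns monitoring off.
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # With the default flipped to enabled, the result depends only on
    # whether gilknocker happens to be installed.
    print("GIL monitoring active:", gil_monitoring_active(config_enabled=True))
```

With this kind of gating, flipping the default to enabled should not change anything for environments that lack the optional package, which is the case the dask test suite report above is concerned with.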