Io uring fix stop facade #977
Pull requests from external contributors require approval from a |
|
/ok to test |
test/exec/test_io_uring_context.cpp
    })));
    auto end = std::chrono::steady_clock::now();
    auto diff = end - start;
    CHECK(diff < 5ms);
The failure of this test perhaps indicates that work scheduled to run in the future on the io_uring scheduler doesn't complete promptly when it has been cancelled.
Hm, it might be as you say. I will look into it ASAP. The failing test runs on kernel 5.4, and that implementation is not touched by this PR (yet) because I only changed the IORING_OP_ASYNC_CANCEL code path, which requires kernel 5.5 or newer.
|
/ok to test |
|
It doesn't seem to be a timing issue. |
|
/ok to test |
|
/ok to test |
|
I could finally reproduce the issue. It occurs because the container image is based on Ubuntu 22.04, but the GPU host runner is probably an older system with kernel 5.4 (I assume Ubuntu 20.04). I could reproduce the error in an Ubuntu 20.04 VM where I compiled and ran the tests in an Ubuntu 22.04 container. We have implemented two strategies for cancelation based on the kernel version. But since this is a header-only solution, the detection of the kernel version works via defines in The question is how to solve this now. I know of the following solutions
any thoughts? |
This would be my preferred solution. I don't think we're under any obligation to support Franken-distros. @trxcllnt, any clue what could have led to this situation? |
As a side note, until recently the default GitHub Actions runners were based on ubuntu-20.04 too. If upgrading the runners' host system is too complicated because of other dependencies, the test should also be OK with a container image based on 20.04. Or at least faking the Linux header files |
|
Paul makes the valid point that if code is compiled on one kernel version and run on another, it should still work. That argues for runtime detection. |
We have compatibility with newer kernel versions, i.e. if we compile an application against kernel version 5.4, it will also run correctly on kernel 5.15. If I build my application against kernel version 5.15 and use features of this newer kernel, I don't really expect my application to work correctly on an older kernel. I will add the runtime check nonetheless. But it will pessimize the size of the timed schedule operations, because the solution that uses timerfd has more state. io_uring itself provides a query for supported opcodes, but this is officially available only since kernel version 5.6. Since Ubuntu 20.04 ships with kernel version 5.4, the only way to test for cancelation support is to make a dummy cancel request and check the error code for |
|
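For illustration only, here is a minimal sketch of what a runtime kernel version check could look like, as opposed to a check based on the headers the code was compiled against. It is not the approach taken in this PR: it reads the release string via `uname(2)` and compares it with the 5.5 threshold mentioned above. The names `kernel_version`, `parse_kernel_release`, `running_kernel_version` and `kernel_may_support_async_cancel` are hypothetical and do not exist in stdexec. A version comparison is also only a heuristic, since distributions may backport features.

```cpp
#include <sys/utsname.h>  // uname(2)

#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper type for this sketch; not part of stdexec.
struct kernel_version {
  int major_version = 0;
  int minor_version = 0;

  friend constexpr bool operator>=(kernel_version a, kernel_version b) noexcept {
    return a.major_version != b.major_version ? a.major_version > b.major_version
                                              : a.minor_version >= b.minor_version;
  }
};

// Reads a run of decimal digits starting at `pos`; returns false if there is none.
constexpr bool parse_number(std::string_view text, std::size_t& pos, int& out) noexcept {
  const std::size_t start = pos;
  out = 0;
  while (pos < text.size() && text[pos] >= '0' && text[pos] <= '9') {
    out = out * 10 + (text[pos] - '0');
    ++pos;
  }
  return pos != start;
}

// Parses the leading "major.minor" of a release string such as "5.4.0-42-generic".
constexpr std::optional<kernel_version> parse_kernel_release(std::string_view release) noexcept {
  kernel_version v{};
  std::size_t pos = 0;
  if (!parse_number(release, pos, v.major_version)) return std::nullopt;
  if (pos == release.size() || release[pos] != '.') return std::nullopt;
  ++pos;
  if (!parse_number(release, pos, v.minor_version)) return std::nullopt;
  return v;
}

// Asks the running kernel (not the build-time headers) for its version.
inline std::optional<kernel_version> running_kernel_version() noexcept {
  utsname info{};
  if (::uname(&info) != 0) return std::nullopt;
  return parse_kernel_release(info.release);
}

// Assumption taken from this thread: IORING_OP_ASYNC_CANCEL needs kernel 5.5 or newer.
inline bool kernel_may_support_async_cancel() noexcept {
  const auto v = running_kernel_version();
  return v && *v >= kernel_version{5, 5};
}
```

A context could call `kernel_may_support_async_cancel()` once at construction and fall back to the timerfd-based strategy when it returns false. In a container, `uname` reports the host kernel, which is the mismatch described above.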
How about this: We require a kernel version of 5.5 (which introduces |
|
/ok to test |
|
/ok to test |
|
There was something weird going on for |
|
/ok to test |
|
/ok to test |
|
Thank you! |
Using the stopping facade currently writes the wrong pointer into the
IORING_OP_ASYNC_CANCEL operation. This PR fixes this by passing the pointer to __task, which is used for the original submission. It's still a draft because I still want to think about a cleaner solution.
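To make the pointer issue concrete, here is a simplified sketch, not the actual stdexec code. In io_uring, an IORING_OP_ASYNC_CANCEL request identifies its target by placing the target's `user_data` value in the `addr` field of the cancel SQE. If the original request was tagged with the address of the inner task, the cancel request must carry that same address, not the address of a wrapper around it. The types `task` and `stop_facade` and the functions `to_user_data`, `tag_submission`, `make_cancel_sqe` and `demo` are hypothetical stand-ins. Only `io_uring_sqe` and the opcode come from `<linux/io_uring.h>` (kernel headers 5.5 or newer are assumed).

```cpp
#include <linux/io_uring.h>  // io_uring_sqe, IORING_OP_ASYNC_CANCEL

#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the operation state that owns a submission.
struct task {
  int state = 0;
};

// Hypothetical wrapper that adds stop handling around a task.
// The extra leading member makes &facade differ from &facade.inner.
struct stop_facade {
  void* stop_state = nullptr;
  task inner;
};

inline std::uint64_t to_user_data(const void* p) noexcept {
  return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
}

// The original submission is tagged with the address of the task.
inline void tag_submission(io_uring_sqe& sqe, const task& t) noexcept {
  sqe.user_data = to_user_data(&t);
}

// Builds a cancel request. `addr` must equal the user_data of the request
// to cancel; `cancel_tag` only identifies the cancel request's own completion.
inline io_uring_sqe make_cancel_sqe(const task& target, const void* cancel_tag) noexcept {
  io_uring_sqe sqe{};
  sqe.opcode = IORING_OP_ASYNC_CANCEL;
  sqe.addr = to_user_data(&target);
  sqe.user_data = to_user_data(cancel_tag);
  return sqe;
}

// Usage: the cancel request matches the original only when it names the inner task.
inline void demo() {
  stop_facade facade{};
  io_uring_sqe original{};
  tag_submission(original, facade.inner);

  const io_uring_sqe cancel = make_cancel_sqe(facade.inner, &facade);
  assert(cancel.addr == original.user_data);     // correct: matches the submission
  assert(cancel.addr != to_user_data(&facade));  // the wrapper's address would not match
}
```

Passing the wrapper's address as the cancel target would make the kernel look for a request that was never submitted, so the original operation would keep running until it completed on its own.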