5.15-rc6 vs 5.10: iou-wrk-# spawned too much; iou-sqp-# hang; linked timeout returns -EINVAL #460
@beldzhang, let's try to take them one by one, but first I'll ask:
|
The problem might be in __io_uring_submit() not checking submitted, let's try out |
What is your rlimit value for RLIMIT_NPROC? |
dmesg output may be useful. Also, can you get us a stack trace of that hung iou-sqp task? E.g. cat /proc/<task-pid>/stack; you may need to run it several times to get anything meaningful if it's a live lock. |
Is it SQPOLL? Do you share SQPOLL backends between multiple io_uring instances (i.e. ...)? Will you be able to test kernel patches? |
some details: issue 1)
issue 2) related to issue 3: the hang happens after the invalid timeout occurs
issue 3)
This is a dump of SQE/CQE sequences. I check dmesg every time; nothing related to this shows up there. |
Thanks for helping with that,
To be clear on this one, when you get many iou-wrk workers, were you using ...? It's expected for splice to be punted to io-wq and create that many tasks, surely not ideal, but that's how it is for now. However, if ...
I hope your kernel has debug symbols. Can you do the following, please?
If you've built the kernel from sources you can do |
Reproduced issue 3), IOW we'll fix it soon. Edit: not as soon; it's not it, still needs debugging. |
Do you handle those signals in the app? Do you mask a sigset, e.g. see arguments of io_uring_wait_cqes()? |
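(For anyone following along: this refers to the sigmask argument of io_uring_wait_cqes(). Below is a minimal sketch of the pselect-style pattern, assuming SIGINT/SIGTERM are the signals of interest; the helper name and timeout are made up for illustration, not taken from the reporter's app.)

/* Sketch: keep SIGINT/SIGTERM blocked while processing completions, but
 * pass a mask that unblocks them to io_uring_wait_cqes() so the wait
 * itself can be interrupted and return -EINTR instead of hanging. */
#include <signal.h>
#include <liburing.h>

static int wait_interruptible(struct io_uring *ring, struct io_uring_cqe **cqe)
{
    struct __kernel_timespec ts = { .tv_sec = 1, .tv_nsec = 0 };
    sigset_t blocked, wait_mask;

    sigemptyset(&blocked);
    sigaddset(&blocked, SIGINT);
    sigaddset(&blocked, SIGTERM);
    sigprocmask(SIG_BLOCK, &blocked, &wait_mask);  /* old mask -> wait_mask */

    /* Unblock the signals only for the duration of the wait; the kernel
     * applies this mask atomically, like pselect(2). */
    sigdelset(&wait_mask, SIGINT);
    sigdelset(&wait_mask, SIGTERM);
    return io_uring_wait_cqes(ring, cqe, 1, &ts, &wait_mask);
}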
issue 1) BTW: after the prog successfully exits, the system load increases slowly; checked with htop but could not find which process caused it. issue 2)
issue b) I will try to write a test server/client in C to reproduce these issues. |
Ok, I made a test case; it reproduces issues 2) and 3)
|
@beldzhang, awesome. I think I have a repro for issue 1 (too many workers), I'll get back later. |
@beldzhang, found one problem relevant to issue 3, but it's not necessarily the one you have.
The thing is that ... Also, can you tell us your full name and email, so we can attach a Reported-by tag to the commits to give you some credit? |
@beldzhang, can you try a custom liburing? Want to get some more info.
cd <some_folder>
git clone https://github.com/isilence/liburing.git
cd liburing
git checkout origin/test-515
make
Then you'd need to link it somehow. |
io_..._max_workers() is called just after io_uring_queue_init() in the same thread, before any submits, and the ring is not shared.
For the custom liburing: "sigmask set" was not seen.
The following counts were seen on a freshly booted system, first run of the test server: when SQPOLL = off ... when SQPOLL = on ...
It is my honour to get this credit; my full name on the internet (not my passport :p) is: Beld Zhang <beldzhang@gmail.com> |
The reproducer was pretty helpful, thanks. Recapping:
There are a couple of new patches in the tree I'd like you to try.
very likely it's because of issue 3
Fixed, patch is sent.
Can happen with SQPOLL, and the userspace should be able to tolerate it; that's how liburing (not to be confused with the kernel side) handles SQPOLL submissions. E.g. SQPOLL reaps some requests in the middle of submission and io_uring_submit() might return less than the number of SQEs that had been in the queue.
confirmed not to happen in 5.15 |
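(A minimal sketch of what "tolerate it" can look like in application code, based on the explanation above; this is not the reporter's code, just the pattern implied.)

/* With SQPOLL the poller thread may consume SQEs while userspace is still
 * inside io_uring_submit(), so the return value can be smaller than the
 * number of SQEs that were queued, even 0. Only a negative return is an
 * error; the queued requests still run and will produce CQEs. */
#include <liburing.h>

static int submit_prepared(struct io_uring *ring)
{
    int ret = io_uring_submit(ring);

    if (ret < 0)
        return ret;   /* a real error from the kernel, e.g. -EBUSY */
    return 0;         /* a short or zero count is not a failure here */
}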
Built and tested with this commit:
issue 1) fixed
issue 2) still there, can be reproduced by server/cli_read
issue 3) fixed
issue a) confirmed good
issue 4) system load increasing slowly |
Sorry, I'd usually give more instructions and links, but was waiting until Jens takes the patch.
perfect
Yeah, the rebase broke it. Jens will hopefully fix the branch today, but until then I created a new one, basically Jens' io_uring-5.15~1 + that patch. Could you try it?
git clone https://github.com/isilence/linux.git
cd linux
git checkout origin/unprep
# and all the same further
Can you tell if that's still the case after retesting? |
Test result of:
issue 1) good
issue 2) fixed
issue 3) good
issue 4) fixed |
Great, all the problems are from 5.15, so we're lucky on that front. Do you mind if we add a Tested-by tag from you? It will also be attached to the commit. |
Thanks for your hard work :) So shall we close this one now, or wait until 5.15 is released? |
5.15 will be released soon and you already tested fixes, so I think we can just close it. |
Currently, IORING_REGISTER_IOWQ_MAX_WORKERS applies only to the task that issued it, it's unexpected for users. If one task creates a ring, limits workers and then passes it to another task the limit won't be applied to the other task. Another pitfall is that a task should either create a ring or submit at least one request for IORING_REGISTER_IOWQ_MAX_WORKERS to work at all, further complicating the picture. Change the API, save the limits and apply to all future users. Note, it should be done first before giving away the ring or submitting new requests otherwise the result is not guaranteed. Fixes: 2e48005 ("io-wq: provide a way to limit max number of workers") Link: axboe/liburing#460 Reported-by: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/51d0bae97180e08ab722c0d5c93e7439cfb6f697.1634683237.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_unprep_linked_timeout() is broken, first it needs to return back REQ_F_ARM_LTIMEOUT, so the linked timeout is enqueued and disarmed. But now we refcounted it, and linked timeouts may get not executed at all, leaking a request. Just kill the unprep optimisation. Fixes: 906c6ca ("io_uring: optimise io_prep_linked_timeout()") Reported-by: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/51b8e2bfc4bea8ee625cf2ba62b2a350cc9be031.1634719585.git.asml.silence@gmail.com Link: axboe/liburing#460 Reported-by: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
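(To illustrate the ordering the IORING_REGISTER_IOWQ_MAX_WORKERS commit message above asks for: set the limits right after creating the ring, before submitting anything or handing the ring to other tasks. A hedged sketch only; the queue depth and the (1, 2) limits are arbitrary example values, the latter matching the minimal values discussed later in this thread.)

#include <liburing.h>

static int setup_ring(struct io_uring *ring)
{
    /* limits[0] = bounded io-wq workers, limits[1] = unbounded workers */
    unsigned int limits[2] = { 1, 2 };
    int ret;

    ret = io_uring_queue_init(256, ring, 0);
    if (ret < 0)
        return ret;

    /* Apply the limit before any submit and before sharing the ring,
     * otherwise (per the commit message) the result is not guaranteed. */
    ret = io_uring_register_iowq_max_workers(ring, limits);
    if (ret < 0) {
        io_uring_queue_exit(ring);
        return ret;
    }
    return 0;
}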
issue 1) on 5.15-rc7.
issue 1.1) io-wq workers exceed the limit on an SMP system. This is with SQPOLL on:
This is with SQPOLL off:
All the same except for the hardware difference:
issue 1.2) side effect of io_..._max_workers(0, 0)
When investigating issue 1.1 I added a verify step, and then iou-wrk-#### workers became over-created again, although the numbers were set.
after:
issue 1.3) is the workers setting process-wide or ring-wide?
After the first ring is created, the following rings get the initial max-workers setting of the first one. dmesg is clean, no errors in the program. |
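(My reading of the "verify step" mentioned for issue 1.2, as a sketch: passing zeroes to io_uring_register_iowq_max_workers() is meant to leave the limits untouched and just report the current values back in the array, which is why a read-only query disturbing the limits looks like a bug. The helper name is illustrative.)

#include <stdio.h>
#include <liburing.h>

static void verify_worker_limits(struct io_uring *ring)
{
    /* 0 means "don't change this slot, just return the current value" */
    unsigned int cur[2] = { 0, 0 };

    if (io_uring_register_iowq_max_workers(ring, cur) == 0)
        printf("max workers: bounded=%u unbounded=%u\n", cur[0], cur[1]);
}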
106 io-wq workers, right? I see a problem with SQPOLL in the code.
Will look into what that is.
It's per task (aka thread or process); I don't like the API, but what you see is expected. |
issue 1.1) ... issue 1.3) My program is written in Free Pascal, and I'm not familiar with C. I will try to make a more complete test case. |
For the sake of this discussion all three name the same thing. Initially ... The problem is that there is an N:M relation between tasks and io_urings, and as it uses io_uring as a mediator, the API will always be a bit confusing whatever we do.
I see what may be leading to 1.1 and 1.2, so no worries. |
@beldzhang, can you try out a branch? https://github.com/isilence/linux/tree/iowq-limit_test git clone https://github.com/isilence/linux.git
cd linux
git checkout origin/iowq-limit_test
... |
Result of https://github.com/isilence/linux/tree/iowq-limit_test:
issue 1.1) SQPOLL = on
SQPOLL = off
|
There was a stupid mistake. Force-pushed the branch, can you try it again to confirm whether it fixes 1.2? I don't know what particularly happens with 1.1 and can't reproduce it. Do I get it right that you see the problem for both SQPOLL and non-SQPOLL modes? Also interesting whether it's related to 1.2 and the branch fixes it. |
Result of isilence/linux@72c7f89:
issue 1.2) fixed
issue 1.1) confirmed, still there
Since the server is truly SMP (2 sockets), is there anything related to that, or to NUMA? Is it related to 1.3? Since the relation between workers and rings is not that simple,
Added some debug printk, found the max_workers bumped to RLIMIT_NPROC. Change on 5.15-rc7:
dmesg (partial):
full dmesg output attached: |
Can you make sure you call io_uring_register_iowq_max_workers(1,4) (or whatever the arguments were) at least once for each ring? The limit is per task, but how it's propagated is a different story. Just guessing how your app is written and wanting to check a hypothesis. |
In my code, the ... After a review of the ..., I changed the ring creation code to the beginning of the working thread and tested with isilence/linux@72c7f89.
non-SQPOLL: it's good now
SQPOLL: iou-wrk-### still over-created, but if I change to the following sequence, it became good: ...
BTW, my habit is to set resource usage to the minimum; (1,2) is the smallest value for the ring to work. |
@isilence emmm... any idea? Found a problem in io-wq.c: io_wq_max_workers(); did some patching, and issue 1.1 is fixed. Before:
After the fix:
Temporary patch I tested:
|
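(The actual diff was in the stripped attachment above; what follows is only my reconstruction of the symptom as a self-contained userspace model, not the kernel patch. The guess: the per-node update loop wrote the old limit back into the caller's array while still iterating nodes, so every node after the first saw a stale, RLIMIT_NPROC-sized default instead of the requested value, which would match the 2-socket observation. Names and numbers are illustrative only.)

#include <stdio.h>

#define NODES       2      /* e.g. a 2-socket system */
#define ACCT_NR     2      /* bounded / unbounded accounting, as in io-wq */
#define OLD_DEFAULT 30000  /* stands in for the RLIMIT_NPROC-based default */

static int max_workers[NODES][ACCT_NR];

/* Model of the suspected buggy update loop. */
static void set_limits_buggy(int new_count[ACCT_NR])
{
    for (int node = 0; node < NODES; node++) {
        for (int i = 0; i < ACCT_NR; i++) {
            int prev = max_workers[node][i];
            if (new_count[i])
                max_workers[node][i] = new_count[i];
            /* Writing the old value back here clobbers the request for
             * every following node; it belongs after the node loop. */
            new_count[i] = prev;
        }
    }
}

int main(void)
{
    int req[ACCT_NR] = { 1, 2 };   /* ask for 1 bounded, 2 unbounded */

    for (int node = 0; node < NODES; node++)
        for (int i = 0; i < ACCT_NR; i++)
            max_workers[node][i] = OLD_DEFAULT;

    set_limits_buggy(req);
    for (int node = 0; node < NODES; node++)
        printf("node %d: bounded=%d unbounded=%d\n",
               node, max_workers[node][0], max_workers[node][1]);
    /* node 0 honours the request, node 1 keeps the huge default:
     * the "bumped to RLIMIT_NPROC" symptom reported above. */
    return 0;
}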
Awesome, makes sense; it slipped my mind that there are multiple nodes. I wanted to kill that off but forgot.
patch sent. |
I think this needs to be patched into 5.15.x also; shall I send it somewhere else? |
It'll automatically go to 5.15.x, that's why I marked it with a fixes tag. You don't need to do anything else. |
@isilence |
Pavel, are you sending that one in for 5.16? |
We are converting our storage protocol to io_uring. On kernel 5.10 it worked perfectly; when trying to run it on kernel 5.15 there are multiple issues. I'm not sure whether they are related, so I put them together.
The protocol has a typical structure:
loop:
We use liburing 2.1 and compared kernel 5.10 with 5.15-rc6:
issue 1:
when SQPOLL is not set, lots of iou-wrk-#### workers are spawned (>100) when connections >= 2.
I think it could be limited by io_uring_register_iowq_max_workers().
It happens when the disk read/write goes via splice; when done via buffered read/write it is ok.
issue 2:
when SQPOLL is set, after the program exits, iou-sqp-#### hangs in state D (disk sleep).
The program cannot be killed; only a system reboot helps.
issue 3:
every network recv/send is linked with a timeout SQE; sometimes the SQE linked to the header recv returns -EINVAL in cqe.res. This never occurred on kernel 5.10.
For now I just treat it like ECANCELED and it looks ok.
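(For context, a minimal sketch of the recv + linked-timeout pairing described here, using the liburing calls involved; the timeout value and helper name are illustrative, not the reporter's code, and error handling for a full SQ ring is omitted.)

#include <stddef.h>
#include <liburing.h>

static int queue_recv_with_timeout(struct io_uring *ring, int fd,
                                   void *buf, size_t len)
{
    struct __kernel_timespec ts = { .tv_sec = 5, .tv_nsec = 0 };
    struct io_uring_sqe *sqe;

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_recv(sqe, fd, buf, len, 0);
    sqe->flags |= IOSQE_IO_LINK;        /* the next SQE is linked to this recv */
    io_uring_sqe_set_data(sqe, buf);

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_link_timeout(sqe, &ts, 0);
    io_uring_sqe_set_data(sqe, NULL);   /* so the timeout CQE is recognisable */

    /* Submit while ts is still in scope: the timespec is read at submit time.
     * Normally the timeout CQE carries -ECANCELED (recv finished first) or
     * -ETIME (it fired and cancelled the recv). */
    return io_uring_submit(ring);
}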
and other minor issues:
a) io_uring_submit() returns 0 sometimes; is this an error? We just accept it as good, and everything looks fine.
b) when a timeout is set in io_uring_wait_cqe_timeout() or io_uring_wait_cqes(), these calls cannot be interrupted by a signal (either SIGINT or SIGTERM); this was found with liburing 0.7.
More details can be provided if required, and any suggestions or code testing are welcome.
Thanks a lot for your hard work.