Issue: IORING_OP_POLL_ADD with signalfd #5
Comments
I tested your test app, and I get the same behavior with io_uring and aio: ctrl-c does not wake either of them up.
I got the expected behavior for epoll and a plain read:

#include <unistd.h>
#include <sys/signalfd.h>
#include <stdio.h>
#include <signal.h>

int main() {
    /* Block SIGINT so it is delivered through the signalfd instead. */
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigprocmask(SIG_BLOCK, &mask, NULL);

    int sfd = signalfd(-1, &mask, 0);

    /* Plain blocking read: returns once SIGINT (ctrl-c) arrives. */
    struct signalfd_siginfo res;
    read(sfd, &res, sizeof(res));
    perror("read");

    close(sfd);
    return 0;
}
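For comparison, an io_uring POLL_ADD variant of this test looks roughly like the sketch below (an illustrative reconstruction using liburing, not the exact code from the reporter's gist). On affected kernels, the wait never completes when ctrl-c is pressed:

#include <unistd.h>
#include <sys/signalfd.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <liburing.h>

int main() {
    /* Block SIGINT and route it through a signalfd, as above. */
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigprocmask(SIG_BLOCK, &mask, NULL);
    int sfd = signalfd(-1, &mask, 0);

    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    /* Ask io_uring to poll the signalfd for readability. */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_poll_add(sqe, sfd, POLLIN);
    io_uring_submit(&ring);

    /* On affected kernels this wait never completes on ctrl-c. */
    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    printf("poll result: %d\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    close(sfd);
    return 0;
}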
Any updates? @axboe
I need to look closer at this, because it seems to me like we should get the ctrl-c initially. Once the work has moved off to async, if it does, then that's a different story. I've flushed out my backlog of stuff for 5.5 fixes today, so I'll get on this tomorrow.
Tomorrow came early; here's what I found. The initial ->poll() call, which ends up in signalfd for this case, is using the correct task. But we don't find any signals, so we queue this up for async execution. When the signal then comes in, it's delivered to the original task, which is woken up, and we call ->poll() to check for events. This time the task is the async worker, and it has no signals pending. Hence still no POLLIN, and we swallowed the SIGINT. I hacked on it a bit, and this is what I came up with. It works for me, but I'm not yet sure how palatable it'll be to the greater kernel community. I'll check.
Here's a cleaner variant that just fixes signalfd instead:
I feel it's important to point out that it's signalfd that's broken here, not aio or io_uring. signalfd should be looking at the task that created the signalfd in the first place, not whatever process happens to be calling ->poll() on its behalf.
Since this might take a bit of time to hash out, I'd recommend just using a sigset_t with io_uring_enter() instead.
I can wait for it. Thanks.
You don't have to use it directly; liburing also has a helper that takes a sigset_t.
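As an illustration (a sketch, and not necessarily the exact helper being referred to), liburing's io_uring_wait_cqes() accepts a sigset_t, so the wait can run with SIGINT temporarily unblocked and the kernel atomically restores the mask afterwards, much like ppoll/pselect:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <liburing.h>

static void on_sigint(int sig) { (void)sig; }

int main() {
    /* Install a handler so SIGINT interrupts the wait with -EINTR
     * instead of terminating the process (the default action). */
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigaction(SIGINT, &sa, NULL);

    /* Block SIGINT for normal execution. */
    sigset_t blocked;
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGINT);
    sigprocmask(SIG_BLOCK, &blocked, NULL);

    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    /* Mask applied only while waiting: current mask minus SIGINT. */
    sigset_t waitmask;
    sigprocmask(SIG_SETMASK, NULL, &waitmask);
    sigdelset(&waitmask, SIGINT);

    struct io_uring_cqe *cqe;
    int ret = io_uring_wait_cqes(&ring, &cqe, 1, NULL, &waitmask);
    if (ret == -EINTR)
        printf("wait interrupted by a signal (e.g. ctrl-c)\n");
    else if (ret == 0)
        io_uring_cqe_seen(&ring, cqe);
    else
        fprintf(stderr, "wait: %s\n", strerror(-ret));

    io_uring_queue_exit(&ring);
    return 0;
}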
Since we now support non-iter based read/write: does IORING_OP_READV with signalfd have a similar problem to IORING_OP_POLL_ADD?
I tested IORING_OP_READV with signalfd on Linux 5.4.0. It didn't work either.

#define _POSIX_C_SOURCE 1
#include <unistd.h>
#include <sys/signalfd.h>
#include <signal.h>
#include <stdio.h>
#include <liburing.h>

int main() {
    /* Block SIGINT and route it through a signalfd. */
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigprocmask(SIG_BLOCK, &mask, NULL);
    int sfd = signalfd(-1, &mask, 0);

    struct io_uring ring;
    io_uring_queue_init(32, &ring, 0);

    /* Queue a readv on the signalfd; it should complete on ctrl-c. */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    struct signalfd_siginfo res;
    struct iovec iov = { &res, sizeof(res) };
    io_uring_prep_readv(sqe, sfd, &iov, 1, 0);
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    io_uring_cqe_seen(&ring, cqe);
    io_uring_queue_exit(&ring);

    printf("%d, %d\n", SIGINT, res.ssi_signo);
    puts("OK");
    close(sfd);
    return 0;
}

Any progress? @axboe
Working on it...
https://lore.kernel.org/io-uring/254505c9-2b76-ebeb-306c-02aaf1704b88@kernel.dk/T/#u
This is the reproducer for the kworker hang bug.

Reproduction Steps:
1) A user task calls io_uring_queue_exit().
2) Suspend the task with SIGSTOP / SIGTRAP before the ring exit is finished (do it as soon as step (1) is done).
3) Wait for `/proc/sys/kernel/hung_task_timeout_secs` seconds to elapse.
4) We get a complaint from khungtaskd because the kworker is stuck in an uninterruptible state (D).

The kworkers waiting on the ring exit are not progressing because the task cannot proceed. When the user task is continued (e.g. it gets SIGCONT after SIGSTOP, or continues after a SIGTRAP breakpoint), the kworkers can then finish the ring exit. We need special handling for this case to avoid the khungtaskd complaint. Currently we don't have a fix for this.

The dmesg says:

[247390.432294] INFO: task kworker/u8:2:358488 blocked for more than 10 seconds.
[247390.432314] Tainted: G OE 5.15.0-stable #5
[247390.432322] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247390.432329] task:kworker/u8:2 state:D stack: 0 pid:358488 ppid: 2 flags:0x00004000
[247390.432341] Workqueue: events_unbound io_ring_exit_work
[247390.432354] Call Trace:
[247390.432368]  __schedule+0x453/0x1850
[247390.432388]  ? lock_acquire+0xc8/0x2d0
[247390.432404]  ? usleep_range+0x90/0x90
[247390.432412]  schedule+0x59/0xc0
[247390.432420]  schedule_timeout+0x1aa/0x1f0
[247390.432429]  ? mark_held_locks+0x49/0x70
[247390.432439]  ? lockdep_hardirqs_on_prepare+0xff/0x180
[247390.432445]  ? _raw_spin_unlock_irq+0x24/0x40
[247390.432456]  __wait_for_common+0xc2/0x170
[247390.432473]  io_ring_exit_work+0x1d9/0x750
[247390.432486]  ? io_uring_del_tctx_node+0xe0/0xe0
[247390.432502]  ? verify_cpu+0xf0/0x100
[247390.432520]  process_one_work+0x23b/0x550
[247390.432540]  worker_thread+0x55/0x3c0
[247390.432546]  ? process_one_work+0x550/0x550
[247390.432556]  kthread+0x140/0x160
[247390.432564]  ? set_kthread_struct+0x40/0x40
[247390.432574]  ret_from_fork+0x1f/0x30
[247390.432605] INFO: task kworker/u8:0:359615 blocked for more than 10 seconds.
[247390.432613] Tainted: G OE 5.15.0-stable #5
[247390.432620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[247390.432626] task:kworker/u8:0 state:D stack: 0 pid:359615 ppid: 2 flags:0x00004000
[247390.432635] Workqueue: events_unbound io_ring_exit_work
[247390.432643] Call Trace:
[247390.432653]  __schedule+0x453/0x1850
[247390.432676]  ? usleep_range+0x90/0x90
[247390.432684]  schedule+0x59/0xc0
[247390.432691]  schedule_timeout+0x1aa/0x1f0
[247390.432700]  ? mark_held_locks+0x49/0x70
[247390.432710]  ? lockdep_hardirqs_on_prepare+0xff/0x180
[247390.432717]  ? _raw_spin_unlock_irq+0x24/0x40
[247390.432728]  __wait_for_common+0xc2/0x170
[247390.432744]  io_ring_exit_work+0x1d9/0x750
[247390.432758]  ? io_uring_del_tctx_node+0xe0/0xe0
[247390.432772]  ? verify_cpu+0xf0/0x100
[247390.432788]  process_one_work+0x23b/0x550
[247390.432807]  worker_thread+0x55/0x3c0
[247390.432813]  ? process_one_work+0x550/0x550
[247390.432824]  kthread+0x140/0x160
[247390.432830]  ? set_kthread_struct+0x40/0x40
[247390.432839]  ret_from_fork+0x1f/0x30
[247390.432870] Showing all locks held in the system:
[247390.432877] 1 lock held by khungtaskd/40:
[247390.432880]  #0: ffffffff82976700 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x174
[247390.432911] 1 lock held by in:imklog/922:
[247390.432915]  #0: ffff8881041cfcf0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4a/0x60
[247390.432977] 2 locks held by pager/318088:
[247390.432981]  #0: ffff8881208d4898 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x50
[247390.433001]  #1: ffffc900010fd2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x49e/0x660
[247390.433024] 1 lock held by htop/341462:
[247390.433032] 2 locks held by kworker/u8:2/358488:
[247390.433035]  #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
[247390.433053]  #1: ffffc90003797e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
[247390.433071] 2 locks held by kworker/u8:0/359615:
[247390.433075]  #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
[247390.433092]  #1: ffffc90003597e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
[247390.433110] 1 lock held by dmesg/361178:
[247390.433113]  #0: ffff88810b5300d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x4b/0x230
[247390.433134] =============================================

Cc: Pavel Begunkov <asml.silence@gmail.com>
Link: #448
Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
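The reproduction steps above map to roughly the following sketch (an illustration of the listed steps, not the actual test case added by this commit; whether the kworker actually hangs depends on the kernel version, as described above):

/* Rough sketch of the reproduction steps (assumes liburing):
 * exit a ring and immediately stop the task, so io_ring_exit_work
 * in the kworker cannot make progress until the task is continued. */
#include <signal.h>
#include <liburing.h>

int main() {
    struct io_uring ring;
    if (io_uring_queue_init(8, &ring, 0) < 0)
        return 1;

    io_uring_queue_exit(&ring); /* step 1: start the ring exit */
    raise(SIGSTOP);             /* step 2: suspend the task right away */

    /* Steps 3-4: after hung_task_timeout_secs, khungtaskd reports the
     * kworker stuck in D state; sending SIGCONT lets the ring exit
     * finish. */
    return 0;
}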
Ctrl+C should terminate the program but it doesn't. Similar code works for epoll: https://gist.github.com/CarterLi/b8db2fcfea689b96eeae382c38130afb
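For reference, an epoll-based version along these lines does wake up on ctrl-c (a sketch; not necessarily identical to the linked gist):

#include <unistd.h>
#include <sys/signalfd.h>
#include <sys/epoll.h>
#include <signal.h>
#include <stdio.h>

int main() {
    /* Block SIGINT and route it through a signalfd. */
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigprocmask(SIG_BLOCK, &mask, NULL);
    int sfd = signalfd(-1, &mask, 0);

    /* Register the signalfd with epoll. */
    int epfd = epoll_create1(0);
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = sfd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev);

    /* Blocks until SIGINT makes the signalfd readable. */
    struct epoll_event out;
    epoll_wait(epfd, &out, 1, -1);

    struct signalfd_siginfo res;
    read(sfd, &res, sizeof(res));
    printf("got signal %u\n", res.ssi_signo);

    close(epfd);
    close(sfd);
    return 0;
}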