Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue: IORING_OP_POLL_ADD with signalfd #5

Closed
CarterLi opened this issue Sep 18, 2019 · 13 comments
Closed

Issue: IORING_OP_POLL_ADD with signalfd #5

CarterLi opened this issue Sep 18, 2019 · 13 comments

Comments

@CarterLi
Copy link
Contributor

CarterLi commented Sep 18, 2019

#include <unistd.h>
#include <sys/signalfd.h>
#include <sys/poll.h>

#include <liburing.h>

int main() {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);

    sigprocmask(SIG_BLOCK, &mask, NULL);
    int sfd = signalfd(-1, &mask, SFD_NONBLOCK);

    struct io_uring ring;
    io_uring_queue_init(32, &ring, 0);

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_poll_add(sqe, sfd, POLLIN);
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    io_uring_cqe_seen(&ring, cqe);
    io_uring_queue_exit(&ring);

    close(sfd);
    return 0;
}

Ctrl+C should terminate the program but it doesn't. Similar code works for epoll: https://gist.github.com/CarterLi/b8db2fcfea689b96eeae382c38130afb

Linux Ubuntu 5.3.0-10-generic #11-Ubuntu SMP Mon Sep 9 15:12:17 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
@axboe
Copy link
Owner

axboe commented Sep 18, 2019

I tested your test app, I get the same behavior with io_uring and aio - ctrl-c does not wake either of them up?

@CarterLi
Copy link
Contributor Author

CarterLi commented Sep 18, 2019

I tested your test app, I get the same behavior with io_uring and aio - ctrl-c does not wake either of them up?

I got expected behavior for epoll and plain read.

#include <unistd.h>
#include <sys/signalfd.h>
#include <stdio.h>
#include <signal.h>

int main() {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);

    sigprocmask(SIG_BLOCK, &mask, NULL);
    int sfd = signalfd(-1, &mask, 0);
    struct signalfd_siginfo res;
    read(sfd, &res, sizeof(res));
    perror("read");
    close(sfd);
    return 0;
}

@CarterLi
Copy link
Contributor Author

Any updates? @axboe

@axboe
Copy link
Owner

axboe commented Nov 14, 2019

I need to look closer at this, because it seems to me like we should get the ctrl-c initially. Once the work has moved off to async, if it does, then that's a different story. I've flushed out my backlog of stuff for 5.5 fixes today, so I'll get on this tomorrow.

@axboe
Copy link
Owner

axboe commented Nov 14, 2019

Tomorrow came early, here's what I found. The initial ->poll() call, which ends up with signalfd for this case, is using the correct task. But we don't find any signals, so we queue this up for async execution. When the signal then comes in, it's delivered to the original task, it's woken up, and we call ->poll() to check for events. This time the task is the async worker, and it has no signals pending. Hence still no POLLIN, and we swallowed the SIGINT.

I hacked with it a bit, this is what I came up with. It works for me, but not yet sure how palatable it'll be to the greater kernel community. I'll check.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index d8ea9b4f83a7..d9a4c9aac958 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -299,6 +299,7 @@ struct io_poll_iocb {
 	bool				done;
 	bool				canceled;
 	struct wait_queue_entry		wait;
+	struct task_struct		*task;
 };
 
 struct io_timeout {
@@ -2021,7 +2022,10 @@ static void io_poll_complete_work(struct io_wq_work **workptr)
 	struct io_wq_work *work = *workptr;
 	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
 	struct io_poll_iocb *poll = &req->poll;
-	struct poll_table_struct pt = { ._key = poll->events };
+	struct poll_table_struct pt = {
+		._key = poll->events,
+		.task = poll->task
+	};
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_kiocb *nxt = NULL;
 	__poll_t mask = 0;
@@ -2139,9 +2143,11 @@ static int io_poll_add(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	poll->head = NULL;
 	poll->done = false;
 	poll->canceled = false;
+	poll->task = current;
 
 	ipt.pt._qproc = io_poll_queue_proc;
 	ipt.pt._key = poll->events;
+	ipt.pt.task = poll->task;
 	ipt.req = req;
 	ipt.error = -EINVAL; /* same as no support for IOCB_CMD_POLL */
 
diff --git a/fs/signalfd.c b/fs/signalfd.c
index 44b6845b071c..a7f31758db1a 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -61,16 +61,17 @@ static int signalfd_release(struct inode *inode, struct file *file)
 static __poll_t signalfd_poll(struct file *file, poll_table *wait)
 {
 	struct signalfd_ctx *ctx = file->private_data;
+	struct task_struct *tsk = wait->task ?: current;
 	__poll_t events = 0;
 
-	poll_wait(file, &current->sighand->signalfd_wqh, wait);
+	poll_wait(file, &tsk->sighand->signalfd_wqh, wait);
 
-	spin_lock_irq(&current->sighand->siglock);
-	if (next_signal(&current->pending, &ctx->sigmask) ||
-	    next_signal(&current->signal->shared_pending,
+	spin_lock_irq(&tsk->sighand->siglock);
+	if (next_signal(&tsk->pending, &ctx->sigmask) ||
+	    next_signal(&tsk->signal->shared_pending,
 			&ctx->sigmask))
 		events |= EPOLLIN;
-	spin_unlock_irq(&current->sighand->siglock);
+	spin_unlock_irq(&tsk->sighand->siglock);
 
 	return events;
 }
diff --git a/include/linux/poll.h b/include/linux/poll.h
index 1cdc32b1f1b0..6d2b6d923b2b 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -43,6 +43,7 @@ typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_
 typedef struct poll_table_struct {
 	poll_queue_proc _qproc;
 	__poll_t _key;
+	struct task_struct *task;
 } poll_table;
 
 static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
@@ -76,6 +77,7 @@ static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
 {
 	pt->_qproc = qproc;
 	pt->_key   = ~(__poll_t)0; /* all events enabled */
+	pt->task = NULL;
 }
 
 static inline bool file_can_poll(struct file *file)

@axboe
Copy link
Owner

axboe commented Nov 14, 2019

Here's a cleaner variant, that just fixes signalfd instead:

diff --git a/fs/signalfd.c b/fs/signalfd.c
index 44b6845b071c..cc72b5b08946 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -50,6 +50,7 @@ void signalfd_cleanup(struct sighand_struct *sighand)
 
 struct signalfd_ctx {
 	sigset_t sigmask;
+	struct task_struct *task;
 };
 
 static int signalfd_release(struct inode *inode, struct file *file)
@@ -63,14 +64,14 @@ static __poll_t signalfd_poll(struct file *file, poll_table *wait)
 	struct signalfd_ctx *ctx = file->private_data;
 	__poll_t events = 0;
 
-	poll_wait(file, &current->sighand->signalfd_wqh, wait);
+	poll_wait(file, &ctx->task->sighand->signalfd_wqh, wait);
 
-	spin_lock_irq(&current->sighand->siglock);
-	if (next_signal(&current->pending, &ctx->sigmask) ||
-	    next_signal(&current->signal->shared_pending,
+	spin_lock_irq(&ctx->task->sighand->siglock);
+	if (next_signal(&ctx->task->pending, &ctx->sigmask) ||
+	    next_signal(&ctx->task->signal->shared_pending,
 			&ctx->sigmask))
 		events |= EPOLLIN;
-	spin_unlock_irq(&current->sighand->siglock);
+	spin_unlock_irq(&ctx->task->sighand->siglock);
 
 	return events;
 }
@@ -280,6 +281,7 @@ static int do_signalfd4(int ufd, sigset_t *mask, int flags)
 			return -ENOMEM;
 
 		ctx->sigmask = *mask;
+		ctx->task = current;
 
 		/*
 		 * When we call this, the initialization must be complete, since

@axboe
Copy link
Owner

axboe commented Nov 14, 2019

I feel it's important to point out that it's signalfd that's broken here, it's not aio or io_uring. signalfd should be looking at the task that created the signalfd in the first place, not whatever process happens to be calling ->poll() on its behalf.

@axboe
Copy link
Owner

axboe commented Nov 14, 2019

Since this might take a bit of time to hash out, I'd recommend just using a sigset_t with io_uring_enter() instead.

@CarterLi
Copy link
Contributor Author

Since this might take a bit of time to hash out, I'd recommend just using a sigset_t with io_uring_enter() instead.

Using io_uring_enter directly is rather complex.

I can wait for it. Thanks.

@axboe
Copy link
Owner

axboe commented Nov 15, 2019

Using io_uring_enter directly is rather complex.

You don't have to use it directly, there is also io_uring_wait_cqe_timeout()
But point taken, I'll work on getting the signalfd stuff fixed upstream.

@CarterLi
Copy link
Contributor Author

CarterLi commented Nov 15, 2019

Since we now support non-iter based read/write. Does IORING_OP_READV with signalfd have similar problem like IORING_OP_POLL_ADD?

@CarterLi
Copy link
Contributor Author

I tested IORING_OP_READV with signalfd on Linux 5.4.0. It didn't work too.

#define _POSIX_C_SOURCE 1
#include <unistd.h>
#include <sys/signalfd.h>
#include <stdio.h>
#include <liburing.h>

int main() {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);

    sigprocmask(SIG_BLOCK, &mask, NULL);
    int sfd = signalfd(-1, &mask, 0);

    struct io_uring ring;
    io_uring_queue_init(32, &ring, 0);

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

    struct signalfd_siginfo res;
    struct iovec iov = { &res, sizeof(res) };
    io_uring_prep_readv(sqe, sfd, &iov, 1, 0);
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    io_uring_cqe_seen(&ring, cqe);
    io_uring_queue_exit(&ring);

    printf("%d, %d\n", SIGINT, res.ssi_signo);
    puts("OK");
    close(sfd);
}

Any progress? @axboe

@axboe
Copy link
Owner

axboe commented Nov 27, 2019 via email

axboe pushed a commit that referenced this issue Nov 9, 2021
This is the reproducer for the kworker hang bug.

Reproduction Steps:
  1) A user task calls io_uring_queue_exit().

  2) Suspend the task with SIGSTOP / SIGTRAP before the ring exit is
     finished (do it as soon as step (1) is done).

  3) Wait for `/proc/sys/kernel/hung_task_timeout_secs` seconds
     elapsed.

  4) We get a complaint from the khungtaskd because the kworker is
     stuck in an uninterruptible state (D).

The kworkers waiting on ring exit are not progressing as the task
cannot proceed. When the user task is continued (e.g. get SIGCONT
after SIGSTOP, or continue after SIGTRAP breakpoint), the kworkers
then can finish the ring exit.

We need a special handling for this case to avoid khungtaskd
complaint. Currently we don't have the fix for this.

The dmesg says:

  [247390.432294] INFO: task kworker/u8:2:358488 blocked for more than 10 seconds.
  [247390.432314]       Tainted: G           OE     5.15.0-stable #5
  [247390.432322] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [247390.432329] task:kworker/u8:2    state:D stack:    0 pid:358488 ppid:     2 flags:0x00004000
  [247390.432341] Workqueue: events_unbound io_ring_exit_work
  [247390.432354] Call Trace:
  [247390.432368]  __schedule+0x453/0x1850
  [247390.432388]  ? lock_acquire+0xc8/0x2d0
  [247390.432404]  ? usleep_range+0x90/0x90
  [247390.432412]  schedule+0x59/0xc0
  [247390.432420]  schedule_timeout+0x1aa/0x1f0
  [247390.432429]  ? mark_held_locks+0x49/0x70
  [247390.432439]  ? lockdep_hardirqs_on_prepare+0xff/0x180
  [247390.432445]  ? _raw_spin_unlock_irq+0x24/0x40
  [247390.432456]  __wait_for_common+0xc2/0x170
  [247390.432473]  io_ring_exit_work+0x1d9/0x750
  [247390.432486]  ? io_uring_del_tctx_node+0xe0/0xe0
  [247390.432502]  ? verify_cpu+0xf0/0x100
  [247390.432520]  process_one_work+0x23b/0x550
  [247390.432540]  worker_thread+0x55/0x3c0
  [247390.432546]  ? process_one_work+0x550/0x550
  [247390.432556]  kthread+0x140/0x160
  [247390.432564]  ? set_kthread_struct+0x40/0x40
  [247390.432574]  ret_from_fork+0x1f/0x30
  [247390.432605] INFO: task kworker/u8:0:359615 blocked for more than 10 seconds.
  [247390.432613]       Tainted: G           OE     5.15.0-stable #5
  [247390.432620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [247390.432626] task:kworker/u8:0    state:D stack:    0 pid:359615 ppid:     2 flags:0x00004000
  [247390.432635] Workqueue: events_unbound io_ring_exit_work
  [247390.432643] Call Trace:
  [247390.432653]  __schedule+0x453/0x1850
  [247390.432676]  ? usleep_range+0x90/0x90
  [247390.432684]  schedule+0x59/0xc0
  [247390.432691]  schedule_timeout+0x1aa/0x1f0
  [247390.432700]  ? mark_held_locks+0x49/0x70
  [247390.432710]  ? lockdep_hardirqs_on_prepare+0xff/0x180
  [247390.432717]  ? _raw_spin_unlock_irq+0x24/0x40
  [247390.432728]  __wait_for_common+0xc2/0x170
  [247390.432744]  io_ring_exit_work+0x1d9/0x750
  [247390.432758]  ? io_uring_del_tctx_node+0xe0/0xe0
  [247390.432772]  ? verify_cpu+0xf0/0x100
  [247390.432788]  process_one_work+0x23b/0x550
  [247390.432807]  worker_thread+0x55/0x3c0
  [247390.432813]  ? process_one_work+0x550/0x550
  [247390.432824]  kthread+0x140/0x160
  [247390.432830]  ? set_kthread_struct+0x40/0x40
  [247390.432839]  ret_from_fork+0x1f/0x30
  [247390.432870]
                  Showing all locks held in the system:
  [247390.432877] 1 lock held by khungtaskd/40:
  [247390.432880]  #0: ffffffff82976700 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x174
  [247390.432911] 1 lock held by in:imklog/922:
  [247390.432915]  #0: ffff8881041cfcf0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4a/0x60
  [247390.432977] 2 locks held by pager/318088:
  [247390.432981]  #0: ffff8881208d4898 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x50
  [247390.433001]  #1: ffffc900010fd2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x49e/0x660
  [247390.433024] 1 lock held by htop/341462:
  [247390.433032] 2 locks held by kworker/u8:2/358488:
  [247390.433035]  #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433053]  #1: ffffc90003797e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433071] 2 locks held by kworker/u8:0/359615:
  [247390.433075]  #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433092]  #1: ffffc90003597e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433110] 1 lock held by dmesg/361178:
  [247390.433113]  #0: ffff88810b5300d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x4b/0x230

  [247390.433134] =============================================

Cc: Pavel Begunkov <asml.silence@gmail.com>
Link: #448
Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
ammarfaizi2 added a commit to ammarfaizi2/liburing that referenced this issue Nov 23, 2021
This is the reproducer for the kworker hang bug.

Reproduction Steps:
  1) A user task calls io_uring_queue_exit().

  2) Suspend the task with SIGSTOP / SIGTRAP before the ring exit is
     finished (do it as soon as step (1) is done).

  3) Wait for `/proc/sys/kernel/hung_task_timeout_secs` seconds
     elapsed.

  4) We get a complaint from the khungtaskd because the kworker is
     stuck in an uninterruptible state (D).

The kworkers waiting on ring exit are not progressing as the task
cannot proceed. When the user task is continued (e.g. get SIGCONT
after SIGSTOP, or continue after SIGTRAP breakpoint), the kworkers
then can finish the ring exit.

We need a special handling for this case to avoid khungtaskd
complaint. Currently we don't have the fix for this.

The dmesg says:

  [247390.432294] INFO: task kworker/u8:2:358488 blocked for more than 10 seconds.
  [247390.432314]       Tainted: G           OE     5.15.0-stable axboe#5
  [247390.432322] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [247390.432329] task:kworker/u8:2    state:D stack:    0 pid:358488 ppid:     2 flags:0x00004000
  [247390.432341] Workqueue: events_unbound io_ring_exit_work
  [247390.432354] Call Trace:
  [247390.432368]  __schedule+0x453/0x1850
  [247390.432388]  ? lock_acquire+0xc8/0x2d0
  [247390.432404]  ? usleep_range+0x90/0x90
  [247390.432412]  schedule+0x59/0xc0
  [247390.432420]  schedule_timeout+0x1aa/0x1f0
  [247390.432429]  ? mark_held_locks+0x49/0x70
  [247390.432439]  ? lockdep_hardirqs_on_prepare+0xff/0x180
  [247390.432445]  ? _raw_spin_unlock_irq+0x24/0x40
  [247390.432456]  __wait_for_common+0xc2/0x170
  [247390.432473]  io_ring_exit_work+0x1d9/0x750
  [247390.432486]  ? io_uring_del_tctx_node+0xe0/0xe0
  [247390.432502]  ? verify_cpu+0xf0/0x100
  [247390.432520]  process_one_work+0x23b/0x550
  [247390.432540]  worker_thread+0x55/0x3c0
  [247390.432546]  ? process_one_work+0x550/0x550
  [247390.432556]  kthread+0x140/0x160
  [247390.432564]  ? set_kthread_struct+0x40/0x40
  [247390.432574]  ret_from_fork+0x1f/0x30
  [247390.432605] INFO: task kworker/u8:0:359615 blocked for more than 10 seconds.
  [247390.432613]       Tainted: G           OE     5.15.0-stable axboe#5
  [247390.432620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [247390.432626] task:kworker/u8:0    state:D stack:    0 pid:359615 ppid:     2 flags:0x00004000
  [247390.432635] Workqueue: events_unbound io_ring_exit_work
  [247390.432643] Call Trace:
  [247390.432653]  __schedule+0x453/0x1850
  [247390.432676]  ? usleep_range+0x90/0x90
  [247390.432684]  schedule+0x59/0xc0
  [247390.432691]  schedule_timeout+0x1aa/0x1f0
  [247390.432700]  ? mark_held_locks+0x49/0x70
  [247390.432710]  ? lockdep_hardirqs_on_prepare+0xff/0x180
  [247390.432717]  ? _raw_spin_unlock_irq+0x24/0x40
  [247390.432728]  __wait_for_common+0xc2/0x170
  [247390.432744]  io_ring_exit_work+0x1d9/0x750
  [247390.432758]  ? io_uring_del_tctx_node+0xe0/0xe0
  [247390.432772]  ? verify_cpu+0xf0/0x100
  [247390.432788]  process_one_work+0x23b/0x550
  [247390.432807]  worker_thread+0x55/0x3c0
  [247390.432813]  ? process_one_work+0x550/0x550
  [247390.432824]  kthread+0x140/0x160
  [247390.432830]  ? set_kthread_struct+0x40/0x40
  [247390.432839]  ret_from_fork+0x1f/0x30
  [247390.432870]
                  Showing all locks held in the system:
  [247390.432877] 1 lock held by khungtaskd/40:
  [247390.432880]  #0: ffffffff82976700 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x174
  [247390.432911] 1 lock held by in:imklog/922:
  [247390.432915]  #0: ffff8881041cfcf0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4a/0x60
  [247390.432977] 2 locks held by pager/318088:
  [247390.432981]  #0: ffff8881208d4898 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x50
  [247390.433001]  axboe#1: ffffc900010fd2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x49e/0x660
  [247390.433024] 1 lock held by htop/341462:
  [247390.433032] 2 locks held by kworker/u8:2/358488:
  [247390.433035]  #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433053]  axboe#1: ffffc90003797e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433071] 2 locks held by kworker/u8:0/359615:
  [247390.433075]  #0: ffff888100106938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433092]  axboe#1: ffffc90003597e70 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x1c1/0x550
  [247390.433110] 1 lock held by dmesg/361178:
  [247390.433113]  #0: ffff88810b5300d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x4b/0x230

  [247390.433134] =============================================

Cc: Pavel Begunkov <asml.silence@gmail.com>
Link: axboe#448
Signed-off-by: Ammar Faizi <ammar.faizi@students.amikom.ac.id>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants