Skip to content

Commit

Permalink
Debug zio_wait() hangs (executor/waiter)
Browse files Browse the repository at this point in the history
There exists a plausible cache concurrency issue with zio_wait().
This might occur because the zio->io_waiter to not assigned under
a lock in zio_wait(), is not checked under a lock in zio_done(),
and the zio may be dispatched to another thread for handling.
That said, none of the actual crash dumps I've looked at show
that this has ever occurred.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2523
  • Loading branch information
behlendorf committed Sep 8, 2014
1 parent 59e9418 commit c034020
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion module/zfs/zio.c
Original file line number Diff line number Diff line change
Expand Up @@ -1377,7 +1377,9 @@ __attribute__((always_inline))
static inline void
__zio_execute(zio_t *zio)
{
mutex_enter(&zio->io_lock);
zio->io_executor = curthread;
mutex_exit(&zio->io_lock);

while (zio->io_stage < ZIO_STAGE_DONE) {
enum zio_stage pipeline = zio->io_pipeline;
Expand Down Expand Up @@ -1455,7 +1457,9 @@ zio_wait(zio_t *zio)
ASSERT(zio->io_stage == ZIO_STAGE_OPEN);
ASSERT(zio->io_executor == NULL);

mutex_enter(&zio->io_lock);
zio->io_waiter = curthread;
mutex_exit(&zio->io_lock);

__zio_execute(zio);

Expand Down Expand Up @@ -3316,12 +3320,13 @@ zio_done(zio_t *zio)
zio_notify_parent(pio, zio, ZIO_WAIT_DONE);
}

mutex_enter(&zio->io_lock);
if (zio->io_waiter != NULL) {
mutex_enter(&zio->io_lock);
zio->io_executor = NULL;
cv_broadcast(&zio->io_cv);
mutex_exit(&zio->io_lock);
} else {
mutex_exit(&zio->io_lock);
zio_destroy(zio);
}

Expand Down

1 comment on commit c034020

@nedbass
Copy link

@nedbass nedbass commented on c034020 Sep 8, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch looks good. My only concern is that io_waiter and io_executor are checked outside io_lock in a few other places, so the concurrency rules for these fields may become unclear with this change.

Please sign in to comment.