
nvme/rdma: rdma_destroy_id deadlock when remove nvme controller #3347

Closed · ygtzf opened this issue Apr 22, 2024 · 12 comments

ygtzf commented Apr 22, 2024

Sighting report

When I delete an NVMe controller on the vhost, the vhost often gets stuck.
The vhost log shows:

spdk[487481]: bdev_nvme.c:6229:bdev_nvme_delete_complete_poll: *ERROR*: NVMe path 'nvm-16udxfq48htf1-disk0' still exists after delete
spdk[487481]: bdev_nvme.c:6229:bdev_nvme_delete_complete_poll: *ERROR*: NVMe path 'nvm-16kfmswb5u6t1-disk0' still exists after delete
spdk[487481]: bdev_nvme.c:6229:bdev_nvme_delete_complete_poll: *ERROR*: NVMe path 'nvm-16unvqhmrwmn0-disk0' still exists after delete

In gdb I can see that the reactor_2 thread is stuck; it appears that rdma_destroy_id has deadlocked.

Thread 4 (Thread 0x7f3d3d108640 (LWP 487503) "reactor_2"):
#0  0x00007f3d3e2f039a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f3d3e2f2ba0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f3d400c760b in rdma_destroy_id () from /lib64/librdmacm.so.1
#3  0x0000000000593115 in nvme_rdma_qpair_destroy (rqpair=0x200065908900) at nvme_rdma.c:1913
#4  0x000000000059330b in nvme_rdma_qpair_wait_until_quiet (rqpair=0x200065908900) at nvme_rdma.c:1971
#5  0x00000000005934ab in nvme_rdma_ctrlr_disconnect_qpair_poll (ctrlr=0x2000658ff500, qpair=0x200065908900) at nvme_rdma.c:2024
#6  0x0000000000596472 in nvme_rdma_poll_group_process_completions (tgroup=0x7f3d30440c10, completions_per_qpair=128, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_rdma.c:3136
#7  0x00000000005756a1 in nvme_transport_poll_group_process_completions (tgroup=0x7f3d30440c10, completions_per_qpair=0, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_transport.c:714
#8  0x0000000000588c38 in spdk_nvme_poll_group_process_completions (group=0x7f3d30442250, completions_per_qpair=0, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_poll_group.c:157
#9  0x000000000041698b in bdev_nvme_poll (arg=0x7f3d30197b00) at bdev_nvme.c:1630
#10 0x00000000005a4de8 in thread_execute_poller (thread=0x26b5ad0, poller=0x7f3d303bfd50) at thread.c:953
#11 0x00000000005a5399 in thread_poll (thread=0x26b5ad0, max_msgs=0, now=28250992758714600) at thread.c:1079
#12 0x00000000005a5645 in spdk_thread_poll (thread=0x26b5ad0, max_msgs=0, now=28250992758714600) at thread.c:1163
#13 0x00000000004fb227 in _reactor_run (reactor=0x1798880) at reactor.c:914
#14 0x00000000004fb31b in reactor_run (arg=0x1798880) at reactor.c:952
#15 0x0000000000606ea1 in eal_thread_loop (arg=0x2) at ../lib/eal/common/eal_common_thread.c:212
#16 0x0000000000621832 in eal_worker_thread_loop (arg=0x2) at ../lib/eal/linux/eal.c:916
#17 0x00007f3d3e2f3802 in start_thread () from /lib64/libc.so.6
#18 0x00007f3d3e293450 in clone3 () from /lib64/libc.so.6

Expected Behavior

The NVMe controller is deleted successfully and the vhost keeps running normally.

Current Behavior

A vhost reactor thread is stuck.

Possible Solution

Before rdma_destroy_id is called there are still undisconnected qpairs or unprocessed CM events, and rdma_destroy_id then blocks forever.

Steps to Reproduce

  1. attach nvme controller
  2. create vhost blk controller
  3. start a VM
  4. run fio
  5. stop a VM
  6. delete vhost blk controller
  7. detach nvme controller

Context (Environment including OS version, SPDK version, etc.)

OS Version: AlmaLinux release 9.1 (Lime Lynx)
Kernel Version: 5.14.0-162.23.1.el9
NVMf target Version: V23.01
Vhost Version: V24.01

ygtzf added the Sighting label on Apr 22, 2024

ygtzf commented Apr 22, 2024

In this code, if the CM event handling times out, the qpair is torn down anyway. In that case, will rdma_destroy_id keep waiting forever?

static int
nvme_rdma_qpair_wait_until_quiet(struct nvme_rdma_qpair *rqpair)
{
	if (spdk_get_ticks() < rqpair->evt_timeout_ticks &&
	    (rqpair->current_num_sends != 0 ||
	     (!rqpair->srq && rqpair->rsps->current_num_recvs != 0))) {
		return -EAGAIN;
	}
	rqpair->state = NVME_RDMA_QPAIR_STATE_EXITED;

	nvme_rdma_qpair_abort_reqs(&rqpair->qpair, 0);
	nvme_rdma_qpair_destroy(rqpair);
	nvme_transport_ctrlr_disconnect_qpair_done(&rqpair->qpair);

	return 0;
}

ygtzf commented May 7, 2024

@jimharris @AlekseyMarchuk When the deadlock occurs, events_completed is 4 while ucma_destroy_kern_id returns 5, so rdma_destroy_id enters the condition wait. If the missing event is only received later, the thread deadlocks.
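For reference, the relevant librdmacm logic (paraphrased and abbreviated from rdma-core's cma.c; the exact code differs between versions) shows why the condition wait can hang. ucma_destroy_kern_id returns the number of events the kernel has reported for this cm_id, and rdma_destroy_id then waits until user space has acknowledged that many events:

int rdma_destroy_id(struct rdma_cm_id *id)
{
	struct cma_id_private *id_priv;
	int ret;

	id_priv = container_of(id, struct cma_id_private, id);

	/* Ask the kernel to destroy the id; on success the return value is
	 * the number of events the kernel has reported for this id. */
	ret = ucma_destroy_kern_id(id->channel->fd, id_priv->handle);
	if (ret < 0)
		return ret;

	/* Block until user space has acknowledged that many events. */
	pthread_mutex_lock(&id_priv->mut);
	while (id_priv->events_completed < ret)
		pthread_cond_wait(&id_priv->cond, &id_priv->mut);
	pthread_mutex_unlock(&id_priv->mut);

	ucma_free_id(id_priv);
	return 0;
}

With events_completed == 4 and ucma_destroy_kern_id returning 5, the while loop above can only exit if the one outstanding event is still fetched from the shared event channel and acknowledged.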

I don't know which CM event was not completed; normally the last event is RDMA_CM_EVENT_TIMEWAIT_EXIT.

In addition, I found that in nvme_rdma_poll_group_process_completions, disconnected qpairs are processed first and the remaining CM events are processed afterwards. Is it possible that the event is only processed after rdma_destroy_id has already been called?

Finally, the current combination of vhost v24.01 and target v24.01 does not have this problem. Only the combination of vhost v23.01 and target v23.01 has this problem.

AlekseyMarchuk (Member) commented

I can't identify a single commit that would fix this problem; there were several commits related to the qpair disconnect process. What was the backtrace when the issue happened? Just to double-check: does v24.01 work fine?

ygtzf commented May 15, 2024

The backtrace I can see is:

Thread 4 (Thread 0x7f3d3d108640 (LWP 487503) "reactor_2"):
#0  0x00007f3d3e2f039a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f3d3e2f2ba0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f3d400c760b in rdma_destroy_id () from /lib64/librdmacm.so.1
#3  0x0000000000593115 in nvme_rdma_qpair_destroy (rqpair=0x200065908900) at nvme_rdma.c:1913
#4  0x000000000059330b in nvme_rdma_qpair_wait_until_quiet (rqpair=0x200065908900) at nvme_rdma.c:1971
#5  0x00000000005934ab in nvme_rdma_ctrlr_disconnect_qpair_poll (ctrlr=0x2000658ff500, qpair=0x200065908900) at nvme_rdma.c:2024
#6  0x0000000000596472 in nvme_rdma_poll_group_process_completions (tgroup=0x7f3d30440c10, completions_per_qpair=128, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_rdma.c:3136
#7  0x00000000005756a1 in nvme_transport_poll_group_process_completions (tgroup=0x7f3d30440c10, completions_per_qpair=0, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_transport.c:714
#8  0x0000000000588c38 in spdk_nvme_poll_group_process_completions (group=0x7f3d30442250, completions_per_qpair=0, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_poll_group.c:157
#9  0x000000000041698b in bdev_nvme_poll (arg=0x7f3d30197b00) at bdev_nvme.c:1630
#10 0x00000000005a4de8 in thread_execute_poller (thread=0x26b5ad0, poller=0x7f3d303bfd50) at thread.c:953
#11 0x00000000005a5399 in thread_poll (thread=0x26b5ad0, max_msgs=0, now=28250992758714600) at thread.c:1079
#12 0x00000000005a5645 in spdk_thread_poll (thread=0x26b5ad0, max_msgs=0, now=28250992758714600) at thread.c:1163
#13 0x00000000004fb227 in _reactor_run (reactor=0x1798880) at reactor.c:914
#14 0x00000000004fb31b in reactor_run (arg=0x1798880) at reactor.c:952
#15 0x0000000000606ea1 in eal_thread_loop (arg=0x2) at ../lib/eal/common/eal_common_thread.c:212
#16 0x0000000000621832 in eal_worker_thread_loop (arg=0x2) at ../lib/eal/linux/eal.c:916
#17 0x00007f3d3e2f3802 in start_thread () from /lib64/libc.so.6
#18 0x00007f3d3e293450 in clone3 () from /lib64/libc.so.6

I ran the test for 3 days in the v24.01 environment and this problem did not occur.
I personally think that if the target version is v24.01, there should be no such problem.

AlekseyMarchuk (Member) commented

Thank you for your reply. I'll check the behaviour of nvme_rdma_qpair_wait_until_quiet in v23 and v24 later this week.

dwgseu commented May 20, 2024

I encountered the same problem with SPDK v23.01. I think the reason is that the CM channel is shared between all qpairs, so the CM events for one I/O qpair may be polled by any other I/O thread.
For example, if qpair A lives on thread 1 but its RDMA_CM_EVENT_TIMEWAIT_EXIT event is polled on thread 2, and that event arrives later than the nvme_rdma_qpair_destroy processing on thread 1, the problem happens.
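To make the race concrete: acknowledging a CM event is what advances the counter that rdma_destroy_id waits on. Roughly (again paraphrased from librdmacm internals; names and details vary by version), rdma_ack_cm_event ends up doing:

static void
ucma_complete_event(struct cma_id_private *id_priv)
{
	pthread_mutex_lock(&id_priv->mut);
	id_priv->events_completed++;          /* counter that rdma_destroy_id() waits on */
	pthread_cond_signal(&id_priv->cond);  /* wakes a thread blocked in rdma_destroy_id() */
	pthread_mutex_unlock(&id_priv->mut);
}

So if thread 1 reaches rdma_destroy_id for qpair A before the TIMEWAIT_EXIT event destined for A has been fetched and acknowledged on thread 2, thread 1 blocks in the condition wait and only wakes up if that event is still properly fetched and acked afterwards.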

AlekseyMarchuk (Member) commented

@dwgseu Thanks for your comment! You are right, that is a race condition. Normally CM events are polled under the controller lock, and the nvme_rdma_qpair_destroy function was designed to be called under that lock too. But it seems we didn't take the mutex when we processed the NVME_RDMA_QPAIR_STATE_LINGERING state. I'll check it and prepare a fix soon.
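For illustration only (this is not the actual Gerrit change; nvme_robust_mutex_lock and ctrlr->ctrlr_lock are simply the locking helpers assumed here), the idea would be to take the controller lock around the LINGERING-state handling in nvme_rdma_ctrlr_disconnect_qpair_poll as well, so the path that ends in rdma_destroy_id cannot race with CM event processing on another thread:

	case NVME_RDMA_QPAIR_STATE_LINGERING:
		/* Sketch: hold the same controller lock that protects CM event
		 * processing while the qpair quiesces and is finally destroyed
		 * (which is where rdma_destroy_id() is called). */
		nvme_robust_mutex_lock(&ctrlr->ctrlr_lock);
		rc = nvme_rdma_qpair_wait_until_quiet(rqpair);
		nvme_robust_mutex_unlock(&ctrlr->ctrlr_lock);
		break;

The actual fix is the Gerrit change linked below.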

dwgseu commented May 21, 2024

@AlekseyMarchuk Thanks a lot! Your comment reminded me: I went through the code logic again and, as you said, the NVME_RDMA_QPAIR_STATE_LINGERING state is not handled under the controller lock, while every other state is.

AlekseyMarchuk (Member) commented

The fix can be found here: https://review.spdk.io/gerrit/c/spdk/spdk/+/23324
@ygtzf @dwgseu, could you please verify it if you have the chance?

ygtzf commented May 23, 2024

@AlekseyMarchuk Thanks, let me test it in our environment.

dwgseu commented May 27, 2024

@AlekseyMarchuk I tested with the same reproduction steps in my environment; the patch works well.

ygtzf commented May 28, 2024

The patch works well; I'm closing this issue now.

ygtzf closed this as completed May 28, 2024