
nvme/rdma: rdma_destroy_id deadlock when remove nvme controller #3347

Closed · ygtzf opened this issue Apr 22, 2024 · 12 comments

ygtzf commented Apr 22, 2024

Sighting report

When I delete an NVMe controller on the vhost, the vhost often gets stuck.
The vhost log shows:

spdk[487481]: bdev_nvme.c:6229:bdev_nvme_delete_complete_poll: *ERROR*: NVMe path 'nvm-16udxfq48htf1-disk0' still exists after delete
spdk[487481]: bdev_nvme.c:6229:bdev_nvme_delete_complete_poll: *ERROR*: NVMe path 'nvm-16kfmswb5u6t1-disk0' still exists after delete
spdk[487481]: bdev_nvme.c:6229:bdev_nvme_delete_complete_poll: *ERROR*: NVMe path 'nvm-16unvqhmrwmn0-disk0' still exists after delete

In gdb I can see that the reactor_2 thread is stuck; it appears that rdma_destroy_id has deadlocked.

Thread 4 (Thread 0x7f3d3d108640 (LWP 487503) "reactor_2"):
#0  0x00007f3d3e2f039a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f3d3e2f2ba0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f3d400c760b in rdma_destroy_id () from /lib64/librdmacm.so.1
#3  0x0000000000593115 in nvme_rdma_qpair_destroy (rqpair=0x200065908900) at nvme_rdma.c:1913
#4  0x000000000059330b in nvme_rdma_qpair_wait_until_quiet (rqpair=0x200065908900) at nvme_rdma.c:1971
#5  0x00000000005934ab in nvme_rdma_ctrlr_disconnect_qpair_poll (ctrlr=0x2000658ff500, qpair=0x200065908900) at nvme_rdma.c:2024
#6  0x0000000000596472 in nvme_rdma_poll_group_process_completions (tgroup=0x7f3d30440c10, completions_per_qpair=128, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_rdma.c:3136
#7  0x00000000005756a1 in nvme_transport_poll_group_process_completions (tgroup=0x7f3d30440c10, completions_per_qpair=0, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_transport.c:714
#8  0x0000000000588c38 in spdk_nvme_poll_group_process_completions (group=0x7f3d30442250, completions_per_qpair=0, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_poll_group.c:157
#9  0x000000000041698b in bdev_nvme_poll (arg=0x7f3d30197b00) at bdev_nvme.c:1630
#10 0x00000000005a4de8 in thread_execute_poller (thread=0x26b5ad0, poller=0x7f3d303bfd50) at thread.c:953
#11 0x00000000005a5399 in thread_poll (thread=0x26b5ad0, max_msgs=0, now=28250992758714600) at thread.c:1079
#12 0x00000000005a5645 in spdk_thread_poll (thread=0x26b5ad0, max_msgs=0, now=28250992758714600) at thread.c:1163
#13 0x00000000004fb227 in _reactor_run (reactor=0x1798880) at reactor.c:914
#14 0x00000000004fb31b in reactor_run (arg=0x1798880) at reactor.c:952
#15 0x0000000000606ea1 in eal_thread_loop (arg=0x2) at ../lib/eal/common/eal_common_thread.c:212
#16 0x0000000000621832 in eal_worker_thread_loop (arg=0x2) at ../lib/eal/linux/eal.c:916
#17 0x00007f3d3e2f3802 in start_thread () from /lib64/libc.so.6
#18 0x00007f3d3e293450 in clone3 () from /lib64/libc.so.6

Expected Behavior

The NVMe controller is deleted successfully and the vhost keeps running normally.

Current Behavior

A vhost reactor thread is stuck.

Possible Solution

Before rdma_destroy_id is called there are still undisconnected qpairs or unprocessed CM events, and rdma_destroy_id then blocks forever.

Steps to Reproduce

  1. attach nvme controller
  2. create vhost blk controller
  3. start a VM
  4. run fio
  5. stop a VM
  6. delete vhost blk controller
  7. detach nvme controller

Context (Environment including OS version, SPDK version, etc.)

OS Version: AlmaLinux release 9.1 (Lime Lynx)
Kernel Version: 5.14.0-162.23.1.el9
NVMf target Version: V23.01
Vhost Version: V24.01

ygtzf added the Sighting label on Apr 22, 2024

ygtzf commented Apr 22, 2024

In this code, if the CM event handling times out, the qpair is torn down anyway. In that case, will rdma_destroy_id keep waiting forever?

static int
nvme_rdma_qpair_wait_until_quiet(struct nvme_rdma_qpair *rqpair)
{
	if (spdk_get_ticks() < rqpair->evt_timeout_ticks &&
	    (rqpair->current_num_sends != 0 ||
	     (!rqpair->srq && rqpair->rsps->current_num_recvs != 0))) {
		return -EAGAIN;
	}
	rqpair->state = NVME_RDMA_QPAIR_STATE_EXITED;

	nvme_rdma_qpair_abort_reqs(&rqpair->qpair, 0);
	nvme_rdma_qpair_destroy(rqpair);
	nvme_transport_ctrlr_disconnect_qpair_done(&rqpair->qpair);

	return 0;
}

ygtzf commented May 7, 2024

@jimharris @AlekseyMarchuk When the deadlock occurs, events_completed is 4 while ucma_destroy_kern_id returns 5, so rdma_destroy_id enters the condition wait. If the missing event is only received later, the thread deadlocks.
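For reference, the relevant librdmacm logic (paraphrased and abbreviated from rdma-core's cma.c; the exact code differs between versions) shows why the condition wait can hang. ucma_destroy_kern_id returns the number of events the kernel has reported for this cm_id, and rdma_destroy_id then waits until user space has acknowledged that many events:

int rdma_destroy_id(struct rdma_cm_id *id)
{
	struct cma_id_private *id_priv;
	int ret;

	id_priv = container_of(id, struct cma_id_private, id);

	/* Ask the kernel to destroy the id; on success the return value is
	 * the number of events the kernel has reported for this id. */
	ret = ucma_destroy_kern_id(id->channel->fd, id_priv->handle);
	if (ret < 0)
		return ret;

	/* Block until user space has acknowledged that many events. */
	pthread_mutex_lock(&id_priv->mut);
	while (id_priv->events_completed < ret)
		pthread_cond_wait(&id_priv->cond, &id_priv->mut);
	pthread_mutex_unlock(&id_priv->mut);

	ucma_free_id(id_priv);
	return 0;
}

With events_completed == 4 and ucma_destroy_kern_id returning 5, the while loop above can only exit if the one outstanding event is still fetched from the shared event channel and acknowledged.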

I don't know which CM event was not completed; normally the last event is RDMA_CM_EVENT_TIMEWAIT_EXIT.

In addition, I found that in nvme_rdma_poll_group_process_completions, disconnected qpairs are processed first and the remaining CM events are processed afterwards. Is it possible that the event is only processed after rdma_destroy_id has already been called?

Finally, the current combination of vhost v24.01 and target v24.01 does not have this problem. Only the combination of vhost v23.01 and target v23.01 has this problem.

AlekseyMarchuk (Member) commented

I can't identify a single commit that would fix this problem; there were several commits related to the qpair disconnect process. What was the backtrace when the issue happened? Just to double-check: does v24.01 work fine?

ygtzf commented May 15, 2024

The backtrace I can see is:

Thread 4 (Thread 0x7f3d3d108640 (LWP 487503) "reactor_2"):
#0  0x00007f3d3e2f039a in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f3d3e2f2ba0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f3d400c760b in rdma_destroy_id () from /lib64/librdmacm.so.1
#3  0x0000000000593115 in nvme_rdma_qpair_destroy (rqpair=0x200065908900) at nvme_rdma.c:1913
#4  0x000000000059330b in nvme_rdma_qpair_wait_until_quiet (rqpair=0x200065908900) at nvme_rdma.c:1971
#5  0x00000000005934ab in nvme_rdma_ctrlr_disconnect_qpair_poll (ctrlr=0x2000658ff500, qpair=0x200065908900) at nvme_rdma.c:2024
#6  0x0000000000596472 in nvme_rdma_poll_group_process_completions (tgroup=0x7f3d30440c10, completions_per_qpair=128, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_rdma.c:3136
#7  0x00000000005756a1 in nvme_transport_poll_group_process_completions (tgroup=0x7f3d30440c10, completions_per_qpair=0, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_transport.c:714
#8  0x0000000000588c38 in spdk_nvme_poll_group_process_completions (group=0x7f3d30442250, completions_per_qpair=0, disconnected_qpair_cb=0x4166b7 <bdev_nvme_disconnected_qpair_cb>) at nvme_poll_group.c:157
#9  0x000000000041698b in bdev_nvme_poll (arg=0x7f3d30197b00) at bdev_nvme.c:1630
#10 0x00000000005a4de8 in thread_execute_poller (thread=0x26b5ad0, poller=0x7f3d303bfd50) at thread.c:953
#11 0x00000000005a5399 in thread_poll (thread=0x26b5ad0, max_msgs=0, now=28250992758714600) at thread.c:1079
#12 0x00000000005a5645 in spdk_thread_poll (thread=0x26b5ad0, max_msgs=0, now=28250992758714600) at thread.c:1163
#13 0x00000000004fb227 in _reactor_run (reactor=0x1798880) at reactor.c:914
#14 0x00000000004fb31b in reactor_run (arg=0x1798880) at reactor.c:952
#15 0x0000000000606ea1 in eal_thread_loop (arg=0x2) at ../lib/eal/common/eal_common_thread.c:212
#16 0x0000000000621832 in eal_worker_thread_loop (arg=0x2) at ../lib/eal/linux/eal.c:916
#17 0x00007f3d3e2f3802 in start_thread () from /lib64/libc.so.6
#18 0x00007f3d3e293450 in clone3 () from /lib64/libc.so.6

I ran the test for 3 days in the v24.01 environment and this problem did not occur.
I personally think that if the target version is v24.01, there should be no such problem.

AlekseyMarchuk (Member) commented

Thank you for your reply. I'll check the behaviour of nvme_rdma_qpair_wait_until_quiet in v23 and v24 later this week.

dwgseu commented May 20, 2024

I encountered the same problem with SPDK v23.01. I think the reason is that the CM channel is shared between all qpairs, so the CM events for one I/O qpair may be polled by any other I/O thread.
For example, if qpair A lives on thread 1 but its RDMA_CM_EVENT_TIMEWAIT_EXIT event is polled on thread 2, and that event arrives later than the nvme_rdma_qpair_destroy processing on thread 1, the problem happens.
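To make the race concrete: acknowledging a CM event is what advances the counter that rdma_destroy_id waits on. Roughly (again paraphrased from librdmacm internals; names and details vary by version), rdma_ack_cm_event ends up doing:

static void
ucma_complete_event(struct cma_id_private *id_priv)
{
	pthread_mutex_lock(&id_priv->mut);
	id_priv->events_completed++;          /* counter that rdma_destroy_id() waits on */
	pthread_cond_signal(&id_priv->cond);  /* wakes a thread blocked in rdma_destroy_id() */
	pthread_mutex_unlock(&id_priv->mut);
}

So if thread 1 reaches rdma_destroy_id for qpair A before the TIMEWAIT_EXIT event destined for A has been fetched and acknowledged on thread 2, thread 1 blocks in the condition wait and only wakes up if that event is still properly fetched and acked afterwards.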

AlekseyMarchuk (Member) commented

@dwgseu Thanks for your comment! You are right, that is a race condition. Normally CM events are polled under the controller lock, and the nvme_rdma_qpair_destroy function was designed to be called under that lock too. But it seems we didn't take the mutex when we processed the NVME_RDMA_QPAIR_STATE_LINGERING state. I'll check it and prepare a fix soon.
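For illustration only (this is not the actual Gerrit change; nvme_robust_mutex_lock and ctrlr->ctrlr_lock are simply the locking helpers assumed here), the idea would be to take the controller lock around the LINGERING-state handling in nvme_rdma_ctrlr_disconnect_qpair_poll as well, so the path that ends in rdma_destroy_id cannot race with CM event processing on another thread:

	case NVME_RDMA_QPAIR_STATE_LINGERING:
		/* Sketch: hold the same controller lock that protects CM event
		 * processing while the qpair quiesces and is finally destroyed
		 * (which is where rdma_destroy_id() is called). */
		nvme_robust_mutex_lock(&ctrlr->ctrlr_lock);
		rc = nvme_rdma_qpair_wait_until_quiet(rqpair);
		nvme_robust_mutex_unlock(&ctrlr->ctrlr_lock);
		break;

The actual fix is the Gerrit change linked below.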

dwgseu commented May 21, 2024

@AlekseyMarchuk Thanks a lot! Your comment reminded me: I went through the code logic again and, as you said, the NVME_RDMA_QPAIR_STATE_LINGERING state is not handled under the controller lock, while every other state is.

AlekseyMarchuk (Member) commented

The fix can be found here: https://review.spdk.io/gerrit/c/spdk/spdk/+/23324
@ygtzf @dwgseu, could you please verify it if you have the chance?

ygtzf commented May 23, 2024

@AlekseyMarchuk Thanks, let me test it in our environment.

dwgseu commented May 27, 2024

@AlekseyMarchuk I tested with the same reproduction steps in my environment; the patch works well.

ygtzf commented May 28, 2024

The patch works well; I'm closing this issue now.

ygtzf closed this as completed May 28, 2024