
bluestore/NVMEDevice.cc: fix NVMEManager thread hang #25646

Merged
merged 1 commit into ceph:master on Jan 7, 2019

Conversation

tone-zhang
Contributor

bluestore/NVMEDevice: fix NVMEManager thread halt

When SPDK is enabled in Ceph and the Ceph development cluster is started, the NVMEManager thread hangs.

On the aarch64 platform, the log is as below:

Starting SPDK v18.04.1 / DPDK 18.05.0 initialization...
[ DPDK EAL parameters: nvme-device-manager -c 0x1 -m 2048 --file-prefix=spdk_pid16987 ]
EAL: Detected 46 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/spdk_pid16987/mp_socket
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
EAL: using IOMMU type 1 (Type 1)
^C

The reason is that pthread_cond_destroy() cannot destroy a condition variable that still has active waiters.

Also on x86 debug builds we get the following error messages due to probe_queue_lock still being active during ~NVMEManager:

/home/ubuntu/ceph/src/common/mutex_debug.h: 114: FAILED ceph_assert(r == 0)
ceph version 14.0.1-1862-g403622b (403622b) nautilus (dev)

The change fixes the issue.

Fixes: http://tracker.ceph.com/issues/37720

Signed-off-by: tone.zhang tone.zhang@arm.com
Signed-off-by: Steve Capper steve.capper@arm.com

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@@ -590,7 +602,7 @@ int NVMEManager::try_get(const spdk_nvme_transport_id& trid, SharedDriverData **
spdk_nvme_retry_count = SPDK_NVME_DEFAULT_RETRY_COUNT;

std::unique_lock l(probe_queue_lock);
while (true) {
while (should_probe {
Contributor

this does not compile.

Contributor Author

Thanks. Will update soon.

@tone-zhang
Contributor Author

@tchaikov could you please review the patch? Thanks a lot! :)

@@ -486,6 +487,17 @@ class NVMEManager {

public:
NVMEManager() {}
~NVMEManager() {
if (!init)
Contributor

we still need to wake up the try_get() waiters, so i think we should

  1. remove the member variable of init, and instead check if the dpdk_thread is joinable, before executing dpdk_thread, and join it in the dtor only if it's joinable.
  2. set all pending ctx->done to true, before notifying all try_get() waiters in dpdk_thread.
  3. might want to rename should_probe to stopping.

Contributor Author

Thanks Kefu.
For point 2, I will promote ctx to an NVMEManager member variable so the destructor can access it.

I will update the change soon. Thanks a lot!

Contributor

i don't follow you. ctxs are elements of probe_queue. so i think we already have access to them? also, i'd recommend set them in dpdk_thread before it exits. it's more readable this way, as we notify the waiter in a single place.

Contributor Author

Got it. I meant another ctx. I will update the code.

Thanks a lot! :)

@tone-zhang
Contributor Author

@tchaikov Kefu, could you please review the update? Thanks a lot!

@@ -477,7 +477,7 @@ class NVMEManager {

private:
ceph::mutex lock = ceph::make_mutex("NVMEManager::lock");
bool init = false;
std::atomic<bool> stopping = true;
Contributor

i think stopping should be false when the NVMEManager starts?

Contributor Author

It is more reasonable. Thanks. Will update the following code.

return;
{
std::lock_guard guard(probe_queue_lock);
stopping = false;
Contributor

Suggested change
stopping = false;
stopping = true;

@@ -590,7 +599,7 @@ int NVMEManager::try_get(const spdk_nvme_transport_id& trid, SharedDriverData **
spdk_nvme_retry_count = SPDK_NVME_DEFAULT_RETRY_COUNT;

std::unique_lock l(probe_queue_lock);
while (true) {
while (stopping) {
Contributor

Suggested change
while (stopping) {
while (!stopping) {

@@ -600,14 +609,15 @@ int NVMEManager::try_get(const spdk_nvme_transport_id& trid, SharedDriverData **
derr << __func__ << " device probe nvme failed" << dendl;
}
ctxt->done = true;
for (auto p : probe_queue)
p->done = true;
Contributor

please don't set all waiting probe contexts to done here. you need to do so after the while (!stopping) block.

Contributor Author

Thanks. Will update the code soon.

@@ -605,9 +614,10 @@ int NVMEManager::try_get(const spdk_nvme_transport_id& trid, SharedDriverData **
probe_queue_cond.wait(l);
}
}
for (auto p : probe_queue)
p->done = true;
Contributor

please add

probe_queue_cond.notify_all();

to notify all waiters.

Contributor Author

@tchaikov Kefu, thanks for your comments.

Contributor

@tchaikov tchaikov left a comment

aside from the atomic nit, lgtm

@@ -477,7 +477,7 @@ class NVMEManager {

private:
ceph::mutex lock = ceph::make_mutex("NVMEManager::lock");
bool init = false;
std::atomic<bool> stopping = false;
Contributor

nit, we don't need to make this an atomic<>. as accesses to this variable are always protected by probe_queue_lock.

Contributor Author

Ack. Will update soon. Thanks.

for (auto p : probe_queue) {
p->done = true;
probe_queue_cond.notify_all();
}
Contributor

@tone-zhang sorry, i missed this. you might need to do the notify_all() out of the for (auto p : probe_queue) loop.

Contributor Author

@tchaikov thanks!

@tchaikov
Contributor

tchaikov commented Jan 7, 2019

retest this please.

@tone-zhang
Contributor Author

@tchaikov Kefu, thanks a lot!

@tchaikov tchaikov changed the title bluestore/NVMEDevice.cc: fix NVMEManager thread halt bluestore/NVMEDevice.cc: fix NVMEManager thread hang Jan 7, 2019
@tchaikov tchaikov merged commit fa24a03 into ceph:master Jan 7, 2019