
bluestore/NVMEDevice.cc: fix NVMEManager thread hang #25646

Merged
merged 1 commit into ceph:master on Jan 7, 2019

Conversation

tone-zhang
Contributor

bluestore/NVMEDevice: fix NVMEManager thread halt

When SPDK is enabled in Ceph and the Ceph development cluster is started, the NVMEManager thread hangs.

On the aarch64 platform, the log is as below:

Starting SPDK v18.04.1 / DPDK 18.05.0 initialization...
[ DPDK EAL parameters: nvme-device-manager -c 0x1 -m 2048 --file-prefix=spdk_pid16987 ]
EAL: Detected 46 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/spdk_pid16987/mp_socket
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
EAL: using IOMMU type 1 (Type 1)
^C

The reason is that pthread_cond_destroy() cannot destroy a condition variable that still has active waiters.

Also on x86 debug builds we get the following error messages due to probe_queue_lock still being active during ~NVMEManager:

/home/ubuntu/ceph/src/common/mutex_debug.h: 114: FAILED ceph_assert(r == 0)
ceph version 14.0.1-1862-g403622b (403622b) nautilus (dev)

The change fixes the issue.

Fixes: http://tracker.ceph.com/issues/37720

Signed-off-by: tone.zhang tone.zhang@arm.com
Signed-off-by: Steve Capper steve.capper@arm.com

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@@ -590,7 +602,7 @@ int NVMEManager::try_get(const spdk_nvme_transport_id& trid, SharedDriverData **
spdk_nvme_retry_count = SPDK_NVME_DEFAULT_RETRY_COUNT;

std::unique_lock l(probe_queue_lock);
while (true) {
while (should_probe {
Contributor

this does not compile.

Contributor Author

Thanks. Will update soon.

@tone-zhang
Contributor Author

@tchaikov could you please review the patch? Thanks a lot! :)

@@ -486,6 +487,17 @@ class NVMEManager {

public:
NVMEManager() {}
~NVMEManager() {
if (!init)
Contributor

we still need to wake up the try_get() waiters, so i think we should

  1. remove the member variable of init, and instead check if the dpdk_thread is joinable, before executing dpdk_thread, and join it in the dtor only if it's joinable.
  2. set all pending ctx->done to true, before notifying all try_get() waiters in dpdk_thread.
  3. might want to rename should_probe to stopping.

Contributor Author

Thanks Kefu.
For point 2, I will promote ctx to an NVMEManager member variable so the destructor can access it.

I will update the change soon. Thanks a lot!

Contributor

i don't follow you. ctxs are elements of probe_queue. so i think we already have access to them? also, i'd recommend set them in dpdk_thread before it exits. it's more readable this way, as we notify the waiter in a single place.

Contributor Author

Got it. I meant another ctx. I will update the code.

Thanks a lot! :)

@tone-zhang
Contributor Author

@tchaikov Kefu, could you please review the update? Thanks a lot!

@@ -477,7 +477,7 @@ class NVMEManager {

private:
ceph::mutex lock = ceph::make_mutex("NVMEManager::lock");
bool init = false;
std::atomic<bool> stopping = true;
Contributor

i think stopping should be false when the NVMEManager starts?

Contributor Author

It is more reasonable. Thanks. Will update the following code.

return;
{
std::lock_guard guard(probe_queue_lock);
stopping = false;
Contributor

Suggested change
stopping = false;
stopping = true;

@@ -590,7 +599,7 @@ int NVMEManager::try_get(const spdk_nvme_transport_id& trid, SharedDriverData **
spdk_nvme_retry_count = SPDK_NVME_DEFAULT_RETRY_COUNT;

std::unique_lock l(probe_queue_lock);
while (true) {
while (stopping) {
Contributor

Suggested change
while (stopping) {
while (!stopping) {

@@ -600,14 +609,15 @@ int NVMEManager::try_get(const spdk_nvme_transport_id& trid, SharedDriverData **
derr << __func__ << " device probe nvme failed" << dendl;
}
ctxt->done = true;
for (auto p : probe_queue)
p->done = true;
Contributor

please don't set all waiting probe contexts to done here. you need to do so after the while (!stopping) block.

Contributor Author

Thanks. Will update the code soon.

@@ -605,9 +614,10 @@ int NVMEManager::try_get(const spdk_nvme_transport_id& trid, SharedDriverData **
probe_queue_cond.wait(l);
}
}
for (auto p : probe_queue)
p->done = true;
Contributor

please add

probe_queue_cond.notify_all();

to notify all waiters.

Contributor Author

@tchaikov Kefu, thanks for your comments.

Contributor

@tchaikov tchaikov left a comment

aside from the atomic nit, lgtm

@@ -477,7 +477,7 @@ class NVMEManager {

private:
ceph::mutex lock = ceph::make_mutex("NVMEManager::lock");
bool init = false;
std::atomic<bool> stopping = false;
Contributor

nit, we don't need to make this an atomic<>. as accesses to this variable are always protected by probe_queue_lock.

Contributor Author

Ack. Will update soon. Thanks.

for (auto p : probe_queue) {
p->done = true;
probe_queue_cond.notify_all();
}
Contributor

@tone-zhang sorry, i missed this. you might need to do the notify_all() out of the for (auto p : probe_queue) loop.

Contributor Author

@tchaikov thanks!

@tchaikov
Contributor

tchaikov commented Jan 7, 2019

retest this please.

@tone-zhang
Contributor Author

@tchaikov Kefu, thanks a lot!

@tchaikov tchaikov changed the title bluestore/NVMEDevice.cc: fix NVMEManager thread halt bluestore/NVMEDevice.cc: fix NVMEManager thread hang Jan 7, 2019
@tchaikov tchaikov merged commit fa24a03 into ceph:master Jan 7, 2019