Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: bail from _committed_osd_maps inside osd_lock #15710

Merged
merged 1 commit into from Jun 19, 2017

Conversation

Projects
None yet
4 participants
@liewegas
Copy link
Member

liewegas commented Jun 15, 2017

thread A:

  • _committed_osd_maps starts
  • checks is_stopping(), false
    thread B:
  • calls shutdown()
  • takes osd_lock
  • drains/clear peering_wq, etc.
    thread A:
  • finally gets osd_lock
  • queues new peering_wq events

Eventually the dtor on the peering wq/tp asserts out.

Fix by moving the is_stopping() check inside of osd_lock
so that it is ordered wrt the shutdown() call. This
matches (almost) every other site checking is_stopping()
directly after taking osd_lock or heartbeat_lock.

Fixes: http://tracker.ceph.com/issues/20273
Signed-off-by: Sage Weil sage@redhat.com

@xiexingguo

This comment has been minimized.

Copy link
Member

xiexingguo commented Jun 16, 2017

IIRC #8267 already fixed this.

Why are we here, again?

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Jun 16, 2017

@yuyuyu101

This comment has been minimized.

Copy link
Member

yuyuyu101 commented Jun 16, 2017

@liewegas yes, we need to check before and after osd_lock

osd: bail from _committed_osd_maps inside osd_lock
thread A:
 - _committed_osd_maps starts
 - checks is_stopping(), false
thread B:
 - calls shutdown()
 - takes osd_lock
 - drains/clear peering_wq, etc.
thread A:
 - finally gets osd_lock
 - queues new peering_wq events

Eventually the dtor on the peering wq/tp asserts out.

We still need to check before taking the lock to resolve a deadlock
with msgr shutdown; see aa8f2f1.

Fix by check both outside and inside of osd_lock
so that it is ordered wrt the shutdown() call.  This
matches (almost) every other site checking is_stopping()
directly after taking osd_lock or heartbeat_lock.

Fixes: http://tracker.ceph.com/issues/20273
Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas force-pushed the liewegas:wip-20273 branch from a148401 to 0267110 Jun 16, 2017

@liewegas

This comment has been minimized.

Copy link
Member Author

liewegas commented Jun 17, 2017

retest this please

@liewegas liewegas merged commit 132c626 into ceph:master Jun 19, 2017

4 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
arm64 make check arm64 make check succeeded
Details
make check make check succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.