New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mimic: osd: race condition opening heartbeat connection #25026

Merged
merged 2 commits into from Nov 26, 2018

Conversation

@smithfarm
Copy link
Contributor

smithfarm commented Nov 10, 2018

@smithfarm smithfarm self-assigned this Nov 10, 2018

@smithfarm smithfarm added this to the mimic milestone Nov 10, 2018

@smithfarm smithfarm requested review from liewegas , neha-ojha and gregsfortytwo Nov 10, 2018

@@ -4756,9 +4756,9 @@ void OSD::heartbeat()

bool OSD::heartbeat_reset(Connection *con)
{
std::lock_guard l(heartbeat_lock);

This comment has been minimized.

@smithfarm

smithfarm Nov 10, 2018

Contributor

According to https://en.cppreference.com/w/cpp/thread/lock_guard std::lock_guard was introduced in C++17?

I guess this won't work in mimic, and definitely not in luminous.

This comment has been minimized.

@liewegas

liewegas Nov 10, 2018

Member

use Mutex::Locker instead

osd: take heartbeat_lock before checking for session
When we open a connection, there is a short window before we attach
the session.  If a fault happens quickly, we won't get the reset, and
will persistently fail to send osd pings.

Move the lock up to avoid this.  Note that we should rarely really see
connections without sessions here anyway (except when this specific
race happens), so this should have no negative impact (by taking the lock
when we weren't before).

Fixes: http://tracker.ceph.com/issues/36602
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 51d8e24)

Conflicts:
    - src/osd/OSD.cc
- use Mutex::Locker instead of std::lock_guard (latter is a C++17ism)

@smithfarm smithfarm force-pushed the smithfarm:wip-36637-mimic branch from 2c276ed to 8bb61f7 Nov 10, 2018

Show resolved Hide resolved src/osd/OSD.cc Outdated
osd: fix heartbeat_reset unlock
Fixes 51d8e24, which moved to lock_guard
but didn't remove the unlock call on this exit path.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 1a0e2f7)
@smithfarm

This comment has been minimized.

Copy link
Contributor

smithfarm commented Nov 10, 2018

@gregsfortytwo Fixed - please take a look. And thanks!

@gregsfortytwo
Copy link
Member

gregsfortytwo left a comment

Looks good now. :)

@yuriw

This comment has been minimized.

Copy link
Contributor

yuriw commented Nov 16, 2018

@yuriw yuriw merged commit 403f8cc into ceph:mimic Nov 26, 2018

4 checks passed

Docs: build check OK - docs built
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details

@smithfarm smithfarm deleted the smithfarm:wip-36637-mimic branch Nov 26, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment