Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd/PG: discard msgs from down peers #17217

Merged
merged 1 commit into from Aug 30, 2017
Merged

Conversation

tchaikov
Copy link
Contributor

if a repop is replied after a replica goes down in a new osdmap, and
before the pg advances to this new osdmap, the repop replies before this
repop can be discarded by that replica OSD, because the primary resets the
connection to it when handling the new osdmap marking it down, and also
resets the messenger sesssion when the replica reconnects. to avoid the
out-of-order replies, the messages from that replica should be discarded.

Fixes: http://tracker.ceph.com/issues/19605
Signed-off-by: Kefu Chai kchai@redhat.com

// repop can be discarded by that replica OSD, because the primary resets the
// connection to it when handling the new osdmap marking it down, and also
// resets the messenger sesssion when the replica reconnects. to avoid the
// out-of-order replies, the messages from that replica should be discarded.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain where the out-of-order reply comes from? AFAICS it should be fine to pretend we are still living in the past since the PG hasn't advanced yet. If the peer reconnects, it will get rejected... and if the peer restarts, any message it sends will be marked with a later epoch and we won't process it until we advance maps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liewegas could you check the comments in http://tracker.ceph.com/issues/19605 ? i put some analysis out there.

If the peer reconnects, it will get rejected...

no, it does not get rejected.

src/osd/PG.cc Outdated
// connection to it when handling the new osdmap marking it down, and also
// resets the messenger sesssion when the replica reconnects. to avoid the
// out-of-order replies, the messages from that replica should be discarded.
if (osd->get_osdmap()->get_down_at(from) >= m->map_epoch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This asserts exists(osd)... which might not be true. We need to check for that, too!

@liewegas
Copy link
Member

Okay, this seems like a decent workaround. The real problem is in the messenger/osd combination, though. The RESETSESSION in the context of osds is just.. flawed. We really want it to permanently poison the peer session so that it can't reconnect at all, but the reset isn't sufficient to do that (and it's mostly unused for osd<->osd communication already).

I think the solution involves elevating the session management out of Messenger and into OSD so that the old and new connection can be disambiguated more clearly, but that's a big project. For now, this will do.

Copy link
Member

@liewegas liewegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to check exists() before calling get_down_at()

if a repop is replied after a replica goes down in a new osdmap, and
before the pg advances to this new osdmap, the repop replies before this
repop can be discarded by that replica OSD, because the primary resets the
connection to it when handling the new osdmap marking it down, and also
resets the messenger sesssion when the replica reconnects. to avoid the
out-of-order replies, the messages from that replica should be discarded.

Fixes: http://tracker.ceph.com/issues/19605
Signed-off-by: Kefu Chai <kchai@redhat.com>
@tchaikov
Copy link
Contributor Author

@liewegas fixed and repushed.

@liewegas liewegas merged commit 328ae40 into ceph:master Aug 30, 2017
@tchaikov tchaikov deleted the wip-19605 branch August 31, 2017 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants