New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
msg: mark daemons down on RST + ECONNREFUSED #8558
Conversation
This is more of PoC that doesn't take Async messenger into consideration (yet). Comments are welcome. |
ldout(msgr->cct, 10) << "connection refused!" << dendl; | ||
if (connection_state->clear_pipe(this)) | ||
msgr->dispatch_queue.queue_refused(connection_state.get()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done a careful review, but the idea of this patch looks good.
This part isn't really safe though. We can get errors which don't directly call fault() or meander a bit before ending up here. We'll want some kind of shutdown error flag we can set at the failure site and check here or elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Tue, Apr 12, 2016 at 07:35:43AM -0700, Gregory Farnum wrote:
I haven't done a careful review, but the idea of this patch looks good.
This part isn't really safe though. We can get errors which don't directly call fault() or meander a bit before ending up here. We'll want some kind of shutdown error flag we can set at the failure site and check here or elsewhere.
I'll look into this, but I'm quite sure that ECONNREFUSED here would only
occur after attempt to connect() to port that isn't listening.
looks a interesting idea, refuse is a good semantic to decide behavior |
d49e353
to
35573de
Compare
Rebased and moved connection error checking to right after connect, that fixes some test failures. Thanks @gregsfortytwo! |
osdmap->get_inst(id), | ||
cct->_conf->osd_heartbeat_grace + 1, | ||
osdmap->get_epoch() | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's update teh message properly.. something like
diff --git a/src/messages/MOSDFailure.h b/src/messages/MOSDFailure.h index a1032e6..0ff1342 100644 --- a/src/messages/MOSDFailure.h +++ b/src/messages/MOSDFailure.h @@ -24,22 +24,34 @@ class MOSDFailure : public PaxosServiceMessage { static const int HEAD_VERSION = 3; public: + enum { + FLAG_FAILED = 1, // if set, failure; if not, recovery + FLAG_IMMEDIATE = 2, // known failure, not a timeout + }; + uuid_d fsid; entity_inst_t target_osd; - __u8 is_failed; + __u8 flags; epoch_t epoch; int32_t failed_for; // known to be failed since at least this long MOSDFailure() : PaxosServiceMessage(MSG_OSD_FAILURE, 0, HEAD_VERSION) { } MOSDFailure(const uuid_d &fs, const entity_inst_t& f, int duration, epoch_t e) : PaxosServiceMessage(MSG_OSD_FAILURE, e, HEAD_VERSION), - fsid(fs), target_osd(f), is_failed(true), epoch(e), failed_for(duration) { } + fsid(fs), target_osd(f), + flags(FLAG_FAILED), + epoch(e), failed_for(duration) { } private: ~MOSDFailure() {} public: entity_inst_t get_target() { return target_osd; } - bool if_osd_failed() { return is_failed; } + bool if_osd_failed() { + return flags & FLAG_FAILED; + } + bool is_immediate() { + return flags & FLAG_IMMEDIATE; + } epoch_t get_epoch() { return epoch; } void decode_payload() { @@ -49,9 +61,9 @@ public: ::decode(target_osd, p); ::decode(epoch, p); if (header.version >= 2) - ::decode(is_failed, p); + ::decode(flags, p); else - is_failed = true; + flags = FLAG_FAILED; if (header.version >= 3) ::decode(failed_for, p); else @@ -62,14 +74,15 @@ public: ::encode(fsid, payload); ::encode(target_osd, payload); ::encode(epoch, payload); - ::encode(is_failed, payload); + ::encode(flags, payload); ::encode(failed_for, payload); } const char *get_type_name() const { return "osd_failure"; } void print(ostream& out) const { out << "osd_failure(" - << (is_failed ? "failed " : "recovered ") + << (if_osd_failed() ? "failed " : "recovered ") + << (is_immediate() ? "immediate " : "timeout ") << target_osd << " for " << failed_for << "sec e" << epoch << " v" << version << ")"; }
then we can use this for other stuff too, like a watchdog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Fri, Apr 22, 2016 at 09:28:13AM -0700, Sage Weil wrote:
Let's update teh message properly.. something like
[..]
public:
- enum {
- FLAG_FAILED = 1, // if set, failure; if not, recovery
- FLAG_IMMEDIATE = 2, // known failure, not a timeout
- };
- [..]
then we can use this for other stuff too, like a watchdog.
And introduce other daemon failure modes, too.
Piotr Dałek
Looking at this again, I think this approach makes a lot of sense, and should be safe, too. Worst case, we mark an osd down quickly instead of slowly, which is not much of a problem. Can you break this apart into separate patches? e.g.,
|
b2765f4
to
d699678
Compare
@liewegas I have broken it into a bunch of separate commits, added new failure flag to MOSDFailure class and new option to disable new behaivior. |
back from ML, we can continue review this? |
<< ", " << cpp_strerror(rc) << dendl; | ||
<< ", " << cpp_strerror(stored_errno) << dendl; | ||
if (stored_errno == ECONNREFUSED) { | ||
ldout(msgr->cct, 10) << "connection refused!" << dendl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to set level to 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go with 1 or 2
I like this better!
|
if you could add async msgr support, it would be good since it should be only two line code |
f25f438
to
470cf9c
Compare
@yuyuyu101 Done, please take a look. |
@branch-predictor I think we set peer osd id in Session instead of in Connection, it would be make more sense. |
Another question, do we really need immediate flag? If connection refused, we only send a normal MOSDFailure message. If osd down, just like heartbeat, multi osds will send MOSDFailure in handle_refuse function. So it won't occur too much latency? Because I concern the immediate flag will cause osd flipping in unknown env..... Like message delay? |
On Wed, May 25, 2016 at 03:09:05AM -0700, Haomai Wang wrote:
We set FLAG_IMMEDIATE only in ms_handle_refused as a signal that we're Piotr Dałek |
412debc
to
4873351
Compare
I felt a little nervous to pass osd id via caller code. what if use OSDMap::identify_osd(const entity_addr_t& addr) to get osd id in ms_handle_refused()? @liewegas |
hmm, I think we don't have a way let server side connection get peer id in messenger level. so I think identitfy_osd via entity_addr_t is a reasonable way |
@yuyuyu101 Yeah, I was a bit nervous about that too. That means we would drop 6841eff. |
086cd67
to
c8a9ce3
Compare
identify_osd() alone is not enough as it doesn't take hb addrs into account. I just added identify_osd_on_all_channels() to work around it, not influencing any existing users. |
@branch-predictor I am trying to rebuild with this PR and got conflicts: [yuriw@smithi018 build]$ git pull https://github.com/branch-predictor/ceph.git bp-mark-down-on-perm-rst
pls fix and assign "needs-qa" tag |
Added new callback (ms_handle_refused) to dispatchers. It is called once connection attempt fails with ECONNREFUSED. Also added dummy ms_handle_refused handlers across codebase. Signed-off-by: Piotr Dałek <git@predictor.org.pl>
Added implementation of ms_handle_refused in OSD code, so it sends MOSDFailure message in case the peer connection fails with ECONNREFUSED *and* it is known to be up and new option "osd fast fail on connection refused" which enables or disables new behavior. Signed-off-by: Piotr Dałek <git@predictor.org.pl>
…lure Change "is_failed" field to "flags" and use it to distinguish between timeout and immediate, known OSD failure. Then use that in OSD and MON, and make sure "min_reporters" don't affect known failures by actually going around failure heuristic code. Signed-off-by: Piotr Dałek <git@predictor.org.pl>
This commit adds code that detects ECONNREFUSED and dispatches appropriate event further in Async messenger. Signed-off-by: Piotr Dałek <git@predictor.org.pl>
Test checks both async and simple messenger and also checks whether disabling "osd fast fail on connection refused" option restores old behavior. Signed-off-by: Piotr Dałek <git@predictor.org.pl>
Doesn't seem to be necessary anymore. Signed-off-by: Piotr Dałek <git@predictor.org.pl>
c8a9ce3
to
675a8de
Compare
@branch-predictor thx |
When a daemon goes down (because it is killed or suicided because of assert)
on otherwise healthy machine, TCP stack will take care of its connections
and send RST (connection reset) packets to all daemons that were connected
to downed daemon. OSDs already handle that and attempt to reconnect, with
grace timer starting to count. When the grace timer runs out and daemons
still cannot reconnect, OSD in question is marked down.
This changeset adds additional handler (handle_refused()) to the dispatchers
and code that detects when connection attempt fails with ECONNREFUSED error
(connection refused) which is a clear indication that host is alive, but
daemon isn't, so daemons can instantly mark the other side as undoubtly
downed without the need for grace timer.
This changeset also adds more info to connections so figuring out which OSD
actually failed is a bit easier.
In current state, only OSDs take advantage of handle_refused() facility,
but I don't see why other daemons shouldn't.
Signed-off-by: Piotr Dałek git@predictor.org.pl