mon: paxos add the timeout function when peon recovery #10359

songbaisen · 2016-07-20T06:25:55Z

mon: paxos add the timeout function when peon recovery

Now the peon only have 10s from init to active.But when the leader

from init to active he may start twice or more recovery collect when

meet a higher pn than himself.It means the leader from init to active

may spend more than 10S. So it is not reasonable for the peon only have

one 10s timeout event.

Signed-off-by: song baisen song.baisen@zte.com.cn

Now the peon only have 10s from init to active.But when the leader from init to active he may start twice or more recovery collect when meet a higher pn than himself.It means the leader from init to active may spend more than 10S.So it is not reasonable for the peon only have one 10s timeout event. Signed-off-by: song baisen <song.baisen@zte.com.cn>

gregsfortytwo · 2016-08-03T05:01:45Z

I don't think this is correct -- peons can't unilaterally grant themselves more time before their leases are stale and might be outdated. We also need to timeout the leader if it gets stuck in some kind of recovery loop, so that we do another election and pick one that's better-behaved.

Did you track this down as a result of some specific behavior, or was it by inspection?

songbaisen · 2016-08-03T09:03:05Z

@gregsfortytwo HI! Thank you for review. I think the here what I change is not conflict with what you said.The main reason I change here is because as what i said above.
All the recovery is as the below produce as below
leader peon
Leader_init Peon_init
OP_COLLECT------->
<-------- OP_LAST
OP_COMMIT------->
The leader when begin to collect it will have the 10S timeout until we get all the peon last message we cancel this timeout.But before that the leader will change if there some higher pns the leader will cancel the before timeout and generate a new time event.Begin to collect to all peons again and have a new 10s timeout. But the peon from init to active only have one timeout event 10s.It is not reasonable.
I think the peon from init---->hand_collect need one timeout to avoid the leader go to down. And from handle_colloct----->active should have another one timeout event.This will reduce the elect because if the leader can not finish the recovery in 10s the peon will start the new election and fast speed the recover progress.And the peon will not lead to outdated because it have timeout for finish the recover.If in the 10s the peon can not from recovery to active, it will generate new election too.

jecluis · 2016-08-26T07:58:06Z

I've seen this behavior being triggered in the past and I agree it's something we should fix.

For instance, if the leader takes some time before sending the collect messages, we may easily incur in a scenario in which the peons only handle the messages after their lease times out, or way before they get their proper lease. There may be way too much time in-between, triggering unnecessary elections. After all, we're still in the recovery state by this time, and there are no reads or writes going through paxos, we can be a bit lenient the lease timeout.

I think this patch solves the issue and will not affect the leases. After all, the patch itself is only resetting the lease timeout, not setting the lease (Paxos::reset_lease_timeout() does not set the lease, that happens only on Paxos::handle_lease()).

Also, there's no chance that if get stuck in a loop, unless we have a misbehaved leader that keeps on sending collects for all eternity. If the leader is somehow bad, we still have a timeout set that will trigger a new election after 10 seconds. The only difference now is that we will push the timeout window 10 seconds into the future by resetting the timeout each time we get a collect message (which makes sense, given not only the leader is resetting its timeout but we're going to do IO during this period, and IO may take time).

This LGTM but some testing would be nice.

jecluis · 2016-08-26T08:05:18Z

Also, forgot to mention that this patch will still maintain the same behavior of timing out after 10 seconds if the leader does not send the collect messages. If the leader is broken somehow, the peons will timeout after 10 seconds have passed between Paxos::peon_init() and waiting for the collect messages. The only thing that's different is that the peons will reset this timeout on each collect they get.

And while on one hand I would love to use 'collect_timeout_event' instead for this, I gather that using the lease timeout is the simpler, albeit not as intuitive, approach. After all, as it has been since time immemorial, lease timeouts only exist in a peon's context, while collect timeouts only exist in the leader's context, so not changing this to collect_timeout_event is not the worst thing ever.

yuriw · 2016-10-20T21:34:48Z

http://pulpito.front.sepia.ceph.com/yuriw-2016-10-20_04:33:22-rados-wip-yuri-testing2_2016_10_19-distro-basic-smithi/
and rerun
http://pulpito.ceph.com/yuriw-2016-10-20_17:45:28-rados-wip-yuri-testing2_2016_10_19-distro-basic-smithi/

gregsfortytwo assigned jecluis Aug 24, 2016

gregsfortytwo added core needs-review labels Aug 24, 2016

jecluis added the needs-qa label Aug 26, 2016

yuriw added the wip-yuri2-testing label Sep 22, 2016

yuriw merged commit d9489d1 into ceph:master Oct 20, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mon: paxos add the timeout function when peon recovery #10359

mon: paxos add the timeout function when peon recovery #10359

songbaisen commented Jul 20, 2016

gregsfortytwo commented Aug 3, 2016

songbaisen commented Aug 3, 2016

jecluis commented Aug 26, 2016

jecluis commented Aug 26, 2016

yuriw commented Oct 20, 2016

mon: paxos add the timeout function when peon recovery #10359

mon: paxos add the timeout function when peon recovery #10359

Conversation

songbaisen commented Jul 20, 2016

gregsfortytwo commented Aug 3, 2016

songbaisen commented Aug 3, 2016

jecluis commented Aug 26, 2016

jecluis commented Aug 26, 2016

yuriw commented Oct 20, 2016