Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mon: paxos add the timeout function when peon recovery #10359

Merged
merged 1 commit into from
Oct 20, 2016

Conversation

songbaisen
Copy link

mon: paxos add the timeout function when peon recovery

Now the peon only have 10s from init to active.But when the leader

from init to active he may start twice or more recovery collect when

meet a higher pn than himself.It means the leader from init to active

may spend more than 10S. So it is not reasonable for the peon only have

one 10s timeout event.

Signed-off-by: song baisen song.baisen@zte.com.cn

 Now the peon only have 10s from init to active.But when the leader

 from init to active he may start twice or more recovery collect when

 meet a higher pn than himself.It means the leader from init to active

 may spend more than 10S.So it is not reasonable for the peon only have

 one 10s timeout event.

Signed-off-by: song baisen <song.baisen@zte.com.cn>
@gregsfortytwo
Copy link
Member

I don't think this is correct -- peons can't unilaterally grant themselves more time before their leases are stale and might be outdated. We also need to timeout the leader if it gets stuck in some kind of recovery loop, so that we do another election and pick one that's better-behaved.

Did you track this down as a result of some specific behavior, or was it by inspection?

@songbaisen
Copy link
Author

@gregsfortytwo HI! Thank you for review. I think the here what I change is not conflict with what you said.The main reason I change here is because as what i said above.
All the recovery is as the below produce as below
leader peon
Leader_init Peon_init
OP_COLLECT------->
<-------- OP_LAST
OP_COMMIT------->
The leader when begin to collect it will have the 10S timeout until we get all the peon last message we cancel this timeout.But before that the leader will change if there some higher pns the leader will cancel the before timeout and generate a new time event.Begin to collect to all peons again and have a new 10s timeout. But the peon from init to active only have one timeout event 10s.It is not reasonable.
I think the peon from init---->hand_collect need one timeout to avoid the leader go to down. And from handle_colloct----->active should have another one timeout event.This will reduce the elect because if the leader can not finish the recovery in 10s the peon will start the new election and fast speed the recover progress.And the peon will not lead to outdated because it have timeout for finish the recover.If in the 10s the peon can not from recovery to active, it will generate new election too.

@jecluis
Copy link
Member

jecluis commented Aug 26, 2016

I've seen this behavior being triggered in the past and I agree it's something we should fix.

For instance, if the leader takes some time before sending the collect messages, we may easily incur in a scenario in which the peons only handle the messages after their lease times out, or way before they get their proper lease. There may be way too much time in-between, triggering unnecessary elections. After all, we're still in the recovery state by this time, and there are no reads or writes going through paxos, we can be a bit lenient the lease timeout.

I think this patch solves the issue and will not affect the leases. After all, the patch itself is only resetting the lease timeout, not setting the lease (Paxos::reset_lease_timeout() does not set the lease, that happens only on Paxos::handle_lease()).

Also, there's no chance that if get stuck in a loop, unless we have a misbehaved leader that keeps on sending collects for all eternity. If the leader is somehow bad, we still have a timeout set that will trigger a new election after 10 seconds. The only difference now is that we will push the timeout window 10 seconds into the future by resetting the timeout each time we get a collect message (which makes sense, given not only the leader is resetting its timeout but we're going to do IO during this period, and IO may take time).

This LGTM but some testing would be nice.

@jecluis
Copy link
Member

jecluis commented Aug 26, 2016

Also, forgot to mention that this patch will still maintain the same behavior of timing out after 10 seconds if the leader does not send the collect messages. If the leader is broken somehow, the peons will timeout after 10 seconds have passed between Paxos::peon_init() and waiting for the collect messages. The only thing that's different is that the peons will reset this timeout on each collect they get.

And while on one hand I would love to use 'collect_timeout_event' instead for this, I gather that using the lease timeout is the simpler, albeit not as intuitive, approach. After all, as it has been since time immemorial, lease timeouts only exist in a peon's context, while collect timeouts only exist in the leader's context, so not changing this to collect_timeout_event is not the worst thing ever.

@yuriw yuriw merged commit d9489d1 into ceph:master Oct 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants