Skip to content
This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

Don't convert to candidate while entries are being persisted #464

Closed
wants to merge 1 commit into from

Conversation

cole-miller
Copy link
Contributor

Fixes #386, I think. This implements the second strategy from my comment there. I've added a new test and deleted an existing one that relied on the old behavior.

Signed-off-by: Cole Miller cole.miller@canonical.com

Signed-off-by: Cole Miller <cole.miller@canonical.com>
@cole-miller
Copy link
Contributor Author

please test downstream

@MathieuBordere
Copy link
Contributor

I think conceptually it makes sense to wait for the entries to persist before converting to candidate, makes the logic somewhat simpler. Will revisit PR in a bit, to have a second look and let it simmer.

@freeekanayaka
Copy link
Contributor

My gut feeling (and something I have thought before) is that uncommitted configuration changes should probably be applied immediately when they are received via an AppendEntries message, without waiting for them to be persisted. However I didn't think about that much yet.

@cole-miller
Copy link
Contributor Author

My gut feeling (and something I have thought before) is that uncommitted configuration changes should probably be applied immediately when they are received via an AppendEntries message, without waiting for them to be persisted.

Yeah, this makes some sense to me too.

@cole-miller
Copy link
Contributor Author

Marking as draft until I resolve problems from the downstream checks (https://github.com/canonical/raft/actions/runs/5720306674)

@cole-miller cole-miller marked this pull request as draft August 1, 2023 12:52
@cole-miller
Copy link
Contributor Author

Ah, need to address the handling of TimeoutNow as well.

@freeekanayaka
Copy link
Contributor

My gut feeling (and something I have thought before) is that uncommitted configuration changes should probably be applied immediately when they are received via an AppendEntries message, without waiting for them to be persisted.

Yeah, this makes some sense to me too.

The rational would be that persistency only counts for commitment, not for pending configuration changes. A configuration should be applied immediately, and it's safe to do so because only one pending configuration change is allowed at a time.

@cole-miller
Copy link
Contributor Author

@freeekanayaka I tried to go in and implement the alternative approach here that applies new configurations eagerly, but there's an issue with the handling of the commit index. We only update the commit index in appendFollowerCb (after the new entries have been successfully persisted), but that interacts poorly with updating our configuration in replicationAppend: if we're sent a batch of entries containing several configs C_1, C_2, ..., C_n, where all of these except C_n are committed, we don't want to just apply C_1 through C_n without also updating our commit index to point between C_{n - 1} and C_n, so that rollbacks will work correctly. Moving the commit index update earlier seems like a can of worms because it implicates the handling of non-config entries as well. Since I don't see a straightforward way out of this, I'm somewhat inclined to stick with the original approach, since it addresses a known bug. But maybe you can see how to avoid this problem?

@cole-miller
Copy link
Contributor Author

I guess it would be possible to decouple the last committed config/uncommitted config from the actual commit index, and update only the former in replicationAppend, but seems like we might be relying on the relationship between these in other places.

@freeekanayaka
Copy link
Contributor

I guess it would be possible to decouple the last committed config/uncommitted config from the actual commit index, and update only the former in replicationAppend, but seems like we might be relying on the relationship between these in other places.

This one feels like a workaround.

@freeekanayaka
Copy link
Contributor

@freeekanayaka I tried to go in and implement the alternative approach here that applies new configurations eagerly, but there's an issue with the handling of the commit index. We only update the commit index in appendFollowerCb (after the new entries have been successfully persisted), but that interacts poorly with updating our configuration in replicationAppend: if we're sent a batch of entries containing several configs C_1, C_2, ..., C_n, where all of these except C_n are committed, we don't want to just apply C_1 through C_n without also updating our commit index to point between C_{n - 1} and C_n, so that rollbacks will work correctly. Moving the commit index update earlier seems like a can of worms because it implicates the handling of non-config entries as well. Since I don't see a straightforward way out of this, I'm somewhat inclined to stick with the original approach, since it addresses a known bug. But maybe you can see how to avoid this problem?

I think that indeed we should update the commit index as soon as we see it, i.e. when receiving the AppendEntries message. That's because the core struct raft engine should be always up-to-date, even if I/O is lagging because of it being asynchronous. That simplifies reasoning in a lot of cases.

I actually have done work in a branch to have the commit index updated immediately, although that's part of a broader change I'm exploring (for v1). If there's interest I might try to extract that particular work around the commit index and push a PR.

More narrow alternatives that just fix the problem at hand in some way, are fine too for now. In the long term I'm generally interested in solutions that reduce the complexity of the code and the simplify reasoning around it.

@cole-miller
Copy link
Contributor Author

cole-miller commented Aug 2, 2023

More narrow alternatives that just fix the problem at hand in some way, are fine too for now. In the long term I'm generally interested in solutions that reduce the complexity of the code and the simplify reasoning around it.

I agree with this goal! And yeah, if the early commit index update can be made to work, I would support doing it that way, especially because it seems that (some of) the dqlite tests rely on followers being able to become candidates while entries are being persisted.

In the meantime, perhaps having candidates "drop out" if they apply a config change during their own election is a better stopgap fix for #386. No, I don't think this is the way to go.

@cole-miller
Copy link
Contributor Author

I might try to put together a v2 of this PR that fixes the commit index handling instead, is that okay with you @freeekanayaka?

@freeekanayaka
Copy link
Contributor

I might try to put together a v2 of this PR that fixes the commit index handling instead, is that okay with you @freeekanayaka?

Sure. FWIW freeekanayaka/raft@40c1c6e this is the commit I had. It is in the context of a broader change, so I'm not sure how cleanly it applies or if there are other aspects to take into account, but it might be a starting point if you wish to head into that direction.

@MathieuBordere
Copy link
Contributor

MathieuBordere commented Aug 3, 2023

I would probably just do what's in the paper

If leaderCommit > commitIndex, set commitIndex = min(leaderCommit, index of last new entry)

[Source: fig. 2 AppendEntries RPC]

I agree that the commitIndex is independent of the fact that the new log entries were persisted or not on the current node.

@freeekanayaka
Copy link
Contributor

I would probably just do what's in the paper

If leaderCommit > commitIndex, set commitIndex = min(leaderCommit, index of last new entry)

Yes, not sure if there are details that currently prevent that. If the current conditions are relaxed, there should be tests failing, indicating why those additional conditions have been added in the first place. With that information, it might be doable to simplify the check to match the one in the paper.

@cole-miller
Copy link
Contributor Author

Closing in favor of #465

@cole-miller cole-miller closed this Aug 4, 2023
@freeekanayaka
Copy link
Contributor

I think conceptually it makes sense to wait for the entries to persist before converting to candidate, makes the logic somewhat simpler. Will revisit PR in a bit, to have a second look and let it simmer.

I had some second thoughts about #465, and perhaps we should actually re-consider this option, and reason more broadly about this class of issues.

I don't have time right now to explore / articulate the arguments, but perhaps #465 should not be merged just yet.

I'll try to follow-up later today or tomorrow.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

recvAppendEntries: Assertion r->state == RAFT_FOLLOWER || r->state == RAFT_CANDIDATE failed
3 participants