Don't convert to candidate while entries are being persisted #464
Conversation
Signed-off-by: Cole Miller <cole.miller@canonical.com>
please test downstream
I think conceptually it makes sense to wait for the entries to persist before converting to candidate; it makes the logic somewhat simpler. I'll revisit the PR in a bit, to have a second look and let it simmer.
My gut feeling (and something I have thought before) is that uncommitted configuration changes should probably be applied immediately when they are received via an AppendEntries message, without waiting for them to be persisted. However, I haven't thought about that much yet.
Yeah, this makes some sense to me too.
Marking as draft until I resolve problems from the downstream checks (https://github.com/canonical/raft/actions/runs/5720306674)
Ah, need to address the handling of TimeoutNow as well. |
The rationale would be that persistence only counts for commitment, not for pending configuration changes. A configuration should be applied immediately, and it's safe to do so because only one pending configuration change is allowed at a time.
@freeekanayaka I tried to go in and implement the alternative approach here that applies new configurations eagerly, but there's an issue with the handling of the commit index. We only update the commit index in appendFollowerCb (after the new entries have been successfully persisted), and that interacts poorly with updating our configuration in replicationAppend: if we're sent a batch of entries containing several configs C_1, C_2, ..., C_n, where all of these except C_n are committed, we can't just apply C_1 through C_n without also updating our commit index to point between C_{n - 1} and C_n; otherwise rollbacks won't work correctly. Moving the commit index update earlier seems like a can of worms because it implicates the handling of non-config entries as well. Since I don't see a straightforward way out of this, I'm somewhat inclined to stick with the original approach, since it addresses a known bug. But maybe you can see how to avoid this problem?
I guess it would be possible to decouple the last committed config/uncommitted config from the actual commit index, and update only the former in replicationAppend, but it seems like we might be relying on the relationship between these in other places.
This one feels like a workaround.
I think that indeed we should update the commit index as soon as we see it, i.e. when receiving the AppendEntries message: the commit index comes from the leader and doesn't depend on whether the new entries have been persisted locally. I actually have done work in a branch to have the commit index updated immediately, although that's part of a broader change I'm exploring (for v1). If there's interest I might try to extract that particular work around the commit index and push a PR. Narrower alternatives that just fix the problem at hand in some way are fine too for now. In the long term I'm generally interested in solutions that reduce the complexity of the code and simplify reasoning around it.
I agree with this goal! And yeah, if the early commit index update can be made to work, I would support doing it that way, especially because it seems that (some of) the dqlite tests rely on followers being able to become candidates while entries are being persisted.
I might try to put together a v2 of this PR that fixes the commit index handling instead, is that okay with you @freeekanayaka? |
Sure. FWIW, freeekanayaka/raft@40c1c6e is the commit I had. It is in the context of a broader change, so I'm not sure how cleanly it applies or if there are other aspects to take into account, but it might be a starting point if you wish to head in that direction.
I would probably just do what's in the paper
[Source: fig. 2 AppendEntries RPC] I agree that the commitIndex is independent of whether the new log entries have been persisted on the current node.
Yes, not sure if there are details that currently prevent that. If the current conditions are relaxed, some tests should fail, indicating why those additional conditions were added in the first place. With that information, it might be doable to simplify the check to match the one in the paper.
Closing in favor of #465 |
I had some second thoughts about #465, and perhaps we should actually reconsider this option, and reason more broadly about this class of issues. I don't have time right now to explore / articulate the arguments, but perhaps #465 should not be merged just yet. I'll try to follow up later today or tomorrow.
Fixes #386, I think. This implements the second strategy from my comment there. I've added a new test and deleted an existing one that relied on the old behavior.
Signed-off-by: Cole Miller <cole.miller@canonical.com>