Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshotting and log compaction #32

Merged
merged 4 commits into from Jan 8, 2022
Merged

Snapshotting and log compaction #32

merged 4 commits into from Jan 8, 2022

Conversation

andreev-io
Copy link
Owner

@andreev-io andreev-io commented Dec 16, 2021

This PR implements section 7 of the Raft paper, i.e. Log Compaction.
Now, Replicas will merge logs into snapshots to avoid accumulating too
many logs that hoard memory. Replicas now support InstallSnapshot RPCs,
meaning that a Leader can detect when a Follower is so far behind that
the logs it's missing have already been compacted, and then transmit the
snapshot to the follower instead of transmitting a particular log entry.
Non-leader nodes now know how to treat and respond to InstallSnapshot
RPCs.

The StateMachine trait has been extended to provide Little Raft user
with ability to save and load state to and from permanent storage. The
raft_unstable test has been updated to randomly drop and shuffle
messages, causing Replicas to retransmit snapshots, asserting the
behavior of the new InstallSnapshot RPC.

@andreev-io andreev-io linked an issue Dec 17, 2021 that may be closed by this pull request
@andreev-io andreev-io marked this pull request as draft December 17, 2021 17:21
@andreev-io andreev-io force-pushed the ilya/snapshotting branch 2 times, most recently from 65c8ba0 to 9a2f45e Compare December 18, 2021 12:07
@andreev-io andreev-io marked this pull request as ready for review December 18, 2021 12:09
@andreev-io andreev-io changed the title WIP snapshotting Snapshotting Dec 18, 2021
@andreev-io andreev-io changed the title Snapshotting Snapshotting and log compaction Dec 18, 2021
@andreev-io
Copy link
Owner Author

@penberg This is ready for review. Please let me know your thoughts on the new methods in the StateMachine trait and if you get a chance to test this version of Little Raft. It would be great if you could plug it into ChiselStore and see if you observe any problems.

@andreev-io
Copy link
Owner Author

andreev-io commented Dec 18, 2021

I also opened #35 to add support for joint consensus. I think it should be addressed separately, in a different PR.

@andreev-io
Copy link
Owner Author

Also, I didn't implement the logic to split Snapshots into multiple messages. Thinking about this more, I think this should be handled entirely by the network / Little Raft user, and not by the protocol itself. What do you think @penberg?

&mut self,
last_included_index: usize,
last_included_term: usize,
) -> Snapshot;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ChiselStore, create_snapshot() would make an online backup of the current SQLite database to serve as an immutable snapshot:

https://www.sqlite.org/c3ref/backup_finish.html#sqlite3backupinit

However, Little Raft now stores a Bytes blob in Snapshot, which seems little cumbersome to work with. Perhaps it makes more sense to add a type parameter to Snapshot that allows Little Raft users to do whatever they want to represent the snapshot data?

offset: 0,
data: snapshot.data.clone(),
done: true,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me how to use this API with ChiselStore because Little Raft maintains a single snapshot and that's going to be a whole database (that can keep on growing in size). So whenever we compact the log on the leader, we have to send the whole database over?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, for clarification, log compaction happens on every node independently, not just leaders. Leaders are only special in that they might retransmit a snapshot to followers that are lagging too far behind.

Now to your question: if the whole database is the state, then it sounds like yes, you would have to transmit it. Conceptually, how else can you recover your state?

@penberg
Copy link
Collaborator

penberg commented Dec 22, 2021

@andreev-io As this pull request is already quite large, I think we could go forward with it as-is, and address the issues I pointed out incrementally.

This commit implements section 7 of the Raft paper, i.e. Log Compaction.
Now, Replicas will merge logs into snapshots to avoid accumulating too
many logs that hoard memory. Replicas now support InstallSnapshot RPCs,
meaning that a Leader can detect when a Follower is so far behind that
the logs it's missing have already been compacted, and then transmit the
snapshot to the follower instead of transmitting a particular log entry.
Non-leader nodes now know how to treat and respond to InstallSnapshot
RPCs.

The StateMachine trait has been extended to provide Little Raft user
with ability to save and load state to and from permanent storage. The
raft_unstable test has been updated to randomly drop and shuffle
messages, causing Replicas to retransmit snapshots, asserting the
behavior of the new InstallSnapshot RPC.
Due to GitHub quirks, this action will only run on pull requests NOT
from a forked repository, which renders it mostly useless. Let's simply
remove it.
@andreev-io
Copy link
Owner Author

@andreev-io As this pull request is already quite large, I think we could go forward with it as-is, and address the issues I pointed out incrementally.

Agreed. But I also pushed a quick fixup to make Bytes a type argument. Can you take a look? One thing I don't like about this is that now you have to provide a type even if you are not using snapshotting. Any thoughts on how to make the interfaces better?

@penberg
Copy link
Collaborator

penberg commented Jan 4, 2022

The type argument to Snapshot looks good and should be nicely extendable.

For users who don't need snapshots, maybe Little Raft could provide a "NoSnapshots" default implementation of the trait?

@andreev-io
Copy link
Owner Author

Great suggestion. I wouldn't have thought of that myself. I opened a separate issue for this idea: #39.

Let's merge this PR if you think it's good to go (I'd like to get your approval @penberg).

@andreev-io andreev-io merged commit f6364b0 into master Jan 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement log compaction
2 participants