-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tests for snapshotting and compaction #4531
Conversation
4b213f0
to
74c7404
Compare
74c7404
to
f13895f
Compare
@deepthidevaki @npepinpe someone has time to review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good from my side, but there's a bunch of new things I don't know, do you think we could take 5 minutes to run through it together? 😅
atomix/cluster/src/test/java/io/atomix/raft/RaftFailOverTest.java
Outdated
Show resolved
Hide resolved
of course! when? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
shouldDoSnapshot
and shouldCompactLogOnSnapshot
both seems to be testing RaftRule instead of testing the real implementation, unless I misunderstood something. Both can be removed.
atomix/cluster/src/test/java/io/atomix/raft/RaftFailOverTest.java
Outdated
Show resolved
Hide resolved
Hey guys I incorporated your review hints and added additional tests for data loss and also to verify that entries after snapshots are only replicated. @deepthidevaki would be cool if you could take an additional look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
The bug that we had before was if the leader replicated the snapshot to the follower once, it won't replicate it again if the follower restarts with data loss. The test we had before was covering this scenario. It would be good to test it so that we don't introduce that bug again. The previous test was as follows:
- Ensure snapshot is replicated by the leader to the follower :
Append some entries and take snapshot
Follower restart with data loss
Assert follower received snapshot - Verify snapshot is replicated again :
The same follower restart with data loss
Assert follower received snapshot
Hey @deepthidevaki thanks I wanted to do that with this #4468 |
Ok. I'm losing track of all refactoring issues 😄 |
No problem 😄 thanks for the review 👍 |
bors r+ |
4531: Add tests for snapshotting and compaction r=Zelldon a=Zelldon ## Description Improved further the `RaftRule` such that we are able to trigger snapshotting, which will create snapshots on all nodes. After the snapshot is taken the snapshot listeners are called, which trigger compaction. This setup reflects now how our system works, but it is much easier to test I think. Normally we do a snapshot on the Leader and replicate that. The snapshot replication call at the end also `SnapshotStore#newSnapshot`. With the current test approach we were able to simplify that approach and test the same behavior. I added a test for: * normal snapshot taking and verify that all nodes have this snapshot * snapshotting triggers compaction verify log is compacted * snapshot is send on rejoin cluster I plan to add further tests, more complex ones probably. I checked #4467 and the problem was that the node was not able to truncate his log because it was at zero index and the leader was not able to send a snapshot, because it had none. The test was wrongly written I would say. <!-- Please explain the changes you made here. --> ## Related issues <!-- Which issues are closed by this PR or are related --> closes #4467 # 4582: chore(deps): update version.sbe to v1.18.1 r=npepinpe a=renovate[bot] This PR contains the following updates: | Package | Update | Change | |---|---|---| | [uk.co.real-logic:sbe-tool](https://togithub.com/real-logic/simple-binary-encoding) | minor | `1.17.0` -> `1.18.1` | | [uk.co.real-logic:sbe-all](https://togithub.com/real-logic/simple-binary-encoding) | minor | `1.17.0` -> `1.18.1` | --- ### Release Notes <details> <summary>real-logic/simple-binary-encoding</summary> ### [`v1.18.1`](https://togithub.com/real-logic/simple-binary-encoding/releases/1.18.1) [Compare Source](https://togithub.com/real-logic/simple-binary-encoding/compare/1.18.0...1.18.1) - Fix case of importing buffers when var data is used in nested groups for Java codecs. Java binaries can be found [here...](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22uk.co.real-logic%22%20sbe) ### [`v1.18.0`](https://togithub.com/real-logic/simple-binary-encoding/releases/1.18.0) [Compare Source](https://togithub.com/real-logic/simple-binary-encoding/compare/1.17.0...1.18.0) - Only generate imports for Java codecs when required to address warnings. - Access fixed length arrays as Spans in C# codecs. [PR #​780](https://togithub.com/real-logic/simple-binary-encoding/pull/780). - Add `SbeSchemaId` and `SbeSchemaVersion` as constants in fixed flyweights for C# codecs. - Generate source docs from schema descriptions for C# codecs. [PR #​778](https://togithub.com/real-logic/simple-binary-encoding/pull/778). - Add offset and wrap methods to C# codecs for ease of use. [PR #​777](https://togithub.com/real-logic/simple-binary-encoding/pull/777). - Support non-standard message headers form the C++ codecs. [PR #​775](https://togithub.com/real-logic/simple-binary-encoding/pull/775). - Fix version support for enums in C codecs. [Issue #​773](https://togithub.com/real-logic/simple-binary-encoding/issues/773). - Improve formatting of generated C codecs. - Require a strict dependency on Agrona. - Upgrade to Agrona 1.5.0. - Upgrade to javadoc-links 5.1.0. - Upgrade to JUnit 5.6.2. - Upgrade to Gradle 6.4.1. Java binaries can be found [here...](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22uk.co.real-logic%22%20sbe) </details> --- ### Renovate configuration :date: **Schedule**: At any time (no schedule defined). :vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied. :recycle: **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. :no_bell: **Ignore**: Close this PR and you won't be reminded about these updates again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#zeebe-io/zeebe). Co-authored-by: Christopher Zell <zelldon91@googlemail.com> Co-authored-by: Renovate Bot <bot@renovateapp.com>
Build failed (retrying...) |
Build succeeded |
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Description
Improved further the
RaftRule
such that we are able to trigger snapshotting, which will create snapshots on all nodes. After the snapshot is taken the snapshot listeners are called, which trigger compaction. This setup reflects now how our system works, but it is much easier to test I think.Normally we do a snapshot on the Leader and replicate that. The snapshot replication call at the end also
SnapshotStore#newSnapshot
. With the current test approach we were able to simplify that approach and test the same behavior.I added a test for:
I plan to add further tests, more complex ones probably.
I checked #4467 and the problem was that the node was not able to truncate his log because it was at zero index and the leader was not able to send a snapshot, because it had none. The test was wrongly written I would say.
Related issues
closes #4467
Pull Request Checklist
mvn clean install -DskipTests
locally before committing