Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for snapshotting and compaction #4531

Merged
merged 2 commits into from
May 25, 2020
Merged

Conversation

Zelldon
Copy link
Member

@Zelldon Zelldon commented May 15, 2020

Description

Improved further the RaftRule such that we are able to trigger snapshotting, which will create snapshots on all nodes. After the snapshot is taken the snapshot listeners are called, which trigger compaction. This setup reflects now how our system works, but it is much easier to test I think.

Normally we do a snapshot on the Leader and replicate that. The snapshot replication call at the end also SnapshotStore#newSnapshot. With the current test approach we were able to simplify that approach and test the same behavior.

I added a test for:

  • normal snapshot taking and verify that all nodes have this snapshot
  • snapshotting triggers compaction verify log is compacted
  • snapshot is send on rejoin cluster

I plan to add further tests, more complex ones probably.

I checked #4467 and the problem was that the node was not able to truncate his log because it was at zero index and the leader was not able to send a snapshot, because it had none. The test was wrongly written I would say.

Related issues

closes #4467

Pull Request Checklist

  • All commit messages match our commit message guidelines
  • The submitting code follows our code style
  • If submitting code, please run mvn clean install -DskipTests locally before committing

@Zelldon
Copy link
Member Author

Zelldon commented May 18, 2020

@deepthidevaki @npepinpe someone has time to review this?

Copy link
Member

@npepinpe npepinpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good from my side, but there's a bunch of new things I don't know, do you think we could take 5 minutes to run through it together? 😅

@Zelldon
Copy link
Member Author

Zelldon commented May 19, 2020

Generally looks good from my side, but there's a bunch of new things I don't know, do you think we could take 5 minutes to run through it together? sweat_smile

of course! when?

Copy link
Contributor

@deepthidevaki deepthidevaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
shouldDoSnapshot and shouldCompactLogOnSnapshot both seems to be testing RaftRule instead of testing the real implementation, unless I misunderstood something. Both can be removed.

@Zelldon
Copy link
Member Author

Zelldon commented May 20, 2020

Hey guys I incorporated your review hints and added additional tests for data loss and also to verify that entries after snapshots are only replicated.

@deepthidevaki would be cool if you could take an additional look.

Copy link
Contributor

@deepthidevaki deepthidevaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

The bug that we had before was if the leader replicated the snapshot to the follower once, it won't replicate it again if the follower restarts with data loss. The test we had before was covering this scenario. It would be good to test it so that we don't introduce that bug again. The previous test was as follows:

  1. Ensure snapshot is replicated by the leader to the follower :
    Append some entries and take snapshot
    Follower restart with data loss
    Assert follower received snapshot
  2. Verify snapshot is replicated again :
    The same follower restart with data loss
    Assert follower received snapshot

@Zelldon
Copy link
Member Author

Zelldon commented May 25, 2020

Hey @deepthidevaki thanks I wanted to do that with this #4468

@deepthidevaki
Copy link
Contributor

Ok. I'm losing track of all refactoring issues 😄

@Zelldon
Copy link
Member Author

Zelldon commented May 25, 2020

No problem 😄 thanks for the review 👍

@Zelldon
Copy link
Member Author

Zelldon commented May 25, 2020

bors r+

zeebe-bors bot added a commit that referenced this pull request May 25, 2020
4531: Add tests for snapshotting and compaction r=Zelldon a=Zelldon

## Description

Improved further the `RaftRule` such that we are able to trigger snapshotting, which will create snapshots on all nodes. After the snapshot is taken the snapshot listeners are called, which trigger compaction. This setup reflects now how our system works, but it is much easier to test I think.

Normally we do a snapshot on the Leader and replicate that. The snapshot replication call at the end also `SnapshotStore#newSnapshot`. With the current test approach we were able to simplify that approach and test the same behavior. 

I added a test for:

 * normal snapshot taking and verify that all nodes have this snapshot
 * snapshotting triggers compaction verify log is compacted
 * snapshot is send on rejoin cluster

I plan to add further tests, more complex ones probably.

I checked #4467 and the problem was that the node was not able to truncate his log because it was at zero index and the leader was not able to send a snapshot, because it had none. The test was wrongly written I would say.

<!-- Please explain the changes you made here. -->

## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #4467

#

4582: chore(deps): update version.sbe to v1.18.1 r=npepinpe a=renovate[bot]

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [uk.co.real-logic:sbe-tool](https://togithub.com/real-logic/simple-binary-encoding) | minor | `1.17.0` -> `1.18.1` |
| [uk.co.real-logic:sbe-all](https://togithub.com/real-logic/simple-binary-encoding) | minor | `1.17.0` -> `1.18.1` |

---

### Release Notes

<details>
<summary>real-logic/simple-binary-encoding</summary>

### [`v1.18.1`](https://togithub.com/real-logic/simple-binary-encoding/releases/1.18.1)

[Compare Source](https://togithub.com/real-logic/simple-binary-encoding/compare/1.18.0...1.18.1)

-   Fix case of importing buffers when var data is used in nested groups for Java codecs.

Java binaries can be found [here...](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22uk.co.real-logic%22%20sbe)

### [`v1.18.0`](https://togithub.com/real-logic/simple-binary-encoding/releases/1.18.0)

[Compare Source](https://togithub.com/real-logic/simple-binary-encoding/compare/1.17.0...1.18.0)

-   Only generate imports for Java codecs when required to address warnings.
-   Access fixed length arrays as Spans in C# codecs. [PR #&#8203;780](https://togithub.com/real-logic/simple-binary-encoding/pull/780).
-   Add `SbeSchemaId` and `SbeSchemaVersion` as constants in fixed flyweights for C# codecs.
-   Generate source docs from schema descriptions for C# codecs. [PR #&#8203;778](https://togithub.com/real-logic/simple-binary-encoding/pull/778).
-   Add offset and wrap methods to C# codecs for ease of use. [PR #&#8203;777](https://togithub.com/real-logic/simple-binary-encoding/pull/777).
-   Support non-standard message headers form the C++ codecs. [PR #&#8203;775](https://togithub.com/real-logic/simple-binary-encoding/pull/775).
-   Fix version support for enums in C codecs. [Issue #&#8203;773](https://togithub.com/real-logic/simple-binary-encoding/issues/773).
-   Improve formatting of generated C codecs.
-   Require a strict dependency on Agrona.
-   Upgrade to Agrona 1.5.0.
-   Upgrade to javadoc-links 5.1.0.
-   Upgrade to JUnit 5.6.2.
-   Upgrade to Gradle 6.4.1.

Java binaries can be found [here...](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22uk.co.real-logic%22%20sbe)

</details>

---

### Renovate configuration

:date: **Schedule**: At any time (no schedule defined).

:vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

:recycle: **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

:no_bell: **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#zeebe-io/zeebe).

Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
Co-authored-by: Renovate Bot <bot@renovateapp.com>
@zeebe-bors
Copy link
Contributor

zeebe-bors bot commented May 25, 2020

Build failed (retrying...)

@zeebe-bors
Copy link
Contributor

zeebe-bors bot commented May 25, 2020

Build succeeded

@zeebe-bors zeebe-bors bot merged commit e41e758 into develop May 25, 2020
@zeebe-bors zeebe-bors bot deleted the 4467-snapshot-test branch May 25, 2020 08:31
github-merge-queue bot pushed a commit that referenced this pull request Apr 16, 2024
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate testNodeCatchUpAfterCompaction
3 participants