Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate testNodeCatchUpAfterCompaction #4467

Closed
Zelldon opened this issue May 7, 2020 · 0 comments · Fixed by #4531
Closed

Investigate testNodeCatchUpAfterCompaction #4467

Zelldon opened this issue May 7, 2020 · 0 comments · Fixed by #4531
Assignees
Labels
kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. scope/broker Marks an issue or PR to appear in the broker section of the changelog

Comments

@Zelldon
Copy link
Member

Zelldon commented May 7, 2020

Description

We have currently an test which is called testNodeCatchUpAfterCompaction in the raft tests, which doesn't work anymore as expected.

With refactoring of Atomix #4466 we migrated this test to rely on the actual log, but it seems to bring a problem or shows a bug it is currently not clear.

  @Test
  @Ignore
  public void testNodeCatchUpAfterCompaction() throws Exception {
    // given
    raftRule.shutdownServer("1");
    raftRule.awaitNewLeader();
    raftRule.appendEntries(100);
    raftRule.tryToCompactLogsOnServersExcept("1", 100).join();

    // when
    final var future = raftRule.startServer("1");

    // then
    future.join();
  }

If we run this test we see the following exception on rejoining of Member one.

06:39:40.505 [] ERROR io.atomix.raft.impl.RaftContext - RaftServer{1} - An uncaught exception occurred, transition to inactive role
java.lang.IndexOutOfBoundsException: Cannot truncate committed index: 0
	at io.atomix.storage.journal.SegmentedJournalWriter.truncate(SegmentedJournalWriter.java:107) ~[classes/:?]
	at io.atomix.storage.journal.SegmentedJournalWriter.reset(SegmentedJournalWriter.java:99) ~[classes/:?]
	at io.atomix.storage.journal.DelegatingJournalWriter.reset(DelegatingJournalWriter.java:61) ~[classes/:?]
	at io.atomix.raft.roles.PassiveRole.appendEntries(PassiveRole.java:523) ~[classes/:?]
	at io.atomix.raft.roles.PassiveRole.handleAppend(PassiveRole.java:387) ~[classes/:?]
	at io.atomix.raft.roles.ActiveRole.onAppend(ActiveRole.java:51) ~[classes/:?]
	at io.atomix.raft.roles.FollowerRole.onAppend(FollowerRole.java:197) ~[classes/:?]
	at io.atomix.raft.impl.RaftContext.lambda$registerHandlers$12(RaftContext.java:222) ~[classes/:?]
	at io.atomix.raft.impl.RaftContext.lambda$runOnContext$19(RaftContext.java:233) ~[classes/:?]
	at io.atomix.utils.concurrent.SingleThreadContext$WrappedRunnable.run(SingleThreadContext.java:188) [classes/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]

We have to investigate how problematic this really is and what it means.

@Zelldon Zelldon added Status: Ready kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. scope/broker Marks an issue or PR to appear in the broker section of the changelog labels May 7, 2020
@Zelldon Zelldon self-assigned this May 7, 2020
zeebe-bors bot added a commit that referenced this issue May 14, 2020
4466: Refactor Atomix r=Zelldon a=Zelldon

## Description

In order to fix an BUG (#4439) and test it properly we had to migrate our `ZeebeStateMachine` to atomix.
This bring a bunch of other problems and issues, which resulted in a bigger refactoring which was already due. Sorry to the reviewers.

**This PR does the following:**

1. Migrates  the ZeebeStateMachine to atomix, which means the old is replaced, which was only used in RaftTests
2. Moves everything what is snapshot related to atomix and some parts to the broker. This merging makes it now easier to merge the snapshot replication stuff (#4424) and removes the parts from the LogStream. The engine doesn't know anymore about snapshoting, since it doesn't need to.
3. Removes everything primitive and session related. This was a big dead part of the code base, which made the code much complexer as it needed to be.
4. Iterate over the RaftTests. 
    1. Adds new tests which relies on the actual log and replication. The test verifies that entries are commited and written to the log. The Ordering of the entries is also taken into account. Not all tests are migrated, but at least adjusted that they no longer rely on sessions.
    2. Remove some tests which doesn't make sense anymore or need to be rewritten.
    3. Migrate one test `testNodeCatchUpAfterCompaction`, which is currently on ignore because it seem to fail. We need to investigate that #4467 
    4. Remove snapshot tests from the Raft module, which where not easy to migrate. It was for now easier to just remove them. We need to investigate if it makes sense to write new in the raft module or maybe in other modules like IT or Broker. Maybe we already have enough, this needs to be investigated. #4468 

<!-- Please explain the changes you made here. -->

## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #4388
closes #4432
closes #4344
closes #4258

#

Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
zeebe-bors bot added a commit that referenced this issue May 25, 2020
4531: Add tests for snapshotting and compaction r=Zelldon a=Zelldon

## Description

Improved further the `RaftRule` such that we are able to trigger snapshotting, which will create snapshots on all nodes. After the snapshot is taken the snapshot listeners are called, which trigger compaction. This setup reflects now how our system works, but it is much easier to test I think.

Normally we do a snapshot on the Leader and replicate that. The snapshot replication call at the end also `SnapshotStore#newSnapshot`. With the current test approach we were able to simplify that approach and test the same behavior. 

I added a test for:

 * normal snapshot taking and verify that all nodes have this snapshot
 * snapshotting triggers compaction verify log is compacted
 * snapshot is send on rejoin cluster

I plan to add further tests, more complex ones probably.

I checked #4467 and the problem was that the node was not able to truncate his log because it was at zero index and the leader was not able to send a snapshot, because it had none. The test was wrongly written I would say.

<!-- Please explain the changes you made here. -->

## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #4467

#

4582: chore(deps): update version.sbe to v1.18.1 r=npepinpe a=renovate[bot]

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [uk.co.real-logic:sbe-tool](https://togithub.com/real-logic/simple-binary-encoding) | minor | `1.17.0` -> `1.18.1` |
| [uk.co.real-logic:sbe-all](https://togithub.com/real-logic/simple-binary-encoding) | minor | `1.17.0` -> `1.18.1` |

---

### Release Notes

<details>
<summary>real-logic/simple-binary-encoding</summary>

### [`v1.18.1`](https://togithub.com/real-logic/simple-binary-encoding/releases/1.18.1)

[Compare Source](https://togithub.com/real-logic/simple-binary-encoding/compare/1.18.0...1.18.1)

-   Fix case of importing buffers when var data is used in nested groups for Java codecs.

Java binaries can be found [here...](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22uk.co.real-logic%22%20sbe)

### [`v1.18.0`](https://togithub.com/real-logic/simple-binary-encoding/releases/1.18.0)

[Compare Source](https://togithub.com/real-logic/simple-binary-encoding/compare/1.17.0...1.18.0)

-   Only generate imports for Java codecs when required to address warnings.
-   Access fixed length arrays as Spans in C# codecs. [PR #&#8203;780](https://togithub.com/real-logic/simple-binary-encoding/pull/780).
-   Add `SbeSchemaId` and `SbeSchemaVersion` as constants in fixed flyweights for C# codecs.
-   Generate source docs from schema descriptions for C# codecs. [PR #&#8203;778](https://togithub.com/real-logic/simple-binary-encoding/pull/778).
-   Add offset and wrap methods to C# codecs for ease of use. [PR #&#8203;777](https://togithub.com/real-logic/simple-binary-encoding/pull/777).
-   Support non-standard message headers form the C++ codecs. [PR #&#8203;775](https://togithub.com/real-logic/simple-binary-encoding/pull/775).
-   Fix version support for enums in C codecs. [Issue #&#8203;773](https://togithub.com/real-logic/simple-binary-encoding/issues/773).
-   Improve formatting of generated C codecs.
-   Require a strict dependency on Agrona.
-   Upgrade to Agrona 1.5.0.
-   Upgrade to javadoc-links 5.1.0.
-   Upgrade to JUnit 5.6.2.
-   Upgrade to Gradle 6.4.1.

Java binaries can be found [here...](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22uk.co.real-logic%22%20sbe)

</details>

---

### Renovate configuration

:date: **Schedule**: At any time (no schedule defined).

:vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

:recycle: **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

:no_bell: **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#zeebe-io/zeebe).

Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
Co-authored-by: Renovate Bot <bot@renovateapp.com>
@zeebe-bors zeebe-bors bot closed this as completed in e41e758 May 25, 2020
github-merge-queue bot pushed a commit that referenced this issue Mar 14, 2024
* refactor: convert optional filters button to Carbon

* chore: remove fill color from icon

* chore: move unselectedOptionalFilters inside OptionalFilters component

* chore: add tooltips for menu items

* fix: remove overflow menu pointer

* chore: hide tooltip, increase max-width, remove titles from list items

* style: add 40px spacing between checkboxes and optional filters button

* chore: keep aria-labelledby
github-merge-queue bot pushed a commit that referenced this issue Apr 16, 2024
…4467)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. scope/broker Marks an issue or PR to appear in the broker section of the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants