Adds new experimental flag to disable explicit flush in the consensus module #5576

npepinpe · 2020-10-12T15:42:20Z

Description

This PR adds a new experimental flag as zeebe.broker.experimental.disableExplicitRaftFlush, which defaults to false. When true, it will disable explicit flushing on the Raft side - flushing on commit on the leader, and flushing on append on the follower.

The flag is worded in the negative, as the default behaviour is to always explicitly flush for correctness, and users should take care when disabling this. There is also a warning printed out if replication is enabled, as this can lead to inconsistency issues; when replication factor is 1, at worst you simply suffer data loss on crash.
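As a rough illustration of the semantics described above, here is a minimal sketch. Only the config key comes from this PR; the class, enum, and method names are hypothetical and do not mirror the actual broker code.

```java
/** Hypothetical sketch: only the config key is taken from this PR, all other names are illustrative. */
final class FlushFlagSketch {
  static final String CONFIG_KEY = "zeebe.broker.experimental.disableExplicitRaftFlush";

  /** Worst-case consequences named in the description. */
  enum Risk {
    NONE,
    DATA_LOSS_ON_CRASH,
    INCONSISTENCY
  }

  /** The flag is worded in the negative: explicit flushing stays on unless the flag is true. */
  static boolean flushesExplicitly(boolean disableExplicitRaftFlush) {
    return !disableExplicitRaftFlush;
  }

  /** Maps the replication factor and the flag value to the worst case the description mentions. */
  static Risk worstCaseRisk(int replicationFactor, boolean disableExplicitRaftFlush) {
    if (flushesExplicitly(disableExplicitRaftFlush)) {
      return Risk.NONE;
    }
    return replicationFactor > 1 ? Risk.INCONSISTENCY : Risk.DATA_LOSS_ON_CRASH;
  }
}
```

With the default (`false`), `flushesExplicitly` returns `true` and nothing changes for existing users; the risk only appears once the flag is set to `true`.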

There are unfortunately no acceptance tests, as I found them quite hard to add with the current setup. Let me know if you have an idea there.

Ideally we'd like this in 0.25 so users can already start experimenting with this - in some use cases, this can give a nice performance boost, and the consequences (e.g. data loss when replication is disabled) can be ignored.

Related issues

closes #5570

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

The changes are backwards compatible with previous versions
If it fixes a bug then PRs are created to backport the fix to the last two minor versions

Testing:

There are unit/integration tests that verify all acceptance criteria of the issue
New tests are written to ensure backwards compatibility with further versions
The behavior is tested manually
The impact of the changes is verified by a benchmark

Documentation:

The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
New content is added to the release announcement

npepinpe · 2020-10-12T15:47:36Z

I'm running 4 benchmarks at the moment to verify this:

np-flush-flag-no-repl : mmap, flush disabled, replication factor 1
np-baseline-no-repl: mmap, flush enabled, replication factor 1
np-flush-flag-repl: mmap, flush disabled, replication factor 3
np-baseline-repl: mmap, flush enabled, replication factor 3

Would probably be interesting to also see the differences without mmap, as flushing in either case means a different syscall.
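To make the "different syscall" point concrete, here is a minimal sketch of the two flush paths using plain `java.nio`. It is not the Atomix journal code; the class and method names are hypothetical, and the syscall names in the comments describe typical Linux JDK behaviour rather than a guarantee.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch contrasting the mmap flush path with the FileChannel flush path. */
final class FlushPathsSketch {

  /** Writes through a memory-mapped region; the explicit flush is MappedByteBuffer.force(). */
  static void writeMapped(Path file, byte[] data, boolean flushExplicitly) {
    try (FileChannel channel =
        FileChannel.open(
            file, StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
      buffer.put(data);
      if (flushExplicitly) {
        buffer.force(); // typically msync on Linux
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Writes through the channel; the explicit flush is FileChannel.force(boolean). */
  static void writeChannel(Path file, byte[] data, boolean flushExplicitly) {
    try (FileChannel channel =
        FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      ByteBuffer buffer = ByteBuffer.wrap(data);
      while (buffer.hasRemaining()) {
        channel.write(buffer);
      }
      if (flushExplicitly) {
        channel.force(true); // typically fsync on Linux
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Writes data to a temporary file via one of the two paths and reads it back. */
  static byte[] roundTrip(boolean mapped, boolean flushExplicitly, byte[] data) {
    try {
      Path file = Files.createTempFile("flush-sketch", ".log");
      try {
        if (mapped) {
          writeMapped(file, data, flushExplicitly);
        } else {
          writeChannel(file, data, flushExplicitly);
        }
        return Files.readAllBytes(file);
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Without a crash, the data reads back identically whether or not the flush happens; the difference only shows up in durability after a failure and in the cost of the flush call itself, which is what the benchmarks measure.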

npepinpe · 2020-10-13T09:52:20Z

Results of benchmarking this:

The setup is the classic load test setup: a single-task process, with attempts to start/complete as many workflows as possible. In each benchmark, the overall back pressure was dropping between 65% and 80%, varying over time.

| Setup | Throughput | Latency |
| --- | --- | --- |
| `np-flush-flag-no-repl`: flush disabled, mmap on, replication factor 1 | (not preserved) | (not preserved) |
| `np-baseline-no-repl`: flush enabled, mmap on, replication factor 1 | (not preserved) | (not preserved) |
| `np-flush-flag-repl`: flush disabled, mmap on, replication factor 3 | (not preserved) | (not preserved) |
| `np-baseline-repl`: flush enabled, mmap on, replication factor 3 | (not preserved) | (not preserved) |

The results confirm what we expected: there is a definite performance penalty to flushing. For users where a small amount of data loss is acceptable (e.g. throwaway WFIs which can be repeated if they fail/disappear), the performance gain may be worthwhile, especially if they are running with replication factor 1.

npepinpe · 2020-10-13T10:08:22Z

As Deepthi is having internet trouble, @korthout - do you think you can review this? We'd like to merge this in 0.25 so SQOs can already see if improvements in the flush behaviour would help their latency use cases

deepthidevaki · 2020-10-13T10:55:19Z

@npepinpe I can review it. Didn't realize it has to be in 0.25.

deepthidevaki

LGTM 🎉 Would you add documentation also?

deepthidevaki · 2020-10-13T11:20:10Z

broker/src/main/java/io/zeebe/broker/system/SystemContext.java

@@ -119,6 +122,10 @@ private void validateConfiguration() {
              "diskUsageCommandWatermark (%f) must be less than diskUsageReplicationWatermark (%f)",
              diskUsageCommandWatermark, diskUsageReplicationWatermark));
    }
+
+    if (replicationFactor > 1 && experimental.isDisableExplicitRaftFlush()) {
+      LOG.warn(REPLICATION_WITH_DISABLED_FLUSH_WARNING);


It is also not safe to enable this without replication. I would also warn when replicationFactor = 1.

@deepthidevaki how come?

@korthout When a write is not flushed to disk, it is not guaranteed to persist across a failure. So it can happen that we write a command, Raft commits it but does not flush it, the processor processes it and sends a response to the client, and then the node crashes. After restart, that command does not exist in the log because it was not flushed. So we might have responded to the client that a workflow was created, but after a failure that workflow does not exist.

@deepthidevaki thanks. That makes a lot of sense
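The failure sequence described in this thread can be sketched as a toy in-memory model. This is not the Raft log implementation; every name here is hypothetical, and the crash is modelled as the worst case, where nothing unflushed survives (a real OS may well have written some of it back on its own).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a log whose writes only become durable once flushed. */
final class UnflushedLogSketch {
  private final List<String> durable = new ArrayList<>(); // survives a crash
  private final List<String> unflushed = new ArrayList<>(); // written, but not yet flushed
  private final boolean flushExplicitly;

  UnflushedLogSketch(boolean flushExplicitly) {
    this.flushExplicitly = flushExplicitly;
  }

  /** Writes and commits an entry, then returns the response sent to the client. */
  String commit(String entry) {
    unflushed.add(entry);
    if (flushExplicitly) {
      flush();
    }
    return "ACK " + entry;
  }

  /** Moves everything written so far into the durable part of the log. */
  void flush() {
    durable.addAll(unflushed);
    unflushed.clear();
  }

  /** Worst-case crash: every entry that was not flushed is gone after restart. */
  void crashAndRestart() {
    unflushed.clear();
  }

  boolean contains(String entry) {
    return durable.contains(entry) || unflushed.contains(entry);
  }
}
```

With `new UnflushedLogSketch(false)`, a client can receive `ACK create-workflow-1` and, after `crashAndRestart()`, `contains("create-workflow-1")` is `false`. This happens with a single node, which is why the warning should not depend on the replication factor.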

deepthidevaki · 2020-10-13T11:27:41Z

atomix/cluster/src/main/java/io/atomix/raft/storage/RaftStorage.java

@@ -389,7 +389,7 @@ public boolean isRetainStaleSnapshots() {
    private int maxEntrySize = DEFAULT_MAX_ENTRY_SIZE;
    private int maxEntriesPerSegment = DEFAULT_MAX_ENTRIES_PER_SEGMENT;
    private long freeDiskSpace = DEFAULT_FREE_DISK_SPACE;
-    private boolean flushOnCommit = DEFAULT_FLUSH_ON_COMMIT;
+    private boolean flushExplicitly = DEFAULT_FLUSH_ON_COMMIT;


Suggested change

private boolean flushExplicitly = DEFAULT_FLUSH_ON_COMMIT;

private boolean flushExplicitly = DEFAULT_FLUSH_EXPLICITLY;

Will do, but I've already learned not to commit such fixes directly from GitHub since it'll break the build 😅

npepinpe · 2020-10-13T14:00:38Z

Applied feedback 🙂

deepthidevaki

Thanks @npepinpe . Please update the warning message and then it is good to merge.

deepthidevaki · 2020-10-13T14:22:34Z

broker/src/main/java/io/zeebe/broker/system/SystemContext.java

@@ -33,6 +33,9 @@
      "Snapshot period %s needs to be larger then or equals to one minute.";
  private static final String MAX_BATCH_SIZE_ERROR_MSG =
      "Expected to have an append batch size maximum which is non negative and smaller then '%d', but was '%s'.";
+  private static final String REPLICATION_WITH_DISABLED_FLUSH_WARNING =
+      "Disabling explicit flushing with replication enabled is an experimental feature and can lead to inconsistencies "


Please update the message.

Sorry, I meant

Suggested change

"Disabling explicit flushing with replication enabled is an experimental feature and can lead to inconsistencies "

"Disabling explicit flushing is an experimental feature and can lead to inconsistencies "

npepinpe · 2020-10-13T14:46:26Z

bors r+

ghost · 2020-10-13T15:12:07Z

Build succeeded:

continuous-integration/jenkins/branch

npepinpe requested a review from deepthidevaki October 12, 2020 15:42

npepinpe self-assigned this Oct 12, 2020

npepinpe requested review from korthout and removed request for deepthidevaki October 13, 2020 10:07

deepthidevaki requested changes Oct 13, 2020

npepinpe removed the request for review from korthout October 13, 2020 12:15

npepinpe requested a review from deepthidevaki October 13, 2020 14:00

deepthidevaki approved these changes Oct 13, 2020

npepinpe added 3 commits October 13, 2020 16:40

chore(atomix): introduce flushExplicitly flag

6e5a864

- new flag which controls if we flush explicitly on the leader commit and follower appends; defaults to true - removes old flushOnCommit config flag

chore(logstreams): fix to use correct flag

8ecaa3e

chore(broker): add new experimental config to control Raft explicit f…

c15dbe3

…lush

npepinpe force-pushed the 5570-flush-flag branch from 9a78520 to c15dbe3 Compare October 13, 2020 14:40

ghost merged commit 7927a3d into develop Oct 13, 2020

ghost deleted the 5570-flush-flag branch October 13, 2020 15:12

MiguelPires added the Release: 0.25.0 label Oct 22, 2020

This pull request was closed.
