Adds new experimental flag to disable explicit flush in the consensus module #5576

Merged
3 commits merged into develop from 5570-flush-flag
Oct 13, 2020

Conversation

npepinpe
Member

Description

This PR adds a new experimental flag, zeebe.broker.experimental.disableExplicitRaftFlush, which defaults to false. When set to true, it disables explicit flushing on the Raft side: flushing on commit on the leader, and flushing on append on the follower.

The flag is worded in the negative, as the default behaviour is to always explicitly flush for correctness, and users should take care when disabling this. There is also a warning printed out if replication is enabled, as this can lead to inconsistency issues; when replication factor is 1, at worst you simply suffer data loss on crash.
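
A minimal sketch of how a negatively worded flag can map to an internal "flush explicitly" setting, so that the safe behaviour stays the default. The class and method names here are hypothetical and are not taken from the actual Zeebe code.

```java
// Hypothetical sketch: names are illustrative, not the real Zeebe configuration classes.
final class ExperimentalCfgSketch {
  // Worded in the negative so that leaving it unset keeps explicit flushing on.
  private boolean disableExplicitRaftFlush = false;

  boolean isDisableExplicitRaftFlush() {
    return disableExplicitRaftFlush;
  }

  void setDisableExplicitRaftFlush(final boolean disableExplicitRaftFlush) {
    this.disableExplicitRaftFlush = disableExplicitRaftFlush;
  }

  // Positive form that a consensus module could consume internally.
  boolean shouldFlushExplicitly() {
    return !disableExplicitRaftFlush;
  }
}
```

With this shape, a user has to opt in deliberately to give up the durability guarantee.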

There are unfortunately no acceptance tests, as I found them quite hard to add with the current setup. Let me know if you have an idea there.

Ideally we'd like this in 0.25 so users can already start experimenting with this - in some use cases, this can give a nice performance boost, and the consequences (e.g. data loss when replication is disabled) can be ignored.

Related issues

closes #5570

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug then PRs are created to backport the fix to the last two minor versions

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with further versions
  • The behavior is tested manually
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement

@npepinpe npepinpe self-assigned this Oct 12, 2020
@npepinpe
Member Author

I'm running 4 benchmarks at the moment to verify this:

  1. np-flush-flag-no-repl : mmap, flush disabled, replication factor 1
  2. np-baseline-no-repl: mmap, flush enabled, replication factor 1
  3. np-flush-flag-repl: mmap, flush disabled, replication factor 3
  4. np-baseline-repl: mmap, flush enabled, replication factor 3

Would probably be interesting to also see the differences without mmap, as flushing in either case means a different syscall.
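
To illustrate why the two storage modes flush differently, here is a rough Java sketch (not the actual journal code). With a memory-mapped file, an explicit flush goes through MappedByteBuffer.force(), which on Linux typically results in an msync call; with a plain file channel it goes through FileChannel.force(), which typically results in fsync or fdatasync. The class name, method names, and the flushExplicitly parameter are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch contrasting the two flush paths; not Zeebe's implementation.
final class FlushPathsSketch {

  // mmap path: write into a mapped region, optionally force it to the storage device.
  static void writeMapped(final Path file, final byte[] data, final boolean flushExplicitly)
      throws IOException {
    try (FileChannel channel =
        FileChannel.open(
            file,
            StandardOpenOption.CREATE,
            StandardOpenOption.READ,
            StandardOpenOption.WRITE)) {
      final MappedByteBuffer buffer =
          channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
      buffer.put(data);
      if (flushExplicitly) {
        buffer.force(); // flushes the mapped region
      }
    }
  }

  // Non-mmap path: write through the channel, optionally force file content to disk.
  static void writeChannel(final Path file, final byte[] data, final boolean flushExplicitly)
      throws IOException {
    try (FileChannel channel =
        FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      final ByteBuffer source = ByteBuffer.wrap(data);
      while (source.hasRemaining()) {
        channel.write(source);
      }
      if (flushExplicitly) {
        channel.force(false); // false: file content only, metadata not required
      }
    }
  }
}
```

In both cases, skipping the force call leaves it to the operating system to decide when the data reaches the disk.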

@npepinpe
Member Author

npepinpe commented Oct 13, 2020

Results of benchmarking this:

The setup is the classic load test: a single-task process, with the load generator attempting to start and complete as many workflows as possible. In each run, overall back pressure was dropping 65-80% of requests, varying over time.

| Setup | Throughput | Latency |
| --- | --- | --- |
| Namespace: np-flush-flag-no-repl: flush disabled, mmap on, replication factor 1 | np-flush-flag-no-repl-throughput | np-flush-flag-no-repl-latency |
| Namespace: np-baseline-no-repl: flush enabled, mmap on, replication factor 1 | np-baseline-no-repl-throughput | np-baseline-no-repl-latency |
| Namespace: np-flush-flag-repl: flush disabled, mmap on, replication factor 3 | np-flush-flag-repl | np-flush-flag-repl-latency |
| Namespace: np-baseline-repl: flush enabled, mmap on, replication factor 3 | np-baseline-repl-throughput | np-baseline-repl-latency |

The results confirm what we expected: there is a definite performance penalty to flushing. For users for whom a small amount of data loss is fine (e.g. throwaway WFIs which can be repeated if they fail or disappear), the performance gain may be worthwhile, especially if they are running with replication factor 1.

@npepinpe npepinpe requested review from korthout and removed request for deepthidevaki October 13, 2020 10:07
@npepinpe
Member Author

As Deepthi is having internet trouble, @korthout - do you think you can review this? We'd like to merge this in 0.25 so SQOs can already see if improvements in the flush behaviour would help their latency use cases

@deepthidevaki
Contributor

@npepinpe I can review it. Didn't realize it has to be in 0.25.

Contributor

@deepthidevaki deepthidevaki left a comment


LGTM 🎉 Would you add documentation also?

@@ -119,6 +122,10 @@ private void validateConfiguration() {
"diskUsageCommandWatermark (%f) must be less than diskUsageReplicationWatermark (%f)",
diskUsageCommandWatermark, diskUsageReplicationWatermark));
}

if (replicationFactor > 1 && experimental.isDisableExplicitRaftFlush()) {
LOG.warn(REPLICATION_WITH_DISABLED_FLUSH_WARNING);
Contributor


It is also not safe to enable this without replication. I would warn for replicationFactor = 1 as well.

Member


@deepthidevaki how come?

Contributor


@korthout When a write is not flushed to disk, it is not guaranteed to survive a failure. So it can happen that we write a command, Raft commits it but does not flush it, the processor processes it and sends a response to the client, and then the node crashes. After restart, that command does not exist in the log because it was never flushed. So we might have told the client that a workflow was created, but after the failure that workflow does not exist.

Member


@deepthidevaki thanks. That makes a lot of sense
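
The failure scenario in this thread can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the class and its methods are hypothetical, the "page cache" list stands in for data the operating system has not yet written out, and the crash is modelled as the worst case in which everything unflushed is lost.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a single-node log; hypothetical and not related to the real Raft journal.
final class ToyLogSketch {
  private final List<String> durable = new ArrayList<>(); // survives a crash
  private final List<String> pageCache = new ArrayList<>(); // lost on a crash (worst case)
  private final boolean flushExplicitly;

  ToyLogSketch(final boolean flushExplicitly) {
    this.flushExplicitly = flushExplicitly;
  }

  // Appends and "commits" an entry; returning true models the response sent to the client.
  boolean appendAndCommit(final String entry) {
    pageCache.add(entry);
    if (flushExplicitly) {
      flush();
    }
    return true;
  }

  void flush() {
    durable.addAll(pageCache);
    pageCache.clear();
  }

  // Worst case: nothing that was still unflushed made it to disk.
  void crashAndRestart() {
    pageCache.clear();
  }

  boolean contains(final String entry) {
    return durable.contains(entry) || pageCache.contains(entry);
  }
}
```

With flushExplicitly set to false, an entry can be acknowledged to the client and then be missing after crashAndRestart(), even though no replication is involved. This is the reason for warning at replication factor 1 as well.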

@@ -389,7 +389,7 @@ public boolean isRetainStaleSnapshots() {
private int maxEntrySize = DEFAULT_MAX_ENTRY_SIZE;
private int maxEntriesPerSegment = DEFAULT_MAX_ENTRIES_PER_SEGMENT;
private long freeDiskSpace = DEFAULT_FREE_DISK_SPACE;
private boolean flushOnCommit = DEFAULT_FLUSH_ON_COMMIT;
private boolean flushExplicitly = DEFAULT_FLUSH_ON_COMMIT;
Contributor


Suggested change
private boolean flushExplicitly = DEFAULT_FLUSH_ON_COMMIT;
private boolean flushExplicitly = DEFAULT_FLUSH_EXPLICITLY;

Member Author


Will do, but I've already learned not to commit such fixes directly from GitHub since it'll break the build 😅

@npepinpe npepinpe removed the request for review from korthout October 13, 2020 12:15
@npepinpe
Member Author

Applied feedback 🙂

Contributor

@deepthidevaki deepthidevaki left a comment


Thanks @npepinpe . Please update the warning message and then it is good to merge.

@@ -33,6 +33,9 @@
"Snapshot period %s needs to be larger then or equals to one minute.";
private static final String MAX_BATCH_SIZE_ERROR_MSG =
"Expected to have an append batch size maximum which is non negative and smaller then '%d', but was '%s'.";
private static final String REPLICATION_WITH_DISABLED_FLUSH_WARNING =
"Disabling explicit flushing with replication enabled is an experimental feature and can lead to inconsistencies "
Contributor


Please update the message.

Contributor


Sorry, I meant

Suggested change
"Disabling explicit flushing with replication enabled is an experimental feature and can lead to inconsistencies "
"Disabling explicit flushing is an experimental feature and can lead to inconsistencies "

- new flag which controls if we flush explicitly on the leader commit
  and follower appends; defaults to true
- removes old flushOnCommit config flag
@npepinpe
Member Author

bors r+

@ghost

ghost commented Oct 13, 2020

Build succeeded:

@ghost ghost merged commit 7927a3d into develop Oct 13, 2020
@ghost ghost deleted the 5570-flush-flag branch October 13, 2020 15:12
This pull request was closed.
Development

Successfully merging this pull request may close these issues.

Add an experimental configuration flag to disable explicit flushing in Raft
4 participants