Adds new experimental flag to disable explicit flush in the consensus module #5576
I'm running 4 benchmarks at the moment to verify this:
Would probably be interesting to also see the differences without mmap, as flushing in either case means a different syscall.
As Deepthi is having internet trouble, @korthout - do you think you can review this? We'd like to merge this in 0.25 so SQOs can already see if improvements in the flush behaviour would help their latency use cases
@npepinpe I can review it. Didn't realize it has to be in 0.25.
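For reference, a minimal sketch of the two flush paths mentioned above, using only the JDK. It assumes nothing about how the Raft journal is actually implemented; the class name FlushPaths and the helper roundTrip are hypothetical. FileChannel.force is the flush for plain channel writes, while MappedByteBuffer.force is the flush for a memory-mapped region, and the two typically end up in different syscalls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not the journal code from this PR.
public final class FlushPaths {

  // Plain channel write; force(true) asks the OS to persist data and metadata.
  public static void writeWithChannel(Path file, byte[] data, boolean flush)
      throws IOException {
    try (FileChannel channel =
        FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      ByteBuffer buffer = ByteBuffer.wrap(data);
      while (buffer.hasRemaining()) {
        channel.write(buffer);
      }
      if (flush) {
        channel.force(true);
      }
    }
  }

  // Memory-mapped write; force() asks the OS to write the mapped region back.
  public static void writeWithMmap(Path file, byte[] data, boolean flush)
      throws IOException {
    try (FileChannel channel =
        FileChannel.open(
            file,
            StandardOpenOption.CREATE,
            StandardOpenOption.READ,
            StandardOpenOption.WRITE)) {
      MappedByteBuffer mapped =
          channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
      mapped.put(data);
      if (flush) {
        mapped.force();
      }
    }
  }

  // Hypothetical helper: writes to a temp file via one of the paths and reads it back.
  public static byte[] roundTrip(boolean useMmap, boolean flush, byte[] data) {
    try {
      Path file = Files.createTempFile("flush-sketch", ".log");
      file.toFile().deleteOnExit();
      if (useMmap) {
        writeWithMmap(file, data, flush);
      } else {
        writeWithChannel(file, data, flush);
      }
      return Files.readAllBytes(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Reading the file back succeeds with or without the flush, since the data is served from the page cache; the difference only matters if the machine crashes before the OS writes the data out on its own.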
LGTM 🎉 Would you add documentation also?
@@ -119,6 +122,10 @@ private void validateConfiguration() {
 "diskUsageCommandWatermark (%f) must be less than diskUsageReplicationWatermark (%f)",
 diskUsageCommandWatermark, diskUsageReplicationWatermark));
 }
+
+if (replicationFactor > 1 && experimental.isDisableExplicitRaftFlush()) {
+LOG.warn(REPLICATION_WITH_DISABLED_FLUSH_WARNING);
It is also not safe to enable this without replication. I would warn for replicationFactor = 1 as well.
@deepthidevaki how come?
@korthout When a write is not flushed to disk, it is not guaranteed to survive a failure. So it can happen that we write a command, Raft commits it but does not flush it, the processor processes it and sends a response to the client, and then the node crashes. After restart, that command does not exist in the log because it was never flushed. So we might have told the client that a workflow was created, but after the failure that workflow does not exist.
@deepthidevaki thanks. That makes a lot of sense
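The scenario described above can be sketched as a toy model. This is purely illustrative and assumes the worst case, where a crash loses every entry that was not explicitly flushed; the class UnflushedLogSketch and all of its methods are hypothetical and are not part of the broker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a single-node log, not the actual Raft journal.
public final class UnflushedLogSketch {
  private final List<String> entries = new ArrayList<>();
  private final boolean flushExplicitly;
  // Number of leading entries guaranteed to be on disk.
  private int flushedCount = 0;

  public UnflushedLogSketch(boolean flushExplicitly) {
    this.flushExplicitly = flushExplicitly;
  }

  // Appends and commits a command; flushes only if explicit flushing is on.
  // Returns true to model the response sent back to the client.
  public boolean appendAndCommit(String command) {
    entries.add(command);
    if (flushExplicitly) {
      flushedCount = entries.size();
    }
    return true;
  }

  // Worst-case crash: everything past the last flush is gone after restart.
  public void crashAndRestart() {
    entries.subList(flushedCount, entries.size()).clear();
  }

  public boolean contains(String command) {
    return entries.contains(command);
  }

  public static void main(String[] args) {
    UnflushedLogSketch log = new UnflushedLogSketch(false);
    boolean acknowledged = log.appendAndCommit("create-workflow");
    log.crashAndRestart();
    // The client was told the command succeeded, but the log no longer has it.
    System.out.println("acknowledged=" + acknowledged
        + ", present=" + log.contains("create-workflow"));
  }
}
```

With explicit flushing enabled, the same sequence keeps the command across the simulated crash, which is why the default is to always flush.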
@@ -389,7 +389,7 @@ public boolean isRetainStaleSnapshots() {
 private int maxEntrySize = DEFAULT_MAX_ENTRY_SIZE;
 private int maxEntriesPerSegment = DEFAULT_MAX_ENTRIES_PER_SEGMENT;
 private long freeDiskSpace = DEFAULT_FREE_DISK_SPACE;
-private boolean flushOnCommit = DEFAULT_FLUSH_ON_COMMIT;
+private boolean flushExplicitly = DEFAULT_FLUSH_ON_COMMIT;
-private boolean flushExplicitly = DEFAULT_FLUSH_ON_COMMIT;
+private boolean flushExplicitly = DEFAULT_FLUSH_EXPLICITLY;
Will do, but I've already learned not to commit such fixes directly from Github since it'll break the build 😅
Applied feedback 🙂
Thanks @npepinpe . Please update the warning message and then it is good to merge.
@@ -33,6 +33,9 @@
 "Snapshot period %s needs to be larger then or equals to one minute.";
 private static final String MAX_BATCH_SIZE_ERROR_MSG =
 "Expected to have an append batch size maximum which is non negative and smaller then '%d', but was '%s'.";
+private static final String REPLICATION_WITH_DISABLED_FLUSH_WARNING =
+"Disabling explicit flushing with replication enabled is an experimental feature and can lead to inconsistencies "
Please update the message.
Sorry, I meant
-"Disabling explicit flushing with replication enabled is an experimental feature and can lead to inconsistencies "
+"Disabling explicit flushing is an experimental feature and can lead to inconsistencies "
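Put together with the earlier comment about replicationFactor = 1, the check the review asks for could look roughly like the sketch below. It is an assumption about the shape of the check, not the merged code: the class FlushWarningSketch and the method warningFor are hypothetical, and only the beginning of the warning text is reproduced, because the rest of the message is not visible in the diff.

```java
import java.util.Optional;

// Hypothetical sketch of the validation after review feedback, not the merged code.
public final class FlushWarningSketch {

  // Only the part of the message visible in the suggestion; the real constant continues.
  static final String DISABLED_FLUSH_WARNING_PREFIX =
      "Disabling explicit flushing is an experimental feature and can lead to inconsistencies";

  // Returns a warning whenever explicit flushing is disabled.
  // The replication factor is deliberately ignored: without replication the
  // failure mode is data loss on crash, which is also worth warning about.
  public static Optional<String> warningFor(
      int replicationFactor, boolean disableExplicitRaftFlush) {
    return disableExplicitRaftFlush
        ? Optional.of(DISABLED_FLUSH_WARNING_PREFIX)
        : Optional.empty();
  }

  public static void main(String[] args) {
    // With the flag set, a warning is produced even for a single replica.
    warningFor(1, true).ifPresent(System.out::println);
  }
}
```

Returning an Optional instead of logging directly keeps the sketch free of any logging dependency; a caller would pass the value to its own logger.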
- new flag which controls if we flush explicitly on the leader commit and follower appends; defaults to true
- removes old flushOnCommit config flag
9a78520
to
c15dbe3
Compare
bors r+
Build succeeded:
Description
This PR adds a new experimental flag, zeebe.broker.experimental.disableExplicitRaftFlush, which defaults to false. When true, it disables explicit flushing on the Raft side: flushing on commit on the leader, and flushing on append on the follower.
The flag is worded in the negative, as the default behaviour is to always flush explicitly for correctness, and users should take care when disabling this. A warning is also printed if replication is enabled, as this can lead to inconsistency issues; when the replication factor is 1, at worst you simply suffer data loss on crash.
There are unfortunately no acceptance tests, as I found it quite hard to add them with the current setup. Let me know if you have an idea there.
Ideally we'd like this in 0.25 so users can already start experimenting with this - in some use cases, this can give a nice performance boost, and the consequences (e.g. data loss when replication is disabled) can be ignored.
Related issues
closes #5570
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
Testing:
Documentation: