New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/util/log: enable bufferedSink usage with fileSink as experimental #120428
Conversation
d609729
to
14d9e78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assuming it looks good to @xinhaoz !
Reviewed 5 of 5 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @xinhaoz)
-- commits
line 65 at r1:
Since it's already opt-in, would we be able to avoid the additional environment variable requirement? It seems like something we could document for the buffering
option.
Or, even more aggressively, could we not mark it as experimental but perhaps not document very loudly. Our WAL failover option will not be considered experimental, so it's a bit unfortunate if we have to force them into using experimental options at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jbowens and @xinhaoz)
Previously, jbowens (Jackson Owens) wrote…
Since it's already opt-in, would we be able to avoid the additional environment variable requirement? It seems like something we could document for the
buffering
option.Or, even more aggressively, could we not mark it as experimental but perhaps not document very loudly. Our WAL failover option will not be considered experimental, so it's a bit unfortunate if we have to force them into using experimental options at the same time.
Let me discuss with the team, I'm open to ditching the env var and perhaps not advertising the feature quite yet.
14d9e78
to
e37f897
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens and @xinhaoz)
Previously, abarganier (Alex Barganier) wrote…
Let me discuss with the team, I'm open to ditching the env var and perhaps not advertising the feature quite yet.
@jbowens envvar removed! We like the idea of not documenting this loudly.
PTAL, and let us know how your testing goes.
Informs: cockroachdb#72452 For some time, two buffering mechanisms have existed within the log package. One is controlled by `buffered-writes`, which is only used by the file sink. `buffered-writes` buffers writes until a buffer is full, after which it flushes the buffer to an underlying file. This is not an asynchronous operation. Separately, the `bufferedSink` was introduced to provide an async buffering mechanism. This was originally intended for network log sinks such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink` was not allowed for use with the `fileSink` as we already had a buffering mechanism available. Today, we find ourselves in a situation where we aim to make CRDB even more resilient in the face of disk failures. However, some of these efforts like WAL failover are unable to achieve their goals if the logging subsystem isn't resilient to disk unavailability as well. Today, if a file sink is configured to write to an unavailable disk, any goroutine making a logging call to one of those sinks will hang as the underlying i/o operations used are unable to interact with the disk. This can have a severe negative impact on cluster performance and throughput, even when using `buffered-writes: true`. As a way to alleviate this pain, this patch experimentally enables the usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is used to "wrap" an underlying sink (such as a `fileSink`), and asynchronously outputs buffered data to said sink. It is especially good at shielding writers from any problems with the underlying sink, which makes it a great candidate to help us accomplish our resiliency goals when it comes to disk outage scenarios. Now, users can leverage the `bufferedSink` in `file-defaults` and `file-groups by using the `buffering` config option. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` Note that we don't allow both `buffered-writes` and `buffering` to be used together. This patch adds a validation step to ensure this. We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks. Release note (ops change): Users have the ability to update their log configuration to enable asynchronous buffering of `file-group` log sinks via the `buffering` [configuration options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks). The `buffering` configuration can be applied to `file-defaults` or individual `file-groups` as needed. Note however that `buffering` and `buffered-writes` are incompatible for the time being. If you'd like to try the `buffering` option on a file log sink, it's necessary to explicitly set `buffered-writes: false`. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks.
e37f897
to
0dfbc08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier and @xinhaoz)
Previously, abarganier (Alex Barganier) wrote…
@jbowens envvar removed! We like the idea of not documenting this loudly.
PTAL, and let us know how your testing goes.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for the detailed commit messages 💯. just a few wording nits.
Reviewed 1 of 7 files at r3, 5 of 5 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @abarganier, @jbowens, and @xinhaoz)
-- commits
line 82 at r6:
nit: grammar
pkg/util/log/logconfig/testdata/validate
line 797 at r5 (raw file):
max-group-size: 100MiB # Check that buffering works with file sinks when the correct env var is set.
nit: remove reference to env var
As we introduce the ability to enable `buffering` on file-based log sinks as experimental, there's a need for a new `BufferFormat`. Previously, the default format was to introduce a newline between each log message flushed to the underlying log sink as a delimiter. However, when flushing to files, this is not desireable behavior as it causes the log file to look like: ``` I240313 16:38:33.355448 113 1@gossip/gossip.go:1374 ⋮ [T1,Vsystem,n1] 80 node has connected to cluster via gossip I240313 16:38:33.360943 113 kv/kvserver/stores.go:282 ⋮ [T1,Vsystem,n1] 81 wrote 0 node addresses to persistent storage ``` Empty lines do not play well with log parsers, so we introduce a new `BufferFormat`, `BufferFmtNone`, which uses no delimiters, prefixes, or suffixes. With this new option, log files no longer see these empty lines. The validation code for file log sinks forces the `BufferFmtNone` into the buffering configuration, so this is not something that needs to explicitly be configured by the user. Release note: none
0dfbc08
to
8a2087a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
I've created #120759 to track follow up work to enable this feature in roachtests, to help put the new config option through its paces.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @dhartunian, @jbowens, and @xinhaoz)
Previously, dhartunian (David Hartunian) wrote…
nit: grammar
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @dhartunian, @jbowens, and @xinhaoz)
pkg/util/log/logconfig/testdata/validate
line 797 at r5 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: remove reference to env var
Done.
bors r=dhartunian,jbowens |
Build failed (retrying...): |
Build failed (retrying...): |
Addresses: cockroachdb#120759 Previously, the `network_logging` roachtest was created to put the `bufferedSink` (an async producer+consumer buffer that we primarily use to wrap network log sinks) through its paces. Originally, this functionality was only available to network (`fluent`,`http`) log sinks, and so the test did not include any file based log sinks. As of cockroachdb#120428, we now support the ability to use async buffering on `file-group` log sinks as well, via the same `bufferedSink`. Therefore, it makes sense that we should expand our existing `network_logging` roachtest to also include file sinks. This commit does so by expanding the log config to enable async buffering for files by default, and adds three `file-groups` log sinks to the config that capture all log channels. This will now run alongside the existing network logging sinks. As a final step, since this test is no longer specific to network logging, we rename the roachtest from `network_logging` to `buffered_logging`. Release note: none
123431: roachtest: change `network_logging` to `buffered_logging`, include files r=dhartunian a=abarganier Addresses: #120759 Previously, the `network_logging` roachtest was created to put the `bufferedSink` (an async producer+consumer buffer that we primarily use to wrap network log sinks) through its paces. Originally, this functionality was only available to network (`fluent`,`http`) log sinks, and so the test did not include any file based log sinks. As of #120428, we now support the ability to use async buffering on `file-group` log sinks as well, via the same `bufferedSink`. Therefore, it makes sense that we should expand our existing `network_logging` roachtest to also include file sinks. This commit does so by expanding the log config to enable async buffering for files by default, and adds three `file-groups` log sinks to the config that capture all log channels. This will now run alongside the existing network logging sinks. As a final step, since this test is no longer specific to network logging, we rename the roachtest from `network_logging` to `buffered_logging`. Release note: none Epic: CRDB-35401 Co-authored-by: Alex Barganier <abarganier@cockroachlabs.com>
123431: roachtest: change `network_logging` to `buffered_logging`, include files r=dhartunian a=abarganier Addresses: #120759 Previously, the `network_logging` roachtest was created to put the `bufferedSink` (an async producer+consumer buffer that we primarily use to wrap network log sinks) through its paces. Originally, this functionality was only available to network (`fluent`,`http`) log sinks, and so the test did not include any file based log sinks. As of #120428, we now support the ability to use async buffering on `file-group` log sinks as well, via the same `bufferedSink`. Therefore, it makes sense that we should expand our existing `network_logging` roachtest to also include file sinks. This commit does so by expanding the log config to enable async buffering for files by default, and adds three `file-groups` log sinks to the config that capture all log channels. This will now run alongside the existing network logging sinks. As a final step, since this test is no longer specific to network logging, we rename the roachtest from `network_logging` to `buffered_logging`. Release note: none Epic: CRDB-35401 Co-authored-by: Alex Barganier <abarganier@cockroachlabs.com>
Informs: #72452
Epic: CRDB-35401
For some time, two buffering mechanisms have existed within the log
package. One is controlled by
buffered-writes
, which is only used bythe file sink.
buffered-writes
buffers writes until a buffer is full,after which it flushes the buffer to an underlying file. This is not an
asynchronous operation.
Separately, the
bufferedSink
was introduced to provide an asyncbuffering mechanism. This was originally intended for network log sinks
such as the
fluentSink
andhttpSink
. Originally, thebufferedSink
was not allowed for use with the
fileSink
as we already had abuffering mechanism available.
Today, we find ourselves in a situation where we aim to make CRDB even
more resilient in the face of disk failures. However, some of these
efforts like WAL failover are unable to achieve their goals if the
logging subsystem isn't resilient to disk unavailability as well.
Today, if a file sink is configured to write to an unavailable disk, any
goroutine making a logging call to one of those sinks will hang as the
underlying i/o operations used are unable to interact with the disk.
This can have a severe negative impact on cluster performance and
throughput, even when using
buffered-writes: true
.As a way to alleviate this pain, this patch experimentally enables the
usage of the
bufferedSink
with thefileSink
. ThebufferedSink
isused to "wrap" an underlying sink (such as a
fileSink
), andasynchronously outputs buffered data to said sink. It is especially good
at shielding writers from any problems with the underlying sink, which
makes it a great candidate to help us accomplish our resiliency goals
when it comes to disk outage scenarios.
Users can leverage the
bufferedSink
by using thebuffering
config option.For example:
Note that we don't allow both
buffered-writes
andbuffering
to beused together. This patch adds a validation step to ensure this.
Release note (ops change): Users have the ability to update their log
configuration to enable asynchronous buffering of
file-group
logsinks via the
buffering
configurationoptions.
The
buffering
configuration can be applied tofile-defaults
orindividual
file-groups
as needed.Note however that
buffering
andbuffered-writes
are incompatible forthe time being. If you'd like to try the
buffering
option on a filelog sink, it's necessary to explicitly set
buffered-writes: false
. Forexample:
We recommend a
max-staleness: 1s
andflush-trigger-size: 256KiB
whenusing this feature with file log sinks.