pkg/util/log: change default httpSink timeout to 2s #109264
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
9bab107 to 6e22968
cc @florence-crl - this will require a docs change to update the |
6e22968 to 7657f58
Do we have the same thing with `fluent-servers`?
Reviewed 4 of 5 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)
TFTR!

`fluent-servers` don't have timeout config options, but they do have hardcoded dial and write timeouts (5 & 1 seconds, respectively):
cockroach/pkg/util/log/fluent_client.go, lines 38 to 39 in abd9c99:
const fluentDialTimeout = 5 * time.Second
const fluentWriteTimeout = time.Second
This makes me wonder - should the default http timeout be more aggressive than `60s`? Maybe `5s`?
7657f58 to b27e26b
Based on some internal discussion, I've dropped the default from |
Previously, the httpSink had no timeout at all by default. This meant that the first call to `output()` while the http target was unavailable would hang forever. This would deadlock the calling goroutine, whether that's the bufferedSink flush goroutine, or (even worse) a server goroutine in the event that the httpSink is not buffered.

Our default timeout should not deadlock in the worst case scenario. Admittedly, `2s` would also cause a noticeable performance degradation in the event that the httpSink was unbuffered, but it would at least be able to emit logs indicating the timeout as the cause. Availability would also be maintained to some degree. Previously, the deadlocks due to no timeout being set by default meant that no indication was ever given that the httpSink was unable to reach the http target.

Release note (ops change): The default value of `timeout` for `http-servers` logging sinks has been changed from "no timeout" to `2s`. This will be reflected in the `http-defaults` section of the log configuration. Users still maintain the ability to override the timeout, or disable it by explicitly setting it to `0` (e.g. `timeout: 0`).
b27e26b to 91dcefd
Why align with our network timeouts vs |
See discussion here: https://cockroachlabs.slack.com/archives/C01CDD4HRC5/p1692804661841219

We don't have precedent for setting conditional defaults for log configs, AFAIK (e.g. using a

The async buffer (if enabled, which it is by default, but can be disabled) is leaky once a limit is reached.

Regarding metrics for this, see #72453. The infra wasn't there to record this until quite recently, see #106607.
bors r=dhartunian
Build succeeded:
Fixes: #109263