-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util: reduce log spam during network breakdown #51937
Conversation
By the way, the frequency (30s) is something I pulled out of thin air. I can revive the goid based duplication if asked, or even elevate the specific warnings to errors now that they're a bit more infrequent. The PR was basically reviving what we already had in place with this |
8876595
to
e1c39b6
Compare
LGTM - no objection on maintaining this technical debt I wonder though if it's possible to identify the error by looking at the call stack instead: if one of the caller is the connect function, don't log (or something like that). You'll need to rebase to address the CI failure (or merge as-is: TestComposeGSS is now green on master) |
e1c39b6
to
2ff9827
Compare
TFTR, I've rebased. I thought about trying extract a more specific error and convinced myself that the callstack would be as fragile (ish) as what we've done here. Was also being lazy. bors r+ |
Flaked on #52057, retrying. |
Touches cockroachdb#32581. We currently spam a lot whenever there is any issues with network connectivity. Seems like we used to dedup this log spam away before, but the strings we were matching against have gone stale (there's no more `addConn.resetTransport` in grpc code). Also, we used to dedup log spam away by looking at goroutine thread ID, instead of the message itself, which this patch does instead. This reduces spam in situations where we liberally construct one-off connections to downed/partitioned nodes (for e.g., during start up). ``` W200727 19:02:29.108962 392 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 grpc: addrConn.createTransport failed to connect to {127.0.0.1:26257 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:26257: connect: connection refused". Reconnecting... W200727 19:02:30.111670 392 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 grpc: addrConn.createTransport failed to connect to {127.0.0.1:26257 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting... ``` Release note (general change): Previously CRDB would spam its logs when undergoing network connectivity issues. This patch reduces the frequency of said spam.
2ff9827
to
e3940e2
Compare
bors r+ |
Flaked on #52026. |
Build succeeded: |
Touches #32581. We currently spam a lot whenever there is any issues
with network connectivity. Seems like we used to dedup this log spam
away before, but the strings we were matching against have gone stale
(there's no more
addConn.resetTransport
in grpc code). Also, we usedto dedup log spam away by looking at goroutine thread ID, instead of the
message itself, which this patch does instead. This reduces spam in
situations where we liberally construct one-off connections to
downed/partitioned nodes (for e.g., during start up).
Release note (general change): Previously CRDB would spam its logs
when undergoing network connectivity issues. This patch reduces the
frequency of said spam.