-
Notifications
You must be signed in to change notification settings - Fork 838
Avoid unnecessary RST_STREAM and PING frames causing GOAWAY messages #4480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…EAM and PING frames
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR prevents GOAWAY errors from API servers by ensuring HTTP response bodies are fully consumed before closing connections. The changes implement a workaround for HTTP/2 connection management issues where closing a partially-read response body triggers unnecessary RST_STREAM and PING frames that can cause servers to send GOAWAY messages.
Key changes:
- Added
io.Copy(io.Discard, resp.Body)calls before closing response bodies in error paths and after successful parsing - Modified deferred
resp.Body.Close()calls to deferred functions that first drain the body - Added the
ioimport to files where it was previously not needed
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| utils/rpc/json.go | Added body draining in error paths and after successful JSON decoding |
| utils/dynamicip/ifconfig_resolver.go | Changed defer statement to drain body before closing |
| tests/fixture/tmpnet/node.go | Changed defer statement to drain body before closing |
| tests/fixture/tmpnet/monitor_processes.go | Changed defer statement to drain body before closing |
| tests/fixture/tmpnet/check_monitoring.go | Changed defer statement to drain body before closing |
| api/metrics/client.go | Added body draining in error paths and after successful metric parsing |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
utils/rpc/json.go
Outdated
| // CleanlyCloseBody avoids sending unnecessary RST_STREAM and PING frames by ensuring | ||
| // the whole body is read before being closed. | ||
| // See https://blog.cloudflare.com/go-and-enhance-your-calm/#reading-bodies-in-go-can-be-unintuitive | ||
| func CleanlyCloseBody(body io.ReadCloser) error { | ||
| _, _ = io.Copy(io.Discard, body) | ||
| return body.Close() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember when first adding the error returning logic to the prod clients that it was kind of "we should probably return the error if one happens"... But honestly, if the client was able to get the response... I feel like dropping the error isn't unreasonable.
Should we just ignore this everywhere and make this function not return an error? We could ignore the error once here, and then everywhere just defer CleanlyCloseBody(resp.Body)
| // CleanlyCloseBody avoids sending unnecessary RST_STREAM and PING frames by ensuring | |
| // the whole body is read before being closed. | |
| // See https://blog.cloudflare.com/go-and-enhance-your-calm/#reading-bodies-in-go-can-be-unintuitive | |
| func CleanlyCloseBody(body io.ReadCloser) error { | |
| _, _ = io.Copy(io.Discard, body) | |
| return body.Close() | |
| } | |
| // CleanlyCloseBody avoids sending unnecessary RST_STREAM and PING frames by | |
| // ensuring the whole body is read before being closed. | |
| // See https://blog.cloudflare.com/go-and-enhance-your-calm/#reading-bodies-in-go-can-be-unintuitive | |
| func CleanlyCloseBody(body io.ReadCloser) { | |
| _, _ = io.Copy(io.Discard, body) | |
| _ = body.Close() | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very reasonable and I'm happy to make this change because it cleans up the code, though I do think it's a potential behavior change outside the scope of the intention of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my testing, but io.Copy can return a cancelation error since it respects the context - so we might want to bubble up that error instead.
| _ = resp.Body.Close() | ||
| }() | ||
|
|
||
| ipBytes, err := io.ReadAll(resp.Body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already read until EOF here - do we still need to drain the response body?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have the CleanlyCloseBody helper, I think it's most defensive to use it uniformly across the board rather than depending on the fact that the body is always fully read otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment in this diff in this package w.r.t io.ReadAll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, now that we have the CleanlyCloseBody helper, I think it's most defensive to use it uniformly across the board rather than depending on the fact that the body is always fully read otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this client behavior update in this file? This is only used in e2e tests so I feel like this is over-kill for an environment that is just a smoke test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes a difference either way practically speaking, but I think applying best practices everywhere is better than not, particularly now that we're just uniformly deferring the call to the helper everywhere (which I realize is a change from when this comment was left). Thoughts?
Co-authored-by: Stephen Buttolph <stephen@avalabs.org>
StephenButtolph
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Hopefully this will be avoided upstream. But until then this seems to be a reasonable fix.
Why this should be merged
Avoids GOAWAY errors from various API servers by avoiding client implementations that send extra unnecessary RST_STREAM and PING frames. The workaround is adopted from the Cloudflare blog linked below, and in comments.
How this works
See https://blog.cloudflare.com/go-and-enhance-your-calm/#reading-bodies-in-go-can-be-unintuitive
How this was tested
Using a minimal reproduction script here. On that branch, running the script with the changes in
json.goshould yield no errors, but removing the changes injson.goand running the script consistently results in ~125 GOAWAY errors.Open to suggestions for how to better test it in an automated fashion. I could not think of a good way to do so myself since it depends on the external server setup.
Need to be documented in RELEASES.md?
No