Fix unexpected errors when cancelling QUIC stream mid RPC#408
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes QUIC stream-close behavior to avoid corrupting in-flight RPC reads on cancelled streams, and improves QUIC/WebTransport connection-close observability by adding close reasons.
Changes:
- Update
stream.Close()to cancel the read side explicitly instead of attempting to drain bytes. - Add descriptive reason strings to
CloseWithErrorcalls for QUIC/WebTransport connections and sessions. - Add changeset entries documenting both changes.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
rhp/v4/quic/quic.go |
Prevents concurrent close/drain from stealing bytes; adds QUIC close reason strings for better logging/debuggability. |
.changeset/pass_proper_descriptions_when_closing_quic_connection.md |
Documents the addition of close reason strings. |
.changeset/always_cancel_read_side_of_stream_when_calling_streamclose.md |
Documents the stream read-cancellation change on close. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
.changeset/pass_proper_descriptions_when_closing_quic_connection.md
Outdated
Show resolved
Hide resolved
.changeset/always_cancel_read_side_of_stream_when_calling_streamclose.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
rhp/v4/quic/quic.go
Outdated
| conn, err := quic.DialAddr(ctx, addr, tc, qc) | ||
| // Dial the QUIC connection using 0-RTT | ||
| // https://quic-go.net/docs/quic/client/#0rtt-security | ||
| conn, err := quic.DialAddrEarly(ctx, addr, tc, qc) |
There was a problem hiding this comment.
Due to the design of the TLS protocol, clients cannot directly request session tickets or unilaterally enable 0-RTT. These capabilities depend on the server’s configuration and support.
It needs to use TLS session resumption. There’s no way to use 0-RTT without a TLS session ticket.
The server’s support for session resumption, indicated by the session ticket issued on the initial connection, must be present.
The ALPN (configured using tls.Config.NextProtos) on the new connection must be the same.
Are we sure 0-RTT will "just work"? Otherwise LGTM
There was a problem hiding this comment.
It looks like we need to manually handle rejections. Probably drop the 0-RTT change for now.
conn, err := tr.DialEarly(ctx, <server address>, <tls.Config>, <quic.Config>)
// ... error handling
_, err :=conn.AcceptStream()
if errors.Is(err, quic.Err0RTTRejected) {
// The server rejected 0-RTT. None of the data sent so far was processed.
newConn := conn.NextConnection()
}
There was a problem hiding this comment.
I'll leave this out for now and only enable it explicitly on the server side as that is the default for the http3.Server anyway.
This PR is related to this other open PR: https://github.com/SiaFoundation/indexd/pull/864
When testing it out I noticed errors in the logs such as
These errors were always happening on cancelled streams.
The problem is that
Closeread up to 4kib of data while an ongoing RPC might be reading data at the same time. Depending on the timing that would corrupt response data in different ways. By always explicitly closing the reading side, those errors disappeared because we didn't "steal" data from the stream and the RPC still got unblocked.The reason these errors are bad in the first place is that they are neither context errors, quic errors nor rpc errors. So they lead to the transport being reset which harmed performance.
The other part of this PR is adding descriptions whenever we call
CloseWithError. That's just useful information to have when logging a QUIC error.