Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference and gRPC servers can send independent header and trailer metadata #840

Conversation

jhump
Copy link
Member

@jhump jhump commented Mar 28, 2024

The connect-go API doesn't provide a handler a direct way to send independent response headers and trailers from a unary or client-stream RPC. This is because a handler may return either a *connect.Response or a *connect.Error, but not both. But a *connect.Error does not allow setting both headers and trailers: it has only a Meta() accessor, which ostensibly could be serialized using either headers or trailers.

The reason this is the case is because frameworks can sometimes decide to send all metadata together, such as in a "trailers-only" response (allowed by the gRPC and gRPC-Web protocols for more compact replies when there is no message data). Because a framework decision could change whether a piece of metadata shows up as a header or a trailer, clients should generally provide a combined view of headers and trailers in the face of an error, to avoid bugs where the client looks for the metadata as a header, but it was actually serialized as a trailer (i.e. in a "trailers-only" response).

Having said all that, it can be surprising for users to see headers and trailers combined in the response from the reference server, in an HTTP trace. So to make the reference server's behavior more intuitive, we send headers and trailers separately -- by going around the direct APIs offered by connect-go.

For client streams, there is an indirect way provided: one can access the underlying connect.StreamingHandlerConn and then directly interact with response headers and trailers.

But for unary RPCs, we have to be sneakier. For the Connect unary protocol, all metadata gets serialized as headers. And trailers are differentiated via a "trailer-" prefix. So we can just prefix the metadata that is intended to be trailers with "trailer-". For the gRPC and gRPC-Web protocols, there are no simple hacks to get the connect-go framework to reply as desired. So we leverage the existing "raw response" functionality to go around the framework and directly control what is written to the wire for the response for these cases.


resp := connect.NewResponse(&conformancev1.UnaryResponse{
Payload: payload,
return doUnary(ctx, req, s.referenceMode, func(payload *conformancev1.ConformancePayload) *conformancev1.UnaryResponse {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤩

@jhump jhump merged commit 06820d5 into main Apr 2, 2024
6 checks passed
@jhump jhump deleted the jh/server-can-send-independent-headers-and-trailers-in-reference-mode branch April 2, 2024 15:51
jhump added a commit to connectrpc/connect-kotlin that referenced this pull request Apr 2, 2024
We are hoping to cut a _final_ v1.0.0 of the conformance suite this
week. It has a couple of minor changes from rc4. I wanted to make sure
they wouldn't cause issues (particularly, [this pending
PR](connectrpc/conformance#840), which could
possibly impact some of the logic I just added in #248 related to
reporting trailers and merging headers+trailers in error metadata).

And... this ended up exposing some issues.

For one, I had not updated the Connect stream protocol in the same way
as the gRPC-Web and gRPC protocols regarding merging of headers+trailers
in metadata. I didn't notice in the previous PR because it was actually
the "trailers only" responses that tickled test failures related to this
(the connect-go reference server will only use "trailers only" responses
for gRPC-Web, not for other protocols). For two, this code path was not
correctly setting the trailers of the `StreamResult.Completed` object.
Oops 🤦.

After I fixed that, I just happened to observe a conformance test flake
that caused the client to hang. It turned out to be due to uncaught
exception while trying to drain the response body in a unary RPC. So
that's what warranted the other changes in here, adding `try/catch` to
the unary flow of `ConnectOkHttpClient` and extracting the
`safeTrailers` helper so it can also be used from that unary flow.

While making the above change, I greatly simplified the `safeTrailers`
helper because it seemed to be incorrect: it was almost always computing
empty trailers in these unary flows (which causes issues for gRPC calls,
which rely on trailers to convey the status code). After re-reading the
code, it's not clear why it was wrong, and I didn't have the patience to
try to put it in a debugger to step through it and understand it. I just
simplified it, removing the questionable conditional, and that fixed it.
jhump added a commit that referenced this pull request Apr 5, 2024
I thought I had tested the change in #840 against a bunch of other
implementations, but I clearly missed this. In that PR, the code was
not correctly percent-encoding the "grpc-message" trailer. It turns out
that most implementations (including grpc-java, grpc-go, and connect-go)
were all lenient in the face of this issue, which is _how_ I missed
this.

What was happening is that `net/http` was _dropping_
the invalid header (since one of the tests has a newline in the message,
which is not allowed in headers) but it was still being picked up from
the `grpc-status-details-bin` header. Connect-kotlin seems to be
the only implementation that insists on using the `grpc-message`
value as the error message, regardless of what's in
`grpc-status-details-bin`, so a test case failed since the error message
it reported did not match the expected value. (I could have _sworn_
I tested that commit against kotlin last week, but I guess I didn't
actually do that correctly.)

So this fixes the issue: when the reference server synthesizes a raw
response, to reply with independent header and trailer metadata, it now
correctly encodes the "grpc-message" key.

The _bulk_ of this PR is actually adding new checks in the reference
client that could have caught this issue: it examines the "grpc-status",
"grpc-message", and "grpc-status-details-bin" values to verify that
the server implementation adheres to the specs. It also applies
similar checks to other binary metadata.

There are a handful of small other changes in here too that I addressed
while troubleshooting and fixing this issue.
pkwarren pushed a commit to connectrpc/connect-kotlin that referenced this pull request Apr 19, 2024
We are hoping to cut a _final_ v1.0.0 of the conformance suite this
week. It has a couple of minor changes from rc4. I wanted to make sure
they wouldn't cause issues (particularly, [this pending
PR](connectrpc/conformance#840), which could
possibly impact some of the logic I just added in #248 related to
reporting trailers and merging headers+trailers in error metadata).

And... this ended up exposing some issues.

For one, I had not updated the Connect stream protocol in the same way
as the gRPC-Web and gRPC protocols regarding merging of headers+trailers
in metadata. I didn't notice in the previous PR because it was actually
the "trailers only" responses that tickled test failures related to this
(the connect-go reference server will only use "trailers only" responses
for gRPC-Web, not for other protocols). For two, this code path was not
correctly setting the trailers of the `StreamResult.Completed` object.
Oops 🤦.

After I fixed that, I just happened to observe a conformance test flake
that caused the client to hang. It turned out to be due to uncaught
exception while trying to drain the response body in a unary RPC. So
that's what warranted the other changes in here, adding `try/catch` to
the unary flow of `ConnectOkHttpClient` and extracting the
`safeTrailers` helper so it can also be used from that unary flow.

While making the above change, I greatly simplified the `safeTrailers`
helper because it seemed to be incorrect: it was almost always computing
empty trailers in these unary flows (which causes issues for gRPC calls,
which rely on trailers to convey the status code). After re-reading the
code, it's not clear why it was wrong, and I didn't have the patience to
try to put it in a debugger to step through it and understand it. I just
simplified it, removing the questionable conditional, and that fixed it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants