
add write queue to Http2Connection to optimize write handling #39166

Merged Jul 14, 2020 (6 commits)

Conversation

geoffkizer
Copy link
Contributor

This gives about a 33% improvement for the raw variation and a 120% improvement for the full gRPC client variation on the gRPC benchmark at https://github.com/JamesNK/Http2Perf.

@stephentoub @JamesNK @dotnet/ncl

@ghost
Copy link

ghost commented Jul 12, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

Excerpts from the diff under review:

```csharp
private async Task PerformWriteAsync<T>(int writeBytes, T state, Func<T, Memory<byte>, FlushTiming> lockedAction, CancellationToken cancellationToken = default)

PerformWriteAsync(0, 0, (_, __) => true, cancellationToken);

private abstract class WriteQueueEntry : TaskCompletionSource
```
Copy link
Member

Doesn't need to be part of this PR, but given we're now allocating this on every write, it'd be really good to look at changing the base type to be one that implements IValueTaskSource and making this reusable.

Copy link
Contributor Author

Agreed, though it gets a little weird because we have a bunch of different write operations with different kinds of state.
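To illustrate the direction being suggested, here is a minimal sketch of a reusable entry backed by ManualResetValueTaskSourceCore (the class name and shape are hypothetical, not the PR's code; the per-state wrinkle mentioned above is not addressed here):

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Hypothetical sketch: a write-queue entry backed by
// ManualResetValueTaskSourceCore instead of TaskCompletionSource, so a
// pooled instance can be awaited, Reset(), and reused, avoiding a fresh
// allocation on every write.
public sealed class ResettableWriteEntry : IValueTaskSource
{
    private ManualResetValueTaskSourceCore<bool> _core;

    // Awaited by the caller that queued the write.
    public ValueTask WaitAsync() => new ValueTask(this, _core.Version);

    // Called by the writer loop when the entry has been processed.
    public void SetComplete() => _core.SetResult(true);
    public void SetFailed(Exception e) => _core.SetException(e);

    // Reset before returning the instance to a pool for reuse.
    public void Reset() => _core.Reset();

    void IValueTaskSource.GetResult(short token) => _core.GetResult(token);
    ValueTaskSourceStatus IValueTaskSource.GetStatus(short token) => _core.GetStatus(token);
    void IValueTaskSource.OnCompleted(Action<object?> continuation, object? state,
        short token, ValueTaskSourceOnCompletedFlags flags) =>
        _core.OnCompleted(continuation, state, token, flags);
}
```

Each instance supports one await per cycle; after the await completes, Reset() advances the version token so the entry can go back to a pool.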

Copy link
Member

There's a tradeoff to be made here. This is similar to what pipelines does: you don't wait on individual writes, but you can have backpressure based on how much outstanding buffer is unflushed/unconsumed. That removes the per-operation overhead and allocations.

Do you care about cancelling individual writes or do you care about stopping the overall writing operation?
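As a rough sketch of that tradeoff (illustrative, not the PR's code), a bounded Channel<T> gives backpressure by capacity rather than by awaiting each write:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Sketch: writers enqueue buffers into a bounded channel and do not await
// each write individually. Backpressure comes from the channel's capacity:
// WriteAsync only waits when too much is outstanding, removing the
// per-operation wait in the common case.
var channel = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity: 16)
{
    SingleReader = true,                   // one loop drains to the socket
    FullMode = BoundedChannelFullMode.Wait // writers wait only when full
});

// Producer side: enqueue and move on; this only blocks when the queue is full.
await channel.Writer.WriteAsync(new byte[] { 1, 2, 3 });
channel.Writer.Complete();

// Consumer side: a single loop drains queued buffers and flushes in one pass.
await foreach (byte[] buffer in channel.Reader.ReadAllAsync())
{
    Console.WriteLine($"flushing {buffer.Length} bytes");
}
```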

@geoffkizer
Copy link
Contributor Author

Pushed changes to address everything except Channel<T>, which is more involved. Will try that change and see what happens...

@JamesNK
Copy link
Member

JamesNK commented Jul 12, 2020

I'm curious why the gRPC client gets such a large performance improvement compared to the "raw HttpClient" scenario. If anything I would have expected a smaller benefit to the gRPC client just because HttpClient+HTTP/2 makes up a smaller percentage of the work happening when it is used.

Have you profiled where the improvement comes from?

@geoffkizer
Copy link
Contributor Author

I'm curious why the gRPC client gets such a large performance improvement compared to the "raw HttpClient" scenario. If anything I would have expected a smaller benefit to the gRPC client just because HttpClient+HTTP/2 makes up a smaller percentage of the work happening when it is used.

My assumption is that the "raw" mode isn't actually equivalent to the full gRPC client in terms of HTTP/2 usage. That said, I haven't looked at the code for either in detail. Are you aware of any differences here?

Have you profiled where the improvement comes from?

No.

@JamesNK
Copy link
Member

JamesNK commented Jul 13, 2020

gRPC client has a custom HttpContent and reads content from Stream.

If you check out the readme at https://github.com/JamesNK/Http2Perf you can see there are four raw options. The closest one to the gRPC client is "r-stream-all".

@geoffkizer
Copy link
Contributor Author

I'll try running with "r-stream-all" when my machine is back in a normal state (hopefully soon).

@davidfowl
Copy link
Member

davidfowl commented Jul 13, 2020

Feels like this could use a Channel<T>

edit: Just saw this #39166 (comment)

@geoffkizer
Copy link
Contributor Author

I see similar (but slightly better) results with "r-stream-all" as compared to "g":

r-stream-all:
baseline: 56
this PR: 129
difference: +130%

g (full gRPC client):
baseline: 51
this PR: 113
difference: +122%

@JamesNK
Copy link
Member

JamesNK commented Jul 13, 2020

Ok! So this change offers bigger perf benefits to custom HttpContent implementations than to ByteArrayContent.

For reference: https://github.com/JamesNK/Http2Perf/blob/65e634321bdb76dcb6311dc2396a9e0504fbe189/GrpcSampleClient/PushUnaryContent.cs#L22-L36

Edit: Interesting, the benchmark still has the older copy of PushUnaryContent with multiple writes; they were merged into one write to speed up the gRPC client. Is there still a performance hit from writing to the request stream multiple times?
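For illustration, a hypothetical single-write HttpContent along those lines (the class name is assumed, not the benchmark's code; the prefix layout follows the gRPC message format: 1 compression byte plus a 4-byte big-endian length):

```csharp
#nullable enable
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch: combine the gRPC message prefix and the payload into one buffer
// so the request stream sees a single write instead of several.
public sealed class SingleWriteContent : HttpContent
{
    private readonly byte[] _payload;

    public SingleWriteContent(byte[] payload) => _payload = payload;

    protected override Task SerializeToStreamAsync(Stream stream, TransportContext? context)
    {
        var buffer = new byte[5 + _payload.Length];
        // buffer[0] stays 0: uncompressed message.
        buffer[1] = (byte)(_payload.Length >> 24);   // 4-byte message length,
        buffer[2] = (byte)(_payload.Length >> 16);   // big-endian
        buffer[3] = (byte)(_payload.Length >> 8);
        buffer[4] = (byte)_payload.Length;
        _payload.CopyTo(buffer, 5);
        // One combined write to the request stream.
        return stream.WriteAsync(buffer, 0, buffer.Length);
    }

    protected override bool TryComputeLength(out long length)
    {
        length = 5 + _payload.Length;
        return true;
    }
}
```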

@geoffkizer
Copy link
Contributor Author

Is there still a performance hit from writing to the request stream multiple times?

There could be.

@geoffkizer
Copy link
Contributor Author

The Channel<T> stuff works nicely, and seems to be a wash in terms of perf.

Copy link
Member

@JamesNK JamesNK left a comment

From my HttpClient novice perspective 😄

@davidfowl
Copy link
Member

@geoffkizer would you be open to trying a branch using pipelines internally (even if we don't check it in) for a performance comparison?

The only reason I ask is because the Pipe implementation is really close to the usage here.

@stephentoub stephentoub merged commit a00b1e2 into dotnet:master Jul 14, 2020
@geoffkizer
Copy link
Contributor Author

@davidfowl I'd be happy to give it a try, but I'm not sure how you see pipelines fitting in here. Can you explain?

@davidfowl
Copy link
Member

The mix of ArrayBuffer + Channel is basically what the Pipe implementation gives you. This logic introduced into Http2Connection is what Kestrel does with a pipe: the Pipe is used as the queue of bytes being shuffled between the application logic (the Http2Stream) and the logic writing to the socket (the Http2Connection).
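A minimal sketch of that Pipe shape (illustrative, not Kestrel's actual code): one side writes bytes into the pipe, a separate loop reads them back out toward the socket, and the Pipe supplies the buffering, queueing, and backpressure that ArrayBuffer + Channel would otherwise provide.

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Text;
using System.Threading.Tasks;

var pipe = new Pipe();

// Application side (e.g. an HTTP/2 stream) writes payload bytes and flushes.
async Task ProduceAsync()
{
    byte[] payload = Encoding.ASCII.GetBytes("frame payload");
    await pipe.Writer.WriteAsync(payload);
    await pipe.Writer.CompleteAsync();
}

// Connection side (e.g. the socket write loop) drains whatever is buffered.
async Task ConsumeAsync()
{
    while (true)
    {
        ReadResult result = await pipe.Reader.ReadAsync();
        ReadOnlySequence<byte> buffer = result.Buffer;
        Console.WriteLine($"writing {buffer.Length} bytes to the socket");
        pipe.Reader.AdvanceTo(buffer.End);
        if (result.IsCompleted) break;
    }
    await pipe.Reader.CompleteAsync();
}

await Task.WhenAll(ProduceAsync(), ConsumeAsync());
```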

@geoffkizer
Copy link
Contributor Author

Yeah, but we are not just shuffling bytes here. The callback is responsible for a bunch of stuff other than just pushing bytes to the connection, e.g. assigning stream IDs, framing, etc.

@davidfowl
Copy link
Member

davidfowl commented Jul 14, 2020

Yeah, but we are not just shuffling bytes here. The callback is responsible for a bunch of stuff other than just pushing bytes to the connection, e.g. assigning stream IDs, framing, etc.

In the end it's all bytes; you'd just shift from doing it earlier to doing it later. You enqueue the write and then frame the buffers later, on the way out. This has the added benefit of potentially batching bigger writes together, framed as a single HTTP/2 DATA frame on the wire.
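As a sketch of "framing on the way out" (illustrative; the header layout follows the HTTP/2 DATA frame format from RFC 7540 §6.1), the write loop would batch whatever payload bytes are queued and prepend a single 9-byte frame header when writing:

```csharp
using System;

// Sketch: prepend one 9-byte HTTP/2 frame header (24-bit length, type,
// flags, 31-bit stream id) around a batched payload, instead of framing
// each individual write eagerly.
static byte[] FrameData(byte[] payload, int streamId, bool endStream)
{
    var frame = new byte[9 + payload.Length];
    frame[0] = (byte)(payload.Length >> 16);      // 24-bit payload length,
    frame[1] = (byte)(payload.Length >> 8);       // big-endian
    frame[2] = (byte)payload.Length;
    frame[3] = 0x0;                               // frame type: DATA
    frame[4] = endStream ? (byte)0x1 : (byte)0x0; // flags: END_STREAM
    frame[5] = (byte)(streamId >> 24);            // 31-bit stream identifier,
    frame[6] = (byte)(streamId >> 16);            // big-endian (high bit reserved)
    frame[7] = (byte)(streamId >> 8);
    frame[8] = (byte)streamId;
    payload.CopyTo(frame, 9);
    return frame;
}

byte[] framed = FrameData(new byte[] { 1, 2, 3 }, streamId: 1, endStream: true);
Console.WriteLine($"framed {framed.Length} bytes");
```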

For reference, here's Kestrel's implementation:

This is where the loop is kicked off:

https://github.com/dotnet/aspnetcore/blob/a861d18d244a632281ef4a1402f29ad1051e5bf9/src/Servers/Kestrel/Core/src/Internal/Http2/Http2OutputProducer.cs#L77

This is the loop itself:
https://github.com/dotnet/aspnetcore/blob/a861d18d244a632281ef4a1402f29ad1051e5bf9/src/Servers/Kestrel/Core/src/Internal/Http2/Http2OutputProducer.cs#L392

Http2FrameWriter is the component that does the flow control and adds HTTP/2 framing to the bytes:
https://github.com/dotnet/aspnetcore/blob/a861d18d244a632281ef4a1402f29ad1051e5bf9/src/Servers/Kestrel/Core/src/Internal/Http2/Http2OutputProducer.cs#L451

The callsite writes to the pipe directly and flushes:
https://github.com/dotnet/aspnetcore/blob/a861d18d244a632281ef4a1402f29ad1051e5bf9/src/Servers/Kestrel/Core/src/Internal/Http2/Http2OutputProducer.cs#L233

cc @halter73 to clarify if I got anything wrong

@geoffkizer
Copy link
Contributor Author

Are you talking about specifically for the request body (not headers etc)? Because if so, then I get it and I agree. I'm planning on investigating something like this shortly. There's some additional groundwork that needs to happen first, though.

@JamesNK
Copy link
Member

JamesNK commented Jul 15, 2020

Are you talking about specifically for the request body (not headers etc)? Because if so, then I get it and I agree.

Yes, although in Kestrel's case everything is reversed because it is the server. The pipe that is being read from contains body bytes written by the executing RequestDelegate. They are intended to be the response's DATA frames.

The ProcessDataWrites method writes the bytes it reads from the pipe as DATA frames (I believe the first write will send headers) and waits for ReadResult.IsCompleted so it can end the stream (either with DATA + END_STREAM flag or HEADERS trailers + END_STREAM flag).

@geoffkizer
Copy link
Contributor Author

Yes, we do something similar for reading from the connection today; what we don't do is any buffering when we write to the connection. Instead we just pass the user's buffer along and wait for the write to be processed. Part of what I'm planning to explore is buffering user writes, which potentially enables a few things: (1) not having to wait on the write queue (assuming buffer space isn't exhausted); (2) consolidating user writes, including the end-stream flag; (3) simplifying some handling in the connection write logic.

@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@geoffkizer geoffkizer deleted the http2writequeue branch November 7, 2020 23:42
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020