-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excessive allocations in HttpContent.ReadAsByteArrayAsync #81628
Comments
Tagging subscribers to this area: @dotnet/ncl Issue DetailsDescriptionWhen using I'm trying to find ways to heavily reduce memory allocations while still having full access to the power Configuration.NET 7.0.101, Windows 11 Pro, build 22621, x64 Regression?Same thing observed with .NET 6, allocations almost identical. DataBenchmark: [MemoryDiagnoser]
public class HttpTests
{
private readonly HttpClient client;
private const string url = "http://sampleserver6.arcgisonline.com/arcgis/rest/services/Census/MapServer/3/query?where=1%3D1&text=&objectIds=&time=&timeRelation=esriTimeRelationOverlaps&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Foot&relationParam=&outFields=*&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=&havingClause=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&historicMoment=&returnDistinctValues=false&resultOffset=&resultRecordCount=&returnExtentOnly=false&sqlFormat=none&datumTransformation=¶meterValues=&rangeValues=&quantizationParameters=&featureEncoding=esriDefault&f=pjson";
public HttpTests()
{
client = new HttpClient();
}
[Benchmark]
public async Task<byte[]> Client_GetByteArrayAsync()
{
// Allocates ~1mb, about size of response
var response = await client.GetByteArrayAsync(url).ConfigureAwait(false);
return response;
}
[Benchmark]
public async Task<byte[]> Client_Send_GetByteArrayAsync()
{
// Allocates ~4mb, about 4 times the response size
var request = new HttpRequestMessage(HttpMethod.Get, url);
var response = await client.SendAsync(request);
var bytes = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
return bytes;
}
}
Now I was able to reduce memory by about 1mb, using this approach, but still quite short of the simple public async Task<int> ReadAsSpanAsync()
{
var request = new HttpRequestMessage(HttpMethod.Get, url);
var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false);
using MemoryStream stream = new MemoryStream();
await response.Content.CopyToAsync(stream);
ReadOnlyMemory<byte> bytes;
if (!stream.TryGetBuffer(out ArraySegment<byte> dataSegment))
throw new Exception("Can't get buffer");
bytes = dataSegment;
if(bytes.Span.Length == 0)
throw new Exception("Buffer empty");
return bytes.Span.Length;
}
AnalysisLooking at the HttpContent, lots of byte array copying is performed to ensure no access to the internal memorystream that was buffered to. It might be helpful to create some access to this data with
|
I'm unclear as to the issue being reported. The title says that HttpContent.GetByteArrayAsync had excessive allocation, but there isn't such a method, and the text of the issue says HttpClient.GetByteArrayAsync had the desired allocation. Is this issue asking how to write code that reads the response stream and achieves the same allocations as HttpClient.GetByteArrayAsync? Or did you intend to ask about HttpContent.ReadAsByteArrayAsync? |
@stephentoub this screen explains the issue in a glance https://twitter.com/dotMorten/status/1621672844760592385?t=mpRkFiS_w25HzqUxGsL1Tg&s=19 (Sorry for doing a Kramer entrance, but I'd like to improve my networking stack too 😉) |
@stephentoub sorry for the typo. Fixed the title. |
If you want this code to be efficient, don't allocate the |
I do want to allocate the byte array for the contents (it’ll be sent to native code in my case as is) but I don’t want to allocate it 4 times over. As shown above I was able to use stream and reduce a little bit but it is still 3 times as inefficient |
The difference comes from a few places:
In contrast, GetByteArrayAsync doesn't have to buffer in a way that's reusable, so it buffers into ArrayPool memory, creates a byte[] from that, and returns the temporary arrays to the pool. ReadAsByteArrayAsync could also use temporary ArrayPool buffers, but it needs to store the resulting temporary for any future calls (because of the "allow multiple future reads" design), which either it would need to hold onto the last ArrayPool array it got indefinitely and hope that the developer disposes of the response message appropriately (note your repro doesn't), or it would need to copy the final ArrayPool array into yet another temporary; this would enable pooling the temporaries along the way, but would still require copying the final one. We could explore doing using ArrayPool in that manner. We could also explore using a stream that uses a linked list of chunks rather than keeping the data contiguous. But frankly if you care about allocation, I'm with David here: consume the Stream yourself. At that point you can avoid allocating any temporary buffers for the response data. |
@stephentoub thanks for the detailed answer. I’m more than happy to use stream but I still don’t see how I can use SendAsync and consume the HttpContent’s stream and still get the same low allocation that GetByteArrayAsync does. As shown in my 3rd benchmark I can reduce it by 30% with stream still far from the 75% reduction that might be possible (and an array pool added that might become 50%). From what I can tell the buffered stream will always be there, even when returning as soon as headers have been read. (Good calll on the dispose - my real code does that or when using in .net framework it ends up exhausting all connections if hitting the same server a lot in parallel) |
It won't. If you use ResponseHeadersRead and get the stream with ReadAsStream, you can read from that and there won't be any additional buffering by the HttpContent. GetBytesArrayAsync isn't doing anything magical; it's just copying from that same response stream into a stream backed by array pool arrays. |
Thanks again! I lifted out I think we can close this, but I gotta admit that it might be nice with some new overload/method to get the contents out with low allocation without all those hoops. |
I think this has been answered and resolved, so I'm closing this. |
It’s resolved but there’s a reasonable point about the fact that there’s no one liner that does the equivalent (not suggesting we should break ReadAsByteArrayAsync) |
Thanks David. I found if there’s a content length header the allocations are “only” 2x the download size. Without the length header it’s 4x. |
That's because of the growth doubling-algorithm I mentioned in #81628 (comment). The extra full 2x is also the extreme version of this where you double the last time but then use little-to-none of the increased size. The HttpClient.GetByteArrayAsync implementation still experiences that exact same 2x, but it does so using pooled arrays, so it's less visible in an allocation profile.
If we were to do anything here, I'd actually suggest this be the path forward. It'd be interesting to understand who actually uses ReadAsByteArrayAsync multiple times on the same content and also mutates the returned byte[]. If the answer is "almost no one", as I suspect, it'd be simpler and cheaper to just take a breaking change to the implementation to not make a defensive copy. |
I'm not sure how we get this information, but I agree.
With an opt out switch? |
Ideally not. The opt-out switch is: if you're doing this, make a copy of the byte[] on your own instead of mutating the original.
Search for all calls on GitHub and see if/how folks are using it. Then try to make the change, document it as breaking, and see who complains :) |
Would calling HttpContext.LoadIntoBufferAsync() be the workaround?
I would just do this 😄. |
My proposal would be to just hand out the loaded buffer, in which case, no. |
I think as long as something like this works: HttpClient client = new HttpClient();
var response = await client.GetAsync(url);
var b0 = await response.Content.ReadAsByteArrayAsync();
response.Content = new ByteArrayContent(b0);
var b1 = await response.Content.ReadAsByteArrayAsync(); Then we have a solid workaround for people that need to replay. |
That would suffer from the same issue: if b0 were handed out and b0 were mutated, it'd be mutating the same array stored by the new ByteArrayContent and thus would affect subsequent reads. My suggestion was just: HttpClient client = new HttpClient();
var response = await client.GetAsync(url);
byte[] b0 = (byte[])(await response.Content.ReadAsByteArrayAsync()).Clone();
... // mutate b0 to your hearts content, as it's a copy of what's in the HttpContent In other words, allow whether a copy is made to be left up to the caller rather than forcing it upon everyone. (Regardless of whether we make such a change, we could change the implementation of ReadAsByteArrayAsync to use the same pooling stream that GetByteArrayAsync uses. The reason we don't today is because, whereas GetByteArrayAsync just copies the final stream's data to a byte[] and returns that, the code in ReadAsByteArrayAsync stores the filled stream itself, and we don't want that storing buffers from the array pool. It could instead be made to copy out to a byte[] and store that, but then you're actually increasing the amount of allocation for shorter responses that didn't require growing beyond the initial allocation, in particular if you need to hand out copies. It's also complicated by other methods like ReadAsStreamAsync and ReadAsStringAsync, which don't require the byte[] underlying the stream be exactly sized, so they could no longer share all of the same buffering code.) |
Description
When using
HttpClient.SendAsync
and reading theHttpResponse.Content
into a byte-array, it results in allocations in my testing 4 times that of the actual http response. If usingHttpClient.GetByteArrayAsync
allocations much closer to the actual http response is made. However this API call only allows for basic get requests, and no way to look at response headers etc.I'm trying to find ways to heavily reduce memory allocations while still having full access to the power
SendAsync
provides over the simplifiedGet*
methods.Configuration
.NET 7.0.101, Windows 11 Pro, build 22621, x64
Regression?
Same thing observed with .NET 6, allocations almost identical.
Data
Benchmark:
Now I was able to reduce memory by about 1mb, using this approach, but still quite short of the simple
GetByteArrayAsync
:Analysis
Looking at the HttpContent, lots of byte array copying is performed to ensure no access to the internal memorystream that was buffered to. It might be helpful to create some access to this data with
ReadOnlyMemory<byte>
or similar.The text was updated successfully, but these errors were encountered: