Improve JsonSerializer.ReadAsync(Stream) throughput #35823
Conversation
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.Stream.cs
Outdated
Show resolved
Hide resolved
If you're optimizing for MemoryStream cases, you could also look at if (stream is MemoryStream ms && ms.TryGetBuffer(out var buffer))
{
// use buffer directly
var span = new ReadOnlySpan<byte>(buffer.Array, buffer.Offset, buffer.Count);
// use span based utf8json overload.
} works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, LGTM.
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.Stream.cs
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.Stream.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.Stream.cs
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.Stream.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Read.Stream.cs
Show resolved
Hide resolved
{ | ||
// We have less than half the buffer available, double the buffer size. | ||
bufferSize = (bufferSize < HalfMaxValue) ? bufferSize * 2 : int.MaxValue; | ||
byte[] dest = ArrayPool<byte>.Shared.Rent((buffer.Length < (int.MaxValue / 2)) ? buffer.Length * 2 : int.MaxValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked this before (#35609 (comment)), but
Renting
int.MaxValue
would throw OOM. Should we cap it to something smaller (either 2GB, or maybeint.MaxValue - 56
)?
Is there guidelines or common pattern to follow in such scenarios?
Some low hanging fruit: - The method was using the Stream.ReadAsync overload that returns a `Task<int>`. Changed it to take a `Memory<byte>` so that it instead returns a `ValueTask<int>`. - The method was clearing the full rented buffer upon returning it to the pool, even when only using a small portion of it. Changed it to only clear what was used. - The method was resizing the buffer unnecessarily due to faulty logic around how much data remained. Fixed it to only resize when more than half the buffer is full, which was the original intention. - The ReadCore method is a bit chunky to call, initializing a new Utf8JsonReader each time, copying large structs, etc. Since we need to read all of the data from the stream anyway, I changed it to try to fill the buffer, which then minimizes the number of times we need to call ReadCore, in particular avoiding the extra empty call at the end. We can tweak this further in the future as needed, e.g. only making a second attempt to fill the buffer rather than doing so aggressively. - Also fixed the exception to contain the full amount of data read from the stream, not just from the most recent call.
af23048
to
6dd3ef3
Compare
Nice! |
Yup, that might be reasonable. There's a cost associated with special-casing like that, and a caller could just do it themselves, but if we think MemoryStream would commonly be used here, it'd be reasonable (I don't know if it would be). I was using it here just as a data point to compare against. |
) * Improve JsonSerializer.ReadAsync(Stream) throughput Some low hanging fruit: - The method was using the Stream.ReadAsync overload that returns a `Task<int>`. Changed it to take a `Memory<byte>` so that it instead returns a `ValueTask<int>`. - The method was clearing the full rented buffer upon returning it to the pool, even when only using a small portion of it. Changed it to only clear what was used. - The method was resizing the buffer unnecessarily due to faulty logic around how much data remained. Fixed it to only resize when more than half the buffer is full, which was the original intention. - The ReadCore method is a bit chunky to call, initializing a new Utf8JsonReader each time, copying large structs, etc. Since we need to read all of the data from the stream anyway, I changed it to try to fill the buffer, which then minimizes the number of times we need to call ReadCore, in particular avoiding the extra empty call at the end. We can tweak this further in the future as needed, e.g. only making a second attempt to fill the buffer rather than doing so aggressively. - Also fixed the exception to contain the full amount of data read from the stream, not just from the most recent call. * Address PR feedback Commit migrated from dotnet/corefx@7d1a6b0
Some low hanging fruit:
Task<int>
. Changed it to take aMemory<byte>
so that it instead returns aValueTask<int>
.Before:
After:
Benchmark (example SimpleTestClass borrowed from @ahsonkhan):
cc: @ahsonkhan, @steveharter, @bartonjs, @davidfowl, @jkotas