Provide an in-box implementation of IBufferWriter<byte> backed by a resizable byte array #28538
Comments
ValueStringBuilder has an interesting ability to use whatever initial span it is provided and then switch to ArrayPool.
Yep, but that forces the type to be a ref struct, which doesn't work for this scenario.
Should this be a
My goal here is to provide support for writing to an array and stream. How would this help for that case, and how would it reduce copies?
To use it with a stream you can write every segment of ROS via Stream.WriteAsync. An array can be created by calling There would be far less array resizing and far fewer data copies if we can just append segments to a linked list for new data. It would also cause less memory fragmentation for large payloads because it avoids allocating arrays of larger and larger size.
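The segment-by-segment stream write described above can be sketched as follows. This is a minimal illustration, not code from this thread: `ChunkSegment`, `SequenceSketch`, `WriteSegmentsAsync`, and `DemoAsync` are hypothetical names, and `ChunkSegment` is not the `BufferSegment` type mentioned in the discussion. It assumes a runtime where `Stream.WriteAsync(ReadOnlyMemory<byte>, CancellationToken)` is available.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical segment type for illustration only.
public sealed class ChunkSegment : ReadOnlySequenceSegment<byte>
{
    public ChunkSegment(ReadOnlyMemory<byte> memory) => Memory = memory;

    // Links a new segment after this one and returns it, so appending
    // data never copies or resizes the segments written earlier.
    public ChunkSegment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new ChunkSegment(memory)
        {
            RunningIndex = RunningIndex + Memory.Length
        };
        Next = next;
        return next;
    }
}

public static class SequenceSketch
{
    // Writes each segment of the sequence to the stream in order,
    // without first flattening the sequence into one large array.
    public static async Task WriteSegmentsAsync(
        ReadOnlySequence<byte> sequence,
        Stream stream,
        CancellationToken cancellationToken = default)
    {
        foreach (ReadOnlyMemory<byte> segment in sequence)
        {
            await stream.WriteAsync(segment, cancellationToken).ConfigureAwait(false);
        }
    }

    // Builds a two-segment sequence and copies it to a MemoryStream.
    public static async Task<byte[]> DemoAsync()
    {
        var first = new ChunkSegment(new byte[] { 1, 2, 3 });
        ChunkSegment last = first.Append(new byte[] { 4, 5 });
        var sequence = new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);

        using (var stream = new MemoryStream())
        {
            await WriteSegmentsAsync(sequence, stream).ConfigureAwait(false);
            return stream.ToArray();
        }
    }
}
```

The trade-off this sketch makes visible is that the writer issues one `WriteAsync` call per segment instead of one call for a single contiguous buffer.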
Going through Here is a sample implementation that uses BufferSegment:
What scenarios do you need
How is "rare" determined? If 256 is used as the default size and 2x as the resizing algorithm, you'll go through 256-512-1024-2048-4096 to write some non-primitive JSON output.
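To put numbers on that growth sequence, here is a small sketch that counts the resizes and the worst-case bytes copied when a buffer starts at 256 bytes and doubles until it reaches 4096. `GrowthSketch` and its methods are hypothetical helpers, and the copy count assumes every buffer is completely full when it is replaced.

```csharp
using System;

public static class GrowthSketch
{
    // Number of doublings needed before a buffer that starts at
    // initialCapacity can hold requiredBytes.
    public static int CountResizes(int initialCapacity, int requiredBytes)
    {
        int capacity = initialCapacity;
        int resizes = 0;
        while (capacity < requiredBytes)
        {
            capacity *= 2;
            resizes++;
        }
        return resizes;
    }

    // Worst-case bytes copied across all resizes, assuming each old
    // buffer is full when its contents move to the larger one.
    public static long BytesCopied(int initialCapacity, int requiredBytes)
    {
        long copied = 0;
        for (int capacity = initialCapacity; capacity < requiredBytes; capacity *= 2)
        {
            copied += capacity;
        }
        return copied;
    }
}
```

Under those assumptions, `GrowthSketch.CountResizes(256, 4096)` returns 4 and `GrowthSketch.BytesCopied(256, 4096)` returns 3840 (256 + 512 + 1024 + 2048), so the total copy work stays below the final buffer size.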
Copying the data that has been written so far to a stream.
    public async Task CopyToAsync(Stream stream, CancellationToken cancellationToken = default)
    {
        CheckIfDisposed();
        if (stream == null)
            throw new ArgumentNullException(nameof(stream));
        // Flatten the written segments into one pooled buffer before writing.
        var sequence = new ReadOnlySequence<byte>(_writeStart, 0, _writeHead, _writeHead.End);
        int dataLength = (int)sequence.Length;
        byte[] pooledBuffer = ArrayPool<byte>.Shared.Rent(dataLength);
        try
        {
            sequence.CopyTo(pooledBuffer);
            await stream.WriteAsync(pooledBuffer.AsMemory(0, dataLength), cancellationToken).ConfigureAwait(false);
        }
        finally
        {
            // Return the rented buffer even if the write throws.
            ArrayPool<byte>.Shared.Return(pooledBuffer);
        }
        _committed += _written;
        ClearHelper();
    }
Yes, I return them. I return them on Dispose() or whenever we resize.
We can modify the growth scheme based on perf measurements, but that is an implementation detail. My point was that we resize only after some amount of writes, which means the cost is amortized. We could jump from 256 directly to 4096 and then start doubling, if it helps. The MemoryPool-based impl that I shared results in a regression for every write (since Writing HelloWorld goes from ~220 ns to ~300 ns. Larger payloads regress by 7-10%.
It's not exactly required in that scenario. You can just
What benchmarks are you using? I thought we were caching the span inside Utf8JsonWriter, so why does it matter so much how fast GetSpan is?
Both implementations are very similar in all but a couple of ways: Sequence:
Array:
No one says we can't have both, but we need to figure out what performs better in real-world scenarios (rendering reasonably sized JSON in Kestrel, for example).
Approved with feedback: dotnet/apireviews#88
Resolving the feedback from dotnet/apireviews#88 & dotnet/corefx#35094, here is the final API shape:
    namespace System.Buffers
    {
        public sealed partial class ArrayBufferWriter<T> : IBufferWriter<T>, IDisposable
        {
            public ArrayBufferWriter() { }
            public ArrayBufferWriter(int initialCapacity) { }
            public int AvailableSpace { get { throw null; } } // == Capacity - CurrentIndex
            public int Capacity { get { throw null; } }
            public int CurrentIndex { get { throw null; } }
            public ReadOnlyMemory<T> OutputAsMemory { get { throw null; } }
            public ReadOnlySpan<T> OutputAsSpan { get { throw null; } }
            public void Clear() { }
            // Implements IDisposable
            public void Dispose() { }
            // Implements IBufferWriter<T>
            public void Advance(int count) { }
            public Memory<T> GetMemory(int sizeHint = 0) { throw null; }
            public Span<T> GetSpan(int sizeHint = 0) { throw null; }
        }
    }
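To illustrate how the `GetSpan`/`Advance` contract and the capacity members in this shape fit together, here is a simplified sketch. It is not the implementation from the PR: `SketchArrayBufferWriter` and `SketchDemo` are hypothetical names, the type is fixed to `byte`, it uses a plain array with doubling growth instead of pooling, and it omits `Clear`, `Dispose`, and `OutputAsMemory`.

```csharp
using System;
using System.Buffers;

// Illustrative only: mirrors a few member names from the API shape above.
public sealed class SketchArrayBufferWriter : IBufferWriter<byte>
{
    private byte[] _buffer;

    public SketchArrayBufferWriter(int initialCapacity = 256)
    {
        if (initialCapacity <= 0)
            throw new ArgumentOutOfRangeException(nameof(initialCapacity));
        _buffer = new byte[initialCapacity];
    }

    public int CurrentIndex { get; private set; }
    public int Capacity => _buffer.Length;
    public int AvailableSpace => _buffer.Length - CurrentIndex;
    public ReadOnlySpan<byte> OutputAsSpan => _buffer.AsSpan(0, CurrentIndex);

    // Marks 'count' bytes of the most recently returned span as written.
    public void Advance(int count)
    {
        if (count < 0 || count > AvailableSpace)
            throw new ArgumentOutOfRangeException(nameof(count));
        CurrentIndex += count;
    }

    public Memory<byte> GetMemory(int sizeHint = 0)
    {
        EnsureCapacity(sizeHint);
        return _buffer.AsMemory(CurrentIndex);
    }

    public Span<byte> GetSpan(int sizeHint = 0)
    {
        EnsureCapacity(sizeHint);
        return _buffer.AsSpan(CurrentIndex);
    }

    // Grows to at least double the current size when the free space is
    // smaller than the hint; Array.Resize copies the existing contents.
    private void EnsureCapacity(int sizeHint)
    {
        if (sizeHint < 0)
            throw new ArgumentOutOfRangeException(nameof(sizeHint));
        int needed = Math.Max(sizeHint, 1);
        if (needed <= AvailableSpace)
            return;
        int newSize = Math.Max(_buffer.Length * 2, CurrentIndex + needed);
        Array.Resize(ref _buffer, newSize);
    }
}

public static class SketchDemo
{
    // Writes six bytes into a writer that starts with room for four.
    public static SketchArrayBufferWriter Run()
    {
        var writer = new SketchArrayBufferWriter(initialCapacity: 4);
        byte[] payload = { 1, 2, 3, 4, 5, 6 };
        payload.AsSpan().CopyTo(writer.GetSpan(payload.Length));
        writer.Advance(payload.Length);
        return writer;
    }
}
```

In `SketchDemo.Run`, the request for six bytes exceeds the initial capacity of four, so the sketch grows once to eight bytes; afterwards `CurrentIndex` is 6 and `AvailableSpace` is 2.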
For 2. I assume that's
Yes.
No, you can't reuse the object after the first dispose. Are you asking because you want to avoid allocating multiple instances of
Yes, so it's an allocation to decorate an array with the IBufferWriter interface (plus auto-resizing); but if you were pooling the instance you shouldn't hang on to the actual ArrayPool array. However, it would be OK to go custom for this since, looking at the implementation, you also might not want to force clearing the data before returning.
Any ETA on this issue? No reply from dotnet/corefx#35094
I'll submit a PR in the next couple of days (working on it atm). Thanks for the ping.
It seems that the array option was chosen in the end, and that the class was not made disposable. Would it not be more favorable to make the class disposable, and use I could not extract this from the discussion above, but let me know if the answer was already implied somewhere.
Motivation
The Utf8JsonWriter is a ref struct which accepts IBufferWriter<byte> for synchronous writing. Today, we provide one implementation of this interface, PipeWriter. If the user wants to write to an array or stream, or wants to write asynchronously to an output sink other than PipeWriter, they end up implementing their own custom IBufferWriter<byte>. To enable writing JSON to an array or stream (sync/async) using the new Utf8JsonWriter, we need to provide a built-in implementation of IBufferWriter<byte> so that every caller doesn't have to write their own.
Proposed API
Sample Usage
Microsoft.Extensions.DependencyModel:
https://github.com/dotnet/core-setup/blob/c304bb38c1462f0f1633fc7ed2c317dcc5c72c9c/src/managed/Microsoft.Extensions.DependencyModel/DependencyContextWriter.Utf8JsonWriter.cs#L15-L48
Sample implementation of ArrayBufferWriter: https://gist.github.com/ahsonkhan/1275dd34aaff4239cfdb7b213aa4ebe4
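For a self-contained picture of the scenario, the following sketch writes a small JSON object into an in-memory buffer. It assumes the API as it later shipped in .NET Core 3.0, which differs from the shapes discussed in this issue: there `Utf8JsonWriter` is a disposable class rather than a ref struct, and `ArrayBufferWriter<T>` exposes `WrittenSpan` instead of `OutputAsSpan`. `JsonToArraySketch` and `WriteGreeting` are hypothetical names.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

public static class JsonToArraySketch
{
    // Serializes {"message":"Hello"} into a byte array by way of an
    // IBufferWriter<byte> backed by a resizable array.
    public static byte[] WriteGreeting()
    {
        var buffer = new ArrayBufferWriter<byte>();
        using (var json = new Utf8JsonWriter(buffer))
        {
            json.WriteStartObject();
            json.WriteString("message", "Hello");
            json.WriteEndObject();
            json.Flush(); // Commit buffered JSON text to the underlying writer.
        }
        return buffer.WrittenSpan.ToArray();
    }

    // Convenience helper that decodes the UTF-8 output for inspection.
    public static string WriteGreetingAsString()
    {
        return Encoding.UTF8.GetString(WriteGreeting());
    }
}
```

The caller never implements `IBufferWriter<byte>` itself here, which is the gap the motivation section describes.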
Open Questions
ArrayBufferWriter?
IBufferWriter<byte> specifically rather than be generic? It implements IBufferWriter<byte> since that's what the scenario requires (writing the underlying bytes to the stream).
Utf8JsonWriter accepts IBufferWriter<byte> so, as a struct, it would get boxed and the state changes won't be visible to the caller.
BytesCommitted or is it useful?
BytesWritten as Int64 or is keeping it as Int32 fine?
BytesWritten will never be > int.MaxValue.
OutputAsMemory/OutputAsSpan?
cc @terrajobst, @davidfowl, @stephentoub, @KrzysztofCwalina, @steveharter, @bartonjs, @joshfree, @benaadams, @JeremyKuhne, @pakrym, @eerhardt