API Proposal: Add Encoding/Decoding APIs for new System.Buffer types #30882
Comments
public static class EncodingExtensions
{
public static void Encode(this Encoding encoding, ReadOnlySpan<char> chars, IBufferWriter<byte> writer);
- public static void Encode(this Encoding encoding, ReadOnlySequence<char> chars, IBufferWriter<byte> writer);
+ public static void Encode(this Encoding encoding, in ReadOnlySequence<char> chars, IBufferWriter<byte> writer);
- public static OperationStatus Decode(this Encoding encoding, in ReadOnlySequence<byte> bytes, Span<char> chars, out int charsWritten, out SequencePosition bytesConsumedPosition);
+ public static OperationStatus Decode(this Encoding encoding, in ReadOnlySequence<byte> bytes, Span<char> chars, out SequencePosition bytesConsumedPosition, out int charsWritten);
public static OperationStatus Decode(this Encoding encoding, in ReadOnlySequence<byte> bytes, IBufferWriter<char> writer, out SequencePosition bytesConsumedPosition);
} |
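For illustration, here is a minimal sketch of how the span-based `Encode` overload listed above might be implemented. Only the method signature comes from the proposal; the class name `EncodingSketchExtensions` and the size-then-write strategy are assumptions, not the actual implementation.

```csharp
using System;
using System.Buffers;
using System.Text;

// Usage: encode five chars into an ArrayBufferWriter<byte>.
var writer = new ArrayBufferWriter<byte>();
Encoding.UTF8.Encode("h\u00e9llo".AsSpan(), writer);
Console.WriteLine(writer.WrittenCount); // 6, because U+00E9 takes two bytes in UTF-8

// Hypothetical class name; only the Encode signature is taken from the proposal.
public static class EncodingSketchExtensions
{
    public static void Encode(this Encoding encoding, ReadOnlySpan<char> chars, IBufferWriter<byte> writer)
    {
        // Ask the encoding for the exact output size, then request a span at least that large.
        int byteCount = encoding.GetByteCount(chars);
        Span<byte> destination = writer.GetSpan(byteCount);

        // Encode directly into the writer's buffer and report how much was used.
        int written = encoding.GetBytes(chars, destination);
        writer.Advance(written);
    }
}
```

This version walks the input twice (once to count, once to encode), which keeps it simple at the cost of some throughput.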
Updated |
Keep in mind each API call would incur a single tiny allocation for the |
There might be a fast path for the single segment case that avoids the allocation but good reminder (I actually implemented these before filing the issue). Do you think it makes sense to expose a static API that takes an encoder/decoder? The fact that those can be reset would make it possible to re-use them as well.
What if there's incomplete data? I got 2 bytes of a 3 byte UTF8 scalar/codepoint and the next read will have all 3 of the bytes? I'd prefer if these operations not throw but fail in a way that lets the caller provide more data. |
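The incomplete-data case can be reproduced with the existing stateful `Decoder` API. The sketch below splits the three UTF-8 bytes of U+20AC across two reads; the variable names are illustrative only.

```csharp
using System;
using System.Text;

// U+20AC is E2 82 AC in UTF-8; pretend the first read delivered only two of the three bytes.
byte[] firstRead = { 0xE2, 0x82 };
byte[] secondRead = { 0xAC };

Decoder decoder = Encoding.UTF8.GetDecoder();
Span<char> chars = new char[4];

// flush: false tells the decoder that more data may follow, so it buffers the partial sequence
// instead of treating it as invalid input.
int n1 = decoder.GetChars(firstRead, chars, flush: false);
int n2 = decoder.GetChars(secondRead, chars, flush: false);

Console.WriteLine(n1);                   // 0
Console.WriteLine(n2);                   // 1
Console.WriteLine(chars[0] == '\u20AC'); // True
```

A stateless `Encoding.GetChars` call on the first two bytes alone has no way to wait for the third byte, which is the gap the question points at.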
Unlikely, as they wouldn't play well with the other APIs on
What's an example of a scenario where this would occur? My understanding of The
// These APIs could return 'long' instead of 'void' if you wanted to know how many elements
// were written to the destination. Additionally, the contract would be that they'd consume the
// input sequence in its entirety, hence no "elements consumed" output parameter. Remember
// to pass "flush = true" on the final call if you're calling this in a loop.
public static void GetBytes(this Encoder encoder, ReadOnlySequence<char> chars, IBufferWriter<byte> bytes, bool flush);
public static void GetChars(this Decoder decoder, ReadOnlySequence<byte> bytes, IBufferWriter<char> chars, bool flush); |
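A rough sketch of how the `Decoder`-based overload above could be written on top of today's span APIs follows. The signature is the one from the comment; the class names `DecoderSketchExtensions` and `Chunk`, and the per-segment count-then-decode approach, are assumptions made for illustration.

```csharp
using System;
using System.Buffers;
using System.Text;

// Usage: U+20AC (E2 82 AC) is split across two segments of the input sequence.
var first = new Chunk(new byte[] { 0x61, 0xE2, 0x82 });
var last = first.Append(new byte[] { 0xAC, 0x62 });
var sequence = new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);

var output = new ArrayBufferWriter<char>();
Encoding.UTF8.GetDecoder().GetChars(sequence, output, flush: true);
Console.WriteLine(output.WrittenSpan.ToString() == "a\u20ACb"); // True

// Hypothetical class name; the method signature follows the comment above.
public static class DecoderSketchExtensions
{
    public static void GetChars(this Decoder decoder, ReadOnlySequence<byte> bytes, IBufferWriter<char> chars, bool flush)
    {
        long remaining = bytes.Length;
        bool flushed = false;
        foreach (ReadOnlyMemory<byte> segment in bytes)
        {
            remaining -= segment.Length;
            // Only flush once every byte of the sequence has been handed to the decoder.
            bool flushNow = flush && remaining == 0;
            DecodeSegment(decoder, segment.Span, chars, flushNow);
            flushed |= flushNow;
        }

        // A sequence with no segments still needs a flush call to drain decoder state.
        if (flush && !flushed)
            DecodeSegment(decoder, ReadOnlySpan<byte>.Empty, chars, flush: true);
    }

    private static void DecodeSegment(Decoder decoder, ReadOnlySpan<byte> source, IBufferWriter<char> chars, bool flush)
    {
        // The count accounts for bytes the decoder buffered from the previous segment.
        int charCount = decoder.GetCharCount(source, flush);
        Span<char> destination = chars.GetSpan(charCount);
        int written = decoder.GetChars(source, destination, flush);
        chars.Advance(written);
    }
}

// Minimal linked segment type, needed only to build a multi-segment sequence for the demo.
sealed class Chunk : ReadOnlySequenceSegment<byte>
{
    public Chunk(ReadOnlyMemory<byte> memory) => Memory = memory;

    public Chunk Append(ReadOnlyMemory<byte> memory)
    {
        var next = new Chunk(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}
```

Because the decoder carries state between segments, a scalar split across a segment boundary decodes correctly, and passing `flush: false` lets the caller continue with a later sequence.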
For some context behind my "sequence of sequences" comment above: generally speaking text APIs only ever operate on entirely self-contained text. For example, we don't have an
Transcoding has always been one of very few special cases where it is possible to operate on data in a chunked manner. But you can't really do anything with chunked data other than shuttle it around - you need to reconstitute the entirety of the data into a single contiguous buffer before you can perform textual operations on it - so I want to make sure that we're not encouraging devs to treat |
It doesn’t need to play well but it can’t throw an exception. The thing reading the stream shouldn’t need to try catch, it just needs to know how much data was consumed.
Using a ReadOnlySequence<byte> that comes from a PipeReader. It’s arbitrarily sized data that comes from a network or file. This actually came from me trying to implement a TextReader on top of pipe. All I have is an encoding and a PipeReader (let me look at what StreamReader does because it has the same problem)
That’s not bad. I’ll add that to the proposal and leave these ones to mean fully formed sequence. The problem is it’s a pit of failure when using it with pipes. |
Sounds like we're circling around taking If you wanted a "writes to
void Convert(..., ReadOnlySequence<T> src, Span<U> dest, bool flush, out SequencePosition consumed, out int written, out bool completed);
Note that this follows the already-established API shape for |
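For reference, the existing span-based `Decoder.Convert` already reports consumed input, written output, and a `completed` flag. Treating it as the shape the signature above mirrors is an assumption, since the comparison target is cut off in this copy. A minimal sketch with a deliberately small destination buffer:

```csharp
using System;
using System.Text;

byte[] source = Encoding.UTF8.GetBytes("h\u00e9llo w\u00f6rld");
Decoder decoder = Encoding.UTF8.GetDecoder();
char[] small = new char[4]; // deliberately smaller than the decoded text
var result = new StringBuilder();

ReadOnlySpan<byte> remaining = source;
bool completed;
do
{
    // The third argument is 'flush'; the caller learns how much input was consumed and how
    // much output was produced, so it can drain the buffer and call again.
    decoder.Convert(remaining, small, true, out int bytesUsed, out int charsUsed, out completed);
    result.Append(small, 0, charsUsed);
    remaining = remaining.Slice(bytesUsed);
} while (!completed);

Console.WriteLine(result.ToString() == "h\u00e9llo w\u00f6rld"); // True
```

A sequence-based variant would need the same three outputs, with the consumed count expressed as a `SequencePosition`.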
FWIW I wish the Check this out: In theory, if we were to completely redo the Here's the logic for Imagine that the entirety of the |
OK so the |
OK I've gone back and forth on this and I think the original proposal is fine. It'll throw if we don't have a fully formed payload. I'll leave the Encoder/Decoder based overloads to another proposal. |
So what's the proposal then? The current issue description still includes |
@GrabYourPitchforks updated |
The behavior of the APIs that write to |
That's OK. Isn't it the case today? If you want a Span big enough you need to call GetByteCount/GetCharCount right?
Fixed, well sorta. You can't write more than an Int32 worth of data, so it'll throw if the ReadOnlySequence represents more bytes than an int can hold. I also added GetString and GetBytes overloads that have the same behavior. |
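The "call GetByteCount first" pattern mentioned above looks like this with the existing span overloads; the variable names are illustrative.

```csharp
using System;
using System.Text;

string text = "h\u00e9llo";

// First pass: ask how many bytes the encoded form needs.
int needed = Encoding.UTF8.GetByteCount(text);

// Second pass: encode into a destination that is exactly large enough.
Span<byte> destination = new byte[needed];
int written = Encoding.UTF8.GetBytes(text.AsSpan(), destination);

Console.WriteLine(needed);  // 6
Console.WriteLine(written); // 6
```

Both counts are `int`, which is why a destination `Span` can never represent more than an Int32 worth of output.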
It's not clear to me which assembly these APIs live in. Are you suggesting they go in System.Buffers.dll because |
They have to go where System.Memory lives. |
|
Would it be better to flip these APIs to be UTF8 and IBufferWriter-centric? Something like:
|
I think it would absolutely make sense to have UTF-8 specific APIs for this. But there are also scenarios where we'd want support for non-UTF-8 encodings, such as in an HTTP web request where the client requests that the response be in an alternative encoding. The implementation of |
Agree, that's why I have two overloads in my sketch above. |
Ah, sorry, misinterpreted your suggestion. @davidfowl, would a UTF8 specific helper be beneficial for your scenarios? We could even make that one OperationStatus based if you want. |
In some scenarios as a fast path. Today we're still using the encoding types all over ASP.NET Core (and in the broader .NET library ecosystem as a whole). So maybe eventually but as for right now I'm more concerned with adding overloads for IBW and ReadOnlySequence all over the stack. |
After further thought I don't think |
Today our encoding APIs support Span<byte>/Span<char>/char[]/byte[] and the Encoder/Decoder API. I'd like to propose some higher level APIs that support ReadOnlySequence<byte/char> and IBufferWriter<byte/char>:
These extensions would likely live in the same assembly as these abstractions since encoding couldn't directly take a dependency on those types (unless we push them lower in the stack).
cc @GrabYourPitchforks