[API Proposal]: Enable JsonEncodedText to be initialized with already encoded bytes. #67850
Comments
Tagging subscribers to this area: @dotnet/area-system-text-encoding
Issue Details
Background and motivation
I'm trying to efficiently serialize/deserialize JsonEncodedText: I just need to write the EncodedUtf8Bytes and read them back as bytes, letting the constructor expand the string. Unfortunately, the constructor that does this is private (called by the Encode methods, and this makes perfect sense since this ctor would be clearly ambiguous):

private JsonEncodedText(byte[] utf8Value)
{
    Debug.Assert(utf8Value != null);
    _value = JsonReaderHelper.GetTextFromUtf8(utf8Value);
    _utf8Value = utf8Value;
}

There is no way to implement a fake encoder to override the default encoder of the Encode method without going unsafe:

sealed class NoEncoder : JavaScriptEncoder
{
    public override int MaxOutputCharactersPerInputCharacter => 1;
    public override unsafe int FindFirstCharacterToEncode( char* text, int textLength ) => -1;
    public override unsafe bool TryEncodeUnicodeScalar( int unicodeScalar, char* buffer, int bufferLength, out int numberOfCharactersWritten )
    {
        // Never reached: FindFirstCharacterToEncode reports that nothing needs encoding.
        throw new NotSupportedException();
    }
    public override bool WillEncode( int unicodeScalar ) => false;
}

A simple static method with an explicit name such as the one below will be great and as safe as it can be:

public static JsonEncodedText CreateFromEncodedUtf8Bytes( ReadOnlySpan<byte> bytes!! ) => new JsonEncodedText( bytes );

API Proposal
Add these 2 public static methods to the JsonEncodedText class.

public static JsonEncodedText CreateFromEncodedUtf8Bytes( ReadOnlySpan<byte> bytes ) => new JsonEncodedText( bytes.ToArray() );
public static JsonEncodedText CreateFromEncodedUtf8Bytes( byte[] bytes!! ) => new JsonEncodedText( bytes );

API Usage
The use of the span below is totally optional. The first goal is to avoid serializing a string in UTF-16 (that will be encoded in UTF-8) and restoring a string (that will be read as UTF-8 and encoded in UTF-16, which will then need to be encoded back to UTF-8!)...
Net effect: 0 vs. 1 encoding during write and 1 encoding vs. 2 during read...

/// <summary>
/// Writes a <see cref="JsonEncodedText"/>.
/// </summary>
/// <param name="this">This writer.</param>
/// <param name="t">The text.</param>
public static void Write( this ICKBinaryWriter @this, JsonEncodedText t )
{
    @this.WriteNonNegativeSmallInt32( t.EncodedUtf8Bytes.Length );
    @this.Write( t.EncodedUtf8Bytes );
}

/// <summary>
/// Reads a <see cref="JsonEncodedText"/>.
/// </summary>
/// <param name="this">This reader.</param>
/// <returns>The read text.</returns>
public static JsonEncodedText ReadJsonEncodedText( this ICKBinaryReader @this )
{
    int len = @this.ReadNonNegativeSmallInt32();
    return JsonEncodedText.Encode( @this.ReadBytes( len ) );
}

Alternative Designs
N/A
Risks
None that I can imagine...
|
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
|
I believe @GrabYourPitchforks might have thoughts about making it possible to extend the STEW abstractions without writing unsafe code.
It seems clear to me that the current design is very explicitly trying to avoid this. I wouldn't expect the performance of creating |
The data is already encoded (since it's coming from a UTF-8 serialized stream/pipe). Calling Encode on it (even the one with https://source.dot.net/#System.Text.Json/System/Text/Json/JsonEncodedText.cs,105 |
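To make the point about already-encoded data concrete, here is a minimal sketch (not taken from the thread) of what happens when escaped bytes are passed through the public `JsonEncodedText.Encode(ReadOnlySpan<byte>)` overload a second time. It assumes the default encoder and a sample string chosen for illustration; the variable names are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Encode once: the default encoder escapes the double quotes.
JsonEncodedText original = JsonEncodedText.Encode("say \"hi\"");

// Pretend these bytes were persisted to, and read back from, a binary stream.
byte[] stored = original.EncodedUtf8Bytes.ToArray();

// Encode treats its input as unescaped text, so the backslashes that were
// produced by the first pass get escaped again.
JsonEncodedText reEncoded = JsonEncodedText.Encode(stored);

Console.WriteLine(Encoding.UTF8.GetString(stored));   // escaped once
Console.WriteLine(reEncoded.ToString());              // escaped twice
bool sameBytes = reEncoded.EncodedUtf8Bytes.SequenceEqual(stored);
Console.WriteLine(sameBytes);
```

Under these assumptions the two byte sequences differ, which is why `Encode` cannot simply be reused to rebuild a `JsonEncodedText` from its own `EncodedUtf8Bytes`.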
@olivier-spinelli I just updated my response as you were posting yours. Please see my new feedback, in particular the final paragraph. |
Seen and thanks!
I surely can for write, but to read it back I need to convert the UTF-8 to a string before calling Encode (which converts the string back to... the same content as the original one) to resurrect a JsonEncodedText. |
#54410 would make it easier to read JSON strings without the need to convert to a string. So you might be able to do something like:

char[] myBuffer = ArrayPool<char>.Shared.Rent(128);
// CopyString unescapes the current JSON string into the buffer and returns the number of chars written.
int charsRead = reader.CopyString(myBuffer);
ReadOnlySpan<char> source = myBuffer.AsSpan(0, charsRead);
if (_charText.AsSpan().SequenceEqual(source))
{
    /* handle the value */
}
ArrayPool<char>.Shared.Return(myBuffer); |
Unfortunately I'm writing/reading from a binary reader and not through the Utf8JsonReader. I'm trying to efficiently serialize/deserialize JsonEncodedText... |
I don't think such a scenario is within scope for this particular type. It's meant as a helper struct aiding serialization performance of System.Text.Json, but it's not itself intended for serialization/deserialization. If performance is important enough, I would recommend writing your own custom struct instead. |
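One possible shape for the "custom struct" suggested above is sketched here. This is an assumption-laden illustration, not an API from System.Text.Json: `PreEncodedJsonString`, `FromString`, `FromQuotedUtf8` and `WriteTo` are hypothetical names. The struct stores the escaped UTF-8 together with its surrounding quotes so that it can be emitted with `Utf8JsonWriter.WriteRawValue` (available since .NET 6) without re-encoding.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Round trip through a plain byte[] (standing in for a binary stream).
var text = PreEncodedJsonString.FromString("say \"hi\"");
byte[] persisted = text.QuotedUtf8.ToArray();
var restored = PreEncodedJsonString.FromQuotedUtf8(persisted);

using var buffer = new MemoryStream();
using (var writer = new Utf8JsonWriter(buffer))
{
    writer.WriteStartArray();
    restored.WriteTo(writer);   // no escaping pass happens here
    writer.WriteEndArray();
}
string json = Encoding.UTF8.GetString(buffer.ToArray());
Console.WriteLine(json);

// Hypothetical type, not part of System.Text.Json.
public readonly struct PreEncodedJsonString
{
    // Escaped UTF-8, including the surrounding double quotes.
    private readonly byte[] _quotedUtf8;

    private PreEncodedJsonString(byte[] quotedUtf8) => _quotedUtf8 = quotedUtf8;

    public ReadOnlySpan<byte> QuotedUtf8 => _quotedUtf8;

    // Escapes once with the default encoder, then adds the quotes.
    public static PreEncodedJsonString FromString(string value)
    {
        ReadOnlySpan<byte> escaped = JsonEncodedText.Encode(value).EncodedUtf8Bytes;
        var quoted = new byte[escaped.Length + 2];
        quoted[0] = (byte)'"';
        escaped.CopyTo(quoted.AsSpan(1));
        quoted[^1] = (byte)'"';
        return new PreEncodedJsonString(quoted);
    }

    // Copies the input so later changes to the caller's buffer cannot leak in.
    public static PreEncodedJsonString FromQuotedUtf8(ReadOnlySpan<byte> quotedUtf8)
        => new PreEncodedJsonString(quotedUtf8.ToArray());

    // Trusts the stored bytes; validation is skipped on purpose.
    public void WriteTo(Utf8JsonWriter writer)
        => writer.WriteRawValue(QuotedUtf8, skipInputValidation: true);
}
```

This sketch only covers JSON values, not property names, and it does not address interoperability with code that expects a `JsonEncodedText`.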
"helper struct aiding serialization performance" can be for other kind of serializers. Another struct will not interoperate with the rest of the code/library: a conversion to/from JsonEncodedText will be needed (and since there is no way to initialize a JsonEncodedText with both its bytes and string without conversions, this is a dead end). (Ooops I didn't notice you closed the issue.) |
I don't believe that is the case. We won't support scenarios where this struct is used outside of System.Text.Json, unless used within the contracts of the already exposed APIs. Ultimately, the reason that I closed this is that we generally don't ship factory methods that transfer ownership of buffers to newly created instances, since that would be fairly error prone. |
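The concern about factory methods that transfer buffer ownership can be illustrated with a small sketch. `OwningText` and `TakeOwnership` are hypothetical names invented for this example; the type merely mimics a struct that caches a string and keeps the caller's array without copying it.

```csharp
using System;
using System.Text;

byte[] shared = Encoding.UTF8.GetBytes("abc");
var text = OwningText.TakeOwnership(shared);

// The caller still holds a reference to the array and reuses it.
shared[0] = (byte)'X';

string cached = text.Value;                               // computed at creation
string current = Encoding.UTF8.GetString(text.Utf8Bytes); // reflects the mutation
Console.WriteLine(cached);
Console.WriteLine(current);

// Hypothetical type for illustration only.
public readonly struct OwningText
{
    private readonly byte[] _utf8;

    public string Value { get; }

    private OwningText(byte[] utf8)
    {
        _utf8 = utf8;                          // no defensive copy
        Value = Encoding.UTF8.GetString(utf8); // string is fixed from here on
    }

    public static OwningText TakeOwnership(byte[] utf8) => new OwningText(utf8);

    public ReadOnlySpan<byte> Utf8Bytes => _utf8;
}
```

After the mutation the cached string and the byte view no longer agree, which is the kind of silent inconsistency a copying factory avoids at the cost of one extra allocation.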
Noted. Thanks for your time! |
Background and motivation
I'm trying to efficiently serialize/deserialize JsonEncodedText: I just need to write the EncodedUtf8Bytes and read them back as bytes, letting the constructor expand the string. Unfortunately, the constructor that does this is private (called by the Encode methods, and this makes perfect sense since this ctor would be clearly ambiguous):
There is no way to implement a fake encoder to override the default encoder of the Encode method without going unsafe:
A simple static method with an explicit name such as the one below will be great and as safe as it can be:
API Proposal
Add these 2 public static methods to the JsonEncodedText class.
API Usage
The first goal is to avoid serializing a string in UTF-16 (that will be encoded in UTF-8) and restoring a string (that will be read as UTF-8 and encoded in UTF-16, which will then need to be encoded back to UTF-8!)...
Net effect: 0 vs. 1 encoding during write and 1 encoding vs. 2 during read...
(and only the required allocations, since with CreateFromEncodedUtf8Bytes( byte[] bytes!! ) the ownership of the buffer is given to the JsonEncodedText.)
Alternative Designs
N/A
Risks
None that I can imagine...