Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utf8JsonWriter API Proposal #27938

Closed
ahsonkhan opened this issue Nov 16, 2018 · 24 comments
Closed

Utf8JsonWriter API Proposal #27938

ahsonkhan opened this issue Nov 16, 2018 · 24 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Milestone

Comments

@ahsonkhan
Copy link
Member

A JsonWriter API that supports writing UTF-8 encoded data natively with emphasis on high performance and low allocations.

namespace System.Text.Json {

    public sealed class JsonWriterException : Exception {
        public JsonWriterException(string message);
    }

    public struct JsonWriterOptions {
        // Alternative names: Indent(ed), Minify(ied), Minimize(d), PrettyPrint(ed)
        public bool Formatted { get; set; }
        public bool SkipValidation { get; set; }
    }

    public struct JsonWriterState {
        public JsonWriterState(int maxDepth = DefaultMaxDepth, JsonWriterOptions options = default);
        public JsonWriterOptions Options { get; }
        public int MaxDepth { get; }
        public long BytesWritten { get; } // long?
    }

    public static class Utf8JsonWriter {
        public static OperationStatus EscapeString(ReadOnlySpan<byte> value, Span<byte> destination, out int consumed, out int bytesWritten);
        public static OperationStatus EscapeString(ReadOnlySpan<char> value, Span<byte> destination, out int consumed, out int bytesWritten);
        public static OperationStatus EscapeString(string value, Span<byte> destination, out int consumed, out int bytesWritten);
        public static Utf8JsonWriter<IBufferWriter<byte>> CreateFromStream(Stream stream, JsonWriterState state);
        public static Utf8JsonWriter<IBufferWriter<byte>> CreateFromMemory(Memory<byte> memory, JsonWriterState state);
    }

    public ref struct Utf8JsonWriter<TBufferWriter> where TBufferWriter : IBufferWriter<byte> {
        public Utf8JsonWriter(TBufferWriter bufferWriter, JsonWriterState state);
        public int BytesWritten { get; }
        public int CurrentDepth { get; }
        public JsonWriterState CurrentState {get; }
        public JsonTokenType TokenType { get; } // set?
        public void Flush(bool isFinalBlock = true); // should it validate before advancing the IBufferWriter?

        // These APIs exist mainly for performance, hence optional.
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<int> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<long> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<uint> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<ulong> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<decimal> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<double> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<float> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<Guid> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<DateTime> values);
        public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<DateTimeOffset> values);
        // Can't support Span of Spans. Is there value in adding support for string (or byte) arrays?
        //public void WriteArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<string> values);

        // All permutation of start/end array/object with various property name formats.
        public void WriteEndArray();
        public void WriteEndObject();

        public void WriteStartArray();
        public void WriteStartArray(ReadOnlySpan<byte> propertyName);
        public void WriteStartArray(ReadOnlySpan<char> propertyName);
        public void WriteStartArray(string propertyName);

        public void WriteStartObject();
        public void WriteStartObject(ReadOnlySpan<byte> propertyName);
        public void WriteStartObject(ReadOnlySpan<char> propertyName);
        public void WriteStartObject(string propertyName);

        // Writing property name & values (for writing tokens within JSON objects)
        public void WriteBoolean(ReadOnlySpan<byte> propertyName, bool value);
        public void WriteBoolean(ReadOnlySpan<char> propertyName, bool value);
        public void WriteBoolean(string propertyName, bool value);

        public void WriteBytesUnchecked(ReadOnlySpan<byte> propertyName, ReadOnlySpan<byte> utf8Bytes);
        public void WriteBytesUnchecked(ReadOnlySpan<char> propertyName, ReadOnlySpan<byte> utf8Bytes);
        public void WriteBytesUnchecked(string propertyName, ReadOnlySpan<byte> utf8Bytes);

        public void WriteNull(ReadOnlySpan<byte> propertyName);
        public void WriteNull(ReadOnlySpan<char> propertyName);
        public void WriteNull(string propertyName);

        public void WriteNumber(ReadOnlySpan<byte> propertyName, decimal value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, double value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, int value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, long value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, float value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, uint value);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, ulong value);

        public void WriteNumber(ReadOnlySpan<char> propertyName, decimal value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, double value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, int value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, long value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, float value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, uint value);
        public void WriteNumber(ReadOnlySpan<char> propertyName, ulong value);

        public void WriteNumber(string propertyName, decimal value);
        public void WriteNumber(string propertyName, double value);
        public void WriteNumber(string propertyName, int value);
        public void WriteNumber(string propertyName, long value);
        public void WriteNumber(string propertyName, float value);
        public void WriteNumber(string propertyName, uint value);
        public void WriteNumber(string propertyName, ulong value);

        public void WriteString(ReadOnlySpan<byte> propertyName, DateTime value);
        public void WriteString(ReadOnlySpan<byte> propertyName, DateTimeOffset value);
        public void WriteString(ReadOnlySpan<byte> propertyName, Guid value);
        public void WriteString(ReadOnlySpan<byte> propertyName, ReadOnlySpan<byte> value);
        public void WriteString(ReadOnlySpan<byte> propertyName, ReadOnlySpan<char> value);
        public void WriteString(ReadOnlySpan<byte> propertyName, string value);

        public void WriteString(ReadOnlySpan<char> propertyName, DateTime value);
        public void WriteString(ReadOnlySpan<char> propertyName, DateTimeOffset value);
        public void WriteString(ReadOnlySpan<char> propertyName, Guid value);
        public void WriteString(ReadOnlySpan<char> propertyName, ReadOnlySpan<byte> value);
        public void WriteString(ReadOnlySpan<char> propertyName, ReadOnlySpan<char> value);
        public void WriteString(ReadOnlySpan<char> propertyName, string value);

        public void WriteString(string propertyName, DateTime value);
        public void WriteString(string propertyName, DateTimeOffset value);
        public void WriteString(string propertyName, Guid value);
        public void WriteString(string propertyName, ReadOnlySpan<byte> value);
        public void WriteString(string propertyName, ReadOnlySpan<char> value);
        public void WriteString(string propertyName, string value);

        // Writing values (for single-value JSON or for writing tokens within JSON arrays)
        public void WriteNull();
        public void WriteValue(bool value);
        public void WriteValue(DateTime value);
        public void WriteValue(DateTimeOffset value);
        public void WriteValue(decimal value);
        public void WriteValue(double value);
        public void WriteValue(Guid value);
        public void WriteValue(int value);
        public void WriteValue(long value);
        public void WriteValue(float value);
        public void WriteValue(uint value);
        public void WriteValue(ulong value);

        public void WriteValue(ReadOnlySpan<byte> utf8Text);
        public void WriteValue(ReadOnlySpan<char> utf16Text);
        public void WriteValue(string utf16Text);

        public void WriteBytesUnchecked(ReadOnlySpan<byte> utf8Bytes);

        public void WriteComments(string comment);
        public void WriteComments(ReadOnlySpan<char> comment);
        public void WriteComments(ReadOnlySpan<byte> comment);
    }
}
Previous iteration of the APIs
namespace System.Text.Json {
    // Factory
    public static class Utf8JsonWriter {
        public static Utf8JsonWriter<TBufferWriter> Create<TBufferWriter>(TBufferWriter bufferWriter, bool prettyPrint = false) where TBufferWriter : IBufferWriter<byte>;
        // Other potential factory methods:
        public static Utf8JsonWriter<TBufferWriter> Create<TBufferWriter>(Stream stream, bool prettyPrint = false);
        // Can't support writing to a span directly.
    }
    public ref struct Utf8JsonWriter<TBufferWriter> where TBufferWriter : IBufferWriter<byte> {
        // ctors
        public Utf8JsonWriter(TBufferWriter bufferWriter, bool prettyPrint = false);

        // Advances the IBufferWriter
        public void Flush();
        
        public void WriteStartArray();
        public void WriteEndArray();
        public void WriteStartArray(ReadOnlySpan<byte> name);
        // public void WriteStartArray(string name);
        
        public void WriteStartObject();
        public void WriteEndObject(); // Throws FormatException - should it throw JsonWriterException?
        public void WriteStartObject(ReadOnlySpan<byte> name);
        // public void WriteStartObject(string name);

        // Writing all supported .NET types - which ones should we support?
        // These APIs exist mainly for performance.
        public void WriteArray(ReadOnlySpan<byte> name, ReadOnlySpan<int> values);
        public void WriteArray(ReadOnlySpan<byte> name, ReadOnlySpan<bool> values);
        public void WriteArray(ReadOnlySpan<byte> name, ReadOnlySpan<string> values);
        public void WriteArray(ReadOnlySpan<byte> name, ReadOnlySpan<ReadOnlyMemory<byte>> utf8StringBytes);
        // ... etc.

        // string-based overloads that transcode to UTF-8?
        public void WriteArray(string name, ReadOnlySpan<int> values);
        public void WriteArray(string name, ReadOnlySpan<bool> values);
        public void WriteArray(string name, ReadOnlySpan<string> values);
        public void WriteArray(string name, ReadOnlySpan<ReadOnlyMemory<byte>> utf8StringBytes);
        // ... etc.

        // Writing all supported .NET types - which ones should we support?
        public void WriteNameValue(ReadOnlySpan<byte> name, bool value);
        public void WriteNameValue(ReadOnlySpan<byte> name, DateTime value);
        public void WriteNameValue(ReadOnlySpan<byte> name, DateTimeOffset value);
        public void WriteNameValue(ReadOnlySpan<byte> name, Guid value);
        public void WriteNameValue(ReadOnlySpan<byte> name, long value);
        public void WriteNameValue(ReadOnlySpan<byte> name, ulong value);
        public void WriteNameValue(ReadOnlySpan<byte> name, string value);
        public void WriteNameValue(ReadOnlySpan<byte> name, ReadOnlySpan<byte> utf8Bytes);
        public void WriteNameValueAsBase64(ReadOnlySpan<byte> name, ReadOnlySpan<byte> utf8Bytes);
        // Provide escape overloads that allow skipping escaping the name and values where relevant?
        // ... etc.

        // string-based overloads that transcode to UTF-8?
        public void WriteNameValue(string name, bool value);
        public void WriteNameValue(string name, DateTime value);
        public void WriteNameValue(string name, DateTimeOffset value);
        public void WriteNameValue(string name, Guid value);
        public void WriteNameValue(string name, long value);
        public void WriteNameValue(string name, ulong value);
        public void WriteNameValue(string name, string value);
        public void WriteNameValue(string name, ReadOnlySpan<byte> utf8StringBytes);
        public void WriteNameValueRaw(string name, ReadOnlySpan<byte> utf8Bytes);
        public void WriteNameValueAsBase64(string name, ReadOnlySpan<byte> utf8Bytes);
        // ... etc.

        public void WriteNull(ReadOnlySpan<byte> name);
        public void WriteNull();
     
        // Matching the capabilities of WriteNameValue - which ones should we support?
        public void WriteValue(bool value);
        public void WriteValue(DateTime value);
        public void WriteValue(DateTimeOffset value);
        public void WriteValue(Guid value);
        public void WriteValue(long value);
        public void WriteValue(ulong value);
        public void WriteValue(string value);
        // public void WriteValue(string value, bool escape);
        public void WriteValue(ReadOnlySpan<byte> utf8StringBytes);
        // public void WriteValue(ReadOnlySpan<byte> utf8Bytes, bool escape);
        public void WriteValueRaw(ReadOnlySpan<byte> utf8Bytes);
        public void WriteValueAsBase64(ReadOnlySpan<byte> utf8Bytes);
        // ... etc.

        public void Write(Utf8JsonReader reader, bool writeChildren = false);
    }
}

Sample Usage:

private static void WriteHelloWorld(bool formatted, ArrayFormatterWrapper output)
{
    var json = new Utf8JsonWriter<ArrayFormatterWrapper>(output, prettyPrint: formatted);

    json.WriteStartObject();
    json.WriteNameValue(Message, HelloWorld);
    json.WriteEndObject();
    json.Flush();
}

private static void WriteBasicJson(bool formatted, ArrayFormatterWrapper output, ReadOnlySpan<int> data)
{
    Utf8JsonWriter<ArrayFormatterWrapper> json = Utf8JsonWriter.Create(output, prettyPrint: formatted);

    json.WriteStartObject();
    json.WriteNameValue("age", 42);
    json.WriteNameValue("first", "John");
    json.WriteNameValue("last", "Smith");
    json.WriteStartArray("phoneNumbers");
    json.WriteValue("425-000-1212");
    json.WriteValue("425-000-1213");
    json.WriteEndArray();
    json.WriteStartObject("address");
    json.WriteNameValue("street", "1 Microsoft Way");
    json.WriteNameValue("city", "Redmond");
    json.WriteNameValue("zip", 98052);
    json.WriteEndObject();
    json.WriteArray(ExtraArray, data);
    json.WriteEndObject();
    json.Flush();
}

From SignalR: https://github.com/ahsonkhan/SignalR/blob/9d4a51d6c1eb7cb2a68b154e107e4265fc804b7d/src/Microsoft.AspNetCore.Http.Connections.Common/NegotiateProtocol.cs#L31

        public static void WriteResponse(NegotiationResponse response, IBufferWriter<byte> output)
        {
            Utf8JsonWriter<IBufferWriter<byte>> jsonWriter = Utf8JsonWriter.Create(output);

            jsonWriter.WriteObjectStart();

            if (!string.IsNullOrEmpty(response.Url))
            {
                jsonWriter.WriteAttribute(UrlPropertyName, response.Url);
            }

            if (!string.IsNullOrEmpty(response.AccessToken))
            {
                jsonWriter.WriteAttribute(AccessTokenPropertyName, response.AccessToken);
            }

            if (!string.IsNullOrEmpty(response.ConnectionId))
            {
                jsonWriter.WriteAttribute(ConnectionIdPropertyName, response.ConnectionId);
            }

            jsonWriter.WriteArrayStart(AvailableTransportsPropertyName);

            if (response.AvailableTransports != null)
            {
                foreach (var availableTransport in response.AvailableTransports)
                {
                    jsonWriter.WriteObjectStart();
                    jsonWriter.WriteAttribute(TransportPropertyName, availableTransport.Transport);
                    jsonWriter.WriteArrayStart(TransferFormatsPropertyName);

                    if (availableTransport.TransferFormats != null)
                    {
                        foreach (var transferFormat in availableTransport.TransferFormats)
                        {
                            jsonWriter.WriteValue(transferFormat);
                        }
                    }

                    jsonWriter.WriteArrayEnd();
                    jsonWriter.WriteObjectEnd();
                }
            }

            jsonWriter.WriteArrayEnd();
            jsonWriter.WriteObjectEnd();

            jsonWriter.Flush();
        }

For Reference:

Notes:

  • Why is the type generic? This type relies on BufferWriter as an implementation detail (ref struct BufferWriter<T> where T : IBufferWriter<byte>). From @benaadams:
  1. If you pass BufferWriter<IBufferWriter<byte>> a struct formatter; it will box to the IBufferWriter<byte> type, allocation and perf issue. Using the generic will avoid the allocation and boxing.
  2. The methods called internally to BufferWriter<T> (e.g. GetSpan and Advance) will go via generic interface dispatch in a shared generic; which can't inline and is also the slowest calling convention for the clr. Changing it to the generic means when you pass it a struct type it will use a non-shared generic and direct, inlinable calls.

See dotnet/corefxlab#2358 (comment) for more details.

  • Some of the APIs exist for performance reasons, to reduce the chattiness and to reduce interface calls. In some cases, they are also convenient (like WriteNameValue, or WriteStartArray which accepts a name).
  • Since this is a ref struct (like the reader), async support would need to be built on top.
  • We cannot support writing directly to a span since ref structs cannot implement interfaces (i.e. the IBufferWriter<byte>). If we can get language support to enable that, we could add a span-based factory method.

Questions:

  1. What .NET Types should we support? Decimal? TimeSpan? All numeric types? Uri? BigInteger? Byte/Char?
  2. Should we provide string-based overloads at all (that transcode before writing)?
  3. Allow writing comments? What should the semantics of such an API be?
  4. Do we need to provide an extensible JsonWriterOption which enables features like MaxDepth, custom indentation character, custom new line character, string escape handling? Currently, we only have two options, which are served by a bool parameter within a ctor:
    • minimized (default, i.e. no extra white space)
    • pretty printed (properly indented with 2 space characters based on depth with Environment.NewLine)
    • We escape everything based on a white-list, by default. One alternative could be to provide overloads that accept bool escape.
  5. General API and argument names:
    • WriteAttribute, WritePropertyNameValue, WriteKeyValue, etc.
    • PrettyPrint or Formatted?
    • Argument name - name or propertyName?
  6. How do we want to deal with API overloads that accept System.String, UTF-8 string as ReadOnlySpan<byte>, and raw bytes as ReadOnlySpan<byte> that need to be Base64-encoded? Should the argument name remain value or be explicitly different? The current proposal is to add AsBase64 to the API name to help with overload resolution.
  7. What should the API name be for writing null values? WriteNull or WriteValueNull/WriteNameValueNull?
  8. Should we provide support for writing raw JSON data as string/UTF-8 bytes?
  9. Should we have static factory methods or are constructors good enough?
Utf8JsonWriter<ArrayFormatterWrapper> json = Utf8JsonWriter.Create(output, prettyPrint: true);
var json = new Utf8JsonWriter<ArrayFormatterWrapper>(output, prettyPrint: true);

cc @KrzysztofCwalina, @terrajobst, @davidfowl, @steveharter, @joshfree, @benaadams

@ahsonkhan ahsonkhan self-assigned this Nov 16, 2018
@eerhardt
Copy link
Member

Why is this in the System.Text.Json namespace? We don't/didn't put the Xml functionality in System.Text.Xml?

@eerhardt
Copy link
Member

Also, why do we need Utf8 in the class name? According to RFC 4627 UTF-8 is the default encoding. Why do we need to put the default encoding in the class name? Can't it be safe to assume that if you don't qualify the encoding, it would be the default?

@Tornhoof
Copy link
Contributor

Tornhoof commented Nov 17, 2018

I like:

  • Factory method for Stream (via some kind of IBufferWriter wrapper) for simpler use for many scenarios.
  • Supporting pretty print oob
  • A specified way to write byte arrays, i.e. via base64

I'd like to see:

  • Support for writing all types which already have an optimized UTF8 Formatter, or implement that ISpanFormatter or IBufferFormatter interface.

I don't really like:

  • The convenience methods, i.e. WriteStartArray("phoneNumbers"). I think they only clutter the API and you can simple use something like WriteAttribute and then the appropriate value writer method.
  • Supporting JSON comments oob, this should be hidden behind a (disabled) setting, as rfc8259 does not allow this, the default configuration should be rfc conform.

@ahsonkhan
Copy link
Member Author

ahsonkhan commented Nov 17, 2018

Why is this in the System.Text.Json namespace? We don't/didn't put the Xml functionality in System.Text.Xml?

Given the reader is already there, we would put the writer there too. We put Xml functionality in System.Xml. Are you suggesting System.Json? If so, we have some APIs there already which do not fit in with these new APIs and could cause confusion. If not, what alternative namespace did you have in mind?

Why do we need to put the default encoding in the class name? Can't it be safe to assume that if you don't qualify the encoding, it would be the default?

Fair point. We already have Utf8JsonReader, so the writer follows suit. We could consider removing utf8 from both types and rely on "it is utf8 by default".

The convenience methods, i.e. WriteStartArray("phoneNumbers"). I think they only clutter the API and you can simple use something like WriteAttribute and then the appropriate value writer method.

Some of the APIs exist for performance reasons, to reduce the chattiness and to reduce interface calls. In some cases, they are also convenient (like WriteNameValue, or WriteStartArray which accepts a name).

We are trying to merge and do as much work as possible in single API calls for performance, and such an API helps there.

Supporting JSON comments oob, this should be hidden behind a (disabled) setting, as rfc8259 does not allow this, the default configuration should be rfc conform.

Unlike the reader, WriteComments is an API so it is difficult to provide an option to disallow/hide it. What should it do if the user set default option and still calls it? Throw? If the caller explicitly opt'd to call the API, they are knowingly violating rfc (which we can document). I am not sure if hiding it (maybe with EditorBrowsableNever) really helps, and having a setting here only gets in the way without much benefit.

@Tornhoof
Copy link
Contributor

We are trying to merge and do as much work as possible in single API calls for performance, and such an API helps there.

As I don't know your plans for a higher level serialization API, one which possibly will need some kind of codegen to be performant. This higher level serializer should then take care of i.e. precalculating the named array start. I understand that your optimized method could write three bytes directly ":[ as an optimization, but if you actually go the codegen route then these methods will never be used.

Unlike the reader, WriteComments is an API ...Throw?

Yes I would model it symmetrically in the Reader and Writer and throw if disabled. But I do admit it's more relevant for the Reader side.

@eerhardt
Copy link
Member

I didn't know the reader was already approved, I see now that this is following some of the namings from that.

Yes, I would recommend System.Json. That would be more natural IMO. Obviously JSON is Text. Just like Xml is Text. So I don't feel these APIs (neither the reader nor the writer) needs to be under System.Text. And it would be more symmetrical with System.Xml if this was under System.Json.

I don't think the existing APIs in the namespace would cause much confusion.

@ahsonkhan
Copy link
Member Author

ahsonkhan commented Dec 11, 2018

API shape based on feedback from the previous review: dotnet/apireviews#82

  • Removed MaxDepth from JsonWriterState
  • Set max depth allowed to 1_000
  • Rename WriteValue to WriteNumberValue, WriteStringValue, etc.
  • Add bool suppressEscaping optional argument to all APIs that take "strings" (either as property name or value). The default is false (i.e. we escape everything).
  • Changed validation to only do structural validation
  • JsonWriter is no longer generic.
  • Renamed Formatted to Indented

Questions:

  • Does suppressEscaping only skip escaping property names (i.e. we always escape values).
  • Should we continue to provide the ReadOnlySpan<char> overloads as well as the string overloads?
namespace System.Text.Json {
    // Using InvalidOperationException instead, to be consistent with XmlWriter.
    /*public sealed class JsonWriterException : Exception {
        public JsonWriterException(string message);
    }*/
    public struct JsonWriterOptions {
        public bool Indented { get; set; }
        public bool SkipValidation { get; set; }
    }
    public struct JsonWriterState {
        public JsonWriterState(JsonWriterOptions options = default(JsonWriterOptions));
        public long BytesCommitted { get; }
        public long BytesWritten { get; }
        public JsonWriterOptions Options { get; }
    }
    public ref struct Utf8JsonWriter {
        public Utf8JsonWriter(IBufferWriter<byte> bufferWriter, JsonWriterState state = default(JsonWriterState));
        //public Utf8JsonWriter(Span<byte> outputSpan, JsonWriterState state = default(JsonWriterState));
        public long BytesCommitted { get; }
        public long BytesWritten { get; }
        public int CurrentDepth { get; }
        //public JsonWriterState CurrentState { get; } // Changing to a method instead of a property
        public JsonWriterState GetCurrentState();
        public void Flush(bool isFinalBlock = true);
        
        public void WriteStringArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<DateTime> values, bool suppressEscaping = false);
        public void WriteStringArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<DateTimeOffset> values, bool suppressEscaping = false);
        public void WriteStringArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<Guid> values, bool suppressEscaping = false);
        public void WriteNumberArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<decimal> values, bool suppressEscaping = false);
        public void WriteNumberArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<double> values, bool suppressEscaping = false);
        public void WriteNumberArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<int> values, bool suppressEscaping = false);
        public void WriteNumberArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<long> values, bool suppressEscaping = false);
        public void WriteNumberArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<float> values, bool suppressEscaping = false);
        public void WriteNumberArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<uint> values, bool suppressEscaping = false);
        public void WriteNumberArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<ulong> values, bool suppressEscaping = false);
        public void WriteBooleanArray(ReadOnlySpan<byte> propertyName, ReadOnlySpan<bool> values, bool suppressEscaping = false);

        public void WriteStringArrayValue(ReadOnlySpan<DateTime> valuese);
        public void WriteStringArrayValue(ReadOnlySpan<DateTimeOffset> values);
        public void WriteStringArrayValue(ReadOnlySpan<Guid> values);
        public void WriteNumberArrayValue(ReadOnlySpan<decimal> values);
        public void WriteNumberArrayValue(ReadOnlySpan<double> values);
        public void WriteNumberArrayValue(ReadOnlySpan<int> values);
        public void WriteNumberArrayValue(ReadOnlySpan<long> values);
        public void WriteNumberArrayValue(ReadOnlySpan<float> values);
        public void WriteNumberArrayValue(ReadOnlySpan<uint> values);
        public void WriteNumberArrayValue(ReadOnlySpan<ulong> values);
        public void WriteBooleanArrayValue(ReadOnlySpan<bool> values);

        public void WriteBoolean(ReadOnlySpan<byte> propertyName, bool value, bool suppressEscaping = false);
        public void WriteBoolean(ReadOnlySpan<char> propertyName, bool value, bool suppressEscaping = false);
        public void WriteBoolean(string propertyName, bool value, bool suppressEscaping = false);
        public void WriteBooleanValue(bool value);
        public void WriteCommentValue(ReadOnlySpan<byte> value, bool suppressEscaping = false);
        public void WriteCommentValue(ReadOnlySpan<char> value, bool suppressEscaping = false);
        public void WriteCommentValue(string value, bool suppressEscaping = false);
        public void WriteEndArray();
        public void WriteEndObject();
        public void WriteNull(ReadOnlySpan<byte> propertyName, bool suppressEscaping = false);
        public void WriteNull(ReadOnlySpan<char> propertyName, bool suppressEscaping = false);
        public void WriteNull(string propertyName, bool suppressEscaping = false);
        public void WriteNullValue();
        public void WriteNumber(ReadOnlySpan<byte> propertyName, decimal value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, double value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, int value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, long value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, float value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, uint value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<byte> propertyName, ulong value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<char> propertyName, decimal value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<char> propertyName, double value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<char> propertyName, int value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<char> propertyName, long value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<char> propertyName, float value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<char> propertyName, uint value, bool suppressEscaping = false);
        public void WriteNumber(ReadOnlySpan<char> propertyName, ulong value, bool suppressEscaping = false);
        public void WriteNumber(string propertyName, decimal value, bool suppressEscaping = false);
        public void WriteNumber(string propertyName, double value, bool suppressEscaping = false);
        public void WriteNumber(string propertyName, int value, bool suppressEscaping = false);
        public void WriteNumber(string propertyName, long value, bool suppressEscaping = false);
        public void WriteNumber(string propertyName, float value, bool suppressEscaping = false);
        public void WriteNumber(string propertyName, uint value, bool suppressEscaping = false);
        public void WriteNumber(string propertyName, ulong value, bool suppressEscaping = false);
        public void WriteNumberValue(decimal value);
        public void WriteNumberValue(double value);
        public void WriteNumberValue(int value);
        public void WriteNumberValue(long value);
        public void WriteNumberValue(float value);
        public void WriteNumberValue(uint value);
        public void WriteNumberValue(ulong value);
        public void WriteStartArray();
        public void WriteStartArray(ReadOnlySpan<byte> propertyName, bool suppressEscaping = false);
        public void WriteStartArray(ReadOnlySpan<char> propertyName, bool suppressEscaping = false);
        public void WriteStartArray(string propertyName, bool suppressEscaping = false);
        public void WriteStartObject();
        public void WriteStartObject(ReadOnlySpan<byte> propertyName, bool suppressEscaping = false);
        public void WriteStartObject(ReadOnlySpan<char> propertyName, bool suppressEscaping = false);
        public void WriteStartObject(string propertyName, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<byte> propertyName, DateTime value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<byte> propertyName, DateTimeOffset value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<byte> propertyName, Guid value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<byte> propertyName, ReadOnlySpan<byte> value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<byte> propertyName, ReadOnlySpan<char> value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<byte> propertyName, string value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<char> propertyName, DateTime value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<char> propertyName, DateTimeOffset value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<char> propertyName, Guid value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<char> propertyName, ReadOnlySpan<byte> value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<char> propertyName, ReadOnlySpan<char> value, bool suppressEscaping = false);
        public void WriteString(ReadOnlySpan<char> propertyName, string value, bool suppressEscaping = false);
        public void WriteString(string propertyName, DateTime value, bool suppressEscaping = false);
        public void WriteString(string propertyName, DateTimeOffset value, bool suppressEscaping = false);
        public void WriteString(string propertyName, Guid value, bool suppressEscaping = false);
        public void WriteString(string propertyName, ReadOnlySpan<byte> value, bool suppressEscaping = false);
        public void WriteString(string propertyName, ReadOnlySpan<char> value, bool suppressEscaping = false);
        public void WriteString(string propertyName, string value, bool suppressEscaping = false);
        public void WriteStringValue(DateTime value);
        public void WriteStringValue(DateTimeOffset value);
        public void WriteStringValue(Guid value);
        public void WriteStringValue(ReadOnlySpan<byte> value, bool suppressEscaping = false);
        public void WriteStringValue(ReadOnlySpan<char> value, bool suppressEscaping = false);
        public void WriteStringValue(string value, bool suppressEscaping = false);
    }
}

@mirkogeffken
Copy link

Have you considered making the API more interface based and providing a reasonable and safe implementation as a default and a performant option as an alternate?

Either to me me should possibly take overloads on either ctors or methods to propertyvalidators or valuevalidators with defaults being either complex or noop. Wouldn’t inlining handle the performance case and suck appropriately in the correct case? (I genuinely do not know)

To me formatting is a completely nice-to-have but valuable, but I think adds more value as a base type than an interface.

To me if intention checking is cheap (well base syntax in general) do that always.

Again correctness > performance but you should ideally be able to opt into the trust me model regardless of how wrong you are.

I still maintain that I would rather get my bank balance in 20ms over 10ms over it sometimes being off by -100M. If Jeff or Bill or Steve or Satya disagree please transfer said amount to my bank (won’t make a diff to you, but somehow my creditors and kids private school does (they are just too worried about stupid decimals). I told them soo many time that I transferred the exact 99.999 thing they were asking about in the appropriate currency I chose ;) (.s and ,s are TOTALLY the same thing) (hint I am German :)) I transferred the rounded up amount in Euros (€100) as i had a hard time worrying about the dumb decimals and they still complained. Wait they didn’t really mean $100K in dollars did they? That would be insane.

I am obviously being ridiculous, but my point is correctness always beats speed. Opt into speed, choose correctness. Leave it to nitwits to point to “but it is fast”.

I really struggle with watching design vids as I keep thinking please choose correctness and simplicity over performance first and mega complexity. I think way too many people publicize edge case scenarios in which C++ and Redis and ... tech fail. We need a push on if you know nothing about this you will most likely be OK and, if you care, here are your alternatives.

I mean this kindly. Correctness > performance, but ideal is opting into either or both or ... You guys genuinely have an impossible job!

@benaadams
Copy link
Member

Not sure about this:

JsonWriter is no longer generic.

Listening to the Design Review to understand the reasoning.

@benaadams
Copy link
Member

Where did this come from?

JsonWriter is no longer generic.

Or TODO in dotnet/corefxlab#2612

Consider making the type non-generic with a ctor overload that accepts span (or memory)

Didn't catch it in the design review. It does mean a struct IBufferWriter<byte> cannot be used.

The alternative .ctor taking Span<byte> isn't helpful with the Utf8JsonWriter apis as it it didn't have enough space in the Span presumably they throw?

So it would be control flow via throw/catch, which is anti-performance when presumably the Span<byte> overload is for performance?

cc @ahsonkhan, @bartonjs, @KrzysztofCwalina @terrajobst

@ahsonkhan
Copy link
Member Author

Marking as approved based on dotnet/apireviews#82.

Where did this come from?

JsonWriter is no longer generic.

Didn't catch it in the design review. It does mean a struct IBufferWriter<byte> cannot be used.

@jkotas provided feedback to change JsonWriter to no longer be generic. The usability concern with a generic JsonWriter was not worth the trade off of preventing boxing of a struct IBufferWriter<byte>. Furthermore, given the only publicly shipped IBufferWriter<byte> is a class (PipeWriter), it made sense to design for that instead. Also, based on recent benchmarking, there was no performance difference between a struct or class implementation of IBufferWriter<byte> (aside from the small allocation penalty). Making it generic actually regressed the class impl slightly.

The methods called internally to BufferWriter<T> (e.g. GetSpan and Advance) will go via generic interface dispatch in a shared generic; which can't inline and is also the slowest calling convention for the clr. Changing it to the generic means when you pass it a struct type it will use a non-shared generic and direct, inlinable calls.

Since we are no longer relying on BufferWriter<T>, we are not going through the slowest calling convention. Based on discussions with @jkotas, the call site in the current implementation is not expensive.

The alternative .ctor taking Span<byte> isn't helpful with the Utf8JsonWriter apis as it it didn't have enough space in the Span presumably they throw?

So it would be control flow via throw/catch, which is anti-performance when presumably the Span<byte> overload is for performance?

That is a good point and was an oversight on my part. Deferring writing directly to a span for now since we need a mechanism to grow the destination span if there is not enough space (removed that ctor).

@benaadams
Copy link
Member

Since we are no longer relying on BufferWriter<T>, we are not going through the slowest calling convention.

That's just a case of wrapping it in a struct to switch from the slowest calling convention to the fastest calling convention (see: TechEmpower/aspnetcore/PlatformBenchmarks); which is faster than the interface calls

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static BufferWriter<WriterAdapter> GetWriter(PipeWriter pipeWriter)
    => new BufferWriter<WriterAdapter>(new WriterAdapter(pipeWriter));

private struct WriterAdapter : IBufferWriter<byte>
{
    public PipeWriter Writer;
    public WriterAdapter(PipeWriter writer) => Writer = writer;
    public void Advance(int count) => Writer.Advance(count);
    public Memory<byte> GetMemory(int sizeHint = 0) => Writer.GetMemory(sizeHint);
    public Span<byte> GetSpan(int sizeHint = 0) => Writer.GetSpan(sizeHint);
}

Which is where a struct IBufferWriter<byte> comes in Benchmark gist

               Method |     Mean |     Error |    StdDev |
--------------------- |---------:|----------:|----------:|
 WriterGenericWrapper | 10.85 ns | 0.0098 ns | 0.0091 ns |
        WriterGeneric | 19.72 ns | 0.0492 ns | 0.0384 ns |
      WriterInterface | 19.73 ns | 0.0311 ns | 0.0260 ns |

Making it an interface base closes down this speed up .

Making it generic actually regressed the class impl slightly.

Shared generics can inline and then become direct calls rather than interface calls; if its interface based then its always an interface call regardless.

If a non-inlined shared generic is significantly slower than an interface dispatch, is that more for something for the Jit to resolve/work on; rather than alter the public api for it?

The usability concern with a generic JsonWriter was not worth the trade off of preventing boxing of a struct IBufferWriter<byte>.

People that need the extra performance will deal with the usability of the generic being worse; people that don't won't be using JsonWriter directly they will be using something higher level like a JsonDocument type or a JsonSerilaizer method so won't deal with the usability.

In practice for TechEmpower/.../BenchmarkApplication.Json.cs the change will look like this?

private static void Json(PipeWriter pipeWriter)
{
    var writer = GetWriter(pipeWriter);

    // HTTP 1.1 OK
    writer.Write(_http11OK);

    // Server headers
    writer.Write(_headerServer);

    // Date header
    writer.Write(DateHeader.HeaderBytes);

    // Content-Type header
    writer.Write(_headerContentTypeJson);

    // Content-Length header
    writer.Write(_headerContentLength);
    
-   var jsonPayload = JsonSerializer.SerializeUnsafe(new JsonMessage { message = "Hello, World!" });
-   writer.WriteNumeric((uint)jsonPayload.Count);
  
-   // End of headers
-   writer.Write(_eoh);

-   // Body
-   writer.Write(jsonPayload);
-   writer.Commit();
    
+   // Save location for writing length
+   var lengthWriter = writer;
+   writer.Write(_contentLengthGap);

+   // End of headers
+   writer.Write(_eoh);
            
+   // Flush to the writer
+   writer.Commit();
    
+   var jsonWriter = new Utf8JsonWriter(pipeWriter);
+   jsonWriter.WriteStartObject();
+   jsonWriter.WriteString("message", "Hello, World!");
+   jsonWriter.WriteEndObject();
+   jsonWriter.Flush();

+   // Go back and write the length
+   lengthWriter.WriteNumeric((uint)(jsonWriter.BytesCommitted));
}

@benaadams
Copy link
Member

Note: If it was generic would share the WriterAdapter between writer and jsonWriter instantiating both with the same wrapper; but the code change would be very similar.

@davidfowl
Copy link
Member

So now we’re forced to box and allocate 😫

@KrzysztofCwalina
Copy link
Member

The perf difference is staggering. I think @ahsonkhan should dig into it and if we cannot remove the overhead, consider going back to a generic writer.

@benaadams
Copy link
Member

Is there anything the Jit can do on the constrained shared generics? i.e. why is a regular interface call significantly faster? (If it is?)

@jkotas
Copy link
Member

jkotas commented Jan 18, 2019

The perf difference is staggering.

You are measuring CPU cycles on a small micro-benchmark. It is not the only performance metric to worry about. The code size and startup time is important as well. It will shows the same staggering difference once you start using the generic code stamping trick - in opposite direction. I had to answer many times (last time yesterday) why the libraries using the generic code stamping micro-optimization end up with large code footprint.

FWIW, 100% allocation-free JSON formatting is not mainline-enough scenario to me to warrant a public .NET API.

@benaadams
Copy link
Member

Yes but wouldn't the common usage be via a shared generic? The generic just leaves the door open to other optimizations; whereas the interface shuts them down.

@jkotas
Copy link
Member

jkotas commented Jan 18, 2019

Yes but wouldn't the common usage be via a shared generic?

But then the common usage pays extra overhead of shared generic.

@benaadams
Copy link
Member

Yes but wouldn't the common usage be via a shared generic?

But then the common usage pays extra overhead of shared generic.

Is there anything that can be improved on how that works? i.e. here and a lot of other classes are simple generic use where for the non-inlined shared generic the constrained type could be switched for the interface (as suggested doing in the C# above)

@jkotas
Copy link
Member

jkotas commented Jan 18, 2019

I think the improvement here is make the allocations cheaper or local to make the generic code-stamping to avoid allocations irrelevant. Escape analysis, manual annotations to aid escape analysis, local per-request or per-thread heaps, ...

@benaadams
Copy link
Member

The code-stamping with a wrapper struct also makes the calls faster direct calls rather than via interface.

Utf8JsonWriter/Reader are also the lowest level apis where performance is more sensitive; whereas a more common usage would be using a object Utf8Serializer type; which would then use these apis underneath.

If you are using these apis directly then you probably care about performance more?

@stephentoub
Copy link
Member

@ahsonkhan, can this issue be closed, or is it still tracking adding additional APIs?

@ahsonkhan
Copy link
Member Author

ahsonkhan commented Jun 3, 2019

is it still tracking adding additional APIs?

Nope, the issue can be closed. Any other future APIs/discussions can be tracked separately.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Projects
None yet
Development

No branches or pull requests

10 participants