Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Custom [Try]GetValue<T> methods on JsonElement for efficient projection of string values to dotnet types #74028

Open
mwadams opened this issue Aug 16, 2022 · 9 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json
Milestone

Comments

@mwadams
Copy link

mwadams commented Aug 16, 2022

Background and motivation

There have been a number of discussions about possible APIs for efficient access to the underlying UTF8 bytes for JSON string values represented by a JsonElement.

So far, there has been no consensus on a suitable approach.

In general, the criticism of previous proposals has been either that:

  1. they require too much supporting scaffolding (e.g. an approach relying on various flavours of Utf8String), or
  2. they are "too raw" - just exposing the underlying UTF8 bytes is inflexible and prone to errors, especially with respect to encoding.

This API proposal focuses instead on a specific use-case - projecting from a JSON string value to a user-defined type, without an intermediate string allocation.

The basic shape of the API should be familiar. JsonElement already has a number of such methods - projecting to DateTime, DateTimeOffset and Guid, for example, with the GetXXX() and TryGetXXX() APIs.

These proposed APIs offer developers the ability to project to user defined types, without requiring an intermediate conversion to string. They will handle decoding and transcoding (e.g. to a decoded ReadOnlySpan<byte> or ReadOnlySpan<char>), managing the memory on behalf of the developer.

Importantly, they do not leak any information about the underlying backing for the JSON string value.

Common use cases might include:

  • projections to 3rd party types that are unlikely ever to be directly supported by in-the-box libraries, such as those from NodaTime
  • projections to in-the-box types introduced in the libraries but not yet supported by System.Text.Json.
  • custom projections (e.g. transforms of the JSON string value itself)

They use a similar delegate-and-state pattern to String.Create() to encourage efficient usage, and to avoid accidental allocation of closure-capturing objects.

API Proposal

namespace System.Text.Json;

public readonly struct JsonElement
{
    /// <summary>
    ///   Attempts to represent the current JSON string as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
    /// <param name="parser">A delegate to the method that parses the JSON string.</param>
    /// <param name="state">The state for the parser.</param>
    /// <param name="value">Receives the value.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>
    ///   <see langword="true"/> if the string can be represented as the given type,
    ///   <see langword="false"/> otherwise.
    /// </returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public bool TryGetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state, [NotNullWhen(true)] out TResult? value)
    {
        CheckValidInstance();
    
        return _parent.TryGetValue(_idx, parser, state, decode: true, out value);
    }
    
    /// <summary>
    ///   Attempts to represent the current JSON string as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
    /// <param name="parser">A delegate to the method that parses the JSON string.</param>
    /// <param name="state">The state for the parser.</param>
    /// <param name="decode">Indicates whether the UTF8 JSON string should be decoded.</param>
    /// <param name="value">Receives the value.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>
    ///   <see langword="true"/> if the string can be represented as the given type,
    ///   <see langword="false"/> otherwise.
    /// </returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public bool TryGetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state, bool decode, [NotNullWhen(true)] out TResult? value)
    {
        CheckValidInstance();
    
        return _parent.TryGetValue(_idx, parser, state, decode, out value);
    }
    
    /// <summary>
    ///   Attempts to represent the current JSON string as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
    /// <param name="parser">A delegate to the method that parses the JSON string.</param>
    /// <param name="state">The state for the parser.</param>
    /// <param name="value">Receives the value.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>
    ///   <see langword="true"/> if the string can be represented as the given type,
    ///   <see langword="false"/> otherwise.
    /// </returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public bool TryGetValue<TState, TResult>(Parser<TState, TResult> parser, TState state, [NotNullWhen(true)] out TResult? value)
    {
        CheckValidInstance();
    
        return _parent.TryGetValue(_idx, parser, state, out value);
    }
    
    /// <summary>
    ///   Gets the value of the element as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="parser">The parser for the result.</param>
    /// <param name="state">The state for the parser.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>The value of the element as the given type.</returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="FormatException">
    ///   The value cannot be represented as the given type.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public TResult GetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state)
    {
        if (!TryGetValue(parser, state, out TResult? value))
        {
            ThrowHelper.ThrowFormatException();
        }
    
        return value;
    }
    
    /// <summary>
    ///   Gets the value of the element as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="parser">The parser for the result.</param>
    /// <param name="state">The state for the parser.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>The value of the element as the given type.</returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="FormatException">
    ///   The value cannot be represented as the given type.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public TResult GetValue<TState, TResult>(Parser<TState, TResult> parser, TState state)
    {
        if (!TryGetValue(parser, state, out TResult? value))
        {
            ThrowHelper.ThrowFormatException();
        }
    
        return value;
    }
}

/// <summary>
///   A delegate to a method that attempts to represent a JSON string as a given type.
/// </summary>
/// <typeparam name="TState">The type of the state for the parser.</typeparam>
/// <typeparam name="TResult">The type of the resulting value.</typeparam>
/// <param name="span">The UTF8-encoded JSON string. This may be encoded or decoded depending on context.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="value">The resulting value.</param>
/// <remarks>
///   This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
///   <see langword="true"/> if the string can be represented as the given type,
///   <see langword="false"/> otherwise.
/// </returns>
public delegate bool Utf8Parser<TState, TResult>(ReadOnlySpan<byte> span, TState state, [NotNullWhen(true)] out TResult? value);

/// <summary>
///   A delegate to a method that attempts to represent a JSON string as a given type.
/// </summary>
/// <typeparam name="TState">The type of the state for the parser.</typeparam>
/// <typeparam name="TResult">The type of the resulting value.</typeparam>
/// <param name="span">The JSON string. This will always be in its decoded form.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="value">The resulting value.</param>
/// <remarks>
///   This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
///   <see langword="true"/> if the string can be represented as the given type,
///   <see langword="false"/> otherwise.
/// </returns>
public delegate bool Parser<TState, TResult>(ReadOnlySpan<char> span, TState state, [NotNullWhen(true)] out TResult? value);

Rough implementation of the additional internal methods on JsonDocument can be found here.

https://github.com/mwadams/runtime/blob/55372e6b8e93a123513cfd6e6415cfd80c0630a4/src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.cs#L660

API Usage

JsonElement element;

if (element.TryParseValue(ParseLength, state: default(object?), decode: true, out int value))
{
    Console.WriteLine($"Length = {value}");
}
else
{
    Console.WriteLine($"Unable to parse the value.");
}

if (element.TryParseValue(ParseLengthAsDecodedChars, state: new ParseConfiguration(true), out int value))
{
    Console.WriteLine($"Length = {value}");
}
else
{
    Console.WriteLine($"Unable to parse the value.");
}

static bool ParseLength(ReadOnlySpan<byte> input, object? state, out int result)
{
    result = input.Length; // Or whatever processing you require
    return true;
}

static bool ParseLengthAsDecodedChars(ReadOnlySpan<char> input, ParseConfiguration state, out int result)
{
    result = state.SomeSwitch ? input.Length : input.Length + 1; // Or whatever processing you require
    return true;
}

internal readonly record struct ParseConfiguration(bool SomeSwitch);

Alternative Designs

  • We could provide additional overloads that allow you not to provide a TState value (e.g. defaulting to a null value for an object?)
    • I think this is in danger of trapping people into closure-capturing
    • It would diverge from the spirit of string.Create(), although this is perhaps a minor consideration
  • We could require TState to be a an in parameter, which would minimise copying in the case that your state was a readonly struct. Personally, I would favour this variation.
  • We could require "decode: true" and eliminate that overload; while this would add a small overhead to every call, it eliminates any potential encoding/decoding confusion.

Risks

This should have no impact on the existing API surface area.

It should be possible to implement largely using the existing methods in JsonDocument and JsonReaderHelper.

There are comments in JsonDocument suggesting that there is a plan to move to common UTF8 encoding/decoding code; this would increase the surface area of methods in JsonDocument that would need to switch to using those common APIs if/when that change occurs.

@mwadams mwadams added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Aug 16, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 16, 2022
@ghost
Copy link

ghost commented Aug 16, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

There have been a number of discussions about possible APIs for efficient access to the underlying UTF8 bytes for JSON string values represented by a JsonElement.

So far, there has been no consensus on a suitable approach.

In general, the criticism of previous proposals has been either that:

  1. they require too much supporting scaffolding (e.g. an approach relying on various flavours of Utf8String), or
  2. they are "too raw" - just exposing the underlying UTF8 bytes is inflexible and prone to errors, especially with respect to encoding.

This API proposal focuses instead on a specific use-case - projecting from a JSON string value to a user-defined type, without an intermediate string allocation.

The basic shape of the API should be familiar. JsonElement already has a number of such methods - projecting to DateTime, DateTimeOffset and Guid, for example, with the GetXXX() and TryGetXXX() APIs.

These proposed APIs offer developers the ability to project to user defined types, without requiring an intermediate conversion to string. They will handle decoding and transcoding (e.g. to a decoded ReadOnlySpan<byte> or ReadOnlySpan<char>), managing the memory on behalf of the developer.

Importantly, they do not leak any information about the underlying backing for the JSON string value.

Common use cases might include:

  • projections to 3rd party types that are unlikely ever to be directly supported by in-the-box libraries, such as those from NodaTime types
  • projections to in-the-box types introduced in the libraries but not yet supported by System.Text.Json.
  • custom projections (e.g. transforms of the JSON string value itself)

They use a similar delegate-and-state pattern to the String.Create() to encourage efficient usage, and to avoid accidental allocation of closure-capturing objects.

API Proposal

namespace System.Text.Json;

public readonly struct JsonElement
{
    /// <summary>
    ///   Attempts to represent the current JSON string as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
    /// <param name="parser">A delegate to the method that parses the JSON string.</param>
    /// <param name="state">The state for the parser.</param>
    /// <param name="value">Receives the value.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>
    ///   <see langword="true"/> if the string can be represented as the given type,
    ///   <see langword="false"/> otherwise.
    /// </returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public bool TryGetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state, [NotNullWhen(true)] out TResult? value)
    {
        CheckValidInstance();
    
        return _parent.TryGetValue(_idx, parser, state, decode: true, out value);
    }
    
    /// <summary>
    ///   Attempts to represent the current JSON string as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
    /// <param name="parser">A delegate to the method that parses the JSON string.</param>
    /// <param name="state">The state for the parser.</param>
    /// <param name="decode">Indicates whether the UTF8 JSON string should be decoded.</param>
    /// <param name="value">Receives the value.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>
    ///   <see langword="true"/> if the string can be represented as the given type,
    ///   <see langword="false"/> otherwise.
    /// </returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public bool TryGetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state, bool decode, [NotNullWhen(true)] out TResult? value)
    {
        CheckValidInstance();
    
        return _parent.TryGetValue(_idx, parser, state, decode, out value);
    }
    
    /// <summary>
    ///   Attempts to represent the current JSON string as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
    /// <param name="parser">A delegate to the method that parses the JSON string.</param>
    /// <param name="state">The state for the parser.</param>
    /// <param name="value">Receives the value.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>
    ///   <see langword="true"/> if the string can be represented as the given type,
    ///   <see langword="false"/> otherwise.
    /// </returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public bool TryGetValue<TState, TResult>(Parser<TState, TResult> parser, TState state, [NotNullWhen(true)] out TResult? value)
    {
        CheckValidInstance();
    
        return _parent.TryGetValue(_idx, parser, state, out value);
    }
    
    /// <summary>
    ///   Gets the value of the element as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="parser">The parser for the result.</param>
    /// <param name="state">The state for the parser.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>The value of the element as the given type.</returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="FormatException">
    ///   The value cannot be represented as the given type.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public TResult GetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state)
    {
        if (!TryGetValue(parser, state, out TResult? value))
        {
            ThrowHelper.ThrowFormatException();
        }
    
        return value;
    }
    
    /// <summary>
    ///   Gets the value of the element as the given type.
    /// </summary>
    /// <typeparam name="TState">The type of the parser state.</typeparam>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="parser">The parser for the result.</param>
    /// <param name="state">The state for the parser.</param>
    /// <remarks>
    ///   This method does not create a representation of values other than JSON strings.
    /// </remarks>
    /// <returns>The value of the element as the given type.</returns>
    /// <exception cref="InvalidOperationException">
    ///   This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
    /// </exception>
    /// <exception cref="FormatException">
    ///   The value cannot be represented as the given type.
    /// </exception>
    /// <exception cref="ObjectDisposedException">
    ///   The parent <see cref="JsonDocument"/> has been disposed.
    /// </exception>
    public TResult GetValue<TState, TResult>(Parser<TState, TResult> parser, TState state)
    {
        if (!TryGetValue(parser, state, out TResult? value))
        {
            ThrowHelper.ThrowFormatException();
        }
    
        return value;
    }
}

/// <summary>
///   A delegate to a method that attempts to represent a JSON string as a given type.
/// </summary>
/// <typeparam name="TState">The type of the state for the parser.</typeparam>
/// <typeparam name="TResult">The type of the resulting value.</typeparam>
/// <param name="span">The UTF8-encoded JSON string. This may be encoded or decoded depending on context.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="value">The resulting value.</param>
/// <remarks>
///   This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
///   <see langword="true"/> if the string can be represented as the given type,
///   <see langword="false"/> otherwise.
/// </returns>
public delegate bool Utf8Parser<TState, TResult>(ReadOnlySpan<byte> span, TState state, [NotNullWhen(true)] out TResult? value);

/// <summary>
///   A delegate to a method that attempts to represent a JSON string as a given type.
/// </summary>
/// <typeparam name="TState">The type of the state for the parser.</typeparam>
/// <typeparam name="TResult">The type of the resulting value.</typeparam>
/// <param name="span">The JSON string. This will always be in its decoded form.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="value">The resulting value.</param>
/// <remarks>
///   This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
///   <see langword="true"/> if the string can be represented as the given type,
///   <see langword="false"/> otherwise.
/// </returns>
public delegate bool Parser<TState, TResult>(ReadOnlySpan<char> span, TState state, [NotNullWhen(true)] out TResult? value);

Rough implementation of the additional internal methods on JsonDocument can be found here.

https://github.com/mwadams/runtime/blob/55372e6b8e93a123513cfd6e6415cfd80c0630a4/src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.cs#L660

API Usage

JsonElement element;

if (element.TryParseValue(ParseLength, state: default(object?), decode: true, out int value))
{
    Console.WriteLine($"Length = {value}");
}
else
{
    Console.WriteLine($"Unable to parse the value.");
}

if (element.TryParseValue(ParseLengthAsDecodedChars, state: new ParseConfiguration(true), out int value))
{
    Console.WriteLine($"Length = {value}");
}
else
{
    Console.WriteLine($"Unable to parse the value.");
}

static bool ParseLength(ReadOnlySpan<byte> input, object? state, out int result)
{
    result = input.Length; // Or whatever processing you require
    return true;
}

static bool ParseLengthAsDecodedChars(ReadOnlySpan<char> input, ParseConfiguration state, out int result)
{
    result = state.SomeSwitch ? input.Length : input.Length + 1; // Or whatever processing you require
    return true;
}

internal readonly record struct ParseConfiguration(bool SomeSwitch);

Alternative Designs

  • We could provide additional overloads that allow you not to provide a TState value (e.g. defaulting to a null value for an object?)
    • I think this is in danger of trapping people into closure-capturing
    • It would diverge from the spirit of string.Create(), although this is perhaps a minor consideration
  • We could require TState to be a value type, and declare it as an in parameter, which would minimise copying in the case that your state was a readonly struct.

Risks

This should have no impact on the existing API surface area.

It should be possible to implement largely using the existing methods in JsonDocument and JsonReaderHelper.

There are comments in JsonDocument suggesting that there is a plan to move to common UTF8 encoding/decoding code; this would increase the surface are of methods in JsonDocument that would need to switch to using those common APIs if/when that change occurs.

Author: mwadams
Assignees: -
Labels:

api-suggestion, area-System.Text.Json

Milestone: -

@mwadams
Copy link
Author

mwadams commented Aug 16, 2022

One usage observation: developers may wish to implement extension methods that hide the underlying parsers from their ultimate clients.

public static bool TryGetCustomValue(this JsonElement element, out CustomValue value) {}

This would give these custom value converters the same signature for end users as the built-in APIs.

@mwadams
Copy link
Author

mwadams commented Aug 17, 2022

For a quantifiable benefit: if implemented, these APIs would enable me to eliminate allocations in three hot-paths in my JsonSchema validation.

In one of our benchmarks, we validate an array of 10,000 "Person" objects.

The Person has a FirstName, LastName, and a DateOfBirth.

The DateOfBirth is a JSON string in date format which we project to a NodaTime local date type.

The name elements are constrained to a maxLength.

Validation causes 40,000 string allocations at a cost of ~3MB (which are otherwise unused because we write the values directly into the output stream - a very common use case).

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
FindWholeArrayLoopSchemaGenValidateDeserialize 20.050 ms 0.0598 ms 0.0530 ms 312.5000 - - 2,792,162 B

With this API this reduces to zero.

@mwadams
Copy link
Author

mwadams commented Aug 17, 2022

As an example of the flexibility of this API, here's a projection that would allow us to do Rune counting of the decoded JSON string, avoiding a string allocation (another requirement for json-schema validation).

public static class JsonElementExtensions
{
    public static int CountRunes(this JsonElement element)
    {
        if (element.TryGetValue(RuneCounter, default(object?), out int result))
        {
            return result;
        }

        throw new InvalidOperationException();
    }

    private static bool RuneCounter(ReadOnlySpan<char> input, object? state, out int result)
    {
        int runeCount = 0;
        SpanRuneEnumerator enumerator = input.EnumerateRunes();
        while (enumerator.MoveNext())
        {
            runeCount++;
        }

        result = runeCount;
        return true;
    }
}

In our actual implementation we would create a whole 'string validator' type that would also do e.g. pattern matching using the new net7 Regex.IsMatch(ReadOnlySpan<T>). This is a highly simplified example that illustrates the benefits of building over this API.

public static class JsonElementExtensions
{
    public static ValidationContext ValidateString(this JsonElement element, in ValidationContext validationContext)
    {
        if (element.TryGetValue(StringValidator, validationContext, out ValidationContext result))
        {
            return result;
        }

        throw new InvalidOperationException();
    }

    private static Regex SomePattern = new Regex("H(.*)o", RegexOptions.Compiled, TimeSpan.FromSeconds(3));

    private static bool StringValidator(ReadOnlySpan<char> input, in ValidationContext context, out ValidationContext result)
    {
        // Emitted if minLength or maxLength
        int runeCount = 0;
        SpanRuneEnumerator enumerator = input.EnumerateRunes();
        while (enumerator.MoveNext())
        {
            runeCount++;
        }

        result = context.WithResult(runeCount < 10);

        // Emitted if has pattern match validation Net7 Regex
        result = result.WithResult(SomePattern.IsMatch(input));
        return true;
    }
}

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Aug 18, 2022
@layomia layomia added this to the 8.0.0 milestone Aug 18, 2022
@mwadams
Copy link
Author

mwadams commented Aug 19, 2022

I think I'd like to emphasize the benefit of this API for "in-the-box" types.

By allowing maintainers to implement TryGetXXX() methods without having to update System.Text.Json.JsonElement, and yet maintaining close-to-optimal performance, support for new types (like the Date or Time-onlies) could be provided out-of-band with framework releases, and without polluting the (otherwise ever-growing) JsonElement native API.

If there was a strong reason to then roll those into the JsonElement API itself, that could be done with the next major version, and the Extension method deprecated, and implemented over the new API for backwards compatibility.

@mwadams
Copy link
Author

mwadams commented Sep 16, 2023

I thought it might be useful to see the practical benefit of this.

We have an internal implementation of this API, along these lines; it offers a good tradeoff with allocation (approx zero in steady state) and performance.

    /// <summary>
    /// Process raw JSON text.
    /// </summary>
    /// <typeparam name="TState">The type of the state for the processor.</typeparam>
    /// <typeparam name="TResult">The type of the result of processing.</typeparam>
    /// <param name="element">The json element to process.</param>
    /// <param name="state">The state passed to the processor.</param>
    /// <param name="callback">The processing callback.</param>
    /// <param name="result">The result of processing.</param>
    /// <returns><c>True</c> if the processing succeeded, otherwise false.</returns>
    public static bool ProcessRawText<TState, TResult>(
        this JsonElement element,
        in TState state,
        in Utf8Parser<TState, TResult> callback,
        [NotNullWhen(true)] out TResult? result)
    {
        if (UseReflection)
        {
            return callback(GetRawValue(element).Span, state, out result);
        }
        else
        {
            PooledWriter? writerPair = null;
            try
            {
                writerPair = WriterPool.Get();
                (Utf8JsonWriter w, ArrayPoolBufferWriter<byte> writer) = writerPair.Get();
                element.WriteTo(w);
                w.Flush();
                return callback(writer.WrittenSpan[1..^1], state, out result);
            }
            finally
            {
                if (writerPair is not null)
                {
                    WriterPool.Return(writerPair);
                }
            }
        }
    }

I have added the "UseReflection" path for benchmarking. If we choose that, it emits some code to get the underlying memory from the JsonDocument, as if this was part of the internals of the TryGetValue() method proposed here.

    private static GetRawValueDelegate BuildGetRawValue()
    {
        Type returnType = typeof(ReadOnlyMemory<byte>);
        Type[] parameterTypes = new[] { typeof(JsonElement) };

        Type jsonElementType = typeof(JsonElement);
        Type jsonDocumentType = typeof(JsonDocument);
        FieldInfo parentField = jsonElementType.GetField("_parent", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new InvalidOperationException("Unable to find JsonElement._parent field");
        FieldInfo idxField = jsonElementType.GetField("_idx", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new InvalidOperationException("Unable to find JsonElement._idx field");
        MethodInfo getRawValueMethod = jsonDocumentType.GetMethod("GetRawValue", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new InvalidOperationException("Unable to find JsonDocument.GetRawValue method");

        var getRawText = new DynamicMethod("GetRawValue", returnType, parameterTypes, typeof(LowAllocJsonUtils).Module, true);

        ILGenerator il = getRawText.GetILGenerator(256);
        LocalBuilder idx = il.DeclareLocal(typeof(int));
        il.Emit(OpCodes.Ldarg_0); // Get the JSON Element argument
        il.Emit(OpCodes.Ldfld, idxField);
        il.Emit(OpCodes.Stloc_0, idx); // Store the index in a local variable
        il.Emit(OpCodes.Ldarg_0); // Get the JSON Element argument again
        il.Emit(OpCodes.Ldfld, parentField); // Put the parent on the stack
        il.Emit(OpCodes.Ldloc_0, idx); // Put the index on the stack from the local variable
        il.Emit(OpCodes.Ldc_I4_0); // Put "false" onto the stack
        il.Emit(OpCodes.Callvirt, getRawValueMethod);
        il.Emit(OpCodes.Ret);

        return getRawText.CreateDelegate<GetRawValueDelegate>();
    }

As you can see, it simply gets the gets the underlying raw value from the parent JsonDocument.

The benchmarks show a 20-30% benefit.

image
image
image

(As a reminder - this is an underlying means of accessing the raw text - we optionally decode and/or translate to RoS<char> further up the implementation; these benchmarks include that decoding and transcoding - they are performing json-schema document validation.)

@mwadams
Copy link
Author

mwadams commented Sep 16, 2023

Those benchmarks were on net7.0. It's well worth comparing with net8.0 (this is the last preview, not RC1)
We get huge perf improvements from the new runtime - I'm guessing that's a lot of vectorization improvements, and I also notice that our tiny bit of allocation that we need to do in the "everything is fine" case has got smaller, too!

That's pretty impressive for "no work" - and indeed is as much benefit as this proposed change would produce... except as you can see we get it again even on net8.0!

image
image
image

@mwadams
Copy link
Author

mwadams commented Sep 16, 2023

I should say that the Corvus.JsonSchema library is ours. JsonEverything is @gregsdennis excellent cornucopia of json goodness, also built over System.Text.Json.

With Corvus we are trying to make JsonSchema validation a "just do it and you don't notice" option for API implementers, with the added bonus of an object model over the document "for free" rather than having to hand build a separate set of POCOs for serialisation.

A validation pass for a "normal" sized document is a couple of microseconds and zero allocations.

For a 1.5Mb document it is a few 10s of milliseconds and a few 10s of bytes of allocation.

We are a bit obsessive about shaving as much as we can off that!

@mwadams
Copy link
Author

mwadams commented Dec 6, 2023

Blog post about net8 performance wins and potential impact of this change.

https://endjin.com/blog/2023/12/how-dotnet-8-boosted-json-schema-performance-by-20-percent-for-free

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json
Projects
None yet
Development

No branches or pull requests

3 participants