[API Proposal]: Custom [Try]GetValue<T> methods on JsonElement for efficient projection of string values to dotnet types #74028
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
API Proposal
namespace System.Text.Json;
public readonly struct JsonElement
{
/// <summary>
/// Attempts to represent the current JSON string as the given type.
/// </summary>
/// <typeparam name="TState">The type of the parser state.</typeparam>
/// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
/// <param name="parser">A delegate to the method that parses the JSON string.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="value">Receives the value.</param>
/// <remarks>
/// This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
/// <see langword="true"/> if the string can be represented as the given type,
/// <see langword="false"/> otherwise.
/// </returns>
/// <exception cref="InvalidOperationException">
/// This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
/// </exception>
/// <exception cref="ObjectDisposedException">
/// The parent <see cref="JsonDocument"/> has been disposed.
/// </exception>
public bool TryGetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state, [NotNullWhen(true)] out TResult? value)
{
CheckValidInstance();
return _parent.TryGetValue(_idx, parser, state, decode: true, out value);
}
/// <summary>
/// Attempts to represent the current JSON string as the given type.
/// </summary>
/// <typeparam name="TState">The type of the parser state.</typeparam>
/// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
/// <param name="parser">A delegate to the method that parses the JSON string.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="decode">Indicates whether the UTF8 JSON string should be decoded.</param>
/// <param name="value">Receives the value.</param>
/// <remarks>
/// This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
/// <see langword="true"/> if the string can be represented as the given type,
/// <see langword="false"/> otherwise.
/// </returns>
/// <exception cref="InvalidOperationException">
/// This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
/// </exception>
/// <exception cref="ObjectDisposedException">
/// The parent <see cref="JsonDocument"/> has been disposed.
/// </exception>
public bool TryGetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state, bool decode, [NotNullWhen(true)] out TResult? value)
{
CheckValidInstance();
return _parent.TryGetValue(_idx, parser, state, decode, out value);
}
/// <summary>
/// Attempts to represent the current JSON string as the given type.
/// </summary>
/// <typeparam name="TState">The type of the parser state.</typeparam>
/// <typeparam name="TResult">The type with which to represent the JSON string.</typeparam>
/// <param name="parser">A delegate to the method that parses the JSON string.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="value">Receives the value.</param>
/// <remarks>
/// This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
/// <see langword="true"/> if the string can be represented as the given type,
/// <see langword="false"/> otherwise.
/// </returns>
/// <exception cref="InvalidOperationException">
/// This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
/// </exception>
/// <exception cref="ObjectDisposedException">
/// The parent <see cref="JsonDocument"/> has been disposed.
/// </exception>
public bool TryGetValue<TState, TResult>(Parser<TState, TResult> parser, TState state, [NotNullWhen(true)] out TResult? value)
{
CheckValidInstance();
return _parent.TryGetValue(_idx, parser, state, out value);
}
/// <summary>
/// Gets the value of the element as the given type.
/// </summary>
/// <typeparam name="TState">The type of the parser state.</typeparam>
/// <typeparam name="TResult">The type of the result.</typeparam>
/// <param name="parser">The parser for the result.</param>
/// <param name="state">The state for the parser.</param>
/// <remarks>
/// This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>The value of the element as the given type.</returns>
/// <exception cref="InvalidOperationException">
/// This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
/// </exception>
/// <exception cref="FormatException">
/// The value cannot be represented as the given type.
/// </exception>
/// <exception cref="ObjectDisposedException">
/// The parent <see cref="JsonDocument"/> has been disposed.
/// </exception>
public TResult GetValue<TState, TResult>(Utf8Parser<TState, TResult> parser, TState state)
{
if (!TryGetValue(parser, state, out TResult? value))
{
ThrowHelper.ThrowFormatException();
}
return value;
}
/// <summary>
/// Gets the value of the element as the given type.
/// </summary>
/// <typeparam name="TState">The type of the parser state.</typeparam>
/// <typeparam name="TResult">The type of the result.</typeparam>
/// <param name="parser">The parser for the result.</param>
/// <param name="state">The state for the parser.</param>
/// <remarks>
/// This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>The value of the element as the given type.</returns>
/// <exception cref="InvalidOperationException">
/// This value's <see cref="ValueKind"/> is not <see cref="JsonValueKind.String"/>.
/// </exception>
/// <exception cref="FormatException">
/// The value cannot be represented as the given type.
/// </exception>
/// <exception cref="ObjectDisposedException">
/// The parent <see cref="JsonDocument"/> has been disposed.
/// </exception>
public TResult GetValue<TState, TResult>(Parser<TState, TResult> parser, TState state)
{
if (!TryGetValue(parser, state, out TResult? value))
{
ThrowHelper.ThrowFormatException();
}
return value;
}
}
/// <summary>
/// A delegate to a method that attempts to represent a JSON string as a given type.
/// </summary>
/// <typeparam name="TState">The type of the state for the parser.</typeparam>
/// <typeparam name="TResult">The type of the resulting value.</typeparam>
/// <param name="span">The UTF8-encoded JSON string. This may be encoded or decoded depending on context.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="value">The resulting value.</param>
/// <remarks>
/// This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
/// <see langword="true"/> if the string can be represented as the given type,
/// <see langword="false"/> otherwise.
/// </returns>
public delegate bool Utf8Parser<TState, TResult>(ReadOnlySpan<byte> span, TState state, [NotNullWhen(true)] out TResult? value);
/// <summary>
/// A delegate to a method that attempts to represent a JSON string as a given type.
/// </summary>
/// <typeparam name="TState">The type of the state for the parser.</typeparam>
/// <typeparam name="TResult">The type of the resulting value.</typeparam>
/// <param name="span">The JSON string. This will always be in its decoded form.</param>
/// <param name="state">The state for the parser.</param>
/// <param name="value">The resulting value.</param>
/// <remarks>
/// This method does not create a representation of values other than JSON strings.
/// </remarks>
/// <returns>
/// <see langword="true"/> if the string can be represented as the given type,
/// <see langword="false"/> otherwise.
/// </returns>
public delegate bool Parser<TState, TResult>(ReadOnlySpan<char> span, TState state, [NotNullWhen(true)] out TResult? value);
API Usage
using System;
using System.Text.Json;
using JsonDocument document = JsonDocument.Parse("\"Hello, world\"");
JsonElement element = document.RootElement;
// The Utf8Parser overload with decode: true receives the decoded UTF-8 bytes.
if (element.TryGetValue(ParseLength, state: default(object?), decode: true, out int byteLength))
{
Console.WriteLine($"Length = {byteLength}");
}
else
{
Console.WriteLine("Unable to parse the value.");
}
// The Parser overload always receives the decoded string as UTF-16 chars.
if (element.TryGetValue(ParseLengthAsDecodedChars, state: new ParseConfiguration(true), out int charLength))
{
Console.WriteLine($"Length = {charLength}");
}
else
{
Console.WriteLine("Unable to parse the value.");
}
static bool ParseLength(ReadOnlySpan<byte> input, object? state, out int result)
{
result = input.Length; // Or whatever processing you require
return true;
}
static bool ParseLengthAsDecodedChars(ReadOnlySpan<char> input, ParseConfiguration state, out int result)
{
result = state.SomeSwitch ? input.Length : input.Length + 1; // Or whatever processing you require
return true;
}
internal readonly record struct ParseConfiguration(bool SomeSwitch);
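To make the decode flag more concrete: the bytes between the quotes of a JSON string can still contain escape sequences, and decoding (unescaping) them changes both the content and the length. The sketch below shows the difference using APIs that exist today on Utf8JsonReader (ValueSpan, plus ValueIsEscaped and CopyString from .NET 7 onwards). Utf8ParserSketch and TryProjectString are hypothetical stand-ins for the proposed Utf8Parser delegate and TryGetValue overload. They read a standalone JSON string from a UTF-8 buffer rather than from a JsonElement, and they rent the unescape buffer from ArrayPool<byte>. This is an illustration, not the proposal's implementation.

```csharp
using System;
using System.Buffers;
using System.Text.Json;

// Hypothetical stand-in for the proposed Utf8Parser<TState, TResult> delegate.
public delegate bool Utf8ParserSketch<TState, TResult>(ReadOnlySpan<byte> span, TState state, out TResult value);

public static class Utf8StringProjection
{
    // Hypothetical stand-in for TryGetValue(parser, state, decode, out value),
    // reading a standalone JSON string token instead of a JsonElement.
    public static bool TryProjectString<TState, TResult>(
        ReadOnlySpan<byte> utf8Json,
        Utf8ParserSketch<TState, TResult> parser,
        TState state,
        bool decode,
        out TResult value)
    {
        var reader = new Utf8JsonReader(utf8Json);
        if (!reader.Read() || reader.TokenType != JsonTokenType.String)
        {
            throw new InvalidOperationException("The JSON value is not a string.");
        }

        if (!decode || !reader.ValueIsEscaped)
        {
            // No unescaping requested (or none needed): pass the bytes between the quotes as-is.
            return parser(reader.ValueSpan, state, out value);
        }

        // Unescaping never makes the value longer, so ValueSpan.Length bytes is always enough.
        byte[] rented = ArrayPool<byte>.Shared.Rent(reader.ValueSpan.Length);
        try
        {
            int written = reader.CopyString(rented);
            return parser(rented.AsSpan(0, written), state, out value);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

Given the JSON text "caf\u00e9" with the escape sequence left in, a parser that returns span.Length sees 9 bytes with decode: false and 5 bytes (the UTF-8 encoding of "café") with decode: true.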
|
One usage observation: developers may wish to implement extension methods that hide the underlying parsers from their ultimate clients.
public static bool TryGetCustomValue(this JsonElement element, out CustomValue value) { /* ... */ }
This would give these custom value converters the same signature for end users as the built-in APIs. |
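Here is a minimal sketch of that idea. It assumes a hypothetical Rgb value stored in JSON as a "#RRGGBB" string. The proposed TryGetValue does not exist yet, so the body feeds the parser from element.GetString(), which allocates. Under the proposal, that line would instead pass the same static ParseRgb method to element.TryGetValue, with no intermediate string. Either way, the public surface, TryGetRgb(out Rgb), has the same shape as built-in methods such as TryGetGuid(out Guid).

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text.Json;

// Hypothetical custom value type, used only for this illustration.
public readonly record struct Rgb(byte R, byte G, byte B);

public static class RgbJsonElementExtensions
{
    // Same shape as the built-in TryGetGuid / TryGetDateTime methods.
    public static bool TryGetRgb(this JsonElement element, out Rgb value)
    {
        if (element.ValueKind != JsonValueKind.String)
        {
            throw new InvalidOperationException("The element is not a JSON string.");
        }

        // Stand-in for the proposed API: GetString() allocates. Under the proposal this would be
        // element.TryGetValue(ParseRgb, default(object?), out value), with no intermediate string.
        return ParseRgb(element.GetString().AsSpan(), null, out value);
    }

    // Parser-shaped method: (ReadOnlySpan<char> span, TState state, out TResult value).
    private static bool ParseRgb(ReadOnlySpan<char> span, object? state, out Rgb value)
    {
        value = default;
        if (span.Length != 7 || span[0] != '#')
        {
            return false;
        }

        // AllowHexSpecifier alone rejects whitespace and signs inside each two-digit component.
        if (byte.TryParse(span.Slice(1, 2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out byte r) &&
            byte.TryParse(span.Slice(3, 2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out byte g) &&
            byte.TryParse(span.Slice(5, 2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out byte b))
        {
            value = new Rgb(r, g, b);
            return true;
        }

        return false;
    }
}
```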
For a quantifiable benefit: if implemented, these APIs would enable me to eliminate allocations in three hot-paths in my JsonSchema validation. In one of our benchmarks, we validate an array of 10,000 "Person" objects. The Person has a FirstName, LastName, and a DateOfBirth. The DateOfBirth is a JSON string in The name elements are constrained to a Validation causes 40,000 string allocations at a cost of ~3MB (which are otherwise unused because we write the values directly into the output stream - a very common use case).
With this API this reduces to zero. |
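As an example of the flexibility of this API, here's a projection that would allow us to count the runes (Unicode scalar values) in a JSON string:
using System;
using System.Text;
using System.Text.Json;
public static class JsonElementExtensions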
{
public static int CountRunes(this JsonElement element)
{
if (element.TryGetValue(RuneCounter, default(object?), out int result))
{
return result;
}
throw new InvalidOperationException();
}
private static bool RuneCounter(ReadOnlySpan<char> input, object? state, out int result)
{
int runeCount = 0;
SpanRuneEnumerator enumerator = input.EnumerateRunes();
while (enumerator.MoveNext())
{
runeCount++;
}
result = runeCount;
return true;
}
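}
In our actual implementation we would create a whole 'string validator' type that would also do e.g. pattern matching using the new .NET 7 span-based Regex APIs:
using System;
using System.Text;
using System.Text.Json;
using System.Text.RegularExpressions;
public static class JsonElementExtensions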
{
public static ValidationContext ValidateString(this JsonElement element, in ValidationContext validationContext)
{
if (element.TryGetValue(StringValidator, validationContext, out ValidationContext result))
{
return result;
}
throw new InvalidOperationException();
}
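private static readonly Regex SomePattern = new Regex("H(.*)o", RegexOptions.Compiled, TimeSpan.FromSeconds(3));
// The 'in' state assumes the 'in TState' variation from Alternative Designs; the proposed Parser delegate takes TState by value.
private static bool StringValidator(ReadOnlySpan<char> input, in ValidationContext context, out ValidationContext result)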
{
// Emitted if minLength or maxLength
int runeCount = 0;
SpanRuneEnumerator enumerator = input.EnumerateRunes();
while (enumerator.MoveNext())
{
runeCount++;
}
result = context.WithResult(runeCount < 10);
// Emitted if has pattern match validation Net7 Regex
result = result.WithResult(SomePattern.IsMatch(input));
return true;
}
} |
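StringValidator above takes its state as an in parameter. As written, it would therefore not convert to the proposed Parser<TState, TResult> delegate, which takes TState by value; method group conversion requires matching parameter modifiers. It depends on the in variation listed under Alternative Designs. Below is a sketch of that variation, using hypothetical names (InStateParser, LengthLimits, InStateExample) that are not part of System.Text.Json.

```csharp
using System;

// Hypothetical variation of the proposed Parser<TState, TResult> delegate,
// with the state passed by readonly reference ('in') instead of by value.
public delegate bool InStateParser<TState, TResult>(ReadOnlySpan<char> span, in TState state, out TResult value);

// Hypothetical state type: a readonly struct, so 'in' avoids copying it on each call.
public readonly record struct LengthLimits(int MinLength, int MaxLength);

public static class InStateExample
{
    // Stand-in for the proposed TryGetValue overload: forwards 'state' by reference.
    public static bool TryParse<TState, TResult>(
        ReadOnlySpan<char> span, InStateParser<TState, TResult> parser, in TState state, out TResult value)
        => parser(span, in state, out value);

    // Signature (including 'in') matches InStateParser<LengthLimits, bool>, so the method group converts.
    public static bool WithinLimits(ReadOnlySpan<char> span, in LengthLimits limits, out bool isValid)
    {
        // Counts UTF-16 code units for brevity; a JSON Schema validator would count runes as above.
        isValid = span.Length >= limits.MinLength && span.Length <= limits.MaxLength;
        return true;
    }
}
```

The saving grows with the size of the state struct. For a two-int struct like this one it is negligible, but a larger ValidationContext-style struct would be passed by reference rather than copied.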
I think I'd like to emphasize the benefit of this API for "in-the-box" types. By allowing maintainers to implement TryGetXXX() methods without having to update If there was a strong reason to then roll those into the |
I thought it might be useful to see the practical benefit of this. We have an internal implementation of this API, along these lines; it offers a good tradeoff between allocation (approximately zero in steady state) and performance.
/// <summary>
/// Process raw JSON text.
/// </summary>
/// <typeparam name="TState">The type of the state for the processor.</typeparam>
/// <typeparam name="TResult">The type of the result of processing.</typeparam>
/// <param name="element">The json element to process.</param>
/// <param name="state">The state passed to the processor.</param>
/// <param name="callback">The processing callback.</param>
/// <param name="result">The result of processing.</param>
/// <returns><c>True</c> if the processing succeeded, otherwise false.</returns>
public static bool ProcessRawText<TState, TResult>(
this JsonElement element,
in TState state,
in Utf8Parser<TState, TResult> callback,
[NotNullWhen(true)] out TResult? result)
{
if (UseReflection)
{
return callback(GetRawValue(element).Span, state, out result);
}
else
{
PooledWriter? writerPair = null;
try
{
writerPair = WriterPool.Get();
(Utf8JsonWriter w, ArrayPoolBufferWriter<byte> writer) = writerPair.Get();
element.WriteTo(w);
w.Flush();
return callback(writer.WrittenSpan[1..^1], state, out result);
}
finally
{
if (writerPair is not null)
{
WriterPool.Return(writerPair);
}
}
}
}
I have added the "UseReflection" path for benchmarking. If we choose that path, it emits some code to get the underlying memory from the JsonDocument directly, as if this were part of System.Text.Json itself:
private static GetRawValueDelegate BuildGetRawValue()
{
Type returnType = typeof(ReadOnlyMemory<byte>);
Type[] parameterTypes = new[] { typeof(JsonElement) };
Type jsonElementType = typeof(JsonElement);
Type jsonDocumentType = typeof(JsonDocument);
FieldInfo parentField = jsonElementType.GetField("_parent", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new InvalidOperationException("Unable to find JsonElement._parent field");
FieldInfo idxField = jsonElementType.GetField("_idx", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new InvalidOperationException("Unable to find JsonElement._idx field");
MethodInfo getRawValueMethod = jsonDocumentType.GetMethod("GetRawValue", BindingFlags.Instance | BindingFlags.NonPublic) ?? throw new InvalidOperationException("Unable to find JsonDocument.GetRawValue method");
var getRawText = new DynamicMethod("GetRawValue", returnType, parameterTypes, typeof(LowAllocJsonUtils).Module, true);
ILGenerator il = getRawText.GetILGenerator(256);
LocalBuilder idx = il.DeclareLocal(typeof(int));
il.Emit(OpCodes.Ldarg_0); // Get the JSON Element argument
il.Emit(OpCodes.Ldfld, idxField);
il.Emit(OpCodes.Stloc, idx); // Store the index in a local variable
il.Emit(OpCodes.Ldarg_0); // Get the JSON Element argument again
il.Emit(OpCodes.Ldfld, parentField); // Put the parent on the stack
il.Emit(OpCodes.Ldloc, idx); // Put the index on the stack from the local variable
il.Emit(OpCodes.Ldc_I4_0); // Put "false" onto the stack
il.Emit(OpCodes.Callvirt, getRawValueMethod);
il.Emit(OpCodes.Ret);
return getRawText.CreateDelegate<GetRawValueDelegate>();
}
As you can see, it simply gets the underlying raw value from the parent JsonDocument. The benchmarks show a 20-30% benefit. (As a reminder - this is an underlying means of accessing the raw text - we optionally decode and/or translate to |
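One detail to keep in mind about the non-reflection path in ProcessRawText: JsonElement.WriteTo writes the string value through the Utf8JsonWriter, which applies its own encoder. The bytes between the quotes therefore reflect that writer's escaping rather than the original document text. With the default encoder, non-ASCII characters are written as \uXXXX escapes. The sketch below writes the same element with the default encoder and with JavaScriptEncoder.UnsafeRelaxedJsonEscaping. WriteToRoundTrip is a hypothetical name, and it returns a string only to make the result easy to inspect. The comment above does not show which encoder the pooled writer in ProcessRawText uses.

```csharp
#nullable enable
using System;
using System.Buffers;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

public static class WriteToRoundTrip
{
    // Mirrors the WriteTo fallback in ProcessRawText: serialize the element, then slice off the quotes.
    // Returns a string purely for inspection; the real callback receives the byte span.
    public static string WrittenStringContent(JsonElement element, JavaScriptEncoder? encoder = null)
    {
        var buffer = new ArrayBufferWriter<byte>();
        using (var writer = new Utf8JsonWriter(buffer, new JsonWriterOptions { Encoder = encoder }))
        {
            element.WriteTo(writer);
        } // Disposing the writer flushes it into the buffer.

        return Encoding.UTF8.GetString(buffer.WrittenSpan[1..^1]);
    }
}
```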
Those benchmarks were on net7.0. It's well worth comparing with net8.0 (this is the last preview, not RC1). That's pretty impressive for "no work" - and indeed is as much benefit as this proposed change would produce... except as you can see we get it again even on net8.0! |
I should say that the Corvus.JsonSchema library is ours. JsonEverything is @gregsdennis's excellent cornucopia of JSON goodness, also built over System.Text.Json. With Corvus we are trying to make JsonSchema validation a "just do it and you don't notice" option for API implementers, with the added bonus of an object model over the document "for free" rather than having to hand-build a separate set of POCOs for serialisation. A validation pass for a "normal" sized document takes a couple of microseconds and makes zero allocations. For a 1.5 MB document it takes a few tens of milliseconds and a few tens of bytes of allocation. We are a bit obsessive about shaving as much as we can off that! |
Blog post about net8 performance wins and potential impact of this change. https://endjin.com/blog/2023/12/how-dotnet-8-boosted-json-schema-performance-by-20-percent-for-free |
Background and motivation
There have been a number of discussions about possible APIs for efficient access to the underlying UTF8 bytes for JSON string values represented by a JsonElement. So far, there has been no consensus on a suitable approach.
In general, the criticism of previous proposals has been either that:
… Utf8String), or
…
This API proposal focuses instead on a specific use-case - projecting from a JSON string value to a user-defined type, without an intermediate string allocation.
The basic shape of the API should be familiar. JsonElement already has a number of such methods - projecting to DateTime, DateTimeOffset and Guid, for example, with the GetXXX() and TryGetXXX() APIs.
These proposed APIs offer developers the ability to project to user-defined types, without requiring an intermediate conversion to string. They will handle decoding and transcoding (e.g. to a decoded ReadOnlySpan<byte> or ReadOnlySpan<char>), managing the memory on behalf of the developer.
Importantly, they do not leak any information about the underlying backing for the JSON string value.
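The transcoding step described above can be sketched as follows, assuming the UTF-8 has already been unescaped. Short values are decoded into a stack buffer and longer ones into an array rented from ArrayPool<char>, so the parser only ever sees a ReadOnlySpan<char>. CharParserSketch and Utf8Transcoding are hypothetical names, and the 256-char stackalloc threshold is an arbitrary choice for illustration.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical stand-in for the proposed Parser<TState, TResult> delegate.
public delegate bool CharParserSketch<TState, TResult>(ReadOnlySpan<char> span, TState state, out TResult value);

public static class Utf8Transcoding
{
    private const int StackallocCharThreshold = 256;

    // Transcodes already-unescaped UTF-8 to UTF-16 in a temporary buffer and hands it to the parser.
    public static bool TranscodeAndParse<TState, TResult>(
        ReadOnlySpan<byte> unescapedUtf8, CharParserSketch<TState, TResult> parser, TState state, out TResult value)
    {
        // Upper bound on the number of chars the UTF-8 input can decode to.
        int maxChars = Encoding.UTF8.GetMaxCharCount(unescapedUtf8.Length);
        char[]? rented = null;
        Span<char> buffer = maxChars <= StackallocCharThreshold
            ? stackalloc char[StackallocCharThreshold]
            : (rented = ArrayPool<char>.Shared.Rent(maxChars));
        try
        {
            int written = Encoding.UTF8.GetChars(unescapedUtf8, buffer);
            return parser(buffer.Slice(0, written), state, out value);
        }
        finally
        {
            // Only the pooled array needs returning; the stack buffer goes away with the frame.
            if (rented is not null)
            {
                ArrayPool<char>.Shared.Return(rented);
            }
        }
    }
}
```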
Common use cases might include:
They use a similar delegate-and-state pattern to String.Create() to encourage efficient usage, and to avoid accidental allocation of closure-capturing objects.
API Proposal
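For reference, this is the String.Create() delegate-and-state pattern the proposal borrows. The state is handed to the callback explicitly, so the callback can be a static lambda that captures nothing. The Repeat helpers are hypothetical; string.Create<TState> and SpanAction<char, TState> are the real APIs.

```csharp
using System;

public static class DelegateAndState
{
    // State travels through string.Create's 'state' argument, so the lambda can be 'static':
    // it captures nothing, and the compiler can cache a single delegate instance.
    public static string Repeat(char c, int count) =>
        string.Create(count, c, static (Span<char> destination, char fill) => destination.Fill(fill));

    // For contrast: this lambda captures 'c', so each call allocates a closure object
    // and a new delegate in addition to the string itself.
    public static string RepeatWithClosure(char c, int count) =>
        string.Create(count, 0, (Span<char> destination, int _) => destination.Fill(c));
}
```

The proposed TryGetValue overloads have the same shape (a parser delegate plus a TState argument), so a static method or static lambda can be passed without allocating a closure.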
Rough implementation of the additional internal methods on JsonDocument can be found here:
https://github.com/mwadams/runtime/blob/55372e6b8e93a123513cfd6e6415cfd80c0630a4/src/libraries/System.Text.Json/src/System/Text/Json/Document/JsonDocument.cs#L660
API Usage
Alternative Designs
… TState value (e.g. defaulting to a null value for an object?)
… string.Create(), although this is perhaps a minor consideration
… TState to be an in parameter, which would minimise copying in the case that your state was a readonly struct. Personally, I would favour this variation.
Risks
This should have no impact on the existing API surface area.
It should be possible to implement largely using the existing methods in JsonDocument and JsonReaderHelper.
There are comments in JsonDocument suggesting that there is a plan to move to common UTF8 encoding/decoding code; this would increase the surface area of methods in JsonDocument that would need to switch to using those common APIs if/when that change occurs.