Implement JSON serialization/deserialization via Utf8JsonReader/Utf8JsonWriter #30604

roji · 2023-04-01T12:01:39Z

We currently do JSON serialization/deserialization by passing through DOM. That is, when deserializing, we get a JsonDocument from S.T.Json, and then construct entity CLR types from that (serialization works similarly in the other direction). Instead, we could simply parse the JSON via Utf8JsonReader (which is what JsonSerializer uses under the hood), and directly materialize entity CLR types from that. This should substantially reduce CPU time and heap allocations.

Note that we could use code generation to compile the JSON materializer, just like we do today for regular materialization from DbDataReader. In fact, Utf8JsonReader and DbDataReader are similar: both offer the same kind of low-level, forward-only access to primitive int/string values.
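To make the idea concrete, here is a minimal hand-written sketch of what direct materialization from Utf8JsonReader could look like. The `Address` type and `AddressMaterializer` class are hypothetical and exist only for illustration; this is not EF's actual generated materializer, just the general shape of reading tokens straight into a CLR object without a JsonDocument in between.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical entity type, used only for illustration.
public sealed class Address
{
    public string? City { get; set; }
    public int ZipCode { get; set; }
}

public static class AddressMaterializer
{
    // Reads a single JSON object token by token and fills an Address directly,
    // without building a JsonDocument first.
    public static Address Read(ReadOnlySpan<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        if (!reader.Read() || reader.TokenType != JsonTokenType.StartObject)
        {
            throw new JsonException("Expected the start of a JSON object.");
        }

        var address = new Address();
        while (reader.Read() && reader.TokenType != JsonTokenType.EndObject)
        {
            // Inside an object, each iteration starts on a property name.
            if (reader.ValueTextEquals("City"))
            {
                reader.Read(); // move to the value
                address.City = reader.GetString();
            }
            else if (reader.ValueTextEquals("ZipCode"))
            {
                reader.Read();
                address.ZipCode = reader.GetInt32();
            }
            else
            {
                // Unknown property: skip its value, including nested objects/arrays.
                reader.Skip();
            }
        }

        return address;
    }
}
```

Calling `AddressMaterializer.Read` with the UTF8 bytes of `{"City":"Oslo","Extra":{"x":[1,2]},"ZipCode":150}` would populate `City` and `ZipCode` and skip the unmapped `Extra` property.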

- This technique wouldn't support JsonSerializerOptions, but our current technique is also very limited in that respect. When using EF, EF is the one managing the serialization details, e.g. assigning names to properties (and even naming conventions), value converters (instead of S.T.Json JsonConverters), etc. We should still see whether it makes sense to accept and respect certain JsonSerializerOptions settings.
- By default, we call DbDataReader.GetString() to get the JSON string to be deserialized; that's UTF16 rather than UTF8. It's trivial to transcode and should add very little overhead (compared to the extra DOM processing we do today, in any case). In cases where we can get UTF8 directly from the database (Npgsql, SQL Server with UTF8 columns), we can skip this step. Note that with Npgsql we can even get a raw binary Stream out of DbDataReader, and pass that to Utf8JsonReader, for maximum perf.
- It should be possible to support unmapped JSON properties by writing them to some structure (possibly a JsonDocument DOM representation) which gets stored in the change tracker, in the entry. In the update pipeline, these unmapped properties would be written back to the database.

Note: benchmark to see the perf differences

AndriySvyryd · 2023-04-22T01:58:11Z

Consider caching the JSON property names as JsonEncodedText in an annotation on the property.
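A rough sketch of why pre-encoded names help: JsonEncodedText.Encode does the escaping and UTF8 encoding once, and the result can be reused for every write and, with a caveat, for name comparison while reading. The `BlogJson` class and its static fields are hypothetical stand-ins for whatever annotation-based cache EF would actually use.

```csharp
#nullable enable
using System;
using System.IO;
using System.Text.Json;

public static class BlogJson
{
    // Encoded once and reused, instead of re-encoding the name on every write.
    // These static fields stand in for a per-property annotation cache.
    private static readonly JsonEncodedText TitleName = JsonEncodedText.Encode("Title");
    private static readonly JsonEncodedText RatingName = JsonEncodedText.Encode("Rating");

    public static byte[] Write(string title, int rating)
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            writer.WriteStartObject();
            writer.WriteString(TitleName, title);
            writer.WriteNumber(RatingName, rating);
            writer.WriteEndObject();
        } // disposing the writer flushes it to the stream

        return stream.ToArray();
    }

    public static (string? Title, int Rating) Read(ReadOnlySpan<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        if (!reader.Read() || reader.TokenType != JsonTokenType.StartObject)
        {
            throw new JsonException("Expected the start of a JSON object.");
        }

        string? title = null;
        var rating = 0;
        while (reader.Read() && reader.TokenType != JsonTokenType.EndObject)
        {
            // Caveat: ValueTextEquals compares against the unescaped token text,
            // so using EncodedUtf8Bytes here is only valid for names that contain
            // no characters requiring JSON escaping (true for "Title" and "Rating").
            if (reader.ValueTextEquals(TitleName.EncodedUtf8Bytes))
            {
                reader.Read();
                title = reader.GetString();
            }
            else if (reader.ValueTextEquals(RatingName.EncodedUtf8Bytes))
            {
                reader.Read();
                rating = reader.GetInt32();
            }
            else
            {
                reader.Skip();
            }
        }

        return (title, rating);
    }
}
```

`BlogJson.Write("Hello", 5)` produces the UTF8 bytes of `{"Title":"Hello","Rating":5}`, and `BlogJson.Read` on those bytes returns the same two values.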

roji · 2023-04-22T07:47:15Z

@maumar on the UTF16/UTF8 question (2nd point above)... Utf8JsonReader requires a UTF8 string, represented as a ReadOnlySpan<byte>, but strings for databases are generally exposed as .NET string, which is UTF16.

I think this issue should do the simple/minimal thing, and just transcode the string you get from DbDataReader into a UTF8 byte array on the heap. This isn't the most efficient approach (although most probably still significantly more efficient than the current DOM implementation), but it would work everywhere. Later, as an optimization, we can skip that step for ADO.NET providers where getting a raw UTF8 byte array is supported (Npgsql allows this; SqlClient might, and if not we can request it).
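As a sketch of that minimal approach (the `JsonColumnReader` class and its method names are made up for illustration), the transcoding step is a single Encoding.UTF8.GetBytes call, after which the byte array can be handed to Utf8JsonReader:

```csharp
#nullable enable
using System;
using System.Data.Common;
using System.Text;
using System.Text.Json;

public static class JsonColumnReader
{
    // Reads the JSON column as a UTF16 .NET string, as any ADO.NET provider allows.
    public static int? ReadInt32(DbDataReader dataReader, int ordinal, string propertyName)
        => ReadInt32(dataReader.GetString(ordinal), propertyName);

    // Transcodes the UTF16 string into a UTF8 byte array on the heap, then scans
    // the top-level object for the given property and returns its integer value.
    public static int? ReadInt32(string json, string propertyName)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(json); // one extra allocation per JSON value
        var reader = new Utf8JsonReader(utf8);

        while (reader.Read())
        {
            // Property names of the top-level object are at depth 1.
            if (reader.TokenType == JsonTokenType.PropertyName
                && reader.CurrentDepth == 1
                && reader.ValueTextEquals(propertyName))
            {
                reader.Read(); // move to the value
                return reader.GetInt32();
            }
        }

        return null; // property not present
    }
}
```

A provider that can hand out UTF8 bytes directly would bypass the `Encoding.UTF8.GetBytes` call and pass its buffer to the reader as-is.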

…JsonReader/Utf8JsonWriter

Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly.

The tricky part is that entity materializers are built in a way that assumes we have random access to all the data we need. This is not the case here. We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic when an entity takes arguments in its constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, populate its properties and do navigation fixup. This requires reading all the JSON data and storing it in local variables; only when we are done reading do we instantiate the entity and populate all the properties with the data stored in those variables. This adds some allocations (specifically for navigations).

We also have to disable de-duplication logic: we can't always safely re-read the JSON string, and definitely can't start reading it from an arbitrary position, so now we have to add a JSON string for every aggregate projected, even if we already project its parent.

The serialization implementation (i.e. Utf8JsonWriter) is pretty straightforward.

Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying

The fix is to recognize and modify the shaper in the case of a tracking query, so that nav expansions are not skipped when the parent entity is found in the Change Tracker. This is necessary to fix alongside streaming, because now we throw an exception from the reader (unexpected token) if we don't process the entire stream correctly. Before, it would be silently ignored apart from the edge case described in the bug.

Fixes #30604
Fixes #30993
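The buffering described above can be sketched roughly as follows. The `Order` entity and `OrderMaterializer` class are hypothetical, and the real shaper is generated as an expression tree rather than written by hand; the point is only that every value is parked in a local until the whole object has been read, because a constructor argument such as `Id` may be the last property in the JSON.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical entity whose constructor needs a value that may appear last in the JSON.
public sealed class Order
{
    public Order(int id) => Id = id;
    public int Id { get; }
    public string? Customer { get; set; }
    public List<string> Tags { get; } = new();
}

public static class OrderMaterializer
{
    public static Order Read(ReadOnlySpan<byte> utf8Json)
    {
        var reader = new Utf8JsonReader(utf8Json);
        if (!reader.Read() || reader.TokenType != JsonTokenType.StartObject)
        {
            throw new JsonException("Expected the start of a JSON object.");
        }

        // Locals that buffer everything until the object has been fully read.
        int? id = null;
        string? customer = null;
        var tags = new List<string>();

        while (reader.Read() && reader.TokenType != JsonTokenType.EndObject)
        {
            if (reader.ValueTextEquals("Id"))
            {
                reader.Read();
                id = reader.GetInt32();
            }
            else if (reader.ValueTextEquals("Customer"))
            {
                reader.Read();
                customer = reader.GetString();
            }
            else if (reader.ValueTextEquals("Tags"))
            {
                if (!reader.Read() || reader.TokenType != JsonTokenType.StartArray)
                {
                    throw new JsonException("Expected a JSON array for 'Tags'.");
                }

                while (reader.Read() && reader.TokenType != JsonTokenType.EndArray)
                {
                    tags.Add(reader.GetString()!);
                }
            }
            else
            {
                reader.Skip(); // unmapped property
            }
        }

        if (id is null)
        {
            throw new JsonException("Missing required property 'Id'.");
        }

        // Only now can the constructor run; the buffered values are applied afterwards.
        var order = new Order(id.Value) { Customer = customer };
        order.Tags.AddRange(tags);
        return order;
    }
}
```

The intermediate `tags` list is an example of the extra allocation the commit message mentions for navigations: the collection has to live somewhere before the owning entity exists.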

roji added type-enhancement area-perf area-json labels Apr 1, 2023

roji mentioned this issue Apr 12, 2023

Add type mapping APIs to customize JSON value serialization/deserialization #30677

Closed

ajcvickers added the consider-for-current-release label Apr 13, 2023

ajcvickers added this to the Backlog milestone Apr 13, 2023

ajcvickers assigned maumar Apr 19, 2023

ajcvickers modified the milestones: Backlog, 8.0.0 Apr 19, 2023

AndriySvyryd mentioned this issue Apr 21, 2023

JSON: support for weakly typed json properties #28871

Open

roji mentioned this issue Apr 22, 2023

Allow the user to specify JsonWriterOptions/JsonReaderOptions globally #30744

Open

roji mentioned this issue Apr 26, 2023

Json: add support for collection of primitive types inside JSON columns #28688

Closed

AndriySvyryd mentioned this issue Jun 8, 2023

Support customized reading and writing of JSON values #31021

Merged

maumar mentioned this issue Jun 29, 2023

Query/Json: use Utf8JsonReader in materializer #31159

Closed

maumar mentioned this issue Jul 1, 2023

Fix to #30604 - Implement JSON serialization/deserialization via Utf8JsonReader/Utf8JsonWriter #31160

Merged

maumar added the closed-fixed label Jul 1, 2023

maumar closed this as completed in #31160 Jul 11, 2023

ajcvickers modified the milestones: 8.0.0, 8.0.0-preview7 Jul 20, 2023

ajcvickers removed the consider-for-current-release label Oct 11, 2023

ajcvickers modified the milestones: 8.0.0-preview7, 8.0.0 Nov 14, 2023
