Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement JSON serialization/deserialization via Utf8JsonReader/Utf8JsonWriter #30604

Closed
roji opened this issue Apr 1, 2023 · 2 comments · Fixed by #31160
Closed

Implement JSON serialization/deserialization via Utf8JsonReader/Utf8JsonWriter #30604

roji opened this issue Apr 1, 2023 · 2 comments · Fixed by #31160
Assignees
Labels
area-json area-perf closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@roji
Copy link
Member

roji commented Apr 1, 2023

We currently do JSON serialization/deserialization by passing through DOM. That is, when deserializing, we get a JsonDocument from S.T.Json, and then construct entity CLR types from that (serialization works similarly in the other direction). Instead, we could simply parse the JSON via Utf8JsonReader (which is what JsonSerializer uses under the hood), and directly materialize entity CLR types from that. This should substantially reduce CPU time and heap allocations.

Note that we could use code generation to compile the JSON materializer, just like we do today for regular materialization from DbDataReader. In fact, Utf8jsonreader and DbDataReader are similar, offering the same kind of low-level access to primitive int/string values.

  • This technique wouldn't support JsonSerializationOptions - but our current technique is also very limited. When using EF, EF is the one managing the "serialization ones", e.g. assigning names to properties (and even naming conventions), value converters (instead of S.T.Json JsonConverters), etc. Though we should still see if it makes sense to accept and respect certain JsonSerializationOptions settings.
  • By default, we DbDataReader.GetString() to get the JSON string to be serialized; that's UTF16 rather than UTF8. It's trivial to transcode and should add very little overhead (compared to the extra DOM processing we do today, in any case). In cases where we can just get UTF8 directly from the database (Npgsql, SQL Server with UTF8 columns), we can skip this step. Note that with Npgsql we can even get a raw binary Stream out of DbDataReader, and pass that to Utf8jsonreader, for maximum perf.
  • It should be possible to support unmapped JSON properties by writing them to some structure (possibly a JsonDocument DOM representation) which gets stored in the change tracker, in the entry. In the update pipeline, these unmapped would be written back to the database.

Note: benchmark to see the perf differences

@AndriySvyryd
Copy link
Member

Consider caching the JSON property names as JsonEncodedText in a annotation on the property

@roji
Copy link
Member Author

roji commented Apr 22, 2023

@maumar on the UTF16/UTF8 question (2nd point above)... Utf8JsonReader requires a UTF8 string, represented as a ReadOnlySpan<byte>, but strings for databases are generally exposed as .NET string, which is UTF16.

I think this issue should do the simple/minimal thing, and just transcode the string you get from DbDataReader into a UTF8 byte array on the heap. This isn't the most efficient thing (although most probably still significantly more efficient than the current DOM implementation), but it would work everywhere. Later, as an optimization we can skip that steps for ADO.NET providers where getting a raw UTF8 byte array is supported (Npgsql allows this; SqlClient might, if not we can raise the request).

maumar added a commit that referenced this issue Jul 1, 2023
…JsonReader/Utf8JsonWriter

Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here.
We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup.
This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations).

We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent.

Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward.

Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying

Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug.

Fixes #30604
Fixes #30993
@maumar maumar added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jul 1, 2023
maumar added a commit that referenced this issue Jul 1, 2023
…JsonReader/Utf8JsonWriter

Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here.
We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup.
This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations).

We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent.

Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward.

Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying

Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug.

Fixes #30604
Fixes #30993
maumar added a commit that referenced this issue Jul 6, 2023
…JsonReader/Utf8JsonWriter

Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here.
We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup.
This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations).

We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent.

Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward.

Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying

Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug.

Fixes #30604
Fixes #30993
maumar added a commit that referenced this issue Jul 7, 2023
…JsonReader/Utf8JsonWriter

Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here.
We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup.
This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations).

We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent.

Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward.

Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying

Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug.

Fixes #30604
Fixes #30993
maumar added a commit that referenced this issue Jul 10, 2023
…JsonReader/Utf8JsonWriter

Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here.
We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup.
This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations).

We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent.

Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward.

Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying

Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug.

Fixes #30604
Fixes #30993
maumar added a commit that referenced this issue Jul 11, 2023
…JsonReader/Utf8JsonWriter (#31160)

Using Utf8JsonReader to read JSON data rather than caching it using DOM. This should reduce allocations significantly. Tricky part is that entity materializers are build in a way that assumes we have random access to all the data we need. This is not the case here.
We read JSON data sequentially and can only do it once, and we don't know the order in which we get the data. This is somewhat problematic in case where entity takes argument in the constructor. Those could be at the very end of the JSON string, so we must read all the data before we can instantiate the object, and populate it's properties and do navigation fixup.
This requires us reading all the JSON data, store them in local variables, and only when we are done reading we instantiate the entity and populate all the properties with data stored in those variables. This adds some allocations (specifically navigations).

We also have to disable de-duplication logic - we can't always safely re-read the JSON string, and definitely can't start reading it from arbitrary position, so now we have to add JSON string for every aggregate projected, even if we already project it's parent.

Serialization implementation (i.e. Utf8JsonWriter) is pretty straighforward.

Also fix to #30993 - Query/Json: data corruption for tracking queries with nested json entities, then updating nested entities outside EF and re-querying

Fix is to recognize and modify shaper in case of tracking query, so that nav expansions are not skipped when parent entity is found in Change Tracker. This is necessary to fix alongside streaming, because now we throw exception from reader (unexpected token) if we don't process the entire stream correctly. Before it would be silently ignored apart from the edge case described in the bug.

Fixes #30604
Fixes #30993
@ajcvickers ajcvickers modified the milestones: 8.0.0, 8.0.0-preview7 Jul 20, 2023
@ajcvickers ajcvickers modified the milestones: 8.0.0-preview7, 8.0.0 Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-json area-perf closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
4 participants