Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON type representations and conversions to store types #30727

Closed
Tracked by #30731
roji opened this issue Apr 18, 2023 · 5 comments
Closed
Tracked by #30731

JSON type representations and conversions to store types #30727

roji opened this issue Apr 18, 2023 · 5 comments
Assignees
Labels
area-json area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@roji
Copy link
Member

roji commented Apr 18, 2023

When querying against values coming out of a JSON document, we need to make sure that the JSON values get projected out as the correct store type, otherwise comparison with e.g. a regular column will fail. For SQL Server, most store types work by simply casting the string representation returned from OPENJSON; but with SQLite - which has very few types - there are some issues:

.NET type Store representation Default JSON representation Notes
DateTime 2023-01-01 12:30:00 2023-01-01T12:30:00 Can be converted via datetime()
DateTimeOffset 2023-01-01 12:30:00+02:00 2023-01-01T12:30:00+02:00 datetime() yields 2023-01-01 10:30:00
decimal "1.0" (string) 1 (number)
Guid uppercase lowercase can use upper(), or store uppercase in JSON
byte[] X'0102' AQI= (base64) Same problem in SQL Server, though OPENJSON/WITH does decode base64

Options:

  1. Convert the JSON representation to the store type upon extraction; this is ideal, when supported. A CAST usually works for SQL Server, and in some cases a custom function is needed (e.g. datetime for SQLite). Adding the conversion function to the type mapping is part of Add type mapping APIs to customize JSON value serialization/deserialization #30677.
  2. Store the database's literal representation in JSON. In cases where there's a semi-standard JSON representation, this isn't great; for example, timestamps are commonly represented as ISO8601 in JSON, and storing another representation wouldn't be good for interoperability. The API for this would also be part of Add type mapping APIs to customize JSON value serialization/deserialization #30677.
  3. Leave the type as unqueryable - that is, storable via value converter only.

For option 1, we need type mapping API support which tells us how to convert values coming out of a JSON document into their equivalent relational store representation. Note that this is complicated somewhat in SQL Server, where in most cases we can simply use OPENJSON/WITH which automatically does the conversion for us, but in some cases we can't do that (because OPENJSON/WITH doesn't preserve ordering, and is also incompatible with geometry).

@roji
Copy link
Member Author

roji commented Apr 21, 2023

Design decisions:

  • We'll obviously do (1) where possible (apply a SQL conversion such as datetime()), allowing us to both have the "standard" JSON value format and have queryability.
  • Where that's not possible, we'll do (3) by default, i.e. allow users to map the value but not query it. At some point we'll also expose an API to allow users to configure the server-side conversion.
  • In SQLite specifically, we may be able to register e.g. a base64 conversion .NET function that would allow us to do (1) even where a native SQLite function such as datetime() doesn't exist.

@AndriySvyryd
Copy link
Member

Related - #10434 (comment)
A custom function can be added to the database that could be used in the custom server-side value converter

@roji
Copy link
Member Author

roji commented May 24, 2023

I did an investigation into binary support on SQL Server.

  • OPENJSON (and FOR AUTO JSON) do perform base64 conversions, which is great, as that seems to be the JSON de-facto standard.
  • However, that requires using OPENJSON with WITH (providing the type information), which doesn't support ordering. We currently translate to OPENJSON without WITH for that reason, but that just returns a string which we can't easily then decode as base64.
  • The same problem exists with our general usage of JSON_VALUE to extract scalars - it just extracts a string.
  • The only way to make this work is to use OPENJSON yet again - this time with WITH - just for the base64 conversion:
SELECT y.bar
FROM OPENJSON('["AAAwOQ==", "AAEJMg=="]') AS x
CROSS APPLY OPENJSON('["' + x.value + '"]') WITH (bar VARBINARY(MAX) '$') AS y
ORDER BY x.[key];
Full investigation
SELECT * FROM (VALUES(CAST(12345 AS VARBINARY(MAX)))) AS foo(bar) FOR JSON AUTO; -- [{"bar":"AAAwOQ=="}]
-- OPENJSON does the same, decoding the JSON base64 string to VARBINARY:
SELECT * FROM OPENJSON('["AAAwOQ=="]') WITH (bar VARBINARY(MAX) '$'); -- 0x00003039
-- In fact, trying to decode a non-base64 fails very explicitly with:
-- Cannot convert a string value found in the JSON text to binary value because it is not Base64 encoded.
SELECT * FROM OPENJSON('["foo"]') WITH (bar VARBINARY(MAX) '$');

-- However, OPENJSON with WITH doesn't support preserving the original order.
-- For that we use OPENJSON without WITH, but then we need to do regular relational casting.
-- This *doesn't* work with base64:
SELECT CAST(value AS VARBINARY(MAX)) FROM OPENJSON('["AAAwOQ=="]'); -- 41007700-4100-4100-4f00-51003d003d00

-- The only option seems to both preserve ordering and properly decode a base64 string out of JSON seems to be to use
-- OPENJSON *again* with WITH to perform the conversion:
SELECT y.bar
FROM OPENJSON('["AAAwOQ==", "AAEJMg=="]') AS x
CROSS APPLY OPENJSON('["' + x.value + '"]') WITH (bar VARBINARY(MAX) '$') AS y
ORDER BY x.[key];

@roji
Copy link
Member Author

roji commented Jul 20, 2023

Update on this... As previously decided, we ended up implementing some conversions when extracting JSON values out in SQLite (Guid to uppercase, add remove T from timestamp). For some types, it was impossible to reliably convert (binary, decimal, DateTimeOffset); we decided to allow storing these types, but to block their querying as that can't be supported.

However, JSON value extraction is also necessary for the new Contains implementation; this means that it would no longer be possible to do Where(b => decimals.Contains(b.SomeDecimal), which would be a problematic breaking change.

We considered two options:

  1. Special-case the JSON representation for parameters (as opposed to columns), and embed a compatible relational representation instead of the standard JSON one (e.g. represent decimals as strings). Since nobody cares about the actual contents of parameter collections, that seemed reasonable. It would mean that column collections of the problematic types remain unqueryable.
  2. Simply default to using the SQLite representations everyone in JSON, both columns and parameters. This would also allow querying all types, at the cost of non-standard JSON documents with e.g. decimals as strings.

Since supporting standard JSON representations has turned out to be increasingly difficult and has added complexity, we've opted for option 2.

Additional notes:

  • We'll switch to the SQLite-specific JSON representation for all types, not just for those we can't reliably convert. For example, DateTime will be stored within the JSON with T, even though we're capable of converting it out (removing the T).
  • Users still have the option of explicitly configuring a JsonValueReaderWriter to control the JSON representation, allowing them to implement the standard JSON representation instead. At that point, however, querying the column will almost certainly return incorrect reuslts. We should log a warning for this case.

@ajcvickers ajcvickers added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Aug 15, 2023
@ajcvickers
Copy link
Member

@roji Take a look at SQLite translations.

@ajcvickers ajcvickers reopened this Aug 15, 2023
@ajcvickers ajcvickers removed the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Aug 15, 2023
roji added a commit to roji/efcore that referenced this issue Aug 17, 2023
@roji roji added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Aug 17, 2023
roji added a commit to roji/efcore that referenced this issue Aug 17, 2023
roji added a commit to roji/efcore that referenced this issue Aug 17, 2023
roji added a commit to roji/efcore that referenced this issue Aug 17, 2023
roji added a commit to roji/efcore that referenced this issue Aug 17, 2023
@ajcvickers ajcvickers modified the milestones: 8.0.0, 8.0.0-rc1 Aug 19, 2023
@ajcvickers ajcvickers modified the milestones: 8.0.0-rc1, 8.0.0 Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-json area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

No branches or pull requests

3 participants