sql/json: support < and <= for JSON datums #43832
For whoever reviews this: in what ways can this break? For example, I'm worried about a mixed-version cluster where some nodes run this new code and others run an old version. Say a query is issued with a new-code node as the gateway (which therefore supports JSON <), but distsql schedules a processor on an old node that doesn't know about JSON <. Will that blow up the cluster? Can this also break if a new node reads a JSON object from disk (and thus sorts it with the new sort), serializes it, and sends it to an old node for distsql processing? The old node has other JSON bytes (sorted in a different manner) and compares the two for equality, which fails because the field ordering doesn't match, and both the old and new JSON code assume objects are correctly sorted. Any other scenarios? Because of this I'm very hesitant to merge until these questions are resolved.
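To make the second scenario concrete, here is a minimal Go sketch of how a positional equality check can fail when two nodes sort the same object's keys differently. Everything in it is illustrative: the `pair` type, the helper names, and the two orderings (plain bytewise for the old sort, length-then-bytewise for the new one) are assumptions for the example, not the actual code in pkg/util/json.

```go
package main

import (
	"fmt"
	"sort"
)

// pair is a hypothetical stand-in for one key/value entry of a JSON object.
type pair struct{ key, val string }

// sortedBy returns a copy of pairs ordered by the given key comparison.
func sortedBy(pairs []pair, less func(a, b string) bool) []pair {
	out := append([]pair(nil), pairs...)
	sort.Slice(out, func(i, j int) bool { return less(out[i].key, out[j].key) })
	return out
}

// bytewise is the assumed "old" key order.
func bytewise(a, b string) bool { return a < b }

// lengthFirst is the assumed "new" key order: shorter keys first, then bytewise.
func lengthFirst(a, b string) bool {
	if len(a) != len(b) {
		return len(a) < len(b)
	}
	return a < b
}

// equalPositional compares entries index by index, which is only valid
// when both sides were sorted with the same key order.
func equalPositional(a, b []pair) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	obj := []pair{{"aa", "1"}, {"b", "2"}}
	oldNode := sortedBy(obj, bytewise)    // aa, b
	newNode := sortedBy(obj, lengthFirst) // b, aa

	// Same logical object, different key order: positional equality fails.
	fmt.Println(equalPositional(oldNode, newNode)) // false

	// Re-sorting on receipt restores agreement.
	fmt.Println(equalPositional(oldNode, sortedBy(newNode, bytewise))) // true
}
```

The second call shows why normalizing the order when decoding matters: once both sides agree on one order, the positional check is safe again.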
I think something will go wrong in that case. I don't know whether the old node will panic or just return an error at the unsupported comparison operation (that should be relatively simple to test, though). Your other concern seems pretty likely. It feels to me that you need some sort of version gate on this, but I don't know where to put it. I'm also wondering how this has been done in the past when introducing new types or updating operations on existing types.
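As a rough illustration of what a version gate could look like, here is a small Go sketch. It is not CockroachDB's cluster-version API: the `version` type, the `jsonInequalityMinVersion` value, and `checkJSONInequality` are all hypothetical names and numbers chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// version is a hypothetical major.minor cluster version.
type version struct{ major, minor int }

// less reports whether v is older than o.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	return v.minor < o.minor
}

// jsonInequalityMinVersion is an assumed value, used only for illustration.
var jsonInequalityMinVersion = version{20, 1}

var errNotYetActive = errors.New("JSON inequality requires all nodes to be upgraded")

// checkJSONInequality rejects the operator until the active cluster
// version guarantees that every node understands it.
func checkJSONInequality(active version) error {
	if active.less(jsonInequalityMinVersion) {
		return errNotYetActive
	}
	return nil
}

func main() {
	fmt.Println(checkJSONInequality(version{19, 2})) // gate closed: returns the error
	fmt.Println(checkJSONInequality(version{20, 1})) // gate open: <nil>
}
```

The idea is that the gateway would consult the gate at planning time, so a query using JSON < is rejected cleanly instead of reaching an old node that cannot evaluate it.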
There is already |
For whoever reviews this: what ways can this break?
Can this break if a new node gets a json object from disk
The only breaking case I can think of that hasn't been covered yet is key encoding.
But I noticed that JSON key encoding isn't supported:
root@127.0.0.1:49698/defaultdb> create table a2 (j jsonb primary key);
ERROR: unimplemented: column j is of type jsonb and thus is not indexable
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://github.com/cockroachdb/cockroach/issues/35730
So I think the only problem is that, in a mixed-version cluster, nodes on different versions may present results in a different order because they decode differently. Is that a problem?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mjibson and @otan)
pkg/sql/logictest/testdata/logic_test/json, line 723 at r1 (raw file):
DROP TABLE json_family # See #43123
maybe worth subtesting, e.g. subtest regression_43123_json_sorting?
pkg/util/json/json.go, line 400 at r1 (raw file):
// Zero-length arrays are special. They sort before everything, even
// JSON null.
Can you point at the docs for this from postgres' side? If not documented, strange!
pkg/sql/logictest/testdata/logic_test/json, line 746 at r1 (raw file):
is there value in adding a more recursive json structure in here?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan)
pkg/sql/logictest/testdata/logic_test/json, line 723 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
maybe worth subtesting, e.g. subtest regression_43123_json_sorting?
Done.
pkg/sql/logictest/testdata/logic_test/json, line 746 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
is there value in adding a more recursive json structure in here?
Done.
pkg/util/json/json.go, line 400 at r1 (raw file):
Previously, otan (Oliver Tan) wrote…
Can you point at the docs for this from postgres' side? If not documented, strange!
Done.
Improve Postgres compatibility by teaching JSON how to compare exactly like Postgres. Two large changes happened to make this happen.

First, arrays were special cased during type comparison since empty arrays in Postgres compare less than other things, even JSON null.

Second, JSON objects had their ordering changed to match Postgres. This means that the object encoding ordering changed for some objects. The DecodeJSON method was taught how to properly sort on-disk data to cope with old encodings.

This commit does not change anything about disallowing JSON in ORDER BY or its usability in indexes. However this is important progress to support those use cases.

Fixes #43123

Release note (sql change): the JSON type now supports inequality comparisons (<, <=, etc.)
Current status: I'm going to leave this PR unmerged until the following issues are fixed.
I'm fast approaching my paternity leave and have already spent more time on this than I've allotted myself and need to work on other things before I leave. I may or may not have time to return to this and fix things. I recommend ignoring this work until a PM decides we should start working on JSON again. Or someone can take it up if they feel strongly.
Discussed offline: instead of changing the existing encoding for JSON, we could add a separate encoding using a separate coltype in descriptors. Then "old data" continues to use the old encoding (and cannot be indexed), and new data uses the new type/encoding and can be indexed. This way the behavior in mixed-version clusters is clear, and users can control when they migrate from one column type to the next.
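A minimal Go sketch of that idea, under the assumption that the column type alone selects the encoding. The `jsonColType` type, its constants, and the encoding names are hypothetical and do not correspond to actual descriptor fields.

```go
package main

import "fmt"

// jsonColType is a hypothetical descriptor-level column type tag.
type jsonColType int

const (
	jsonLegacy  jsonColType = iota // existing columns: old encoding, unchanged
	jsonOrdered                    // new columns: Postgres-compatible ordering
)

// indexable reports whether values of this column type may be indexed.
// Only the new type has an ordering that is safe to rely on.
func (t jsonColType) indexable() bool { return t == jsonOrdered }

// encoding returns an illustrative name for the encoding the type selects.
func (t jsonColType) encoding() string {
	if t == jsonOrdered {
		return "ordered-v2"
	}
	return "legacy-v1"
}

func main() {
	for _, t := range []jsonColType{jsonLegacy, jsonOrdered} {
		fmt.Printf("encoding=%s indexable=%t\n", t.encoding(), t.indexable())
	}
}
```

Because every node picks the encoding from the descriptor rather than from its own binary version, an old node never has to interpret bytes written in the new order for a legacy column.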
I think where this doesn't work well is backup and restore. Namely, if we change the type, we probably have to rename it in the backups too, which makes restoring difficult, especially on older versions.