sql/json: support < and <= for JSON datums #43832

maddyblue · 2020-01-08T23:01:22Z

Improve Postgres compatibility by teaching JSON how to compare exactly
like Postgres. Two large changes happened to make this happen.

First, arrays were special cased during type comparison since empty
arrays in Postgres compare less than other things, even JSON null.

Second, JSON objects had their ordering changed to match Postgres. This
means that the object encoding ordering changed for some objects. The
DecodeJSON method was taught how to properly sort on-disk data to cope
with old encodings.

This commit does not change anything about disallowing JSON in ORDER
BY or its usability in indexes. However this is important progress to
support those use cases.

Fixes #43123

Release note (sql change): the JSON type now supports inequality
comparisons (<, <=, etc.)

cockroach-teamcity · 2020-01-08T23:01:28Z

This change is

maddyblue · 2020-01-08T23:15:23Z

For whoever reviews this: what ways can this break? For example, I'm currently worried about a situation where a mixed-version cluster with this new code and some old version are running. Say a query gets issued with the new code as the gateway (and thus it supports JSON <). But let's say distsql schedules a processor on an old node that doesn't know about JSON <. Will that blow up the cluster?

Can this break if a new node gets a json object from disk (and thus sorts it with this new sort) and then serializes that object, sends it to an old node for distsql processing. The old node has some other json bytes (and thus sorted in a different manner), and compares them for equality, which fails since the exact ordering of fields doesn't match up, and the old (and new) json code assumes objects are correctly sorted.

Any other scenarios? Because of this I'm very hesitant to merge this until these questions are resolved.

rohany · 2020-01-08T23:33:10Z

I think that something will go wrong in that case -- I don't know if the old node will panic or just error at the unsupported comparison operation (should be relatively simple to test though). Your other concern seems pretty likely.

It feels to me that you need some sort of version gate on this, but I don't know where to put it. I'm also wondering how this has been done in the past with introduction of new types / updating operations on existing types.

yuzefovich · 2020-01-09T02:23:30Z

There is already DistSQLVersion in place (it's in pkg/sql/execinfra/server_config.go:61). Basically you'll bump it up (as well as MinAcceptedVersion), and everything should just work :)

otan

For whoever reviews this: what ways can this break?
Can this break if a new node gets a json object from disk

So the only case I can think of this breaking that hasn't been touched is key encoding.
But I noticed it's not supported:

root@127.0.0.1:49698/defaultdb> create table a2 (j jsonb primary key);
ERROR: unimplemented: column j is of type jsonb and thus is not indexable
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://github.com/cockroachdb/cockroach/issues/35730

So I think the only problem is in a mixed version cluster, the results read by different versioned nodes may present a different order as we Decode differently. Is that a problem?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mjibson and @otan)

pkg/sql/logictest/testdata/logic_test/json, line 723 at r1 (raw file):

DROP TABLE json_family

# See #43123

maybe worth subtesting, e.g. subtest regression_43123_json_sorting?

pkg/util/json/json.go, line 400 at r1 (raw file):

	// Zero-length arrays are special. They sort before everything, even
	// JSON null.

Can you point at the docs for this from postgres' side? If not documented, strange!

otan · 2020-01-10T14:35:33Z

pkg/sql/logictest/testdata/logic_test/json, line 746 at r1 (raw file):

                ('{"b": 2}'::JSONB),
                ('{"bb": "dd"}'::JSONB),
                ('{"c": "e"}'::JSONB),

is there value in adding a more recursive json structure in here?

maddyblue

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan)

pkg/sql/logictest/testdata/logic_test/json, line 723 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

maybe worth subtesting, e.g. subtest regression_43123_json_sorting?

Done.

pkg/sql/logictest/testdata/logic_test/json, line 746 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

is there value in adding a more recursive json structure in here?

Done.

pkg/util/json/json.go, line 400 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

Can you point at the docs for this from postgres' side? If not documented, strange!

Done.

Improve Postgres compatibility by teaching JSON how to compare exactly like Postgres. Two large changes happened to make this happen. First, arrays were special cased during type comparison since empty arrays in Postgres compare less than other things, even JSON null. Second, JSON objects had their ordering changed to match Postgres. This means that the object encoding ordering changed for some objects. The DecodeJSON method was taught how to properly sort on-disk data to cope with old encodings. This commit does not change anything about disallowing JSON in ORDER BY or its usability in indexes. However this is important progress to support those use cases. Fixes #43123 Release note (sql change): the JSON type now supports inequality comparisons (<, <=, etc.)

maddyblue · 2020-01-13T23:41:20Z

Current status: I'm going to leave this PR unmerged until the following issues are fixed.

It's not passing a json_build_object test. The test is correct (expected matches postgres), but the implementation is broken.
Need to bump the distsql api version and set the new minimum to it. Make sure the distsql team is ok with bumping the minimum as it means in a mixed node cluster nodes of different versions will not be able to schedule work on the other nodes, which could be bad for customers load.

I'm fast approaching my paternity leave and have already spent more time on this than I've allotted myself and need to work on other things before I leave. I may or may not have time to return to this and fix things. I recommend ignoring this work until a PM decides we should start working on JSON again. Or someone can take it up if they feel strongly.

knz · 2020-02-05T17:10:24Z

Discussed offline: instead of changing the existing encoding for JSON, we could add a separate encoding using a separate coltype in descriptors.

Then "old data" continues to use the old encoding (and cannot be indexed) and new data use the new type/encoding and can be indexed.

This way the behavior in mixed version is clear and users can control when they migrate from one column type to the next.

otan · 2020-02-05T23:48:59Z

we could add a separate encoding using a separate coltype in descriptors.

I think where this doesn't work well is backup and restores. Namely if we change the type, we probably have to rename it on the backups too and makes it difficult to restore, especially on older versions.

bobvawter · 2020-03-10T16:27:49Z

All committers have signed the CLA.

maddyblue requested a review from otan January 8, 2020 23:15

otan reviewed Jan 10, 2020

View reviewed changes

maddyblue commented Jan 13, 2020

View reviewed changes

rohany mentioned this pull request Feb 19, 2020

sql: change encoding strategy for distsql datum hashing #45229

Merged

maddyblue closed this Jul 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql/json: support < and <= for JSON datums #43832

sql/json: support < and <= for JSON datums #43832

maddyblue commented Jan 8, 2020

cockroach-teamcity commented Jan 8, 2020

maddyblue commented Jan 8, 2020

rohany commented Jan 8, 2020

yuzefovich commented Jan 9, 2020

otan left a comment

otan commented Jan 10, 2020

maddyblue left a comment

maddyblue commented Jan 13, 2020 •

edited

knz commented Feb 5, 2020

otan commented Feb 5, 2020 •

edited

bobvawter commented Mar 10, 2020 •

edited by cockroach-teamcity

sql/json: support < and <= for JSON datums #43832

sql/json: support < and <= for JSON datums #43832

Conversation

maddyblue commented Jan 8, 2020

cockroach-teamcity commented Jan 8, 2020

maddyblue commented Jan 8, 2020

rohany commented Jan 8, 2020

yuzefovich commented Jan 9, 2020

otan left a comment

Choose a reason for hiding this comment

otan commented Jan 10, 2020

maddyblue left a comment

Choose a reason for hiding this comment

maddyblue commented Jan 13, 2020 • edited

knz commented Feb 5, 2020

otan commented Feb 5, 2020 • edited

bobvawter commented Mar 10, 2020 • edited by cockroach-teamcity

maddyblue commented Jan 13, 2020 •

edited

otan commented Feb 5, 2020 •

edited

bobvawter commented Mar 10, 2020 •

edited by cockroach-teamcity