Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/json: support < and <= for JSON datums #43832

Closed
wants to merge 1 commit into from
Closed

sql/json: support < and <= for JSON datums #43832

wants to merge 1 commit into from

Conversation

maddyblue
Copy link
Contributor

Improve Postgres compatibility by teaching JSON how to compare exactly
like Postgres. Two large changes happened to make this happen.

First, arrays were special cased during type comparison since empty
arrays in Postgres compare less than other things, even JSON null.

Second, JSON objects had their ordering changed to match Postgres. This
means that the object encoding ordering changed for some objects. The
DecodeJSON method was taught how to properly sort on-disk data to cope
with old encodings.

This commit does not change anything about disallowing JSON in ORDER
BY or its usability in indexes. However this is important progress to
support those use cases.

Fixes #43123

Release note (sql change): the JSON type now supports inequality
comparisons (<, <=, etc.)

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@maddyblue
Copy link
Contributor Author

For whoever reviews this: what ways can this break? For example, I'm currently worried about a situation where a mixed-version cluster with this new code and some old version are running. Say a query gets issued with the new code as the gateway (and thus it supports JSON <). But let's say distsql schedules a processor on an old node that doesn't know about JSON <. Will that blow up the cluster?

Can this break if a new node gets a json object from disk (and thus sorts it with this new sort) and then serializes that object, sends it to an old node for distsql processing. The old node has some other json bytes (and thus sorted in a different manner), and compares them for equality, which fails since the exact ordering of fields doesn't match up, and the old (and new) json code assumes objects are correctly sorted.

Any other scenarios? Because of this I'm very hesitant to merge this until these questions are resolved.

@maddyblue maddyblue requested a review from otan January 8, 2020 23:15
@rohany
Copy link
Contributor

rohany commented Jan 8, 2020

I think that something will go wrong in that case -- I don't know if the old node will panic or just error at the unsupported comparison operation (should be relatively simple to test though). Your other concern seems pretty likely.

It feels to me that you need some sort of version gate on this, but I don't know where to put it. I'm also wondering how this has been done in the past with introduction of new types / updating operations on existing types.

@yuzefovich
Copy link
Member

There is already DistSQLVersion in place (it's in pkg/sql/execinfra/server_config.go:61). Basically you'll bump it up (as well as MinAcceptedVersion), and everything should just work :)

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For whoever reviews this: what ways can this break?
Can this break if a new node gets a json object from disk

So the only case I can think of this breaking that hasn't been touched is key encoding.
But I noticed it's not supported:

root@127.0.0.1:49698/defaultdb> create table a2 (j jsonb primary key);
ERROR: unimplemented: column j is of type jsonb and thus is not indexable
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://github.com/cockroachdb/cockroach/issues/35730

So I think the only problem is in a mixed version cluster, the results read by different versioned nodes may present a different order as we Decode differently. Is that a problem?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mjibson and @otan)


pkg/sql/logictest/testdata/logic_test/json, line 723 at r1 (raw file):

DROP TABLE json_family

# See #43123

maybe worth subtesting, e.g. subtest regression_43123_json_sorting?


pkg/util/json/json.go, line 400 at r1 (raw file):

	// Zero-length arrays are special. They sort before everything, even
	// JSON null.

Can you point at the docs for this from postgres' side? If not documented, strange!

@otan
Copy link
Contributor

otan commented Jan 10, 2020


pkg/sql/logictest/testdata/logic_test/json, line 746 at r1 (raw file):

                ('{"b": 2}'::JSONB),
                ('{"bb": "dd"}'::JSONB),
                ('{"c": "e"}'::JSONB),

is there value in adding a more recursive json structure in here?

Copy link
Contributor Author

@maddyblue maddyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan)


pkg/sql/logictest/testdata/logic_test/json, line 723 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

maybe worth subtesting, e.g. subtest regression_43123_json_sorting?

Done.


pkg/sql/logictest/testdata/logic_test/json, line 746 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

is there value in adding a more recursive json structure in here?

Done.


pkg/util/json/json.go, line 400 at r1 (raw file):

Previously, otan (Oliver Tan) wrote…

Can you point at the docs for this from postgres' side? If not documented, strange!

Done.

Improve Postgres compatibility by teaching JSON how to compare exactly
like Postgres. Two large changes happened to make this happen.

First, arrays were special cased during type comparison since empty
arrays in Postgres compare less than other things, even JSON null.

Second, JSON objects had their ordering changed to match Postgres. This
means that the object encoding ordering changed for some objects. The
DecodeJSON method was taught how to properly sort on-disk data to cope
with old encodings.

This commit does not change anything about disallowing JSON in ORDER
BY or its usability in indexes. However this is important progress to
support those use cases.

Fixes #43123

Release note (sql change): the JSON type now supports inequality
comparisons (<, <=, etc.)
@maddyblue
Copy link
Contributor Author

maddyblue commented Jan 13, 2020

Current status: I'm going to leave this PR unmerged until the following issues are fixed.

  • It's not passing a json_build_object test. The test is correct (expected matches postgres), but the implementation is broken.
  • Need to bump the distsql api version and set the new minimum to it. Make sure the distsql team is ok with bumping the minimum as it means in a mixed node cluster nodes of different versions will not be able to schedule work on the other nodes, which could be bad for customers load.

I'm fast approaching my paternity leave and have already spent more time on this than I've allotted myself and need to work on other things before I leave. I may or may not have time to return to this and fix things. I recommend ignoring this work until a PM decides we should start working on JSON again. Or someone can take it up if they feel strongly.

@knz
Copy link
Contributor

knz commented Feb 5, 2020

Discussed offline: instead of changing the existing encoding for JSON, we could add a separate encoding using a separate coltype in descriptors.

Then "old data" continues to use the old encoding (and cannot be indexed) and new data use the new type/encoding and can be indexed.

This way the behavior in mixed version is clear and users can control when they migrate from one column type to the next.

@otan
Copy link
Contributor

otan commented Feb 5, 2020

we could add a separate encoding using a separate coltype in descriptors.

I think where this doesn't work well is backup and restores. Namely if we change the type, we probably have to rename it on the backups too and makes it difficult to restore, especially on older versions.

@bobvawter
Copy link
Member

bobvawter commented Mar 10, 2020

CLA assistant check
All committers have signed the CLA.

@maddyblue maddyblue closed this Jul 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: implement json compare
7 participants