Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Support unnest(anyarray, anyarray, anyarray...) #41557

Merged
merged 1 commit into from
Jan 29, 2020

Conversation

giorgosp
Copy link
Contributor

@giorgosp giorgosp commented Oct 13, 2019

Previously unnest supported only one array argument. This commit
adds a variadic overload in order to support multiple array arguments,
as in Postgres.

The variadic overload is supported in a SELECT clause as well, in contrast
with Postgres that only supports it only in a FROM clause.

This change has two side effects.

  1. The first side effect is that tuples are allowed to be declared with
    duplicate labels, like:

SELECT ((1, '2') AS a, a);

This statement executes successfully since the tuple labels are declared
but not used.

However, accessing a duplicate tuple label returns AmbiguousColumn error.
Both statements below return a "column reference is ambiguous"
error:

SELECT (((1,2,3) AS a,b,a)).a;
SELECT ((unnest(ARRAY[1,2], ARRAY[1,2]))).unnest;

  1. The to_json builtin doesn't return duplicate error anymore but
    succeeds when passed in values with duplicate labels. Eg:
SELECT to_json(x.*) FROM (VALUES (1,2)) AS x(column2);
+----------------+
|    to_json     |
|----------------|
| {"column2": 2} |
+----------------+

This is a bit more compatible with Postgres which also doesn't return
an error for this query. However, there is still an incompatibility in
the results as Postgres renders the duplicate labels as duplicate json
keys. The correct test cases have been added as a TODO comment, tracked
by #44465.

Release note (sql change): An overload was added to the unnest builtin
function in order to support multiple array arguments.

Release note (sql change): Duplicate labels are allowed when declaring a
tuple, but an Ambiguous column error is returned if a duplicate
label is accessed. For example:
SELECT ((1, '2') AS a, a); is successful but
SELECT (((1,2,3) AS a,b,a)).a; returns an error.

Fixes #41434

@giorgosp giorgosp changed the title sql: Support unnest(anyarray, anyarray, anyarray...) WIP: sql: Support unnest(anyarray, anyarray, anyarray...) Oct 13, 2019
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@giorgosp
Copy link
Contributor Author

Hi @jordanlewis ! I have created an overload of unnest that supports variadic arguments. However I have some issues and would like some advice on how to proceed:

The labels are hardcoded when passed to genProps and it seems the FunctionProperties. ReturnLabels take precedence over the labels returned by the ReturnTyper or the multipleArrayValueGenerator.ResolvedType().

If I pass nil to genProps, the variadic overload works but not the simple overload. I could try adding a new makeUnnestBuiltin that would set the labels appropriately and create the overloads inside itself, except if you have a better idea to keep the code consistent (using the already provided workflow with makeBuiltin()/genProps()/makeGeneratorOverload()) so that the overloads are easily visible and not created deeper in the stack.

This is the exact (trimmed) error from
make testlogic TESTCONFIG=local FILES='srfs' SUBTESTS='unnest':

SELECT unnest(ARRAY[1,2], ARRAY['a','b']);
--- PASS: TestExecBuild (1.20s)
    test_log_scope.go:77: test logs captured to: /tmp/logTestExecBuild113095743
    test_log_scope.go:58: use -show-logs to present logs inline
    --- PASS: TestExecBuild/local (0.28s)
        --- PASS: TestExecBuild/local/srfs (0.28s)
testing: warning: no tests to run
PASS
panic: invalid datum type given: int, expected tuple{int AS unnest, string AS unnest} [recovered]
	panic: panic while executing 1 statements: SELECT unnest(ARRAY[_, _], ARRAY[_, _]); caused by invalid datum type given: int, expected tuple{int AS unnest, string AS unnest}

goroutine 1174 [running]:
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).closeWrapper(0xc00122c000, 0x3d1a620, 0xc000a95500, 0x3064460, 0xc0016f16f0)
	/home/gio/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:731 +0x330
github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1(0xc00122c000, 0x3d1a620, 0xc000a95500)
	/home/gio/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:442 +0x61
panic(0x3064460, 0xc0016f16f0)
	/usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5
github.com/cockroachdb/cockroach/pkg/sql/sqlbase.DatumToEncDatum(0xc001706a00, 0x3d876a0, 0xc0010afec8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/gio/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlbase/encoded_datum.go:167 +0x1b7
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*projectSetProcessor).toEncDatum(0xc001343400, 0x3d876a0, 0xc0010afec8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/gio/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/project_set.go:280 +0x9b
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*projectSetProcessor).nextGeneratorValues(0xc001343400, 0xc00100aa20, 0x1, 0x1)
	/home/gio/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/project_set.go:180 +0x340
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*projectSetProcessor).Next(0xc001343400, 0x0, 0x0, 0x0, 0x3d4cd60)
	/home/gio/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/project_set.go:259 +0x89
github.com/cockroachdb/cockroach/pkg/sql/execinfra.Run(0x3d1a620, 0xc001205e00, 0x3d38f20, 0xc001343400, 0x3cff960, 0xc001423c00)
	/home/gio/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/base.go:169 +0x35
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).Run(0xc001343400, 0x3d1a620, 0xc001205e00)
	/home/gio/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:725 +0x92
github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run(0xc001287560, 0x3d1a6e0, 0xc00100a990, 0x364c628, 0x0, 0x0)
	/home/gio/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:375 +0x1ec

Thank you!

@yuzefovich
Copy link
Member

Hi @giorgosp, thanks for taking this on! I looked into the relevant code, and it seems to me that current implementation of makeBuiltin does not support the case when two overloads have different FunctionProperties - and this is exactly the case that we have here with unnest. I'm actually not sure on how to properly address it - it seems to me that your idea of makeUnnestBuiltin is not sufficient because we have generators as map[string]builtinDefinition, and we need a map from string + argument types to builtinDefinition, so for unnest we need to be able to return a different builtin (with different props) depending on the input argument types.

Possibly another way to solve it is to modify the way SRFs are transformed sql/transform_srf.go, but I don't have a feeling whether it is doable/reasonable.

cc @jordanlewis @knz am I missing something?

@knz
Copy link
Contributor

knz commented Oct 14, 2019

This would need a redesign of how function properties work.

If you wish to do this I think it would be better to extend the function properties mechanism (and the overload resolution code that goes with it) in a separate PR first, then come back to the current PR once the other PR is merged.

@jordanlewis
Copy link
Member

@giorgosp, considering that the known use case we're trying to support uses exactly 2 arguments, would it be possible to just add that one for now? I know that is an unsatisfying answer...

@giorgosp
Copy link
Contributor Author

Thank you for your comments! @jordanlewis If I understand correctly, you suggest we support unnest(array, array) only.

My main driver for having two overloads was that Postgres supports unnest(array,array) only in in a FROM clause, but it seems like CockroachDB doesn't have that limitation.

So we could just keep the variadic one, which also won't break the current usages of unnest with one argument. What do you think?

@jordanlewis
Copy link
Member

No, I meant adding a second overload with 2 anyarray arguments, alongside the one with a single anyarray argument. I was under the impression from this conversation that it would be difficult to make variadic unnest work properly.

@giorgosp
Copy link
Contributor Author

As @yuzefovich and @knz said, the mechamism around FunctionProperties doesn't allow different function properties per overload. In our case we need different FunctionProperties.ReturnLabels.

unnest(anyarray) shouldn't return a tuple so that select unnest(ARRAY[1,2]) + 1 works. But
unnest(anyarray...) must return a tuple.

My suggestion to keep only the variadic overload won't work either.
I am still looking into it.

@yuzefovich
Copy link
Member

I agree with @giorgosp - I also think without making changes that Raphael pointed out it is not possible to add unnest(anyarray,anyarray) because it requires two labels, but currently a single "unnest" label is hardcoded in the function properties.

@giorgosp
Copy link
Contributor Author

giorgosp commented Oct 23, 2019

After a second look, I found out it would be simple to completely remove the ReturnLabels which were hardcoded in FunctionProperties and have some flexibility as to what return labels will be used each time.

Opened a PR (#41861) to await for your feedback.

@giorgosp giorgosp force-pushed the unnest-signature branch 2 times, most recently from fa16a1a to bd772bb Compare October 28, 2019 18:29
@giorgosp giorgosp changed the title WIP: sql: Support unnest(anyarray, anyarray, anyarray...) sql: Support unnest(anyarray, anyarray, anyarray...) Dec 26, 2019
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work! I think you're very close to the finish line :)

Reviewed 2 of 3 files at r1, 4 of 4 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @giorgosp and @jordanlewis)


pkg/sql/logictest/testdata/logic_test/srfs, line 1113 at r2 (raw file):


query T
SELECT unnest(ARRAY[1,2], ARRAY['a'], ARRAY[1.1, 2.2, 3.3])

Hm, postgres (11.5 version) doesn't seem to support such case (of tuples), only SELECT * FROM unnest... is supported. I expected that there would be NULLs in the tuples when the corresponding array is "exhausted", so I wanted to check with PSQL. Shouldn't there be NULLs in this query?


pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):

					}
					returnTypes[i] = *arg.ResolvedType().ArrayContents()
					labels[i] = "unnest"

The build is complaining about duplicated labels, probably we'll need to include the counter into the label.


pkg/sql/sem/builtins/generator_builtins.go, line 508 at r2 (raw file):

	for i, arr := range s.arrays {
		returnTypes[i] = *arr.ParamTyp
		labels[i] = "unnest"

ditto for the counter


pkg/sql/sem/builtins/generator_builtins.go, line 535 at r2 (raw file):

// Values implements the tree.ValueGenerator interface.
func (s *multipleArrayValueGenerator) Values() tree.Datums {
	var datums tree.Datums

[nit]: it probably makes sense to allocate the memory right away so that no allocations occur during appends below. We could take it one step further and perform the allocation in Start method, store the slice within the struct and reuse it on every call to Values.

Copy link
Contributor Author

@giorgosp giorgosp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please bare with me (some more) because I will be on vacation for a week. But afterwards I will work on it so we finish it as soon as possible.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @giorgosp, @jordanlewis, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/srfs, line 1113 at r2 (raw file):

Previously, yuzefovich wrote…

Hm, postgres (11.5 version) doesn't seem to support such case (of tuples), only SELECT * FROM unnest... is supported. I expected that there would be NULLs in the tuples when the corresponding array is "exhausted", so I wanted to check with PSQL. Shouldn't there be NULLs in this query?

Indeed postgres doesn't support unnest in a SELECT clause. NULLs are returned by the srf anyway but I guess cockroachdb doesn't render them? If unnest is used in a FROM clause, the NULLs are rendered.


pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):

Previously, yuzefovich wrote…

The build is complaining about duplicated labels, probably we'll need to include the counter into the label.

Thanks for checking it out! I wanted to ask for help about this. Do we really want the counter-in-label solution? Because in postgres all the labels have the same name.

What is that duplicate labels check used for? Would it make sense to skip it for system-defined labels and keep it only for user defined ones (when AS is used in the query)? Postgres already has this quirk for unnest and SELECT unnest FROM unnest(...) returns "ambiguous column" error as someone would expect.

Copy link
Contributor Author

@giorgosp giorgosp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @giorgosp, @jordanlewis, and @yuzefovich)


pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):

Previously, giorgosp (George Papadrosou) wrote…

Thanks for checking it out! I wanted to ask for help about this. Do we really want the counter-in-label solution? Because in postgres all the labels have the same name.

What is that duplicate labels check used for? Would it make sense to skip it for system-defined labels and keep it only for user defined ones (when AS is used in the query)? Postgres already has this quirk for unnest and SELECT unnest FROM unnest(...) returns "ambiguous column" error as someone would expect.

Still thinking about this. Adding counters to the unnest labels would finalize this PR but would create an incompatibility with postgres. Trying to figure out func (expr *Tuple) TypeCheck is not called with local config


pkg/sql/sem/builtins/generator_builtins.go, line 535 at r2 (raw file):

Previously, yuzefovich wrote…

[nit]: it probably makes sense to allocate the memory right away so that no allocations occur during appends below. We could take it one step further and perform the allocation in Start method, store the slice within the struct and reuse it on every call to Values.

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @giorgosp, @jordanlewis, @knz, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/srfs, line 1113 at r2 (raw file):

Previously, giorgosp (George Papadrosou) wrote…

Indeed postgres doesn't support unnest in a SELECT clause. NULLs are returned by the srf anyway but I guess cockroachdb doesn't render them? If unnest is used in a FROM clause, the NULLs are rendered.

Ok, sounds good to me.


pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):

Previously, giorgosp (George Papadrosou) wrote…

Still thinking about this. Adding counters to the unnest labels would finalize this PR but would create an incompatibility with postgres. Trying to figure out func (expr *Tuple) TypeCheck is not called with local config

Hm, I see that in #24832 we purposefully decided to deviate from Postgres, so I think we now have two choices:

  1. include the counter into the label in this PR. This will increase deviation from psql, but no other changes are needed
  2. remove the requirement that the tuple's labels are unique. Then the counter is not needed, but we might need to change the way we access an element of the tuple by the label (if there are multiple matches, just use the first one, or something similar).

I don't know which one is preferable. Maybe @knz has thoughts?

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r2, 3 of 3 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @giorgosp, @jordanlewis, and @yuzefovich)


pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):

Previously, yuzefovich wrote…

Hm, I see that in #24832 we purposefully decided to deviate from Postgres, so I think we now have two choices:

  1. include the counter into the label in this PR. This will increase deviation from psql, but no other changes are needed
  2. remove the requirement that the tuple's labels are unique. Then the counter is not needed, but we might need to change the way we access an element of the tuple by the label (if there are multiple matches, just use the first one, or something similar).

I don't know which one is preferable. Maybe @knz has thoughts?

I vote to remove the duplicate check in labels but at the same time do the same as pg and return an ambiguous error during the per-label lookup if the searched label is found two times.

(This will need dedicated unit tests)

@giorgosp giorgosp force-pushed the unnest-signature branch 3 times, most recently from c494a43 to c893d72 Compare January 28, 2020 21:09
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work! :lgtm:

Could you please update the PR description message with the latest commit message? Also, an addition to release note is needed now that we allow duplicate labels but will error out on accessing them if it is ambigous.

Reviewed 8 of 8 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @giorgosp, @jordanlewis, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/json_builtins, line 327 at r4 (raw file):

{"a": 1, "column2": 2}

# TODO: Implement the test cases below to be compatible with Postgres

Do you know if we have an issue about this (probably not)? If not, could you please create one and mention it in this comment?


pkg/sql/logictest/testdata/logic_test/tuple, line 678 at r4 (raw file):

(1,2,hello,,)  (t,,"(f,6.6,f)")

# Duplicate tuple labels are allowed (but access fails a duplicated label is accessed, see

[nit]: s/access fails a/access fails when a/g or something like that.


pkg/sql/sem/tree/type_check.go, line 610 at r4 (raw file):

		if label == expr.ColName {
			if expr.ColIndex != -1 {
				// Found a duplicate label

[nit]: missing period.

Previously `unnest` supported only one array argument. This commit
adds a variadic overload in order to support multiple array arguments,
as in Postgres.

The variadic overload is supported in a SELECT clause as well, in contrast
with Postgres that only supports it only in a FROM clause.

This change has two side effects.

1. The first side effect is that tuples are allowed to be declared with
duplicate labels, like:

`SELECT ((1, '2') AS a, a);`

This statement executes successfully since the tuple labels are declared
but not used.

However, accessing a duplicate tuple label returns AmbiguousColumn error.
Both statements below return a "column reference <column> is ambiguous"
error:

`SELECT (((1,2,3) AS a,b,a)).a;`
`SELECT ((unnest(ARRAY[1,2], ARRAY[1,2]))).unnest;`

2. The `to_json` builtin doesn't return duplicate error anymore but
succeeds when passed in values with duplicate labels. Eg:

```sql
SELECT to_json(x.*) FROM (VALUES (1,2)) AS x(column2);
+----------------+
|    to_json     |
|----------------|
| {"column2": 2} |
+----------------+
```

This is a bit more compatible with Postgres which also doesn't return
an error for this query. However, there is still an incompatibility in
the results as Postgres renders the duplicate labels as duplicate json
keys. The correct test cases have been added as a TODO comment, tracked
by cockroachdb#44465.

Release note (sql change): An overload was added to the `unnest` builtin
function in order to support multiple array arguments.

Release note (sql change): Duplicate labels are allowed when declaring a
tuple, but an Ambiguous column error is returned if a duplicate
label is accessed. For example:
`SELECT ((1, '2') AS a, a);` is successful but
`SELECT (((1,2,3) AS a,b,a)).a;` returns an error.

Fixes cockroachdb#41434
Copy link
Contributor Author

@giorgosp giorgosp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Updated the PR and the commit msg.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/json_builtins, line 327 at r4 (raw file):

Previously, yuzefovich wrote…

Do you know if we have an issue about this (probably not)? If not, could you please create one and mention it in this comment?

Done!


pkg/sql/logictest/testdata/logic_test/tuple, line 678 at r4 (raw file):

Previously, yuzefovich wrote…

[nit]: s/access fails a/access fails when a/g or something like that.

Done.


pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):

Previously, knz (kena) wrote…

I vote to remove the duplicate check in labels but at the same time do the same as pg and return an ambiguous error during the per-label lookup if the searched label is found two times.

(This will need dedicated unit tests)

Thank you all for your comments! I was about to add a comment but @yuzefovich beat me to it :)


pkg/sql/sem/tree/type_check.go, line 610 at r4 (raw file):

Previously, yuzefovich wrote…

[nit]: missing period.

Right. I was expecting the linter to show an error btw :/

@yuzefovich
Copy link
Member

Thanks for your contribution!

bors r=yuzefovich

craig bot pushed a commit that referenced this pull request Jan 28, 2020
41557: sql: Support unnest(anyarray, anyarray, anyarray...) r=yuzefovich a=giorgosp

Previously `unnest` supported only one array argument. This commit
adds a variadic overload in order to support multiple array arguments,
as in Postgres.

The variadic overload is supported in a SELECT clause as well, in contrast
with Postgres that only supports it only in a FROM clause.

This change has two side effects.

1. The first side effect is that tuples are allowed to be declared with
duplicate labels, like:

`SELECT ((1, '2') AS a, a);`

This statement executes successfully since the tuple labels are declared
but not used.

However, accessing a duplicate tuple label returns AmbiguousColumn error.
Both statements below return a "column reference <column> is ambiguous"
error:

`SELECT (((1,2,3) AS a,b,a)).a;`
`SELECT ((unnest(ARRAY[1,2], ARRAY[1,2]))).unnest;`

2. The `to_json` builtin doesn't return duplicate error anymore but
succeeds when passed in values with duplicate labels. Eg:

```sql
SELECT to_json(x.*) FROM (VALUES (1,2)) AS x(column2);
+----------------+
|    to_json     |
|----------------|
| {"column2": 2} |
+----------------+
```

This is a bit more compatible with Postgres which also doesn't return
an error for this query. However, there is still an incompatibility in
the results as Postgres renders the duplicate labels as duplicate json
keys. The correct test cases have been added as a TODO comment, tracked
by #44465.

Release note (sql change): An overload was added to the `unnest` builtin
function in order to support multiple array arguments.

Release note (sql change): Duplicate labels are allowed when declaring a
tuple, but an Ambiguous column error is returned if a duplicate
label is accessed. For example:
`SELECT ((1, '2') AS a, a);` is successful but
`SELECT (((1,2,3) AS a,b,a)).a;` returns an error.

Fixes #41434


Co-authored-by: George Papadrosou <gpapadrosou@gmail.com>
@craig
Copy link
Contributor

craig bot commented Jan 29, 2020

Build succeeded

@craig craig bot merged commit 23a9712 into cockroachdb:master Jan 29, 2020
@giorgosp giorgosp deleted the unnest-signature branch January 29, 2020 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: support unnest(anyarray, anyarray)
5 participants