-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: Support unnest(anyarray, anyarray, anyarray...) #41557
Conversation
Hi @jordanlewis ! I have created an overload of The labels are hardcoded when passed to If I pass This is the exact (trimmed) error from
Thank you! |
Hi @giorgosp, thanks for taking this on! I looked into the relevant code, and it seems to me that current implementation of Possibly another way to solve it is to modify the way SRFs are transformed cc @jordanlewis @knz am I missing something? |
This would need a redesign of how function properties work. If you wish to do this I think it would be better to extend the function properties mechanism (and the overload resolution code that goes with it) in a separate PR first, then come back to the current PR once the other PR is merged. |
@giorgosp, considering that the known use case we're trying to support uses exactly 2 arguments, would it be possible to just add that one for now? I know that is an unsatisfying answer... |
Thank you for your comments! @jordanlewis If I understand correctly, you suggest we support My main driver for having two overloads was that Postgres supports So we could just keep the variadic one, which also won't break the current usages of unnest with one argument. What do you think? |
No, I meant adding a second overload with 2 |
As @yuzefovich and @knz said, the mechamism around
My suggestion to keep only the variadic overload won't work either. |
I agree with @giorgosp - I also think without making changes that Raphael pointed out it is not possible to add |
After a second look, I found out it would be simple to completely remove the ReturnLabels which were hardcoded in Opened a PR (#41861) to await for your feedback. |
fa16a1a
to
bd772bb
Compare
bd772bb
to
48262d2
Compare
48262d2
to
a281de6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work! I think you're very close to the finish line :)
Reviewed 2 of 3 files at r1, 4 of 4 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @giorgosp and @jordanlewis)
pkg/sql/logictest/testdata/logic_test/srfs, line 1113 at r2 (raw file):
query T SELECT unnest(ARRAY[1,2], ARRAY['a'], ARRAY[1.1, 2.2, 3.3])
Hm, postgres (11.5 version) doesn't seem to support such case (of tuples), only SELECT * FROM unnest...
is supported. I expected that there would be NULLs in the tuples when the corresponding array is "exhausted", so I wanted to check with PSQL. Shouldn't there be NULLs in this query?
pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):
} returnTypes[i] = *arg.ResolvedType().ArrayContents() labels[i] = "unnest"
The build is complaining about duplicated labels, probably we'll need to include the counter into the label.
pkg/sql/sem/builtins/generator_builtins.go, line 508 at r2 (raw file):
for i, arr := range s.arrays { returnTypes[i] = *arr.ParamTyp labels[i] = "unnest"
ditto for the counter
pkg/sql/sem/builtins/generator_builtins.go, line 535 at r2 (raw file):
// Values implements the tree.ValueGenerator interface. func (s *multipleArrayValueGenerator) Values() tree.Datums { var datums tree.Datums
[nit]: it probably makes sense to allocate the memory right away so that no allocations occur during append
s below. We could take it one step further and perform the allocation in Start
method, store the slice within the struct and reuse it on every call to Values
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Please bare with me (some more) because I will be on vacation for a week. But afterwards I will work on it so we finish it as soon as possible.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @giorgosp, @jordanlewis, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/srfs, line 1113 at r2 (raw file):
Previously, yuzefovich wrote…
Hm, postgres (11.5 version) doesn't seem to support such case (of tuples), only
SELECT * FROM unnest...
is supported. I expected that there would be NULLs in the tuples when the corresponding array is "exhausted", so I wanted to check with PSQL. Shouldn't there be NULLs in this query?
Indeed postgres doesn't support unnest
in a SELECT clause. NULLs are returned by the srf anyway but I guess cockroachdb doesn't render them? If unnest
is used in a FROM clause, the NULLs are rendered.
pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):
Previously, yuzefovich wrote…
The build is complaining about duplicated labels, probably we'll need to include the counter into the label.
Thanks for checking it out! I wanted to ask for help about this. Do we really want the counter-in-label solution? Because in postgres all the labels have the same name.
What is that duplicate labels check used for? Would it make sense to skip it for system-defined labels and keep it only for user defined ones (when AS is used in the query)? Postgres already has this quirk for unnest and SELECT unnest FROM unnest(...)
returns "ambiguous column" error as someone would expect.
a281de6
to
6e9751b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @giorgosp, @jordanlewis, and @yuzefovich)
pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):
Previously, giorgosp (George Papadrosou) wrote…
Thanks for checking it out! I wanted to ask for help about this. Do we really want the counter-in-label solution? Because in postgres all the labels have the same name.
What is that duplicate labels check used for? Would it make sense to skip it for system-defined labels and keep it only for user defined ones (when AS is used in the query)? Postgres already has this quirk for unnest and
SELECT unnest FROM unnest(...)
returns "ambiguous column" error as someone would expect.
Still thinking about this. Adding counters to the unnest labels would finalize this PR but would create an incompatibility with postgres. Trying to figure out func (expr *Tuple) TypeCheck
is not called with local config
pkg/sql/sem/builtins/generator_builtins.go, line 535 at r2 (raw file):
Previously, yuzefovich wrote…
[nit]: it probably makes sense to allocate the memory right away so that no allocations occur during
append
s below. We could take it one step further and perform the allocation inStart
method, store the slice within the struct and reuse it on every call toValues
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @giorgosp, @jordanlewis, @knz, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/srfs, line 1113 at r2 (raw file):
Previously, giorgosp (George Papadrosou) wrote…
Indeed postgres doesn't support
unnest
in a SELECT clause. NULLs are returned by the srf anyway but I guess cockroachdb doesn't render them? Ifunnest
is used in a FROM clause, the NULLs are rendered.
Ok, sounds good to me.
pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):
Previously, giorgosp (George Papadrosou) wrote…
Still thinking about this. Adding counters to the unnest labels would finalize this PR but would create an incompatibility with postgres. Trying to figure out
func (expr *Tuple) TypeCheck
is not called with local config
Hm, I see that in #24832 we purposefully decided to deviate from Postgres, so I think we now have two choices:
- include the counter into the label in this PR. This will increase deviation from psql, but no other changes are needed
- remove the requirement that the tuple's labels are unique. Then the counter is not needed, but we might need to change the way we access an element of the tuple by the label (if there are multiple matches, just use the first one, or something similar).
I don't know which one is preferable. Maybe @knz has thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r2, 3 of 3 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @giorgosp, @jordanlewis, and @yuzefovich)
pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):
Previously, yuzefovich wrote…
Hm, I see that in #24832 we purposefully decided to deviate from Postgres, so I think we now have two choices:
- include the counter into the label in this PR. This will increase deviation from psql, but no other changes are needed
- remove the requirement that the tuple's labels are unique. Then the counter is not needed, but we might need to change the way we access an element of the tuple by the label (if there are multiple matches, just use the first one, or something similar).
I don't know which one is preferable. Maybe @knz has thoughts?
I vote to remove the duplicate check in labels but at the same time do the same as pg and return an ambiguous error during the per-label lookup if the searched label is found two times.
(This will need dedicated unit tests)
c494a43
to
c893d72
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please update the PR description message with the latest commit message? Also, an addition to release note is needed now that we allow duplicate labels but will error out on accessing them if it is ambigous.
Reviewed 8 of 8 files at r4.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @giorgosp, @jordanlewis, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/json_builtins, line 327 at r4 (raw file):
{"a": 1, "column2": 2} # TODO: Implement the test cases below to be compatible with Postgres
Do you know if we have an issue about this (probably not)? If not, could you please create one and mention it in this comment?
pkg/sql/logictest/testdata/logic_test/tuple, line 678 at r4 (raw file):
(1,2,hello,,) (t,,"(f,6.6,f)") # Duplicate tuple labels are allowed (but access fails a duplicated label is accessed, see
[nit]: s/access fails a/access fails when a/g
or something like that.
pkg/sql/sem/tree/type_check.go, line 610 at r4 (raw file):
if label == expr.ColName { if expr.ColIndex != -1 { // Found a duplicate label
[nit]: missing period.
Previously `unnest` supported only one array argument. This commit adds a variadic overload in order to support multiple array arguments, as in Postgres. The variadic overload is supported in a SELECT clause as well, in contrast with Postgres that only supports it only in a FROM clause. This change has two side effects. 1. The first side effect is that tuples are allowed to be declared with duplicate labels, like: `SELECT ((1, '2') AS a, a);` This statement executes successfully since the tuple labels are declared but not used. However, accessing a duplicate tuple label returns AmbiguousColumn error. Both statements below return a "column reference <column> is ambiguous" error: `SELECT (((1,2,3) AS a,b,a)).a;` `SELECT ((unnest(ARRAY[1,2], ARRAY[1,2]))).unnest;` 2. The `to_json` builtin doesn't return duplicate error anymore but succeeds when passed in values with duplicate labels. Eg: ```sql SELECT to_json(x.*) FROM (VALUES (1,2)) AS x(column2); +----------------+ | to_json | |----------------| | {"column2": 2} | +----------------+ ``` This is a bit more compatible with Postgres which also doesn't return an error for this query. However, there is still an incompatibility in the results as Postgres renders the duplicate labels as duplicate json keys. The correct test cases have been added as a TODO comment, tracked by cockroachdb#44465. Release note (sql change): An overload was added to the `unnest` builtin function in order to support multiple array arguments. Release note (sql change): Duplicate labels are allowed when declaring a tuple, but an Ambiguous column error is returned if a duplicate label is accessed. For example: `SELECT ((1, '2') AS a, a);` is successful but `SELECT (((1,2,3) AS a,b,a)).a;` returns an error. Fixes cockroachdb#41434
c893d72
to
23a9712
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Updated the PR and the commit msg.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/json_builtins, line 327 at r4 (raw file):
Previously, yuzefovich wrote…
Do you know if we have an issue about this (probably not)? If not, could you please create one and mention it in this comment?
Done!
pkg/sql/logictest/testdata/logic_test/tuple, line 678 at r4 (raw file):
Previously, yuzefovich wrote…
[nit]:
s/access fails a/access fails when a/g
or something like that.
Done.
pkg/sql/sem/builtins/generator_builtins.go, line 185 at r2 (raw file):
Previously, knz (kena) wrote…
I vote to remove the duplicate check in labels but at the same time do the same as pg and return an ambiguous error during the per-label lookup if the searched label is found two times.
(This will need dedicated unit tests)
Thank you all for your comments! I was about to add a comment but @yuzefovich beat me to it :)
pkg/sql/sem/tree/type_check.go, line 610 at r4 (raw file):
Previously, yuzefovich wrote…
[nit]: missing period.
Right. I was expecting the linter to show an error btw :/
Thanks for your contribution! bors r=yuzefovich |
41557: sql: Support unnest(anyarray, anyarray, anyarray...) r=yuzefovich a=giorgosp Previously `unnest` supported only one array argument. This commit adds a variadic overload in order to support multiple array arguments, as in Postgres. The variadic overload is supported in a SELECT clause as well, in contrast with Postgres that only supports it only in a FROM clause. This change has two side effects. 1. The first side effect is that tuples are allowed to be declared with duplicate labels, like: `SELECT ((1, '2') AS a, a);` This statement executes successfully since the tuple labels are declared but not used. However, accessing a duplicate tuple label returns AmbiguousColumn error. Both statements below return a "column reference <column> is ambiguous" error: `SELECT (((1,2,3) AS a,b,a)).a;` `SELECT ((unnest(ARRAY[1,2], ARRAY[1,2]))).unnest;` 2. The `to_json` builtin doesn't return duplicate error anymore but succeeds when passed in values with duplicate labels. Eg: ```sql SELECT to_json(x.*) FROM (VALUES (1,2)) AS x(column2); +----------------+ | to_json | |----------------| | {"column2": 2} | +----------------+ ``` This is a bit more compatible with Postgres which also doesn't return an error for this query. However, there is still an incompatibility in the results as Postgres renders the duplicate labels as duplicate json keys. The correct test cases have been added as a TODO comment, tracked by #44465. Release note (sql change): An overload was added to the `unnest` builtin function in order to support multiple array arguments. Release note (sql change): Duplicate labels are allowed when declaring a tuple, but an Ambiguous column error is returned if a duplicate label is accessed. For example: `SELECT ((1, '2') AS a, a);` is successful but `SELECT (((1,2,3) AS a,b,a)).a;` returns an error. Fixes #41434 Co-authored-by: George Papadrosou <gpapadrosou@gmail.com>
Build succeeded |
Previously
unnest
supported only one array argument. This commitadds a variadic overload in order to support multiple array arguments,
as in Postgres.
The variadic overload is supported in a SELECT clause as well, in contrast
with Postgres that only supports it only in a FROM clause.
This change has two side effects.
duplicate labels, like:
SELECT ((1, '2') AS a, a);
This statement executes successfully since the tuple labels are declared
but not used.
However, accessing a duplicate tuple label returns AmbiguousColumn error.
Both statements below return a "column reference is ambiguous"
error:
SELECT (((1,2,3) AS a,b,a)).a;
SELECT ((unnest(ARRAY[1,2], ARRAY[1,2]))).unnest;
to_json
builtin doesn't return duplicate error anymore butsucceeds when passed in values with duplicate labels. Eg:
This is a bit more compatible with Postgres which also doesn't return
an error for this query. However, there is still an incompatibility in
the results as Postgres renders the duplicate labels as duplicate json
keys. The correct test cases have been added as a TODO comment, tracked
by #44465.
Release note (sql change): An overload was added to the
unnest
builtinfunction in order to support multiple array arguments.
Release note (sql change): Duplicate labels are allowed when declaring a
tuple, but an Ambiguous column error is returned if a duplicate
label is accessed. For example:
SELECT ((1, '2') AS a, a);
is successful butSELECT (((1,2,3) AS a,b,a)).a;
returns an error.Fixes #41434