New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: support array_agg with arrays as inputs #117838
Conversation
5043f77
to
af62030
Compare
6a77e61
to
775ce95
Compare
This commit fixes a minor omission where `DistsqlBlocklist` property of functions would be ignored when checking distributability of aggregate functions. It appears that only `array_agg` with tuples as the input argument could previously unexpectedly get distributed (however, it seems to me that things still worked, so this commit leaves a TODO to re-evaluate that restriction). The main driver for this change is the following commit which introduces support of `array_agg` with arrays as inputs, which we cannot distribute (because we don't have value encoding for nested arrays at the moment, so we don't know how to send them over the wire). Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)
pkg/sql/opt/optbuilder/scalar.go
line 154 at r2 (raw file):
// Thus, we reject a query that is correlated and over a type that we can't array_agg. typ := b.factory.Metadata().ColumnMeta(inCol).Type if !s.outerCols.Empty() && !memo.AggregateOverloadExists(opt.ArrayAggOp, typ) {
This seems like a thing we can remove since we now do have overloads for array_agg
for all (or at least most) types, but let me know if we should keep this check, just to be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/array
line 121 at r2 (raw file):
{1,2,1} # This query works in local config but can fail in distributed config since the
could we also add a test for a non-local mode to make sure there is still a proper error message in those cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @rafiss)
pkg/sql/logictest/testdata/logic_test/array
line 121 at r2 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
could we also add a test for a non-local mode to make sure there is still a proper error message in those cases?
Good point, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the nesting is deeper than one level?
Ex:
WITH
foo AS (SELECT array_agg(x) FROM generate_series(1, 3) g(x)),
bar AS (SELECT array_agg(y) FROM (SELECT foo.* FROM foo, generate_series(1, 3)) v(y))
SELECT array_agg(z) FROM (SELECT bar.* FROM bar, generate_series(1, 3)) v(z);
Reviewed 7 of 7 files at r1, 11 of 14 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @rafiss, and @yuzefovich)
pkg/sql/opt/optbuilder/scalar.go
line 154 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This seems like a thing we can remove since we now do have overloads for
array_agg
for all (or at least most) types, but let me know if we should keep this check, just to be safe.
If we don't support nesting deeper than one level, we'd better still check for that case.
This commit extends `array_agg` aggregate function to support arrays as their input. The result of such aggregation is a nested array, and the existing implementation simply works for that case. The main complication of this change is that we currently don't support multidimensional arrays in all cases, so previously we would disallow it in a few places (like when creating a command result with a nested array type). This commit had to remove that limitation since it was too broad. There are still things that we don't support (e.g. in CREATE TABLE context, or when the nested array needs to be serialized (we don't have value encoding for it)), so in those cases we will keep on returning errors. Release note (sql change): `array_agg` aggregate function can now support arrays as the input. Note, however, that CockroachDB still doesn't have full support of nested arrays (that is tracked by cockroachdb#32552), and `array_agg` doesn't support nested arrays as inputs either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, good callout - we return an error because we cannot find the array_agg
overload that takes in two-dimensional array as the input. Added a test and adjusted the release note to mention this.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @rafiss)
pkg/sql/opt/optbuilder/scalar.go
line 154 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
If we don't support nesting deeper than one level, we'd better still check for that case.
Ack, reverted this deletion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 14 files at r2, 1 of 1 files at r3, 3 of 3 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @rafiss)
Thanks for the reviews! bors r+ |
Build succeeded: |
This commit extends
array_agg
aggregate function to support arrays as their input. The result of such aggregation is a nested array, and the existing implementation simply works for that case.The main complication of this change is that we currently don't support multidimensional arrays in all cases, so previously we would disallow it in a few places (like when creating a command result with a nested array type). This commit had to remove that limitation since it was too broad. There are still things that we don't support (e.g. in CREATE TABLE context, or when the nested array needs to be serialized (we don't have value encoding for it)), so in those cases we will keep on returning errors.
Fixes: #117756.
Release note (sql change):
array_agg
aggregate function can now support arrays as the input. Note, however, that CockroachDB still doesn't have full support of nested arrays (that is tracked by #32552), andarray_agg
doesn't support nested arrays as inputs either.