Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support array_agg with arrays as inputs #117838

Merged
merged 2 commits into from Jan 19, 2024

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jan 16, 2024

This commit extends array_agg aggregate function to support arrays as their input. The result of such aggregation is a nested array, and the existing implementation simply works for that case.

The main complication of this change is that we currently don't support multidimensional arrays in all cases, so previously we would disallow it in a few places (like when creating a command result with a nested array type). This commit had to remove that limitation since it was too broad. There are still things that we don't support (e.g. in CREATE TABLE context, or when the nested array needs to be serialized (we don't have value encoding for it)), so in those cases we will keep on returning errors.

Fixes: #117756.

Release note (sql change): array_agg aggregate function can now support arrays as the input. Note, however, that CockroachDB still doesn't have full support of nested arrays (that is tracked by #32552), and array_agg doesn't support nested arrays as inputs either.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

This commit fixes a minor omission where `DistsqlBlocklist` property of
functions would be ignored when checking distributability of aggregate
functions. It appears that only `array_agg` with tuples as the input
argument could previously unexpectedly get distributed (however, it
seems to me that things still worked, so this commit leaves a TODO to
re-evaluate that restriction). The main driver for this change is the
following commit which introduces support of `array_agg` with arrays as
inputs, which we cannot distribute (because we don't have value encoding
for nested arrays at the moment, so we don't know how to send them over
the wire).

Release note: None
@yuzefovich yuzefovich marked this pull request as ready for review January 17, 2024 22:34
@yuzefovich yuzefovich requested review from a team as code owners January 17, 2024 22:34
@yuzefovich yuzefovich requested review from michae2 and DrewKimball and removed request for a team January 17, 2024 22:34
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @michae2)


pkg/sql/opt/optbuilder/scalar.go line 154 at r2 (raw file):

		// Thus, we reject a query that is correlated and over a type that we can't array_agg.
		typ := b.factory.Metadata().ColumnMeta(inCol).Type
		if !s.outerCols.Empty() && !memo.AggregateOverloadExists(opt.ArrayAggOp, typ) {

This seems like a thing we can remove since we now do have overloads for array_agg for all (or at least most) types, but let me know if we should keep this check, just to be safe.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/array line 121 at r2 (raw file):

{1,2,1}

# This query works in local config but can fail in distributed config since the

could we also add a test for a non-local mode to make sure there is still a proper error message in those cases?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @rafiss)


pkg/sql/logictest/testdata/logic_test/array line 121 at r2 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

could we also add a test for a non-local mode to make sure there is still a proper error message in those cases?

Good point, done.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the nesting is deeper than one level?
Ex:

WITH
  foo AS (SELECT array_agg(x) FROM generate_series(1, 3) g(x)),
  bar AS (SELECT array_agg(y) FROM (SELECT foo.* FROM foo, generate_series(1, 3)) v(y))
SELECT array_agg(z) FROM (SELECT bar.* FROM bar, generate_series(1, 3)) v(z);

Reviewed 7 of 7 files at r1, 11 of 14 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @rafiss, and @yuzefovich)


pkg/sql/opt/optbuilder/scalar.go line 154 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This seems like a thing we can remove since we now do have overloads for array_agg for all (or at least most) types, but let me know if we should keep this check, just to be safe.

If we don't support nesting deeper than one level, we'd better still check for that case.

This commit extends `array_agg` aggregate function to support arrays as
their input. The result of such aggregation is a nested array, and the
existing implementation simply works for that case.

The main complication of this change is that we currently don't support
multidimensional arrays in all cases, so previously we would disallow it
in a few places (like when creating a command result with a nested array
type). This commit had to remove that limitation since it was too broad.
There are still things that we don't support (e.g. in CREATE TABLE
context, or when the nested array needs to be serialized (we don't have
value encoding for it)), so in those cases we will keep on returning
errors.

Release note (sql change): `array_agg` aggregate function can now
support arrays as the input. Note, however, that CockroachDB still
doesn't have full support of nested arrays (that is tracked by cockroachdb#32552),
and `array_agg` doesn't support nested arrays as inputs either.
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, good callout - we return an error because we cannot find the array_agg overload that takes in two-dimensional array as the input. Added a test and adjusted the release note to mention this.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @rafiss)


pkg/sql/opt/optbuilder/scalar.go line 154 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

If we don't support nesting deeper than one level, we'd better still check for that case.

Ack, reverted this deletion.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 14 files at r2, 1 of 1 files at r3, 3 of 3 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @rafiss)

@yuzefovich
Copy link
Member Author

Thanks for the reviews!

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 19, 2024

Build succeeded:

@craig craig bot merged commit 2a4e6b8 into cockroachdb:master Jan 19, 2024
9 of 10 checks passed
@yuzefovich yuzefovich deleted the array-agg branch January 19, 2024 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: support to array_agg() for arrays
4 participants