
fix: cardinality returns incorrect results for ragged nested arrays#23271

Open
lyne7-sc wants to merge 4 commits into apache:main from lyne7-sc:fix/cardinality


lyne7-sc commented Jul 1, 2026

Contributor

Which issue does this PR close?

Rationale for this change

cardinality used compute_array_dims and multiplied the inferred dimensions to compute nested array cardinality. That only works for rectangular nested arrays, where every nested list has the same shape.

For ragged nested arrays, compute_array_dims follows the shape of the first nested value at each level, so the dimensions it returns may not describe the actual number of leaf elements. As a result, cardinality can return incorrect results for valid nested arrays.

Examples:

select cardinality([[1], [2, 3]]);
-- before: 2
-- after:  3

select cardinality([[1, 2, 3], []]);
-- before: 6
-- after:  3

select cardinality([[], [1, 2]]);
-- before: 0
-- after:  2
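
The "before" values above are what you get from multiplying dimensions inferred from the first child only. The following is a minimal Python model of that behavior as described in this PR, not DataFusion's actual Rust code; the names `dims_from_first_child` and `cardinality_by_dims` are hypothetical.

```python
from math import prod


def dims_from_first_child(value):
    """Infer dimensions by descending into the first child at each level.

    Hypothetical model of the behavior described for compute_array_dims;
    it never looks at any sibling other than the first one.
    """
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        if not value:
            # An empty list has no first child to descend into.
            break
        value = value[0]
    return dims


def cardinality_by_dims(value):
    """Old approach: multiply the inferred dimensions together."""
    return prod(dims_from_first_child(value))


for arr in ([[1], [2, 3]], [[1, 2, 3], []], [[], [1, 2]]):
    print(arr, dims_from_first_child(arr), cardinality_by_dims(arr))
```

In this model the three arrays get dimensions `[2, 1]`, `[2, 3]`, and `[2, 0]`, whose products match the 2, 6, and 0 reported above.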

What changes are included in this PR?

This PR changes list cardinality computation to recursively count actual leaf elements instead of multiplying inferred dimensions.
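
As a rough illustration of the idea (a Python sketch over plain nested lists, not the Rust implementation in this PR), recursive leaf counting could look like the following. The function name `count_leaves` is hypothetical, and the sketch counts every non-list value, including `None`, as one leaf; how the PR treats NULL elements is not covered here.

```python
def count_leaves(value):
    """Count leaf elements by visiting every nested list, not just the first."""
    if isinstance(value, list):
        # Sum over all children so ragged siblings are each counted in full.
        return sum(count_leaves(child) for child in value)
    # Any non-list value is treated as a single leaf in this model.
    return 1


for arr in ([[1], [2, 3]], [[1, 2, 3], []], [[], [1, 2]]):
    print(arr, count_leaves(arr))
```

For the three arrays from the rationale section, this sketch yields 3, 3, and 2, matching the "after" values.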

Are these changes tested?

Yes, added sqllogictest coverage.

Are there any user-facing changes?

Bug fix only.

Additional discussion

The root cause is that cardinality() used compute_array_dims() and multiplied the returned dimensions.

compute_array_dims() itself follows the first child array when computing nested dimensions. This behavior may also need a separate discussion for ragged nested arrays, because such arrays do not have a single rectangular dimension vector. For example, array_dims([[1], [2, 3]]) currently returns [2, 1].
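
To make "no single rectangular dimension vector" concrete, here is a small Python sketch of a rectangularity check over plain nested lists. It is only an illustration for that separate discussion; `is_rectangular` is a hypothetical helper and nothing like it is part of this PR.

```python
def is_rectangular(value):
    """Return True if sibling lists share the same shape at every level."""

    def shape(v):
        # Scalars have the empty shape.
        if not isinstance(v, list):
            return ()
        child_shapes = {shape(c) for c in v}
        # Siblings disagree, or some child was already ragged.
        if len(child_shapes) > 1 or None in child_shapes:
            return None
        inner = child_shapes.pop() if child_shapes else ()
        return (len(v),) + inner

    return shape(value) is not None


print(is_rectangular([[1, 2], [3, 4]]))  # rectangular 2 x 2
print(is_rectangular([[1], [2, 3]]))     # ragged
```

A function such as array_dims could use a check of this kind to decide whether a dimension vector is meaningful for a given input.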

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels on Jul 1, 2026

nuno-faria left a comment

Contributor


Thanks @lyne7-sc, LGTM.

Quoting the PR description: "For example, array_dims([[1], [2, 3]]) currently returns [2, 1]."

Yeah array_dims does not make sense without fixed lengths. Maybe it could return an error in the future, like array_add:

> select array_add([1, 2], [3]);
Execution error: array_add requires both list inputs to have the same length per row, got 2 and 1 at row 0


lyne7-sc commented Jul 2, 2026

Contributor Author

Thanks @nuno-faria for the review!

Quoting the review: "Maybe it could return an error in the future"

That makes sense



Development

Successfully merging this pull request may close these issues.

cardinality returns incorrect results for ragged nested arrays
