fix: rewrite concat(array, ...) to array_concat #21689

Open
hcrosse wants to merge 3 commits into apache:main from hcrosse:fix/concat-array-rewrite

@hcrosse hcrosse commented Apr 17, 2026

Which issue does this PR close?

Rationale for this change

concat(array, array, ...) in SQL dispatches to the string concat UDF, which only handles string and binary types. With array arguments the call is coerced to a string form and concatenated textually, so concat([1,2,3], [4,5]) returns [1,2,3][4,5] instead of [1,2,3,4,5]. We want concat to work on arrays rather than rejecting the call as a type error, to match DuckDB's behavior.
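The difference between the two outputs can be illustrated with a toy model. This is a minimal sketch in plain Rust, not DataFusion code: `render`, `textual_concat`, and `array_concat` are hypothetical helpers invented for illustration, and the no-space list formatting simply mirrors how the description writes the values.

```rust
// Toy illustration only; none of these functions exist in DataFusion.

// Render a list the way the PR description writes it, e.g. [1,2,3].
fn render(list: &[i64]) -> String {
    let items: Vec<String> = list.iter().map(|v| v.to_string()).collect();
    format!("[{}]", items.join(","))
}

// Models the prior behavior: each array is turned into text,
// and the pieces of text are joined end to end.
fn textual_concat(lists: &[&[i64]]) -> String {
    lists.iter().map(|l| render(l)).collect()
}

// Models the intended behavior: the elements of every input
// are appended into one array.
fn array_concat(lists: &[&[i64]]) -> Vec<i64> {
    lists.iter().flat_map(|l| l.iter().copied()).collect()
}
```

With inputs `[1,2,3]` and `[4,5]`, `textual_concat` yields the string `[1,2,3][4,5]`, while `array_concat` yields a single five-element array.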

Two earlier attempts were rejected. #18137 changed ConcatFunc's signature to accept arrays and broke simplify_expressions. #18105 duplicated array_concat logic inside ConcatFunc.

This PR rewrites concat(array, ...) to array_concat(array, ...) at the analyzer phase. Every logical plan gets the corrected behavior regardless of frontend, the string concat signature stays untouched, and no array logic is duplicated.

What changes are included in this PR?

A new ConcatArrayRewrite, an implementation of FunctionRewrite, lives in datafusion-functions-nested. It detects calls to ConcatFunc by an identity check via Any::is::<ConcatFunc>, so a user-registered function that shadows concat, such as Spark's variant, is unaffected. When all args resolve to List, LargeList, or FixedSizeList, it rewrites the call to array_concat_udf(). A mix of array and non-array arguments returns a plan_err.
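The decision logic described above can be sketched roughly as follows. This is a self-contained toy model, not the code in this PR: `ArgType`, `ScalarFn`, `BuiltinConcat`, `ShadowingConcat`, `Decision`, and `decide` are all hypothetical stand-ins, and the error text is made up. The only real API used is `std::any::Any::is`, which is what makes the check depend on the concrete type rather than the function name.

```rust
use std::any::Any;

// Toy stand-ins; these are NOT DataFusion's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgType { Utf8, Int64, List, LargeList, FixedSizeList }

trait ScalarFn {
    fn as_any(&self) -> &dyn Any;
}

struct BuiltinConcat;   // plays the role of ConcatFunc
struct ShadowingConcat; // plays the role of a user-registered `concat`

impl ScalarFn for BuiltinConcat {
    fn as_any(&self) -> &dyn Any { self }
}
impl ScalarFn for ShadowingConcat {
    fn as_any(&self) -> &dyn Any { self }
}

#[derive(Debug, PartialEq)]
enum Decision { Unchanged, RewriteToArrayConcat }

fn is_array(t: ArgType) -> bool {
    matches!(t, ArgType::List | ArgType::LargeList | ArgType::FixedSizeList)
}

fn decide(func: &dyn ScalarFn, args: &[ArgType]) -> Result<Decision, String> {
    // Identity check on the concrete type, not on the function name,
    // so a shadowing `concat` is left alone.
    if !func.as_any().is::<BuiltinConcat>() {
        return Ok(Decision::Unchanged);
    }
    let arrays = args.iter().filter(|t| is_array(**t)).count();
    if arrays == 0 {
        // No array arguments (or no arguments at all, an assumption of
        // this sketch): keep the string concat call as it is.
        Ok(Decision::Unchanged)
    } else if arrays == args.len() {
        Ok(Decision::RewriteToArrayConcat)
    } else {
        // Hypothetical message; the real rule returns a plan_err.
        Err("concat: cannot mix array and non-array arguments".to_string())
    }
}
```

In this model, `decide(&BuiltinConcat, &[ArgType::FixedSizeList, ArgType::List])` asks for the rewrite, while the same arguments passed with `ShadowingConcat` leave the call unchanged.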

The rewrite is wired into SessionStateDefaults::default_function_rewrites() and registered on the analyzer in SessionStateBuilder::with_default_features(), which is the actual default-init path. It's also registered via FunctionRegistry::register_function_rewrite in functions_nested::register_all as a fallback for custom registries.

Known limitation: concat(List, LargeList) hits an existing array_concat coercion bug (#21702). FSL + List works.

Are these changes tested?

New SLT coverage in array/array_concat.slt:

  • 2- and 3-argument array concat, table-column concat, arrays with NULLs, string arrays
  • LargeList, FixedSizeList, and FSL + List mixed inputs
  • NULL::integer[] at either position, all-NULL case
  • Two error cases for mixed array and non-array

explain.slt is updated to reflect the new apply_function_rewrites line that now appears in EXPLAIN VERBOSE output.

Are there any user-facing changes?

concat(array, ...) now returns correct array_concat results instead of the prior wrong output. No public API changes.

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Apr 17, 2026

@neilconway neilconway left a comment

Overall looks good!

Can we add a brief note to the PR description explaining why we want concat to work on arrays to begin with (e.g., rather than rejecting it as an error)? Is the reason DuckDB compatibility?

It's a bit unfortunate that we can't tighten the signature of concat to reject non-string arguments, but that seems non-trivial.

Seems like concat_ws has similar problems to what this PR is addressing for concat. Not sure offhand the best fix (reject it?), but maybe worth filing a separate issue.

[NULL]


## concat() delegates to array_concat when all arguments are arrays.

Worth adding tests for mixed list types? e.g.,

concat(List, LargeList) -> LargeList
concat(FSL, List) -> FSL

@hcrosse hcrosse Apr 17, 2026

Good catch! I added FSL + List, but skipped List + LargeList, since array_concat itself errors on that on main. Filed #21702.

Comment thread datafusion/functions-nested/src/concat_rewrite.rs Outdated
- use array_concat expr_fn instead of hand-constructing ScalarFunction
- add FSL + List SLT case to cover mixed list-variant coercion
@Jefffrey

I think I had a similar thought in that we should try to solve this in the optimizer/simplify stage, but I believe one of the drivers for this was Spark compatibility. See this comment:

cc @comphead

Development

Successfully merging this pull request may close these issues.

unexpected output for concat for arrays

3 participants