fix: rewrite concat(array, ...) to array_concat #21689
hcrosse wants to merge 3 commits into apache:main
neilconway left a comment
Overall looks good!
Can we add a brief note to the PR description explaining why we want `concat` to work on arrays in the first place (e.g., rather than rejecting it as an error)? Is the reason DuckDB compatibility?
It's a bit unfortunate that we can't tighten the signature of `concat` to reject non-string arguments, but that seems non-trivial.
It seems like `concat_ws` has problems similar to the ones this PR addresses for `concat`. Not sure offhand what the best fix is (reject it?), but it may be worth filing a separate issue.
`## concat() delegates to array_concat when all arguments are arrays.`
Worth adding tests for mixed list types? e.g.,
- `concat(List, LargeList) -> LargeList`
- `concat(FSL, List) -> FSL`
Good catch! I added FSL + List, but skipped List + LargeList, since array_concat itself errors on that on main. Filed #21702.
- use `array_concat` expr_fn instead of hand-constructing `ScalarFunction`
- add FSL + List SLT case to cover mixed list-variant coercion
I think I had a similar thought in that we should try to solve this in the optimizer/simplify stage, but I believe one of the drivers for this was Spark compatibility. See this comment:
cc @comphead
Which issue does this PR close?
`concat` for arrays #18020

Rationale for this change
`concat(array, array, ...)` in SQL dispatches to the string `concat` UDF, which only handles string and binary types. With array arguments the call is coerced to a string form and concatenated textually, so `concat([1,2,3], [4,5])` returns `[1,2,3][4,5]` instead of `[1,2,3,4,5]`. We want `concat` to work on arrays rather than rejecting the call as a type error, to match DuckDB's behavior.

Two earlier attempts were rejected. #18137 changed `ConcatFunc`'s signature to accept arrays and broke `simplify_expressions`. #18105 duplicated `array_concat` logic inside `ConcatFunc`.

This PR rewrites `concat(array, ...)` to `array_concat(array, ...)` at the analyzer phase. Every logical plan gets the corrected behavior regardless of frontend, the string `concat` signature stays untouched, and no array logic is duplicated.

What changes are included in this PR?
A new `FunctionRewrite`, `ConcatArrayRewrite`, lives in `datafusion-functions-nested`. It detects calls to `ConcatFunc` with an identity check via `Any::is::<ConcatFunc>`, so user-level shadowing of `concat`, such as Spark's variant, is unaffected. When all arguments resolve to `List`, `LargeList`, or `FixedSizeList`, it rewrites the call to `array_concat_udf()`. Mixing array and non-array arguments returns a `plan_err`.

The rewrite is wired into `SessionStateDefaults::default_function_rewrites()` and registered on the analyzer in `SessionStateBuilder::with_default_features()`, which is the actual default-init path. It is also registered via `FunctionRegistry::register_function_rewrite` in `functions_nested::register_all` as a fallback for custom registries.

Known limitation: `concat(List, LargeList)` hits an existing `array_concat` coercion bug (#21702). FSL + List works.

Are these changes tested?
New SLT coverage in `array/array_concat.slt`:
- `NULL::integer[]` at either position, all-NULL case

`explain.slt` is updated to reflect the new `apply_function_rewrites` line that now appears in `EXPLAIN VERBOSE` output.

Are there any user-facing changes?
`concat(array, ...)` now returns correct `array_concat` results instead of the previous textual concatenation. No public API changes.
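The decision logic of the analyzer rewrite described under "What changes are included in this PR?" can be sketched in plain Rust. This is a minimal model under stated assumptions, not the PR's code: `ArgType`, `UdfImpl`, `ShadowingConcat`, `Decision`, and `decide` are hypothetical stand-ins for DataFusion's `Expr`, `ScalarUDFImpl`, and Arrow `DataType`. It shows the two checks from the description: an identity check with `Any::is::<ConcatFunc>()` instead of a name comparison, and a rewrite only when every argument is a `List`, `LargeList`, or `FixedSizeList`. Mixed arguments produce an error that stands in for `plan_err`. Argument coercion, the actual `Expr` rewrite, and the `array_concat` coercion bug (#21702) are out of scope.

```rust
use std::any::Any;

// Hypothetical stand-ins for DataFusion's `ScalarUDFImpl`, `Expr`, and Arrow
// `DataType`. None of these names are DataFusion APIs; they only model the
// rewrite decision described in the PR.

/// Resolved argument types that matter for the decision.
#[derive(Debug, PartialEq)]
enum ArgType {
    Utf8,
    Int64,
    List,
    LargeList,
    FixedSizeList,
}

/// Minimal UDF trait; `as_any` lets callers test the concrete type.
trait UdfImpl {
    fn name(&self) -> &str;
    fn as_any(&self) -> &dyn Any;
}

/// Stand-in for the built-in string `concat`.
struct ConcatFunc;

/// Stand-in for a different function that is also registered as "concat".
struct ShadowingConcat;

impl UdfImpl for ConcatFunc {
    fn name(&self) -> &str {
        "concat"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl UdfImpl for ShadowingConcat {
    fn name(&self) -> &str {
        "concat"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// What the rewrite does with one call.
#[derive(Debug, PartialEq)]
enum Decision {
    /// Leave the call unchanged.
    Keep,
    /// Replace the call with `array_concat` over the same arguments.
    RewriteToArrayConcat,
}

fn is_array(t: &ArgType) -> bool {
    matches!(t, ArgType::List | ArgType::LargeList | ArgType::FixedSizeList)
}

/// Decide how to handle a call to `func` with the given resolved argument types.
fn decide(func: &dyn UdfImpl, args: &[ArgType]) -> Result<Decision, String> {
    // Identity check on the concrete type, not on the name, so another
    // function that happens to be named "concat" is never rewritten.
    if !func.as_any().is::<ConcatFunc>() {
        return Ok(Decision::Keep);
    }
    let array_args = args.iter().filter(|t| is_array(t)).count();
    if array_args == 0 {
        // Ordinary string/binary concat: nothing to do.
        Ok(Decision::Keep)
    } else if array_args == args.len() {
        Ok(Decision::RewriteToArrayConcat)
    } else {
        // Stands in for returning a `plan_err` in the real rewrite.
        Err(format!("concat cannot mix array and non-array arguments: {args:?}"))
    }
}
```

Because the check is on the concrete type, `decide(&ShadowingConcat, &[ArgType::List, ArgType::List])` returns `Ok(Decision::Keep)` even though `ShadowingConcat` reports the same name. This mirrors the description's point that a shadowing `concat`, such as Spark's variant, is unaffected.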