
fix: correct array_append return type and mark as Compatible #3795

Merged
comphead merged 4 commits into apache:main from andygrove:fix-array-append-compat
Mar 27, 2026

Conversation

@andygrove
Member

@andygrove andygrove commented Mar 25, 2026

Which issue does this PR close?

Closes #.

Rationale for this change

CometArrayAppend had two bugs:

  1. Runtime assertion failure for literal arrays. DataFusion's array_append always returns a list with nullable: true on the element field. But when the input array has non-null elements (e.g. array(1, 2, 3)), Spark's ArrayAppend.dataType returns ArrayType(IntegerType, containsNull = false). The serde code was passing expr.dataType as the "promised" return type to DataFusion, which caused a runtime assertion failure:

    Assertion failed: result_data_type == *expected_type: Function 'array_append' returned value of
    type 'List(Field { data_type: Int32, nullable: true })' while the following type was promised at
    planning time and expected: 'List(Field { data_type: Int32 })'
    
  2. Incorrect Incompatible classification. CometArrayAppend was marked Incompatible(None) with no explanation, which disabled it by default. The CaseWhen(IsNotNull(arr), array_append(arr, elem), null) wrapper already handles the only genuine incompatibility (DataFusion's array_append does not preserve null top-level array rows on its own), so the expression matches Spark's behavior fully.

What changes are included in this PR?

  • arrays.scala: Use ArrayType(elementType, containsNull = true) as the promised return type for array_append, matching what DataFusion actually returns. Change getSupportLevel from Incompatible to Compatible.
  • array_append.sql: Remove spark.comet.expression.ArrayAppend.allowIncompatible=true (no longer needed now that it's Compatible). Add comment explaining why ArrayInsert.allowIncompatible=true is still needed on Spark 4.0 (where array_append is a RuntimeReplaceable that rewrites to array_insert(-1)).
  • expressions.md: Mark ArrayAppend as Spark-compatible.

How are these changes tested?

Existing SQL file test expressions/array/array_append.sql covers column inputs, literal inputs, NULL arrays, and NULL elements across both dictionary and non-dictionary Parquet. The literal-only query that previously triggered the assertion failure now passes.

@andygrove andygrove marked this pull request as draft March 25, 2026 21:01
…usion

DataFusion's array_append always returns a list with nullable elements
(nullable: true on the inner field), but Spark's ArrayAppend.dataType
can have containsNull = false when the input array has non-null elements
(e.g. array(1, 2, 3)). This caused a runtime assertion failure when the
promised type did not match the actual DataFusion output type.

Fixes the literal-only query:
  SELECT array_append(array(1, 2, 3), 4), ...
@andygrove andygrove marked this pull request as ready for review March 25, 2026 23:56
Contributor

@comphead comphead left a comment


Thanks @andygrove

@comphead comphead merged commit 77bd8e0 into apache:main Mar 27, 2026
150 of 151 checks passed
