Tracking issue for follow-up work surfaced by the array expression audit in #4483. Each item below is either a support-level / serde correctness fix that the audit deliberately deferred, or a coverage gap the audit documented but did not implement. Already-filed gaps (#4481 NaN/signed-zero canonicalization, #4482 array_max/array_min NaN ordering, #3178 array_join null handling, #2190 collation, #3338 empty-array coerce_types) are not duplicated here.
High priority
1. Lift convert-time withInfo fallbacks in arrays.scala into getSupportLevel
Several array serdes bail out from convert with withInfo(...) + return None for cases that are knowable from the expression alone:
CometArrayRemove (spark/src/main/scala/org/apache/comet/serde/arrays.scala:40-55) rejects unsupported child types in convert.
CometArrayExcept (arrays.scala:357-374) rejects BinaryType / StructType element types in convert rather than via getSupportLevel.
CometArrayPosition (arrays.scala:701-727) rejects all-foldable args and unsupported element types in convert.
CometElementAt (arrays.scala:565-598) rejects non-array input in convert; the MapType rejection in particular belongs in getSupportLevel.
CometFlatten (arrays.scala:603-617) rejects unsupported child element types in convert.
CometArrayReverse (arrays.scala:545-556) rejects unsupported child types in convert even though it already has a getSupportLevel.
CometSortArray (arrays.scala:172-182) rejects non-Literal ascendingOrder in convert. Spark 4.0+ widens ascendingOrder to any foldable boolean, so this convert-time rejection is the new fallback path; it should be declared as Unsupported(Some(...)) in getSupportLevel so EXPLAIN surfaces the reason.
The audit-comet-expression skill (rule 10) requires expression-shape restrictions to be declared as Unsupported(Some(reason)) / Incompatible(Some(reason)) branches in getSupportLevel so EXPLAIN surfaces the reason at planning time, the auto-generated compatibility doc picks them up, and the dispatcher can route around them. The audit doc for array_position, array_remove, element_at, flatten, and sort_array explicitly notes these need to be lifted.
2. Documented NaN / canonicalization divergences should flip support level to Incompatible
The audit doc flags Known limitation / Known divergence sub-bullets on several entries, but the corresponding Comet serdes still report Compatible (skill rule 12 requires this to flip the support level):
These should either gain a CometExpressionSerde wrapper with Incompatible(Some(reason)) keyed on float/double element types, or the CometScalarFunction constructor should accept an optional support-level callback.
3. Dead serde registrations after Spark 4.0 rewrites
Skill rule 13 says unreachable serde mappings should be relabeled or removed so the dispatcher map reflects reality:
CometArrayAppend (arrays.scala:58-107, registered at QueryPlanSerde.scala:52): unreachable on Spark 4.0+ because ArrayAppend is now RuntimeReplaceable and rewritten to ArrayInsert(arr, Literal(-1), elem). The audit doc confirms dispatch flows through CometArrayInsert on 4.0+. Either drop the 4.0+ registration through the shim or mark it as 3.x-only.
CometArrayCompact (arrays.scala:311-333, registered at QueryPlanSerde.scala:53): ArrayCompact is always RuntimeReplaceable (rewritten to ArrayFilter(arr, IsNotNull(...)) in all supported Spark versions), so the direct classOf[ArrayCompact] -> CometArrayCompact mapping is never hit at convert time. The actual dispatch path is CometArrayFilter.convert -> CometArrayCompact.convert (arrays.scala:625-637). The direct registration should be removed.
Medium priority
4. CometSortArray Spark 4.0+ foldable-boolean coverage
Spark 4.0 widens SortArray.ascendingOrder from a Literal(_: Boolean, BooleanType) to any foldable boolean. CometSortArray (arrays.scala:172-182) still pattern-matches only Literal, so sort_array(arr, NOT some_const) and similar shapes silently fall back on 4.0+. Constant-folding catches most cases but not all. Either evaluate the foldable expression at convert time (preferred) or surface the limitation in getSupportLevel.
5. array_union result ordering vs DataFusion is unverified
The audit doc for array_union notes "Result ordering versus DataFusion is also unverified; compare the array_intersect ordering caveat." CometArrayUnion (arrays.scala:454-466) is currently Compatible. Verify whether DataFusion's array_union preserves Spark's left-first-then-new-right-elements ordering. If it does not, raise the support level to Incompatible with a documented reason, matching the array_intersect treatment.
6. CometArrayExcept data-type guard should be in getSupportLevel
CometArrayExcept.isTypeSupported (arrays.scala:342-355) explicitly rejects BinaryType and StructType element types, but the rejection happens in convert (arrays.scala:361-367). Move the check into a getSupportLevel branch so EXPLAIN can report it and the compatibility doc reflects the gap. (Same pattern as item 1.)
7. CometArrayJoin Spark 4.0+ collation coverage
Audit doc notes Spark 4.0 widens ArrayJoin.inputTypes to AbstractArrayType(StringTypeWithCollation(supportsTrimCollation = true)) and that non-binary collations are not propagated. Already tracked under #2190 (umbrella) but worth wiring an explicit hasNonDefaultStringCollation guard in CometArrayJoin.getSupportLevel so the fallback is visible in EXPLAIN, matching the pattern CometArrayIntersect already follows (arrays.scala:207-213).
Surfaced by the audit-comet-expression skill run in #4483.
Tracking issue for follow-up work surfaced by the array expression audit in #4483. Each item below is either a support-level / serde correctness fix that the audit deliberately deferred, or a coverage gap the audit documented but did not implement. Already-filed gaps (#4481 NaN/signed-zero canonicalization, #4482 array_max/array_min NaN ordering, #3178 array_join null handling, #2190 collation, #3338 empty-array coerce_types) are not duplicated here.
High priority
1. Lift
convert-timewithInfofallbacks inarrays.scalaintogetSupportLevelSeveral array serdes bail out from
convertwithwithInfo(...) + return Nonefor cases that are knowable from the expression alone:CometArrayRemove(spark/src/main/scala/org/apache/comet/serde/arrays.scala:40-55) rejects unsupported child types inconvert.CometArrayExcept(arrays.scala:357-374) rejectsBinaryType/StructTypeelement types inconvertrather than viagetSupportLevel.CometArrayPosition(arrays.scala:701-727) rejects all-foldable args and unsupported element types inconvert.CometElementAt(arrays.scala:565-598) rejects non-array input inconvert; theMapTyperejection in particular belongs ingetSupportLevel.CometFlatten(arrays.scala:603-617) rejects unsupported child element types inconvert.CometArrayReverse(arrays.scala:545-556) rejects unsupported child types inconverteven though it already has agetSupportLevel.CometSortArray(arrays.scala:172-182) rejects non-LiteralascendingOrderinconvert. Spark 4.0+ widensascendingOrderto any foldable boolean, so this convert-time rejection is the new fallback path; it should be declared asUnsupported(Some(...))ingetSupportLevelso EXPLAIN surfaces the reason.The
audit-comet-expressionskill (rule 10) requires expression-shape restrictions to be declared asUnsupported(Some(reason))/Incompatible(Some(reason))branches ingetSupportLevelso EXPLAIN surfaces the reason at planning time, the auto-generated compatibility doc picks them up, and the dispatcher can route around them. The audit doc forarray_position,array_remove,element_at,flatten, andsort_arrayexplicitly notes these need to be lifted.2. Documented NaN / canonicalization divergences should flip support level to
IncompatibleThe audit doc flags
Known limitation/Known divergencesub-bullets on several entries, but the corresponding Comet serdes still reportCompatible(skill rule 12 requires this to flip the support level):CometArrayContains(arrays.scala:109-120): registered with nogetSupportLeveloverride (defaults toCompatible); audit doc flags the NaN-canonicalization gap ([Bug] array_distinct / array_union / array_except do not canonicalize NaN like Spark #4481).array_distinct/array_union/array_max/array_min: registered as bareCometScalarFunction(...)inQueryPlanSerde.scala:55-83, so they default toCompatible. Audit doc flags NaN / signed-zero divergences ([Bug] array_distinct / array_union / array_except do not canonicalize NaN like Spark #4481, [Bug] array_max and array_min disagree with Spark on NaN ordering #4482).These should either gain a
CometExpressionSerdewrapper withIncompatible(Some(reason))keyed on float/double element types, or theCometScalarFunctionconstructor should accept an optional support-level callback.3. Dead serde registrations after Spark 4.0 rewrites
Skill rule 13 says unreachable serde mappings should be relabeled or removed so the dispatcher map reflects reality:
CometArrayAppend(arrays.scala:58-107, registered atQueryPlanSerde.scala:52): unreachable on Spark 4.0+ becauseArrayAppendis nowRuntimeReplaceableand rewritten toArrayInsert(arr, Literal(-1), elem). The audit doc confirms dispatch flows throughCometArrayInserton 4.0+. Either drop the 4.0+ registration through the shim or mark it as 3.x-only.CometArrayCompact(arrays.scala:311-333, registered atQueryPlanSerde.scala:53):ArrayCompactis alwaysRuntimeReplaceable(rewritten toArrayFilter(arr, IsNotNull(...))in all supported Spark versions), so the directclassOf[ArrayCompact] -> CometArrayCompactmapping is never hit at convert time. The actual dispatch path isCometArrayFilter.convert -> CometArrayCompact.convert(arrays.scala:625-637). The direct registration should be removed.Medium priority
4.
CometSortArraySpark 4.0+ foldable-boolean coverageSpark 4.0 widens
SortArray.ascendingOrderfrom aLiteral(_: Boolean, BooleanType)to any foldable boolean.CometSortArray(arrays.scala:172-182) still pattern-matches onlyLiteral, sosort_array(arr, NOT some_const)and similar shapes silently fall back on 4.0+. Constant-folding catches most cases but not all. Either evaluate the foldable expression at convert time (preferred) or surface the limitation ingetSupportLevel.5.
array_unionresult ordering vs DataFusion is unverifiedThe audit doc for
array_unionnotes "Result ordering versus DataFusion is also unverified; compare thearray_intersectordering caveat."CometArrayUnion(arrays.scala:454-466) is currentlyCompatible. Verify whether DataFusion'sarray_unionpreserves Spark's left-first-then-new-right-elements ordering. If it does not, raise the support level toIncompatiblewith a documented reason, matching thearray_intersecttreatment.6.
CometArrayExceptdata-type guard should be ingetSupportLevelCometArrayExcept.isTypeSupported(arrays.scala:342-355) explicitly rejectsBinaryTypeandStructTypeelement types, but the rejection happens inconvert(arrays.scala:361-367). Move the check into agetSupportLevelbranch so EXPLAIN can report it and the compatibility doc reflects the gap. (Same pattern as item 1.)7.
CometArrayJoinSpark 4.0+ collation coverageAudit doc notes Spark 4.0 widens
ArrayJoin.inputTypestoAbstractArrayType(StringTypeWithCollation(supportsTrimCollation = true))and that non-binary collations are not propagated. Already tracked under #2190 (umbrella) but worth wiring an explicithasNonDefaultStringCollationguard inCometArrayJoin.getSupportLevelso the fallback is visible in EXPLAIN, matching the patternCometArrayIntersectalready follows (arrays.scala:207-213).Surfaced by the
audit-comet-expressionskill run in #4483.