Skip to content

perf: Short-circuit operator serde [experimental]#3992

Draft
andygrove wants to merge 4 commits intoapache:mainfrom
andygrove:short-circuit-serde-checks-3990
Draft

perf: Short-circuit operator serde [experimental]#3992
andygrove wants to merge 4 commits intoapache:mainfrom
andygrove:short-circuit-serde-checks-3990

Conversation

@andygrove
Copy link
Copy Markdown
Member

@andygrove andygrove commented Apr 18, 2026

Which issue does this PR close?

Closes #3990

Follows on from #3989

Rationale for this change

Reduce planning overhead during AQE.

Once an operator has been tagged with fallback reasons, there is no point in running the compatibility checks again for that operator. This avoids repeatedly recursively checking plans during AQE.

What changes are included in this PR?

How are these changes tested?

Replace the separate STAGE_FALLBACK_TAG with explain-info-based stickiness.
Shuffle path checks (`nativeShuffleFailureReasons`, `columnarShuffleFailureReasons`)
are now pure and return reasons instead of tagging eagerly. A new
`shuffleSupported` coordinator short-circuits on `hasExplainInfo`, tries native
then columnar, and tags via `withInfos` only on total failure. DPP fallback,
which disqualifies both paths, moves into the coordinator. This removes the
need for `CometFallback` and eliminates the semantic split where `withInfo`
could fire for a path-specific failure while the node still converted via a
different path.
…llup

Expand the doc comments on withInfo/withInfos/hasExplainInfo to make clear
that these record fallback reasons surfaced in extended explain output,
and that any call to withInfo is a signal that the node falls back to
Spark.

Also restore the child-expression rollup for native range-partitioning
sort orders that was lost in the earlier refactor: when exprToProto fails
on a sort-order expression, its own fallback reasons (e.g. strict
floating-point sort) are now copied onto the shuffle's reasons so they
surface alongside 'unsupported range partitioning sort order'.
Record the handler class name on operators whose getSupportLevel returned
Unsupported or Incompatible (without allowIncompat), so that later AQE
stage-prep passes skip re-running the same check. The tag is keyed per
handler, so other handlers on the same node (e.g. CometSparkToColumnarExec
on a scan tagged by CometScanRule) are unaffected.

Closes apache#3990
@andygrove andygrove force-pushed the short-circuit-serde-checks-3990 branch from 37af002 to b49942c Compare April 18, 2026 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Short-circuit serde compatibility checks when a node is already tagged as falling back

1 participant