feat: add informational message channel distinct from fallback reasons#4509

Open
andygrove wants to merge 5 commits into apache:main from andygrove:info-message-channel

@andygrove
Member

Which issue does this PR close?

Closes #4006.

Depends on and is stacked on #4508 (the withInfo -> withFallbackReason rename). Because both branches live on a fork, this PR targets main and therefore currently includes #4508's rename commit in its diff. Please review #4508 first; once it merges, a rebase will reduce this PR to just the feature commits below.

Rationale for this change

Comet only had one way to tag a plan node with a message, and that message always meant "this node falls back to Spark". There was no way to attach a purely informational note that does not trigger fallback. This is increasingly useful with codegen dispatch: when Comet runs a JVM implementation of an expression even though a faster native implementation exists behind a config, we want to tell the user about the faster path without that note being treated as a fallback.

What changes are included in this PR?

  • A new informational channel, parallel to the fallback channel, reusing the names freed up by #4508 (refactor: rename withInfo to withFallbackReason for clarity):
    • CometSparkSessionExtensions.withInfo(node, message) records a message on a new CometExplainInfo.EXTENSION_INFO tag. It does not cause fallback: no planning rule reads this tag.
    • Verbose extended explain renders these as a distinct [COMET-INFO: ...] segment, in addition to any [COMET: ...] fallback segment on the same node. The fallback explain list format is unchanged and still excludes info messages.
  • Expression-level info messages are lifted onto the converted operator node in CometExecRule.convertToComet (a single central rollup, applied to all native operators), because verbose explain only traverses plan nodes, not expressions.
  • First consumer: CometDateFormat emits a [COMET-INFO: ...] hint when a natively-supported format is requested but native execution is gated off (non-UTC session timezone with allowIncompatible disabled), so Comet runs the JVM codegen path. The hint names the exact config key to enable the faster native path.
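
The two channels, the expression-to-operator rollup, and the explain rendering described above can be sketched roughly as follows. This is a minimal, Spark-free model, not the code in this PR: `Node`, `InfoChannelSketch`, `collectInfo`, `liftInfo`, and `render` are hypothetical names, plain fields stand in for the real `TreeNodeTag`s, and the exact separator and ordering inside `[COMET: ...]` and `[COMET-INFO: ...]` are assumptions.

```scala
// Hypothetical stand-in for a plan node or expression. The real code stores
// these messages in TreeNodeTags (FALLBACK_REASONS and EXTENSION_INFO).
final case class Node(
    name: String,
    children: Seq[Node] = Nil,
    fallbackReasons: Set[String] = Set.empty,
    infoMessages: Set[String] = Set.empty)

object InfoChannelSketch {
  // Fallback channel: a non-empty set means "this node falls back to Spark".
  def withFallbackReason(node: Node, reason: String): Node =
    node.copy(fallbackReasons = node.fallbackReasons + reason)

  // Informational channel: messages accumulate and are never consulted
  // when deciding whether a node falls back.
  def withInfo(node: Node, message: String): Node =
    node.copy(infoMessages = node.infoMessages + message)

  // Only the fallback channel influences this check.
  def hasFallbackReason(node: Node): Boolean = node.fallbackReasons.nonEmpty

  // Gather info messages from an expression and all of its children.
  def collectInfo(expr: Node): Set[String] =
    expr.infoMessages ++ expr.children.flatMap(collectInfo)

  // Rollup: lift expression-level info onto the operator, since verbose
  // explain walks plan nodes rather than expressions.
  def liftInfo(operator: Node, expressions: Seq[Node]): Node =
    expressions.flatMap(collectInfo).foldLeft(operator) { (op, msg) =>
      withInfo(op, msg)
    }

  // Render one node. Each segment is emitted only when it has content, so a
  // node with no messages renders as just its name. Messages are sorted here
  // purely to make the sketch deterministic.
  def render(node: Node): String = {
    val fallback =
      if (node.fallbackReasons.nonEmpty)
        Seq(node.fallbackReasons.toSeq.sorted.mkString("[COMET: ", ", ", "]"))
      else Seq.empty[String]
    val info =
      if (node.infoMessages.nonEmpty)
        Seq(node.infoMessages.toSeq.sorted.mkString("[COMET-INFO: ", ", ", "]"))
      else Seq.empty[String]
    (Seq(node.name) ++ fallback ++ info).mkString(" ")
  }
}
```

In this model, tagging an expression with `withInfo` and lifting it via `liftInfo` leaves `hasFallbackReason` false on the operator, while `render` shows the message in its own `[COMET-INFO: ...]` segment next to any `[COMET: ...]` segment.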

Known limitation for future work: the Spark 4.x CometExprShim node reconstruction copies FALLBACK_REASONS but not EXTENSION_INFO onto the wrapping Invoke. No current code path routes withInfo through those shims, so this is latent. It can be addressed if a future serde tags one of those reconstructed nodes.

How are these changes tested?

New tests in CometExpressionSuite:

  • withInfo does not set a fallback reason and renders as [COMET-INFO: ...] in verbose explain, and a second message accumulates rather than overwriting.
  • date_format takes the JVM codegen path under a non-UTC timezone and surfaces the [COMET-INFO: ...] hint naming the DateFormatClass.allowIncompatible config key.
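
The gating that the second test exercises can be illustrated with a small decision function. This is a sketch under stated assumptions rather than the `CometDateFormat` serde itself: `Decision`, `DateFormatGateSketch`, and `decide` are hypothetical, the UTC check is simplified to a case-insensitive string comparison, the behaviour for formats without native support is assumed (the PR description does not cover it), and the config key is passed in as a parameter instead of being spelled out.

```scala
// Execution paths named in the PR description.
sealed trait Path
case object Native extends Path
case object JvmCodegen extends Path

// Hypothetical result type: the chosen path plus an optional info hint.
final case class Decision(path: Path, info: Option[String])

object DateFormatGateSketch {
  def decide(
      formatNativelySupported: Boolean,
      sessionTimeZone: String,
      allowIncompatible: Boolean,
      allowIncompatibleKey: String): Decision = {
    // Assumption: a simplified UTC check; the real rule may be broader.
    val isUtc = sessionTimeZone.equalsIgnoreCase("UTC")
    if (!formatNativelySupported) {
      // Assumption: no native implementation to advertise, so no hint.
      Decision(JvmCodegen, None)
    } else if (isUtc || allowIncompatible) {
      Decision(Native, None)
    } else {
      // Native exists but is gated off: run the JVM codegen path and attach
      // a hint naming the config key. This is informational only and is not
      // a fallback reason.
      Decision(
        JvmCodegen,
        Some("a faster native implementation is available; set " +
          allowIncompatibleKey + "=true to enable it"))
    }
  }
}
```

Only the last branch produces a message, which matches the intent that the hint appears solely when a faster native path exists but is disabled by configuration.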

The full CometExpressionSuite passes (125 succeeded), confirming the central convertToComet rollup does not regress operator conversion. scalastyle:check passes.

andygrove added 3 commits May 28, 2026 18:22
Rename withInfo/withInfos/hasExplainInfo and EXTENSION_INFO to
withFallbackReason/withFallbackReasons/hasFallbackReason and
FALLBACK_REASONS to match their actual semantics (fallback reasons,
not generic info). Also rename the private extensionInfo helper in
ExtendedExplainInfo to fallbackReasons, and update the TreeNodeTag
string from "CometExtensionInfo" to "CometFallbackReasons" so a
future PR can reuse the old string for a distinct tag.
…skip ci]

When date_format gets a natively-supported format string but the session
timezone is non-UTC and allowIncompatible is off, Comet takes the JVM
codegen path. Emit a COMET-INFO hint on the expression and lift
expression-level info messages onto the converted operator centrally in
CometExecRule, so verbose extended explain shows the faster native option
and how to enable it.
@andygrove andygrove marked this pull request as draft May 29, 2026 03:33
# Conflicts:
#	spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala
#	spark/src/main/scala/org/apache/comet/ExtendedExplainInfo.scala
#	spark/src/main/scala/org/apache/comet/serde/contraintExpressions.scala
#	spark/src/main/scala/org/apache/comet/serde/datetime.scala
#	spark/src/main/scala/org/apache/comet/serde/math.scala
#	spark/src/main/scala/org/apache/comet/serde/statics.scala
#	spark/src/main/scala/org/apache/comet/serde/strings.scala
#	spark/src/main/scala/org/apache/comet/serde/structs.scala
#	spark/src/main/scala/org/apache/comet/serde/unixtime.scala
@andygrove andygrove marked this pull request as ready for review May 30, 2026 04:28
Two issues surfaced once CI ran (the feature commits had skipped CI):

- datetime.scala: drop redundant `s` interpolators on two string literals in
  the date_format info hint (scalafix CHECK).
- tpcds q9 golden: the info-message change makes ExtendedExplainInfo omit an
  empty `[COMET: ]` for a bare Spark fallback Project, so the root renders as
  `Project`. Update the approved plan to match.
@andygrove andygrove force-pushed the info-message-channel branch from a5453a6 to 9050599 Compare May 30, 2026 04:30
Development

Successfully merging this pull request may close these issues.

Add distinction between "info" and "fallback" messages
