Skip to content

date/time expression audit follow-ups (from #4448) #4502

@andygrove

Description

@andygrove

Tracking issue for follow-up work surfaced by the date/time expression audit in #4448. Each item below is either a support-level / serde consistency fix that the audit deliberately deferred (because it needs a semantics decision rather than a mechanical edit), or a divergence the audit documented but did not file separately. Findings already covered by an issue filed during the PR (#4449 ANSI next_day, #4450 next_day trim, #4451 ANSI make_date, #3180 Hour/Minute/Second TimestampNTZ, #2013 legacy timezone forms, #2649 non-UTC date_trunc, #16594 / #16577 from_unixtime upstream) are intentionally excluded.

High priority

1. CometUnixTimestamp: getUnsupportedReasons and isSupportedInputType disagree on TimestampNTZType

spark/src/main/scala/org/apache/comet/serde/datetime.scala:309-362. getUnsupportedReasons claims TimestampNTZType is not supported because Comet incorrectly applies timezone conversion to TimestampNTZ values, but isSupportedInputType returns true for TimestampNTZType and getSupportLevel reports Compatible() for it. Either the predicate or the reason is wrong. Per the audit-comet-expression skill, the EXPLAIN-time reason must match the runtime behaviour: if the native path really does apply session-timezone conversion to TIMESTAMP_NTZ (as the documented divergence on unix_timestamp in spark_expressions_support.md suggests), the support level for that input type should be Incompatible(Some(...)) and cross-reference #3180; if it does not, the reason text should be removed.

2. CometDateFormat: gate the Compatible decision in getSupportLevel, not convert

spark/src/main/scala/org/apache/comet/serde/datetime.scala:609-700. getSupportLevel returns Compatible() unconditionally, while convert reads CometConf.isExprAllowIncompat(...) and branches on UTC vs non-UTC session timezone, on whether the format is a whitelisted literal, and on whether the codegen dispatcher is enabled. The audit-comet-expression skill (rule 10) requires expression-shape restrictions to be declared in getSupportLevel so EXPLAIN, the auto-generated compatibility doc, and the dispatcher can all see them. Moving this gating changes the dispatch flow slightly (non-UTC + whitelisted format becomes Incompatible rather than silently routed through the codegen dispatcher) and needs a separate review.

3. Group A codegen-dispatched datetime expressions report Compatible while silently falling back when spark.comet.exec.scalaUDF.codegen.enabled=false

For the codegen-dispatched datetime expressions (notably to_unix_timestamp and date_format per the documented divergence in docs/source/contributor-guide/spark_expressions_support.md), getSupportLevel returns Compatible() while convert returns None when the Scala UDF codegen dispatcher is disabled. The dispatcher flag is not surfaced in EXPLAIN or in the auto-generated compatibility guide, so users see Compatible while the operator actually falls back to Spark. Per the audit skill, this should either be reported as Incompatible(Some(\"requires spark.comet.exec.scalaUDF.codegen.enabled=true\")) or the dispatcher requirement should be surfaced in getCompatibleNotes.

4. CometMakeTimestamp does not honour spark.sql.ansi.enabled

MakeTimestamp honours spark.sql.ansi.enabled (throws on invalid (year, month, day, hour, min, sec[, timezone]), else NULL) in Spark 3.4 / 3.5 / 4.0. The Comet wiring does not check failOnError, so under ANSI mode Comet returns NULL where Spark would throw ansiDateTimeArgumentOutOfRange / invalidFractionOfSecondError. Same shape as the make_date ANSI divergence captured in #4451 / make_date_ansi.sql. Capture an analogous make_timestamp_ansi.sql regression test once a fix lands, and gate the support level appropriately in the meantime.

Medium priority

5. Spark 4.0 StringTypeWithCollation not gated for datetime expressions

The audit documents (in docs/source/contributor-guide/spark_expressions_support.md) that Spark 4.0 widens many string-typed inputTypes on datetime expressions (convert_timezone, date_format, date_trunc, from_unixtime, make_timestamp, next_day, to_unix_timestamp, trunc, unix_timestamp) to StringTypeWithCollation(supportsTrimCollation = true). Per the audit skill (rule 11), non-default collations should flip the support level to Incompatible(Some(...)). Today the serdes accept those inputs without distinguishing collation, so non-default collations are silently treated as compatible. Cross-references #2190 and #4496.

6. CometDateDiff non-wrapping i32 - panics in debug builds on extreme inputs

Documented divergence (line 30 of the PR diff): the native date_diff impl subtracts with i32 -, which Rust panics on in debug builds when the result overflows; Spark wraps. Practically unreachable for DateType inputs (the valid range is far smaller than i32::MAX), but it is an unmasked panic path. Either switch to wrapping_sub to match Spark, or document that release builds wrap (they do) and add a debug-assert pre-check.

Surfaced by the audit-comet-expression skill run in #4448.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions