
[Feature] Support Spark expression: timestamp_add_interval #3114

@andygrove

Description

What is the problem the feature request solves?

Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support Spark's TimestampAddInterval expression (timestamp plus interval arithmetic), so queries that use it fall back to Spark's JVM execution instead of running natively on DataFusion.

The TimestampAddInterval expression adds a time interval to a timestamp value. It supports both day-time intervals (containing days, hours, minutes, seconds) and calendar intervals (containing months, days, and microseconds), with proper timezone handling for accurate temporal arithmetic.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

Describe the potential solution

Spark Specification

Syntax:

timestamp_column + INTERVAL '1' DAY
timestamp_column + INTERVAL '1-2' YEAR TO MONTH

// DataFrame API usage
df.select(col("timestamp_col") + expr("INTERVAL '1' DAY"))

Arguments:

Argument    Type            Description
start       Expression      The base timestamp value (TimestampType or TimestampNTZType)
interval    Expression      The interval to add (CalendarIntervalType or DayTimeIntervalType)
timeZoneId  Option[String]  Optional timezone identifier for timezone-aware operations

Return Type: Returns the same data type as the input start expression (either TimestampType or TimestampNTZType).

Supported Data Types:

  • Input timestamp: AnyTimestampType (both TimestampType and TimestampNTZType)
  • Input interval: CalendarIntervalType or DayTimeIntervalType (see the sketch after this list for how each type can arise from SQL)
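
A minimal sketch (plain Spark, not Comet code) of how both interval types can reach the expression from ordinary queries. Exact result types can vary by Spark version and ANSI settings, so the printed schemas should be verified on the target version.

import org.apache.spark.sql.SparkSession

object IntervalTypeProbe extends App {
  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("interval type probe")
    .getOrCreate()

  // ANSI interval literal: typically resolves to DayTimeIntervalType in recent Spark releases.
  spark.sql("SELECT INTERVAL '1' DAY AS dt_interval").printSchema()

  // make_interval(...) builds a CalendarIntervalType value (months/days/microseconds).
  spark.sql("SELECT make_interval(0, 1, 0, 0, 0, 0) AS cal_interval").printSchema()

  spark.stop()
}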

Edge Cases:

  • Null handling: Returns null if either the timestamp or interval input is null (nullIntolerant = true)
  • Timezone transitions: Properly handles daylight saving time transitions and timezone offset changes
  • Calendar arithmetic: Month additions handle variable month lengths (e.g., adding 1 month to Jan 31 may result in Feb 28/29); see the sketch after this list
  • Overflow behavior: May produce invalid results for extremely large interval values that exceed timestamp boundaries
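
The calendar-arithmetic and DST behaviors above can be illustrated with plain java.time (no Spark or Comet involved). This is only a reference for the kind of wall-clock vs. instant arithmetic a native implementation has to reproduce; Spark's exact semantics depend on the interval type and session time zone and should be verified against Spark itself.

import java.time.{Duration, LocalDateTime, ZoneId, ZonedDateTime}

object TimestampIntervalEdgeCases extends App {
  // Calendar arithmetic: adding 1 month to Jan 31 clamps to the last day of February.
  val jan31 = LocalDateTime.of(2023, 1, 31, 12, 0)
  println(jan31.plusMonths(1)) // 2023-02-28T12:00

  // DST transition: adding 1 calendar day across the 2023-03-12 "spring forward"
  // in America/Los_Angeles keeps the local wall-clock time, so the underlying
  // instant advances by only 23 hours.
  val la = ZoneId.of("America/Los_Angeles")
  val beforeDst = ZonedDateTime.of(2023, 3, 11, 12, 0, 0, 0, la)
  val afterDst = beforeDst.plusDays(1)
  println(afterDst)                              // 2023-03-12T12:00-07:00[America/Los_Angeles]
  println(Duration.between(beforeDst, afterDst)) // PT23H
}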

Examples:

-- Add 1 day to a timestamp
SELECT timestamp_col + INTERVAL '1' DAY FROM events;

-- Add 2 months and 15 days
SELECT timestamp_col + INTERVAL '0-2' YEAR TO MONTH + INTERVAL '15' DAY FROM events;

-- Add precise time intervals
SELECT timestamp_col + INTERVAL '1 2:30:45.123' DAY TO SECOND FROM events;

// DataFrame API usage
import org.apache.spark.sql.functions._

// Add interval using SQL expression
df.select(col("timestamp_col") + expr("INTERVAL '1' DAY"))

// Using interval functions: make_interval(years, months, weeks, days, hours, mins, secs) adds 1 month here
df.select(col("timestamp_col") + expr("make_interval(0, 1, 0, 0, 0, 0)"))

Implementation Approach

See the Comet guide on adding new expressions for detailed instructions; a hedged smoke-test sketch for checking native execution follows the steps below.

  1. Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
  2. Register: Add to appropriate map in QueryPlanSerde.scala
  3. Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
  4. Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
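
Once the serde and native pieces are wired up, a quick way to confirm the expression no longer falls back is to inspect the physical plan. The sketch below is hypothetical: it assumes Comet is installed per its setup guide (plugin and shuffle settings omitted) and that spark.comet.enabled / spark.comet.exec.enabled are the relevant switches for your version.

import org.apache.spark.sql.SparkSession

object TimestampAddIntervalSmokeTest extends App {
  // Assumes Comet jars and the required plugin/shuffle configuration are already
  // in place as described in the Comet installation guide; only the enable flags are shown.
  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("timestamp_add_interval smoke test")
    .config("spark.comet.enabled", "true")
    .config("spark.comet.exec.enabled", "true")
    .getOrCreate()

  import spark.implicits._

  val df = Seq("2023-01-31 12:00:00").toDF("s")
    .selectExpr("CAST(s AS TIMESTAMP) AS ts")
    .selectExpr("ts + INTERVAL '1' DAY AS ts_plus_one_day")

  // With support in place, the plan should show Comet operators for the projection
  // instead of a Spark ProjectExec fallback.
  df.explain()
  df.show(truncate = false)

  spark.stop()
}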

Additional context

Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.TimestampAddInterval

Related:

  • TimestampDiff - Calculate difference between timestamps
  • DateAdd - Add days to date values
  • AddMonths - Add months to date/timestamp values
  • IntervalExpression - Create interval literals

This issue was auto-generated from Spark reference documentation.
