[Feature] Support Spark expression: subtract_times #3139

@andygrove

Description

What is the problem the feature request solves?

Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support the Spark subtract_times function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.

The SubtractTimes expression calculates the day-time interval between two time values by subtracting the right operand from the left operand. It is implemented as a runtime-replaceable expression that delegates to the DateTimeUtils.subtractTimes method for the actual computation.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

Describe the potential solution

Spark Specification

Syntax:

time_expression1 - time_expression2

Arguments:

Argument | Type       | Description
-------- | ---------- | -----------------------------------
left     | Expression | The left time expression (minuend)
right    | Expression | The right time expression (subtrahend)

Return Type: Returns a DayTimeIntervalType with precision from HOUR to SECOND representing the interval between the two time values.

Supported Data Types:
This expression accepts any time-related data types as defined by AnyTimeType, which typically includes:

  • TimestampType
  • DateType
  • TimeType (if supported by the SQL dialect)

Edge Cases:

  • Null handling: Returns null if either the left or right expression evaluates to null (null-intolerant behavior)
  • Type compatibility: Both operands must be convertible to time types, otherwise compilation fails
  • Negative intervals: When the right operand represents a later time than the left operand, the result will be a negative interval
  • Precision: Results are limited to day-time intervals with hour-to-second precision, losing any sub-second precision beyond what the interval type supports
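
The edge cases above can be sketched in plain Rust. This is a minimal model, assuming a TIME value is represented as nanoseconds since midnight and the day-time interval result is stored in microseconds (both assumptions should be verified against Spark's DateTimeUtils.subtractTimes):

```rust
const NANOS_PER_MICRO: i64 = 1_000;
const NANOS_PER_SECOND: i64 = 1_000_000_000;

// Hypothetical model of a TIME value: nanoseconds since midnight.
fn time_nanos(h: i64, m: i64, s: i64) -> i64 {
    (h * 3600 + m * 60 + s) * NANOS_PER_SECOND
}

// Null-intolerant subtraction: a null (None) operand yields null.
// The result is a day-time interval expressed in microseconds.
fn subtract_times(left: Option<i64>, right: Option<i64>) -> Option<i64> {
    match (left, right) {
        (Some(l), Some(r)) => Some((l - r) / NANOS_PER_MICRO),
        _ => None,
    }
}

fn main() {
    // 15:30:00 - 10:15:30 = 5h 14m 30s, as microseconds
    println!("{:?}", subtract_times(Some(time_nanos(15, 30, 0)), Some(time_nanos(10, 15, 30))));
    // Negative interval: the right operand is later than the left
    println!("{:?}", subtract_times(Some(time_nanos(1, 0, 0)), Some(time_nanos(2, 0, 0))));
    // Null propagation
    println!("{:?}", subtract_times(None, Some(time_nanos(1, 0, 0))));
}
```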

Examples:

-- Calculate time difference between timestamps
SELECT TIMESTAMP '2023-01-02 15:30:00' - TIMESTAMP '2023-01-01 10:15:30'
-- Returns: INTERVAL '1 05:14:30' DAY TO SECOND

-- Using with column references
SELECT end_time - start_time AS duration FROM events

// DataFrame API usage (Scala)
import org.apache.spark.sql.functions._

df.select($"end_time" - $"start_time" as "duration")

// Using expr() for complex expressions
df.select(expr("end_time - start_time") as "time_diff")

Implementation Approach

See the Comet guide on adding new expressions for detailed instructions.

  1. Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
  2. Register: Add to appropriate map in QueryPlanSerde.scala
  3. Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
  4. Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
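
As a rough sketch of step 4, assuming no built-in DataFusion kernel covers this and that TIME values arrive as nanoseconds since midnight, the per-batch computation could look like the loop below. A real implementation would operate on Arrow arrays and their validity bitmaps rather than `Vec<Option<i64>>`; this only illustrates the element-wise semantics:

```rust
// Element-wise subtract_times over two columns of nullable TIME values,
// modeled here as Option<i64> nanoseconds since midnight.
fn subtract_times_kernel(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<Option<i64>> {
    left.iter()
        .zip(right.iter())
        .map(|(l, r)| match (l, r) {
            // Convert the nanosecond difference to day-time interval microseconds.
            (Some(l), Some(r)) => Some((l - r) / 1_000),
            _ => None, // null-intolerant: null in either operand yields null
        })
        .collect()
}

fn main() {
    let end = vec![Some(55_800_000_000_000i64), None];      // 15:30:00, null
    let start = vec![Some(36_930_000_000_000i64), Some(0)]; // 10:15:30, midnight
    println!("{:?}", subtract_times_kernel(&end, &start));
}
```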

Additional context

Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.SubtractTimes

Related:

  • DateAdd - Adding intervals to dates/timestamps
  • DateSub - Subtracting intervals from dates/timestamps
  • DayTimeIntervalType - The return type for day-time intervals
  • DateTimeUtils - Utility class containing the underlying subtraction logic

This issue was auto-generated from Spark reference documentation.
