Description
What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark subtract_times function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.
The SubtractTimes expression calculates the day-time interval between two time values by subtracting the right operand from the left operand. It is implemented as a runtime-replaceable expression that delegates to the DateTimeUtils.subtractTimes method for the actual computation.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:
```sql
time_expression1 - time_expression2
```
Arguments:
| Argument | Type | Description |
|---|---|---|
| left | Expression | The left time expression (minuend) |
| right | Expression | The right time expression (subtrahend) |
Return Type: Returns a DayTimeIntervalType with precision from HOUR to SECOND representing the interval between the two time values.
Supported Data Types:
This expression accepts any time-related data types as defined by AnyTimeType, which typically includes:
- TimestampType
- DateType
- TimeType (if supported by the SQL dialect)
Edge Cases:
- Null handling: Returns null if either the left or right expression evaluates to null (null-intolerant behavior)
- Type compatibility: Both operands must be convertible to time types, otherwise compilation fails
- Negative intervals: When the right operand represents a later time than the left operand, the result will be a negative interval
- Precision: Results are limited to day-time intervals with hour-to-second precision, losing any sub-second precision beyond what the interval type supports
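Taken together, the edge cases above can be sketched as a small scalar kernel. The sketch below is illustrative only: it assumes Spark's physical encodings (TIME values as nanoseconds since midnight, day-time intervals as microseconds), and the function name `subtract_times_micros` is hypothetical rather than part of Comet or DataFusion.

```rust
// Hedged sketch: assumes TIME values are encoded as nanoseconds since
// midnight and day-time intervals as microseconds (Spark's physical
// representations). The function name is illustrative, not Comet's API.
fn subtract_times_micros(left_nanos: Option<i64>, right_nanos: Option<i64>) -> Option<i64> {
    // Null-intolerant: if either operand is null, the result is null.
    let (l, r) = (left_nanos?, right_nanos?);
    // The difference may be negative when `right` is later than `left`;
    // truncating nanos -> micros drops sub-microsecond precision.
    Some((l - r) / 1_000)
}

fn main() {
    let nanos = |h: i64, m: i64, s: i64| (h * 3600 + m * 60 + s) * 1_000_000_000;
    // 15:30:00 - 10:15:30 = 5h 14m 30s = 18_870 seconds
    assert_eq!(
        subtract_times_micros(Some(nanos(15, 30, 0)), Some(nanos(10, 15, 30))),
        Some(18_870_000_000)
    );
    // Reversed operands give a negative interval.
    assert_eq!(
        subtract_times_micros(Some(nanos(10, 15, 30)), Some(nanos(15, 30, 0))),
        Some(-18_870_000_000)
    );
    // Null propagation.
    assert_eq!(subtract_times_micros(None, Some(0)), None);
    println!("ok");
}
```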
Examples:
```sql
-- Calculate time difference between timestamps
SELECT TIMESTAMP '2023-01-02 15:30:00' - TIMESTAMP '2023-01-01 10:15:30'
-- Returns: INTERVAL '1 05:14:30' DAY TO SECOND

-- Using with column references
SELECT end_time - start_time AS duration FROM events
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select($"end_time" - $"start_time" as "duration")

// Using expr() for complex expressions
df.select(expr("end_time - start_time") as "time_diff")
```
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add expression handler in `spark/src/main/scala/org/apache/comet/serde/`
- Register: Add to the appropriate map in `QueryPlanSerde.scala`
- Protobuf: Add message type in `native/proto/src/proto/expr.proto` if needed
- Rust: Implement in `native/spark-expr/src/` (check if DataFusion has built-in support first)
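For the Rust step, the element-wise logic is simple enough that a vectorized kernel can be sketched without committing to Comet's serde plumbing. The sketch below uses plain `Vec<Option<i64>>` in place of Arrow arrays to stay self-contained; real Comet code would operate on Arrow `PrimitiveArray` inputs via DataFusion's `ColumnarValue`, and the name `subtract_times_kernel` is hypothetical.

```rust
// Hedged sketch of the per-batch kernel the Rust side would need.
// Plain Vec<Option<i64>> stands in for an Arrow Int64 array to keep the
// example dependency-free; inputs are nanoseconds since midnight,
// outputs are interval microseconds.
fn subtract_times_kernel(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<Option<i64>> {
    left.iter()
        .zip(right.iter())
        .map(|(l, r)| match (l, r) {
            // Null-intolerant, element-wise semantics.
            (Some(l), Some(r)) => Some((l - r) / 1_000),
            _ => None,
        })
        .collect()
}

fn main() {
    let left = vec![Some(2_000_000_000), None, Some(1_000_000_000)];
    let right = vec![Some(1_000_000_000), Some(0), None];
    // One-second difference in the first slot; nulls propagate elsewhere.
    assert_eq!(
        subtract_times_kernel(&left, &right),
        vec![Some(1_000_000), None, None]
    );
    println!("ok");
}
```

A real implementation should also check whether DataFusion already exposes equivalent timestamp/time subtraction before writing a custom kernel, as the checklist above suggests.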
Additional context
Difficulty: Medium
Spark Expression Class: `org.apache.spark.sql.catalyst.expressions.SubtractTimes`
Related:
- `DateAdd` - Adding intervals to dates/timestamps
- `DateSub` - Subtracting intervals from dates/timestamps
- `DayTimeIntervalType` - The return type for day-time intervals
- `DateTimeUtils` - Utility class containing the underlying subtraction logic
This issue was auto-generated from Spark reference documentation.