[SPARK-54507][SQL] Add time_bucket function for TIME type bucketing #53237
+780
−0
What changes were proposed in this pull request?
This PR adds a new time_bucket() SQL function that buckets TIME values into fixed-width intervals, returning the start time of each bucket. This enables histogram generation and time-of-day pattern analysis for TIME columns.
Why are the changes needed?
The TIME type currently lacks a bucketing function for aggregation and analysis. Users cannot easily group TIME values by arbitrary intervals (e.g., 15-minute or 1-hour buckets) without complex manual calculations.
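To make the grouping concrete, here is a plain-Python sketch of the kind of manual calculation a 15-minute histogram requires. It is only an illustration, not Spark code: the helper name `bucket_start_minutes`, the sample times, and the choice to anchor buckets at midnight are all assumptions made for this example.

```python
from collections import Counter
from datetime import time


def bucket_start_minutes(t: time, width_minutes: int) -> int:
    """Minutes since midnight at which the bucket containing t starts.

    Hypothetical helper; assumes buckets are anchored at 00:00.
    """
    minutes = t.hour * 60 + t.minute
    return (minutes // width_minutes) * width_minutes


# Made-up sample values, used only to show the grouping.
sample = [time(9, 5), time(9, 12), time(9, 31), time(10, 2), time(10, 14)]

# Count how many values fall into each 15-minute bucket.
histogram = Counter(bucket_start_minutes(t, 15) for t in sample)

# Render bucket starts as HH:MM labels, ordered by time of day.
labels = {f"{m // 60:02d}:{m % 60:02d}": n for m, n in sorted(histogram.items())}
print(labels)
```

Even this small version has to convert to minutes, floor, and convert back, which is the bookkeeping a built-in bucketing function would remove.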
Current Gap:
Existing functions don't support TIME bucketing:
window(): Only works with TIMESTAMP, not TIME. Returns a struct, not a scalar.
date_trunc(): Doesn't support TIME type
time_trunc(): Only supports fixed calendar units (HOUR, MINUTE), not arbitrary intervals like "15 minutes" or "90 minutes"
Current workarounds are error-prone, hard to maintain:
Use Cases:
This function addresses common real-world analytics needs:
Industry Precedent:
DATE_BUCKET() supports TIME type bucketing
time_bucket() is one of their most popular functions for time-series analytics
Does this PR introduce any user-facing change?
Yes. This PR adds a new SQL function time_bucket(), available in SQL, Scala, Python, and Spark Connect.
Function Signature
Parameters:
bucket_width: A day-time interval expression (e.g., INTERVAL '15' MINUTE)
time: A TIME value to bucket
Behavior:
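As a rough model of the semantics described above (fixed-width buckets, result is the bucket's start time), the arithmetic can be sketched in plain Python. This is not the PR's implementation: the name `time_bucket_model`, the midnight anchor for bucket boundaries, and the error raised for a non-positive width are assumptions made for illustration. Only the argument order follows the parameter list above.

```python
from datetime import time, timedelta


def time_bucket_model(bucket_width: timedelta, t: time) -> time:
    """Return the start of the fixed-width bucket that contains t.

    Illustrative model only. Assumes buckets are anchored at 00:00:00
    and that a non-positive width is rejected.
    """
    # Width in whole microseconds (timedelta // timedelta yields an int).
    width = bucket_width // timedelta(microseconds=1)
    if width <= 0:
        raise ValueError("bucket_width must be positive")

    # Microseconds elapsed since midnight.
    micros = ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond

    # Floor to the nearest lower multiple of the width.
    start = (micros // width) * width

    # Convert back to a time-of-day value.
    seconds, microsecond = divmod(start, 1_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, microsecond)


print(time_bucket_model(timedelta(minutes=15), time(9, 37, 22)))
print(time_bucket_model(timedelta(minutes=90), time(14, 10)))
```

Under the midnight-anchor assumption, 09:37:22 with a 15-minute width maps to 09:30:00, and 14:10 with a 90-minute width maps to 13:30:00, which is the kind of arbitrary interval that a unit-based truncation cannot express.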
Examples
Example 1: Basic Bucketing
Example 2: Retail Analytics - Peak Shopping Hours
Example 3: Healthcare - Appointment Scheduling
Example 4: Edge Cases
Scala API
Python API
How was this patch tested?
Added tests in TimeFunctionsSuiteBase and sql-tests/inputs/time.sql
Was this patch authored or co-authored using generative AI tooling?
No