
Add polymorphic window functions #18169

Merged
yashmayya merged 2 commits into apache:master from yashmayya:polymorphic-window-functions
Apr 14, 2026

@yashmayya
Contributor

  • Resolve TODO in WindowValueAggregatorFactory by adding type-specific window value aggregator implementations
  • Add SumLongWindowValueAggregator for INT/LONG to avoid precision loss when summing large long values (> 2^53)
  • Add SumBigDecimalWindowValueAggregator for BIG_DECIMAL to preserve full decimal precision
  • Add primitive MinIntWindowValueAggregator / MaxIntWindowValueAggregator using fastutil IntArrayFIFOQueue
  • Add primitive MinLongWindowValueAggregator / MaxLongWindowValueAggregator using fastutil LongArrayFIFOQueue
  • Add MinComparableWindowValueAggregator / MaxComparableWindowValueAggregator as fallback for types like BIG_DECIMAL
  • Factory now dispatches based on columnDataType.getStoredType() instead of always using double-based aggregators
  • Add 56 unit tests covering factory dispatch, all new aggregators, null handling, removal support, and precision preservation
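To make the 2^53 point above concrete, here is a minimal sketch of how a double accumulator drops the low bit of a large integral sum while a long accumulator keeps it. The class and method names are hypothetical and are not taken from the PR.

```java
// Illustrative only: names are hypothetical, not Pinot code.
final class SumPrecisionDemo {
  // Sums through a double accumulator, as a double-based aggregator would.
  static double sumAsDouble(long[] values) {
    double sum = 0.0;
    for (long v : values) {
      sum += v;
    }
    return sum;
  }

  // Sums through a long accumulator; exact as long as the total fits in 64 bits.
  static long sumAsLong(long[] values) {
    long sum = 0L;
    for (long v : values) {
      sum += v;
    }
    return sum;
  }

  public static void main(String[] args) {
    long[] values = {1L << 53, 1L};
    System.out.println((long) sumAsDouble(values)); // 9007199254740992 (the +1 is lost)
    System.out.println(sumAsLong(values));          // 9007199254740993
  }
}
```

Above 2^53, adjacent doubles are 2 apart, so 2^53 + 1 cannot be represented and rounds back to 2^53.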

yashmayya requested a review from Copilot April 11, 2026 05:06
yashmayya added the labels enhancement (Improvement to existing functionality), multi-stage (Related to the multi-stage query engine), and window-functions (Related to SQL window functions on the multi-stage query engine) Apr 11, 2026
yashmayya force-pushed the polymorphic-window-functions branch from e81fbd0 to de1f0c3 April 11, 2026 05:07
Contributor

Copilot AI left a comment

Pull request overview

This PR adds type-specific window value aggregators (and corresponding factory dispatch) so window aggregations preserve precision for integral and BIG_DECIMAL types, and reduces boxing for MIN/MAX on primitive types in pinot-query-runtime.

Changes:

  • Update WindowValueAggregatorFactory to dispatch SUM/MIN/MAX aggregators based on columnDataType.getStoredType().
  • Add new polymorphic implementations: SumLong*, SumBigDecimal*, primitive Min/Max(Int|Long)*, and Min/MaxComparable* fallbacks; rename existing double-based implementations for clarity.
  • Add a comprehensive WindowValueAggregatorTest suite covering factory dispatch, null/removal behavior, and precision.
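The dispatch described above could look roughly like the following sketch. It returns aggregator class names as strings so that it stays self-contained. The `StoredType` enum is a hypothetical stand-in for whatever `columnDataType.getStoredType()` returns, and the FLOAT/DOUBLE mapping is an assumption, since the thread only states the INT, LONG, and BIG_DECIMAL cases explicitly.

```java
// Illustrative only: a simplified stand-in for the factory dispatch, not the PR's code.
final class DispatchSketch {
  // Hypothetical subset of stored types relevant to SUM/MIN dispatch.
  enum StoredType { INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL }

  static String sumAggregatorFor(StoredType type) {
    switch (type) {
      case INT:
      case LONG:
        return "SumLongWindowValueAggregator";
      case BIG_DECIMAL:
        return "SumBigDecimalWindowValueAggregator";
      default:
        // Assumption: remaining numeric types keep the double-based SUM.
        return "SumDoubleWindowValueAggregator";
    }
  }

  static String minAggregatorFor(StoredType type) {
    switch (type) {
      case INT:
        return "MinIntWindowValueAggregator";
      case LONG:
        return "MinLongWindowValueAggregator";
      case FLOAT:
      case DOUBLE:
        // Assumption: floating-point types use the double-based MIN.
        return "MinDoubleWindowValueAggregator";
      default:
        // Comparable-based fallback for types like BIG_DECIMAL.
        return "MinComparableWindowValueAggregator";
    }
  }

  public static void main(String[] args) {
    System.out.println(sumAggregatorFor(StoredType.LONG));
    System.out.println(minAggregatorFor(StoredType.BIG_DECIMAL));
  }
}
```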

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/WindowValueAggregatorFactory.java | Dispatch to type-specific SUM/MIN/MAX implementations using stored type. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/SumLongWindowValueAggregator.java | New long-accumulating SUM to avoid double precision loss for large integral values. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/SumDoubleWindowValueAggregator.java | Rename/refocus existing SUM implementation as double-based. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/SumBigDecimalWindowValueAggregator.java | New BigDecimal SUM implementation intended to preserve decimal precision. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/MinIntWindowValueAggregator.java | New primitive INT MIN with fastutil deque for sliding windows. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/MaxIntWindowValueAggregator.java | New primitive INT MAX with fastutil deque for sliding windows. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/MinLongWindowValueAggregator.java | New primitive LONG MIN with fastutil deque (precision-safe vs double). |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/MaxLongWindowValueAggregator.java | New primitive LONG MAX with fastutil deque (precision-safe vs double). |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/MinDoubleWindowValueAggregator.java | Rename existing MIN implementation as double-based. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/MaxDoubleWindowValueAggregator.java | Rename existing MAX implementation as double-based. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/MinComparableWindowValueAggregator.java | New Comparable-based MIN fallback to preserve non-double types. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/MaxComparableWindowValueAggregator.java | New Comparable-based MAX fallback to preserve non-double types. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/BoolAndWindowValueAggregator.java | Rename class to match file name and window-aggregator naming. |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/window/aggregate/BoolOrWindowValueAggregator.java | Rename class to match file name and window-aggregator naming. |
| pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/operator/window/aggregate/WindowValueAggregatorTest.java | New unit tests covering factory dispatch and aggregator behaviors/precision. |

if (value instanceof BigDecimal) {
  return (BigDecimal) value;
}
return BigDecimal.valueOf(((Number) value).doubleValue());
Contributor

This makes sense, right?

Contributor Author

Yeah it makes sense, but practically this probably won't be reachable because the factory routes INT / LONG to SumLongWindowValueAggregator, so values arriving at SumBigDecimalWindowValueAggregator should already be BigDecimal instances. It's a low effort fix worth doing though so I've made it.
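The exact change made in the PR is not shown in this thread. Assuming the concern is the round trip through `double` for integral inputs, one way such a conversion could avoid it is sketched below; the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Illustrative only: not the PR's actual fix, just one way to avoid the double round trip.
final class BigDecimalConversionSketch {
  static BigDecimal toBigDecimal(Object value) {
    if (value instanceof BigDecimal) {
      return (BigDecimal) value;
    }
    if (value instanceof BigInteger) {
      return new BigDecimal((BigInteger) value);
    }
    if (value instanceof Long || value instanceof Integer
        || value instanceof Short || value instanceof Byte) {
      // Integral boxed types convert exactly through long.
      return BigDecimal.valueOf(((Number) value).longValue());
    }
    // Float/Double and any other Number fall back to the double path.
    return BigDecimal.valueOf(((Number) value).doubleValue());
  }

  public static void main(String[] args) {
    long big = (1L << 53) + 1;
    // Exact conversion through long.
    System.out.println(toBigDecimal(big));
    // Conversion through double, which cannot represent 2^53 + 1.
    System.out.println(BigDecimal.valueOf((double) big));
  }
}
```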

Comment on lines +470 to +475
@Test(expectedExceptions = UnsupportedOperationException.class)
public void testComparableMinRemovalUnsupported() {
  WindowValueAggregator<Object> agg = new MinComparableWindowValueAggregator(false);
  agg.addValue(1);
  agg.removeValue(1);
}
@codecov-commenter

codecov-commenter commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 87.98283% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.35%. Comparing base (d299729) to head (60a085a).
⚠️ Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...window/aggregate/MaxLongWindowValueAggregator.java | 72.41% | 4 Missing and 4 partials ⚠️ |
| ...window/aggregate/MinLongWindowValueAggregator.java | 75.86% | 4 Missing and 3 partials ⚠️ |
| .../window/aggregate/MaxIntWindowValueAggregator.java | 86.20% | 1 Missing and 3 partials ⚠️ |
| .../aggregate/MaxComparableWindowValueAggregator.java | 92.30% | 0 Missing and 2 partials ⚠️ |
| .../aggregate/MinComparableWindowValueAggregator.java | 92.30% | 0 Missing and 2 partials ⚠️ |
| .../window/aggregate/MinIntWindowValueAggregator.java | 93.10% | 0 Missing and 2 partials ⚠️ |
| .../aggregate/SumBigDecimalWindowValueAggregator.java | 91.30% | 0 Missing and 2 partials ⚠️ |
| ...window/aggregate/SumLongWindowValueAggregator.java | 94.11% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18169      +/-   ##
============================================
+ Coverage     63.31%   63.35%   +0.04%     
  Complexity     1627     1627              
============================================
  Files          3229     3237       +8     
  Lines        196707   196930     +223     
  Branches      30408    30444      +36     
============================================
+ Hits         124538   124773     +235     
+ Misses        62192    62159      -33     
- Partials       9977     9998      +21     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.30% <87.98%> (+8.03%) ⬆️ |
| java-21 | 63.32% <87.98%> (+0.02%) ⬆️ |
| temurin | 63.35% <87.98%> (+0.04%) ⬆️ |
| unittests | 63.35% <87.98%> (+0.04%) ⬆️ |
| unittests1 | 55.32% <87.98%> (+0.03%) ⬆️ |
| unittests2 | 34.95% <0.00%> (-0.01%) ⬇️ |


yashmayya marked this pull request as ready for review April 12, 2026 03:15

@Override
public void addValue(@Nullable Object value) {
  if (value != null) {
Contributor

Not introduced in this PR, but how do we handle null first/last?

Contributor Author

In the current implementation (both pre- and post-PR), nulls are simply skipped in addValue() / removeValue(): they don't participate in MIN/MAX or the other aggregations, which is standard SQL behavior for aggregates over nulls. For non-aggregate window functions such as FIRST_VALUE and LAST_VALUE, we support customizable null handling via the standard IGNORE NULLS / RESPECT NULLS modifiers (#14264). The actual null ordering is pushed into the sort that the planner creates below the window.
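The null-skipping behavior described above can be sketched as follows. This is a simplified stand-in with hypothetical names, not the PR's aggregator; in particular, returning null for a window that holds no non-null values is an assumption borrowed from SQL SUM semantics.

```java
// Illustrative only: a simplified long SUM that ignores nulls on both add and remove.
final class NullSkippingSumSketch {
  private long _sum;
  private int _count;

  void addValue(Object value) {
    if (value != null) {
      _sum += ((Number) value).longValue();
      _count++;
    }
  }

  void removeValue(Object value) {
    if (value != null) {
      _sum -= ((Number) value).longValue();
      _count--;
    }
  }

  // Assumption: a window with no non-null values yields null rather than 0.
  Long getCurrentAggregatedValue() {
    return _count == 0 ? null : Long.valueOf(_sum);
  }

  public static void main(String[] args) {
    NullSkippingSumSketch agg = new NullSkippingSumSketch();
    agg.addValue(5L);
    agg.addValue(null); // skipped
    agg.addValue(7L);
    System.out.println(agg.getCurrentAggregatedValue()); // 12
    agg.removeValue(5L);
    agg.removeValue(null); // skipped
    System.out.println(agg.getCurrentAggregatedValue()); // 7
  }
}
```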

Comment on lines +53 to +55
while (!_deque.isEmpty() && _deque.lastLong() > longValue) {
  _deque.dequeueLastLong();
}
Contributor

Just for me to understand: It looks like in _deque we have N copies of the min value. It should be more efficient to just keep the min and the number of repetitions we had, right? I guess I'm missing something.

Contributor Author

This uses the classic monotonic deque (ascending minima) pattern, which gives O(1) amortized cost per add / remove (O(N) total over N values) for sliding-window min / max. The deque does not store N copies of the min. It maintains a monotonically non-decreasing sequence of candidates for future minimums:

  • On addValue(x): pop from back while back > x, then push x. The deque stays sorted ascending.
  • On removeValue(x): if front == x, pop from front.
  • getCurrentAggregatedValue(): return front (the current window minimum).

Example for window sliding over [3, 1, 4, 1, 5]:

  • Add 3: deque = [3]
  • Add 1: deque = [1] (3 removed, since 3 > 1)
  • Add 4: deque = [1, 4]
  • Remove 3: front is 1 != 3, no-op (3 was already purged)
  • Add 1: deque = [1, 1] (4 removed, since 4 > 1)
  • Remove 1: front == 1, pop → deque = [1]
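The steps above can be replayed with a small sketch. It uses `java.util.ArrayDeque<Long>` instead of fastutil's `LongArrayFIFOQueue` so that it runs without extra dependencies, and the class and method names are hypothetical. It assumes values are removed in the same order they were added, as a sliding window does.

```java
import java.util.ArrayDeque;

// Illustrative only: boxed ArrayDeque stand-in for the primitive fastutil deque.
final class SlidingMinSketch {
  private final ArrayDeque<Long> _deque = new ArrayDeque<>();

  void addValue(long value) {
    // Drop candidates larger than the new value; they can never be the minimum again.
    while (!_deque.isEmpty() && _deque.peekLast() > value) {
      _deque.pollLast();
    }
    _deque.addLast(value);
  }

  void removeValue(long value) {
    // Only the front can still be in the deque; anything else was already purged.
    if (!_deque.isEmpty() && _deque.peekFirst() == value) {
      _deque.pollFirst();
    }
  }

  // Current window minimum, or null when the deque is empty.
  Long getCurrentAggregatedValue() {
    return _deque.peekFirst();
  }

  String snapshot() {
    return _deque.toString();
  }

  public static void main(String[] args) {
    SlidingMinSketch agg = new SlidingMinSketch();
    agg.addValue(3);
    agg.addValue(1);
    agg.addValue(4);
    System.out.println(agg.snapshot()); // [1, 4]
    agg.removeValue(3);                 // no-op, 3 was already purged
    agg.addValue(1);
    System.out.println(agg.snapshot()); // [1, 1]
    agg.removeValue(1);
    System.out.println(agg.snapshot()); // [1]
  }
}
```

Equal values are kept (the pop condition is strictly greater), which is why the duplicate 1 survives and the minimum stays correct after the first 1 leaves the window.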

Contributor Author

A "min + count" approach would not work efficiently: once the last copy of the min is removed, the next min is unknown, and finding it needs an O(K) rescan of the window (K being the window size).

yashmayya force-pushed the polymorphic-window-functions branch from 27c2b07 to 60a085a April 14, 2026 21:17
yashmayya merged commit e85fcd3 into apache:master Apr 14, 2026
16 checks passed
xiangfu0 pushed a commit to pinot-contrib/pinot-docs that referenced this pull request Apr 15, 2026
@xiangfu0
Contributor

Documentation PR has been created for the polymorphic window function precision improvements: pinot-contrib/pinot-docs#743

The docs PR adds information about the type-specific aggregators (SumLongWindowValueAggregator, SumBigDecimalWindowValueAggregator, and primitive-based MIN/MAX aggregators) that preserve precision for LONG and BIG_DECIMAL types.

xiangfu0 added a commit to pinot-contrib/pinot-docs that referenced this pull request Apr 15, 2026
…ache/pinot#18169)

Merged PR 743: Documents the type-specific aggregator improvements in window functions introduced in apache/pinot#18169.
xiangfu0 pushed a commit to pinot-contrib/pinot-docs that referenced this pull request Apr 15, 2026


5 participants