Support DISTINCT early-termination budgets (row/no-change/time) and surface partial-result flags #17247
base: master
Pull Request Overview
This PR implements early-termination controls for DISTINCT queries by adding two new query options (maxRowsInDistinct and numRowsWithoutChangeInDistinct) and plumbing them through the distinct pipeline. The key changes enable operators to stop processing early when row budgets are exhausted or when no new distinct values are found, and properly surface these conditions in broker responses and statistics.
Key Changes
- Added two new query options for controlling DISTINCT early termination
- Implemented row budget tracking and enforcement in distinct executors
- Added early termination reason enum and metadata propagation through the query pipeline
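For illustration only, here is a minimal sketch of how a server might read these budgets from the query-option map, treating an absent key as "no budget". The class and the `parsePositiveLong` helper are hypothetical. The real parsers live in `QueryOptionsUtils` and may validate differently. The third key, `maxExecutionTimeMsInDistinct`, comes from the later revision of this PR described under Changes.

```java
import java.util.Map;

// Hypothetical sketch: the option keys are the ones named in this PR; the helper itself is an assumption.
final class DistinctOptionsSketch {
  static final String MAX_ROWS_IN_DISTINCT = "maxRowsInDistinct";
  static final String NUM_ROWS_WITHOUT_CHANGE_IN_DISTINCT = "numRowsWithoutChangeInDistinct";
  static final String MAX_EXECUTION_TIME_MS_IN_DISTINCT = "maxExecutionTimeMsInDistinct";

  private DistinctOptionsSketch() {
  }

  /**
   * Returns the option value as a positive long, or null when the key is absent (no budget).
   * Rejects zero and negative values, which would make the budget meaningless.
   */
  static Long parsePositiveLong(Map<String, String> queryOptions, String key) {
    String raw = queryOptions.get(key);
    if (raw == null) {
      return null;
    }
    long value = Long.parseLong(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(key + " must be positive, got: " + raw);
    }
    return value;
  }
}
```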
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Added query option keys for distinct early termination |
| pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/results/BaseResultsBlock.java | Added EarlyTerminationReason enum and metadata support |
| pinot-core/src/main/java/org/apache/pinot/core/query/distinct/DistinctExecutor.java | Added default methods for row budget tracking in distinct executors |
| pinot-core/src/main/java/org/apache/pinot/core/query/distinct/BaseSingleColumnDistinctExecutor.java | Implemented row budget enforcement with proper range clamping |
| pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/RawMultiColumnDistinctExecutor.java | Added row budget tracking for multi-column raw executor |
| pinot-core/src/main/java/org/apache/pinot/core/query/distinct/dictionary/DictionaryBasedMultiColumnDistinctExecutor.java | Added row budget tracking for multi-column dictionary executor |
| pinot-core/src/main/java/org/apache/pinot/core/operator/query/DistinctOperator.java | Implemented early termination logic with row budget and no-change detection |
| pinot-core/src/main/java/org/apache/pinot/core/operator/query/DictionaryBasedDistinctOperator.java | Added row budget clamping for dictionary-based distinct queries |
| pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java | Added utility methods to parse distinct early termination options |
| pinot-common/src/main/java/org/apache/pinot/common/response/broker/BrokerResponseNative.java | Added fields and methods for distinct early termination flags |
| pinot-common/src/main/java/org/apache/pinot/common/response/broker/BrokerResponseNativeV2.java | Added multi-stage support for distinct early termination flags |
| pinot-common/src/main/java/org/apache/pinot/common/datatable/DataTable.java | Added EARLY_TERMINATION_REASON metadata key |
| pinot-core/src/main/java/org/apache/pinot/core/query/reduce/ExecutionStatsAggregator.java | Added aggregation logic for distinct early termination reasons |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/LeafOperator.java | Added stat key handling for distinct early termination |
| pinot-core/src/test/java/org/apache/pinot/queries/DistinctQueriesTest.java | Added unit tests for distinct early termination |
| pinot-core/src/test/java/org/apache/pinot/core/query/distinct/DistinctExecutorEarlyTerminationTest.java | Added executor-level tests for row budget enforcement |
| pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/operator/LeafOperatorTest.java | Added tests for early termination stat recording |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/DistinctQueriesTest.java | Added integration tests covering both single and multi-stage engines |
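On the broker side (`ExecutionStatsAggregator`, `BrokerResponseNative`), the aggregation amounts to OR-ing per-server signals: if any server stopped early, the merged response is partial. The following is a hedged sketch that assumes each server reports its reason as a string under a metadata key. The key spelling and the reason values are hypothetical and are not necessarily how `EARLY_TERMINATION_REASON` is encoded in `DataTable`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of broker-side aggregation: a flag is set if ANY server stopped early for that reason.
final class DistinctFlagAggregator {
  // Assumed metadata key and reason strings; not the real DataTable encoding.
  static final String EARLY_TERMINATION_REASON = "earlyTerminationReason";
  static final String MAX_ROWS = "DISTINCT_MAX_ROWS";
  static final String NO_NEW_VALUES = "DISTINCT_NO_NEW_VALUES";

  private boolean _maxRowsReached;
  private boolean _noChangeReached;

  /** Folds one server's response metadata into the broker-level flags. */
  void aggregate(Map<String, String> serverMetadata) {
    String reason = serverMetadata.get(EARLY_TERMINATION_REASON);
    if (MAX_ROWS.equals(reason)) {
      _maxRowsReached = true;
    } else if (NO_NEW_VALUES.equals(reason)) {
      _noChangeReached = true;
    }
    // A missing key means this server scanned all of its rows; other reasons are ignored in this sketch.
  }

  static DistinctFlagAggregator aggregateAll(List<Map<String, String>> perServerMetadata) {
    DistinctFlagAggregator aggregator = new DistinctFlagAggregator();
    for (Map<String, String> metadata : perServerMetadata) {
      aggregator.aggregate(metadata);
    }
    return aggregator;
  }

  boolean isMaxRowsReached() {
    return _maxRowsReached;
  }

  boolean isNoChangeReached() {
    return _noChangeReached;
  }
}
```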
Codecov Report: ❌ Patch coverage is …

Coverage diff:

| | master | #17247 | +/- |
|---|---|---|---|
| Coverage | 63.28% | 63.27% | -0.02% |
| Complexity | 1476 | 1476 | |
| Files | 3162 | 3164 | +2 |
| Lines | 188701 | 189288 | +587 |
| Branches | 28877 | 29050 | +173 |
| Hits | 119425 | 119777 | +352 |
| Misses | 60017 | 60152 | +135 |
| Partials | 9259 | 9359 | +100 |
Branch updated from 7b07625 to 5e2cbc9.
Pull Request Overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
Branch updated from 5e2cbc9 to 20eb844.
Jackie-Jiang left a comment:
The concept is okay, but the support needs to be integrated into the combine operator. Currently it is a per-segment limit, which is not very useful.
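The reviewer's point is that a per-segment cap multiplies with the segment count: a server holding many segments could scan the full budget once per segment. One way to make the budget span the whole query, sketched here under that assumption, is a single counter shared by all the segment operators that the combine step drives. `SharedRowBudget` is a hypothetical name, not the class this PR adds. The later revisions introduce a `DistinctEarlyTerminationContext`, whose API is not reproduced here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one row budget shared by every segment-level operator of a query.
final class SharedRowBudget {
  private final AtomicLong _remaining;

  SharedRowBudget(long maxRows) {
    _remaining = new AtomicLong(maxRows);
  }

  /** Reserves up to {@code requested} rows and returns how many the caller may scan (0 once exhausted). */
  long acquire(long requested) {
    while (true) {
      long current = _remaining.get();
      if (current <= 0) {
        return 0;
      }
      long granted = Math.min(current, requested);
      // Compare-and-set so concurrent segment threads never over-spend the shared allowance.
      if (_remaining.compareAndSet(current, current - granted)) {
        return granted;
      }
    }
  }

  boolean isExhausted() {
    return _remaining.get() <= 0;
  }
}
```

In this sketch, each segment operator would call `acquire(blockSize)` before scanning a block, clamp the block to whatever it was granted, and stop once it is granted 0.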
Branch updated from 20eb844 to dc1ab0d.
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.
Branch updated from 55bf800 to a7214d2.
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.
Branch updated from d8910c4 to 7b038c8.
Branch updated from a5f6d80 to d5cb462.
Branch updated from d5cb462 to cd3d015.
Branch updated from dc73f87 to 82adb3c.
Pull request overview
Copilot reviewed 32 out of 32 changed files in this pull request and generated 7 comments.
Branch updated from 82adb3c to a641555.
Pull request overview
Copilot reviewed 32 out of 32 changed files in this pull request and generated no new comments.
Support early termination in combine operator
Branch updated from a641555 to 53cd208.
Changes
- Add `maxRowsInDistinct`/`numRowsWithoutChangeInDistinct`/`maxExecutionTimeMsInDistinct` plumbing throughout the distinct pipeline (query options, executors, dictionary plan, and broker/server metadata) so early termination is reported consistently.
- Teach every distinct executor (raw and dictionary, single and multi column) to respect a per-query row budget by clamping block ranges, and surface the remaining allowance to `DistinctOperator`.
- Update `DistinctOperator` and `DictionaryBasedDistinctOperator` to program the row budget, compute accurate stats, and attach the standard early-termination reason on the results block.
- Add lightweight executor-level tests plus a custom integration suite that covers scalar/MV/multi-column distinct queries and both early-termination knobs for single- and multi-stage engines.
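Here is a hedged sketch of the "clamp block ranges" idea for a single-column executor. Each incoming block is truncated to the rows still allowed, and the return value tells the operator to stop. `BudgetedDistinctExecutor` and its `int[]` block representation are simplifications invented for this example. The real executors consume Pinot value blocks and maintain their own distinct tables.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a single-column executor that honours a row budget by clamping each block.
final class BudgetedDistinctExecutor {
  private final Set<Integer> _values = new HashSet<>();
  private long _remainingRows;

  BudgetedDistinctExecutor(long maxRows) {
    _remainingRows = maxRows;
  }

  /**
   * Adds up to numDocs values from the block, clamped to the remaining budget.
   * Returns true once the budget is used up, signalling the operator to stop.
   */
  boolean process(int[] blockValues, int numDocs) {
    int limit = (int) Math.min(numDocs, _remainingRows); // never exceeds numDocs, so the cast is safe
    for (int i = 0; i < limit; i++) {
      _values.add(blockValues[i]);
    }
    _remainingRows -= limit;
    return _remainingRows <= 0;
  }

  long getRemainingRows() {
    return _remainingRows;
  }

  Set<Integer> getValues() {
    return _values;
  }
}
```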
Release Notes
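Pinot now lets you short-circuit expensive DISTINCT scans by telling the server when enough rows have been examined. Broker query options control this behavior: `maxRowsInDistinct` caps the number of rows scanned, `numRowsWithoutChangeInDistinct` stops the scan once that many rows have been read without discovering a new distinct value, and `maxExecutionTimeMsInDistinct` adds a time budget.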
Sample Queries
Limit the Total Rows Scanned
Stops after 10k rows per server. The result set may contain fewer keys than the column’s full cardinality, but it returns quickly on large tables.
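As a hedged illustration, assuming Pinot's `SET key = value;` query-option syntax, the hypothetical helper below prepends the option to a DISTINCT query. `myTable` and `tenantId` are placeholder names.

```java
import java.util.Map;

// Hypothetical helper: prepends one Pinot "SET key = value;" statement per option.
// Pass a LinkedHashMap if the order of the SET statements matters to you.
final class QueryWithOptions {
  private QueryWithOptions() {
  }

  static String build(String sql, Map<String, ?> options) {
    StringBuilder query = new StringBuilder();
    for (Map.Entry<String, ?> option : options.entrySet()) {
      query.append("SET ").append(option.getKey()).append(" = ").append(option.getValue()).append("; ");
    }
    return query.append(sql).toString();
  }
}
```

For example, `QueryWithOptions.build("SELECT DISTINCT tenantId FROM myTable LIMIT 100000", Map.of("maxRowsInDistinct", 10000))` produces `SET maxRowsInDistinct = 10000; SELECT DISTINCT tenantId FROM myTable LIMIT 100000`.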
Stop Once the Distinct Set Stabilizes
Once the server reads 5k additional rows without discovering a new tenant, it stops and marks the response as numRowsWithoutChangeInDistinctReached=true.
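The no-change guard can be sketched as a streak counter that resets whenever a new distinct value appears. This is a minimal sketch that assumes per-row accounting. `NoChangeGuard` is a hypothetical name, and the real operator may count rows per block instead.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the no-change guard; the real operator may count per block rather than per row.
final class NoChangeGuard {
  private final Set<Object> _seen = new HashSet<>();
  private final long _maxRowsWithoutChange;
  private long _rowsSinceLastNewValue;

  NoChangeGuard(long maxRowsWithoutChange) {
    _maxRowsWithoutChange = maxRowsWithoutChange;
  }

  /** Records one row's value and returns true once the scan should stop. */
  boolean add(Object value) {
    if (_seen.add(value)) {
      _rowsSinceLastNewValue = 0; // a new distinct value resets the streak
    } else {
      _rowsSinceLastNewValue++;
    }
    return _rowsSinceLastNewValue >= _maxRowsWithoutChange;
  }

  int numDistinct() {
    return _seen.size();
  }
}
```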
Combine Both Guards
This query will exit as soon as either budget is exhausted.
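Here is a hedged sketch of "whichever budget runs out first", assuming per-row checks and an injectable clock for testability. It also covers the time budget (`maxExecutionTimeMsInDistinct`) named in the PR title and change list. `DistinctBudgets` and its `Reason` values are hypothetical stand-ins for the operator logic and the `EarlyTerminationReason` enum this PR adds.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch combining the row, no-change and time budgets; names are illustrative only.
final class DistinctBudgets {
  enum Reason { NONE, MAX_ROWS, NO_NEW_VALUES, TIME_LIMIT }

  private final long _maxRows;              // Long.MAX_VALUE means "no row budget"
  private final long _maxRowsWithoutChange; // Long.MAX_VALUE means "no no-change budget"
  private final long _timeBudgetNanos;      // TimeUnit saturates, so Long.MAX_VALUE ms means "no time budget"
  private final long _startNanos;
  private final LongSupplier _nanoClock;
  private final Set<Object> _values = new HashSet<>();
  private long _rowsScanned;
  private long _rowsSinceLastNewValue;

  DistinctBudgets(long maxRows, long maxRowsWithoutChange, long maxTimeMs, LongSupplier nanoClock) {
    _maxRows = maxRows;
    _maxRowsWithoutChange = maxRowsWithoutChange;
    _timeBudgetNanos = TimeUnit.MILLISECONDS.toNanos(maxTimeMs);
    _nanoClock = nanoClock;
    _startNanos = nanoClock.getAsLong();
  }

  /** Scans values until the input ends or the first budget is exhausted, and reports which one. */
  Reason scan(Iterable<?> values) {
    for (Object value : values) {
      // Budgets are checked before each row, so finishing the input exactly at a limit is not flagged.
      if (_rowsScanned >= _maxRows) {
        return Reason.MAX_ROWS;
      }
      if (_rowsSinceLastNewValue >= _maxRowsWithoutChange) {
        return Reason.NO_NEW_VALUES;
      }
      // A real operator would likely read the clock once per block, not once per row.
      if (_nanoClock.getAsLong() - _startNanos >= _timeBudgetNanos) {
        return Reason.TIME_LIMIT;
      }
      _rowsSinceLastNewValue = _values.add(value) ? 0 : _rowsSinceLastNewValue + 1;
      _rowsScanned++;
    }
    return Reason.NONE;
  }

  int numDistinct() {
    return _values.size();
  }
}
```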
Response Metadata
When an early stop happens, the corresponding broker response flags (for example `numRowsWithoutChangeInDistinctReached`) are set to `true`. Downstream clients should check these flags before treating the result as complete.
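Here is a minimal client-side sketch, assuming the broker JSON has already been parsed into a `Map` (for example with a JSON library of your choice). Only `numRowsWithoutChangeInDistinctReached` is named in these notes, so the caller passes in the list of flag names. Any other flag name should be taken from the actual `BrokerResponseNative` fields. `PartialDistinctCheck` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical client-side guard over a broker response that has already been parsed into a Map.
final class PartialDistinctCheck {
  private PartialDistinctCheck() {
  }

  /** Returns true if any of the given early-termination flags is present and true. */
  static boolean isPartial(Map<String, ?> brokerResponse, List<String> flagNames) {
    for (String flag : flagNames) {
      if (Boolean.TRUE.equals(brokerResponse.get(flag))) {
        return true;
      }
    }
    return false;
  }
}
```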