
xiangfu0 (Contributor) commented Nov 20, 2025

Changes

Add maxRowsInDistinct/numRowsWithoutChangeInDistinct/maxExecutionTimeMsInDistinct plumbing throughout the distinct pipeline: query options, executors, dictionary plan, and broker/server metadata so early termination is reported consistently

Teach every distinct executor (raw & dictionary, single & multi column) to respect a per-query row budget by clamping block ranges, and surface the remaining allowance to DistinctOperator

Update DistinctOperator and DictionaryBasedDistinctOperator to apply the row budget, compute accurate stats, and attach the standard early-termination reason to the results block

Add lightweight executor-level tests plus a custom integration suite that covers scalar/MV/multi-column distinct queries and both early-termination knobs for single- and multi-stage engines
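The block-range clamping described in the list can be pictured with a small sketch. It assumes a hypothetical `RowBudget` holder with `clampEnd` and `consume` helpers (illustrative names, not the PR's actual classes): before an executor processes doc IDs `[start, end)` of a block, it shrinks `end` so that no more than the remaining budget is read, then charges the rows it actually processed.

```java
// Hypothetical sketch: RowBudget, clampEnd and consume are illustrative names, not Pinot's API.
final class RowBudget {
  private long remaining; // rows still allowed for this query

  RowBudget(long maxRows) {
    if (maxRows <= 0) {
      throw new IllegalArgumentException("maxRows must be positive");
    }
    this.remaining = maxRows;
  }

  /** Returns the exclusive end of [start, end) after clamping the range to the remaining budget. */
  int clampEnd(int start, int end) {
    long allowed = Math.min((long) end - start, remaining);
    return start + (int) allowed;
  }

  /** Charges the rows that were actually processed against the budget. */
  void consume(int numRows) {
    remaining = Math.max(0, remaining - numRows);
  }

  long remaining() {
    return remaining;
  }

  boolean exhausted() {
    return remaining == 0;
  }
}
```

With a budget of 250 rows and 100-row blocks, the first two blocks are read in full, the third is clamped to 50 rows, and every later block becomes an empty range.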

Release Notes

Pinot now lets you short-circuit expensive DISTINCT scans by telling the server when it has examined enough rows or spent enough time. Three broker query options control this behavior:

| Option | Type | Effect |
|---|---|---|
| maxRowsInDistinct | positive integer | Stop reading once this many rows have been processed, even if more rows remain in the segment. Useful when you only need a best-effort subset of keys. |
| numRowsWithoutChangeInDistinct | positive integer | Stop reading after this many additional rows fail to produce a new distinct key. Ideal for low-cardinality columns: once the set stabilizes, you can quit early. |
| maxExecutionTimeMsInDistinct | positive integer (milliseconds) | Stop reading once this much time has elapsed while processing a block. |
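Each option in the table must be a positive integer. The sketch below shows one way such a value might be read from the string-valued query options map. It assumes a hypothetical `DistinctOptions.parsePositiveInt` helper and assumes that an absent option means "no limit"; the PR adds parsing utilities to `QueryOptionsUtils`, but their names and exact behavior are not shown here.

```java
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper for illustration; not the actual QueryOptionsUtils API.
final class DistinctOptions {
  static final String MAX_ROWS = "maxRowsInDistinct";
  static final String NO_CHANGE_ROWS = "numRowsWithoutChangeInDistinct";
  static final String MAX_TIME_MS = "maxExecutionTimeMsInDistinct";

  private DistinctOptions() {
  }

  /** Returns the option's value, or empty if absent; throws if present but not a positive integer. */
  static OptionalInt parsePositiveInt(Map<String, String> queryOptions, String key) {
    String value = queryOptions.get(key);
    if (value == null) {
      return OptionalInt.empty(); // assumed: option not set means no limit
    }
    int parsed;
    try {
      parsed = Integer.parseInt(value.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(key + " must be a positive integer, got: " + value, e);
    }
    if (parsed <= 0) {
      throw new IllegalArgumentException(key + " must be a positive integer, got: " + value);
    }
    return OptionalInt.of(parsed);
  }
}
```

Note that `Integer.parseInt` rejects digit separators, so a value written as `10_000` fails here while `10000` parses.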

All three options apply per server. When any trigger fires, the server marks the response as a partial result (for example, maxRowsInDistinctReached or numRowsWithoutChangeInDistinctReached), so the broker and client can decide whether to trust the result or retry the query.

Sample Queries

Limit the Total Rows Scanned

```sql
SET "maxRowsInDistinct" = 10000;
SELECT DISTINCT city_id FROM trips WHERE status = 'COMPLETED';
```

Stops after 10k rows per server. The result set may contain fewer keys than the column’s full cardinality, but it returns quickly on large tables.

Stop Once the Distinct Set Stabilizes

```sql
SET "numRowsWithoutChangeInDistinct" = 5000;
SELECT DISTINCT tenant_id FROM impressions WHERE date = '2024-09-01';
```

Once the server reads 5k additional rows without discovering a new tenant, it stops and marks the response as numRowsWithoutChangeInDistinctReached=true.
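The no-change trigger amounts to a counter that resets whenever a row yields a new key. Here is a minimal single-column sketch, assuming a hypothetical `NoChangeScanner.scanUntilStable` helper over an in-memory array (the real executors work on blocks of doc IDs): it returns how many rows were read before `numRowsWithoutChange` consecutive rows produced no new key.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the numRowsWithoutChangeInDistinct trigger; not Pinot's implementation.
final class NoChangeScanner {
  private NoChangeScanner() {
  }

  /** Returns the number of rows read; stops early once the distinct set is stable for the given run. */
  static int scanUntilStable(int[] values, int numRowsWithoutChange, Set<Integer> distinct) {
    int rowsSinceLastNewKey = 0;
    for (int i = 0; i < values.length; i++) {
      if (distinct.add(values[i])) {
        rowsSinceLastNewKey = 0; // new key discovered: reset the counter
      } else if (++rowsSinceLastNewKey >= numRowsWithoutChange) {
        return i + 1; // early termination: the set stopped changing
      }
    }
    return values.length; // input exhausted without triggering
  }
}
```

For the input `{1, 2, 1, 2, 3, 3, 3, 3, 4}` with a threshold of 3, the scan stops after 8 rows with keys `{1, 2, 3}`; key `4` is never seen, which is exactly the trade-off the option makes.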

Combine Both Guards

```sql
SET "maxRowsInDistinct" = 20000;
SET "numRowsWithoutChangeInDistinct" = 2000;
SELECT DISTINCT campaign_id FROM clicks WHERE country = 'US';
```

This query will exit as soon as either budget is exhausted.
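To illustrate how the guards compose, the sketch below tracks all three budgets and reports whichever fires first. `StopReason` and `DistinctBudget` are hypothetical names (the PR adds an `EarlyTerminationReason` enum, but its constants are not shown here), and the clock is injected so the time check can be tested deterministically.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch; enum constants and class names are illustrative, not the PR's.
enum StopReason { NONE, MAX_ROWS, NO_CHANGE, TIME }

final class DistinctBudget {
  private final long maxRows;
  private final long maxRowsWithoutChange;
  private final long maxTimeNanos;
  private final long startNanos;
  private final LongSupplier nanoClock;
  private long rowsRead;
  private long rowsSinceNewKey;

  /** Pass Long.MAX_VALUE for any limit that is not set. */
  DistinctBudget(long maxRows, long maxRowsWithoutChange, long maxTimeMs, LongSupplier nanoClock) {
    this.maxRows = maxRows;
    this.maxRowsWithoutChange = maxRowsWithoutChange;
    // TimeUnit conversions saturate, so Long.MAX_VALUE ms stays Long.MAX_VALUE ns (no limit).
    this.maxTimeNanos = TimeUnit.MILLISECONDS.toNanos(maxTimeMs);
    this.nanoClock = nanoClock;
    this.startNanos = nanoClock.getAsLong();
  }

  /** Records one row and returns the first trigger that fired, or NONE to keep scanning. */
  StopReason onRow(boolean newKey) {
    rowsRead++;
    rowsSinceNewKey = newKey ? 0 : rowsSinceNewKey + 1;
    if (rowsRead >= maxRows) {
      return StopReason.MAX_ROWS;
    }
    if (rowsSinceNewKey >= maxRowsWithoutChange) {
      return StopReason.NO_CHANGE;
    }
    // Compare elapsed time rather than absolute deadlines to stay safe across nanoTime overflow.
    if (nanoClock.getAsLong() - startNanos >= maxTimeNanos) {
      return StopReason.TIME;
    }
    return StopReason.NONE;
  }
}
```

In production code the clock would be `System::nanoTime`. Reading the clock on every row is done here only for simplicity; an operator would more likely check it once per block, which matches the per-block wording of the time option.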

Response Metadata

When an early stop happens, you'll see the following broker response fields set to true (partialResult, plus the flag for whichever trigger fired):

  • maxRowsInDistinctReached
  • numRowsWithoutChangeInDistinctReached
  • partialResult

Downstream clients should check these flags before trusting the result.
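A client-side check might look like the following sketch. It assumes the broker response JSON has already been parsed into a `Map` whose top-level keys use the flag names listed above; `DistinctResultCheck` and `earlyStopFlags` are hypothetical names, not part of any Pinot client library.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical client-side helper; assumes the response has already been parsed into a Map.
final class DistinctResultCheck {
  private static final List<String> EARLY_STOP_FLAGS = List.of(
      "partialResult", "maxRowsInDistinctReached", "numRowsWithoutChangeInDistinctReached");

  private DistinctResultCheck() {
  }

  /** Returns the early-termination flags set to true; an empty list means no early stop was reported. */
  static List<String> earlyStopFlags(Map<String, Object> brokerResponse) {
    return EARLY_STOP_FLAGS.stream()
        .filter(flag -> Boolean.TRUE.equals(brokerResponse.get(flag)))
        .collect(Collectors.toList());
  }
}
```

A caller could retry without the options, or surface a warning, whenever the returned list is non-empty.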


Copilot AI left a comment


Pull Request Overview

This PR implements early-termination controls for DISTINCT queries by adding two new query options (maxRowsInDistinct and numRowsWithoutChangeInDistinct) and plumbing them through the distinct pipeline. The key changes enable operators to stop processing early when row budgets are exhausted or when no new distinct values are found, and properly surface these conditions in broker responses and statistics.

Key Changes

  • Added two new query options for controlling DISTINCT early termination
  • Implemented row budget tracking and enforcement in distinct executors
  • Added early termination reason enum and metadata propagation through the query pipeline

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java | Added query option keys for distinct early termination |
| pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/results/BaseResultsBlock.java | Added EarlyTerminationReason enum and metadata support |
| pinot-core/src/main/java/org/apache/pinot/core/query/distinct/DistinctExecutor.java | Added default methods for row budget tracking in distinct executors |
| pinot-core/src/main/java/org/apache/pinot/core/query/distinct/BaseSingleColumnDistinctExecutor.java | Implemented row budget enforcement with proper range clamping |
| pinot-core/src/main/java/org/apache/pinot/core/query/distinct/raw/RawMultiColumnDistinctExecutor.java | Added row budget tracking for multi-column raw executor |
| pinot-core/src/main/java/org/apache/pinot/core/query/distinct/dictionary/DictionaryBasedMultiColumnDistinctExecutor.java | Added row budget tracking for multi-column dictionary executor |
| pinot-core/src/main/java/org/apache/pinot/core/operator/query/DistinctOperator.java | Implemented early termination logic with row budget and no-change detection |
| pinot-core/src/main/java/org/apache/pinot/core/operator/query/DictionaryBasedDistinctOperator.java | Added row budget clamping for dictionary-based distinct queries |
| pinot-common/src/main/java/org/apache/pinot/common/utils/config/QueryOptionsUtils.java | Added utility methods to parse distinct early termination options |
| pinot-common/src/main/java/org/apache/pinot/common/response/broker/BrokerResponseNative.java | Added fields and methods for distinct early termination flags |
| pinot-common/src/main/java/org/apache/pinot/common/response/broker/BrokerResponseNativeV2.java | Added multi-stage support for distinct early termination flags |
| pinot-common/src/main/java/org/apache/pinot/common/datatable/DataTable.java | Added EARLY_TERMINATION_REASON metadata key |
| pinot-core/src/main/java/org/apache/pinot/core/query/reduce/ExecutionStatsAggregator.java | Added aggregation logic for distinct early termination reasons |
| pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/LeafOperator.java | Added stat key handling for distinct early termination |
| pinot-core/src/test/java/org/apache/pinot/queries/DistinctQueriesTest.java | Added unit tests for distinct early termination |
| pinot-core/src/test/java/org/apache/pinot/core/query/distinct/DistinctExecutorEarlyTerminationTest.java | Added executor-level tests for row budget enforcement |
| pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/operator/LeafOperatorTest.java | Added tests for early termination stat recording |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/DistinctQueriesTest.java | Added integration tests covering both single and multi-stage engines |

codecov-commenter commented Nov 20, 2025

Codecov Report

❌ Patch coverage is 57.54986% with 298 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.27%. Comparing base (7ca1984) to head (53cd208).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...perator/query/DictionaryBasedDistinctOperator.java | 39.40% | 86 Missing and 37 partials ⚠️ |
| ...y/distinct/raw/RawMultiColumnDistinctExecutor.java | 34.37% | 34 Missing and 8 partials ⚠️ |
| ...ry/DictionaryBasedMultiColumnDistinctExecutor.java | 35.48% | 23 Missing and 17 partials ⚠️ |
| ...he/pinot/core/operator/query/DistinctOperator.java | 75.00% | 7 Missing and 9 partials ⚠️ |
| ...uery/distinct/DistinctEarlyTerminationContext.java | 81.42% | 7 Missing and 6 partials ⚠️ |
| ...ery/distinct/BaseSingleColumnDistinctExecutor.java | 80.32% | 7 Missing and 5 partials ⚠️ |
| ...common/response/broker/BrokerResponseNativeV2.java | 26.66% | 11 Missing ⚠️ |
| ...tor/combine/merger/DistinctResultsBlockMerger.java | 76.59% | 3 Missing and 8 partials ⚠️ |
| ...e/pinot/common/utils/config/QueryOptionsUtils.java | 58.82% | 6 Missing and 1 partial ⚠️ |
| ...he/pinot/core/query/distinct/DistinctExecutor.java | 0.00% | 7 Missing ⚠️ |

... and 7 more
Additional details and impacted files
```diff
@@             Coverage Diff              @@
##             master   #17247      +/-   ##
============================================
- Coverage     63.28%   63.27%   -0.02%
  Complexity     1476     1476
============================================
  Files          3162     3164       +2
  Lines        188701   189288     +587
  Branches      28877    29050     +173
============================================
+ Hits         119425   119777     +352
- Misses        60017    60152     +135
- Partials       9259     9359     +100
```
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.23% <57.54%> (-0.01%) ⬇️ |
| java-21 | 63.21% <57.54%> (-0.04%) ⬇️ |
| temurin | 63.27% <57.54%> (-0.02%) ⬇️ |
| unittests | 63.27% <57.54%> (-0.02%) ⬇️ |
| unittests1 | 55.62% <57.96%> (+0.02%) ⬆️ |
| unittests2 | 33.92% <1.42%> (-0.10%) ⬇️ |

Flags with carried forward coverage won't be shown.


@xiangfu0 xiangfu0 force-pushed the early-termination-distinct-operator branch 2 times, most recently from 7b07625 to 5e2cbc9 Compare November 21, 2025 08:43
@xiangfu0 xiangfu0 requested a review from Copilot November 21, 2025 08:45

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

@xiangfu0 xiangfu0 force-pushed the early-termination-distinct-operator branch from 5e2cbc9 to 20eb844 Compare November 21, 2025 09:21
Jackie-Jiang (Contributor) left a comment


The concept is okay, but the support needs to be integrated into the combine operator. Currently it is a per-segment limit, which is not very useful.
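The concern above is that a per-segment limit lets a server scan up to the budget in every segment. One way to get a per-server limit, sketched here under the assumption of a hypothetical `SharedRowBudget` that every segment task on the server holds a reference to, is to have each segment atomically reserve rows from a single counter before scanning a block. This is an illustration of the idea, not the combine-operator change the PR actually made.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a row budget shared by all segment tasks on one server.
final class SharedRowBudget {
  private final AtomicLong remaining;

  SharedRowBudget(long maxRows) {
    this.remaining = new AtomicLong(maxRows);
  }

  /** Atomically reserves up to 'requested' rows and returns how many were granted (0 once exhausted). */
  long reserve(long requested) {
    while (true) {
      long current = remaining.get();
      if (current <= 0) {
        return 0;
      }
      long granted = Math.min(current, requested);
      // Retry if another segment thread changed the counter in between.
      if (remaining.compareAndSet(current, current - granted)) {
        return granted;
      }
    }
  }

  boolean exhausted() {
    return remaining.get() <= 0;
  }
}
```

Because reservations come from one counter, the total rows scanned across all segments on the server cannot exceed the budget, regardless of how many segments run in parallel.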

@xiangfu0 xiangfu0 force-pushed the early-termination-distinct-operator branch from 20eb844 to dc1ab0d Compare November 30, 2025 07:48

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.

@xiangfu0 xiangfu0 force-pushed the early-termination-distinct-operator branch 5 times, most recently from 55bf800 to a7214d2 Compare December 3, 2025 04:56
@xiangfu0 xiangfu0 requested a review from Copilot December 10, 2025 03:16

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

@xiangfu0 xiangfu0 force-pushed the early-termination-distinct-operator branch 5 times, most recently from d8910c4 to 7b038c8 Compare December 16, 2025 02:39
@xiangfu0 xiangfu0 force-pushed the early-termination-distinct-operator branch 3 times, most recently from a5f6d80 to d5cb462 Compare December 19, 2025 22:59
@xiangfu0 xiangfu0 changed the title Add distinct early‑termination controls with executor/unit/integration coverage Support DISTINCT early-termination budgets (row/no-change/time) and surface partial-result flags Dec 20, 2025
@xiangfu0 xiangfu0 force-pushed the early-termination-distinct-operator branch from d5cb462 to cd3d015 Compare December 21, 2025 09:10
@xiangfu0 xiangfu0 force-pushed the early-termination-distinct-operator branch 3 times, most recently from dc73f87 to 82adb3c Compare January 4, 2026 03:21
@xiangfu0 xiangfu0 requested a review from Copilot January 4, 2026 04:58

Copilot AI left a comment


Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 7 comments.

@xiangfu0 xiangfu0 force-pushed the early-termination-distinct-operator branch from 82adb3c to a641555 Compare January 4, 2026 15:00
@xiangfu0 xiangfu0 requested a review from Copilot January 4, 2026 15:03

Copilot AI left a comment


Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated no new comments.

Support early termination in combine operator
@xiangfu0 xiangfu0 force-pushed the early-termination-distinct-operator branch from a641555 to 53cd208 Compare January 7, 2026 05:06