Skip to content

[multistage] add maxRowsInJoin, maxRowsInWindow, numGroups to query response#17784

Open
dang-stripe wants to merge 1 commit intoapache:masterfrom
dang-stripe:dang-operator-limits-in-response
Open

[multistage] add maxRowsInJoin, maxRowsInWindow, numGroups to query response#17784
dang-stripe wants to merge 1 commit intoapache:masterfrom
dang-stripe:dang-operator-limits-in-response

Conversation

@dang-stripe
Copy link
Contributor

@dang-stripe dang-stripe commented Feb 27, 2026

summary

this addresses #17565

  • this adds maxRowsInJoin, maxRowsInWindow, numGroups to the query response so we can monitor how close queries are to their respective limits before queries start failing (throw overflow mode) or returning partial results (break overflow mode)
  • these fields are only updated for multistage. it does not update numGroups for single stage.

cc @Jackie-Jiang @gortiz @suvodeep-pyne

testing

  • added unit tests to cover edge cases
  • ran queries on a test cluster + table to verify fields return intended values. note that exchange_rates is a dim table.

# Query Type Mode Query maxRowsInJoin numGroups maxRowsInWindow Error? Result
Baseline
1Simple SELECTnormal
SELECT order_id, amount, currency
FROM orders
ORDER BY order_id
LIMIT 3
000PASS
JOIN
2Hash JOINnormal
SELECT o.order_id, o.amount, r.rate
FROM orders AS o
JOIN exchange_rates AS r
  ON o.currency = r.currency
LIMIT 5
400PASS
3Lookup JOINnormal
SELECT /*+ joinOptions(join_strategy='lookup') */
  o.order_id, o.amount, r.rate
FROM orders AS o
JOIN exchange_rates AS r
  ON o.currency = r.currency
LIMIT 5
0 (lookup bypasses HashJoinOperator — no hash table built)00PASS
4Hash JOINBREAK (limit=10)
SELECT /*+ joinOptions(
    max_rows_in_join='10',
    join_overflow_mode='BREAK') */
  o.order_id, o.amount, r.rate
FROM orders AS o
JOIN exchange_rates AS r
  ON o.currency = r.currency
LIMIT 5
10 (limit reached)00PASS
5Hash JOIN (self-join)THROW (limit=10)
SELECT /*+ joinOptions(
    max_rows_in_join='10',
    join_overflow_mode='THROW') */
  a.order_id, b.amount
FROM orders AS a
JOIN orders AS b
  ON a.customer_id = b.customer_id
LIMIT 5
252500245 — resource limit exceededPASS
GROUP BY
6GROUP BY (high card.)normal
SELECT order_id, COUNT(*) as cnt
FROM orders
GROUP BY order_id
ORDER BY cnt DESC
LIMIT 5
025050PASS
7GROUP BY (low card.)normal
SELECT currency, COUNT(*) as cnt,
       SUM(amount) as total_amount
FROM orders
GROUP BY currency
010PASS
8Global aggregatenormal
SELECT SUM(amount) as total,
       AVG(amount) as avg_amount
FROM orders
000PASS
9GROUP BYBREAK (limit=100)
SELECT /*+ aggOptions(
    num_groups_limit='100') */
  order_id, COUNT(*) as cnt
FROM orders
GROUP BY order_id
ORDER BY cnt DESC
LIMIT 5
0100 (limit reached)0PASS
10GROUP BYTHROW (limit=10)
SELECT /*+ aggOptions(
    num_groups_limit='10',
    error_on_num_groups_limit='true') */
  order_id, COUNT(*) as cnt
FROM orders
GROUP BY order_id
ORDER BY cnt DESC
LIMIT 5
01001000 — query execution errorPASS
Window
11Window ROW_NUMBER()normal
SELECT order_id, amount,
  ROW_NUMBER() OVER (ORDER BY amount DESC) as rn
FROM orders
LIMIT 5
005000PASS
12WindowBREAK (limit=100)
SET maxRowsInWindow = 100;
SET windowOverflowMode = 'BREAK';
SELECT order_id, amount,
ROW_NUMBER() OVER (ORDER BY amount DESC) as rn
FROM orders
LIMIT 5
00100 (limit reached)PASS
13WindowTHROW (limit=10)
SET maxRowsInWindow = 10;
SELECT order_id, amount,
ROW_NUMBER() OVER (ORDER BY amount DESC) as rn
FROM orders
LIMIT 5
005000245 — resource limit exceededPASS
Combined operators
14JOIN + GROUP BYnormal
SELECT r.currency, COUNT(*) as cnt,
       SUM(o.amount) as total
FROM orders AS o
JOIN exchange_rates AS r
  ON o.currency = r.currency
GROUP BY r.currency
310PASS
15JOIN + GROUP BY + Windownormal
WITH grouped AS (
  SELECT o.currency, r.rate,
         COUNT(*) as cnt,
         SUM(o.amount) as total
  FROM orders AS o
  JOIN exchange_rates AS r
    ON o.currency = r.currency
  GROUP BY o.currency, r.rate
)
SELECT currency, rate, cnt, total,
  ROW_NUMBER() OVER (ORDER BY total DESC) as rn
FROM grouped
311PASS

@dang-stripe dang-stripe changed the title add maxRowsInJoin, maxRowsInWindow, numGroups to query response [multistage] add maxRowsInJoin, maxRowsInWindow, numGroups to query response Feb 27, 2026
@codecov-commenter
Copy link

codecov-commenter commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 71.87500% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.22%. Comparing base (325ac2a) to head (1720686).
⚠️ Report is 26 commits behind head on master.

Files with missing lines Patch % Lines
...common/response/broker/BrokerResponseNativeV2.java 58.33% 5 Missing ⚠️
...t/common/response/broker/BrokerResponseNative.java 0.00% 3 Missing ⚠️
...not/query/runtime/operator/MultiStageOperator.java 66.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17784      +/-   ##
============================================
+ Coverage     63.17%   63.22%   +0.04%     
  Complexity     1454     1454              
============================================
  Files          3176     3183       +7     
  Lines        191025   191459     +434     
  Branches      29206    29273      +67     
============================================
+ Hits         120688   121044     +356     
- Misses        60920    60979      +59     
- Partials       9417     9436      +19     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.19% <71.87%> (+0.04%) ⬆️
java-21 63.18% <71.87%> (+0.04%) ⬆️
temurin 63.22% <71.87%> (+0.04%) ⬆️
unittests 63.21% <71.87%> (+0.04%) ⬆️
unittests1 55.61% <71.87%> (+0.07%) ⬆️
unittests2 34.11% <0.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@yashmayya yashmayya added observability multi-stage Related to the multi-stage query engine labels Feb 27, 2026
@gortiz
Copy link
Contributor

gortiz commented Feb 27, 2026

I would like to look for other approaches instead of this one. I have the feeling we are bloating up the response with too many attributes that are either circumstancial or derivable from the query stats.

I understand this means to move the processing to the client side, which may not be the best, but we can have other alternatives like some optional response decorator that takes tle stageStats and create new attributes based on that. With a system like this you can create your own decorators without increasing the already too large metadata response

@dang-stripe
Copy link
Contributor Author

@gortiz replicating the logic for how these limits are enforced on the client seems like a significant burden. i believe stage stats is not a stable API yet either meaning this can break across releases. if that is the expectation, ideally there should be a canonical library for processing stage stats.

i feel that the current approach supplements the existing fields in the query response (maxRowsInJoinReached, maxRowsInWindowReached) well without having to understand the complexity of stage stats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

multi-stage Related to the multi-stage query engine observability

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants