Skip to content

Sample per-block resource usage in ProjectionOperator for OOM accounting#18620

Closed
yashmayya wants to merge 2 commits into
apache:masterfrom
yashmayya:add-groupby-leaf-oom-sampling
Closed

Sample per-block resource usage in ProjectionOperator for OOM accounting#18620
yashmayya wants to merge 2 commits into
apache:masterfrom
yashmayya:add-groupby-leaf-oom-sampling

Conversation

@yashmayya
Copy link
Copy Markdown
Contributor

@yashmayya yashmayya commented May 28, 2026

Summary

Scan-based query operators (aggregation, group-by, distinct, selection) pull data block-by-block through the shared ProjectionOperator. The projected block is where per-block row materialization happens and, for keyed operators, where the downstream hash table grows. Previously this shared path did not sample resource usage, so a long-running scan could allocate steadily while the query's tracked memory footprint lagged behind actual allocation — weakening per-query attribution for the OOM-protection framework.

This adds a single QueryThreadContext.sampleUsage() call in ProjectionOperator#getNextBlock(), just before each projected block is returned. Because every scan-based operator shares this projection path, instrumenting it once keeps OOM accounting fresh across all of them, once per block (MAX_DOC_PER_CALL rows), with negligible overhead.

Notes

  • Termination is intentionally not re-checked here: BaseOperator#nextBlock() already performs a per-block termination check on every operator, so cancelled / timed-out queries already bail out at block boundaries.
  • The child DocIdSetOperator also samples once per block, so this is a second sample on the common path — intentionally so: the docIdSet-level sample is taken before the block's column values are materialized, whereas this one is taken after DataBlockCache#initNewBlock, capturing the projection-side allocation. Sampling at the shared projection layer also gives uniform per-block coverage regardless of which BaseDocIdSetOperator feeds it.

@yashmayya yashmayya requested a review from Jackie-Jiang May 28, 2026 20:58
@yashmayya yashmayya added query Related to query processing oom-protection Related to out-of-memory protection mechanisms labels May 28, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.41%. Comparing base (aee5a1f) to head (b165b89).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18620      +/-   ##
============================================
- Coverage     64.45%   64.41%   -0.05%     
  Complexity     1137     1137              
============================================
  Files          3337     3337              
  Lines        206067   206068       +1     
  Branches      32127    32127              
============================================
- Hits         132816   132729      -87     
- Misses        62599    62710     +111     
+ Partials      10652    10629      -23     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 64.41% <100.00%> (-0.05%) ⬇️
temurin 64.41% <100.00%> (-0.05%) ⬇️
unittests 64.40% <100.00%> (-0.05%) ⬇️
unittests1 56.83% <100.00%> (+<0.01%) ⬆️
unittests2 36.93% <0.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@yashmayya yashmayya changed the title Add OOM sampling and termination checks to leaf group-by aggregation Sample per-block resource usage in ProjectionOperator for OOM accounting May 28, 2026
@yashmayya
Copy link
Copy Markdown
Contributor Author

Closing this. On reflection, the standard scan path (ProjectPlanNode -> DocIdSetPlanNode -> DocIdSetOperator) already calls QueryThreadContext.sampleUsage() once per block, and group-by / aggregation / distinct / selection all go through it. The projection-level sample fires at essentially the same point in each block iteration (column materialization is lazy, happening later when the executor reads the block), so it adds no freshness over the existing per-block sample on the main path -- it would only backfill the bounded BitmapDocIdSetOperator-fed paths. Not worth the redundancy.

@yashmayya yashmayya closed this May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

oom-protection Related to out-of-memory protection mechanisms query Related to query processing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants