Skip to content

[Multi-stage] Support null in aggregate and filter#10799

Merged
Jackie-Jiang merged 1 commit intoapache:masterfrom
Jackie-Jiang:multi_stage_null_handling
May 25, 2023
Merged

[Multi-stage] Support null in aggregate and filter#10799
Jackie-Jiang merged 1 commit intoapache:masterfrom
Jackie-Jiang:multi_stage_null_handling

Conversation

@Jackie-Jiang
Copy link
Copy Markdown
Contributor

No description provided.

@Jackie-Jiang Jackie-Jiang added feature multi-stage Related to the multi-stage query engine labels May 23, 2023
@Jackie-Jiang Jackie-Jiang requested a review from walterddr May 23, 2023 23:08
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 23, 2023

Codecov Report

Merging #10799 (4a06147) into master (c4ee122) will decrease coverage by 0.05%.
The diff coverage is 56.86%.

@@             Coverage Diff              @@
##             master   #10799      +/-   ##
============================================
- Coverage     70.38%   70.33%   -0.05%     
- Complexity     6493     6497       +4     
============================================
  Files          2158     2158              
  Lines        116023   116060      +37     
  Branches      17563    17561       -2     
============================================
- Hits          81657    81628      -29     
- Misses        28676    28726      +50     
- Partials       5690     5706      +16     
Flag Coverage Δ
integration1 24.18% <0.00%> (+0.06%) ⬆️
integration2 23.75% <0.00%> (-0.02%) ⬇️
unittests1 67.86% <56.86%> (-0.07%) ⬇️
unittests2 13.66% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ry/runtime/operator/utils/FunctionInvokeUtils.java 100.00% <ø> (ø)
...query/runtime/operator/utils/AggregationUtils.java 74.35% <34.48%> (-22.20%) ⬇️
...inot/query/runtime/operator/AggregateOperator.java 78.12% <48.71%> (-14.19%) ⬇️
...query/runtime/operator/operands/FilterOperand.java 73.77% <66.66%> (-1.23%) ⬇️
...che/pinot/query/planner/logical/RexExpression.java 95.52% <100.00%> (+1.67%) ⬆️
...lanner/partitioning/FieldSelectionKeySelector.java 62.96% <100.00%> (+2.96%) ⬆️
...uery/runtime/operator/WindowAggregateOperator.java 95.28% <100.00%> (ø)
...ry/runtime/operator/operands/TransformOperand.java 67.85% <100.00%> (-5.22%) ⬇️

... and 34 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Jackie-Jiang Jackie-Jiang force-pushed the multi_stage_null_handling branch from c3e69c2 to f3d2259 Compare May 24, 2023 00:05
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, today the leaf node leverages the nullValueBitmap, but this is not available in the intermediate stages.

Example:

    if (_nullHandlingEnabled) {
      // TODO: avoid the null bitmap check when it is null or empty for better performance.
      RoaringBitmap nullBitmap = blockValSet.getNullBitmap();
      if (nullBitmap == null) {
        nullBitmap = new RoaringBitmap();
      }
      aggregateNullHandlingEnabled(length, aggregationResultHolder, blockValSet, nullBitmap);
      return;
    }

Now we usually split the LogicalAggregate into an intermediate stage (which may or may not run on leaf) and a final stage (which probably won't run on leaf as it needs to do the final aggregations). Does this mean that the nullValueBitmap won't be leveraged if this intermediate stage doesn't run in the leaf? will that affect query results?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to piece out what this comment is about, here is my understanding:

  • intermediate stage is run on List<Object[]> so it can naturally represent null with the object array
  • what you mentioned with nullValueBitmap is being used in 2 places
    • in leaf operators that will read it out when enableNullHandling flag is turned on
    • in data table ser/de --> this also means when intermediate stage received the payload, it will deserialized it into List<Object[]> with the null put right into places.

please let me know if this answers your question

Copy link
Copy Markdown
Contributor

@somandal somandal May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting back on this! So to make sure I understand correctly, when the data is sent from leaf to intermediate, the null values will be correctly set to null in the List<Object[]>? Thus making a simple null check like this good enough?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. If null handling is enabled, null should be properly set into the Object[], and the null bitmap is already dropped

Copy link
Copy Markdown
Contributor

@walterddr walterddr May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the data is sent from leaf to intermediate, the null values will be correctly set to null in the List<Object[]>? Thus making a simple null check like this good enough?

this statement is not entirely true.

  • when sent, the data is still Ser/De into bitmap (see DataBlock)
  • when used, it is pulled from TransferableBlock which has 2 member variable format: (1) DataBlock _dataBlock format and (2) List<Object[]> _container
  • it is when the lazy eval of the _container getter that puts the null in the right place

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for explaining in detail! Checking out that part of the code

@Jackie-Jiang Jackie-Jiang force-pushed the multi_stage_null_handling branch from f3d2259 to 4a06147 Compare May 24, 2023 23:55
@Jackie-Jiang Jackie-Jiang merged commit fcaebab into apache:master May 25, 2023
@Jackie-Jiang Jackie-Jiang deleted the multi_stage_null_handling branch May 25, 2023 17:17
@xiangfu0 xiangfu0 added the feature New functionality label Mar 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature New functionality multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants