[Multi-stage] Support null in aggregate and filter #10799
Jackie-Jiang merged 1 commit into apache:master
Codecov Report
@@ Coverage Diff @@
## master #10799 +/- ##
============================================
- Coverage 70.38% 70.33% -0.05%
- Complexity 6493 6497 +4
============================================
Files 2158 2158
Lines 116023 116060 +37
Branches 17563 17561 -2
============================================
- Hits 81657 81628 -29
- Misses 28676 28726 +50
- Partials 5690 5706 +16
... and 34 files with indirect coverage changes
c3e69c2 to f3d2259
Just curious, today the leaf node leverages the nullValueBitmap, but this is not available in the intermediate stages.
Example:
if (_nullHandlingEnabled) {
  // TODO: avoid the null bitmap check when it is null or empty for better performance.
  RoaringBitmap nullBitmap = blockValSet.getNullBitmap();
  if (nullBitmap == null) {
    nullBitmap = new RoaringBitmap();
  }
  aggregateNullHandlingEnabled(length, aggregationResultHolder, blockValSet, nullBitmap);
  return;
}
Now we usually split the LogicalAggregate into an intermediate stage (which may or may not run on the leaf) and a final stage (which probably won't run on the leaf, as it needs to do the final aggregations). Does this mean that the nullValueBitmap won't be leveraged if the intermediate stage doesn't run on the leaf? Will that affect query results?
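To make the question concrete, here is a rough sketch of what a bitmap-driven aggregation at the leaf amounts to. Everything in it is hypothetical: LeafNullAwareSum is not a Pinot class, and java.util.BitSet stands in for RoaringBitmap so the snippet has no dependencies.

```java
import java.util.BitSet;

// Hypothetical stand-in for a leaf-stage SUM with null handling enabled.
// BitSet replaces RoaringBitmap here only to keep the sketch dependency-free.
final class LeafNullAwareSum {
  // Returns null when every value in the block is null,
  // otherwise the sum of the values whose doc id is not in the null bitmap.
  static Double sum(double[] values, BitSet nullBitmap) {
    double total = 0;
    boolean sawNonNull = false;
    for (int i = 0; i < values.length; i++) {
      if (nullBitmap.get(i)) {
        continue; // the stored value is only a placeholder, skip it
      }
      total += values[i];
      sawNonNull = true;
    }
    if (!sawNonNull) {
      return null;
    }
    return total;
  }
}
```

The point of the sketch is that at the leaf the values array alone cannot tell a real value from a null placeholder; only the bitmap can.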
Trying to piece together what this comment is about; here is my understanding:
- The intermediate stage runs on List<Object[]>, so it can naturally represent null with the object array.
- The nullValueBitmap you mentioned is used in 2 places:
  - in leaf operators, which read it out when the enableNullHandling flag is turned on
  - in data table ser/de --> this also means that when the intermediate stage receives the payload, it deserializes it into List<Object[]> with the nulls put right into place.
Please let me know if this answers your question.
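As an illustration of the first point, a null-skipping aggregation over row-oriented data could look roughly like this. It is a sketch only: IntermediateNullAwareSum and its sum method are made-up names, not the operator code in this PR.

```java
import java.util.List;

// Hypothetical sketch of an intermediate-stage SUM over row-oriented data.
// Nulls are stored directly in the Object[] rows, so no bitmap is consulted.
final class IntermediateNullAwareSum {
  // Sums the column at colIdx, skipping null cells.
  // Returns null if the column has no non-null value.
  static Double sum(List<Object[]> rows, int colIdx) {
    Double total = null;
    for (Object[] row : rows) {
      Object value = row[colIdx];
      if (value == null) {
        continue; // a plain null check is enough at this level
      }
      double v = ((Number) value).doubleValue();
      total = (total == null) ? v : total + v;
    }
    return total;
  }
}
```

For example, rows {"a", 1}, {"b", null}, {"c", 2.5} aggregated on column 1 would give 3.5, with the null cell ignored.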
Thanks for getting back on this! So to make sure I understand correctly, when the data is sent from leaf to intermediate, the null values will be correctly set to null in the List<Object[]>? Thus making a simple null check like this good enough?
Correct. If null handling is enabled, null should be properly set in the Object[], and the null bitmap has already been dropped.
> when the data is sent from leaf to intermediate, the null values will be correctly set to null in the List<Object[]>? Thus making a simple null check like this good enough?
This statement is not entirely true.
- When sent, the data is still ser/de'd into a bitmap (see DataBlock).
- When used, it is pulled from TransferableBlock, which holds the data in 2 member variable formats: (1) DataBlock _dataBlock and (2) List<Object[]> _container.
- It is the lazy evaluation in the _container getter that puts the nulls in the right place.
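A simplified sketch of that lazy materialization, under loud assumptions: LazyRowBlock is a hypothetical class, not Pinot's TransferableBlock, the column-major Object[][] layout is invented for illustration, and java.util.BitSet stands in for the serialized null bitmaps.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical, simplified stand-in for a block that keeps a serialized form
// plus a lazily built row container. Not Pinot's actual TransferableBlock.
final class LazyRowBlock {
  private final Object[][] columns;   // column-major values; null slots hold placeholders
  private final BitSet[] nullBitmaps; // one bitmap per column marking the null row ids
  private final int numRows;
  private List<Object[]> container;   // built on first access, then reused

  LazyRowBlock(Object[][] columns, BitSet[] nullBitmaps, int numRows) {
    this.columns = columns;
    this.nullBitmaps = nullBitmaps;
    this.numRows = numRows;
  }

  // Converts the columnar form to rows on first call, writing null wherever
  // the column's bitmap marks the row; later calls return the cached list.
  List<Object[]> getContainer() {
    if (container == null) {
      List<Object[]> rows = new ArrayList<>(numRows);
      for (int r = 0; r < numRows; r++) {
        Object[] row = new Object[columns.length];
        for (int c = 0; c < columns.length; c++) {
          row[c] = nullBitmaps[c].get(r) ? null : columns[c][r];
        }
        rows.add(row);
      }
      container = rows;
    }
    return container;
  }
}
```

Under this reading, a consumer that only ever calls the container getter sees plain nulls in the Object[] rows and never touches a bitmap, even though the bitmap is what travelled over the wire.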
Got it, thanks for explaining in detail! Checking out that part of the code
f3d2259 to 4a06147
No description provided.