fix data table vs. block format issue by walterddr · Pull Request #8860 · apache/pinot

walterddr · 2022-06-08T06:08:40Z

previously multi-stage engine assumes no backward compatibility issue with existing pinot query server. however we are still depending on pinot-server to return DataTableImplV3 byte format to intermediate multi-stage engine.

To avoid duplicate ser/de we decided to backward compatible support these old ser/de formats. namely, the old 8-bytes floating point.

also, this:

doesn't work with OBJECT/BYTES_ARRAY type
doesn't support BYTES type as old bytes are converted into hexString and encoded as STRING column.

codecov-commenter · 2022-06-08T06:45:13Z

Codecov Report

Merging #8860 (b681aa9) into master (81bda1d) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #8860      +/-   ##
============================================
+ Coverage     69.85%   69.86%   +0.01%     
- Complexity     4669     4671       +2     
============================================
  Files          1803     1803              
  Lines         93735    93737       +2     
  Branches      13932    13932              
============================================
+ Hits          65474    65485      +11     
+ Misses        23730    23720      -10     
- Partials       4531     4532       +1

Flag	Coverage Δ
integration1	`26.44% <0.00%> (+<0.01%)`	⬆️
integration2	`24.88% <0.00%> (-0.31%)`	⬇️
unittests1	`66.57% <100.00%> (+0.05%)`	⬆️
unittests2	`15.46% <100.00%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...he/pinot/core/common/datatable/DataTableUtils.java	`78.76% <ø> (ø)`
...va/org/apache/pinot/query/runtime/QueryRunner.java	`86.66% <ø> (ø)`
...e/pinot/query/runtime/blocks/DataBlockBuilder.java	`74.46% <100.00%> (+0.21%)`	⬆️
...che/pinot/query/runtime/blocks/DataBlockUtils.java	`80.95% <100.00%> (ø)`
...data/manager/realtime/DefaultSegmentCommitter.java	`0.00% <0.00%> (-80.00%)`	⬇️
...er/api/resources/LLCSegmentCompletionHandlers.java	`43.56% <0.00%> (-18.82%)`	⬇️
...nction/DistinctCountBitmapAggregationFunction.java	`47.15% <0.00%> (-12.44%)`	⬇️
...data/manager/realtime/SegmentCommitterFactory.java	`88.23% <0.00%> (-11.77%)`	⬇️
...altime/ServerSegmentCompletionProtocolHandler.java	`51.88% <0.00%> (-6.61%)`	⬇️
.../filter/predicate/InPredicateEvaluatorFactory.java	`70.45% <0.00%> (-4.55%)`	⬇️
... and 29 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81bda1d...b681aa9. Read the comment docs.

Jackie-Jiang

There are several inefficiency in the current data table implementation. I'd suggest making a V4 to address the inefficiency instead of making the new code follow the bad format.

Things we want to improve:

Store float as 4 bytes
Do not serialize bytes to string
For each column, add an optional bitmap to store the nulls for the column

walterddr · 2022-06-08T22:42:26Z

There are several inefficiency in the current data table implementation. I'd suggest making a V4 to address the inefficiency instead of making the new code follow the bad format.

Things we want to improve:

Store float as 4 bytes

Do not serialize bytes to string

For each column, add an optional bitmap to store the nulls for the column

ok. in this case I would suggest we add FLOAT/BYTES to the ignore list and still merge the integration test PR.

Jackie-Jiang · 2022-06-09T00:07:25Z

ok. in this case I would suggest we add FLOAT/BYTES to the ignore list and still merge the integration test PR.

How much effort does it take to make the new engine work on a new data table format if we add one? The goal is to avoid spending a lot of effort for backward compatibility when we upgrade the new format. Since the engine is new added, it would be good to make it directly work on the latest format

walterddr · 2022-06-09T00:11:05Z

ok. in this case I would suggest we add FLOAT/BYTES to the ignore list and still merge the integration test PR.

How much effort does it take to make the new engine work on a new data table format if we add one? The goal is to avoid spending a lot of effort for backward compatibility when we upgrade the new format. Since the engine is new added, it would be good to make it directly work on the latest format

no effort. as it is already assuming new format.
the problem is backward compatibility is not handled correctly when move from V2 to V3 --> the data format are still managed in a common Utils class. I will refactor our the common util class and make 2 versions one for V2, V3 and one for V4.

fix data table vs. block format issue

b681aa9

Jackie-Jiang reviewed Jun 8, 2022

View reviewed changes

walterddr closed this Jun 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix data table vs. block format issue#8860

fix data table vs. block format issue#8860
walterddr wants to merge 1 commit intoapache:masterfrom
walterddr:multi_stage_query_engine_fix_datatable

walterddr commented Jun 8, 2022 •

edited

Loading

Uh oh!

codecov-commenter commented Jun 8, 2022 •

edited

Loading

Uh oh!

Jackie-Jiang left a comment

Uh oh!

walterddr commented Jun 8, 2022

Uh oh!

Jackie-Jiang commented Jun 9, 2022

Uh oh!

walterddr commented Jun 9, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

walterddr commented Jun 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Jun 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Jackie-Jiang left a comment

Choose a reason for hiding this comment

Uh oh!

walterddr commented Jun 8, 2022

Uh oh!

Jackie-Jiang commented Jun 9, 2022

Uh oh!

walterddr commented Jun 9, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

walterddr commented Jun 8, 2022 •

edited

Loading

codecov-commenter commented Jun 8, 2022 •

edited

Loading