Skip to content

fix data table vs. block format issue#8860

Closed
walterddr wants to merge 1 commit intoapache:masterfrom
walterddr:multi_stage_query_engine_fix_datatable
Closed

fix data table vs. block format issue#8860
walterddr wants to merge 1 commit intoapache:masterfrom
walterddr:multi_stage_query_engine_fix_datatable

Conversation

@walterddr
Copy link
Contributor

@walterddr walterddr commented Jun 8, 2022

previously multi-stage engine assumes no backward compatibility issue with existing pinot query server. however we are still depending on pinot-server to return DataTableImplV3 byte format to intermediate multi-stage engine.

To avoid duplicate ser/de we decided to backward compatible support these old ser/de formats. namely, the old 8-bytes floating point.

also, this:

  • doesn't work with OBJECT/BYTES_ARRAY type
  • doesn't support BYTES type as old bytes are converted into hexString and encoded as STRING column.

@codecov-commenter
Copy link

codecov-commenter commented Jun 8, 2022

Codecov Report

Merging #8860 (b681aa9) into master (81bda1d) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #8860      +/-   ##
============================================
+ Coverage     69.85%   69.86%   +0.01%     
- Complexity     4669     4671       +2     
============================================
  Files          1803     1803              
  Lines         93735    93737       +2     
  Branches      13932    13932              
============================================
+ Hits          65474    65485      +11     
+ Misses        23730    23720      -10     
- Partials       4531     4532       +1     
Flag Coverage Δ
integration1 26.44% <0.00%> (+<0.01%) ⬆️
integration2 24.88% <0.00%> (-0.31%) ⬇️
unittests1 66.57% <100.00%> (+0.05%) ⬆️
unittests2 15.46% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...he/pinot/core/common/datatable/DataTableUtils.java 78.76% <ø> (ø)
...va/org/apache/pinot/query/runtime/QueryRunner.java 86.66% <ø> (ø)
...e/pinot/query/runtime/blocks/DataBlockBuilder.java 74.46% <100.00%> (+0.21%) ⬆️
...che/pinot/query/runtime/blocks/DataBlockUtils.java 80.95% <100.00%> (ø)
...data/manager/realtime/DefaultSegmentCommitter.java 0.00% <0.00%> (-80.00%) ⬇️
...er/api/resources/LLCSegmentCompletionHandlers.java 43.56% <0.00%> (-18.82%) ⬇️
...nction/DistinctCountBitmapAggregationFunction.java 47.15% <0.00%> (-12.44%) ⬇️
...data/manager/realtime/SegmentCommitterFactory.java 88.23% <0.00%> (-11.77%) ⬇️
...altime/ServerSegmentCompletionProtocolHandler.java 51.88% <0.00%> (-6.61%) ⬇️
.../filter/predicate/InPredicateEvaluatorFactory.java 70.45% <0.00%> (-4.55%) ⬇️
... and 29 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81bda1d...b681aa9. Read the comment docs.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several inefficiency in the current data table implementation. I'd suggest making a V4 to address the inefficiency instead of making the new code follow the bad format.

Things we want to improve:

  • Store float as 4 bytes
  • Do not serialize bytes to string
  • For each column, add an optional bitmap to store the nulls for the column

@walterddr
Copy link
Contributor Author

There are several inefficiency in the current data table implementation. I'd suggest making a V4 to address the inefficiency instead of making the new code follow the bad format.

Things we want to improve:

  • Store float as 4 bytes
  • Do not serialize bytes to string
  • For each column, add an optional bitmap to store the nulls for the column

ok. in this case I would suggest we add FLOAT/BYTES to the ignore list and still merge the integration test PR.

@Jackie-Jiang
Copy link
Contributor

ok. in this case I would suggest we add FLOAT/BYTES to the ignore list and still merge the integration test PR.

How much effort does it take to make the new engine work on a new data table format if we add one? The goal is to avoid spending a lot of effort for backward compatibility when we upgrade the new format. Since the engine is new added, it would be good to make it directly work on the latest format

@walterddr
Copy link
Contributor Author

ok. in this case I would suggest we add FLOAT/BYTES to the ignore list and still merge the integration test PR.

How much effort does it take to make the new engine work on a new data table format if we add one? The goal is to avoid spending a lot of effort for backward compatibility when we upgrade the new format. Since the engine is new added, it would be good to make it directly work on the latest format

no effort. as it is already assuming new format.
the problem is backward compatibility is not handled correctly when move from V2 to V3 --> the data format are still managed in a common Utils class. I will refactor our the common util class and make 2 versions one for V2, V3 and one for V4.

@walterddr walterddr closed this Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants