fix data table vs. block format issue #8860
Codecov Report
@@ Coverage Diff @@
## master #8860 +/- ##
============================================
+ Coverage 69.85% 69.86% +0.01%
- Complexity 4669 4671 +2
============================================
Files 1803 1803
Lines 93735 93737 +2
Branches 13932 13932
============================================
+ Hits 65474 65485 +11
+ Misses 23730 23720 -10
- Partials 4531 4532 +1
Jackie-Jiang
left a comment
There are several inefficiencies in the current data table implementation. I'd suggest adding a V4 to address them instead of making the new code follow the bad format.
Things we want to improve:
- Store float as 4 bytes
- Do not serialize bytes to string
- For each column, add an optional bitmap to store the nulls for the column
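A minimal sketch of what the three improvements above could look like at the byte level. This is not the actual V4 layout: the class and method names are hypothetical, the length-prefixed layout for bytes is an assumption, and `java.util.BitSet` stands in for whatever bitmap implementation a real format would use.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

// Hypothetical helper, for illustration only; not part of Pinot.
final class ColumnEncodingSketch {
  private ColumnEncodingSketch() {}

  // Store a float in 4 bytes instead of widening it to 8.
  static byte[] encodeFloat(float value) {
    return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
  }

  // Store raw bytes with a 4-byte length prefix (assumed layout),
  // instead of converting them to a string first.
  static byte[] encodeBytes(byte[] value) {
    return ByteBuffer.allocate(Integer.BYTES + value.length)
        .putInt(value.length)
        .put(value)
        .array();
  }

  // Read back a value written by encodeBytes.
  static byte[] decodeBytes(byte[] encoded) {
    ByteBuffer buffer = ByteBuffer.wrap(encoded);
    byte[] value = new byte[buffer.getInt()];
    buffer.get(value);
    return value;
  }

  // Build a per-column bitmap with one bit set for every null row.
  static BitSet nullBitmap(Object[] column) {
    BitSet nulls = new BitSet(column.length);
    for (int row = 0; row < column.length; row++) {
      if (column[row] == null) {
        nulls.set(row);
      }
    }
    return nulls;
  }
}
```

With this layout a float costs `Float.BYTES` per value, a bytes value costs its own length plus the 4-byte prefix, and a column with no nulls can skip the bitmap entirely.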
OK. In this case I would suggest we add FLOAT/BYTES to the ignore list and still merge the integration test PR.
How much effort does it take to make the new engine work on a new data table format if we add one? The goal is to avoid spending a lot of effort on backward compatibility when we upgrade to the new format. Since the engine is newly added, it would be good to make it work directly on the latest format.
No effort, as it already assumes the new format.
Previously, the multi-stage engine assumed no backward-compatibility issues with the existing Pinot query server. However, we still depend on pinot-server to return the DataTableImplV3 byte format to the intermediate multi-stage engine. To avoid duplicate ser/de, we decided to support these old ser/de formats for backward compatibility, namely the old 8-byte floating point.
This also applies to:
- the OBJECT/BYTES_ARRAY type
- the BYTES type, as old bytes are converted into a hex string and encoded as a STRING column.
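To make the legacy behaviour described above concrete, here is a rough sketch of the two old encodings. It assumes that "8-byte floating point" means the float is widened to a double on write and narrowed on read, and that bytes travel as a lowercase hex string. The class and method names are hypothetical and are not taken from DataTableImplV3.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the old ser/de behaviour; not Pinot code.
final class LegacyFormatSketch {
  private LegacyFormatSketch() {}

  // Assumed legacy layout: a float is widened and written as 8 bytes.
  static byte[] encodeFloatLegacy(float value) {
    return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
  }

  // Reading narrows the stored double back to a float.
  static float decodeFloatLegacy(byte[] encoded) {
    return (float) ByteBuffer.wrap(encoded).getDouble();
  }

  // Bytes are converted to a hex string so they can ride in a STRING column.
  static String toHexString(byte[] bytes) {
    StringBuilder hex = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      hex.append(Character.forDigit((b >> 4) & 0xF, 16));
      hex.append(Character.forDigit(b & 0xF, 16));
    }
    return hex.toString();
  }

  // The reader has to parse the hex string back into bytes.
  static byte[] fromHexString(String hex) {
    byte[] bytes = new byte[hex.length() / 2];
    for (int i = 0; i < bytes.length; i++) {
      bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
    }
    return bytes;
  }
}
```

Under these assumptions each float costs twice the space it needs, and each bytes value costs two characters per byte plus a conversion on both sides.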