Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Parquet] Benchmark and maybe Override DeltaBitPackEncoder Defaults #34536

Closed
mapleFU opened this issue Mar 11, 2023 · 6 comments · Fixed by #34632
Closed

[C++][Parquet] Benchmark and maybe Override DeltaBitPackEncoder Defaults #34536

mapleFU opened this issue Mar 11, 2023 · 6 comments · Fixed by #34632

Comments

@mapleFU
Copy link
Member

mapleFU commented Mar 11, 2023

Describe the enhancement requested

Parquet C++ DELTA_BINARY_PACKED uses:

  • Block size - the number of values per block (default: 128)
  • Miniblock count - the number of miniblocks per block (default: 4)

As mentioned in apache/arrow-rs#2282 . We may benchmark it and make it suitable for different bitwidth.

Component(s)

C++, Parquet

@mapleFU mapleFU changed the title [C++][Parquet] Override DeltaBitPackEncoder Defaults [C++][Parquet] Benchmark and maybe Override DeltaBitPackEncoder Defaults Mar 11, 2023
@mapleFU
Copy link
Member Author

mapleFU commented Mar 11, 2023

cc @rok

@rok
Copy link
Member

rok commented Mar 11, 2023

Benchmarking with overridden defaults makes a lot of sense yes! Do you think different bitwidth ranges (of random data) have different optimal results? If they have strong influence we might want to document that.

@mapleFU
Copy link
Member Author

mapleFU commented Mar 13, 2023

As arrow-rs says, it use DeltaBinaryPacked as default encoding as default encoder for integers. And it found out that, for uniform distributed numbers, wider bitwidth should be much better.

I'll benchmark delta binary packed for different size and different input distribution in x86_64 and neon machine, and find out if we should make it larger. To be honest, the best way should be adaptive encoding, but I'm not so familiar with encoding algorithms

By the way, tustvoid mentions that, http://arxiv.org/pdf/1209.2137v5.pdf declares why shouldn't we encoding delta in this way...

@wgtmac
Copy link
Member

wgtmac commented Mar 14, 2023

This is interesting. We need to make these parameters configurable via ColumnProperties or something before benchmarking.

By the way, I really doubt we can reach to a clear answer to this question in the end. The best encoding ratio is the entropy of the data which requires a precise knowledge of probability and distribution of all input words. As the result is highly dependent on the data distribution, we need to define and prepare some datasets with distinct distribution and pattern.

@mapleFU
Copy link
Member Author

mapleFU commented Mar 18, 2023

On My M1 MacOS:

Current ( BlockSize: 128, BlockCnt: 4)

BM_DeltaBitPackingDecode_Int64_Fixed/1024        13582 ns        13525 ns        51456 bytes_per_second=577.645M/s items_per_second=75.7131M/s
BM_DeltaBitPackingDecode_Int64_Fixed/4096        52663 ns        52510 ns        13405 bytes_per_second=595.123M/s items_per_second=78.0039M/s
BM_DeltaBitPackingDecode_Int64_Fixed/32768      415123 ns       414050 ns         1676 bytes_per_second=603.793M/s items_per_second=79.1403M/s
BM_DeltaBitPackingDecode_Int64_Fixed/65536      855424 ns       836675 ns          843 bytes_per_second=597.604M/s items_per_second=78.3291M/s
BM_DeltaBitPackingDecode_Int64_Narrow/1024        5906 ns         5831 ns       123092 bytes_per_second=1.30833G/s items_per_second=175.6M/s
BM_DeltaBitPackingDecode_Int64_Narrow/4096       21125 ns        21035 ns        32590 bytes_per_second=1.4508G/s items_per_second=194.723M/s
BM_DeltaBitPackingDecode_Int64_Narrow/32768     163213 ns       162927 ns         4316 bytes_per_second=1.49847G/s items_per_second=201.121M/s
BM_DeltaBitPackingDecode_Int64_Narrow/65536     333152 ns       327162 ns         2155 bytes_per_second=1.49248G/s items_per_second=200.317M/s
BM_DeltaBitPackingDecode_Int64_Wide/1024          6933 ns         6912 ns       101821 bytes_per_second=1.10379G/s items_per_second=148.149M/s
BM_DeltaBitPackingDecode_Int64_Wide/4096         25010 ns        24927 ns        27997 bytes_per_second=1.22428G/s items_per_second=164.32M/s
BM_DeltaBitPackingDecode_Int64_Wide/32768       197694 ns       195819 ns         3572 bytes_per_second=1.24677G/s items_per_second=167.339M/s
BM_DeltaBitPackingDecode_Int64_Wide/65536       386595 ns       386369 ns         1795 bytes_per_second=1.26377G/s items_per_second=169.62M/s

After (BlockSize: 256, BlockCnt: 4)

BM_DeltaBitPackingDecode_Int64_Fixed/1024         9901 ns         9886 ns        70292 bytes_per_second=790.285M/s items_per_second=103.584M/s
BM_DeltaBitPackingDecode_Int64_Fixed/4096        37284 ns        37213 ns        18831 bytes_per_second=839.771M/s items_per_second=110.07M/s
BM_DeltaBitPackingDecode_Int64_Fixed/32768      305428 ns       298120 ns         2386 bytes_per_second=838.589M/s items_per_second=109.916M/s
BM_DeltaBitPackingDecode_Int64_Fixed/65536      597997 ns       594524 ns         1171 bytes_per_second=841.008M/s items_per_second=110.233M/s
BM_DeltaBitPackingDecode_Int64_Narrow/1024        4738 ns         4736 ns       147537 bytes_per_second=1.61106G/s items_per_second=216.233M/s
BM_DeltaBitPackingDecode_Int64_Narrow/4096       16861 ns        16857 ns        41654 bytes_per_second=1.8104G/s items_per_second=242.987M/s
BM_DeltaBitPackingDecode_Int64_Narrow/32768     130294 ns       130280 ns         5395 bytes_per_second=1.87397G/s items_per_second=251.52M/s
BM_DeltaBitPackingDecode_Int64_Narrow/65536     259451 ns       259450 ns         2695 bytes_per_second=1.88198G/s items_per_second=252.595M/s
BM_DeltaBitPackingDecode_Int64_Wide/1024          5335 ns         5334 ns       131512 bytes_per_second=1.43021G/s items_per_second=191.96M/s
BM_DeltaBitPackingDecode_Int64_Wide/4096         18847 ns        18846 ns        37204 bytes_per_second=1.61932G/s items_per_second=217.342M/s
BM_DeltaBitPackingDecode_Int64_Wide/32768       145769 ns       145764 ns         4815 bytes_per_second=1.67491G/s items_per_second=224.802M/s
BM_DeltaBitPackingDecode_Int64_Wide/65536       289912 ns       289889 ns         2413 bytes_per_second=1.68438G/s items_per_second=226.073M/s

@mapleFU
Copy link
Member Author

mapleFU commented Mar 18, 2023

On my PC (Ryzen 3800X, avx2 enabled):

Before:

BM_DeltaBitPackingDecode_Int64_Fixed/1024         1600 ns         1600 ns       491568 bytes_per_second=4.76804G/s items_per_second=639.955M/s
BM_DeltaBitPackingDecode_Int64_Fixed/4096         5246 ns         5246 ns       127230 bytes_per_second=5.81734G/s items_per_second=780.79M/s
BM_DeltaBitPackingDecode_Int64_Fixed/32768       40950 ns        40950 ns        17193 bytes_per_second=5.96191G/s items_per_second=800.194M/s
BM_DeltaBitPackingDecode_Int64_Fixed/65536       86954 ns        86954 ns         7647 bytes_per_second=5.61541G/s items_per_second=753.688M/s
BM_DeltaBitPackingDecode_Int64_Narrow/1024        1397 ns         1397 ns       533580 bytes_per_second=5.4631G/s items_per_second=733.244M/s
BM_DeltaBitPackingDecode_Int64_Narrow/4096        4861 ns         4861 ns       130775 bytes_per_second=6.27746G/s items_per_second=842.547M/s
BM_DeltaBitPackingDecode_Int64_Narrow/32768      38218 ns        38218 ns        18204 bytes_per_second=6.38816G/s items_per_second=857.405M/s
BM_DeltaBitPackingDecode_Int64_Narrow/65536      81387 ns        81378 ns         9067 bytes_per_second=6.00015G/s items_per_second=805.326M/s
BM_DeltaBitPackingDecode_Int64_Wide/1024          1667 ns         1667 ns       415468 bytes_per_second=4.57783G/s items_per_second=614.426M/s
BM_DeltaBitPackingDecode_Int64_Wide/4096          6611 ns         6611 ns       106650 bytes_per_second=4.61636G/s items_per_second=619.598M/s
BM_DeltaBitPackingDecode_Int64_Wide/32768        50174 ns        50175 ns        13207 bytes_per_second=4.86579G/s items_per_second=653.076M/s
BM_DeltaBitPackingDecode_Int64_Wide/65536        95179 ns        95177 ns         7163 bytes_per_second=5.13025G/s items_per_second=688.57M/s

After

BM_DeltaBitPackingDecode_Int64_Fixed/1024         1263 ns         1263 ns       566989 bytes_per_second=6.04143G/s items_per_second=810.867M/s
BM_DeltaBitPackingDecode_Int64_Fixed/4096         4947 ns         4947 ns       147214 bytes_per_second=6.16904G/s items_per_second=827.995M/s
BM_DeltaBitPackingDecode_Int64_Fixed/32768       34564 ns        34563 ns        18313 bytes_per_second=7.06362G/s items_per_second=948.063M/s
BM_DeltaBitPackingDecode_Int64_Fixed/65536       71620 ns        71620 ns        10407 bytes_per_second=6.8177G/s items_per_second=915.057M/s
BM_DeltaBitPackingDecode_Int64_Narrow/1024        1126 ns         1126 ns       668056 bytes_per_second=6.77524G/s items_per_second=909.358M/s
BM_DeltaBitPackingDecode_Int64_Narrow/4096        3870 ns         3870 ns       178266 bytes_per_second=7.8859G/s items_per_second=1058.43M/s
BM_DeltaBitPackingDecode_Int64_Narrow/32768      30225 ns        30225 ns        22369 bytes_per_second=8.07734G/s items_per_second=1084.12M/s
BM_DeltaBitPackingDecode_Int64_Narrow/65536      62196 ns        62196 ns        11439 bytes_per_second=7.8507G/s items_per_second=1053.7M/s
BM_DeltaBitPackingDecode_Int64_Wide/1024          1138 ns         1138 ns       643474 bytes_per_second=6.70583G/s items_per_second=900.041M/s
BM_DeltaBitPackingDecode_Int64_Wide/4096          4244 ns         4244 ns       168230 bytes_per_second=7.19138G/s items_per_second=965.21M/s
BM_DeltaBitPackingDecode_Int64_Wide/32768        33419 ns        33420 ns        21727 bytes_per_second=7.30532G/s items_per_second=980.504M/s
BM_DeltaBitPackingDecode_Int64_Wide/65536        75581 ns        75581 ns        10711 bytes_per_second=6.46034G/s items_per_second=867.093M/s

@wjones127 wjones127 added this to the 12.0.0 milestone Mar 31, 2023
wjones127 pushed a commit that referenced this issue Mar 31, 2023
…oder (#34632)

### Rationale for this change

Change default DeltaBitPackEncoder BlockSize from 32 to 64.

### What changes are included in this PR?

Tiny block size change, and an trivial optimization.

### Are these changes tested?

No

### Are there any user-facing changes?

No

* Closes: #34536

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…ackEncoder (apache#34632)

### Rationale for this change

Change default DeltaBitPackEncoder BlockSize from 32 to 64.

### What changes are included in this PR?

Tiny block size change, and an trivial optimization.

### Are these changes tested?

No

### Are there any user-facing changes?

No

* Closes: apache#34536

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants