[C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED #14951

Closed
pitrou opened this issue Dec 14, 2022 · 7 comments · Fixed by #15140

Comments

@pitrou
Member

pitrou commented Dec 14, 2022

Describe the enhancement requested

Now that we support the DELTA_BINARY_PACKED encoding for both reading and writing, we can add some benchmarks for it.

Component(s)

C++, Parquet

@mapleFU
Member

mapleFU commented Dec 28, 2022

Should it just follow src/parquet/encoding_benchmark.cc? I'd like to give it a try.

@rok
Member

rok commented Dec 28, 2022

That seems like the right place @mapleFU. I think you can go ahead and open a PR! :)
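
For reference, here is a minimal sketch (not the merged implementation) of what such a case could look like next to the existing benchmarks in src/parquet/encoding_benchmark.cc, using Google Benchmark and the parquet::MakeTypedEncoder factory. The helper and benchmark names are illustrative, and it assumes the factory accepts DELTA_BINARY_PACKED without a column descriptor:

```cpp
// Sketch of a DELTA_BINARY_PACKED encode benchmark; names are illustrative.
#include <cstdint>
#include <random>
#include <vector>

#include "benchmark/benchmark.h"
#include "parquet/encoding.h"

namespace {

std::vector<int32_t> MakeRandomInt32(int64_t n, int32_t max_value) {
  std::mt19937 rng(42);  // fixed seed so runs are comparable
  std::uniform_int_distribution<int32_t> dist(0, max_value);
  std::vector<int32_t> values(n);
  for (auto& v : values) v = dist(rng);
  return values;
}

void BM_DeltaBinaryPackedEncodeInt32(benchmark::State& state) {
  const auto values = MakeRandomInt32(state.range(0), /*max_value=*/1 << 20);
  for (auto _ : state) {
    // Assumes MakeTypedEncoder supports DELTA_BINARY_PACKED without a descriptor.
    auto encoder = parquet::MakeTypedEncoder<parquet::Int32Type>(
        parquet::Encoding::DELTA_BINARY_PACKED);
    encoder->Put(values.data(), static_cast<int>(values.size()));
    auto buffer = encoder->FlushValues();
    benchmark::DoNotOptimize(buffer);
  }
  state.SetItemsProcessed(state.iterations() * values.size());
}
BENCHMARK(BM_DeltaBinaryPackedEncodeInt32)->Range(1024, 65536);

}  // namespace
```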

@mapleFU
Member

mapleFU commented Dec 31, 2022

By the way, should we generate the input randomly? It seems that if we use incrementing or fixed numbers, the DELTA_BINARY_PACKED data will compress very well, which is kind of unfair.

@pitrou
Member Author

pitrou commented Dec 31, 2022

By the way, should we generate the input randomly? It seems that if we use incrementing or fixed numbers, the DELTA_BINARY_PACKED data will compress very well, which is kind of unfair.

Use random data, but with two different magnitudes (narrow or wide), so as to exercise both the easily compressible and poorly compressible cases.
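
As an illustration of the two-magnitude idea (the names, sizes and ranges below are made up, not taken from the eventual PR): the same generator can produce a narrow range, whose small deltas bit-pack tightly, and a wide range, whose deltas need close to the full 32 bits.

```cpp
// Narrow vs. wide random input for the benchmark; purely illustrative.
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

std::vector<int32_t> GenerateRandomInt32(int64_t n, int32_t min_value,
                                         int32_t max_value) {
  std::mt19937 rng(/*seed=*/1337);  // fixed seed for reproducible benchmarks
  std::uniform_int_distribution<int32_t> dist(min_value, max_value);
  std::vector<int32_t> out(n);
  for (auto& v : out) v = dist(rng);
  return out;
}

// Narrow: values in [0, 16), so consecutive deltas fit in a few bits each.
const auto narrow = GenerateRandomInt32(4096, 0, 15);
// Wide: values spanning the full int32 range, so deltas are large.
const auto wide = GenerateRandomInt32(4096, std::numeric_limits<int32_t>::min(),
                                      std::numeric_limits<int32_t>::max());
```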

@mapleFU
Member

mapleFU commented Dec 31, 2022

By the way, what are the differences between column_io_benchmark, column_reader_benchmark and encoding_benchmark? And do we have any benchmark for compression + encoding? Some encodings don't benefit much from the encoding step alone, but greatly improve performance or save space when encoding is combined with compression, like byte_stream_split.

@pitrou
Member Author

pitrou commented Jan 3, 2023

And do we have any benchmark for compression + encoding? Some encodings don't benefit much from the encoding step alone, but greatly improve performance or save space when encoding is combined with compression, like byte_stream_split.

I'm not convinced it's useful to add compression here. We're not writing the compression routines ourselves.

@mapleFU
Member

mapleFU commented Jan 3, 2023

I'm not convinced it's useful to add compression here. We're not writing the compression routines ourselves.

Yes, you're right. But I think if compression is not tested, we will only benchmark part of the Parquet read path. For example, a benchmark might only tell us how much slower a decoder is than PlainDecoder in the case where CPU prefetching works well.
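
For context, here is a rough sketch of what a combined encode + compress measurement could look like; this is not what the issue ended up doing, and the codec choice (ZSTD) and helper name are arbitrary:

```cpp
// Sketch only: measure the size after DELTA_BINARY_PACKED encoding + compression.
#include <cstdint>
#include <memory>
#include <vector>

#include "arrow/buffer.h"
#include "arrow/result.h"
#include "arrow/util/compression.h"
#include "parquet/encoding.h"

arrow::Result<int64_t> EncodeThenCompress(const std::vector<int32_t>& values) {
  // 1. Encode with DELTA_BINARY_PACKED.
  auto encoder = parquet::MakeTypedEncoder<parquet::Int32Type>(
      parquet::Encoding::DELTA_BINARY_PACKED);
  encoder->Put(values.data(), static_cast<int>(values.size()));
  std::shared_ptr<arrow::Buffer> encoded = encoder->FlushValues();

  // 2. Compress the encoded bytes with a generic codec (ZSTD chosen arbitrarily).
  ARROW_ASSIGN_OR_RAISE(auto codec,
                        arrow::util::Codec::Create(arrow::Compression::ZSTD));
  const int64_t max_len = codec->MaxCompressedLen(encoded->size(), encoded->data());
  std::vector<uint8_t> out(max_len);
  // Returns the compressed size, i.e. roughly what would hit the disk.
  return codec->Compress(encoded->size(), encoded->data(), max_len, out.data());
}
```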

pitrou added a commit that referenced this issue Jan 3, 2023
…ing (#15140)

This patch adds benchmarks for DELTA_BINARY_PACKED. Unlike PLAIN, it has to consider both the case where the data can be well compressed and the case where it cannot.
* Closes: #14951

Lead-authored-by: mwish <anmmscs_maple@qq.com>
Co-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
pitrou added this to the 11.0.0 milestone Jan 3, 2023
EpsilonPrime pushed a commit to EpsilonPrime/arrow that referenced this issue Jan 5, 2023
… encoding (apache#15140)

This patch adds benchmarks for DELTA_BINARY_PACKED. Unlike PLAIN, it has to consider both the case where the data can be well compressed and the case where it cannot.
* Closes: apache#14951

Lead-authored-by: mwish <anmmscs_maple@qq.com>
Co-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
vibhatha pushed a commit to vibhatha/arrow that referenced this issue Jan 9, 2023
… encoding (apache#15140)

This patch adds benchmarks for DELTA_BINARY_PACKED. Unlike PLAIN, it has to consider both the case where the data can be well compressed and the case where it cannot.
* Closes: apache#14951

Lead-authored-by: mwish <anmmscs_maple@qq.com>
Co-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>