GH-36297: [C++][Parquet] Benchmark for non-binary dict encoding #36298

mapleFU · 2023-06-26T09:10:29Z

Rationale for this change

Add benchmark for non-binary dict encoding

What changes are included in this PR?

Add benchmark BM_DictEncodingInt64

Are these changes tested?

no need

Are there any user-facing changes?

no

Closes: [C++][Parquet] Benchmark Dict Encoding for non-binary types #36297

github-actions · 2023-06-26T09:10:59Z

⚠️ GitHub issue #36297 has been automatically assigned in GitHub to PR creator.

mapleFU · 2023-06-26T12:34:31Z

@pitrou Would you mind taking a look? Or are there already some benchmarks for this?

pitrou · 2023-06-26T13:17:24Z

Why not, but is this useful? Is dictionary encoding often used with integers?

mapleFU · 2023-06-26T13:29:05Z

Hmm, sometimes we use dictionary encoding with integers, and I found that there is no benchmark for it, so I added this one.

pitrou · 2023-06-26T13:53:47Z

Are there decoding benchmarks already?

cpp/src/parquet/encoding_benchmark.cc

mapleFU · 2023-06-26T15:42:44Z

@pitrou It seems that we already have some decoding benchmarks, but they only test one value; see DecodeDict. I've unified EncodeDict so that it matches DecodeDict.
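
For readers who want a concrete picture of what the dictionary encoding and decoding benchmarks exercise, here is a minimal sketch using only the standard library. It is not Parquet's encoder and does not reproduce its data layout or its index packing. The names `DictEncoded`, `DictEncode`, `DictDecode`, `MakeRepeats` and `MakeLiterals` are hypothetical, and the two input shapes are an assumption about what the `_repeats` and `_literals` variants reported later in this thread feed in (one value repeated versus all-distinct values).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Result of dictionary-encoding a column: the distinct values in first-seen
// order, plus one dictionary index per input value.
struct DictEncoded {
  std::vector<int64_t> dictionary;
  std::vector<int32_t> indices;
};

// Illustrative encoder (hypothetical, not Parquet's implementation): a hash
// map assigns each distinct value the next free dictionary index.
DictEncoded DictEncode(const std::vector<int64_t>& values) {
  DictEncoded out;
  std::unordered_map<int64_t, int32_t> index_of;
  out.indices.reserve(values.size());
  for (int64_t v : values) {
    auto result =
        index_of.emplace(v, static_cast<int32_t>(out.dictionary.size()));
    if (result.second) {
      out.dictionary.push_back(v);  // first time this value is seen
    }
    out.indices.push_back(result.first->second);
  }
  return out;
}

// Decoding is a plain gather from the dictionary.
std::vector<int64_t> DictDecode(const DictEncoded& enc) {
  std::vector<int64_t> values;
  values.reserve(enc.indices.size());
  for (int32_t idx : enc.indices) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < enc.dictionary.size());
    values.push_back(enc.dictionary[static_cast<std::size_t>(idx)]);
  }
  return values;
}

// Assumed "repeats" input: a single value repeated n times.
std::vector<int64_t> MakeRepeats(std::size_t n) {
  return std::vector<int64_t>(n, 64);
}

// Assumed "literals" input: n distinct values 0..n-1.
std::vector<int64_t> MakeLiterals(std::size_t n) {
  std::vector<int64_t> v(n);
  for (std::size_t i = 0; i < n; ++i) {
    v[i] = static_cast<int64_t>(i);
  }
  return v;
}
```

Under these assumptions the repeated input produces a one-entry dictionary, while the distinct input produces a dictionary as large as the input, which is why it makes sense to benchmark the two shapes separately.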

pitrou

Thanks @mapleFU ! A couple other suggestions.

cpp/src/parquet/encoding_benchmark.cc

pitrou · 2023-06-27T15:36:48Z

cpp/src/parquet/encoding_benchmark.cc

+    encoder->FlushValues();
+  }
+
+  state.SetBytesProcessed(state.iterations() * state.range(0) * sizeof(T));


Can you also add a SetItemsProcessed here and possibly in other benchmark functions?

And by the way:

Suggested change

-  state.SetBytesProcessed(state.iterations() * state.range(0) * sizeof(T));
+  state.SetBytesProcessed(state.iterations() * num_values * sizeof(T));

You addressed only one comment here.

Sorry for missing the message. I've fixed it and pasted the results.

mapleFU · 2023-06-27T16:00:25Z

Comment fixed

mapleFU · 2023-06-27T16:50:00Z

M1 macOS, clang++ 13.0, Release (O2):

Benchmark                                                                Time             CPU   Iterations UserCounters...
BM_DictDecodingInt64_repeats/1024                                     1016 ns         1013 ns       694975 bytes_per_second=7.53056G/s items_per_second=1010.73M/s
BM_DictDecodingInt64_repeats/4096                                     1373 ns         1369 ns       480367 bytes_per_second=22.2907G/s items_per_second=2.99181G/s
BM_DictDecodingInt64_repeats/32768                                    4692 ns         4672 ns       141777 bytes_per_second=52.2591G/s items_per_second=7.0141G/s
BM_DictDecodingInt64_repeats/65536                                    8837 ns         8828 ns        71320 bytes_per_second=55.3124G/s items_per_second=7.42391G/s
BM_DictEncodingInt64_repeats/1024                                     6120 ns         5971 ns       117212 bytes_per_second=1.27764G/s items_per_second=171.482M/s
BM_DictEncodingInt64_repeats/4096                                    23239 ns        23204 ns        30078 bytes_per_second=1.31518G/s items_per_second=176.52M/s
BM_DictEncodingInt64_repeats/32768                                  192336 ns       185895 ns         3687 bytes_per_second=1.31332G/s items_per_second=176.271M/s
BM_DictEncodingInt64_repeats/65536                                  386575 ns       371757 ns         1870 bytes_per_second=1.31344G/s items_per_second=176.287M/s
BM_DictDecodingInt64_literals/1024                                    2027 ns         1862 ns       376684 bytes_per_second=4.09711G/s items_per_second=549.904M/s
BM_DictDecodingInt64_literals/4096                                    5082 ns         4902 ns       143489 bytes_per_second=6.22589G/s items_per_second=835.625M/s
BM_DictDecodingInt64_literals/32768                                  68385 ns        59901 ns        12076 bytes_per_second=4.07575G/s items_per_second=547.038M/s
BM_DictDecodingInt64_literals/65536                                 114003 ns       111464 ns         6023 bytes_per_second=4.38064G/s items_per_second=587.959M/s
BM_DictEncodingInt64_literals/1024                                    7405 ns         7394 ns        94886 bytes_per_second=1056.53M/s items_per_second=138.482M/s
BM_DictEncodingInt64_literals/4096                                   30762 ns        30462 ns        23066 bytes_per_second=1025.87M/s items_per_second=134.463M/s
BM_DictEncodingInt64_literals/32768                                 267431 ns       267176 ns         2494 bytes_per_second=935.713M/s items_per_second=122.646M/s
BM_DictEncodingInt64_literals/65536                                 581665 ns       580486 ns         1139 bytes_per_second=861.347M/s items_per_second=112.898M/s
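
As a sanity check on how the two counters above relate to the timings, here is a small sketch of the arithmetic. It assumes each iteration processes the number of values given after the slash in the benchmark name, that each value is an 8-byte int64, and that the rate is computed from the CPU time column. It also assumes the `G` suffix on `bytes_per_second` is a binary prefix (2^30) while the `M` suffix on `items_per_second` is decimal (10^6), which is what the reported numbers suggest. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes processed per second, given the values handled per iteration, the
// size of one value, and the CPU time of one iteration in nanoseconds.
double BytesPerSecond(int64_t num_values, std::size_t value_size,
                      double cpu_time_ns) {
  assert(cpu_time_ns > 0.0);
  return static_cast<double>(num_values) * static_cast<double>(value_size) /
         (cpu_time_ns * 1e-9);
}

// Items processed per second for the same iteration.
double ItemsPerSecond(int64_t num_values, double cpu_time_ns) {
  assert(cpu_time_ns > 0.0);
  return static_cast<double>(num_values) / (cpu_time_ns * 1e-9);
}

// Assumption: the byte rate is printed with binary prefixes (G = 2^30).
double ToGiB(double bytes) { return bytes / (1024.0 * 1024.0 * 1024.0); }

// Assumption: the item rate is printed with decimal prefixes (M = 10^6).
double ToMillions(double items) { return items / 1e6; }
```

For example, `ToGiB(BytesPerSecond(1024, sizeof(int64_t), 1013.0))` comes out at about 7.53, in line with the first row, and `ToMillions(ItemsPerSecond(65536, 371757.0))` comes out at about 176.3, in line with the `BM_DictEncodingInt64_repeats/65536` row. Small differences are expected because the printed CPU times are rounded.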

conbench-apache-arrow · 2023-07-01T00:54:37Z

Conbench analyzed the 6 benchmark runs on commit 4198aacf.

There were 8 benchmark results indicating a performance regression:

Commit Run on arm64-t4g-linux-compute at 2023-06-27 21:08:26Z
- params=threads:4/task_cost:100000/real_time, source=cpp-micro, suite=arrow-thread-pool-benchmark
Commit Run on arm64-m6g-linux-compute at 2023-06-27 21:03:58Z
- params=/ConvertToSparseCOOTensorInt32, source=cpp-micro, suite=arrow-tensor-conversion-benchmark
and 6 more (see the report linked below)

The full Conbench report has more details.

[ADD] Adding PREDICT_FALSE for Dict encoding

e51df4a

mapleFU requested a review from wjones127 as a code owner June 26, 2023 09:10

github-actions bot added the Component: C++, Component: Parquet, and awaiting review labels Jun 26, 2023

pitrou reviewed Jun 26, 2023

github-actions bot added the awaiting committer review label and removed the awaiting review label Jun 26, 2023

Merge branch 'main' into parquet/optimize-dict-encoding-and-benchmark

56376de

[Update] using EncodeDict to unify

ac35a89

mapleFU force-pushed the parquet/optimize-dict-encoding-and-benchmark branch from 19bfd91 to ac35a89 Compare June 26, 2023 15:46

pitrou requested changes Jun 27, 2023

fix comment

f233f75

mapleFU force-pushed the parquet/optimize-dict-encoding-and-benchmark branch from df7c6ab to f233f75 Compare June 27, 2023 16:50

pitrou approved these changes Jun 27, 2023

pitrou merged commit 4198aac into apache:main Jun 27, 2023
32 of 33 checks passed

pitrou removed the awaiting committer review label Jun 27, 2023
