ARROW-11877: [C++] Add microbenchmark for SimplifyWithGuarantee #9638

lidavidm · 2021-03-05T19:00:19Z

This adds a microbenchmark for SimplifyWithGuarantee. Because it is used to evaluate partition expressions against the filter, it can contribute a significant amount of time to reading a dataset, especially a large one. The benchmark was used to help investigate ARROW-11781.

Two different filters are tested: one is fully simplified, and one has had casts inserted (which will happen if you Bind() against a schema with different types).
Two different partition expressions are tested: one is fully simplified, and one compares against dictionary-encoded values (which will happen by default if you infer the schema for a Hive-partitioned dataset, for example).

All 4 combinations are additionally tested both when the filter matches the expression and when it does not match.
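
For reference, here is a minimal sketch of how the eight cases line up with the benchmark names reported in the results. It uses only the standard library and does not call Arrow. `BenchmarkCaseNames` is a hypothetical helper written for illustration, not code from this PR. It also assumes that `lhs` in each name refers to the filter variant, `guarantee` to the partition expression variant, and `negative`/`positive` to whether the filter matches.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the case names in the same order as the
// reported results. The loop nesting is match outcome (outer), then
// partition expression ("guarantee"), then filter ("lhs") innermost.
std::vector<std::string> BenchmarkCaseNames() {
  const std::vector<std::string> outcomes = {"negative", "positive"};
  const std::vector<std::string> guarantees = {"simple", "dictionary"};
  const std::vector<std::string> filters = {"simple", "cast"};

  std::vector<std::string> names;
  for (const auto& outcome : outcomes) {
    for (const auto& guarantee : guarantees) {
      for (const auto& filter : filters) {
        // Assumed reading of the name: <outcome>_lhs_<filter>_guarantee_<guarantee>
        names.push_back(outcome + "_lhs_" + filter + "_guarantee_" + guarantee);
      }
    }
  }
  return names;
}
```

Calling `BenchmarkCaseNames()` yields 2 x 2 x 2 = 8 names, one per row of the results.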

lidavidm · 2021-03-05T19:00:40Z

Results

--------------------------------------------------------------------------------------------------------------
Benchmark                                                                    Time             CPU   Iterations
--------------------------------------------------------------------------------------------------------------
SimplifyQueryWithGuarantee/negative_lhs_simple_guarantee_simple           7729 ns         7729 ns        88034
SimplifyQueryWithGuarantee/negative_lhs_cast_guarantee_simple            10769 ns        10769 ns        64463
SimplifyQueryWithGuarantee/negative_lhs_simple_guarantee_dictionary      17703 ns        17703 ns        39026
SimplifyQueryWithGuarantee/negative_lhs_cast_guarantee_dictionary        21208 ns        21207 ns        32716
SimplifyQueryWithGuarantee/positive_lhs_simple_guarantee_simple           7689 ns         7689 ns        88028
SimplifyQueryWithGuarantee/positive_lhs_cast_guarantee_simple            10793 ns        10793 ns        63819
SimplifyQueryWithGuarantee/positive_lhs_simple_guarantee_dictionary      18147 ns        18146 ns        39027
SimplifyQueryWithGuarantee/positive_lhs_cast_guarantee_dictionary        21193 ns        21193 ns        33022
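
To put the numbers in perspective, here is a small sketch that computes each negative case's slowdown relative to the fully simplified baseline. The timings are copied from the table above. `Slowdown` and `PrintNegativeSlowdowns` are hypothetical helpers for illustration only and are not part of the benchmark.

```cpp
#include <cstdio>

// Hypothetical helper: ratio of a case's time to the baseline time (both in ns).
double Slowdown(double case_ns, double baseline_ns) {
  return case_ns / baseline_ns;
}

// Prints slowdowns for the negative cases, using the times reported above.
void PrintNegativeSlowdowns() {
  const double baseline_ns = 7729.0;  // negative_lhs_simple_guarantee_simple
  std::printf("cast filter:        %.2fx\n", Slowdown(10769.0, baseline_ns));
  std::printf("dictionary guar.:   %.2fx\n", Slowdown(17703.0, baseline_ns));
  std::printf("cast + dictionary:  %.2fx\n", Slowdown(21208.0, baseline_ns));
}
```

On these numbers, inserted casts cost roughly 1.4x, a dictionary-encoded guarantee roughly 2.3x, and both together roughly 2.7x. The positive cases are within a few percent of their negative counterparts.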

github-actions · 2021-03-05T19:00:52Z

https://issues.apache.org/jira/browse/ARROW-11877

cpp/src/arrow/dataset/expression_benchmark.cc

bkietz

LGTM, thanks for doing this!

Closes apache#9638 from lidavidm/arrow-11877
Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>

lidavidm added the Component: C++ label Mar 5, 2021

jorisvandenbossche requested a review from bkietz March 8, 2021 13:08

bkietz requested changes Mar 9, 2021

lidavidm force-pushed the arrow-11877 branch from af8e312 to 10af1e6 Compare March 9, 2021 14:45

ARROW-11877: [C++] Add microbenchmark for SimplifyWithGuarantee

a98327c

lidavidm force-pushed the arrow-11877 branch from 10af1e6 to a98327c Compare March 10, 2021 13:17

bkietz approved these changes Mar 10, 2021

bkietz closed this in 2ace1e3 Mar 10, 2021

asfimport mentioned this pull request Mar 10, 2021

[C++] Add initial microbenchmarks for Dataset internals #27719

Closed
