
Extend aggregation benchmarks #5096

Merged
merged 1 commit into apache:master on Nov 18, 2023

Conversation

jhorstmann (Contributor)

Which issue does this PR close?

Preparation for #5032.

Rationale for this change

To better evaluate an autovectorized version of the aggregation kernels, we should benchmark more data types, not only f32.

I also noticed that because of the relatively small batch size, the final reduction step of multi-lane aggregations has a large impact on the total timings. The PR increases the batch size to 64k, which matches the batch size used in the arithmetic and comparison benchmarks.
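
For intuition, a multi-lane aggregation keeps several independent accumulators and combines them only at the end; that final horizontal reduction is a fixed cost, so it looms large when batches are small. A minimal sketch in plain Rust (illustrative only, not the arrow kernel):

const LANES: usize = 8;

// Sums with LANES independent accumulators, then combines them in a
// final horizontal reduction whose cost does not depend on batch size.
fn sum_multi_lane(values: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let chunks = values.chunks_exact(LANES);
    let remainder = chunks.remainder();
    for chunk in chunks {
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += v; // lane-wise work: autovectorizes well
        }
    }
    // Final reduction step, plus the leftover tail.
    acc.iter().sum::<f32>() + remainder.iter().sum::<f32>()
}

With a 64k batch the per-element loop dominates; with a small batch, the fixed-cost reduction does.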

What changes are included in this PR?

  • Add benchmarks for float64 and integer types
  • Measure throughput
  • Increase batch size so that the final reduction step has less of an impact

Are there any user-facing changes?

No.

The github-actions bot added the arrow (Changes to the arrow crate) label on Nov 18, 2023.
criterion::black_box(min_string(arr_a).unwrap());
fn primitive_benchmark<T: ArrowNumericType>(c: &mut Criterion, name: &str)
where
Standard: Distribution<T::Native>,
Contributor:

I doubt it matters for this benchmark, but it is perhaps worth noting that the Standard distribution for floats only produces values between 0 and 1. I don't think this would make a difference to the timings, but FYI.

jhorstmann (Contributor, Author):

Good to know, and agree it shouldn't affect the timings. The bound is required by bench_utils::create_primitive_array.
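
For reference, here is a standalone sketch of why that bound is needed (random_values is a hypothetical stand-in for bench_utils::create_primitive_array): sampling a generic T with rand requires Standard: Distribution<T>, and for float types Standard yields values in [0, 1).

use rand::distributions::{Distribution, Standard};
use rand::Rng;

// Hypothetical stand-in for bench_utils::create_primitive_array.
fn random_values<T>(n: usize) -> Vec<T>
where
    Standard: Distribution<T>,
{
    let mut rng = rand::thread_rng();
    (0..n).map(|_| rng.gen::<T>()).collect()
}

fn main() {
    let floats: Vec<f32> = random_values(4); // each value in [0, 1)
    let ints: Vec<i64> = random_values(4); // full i64 range
    println!("{floats:?} {ints:?}");
}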

.throughput(Throughput::Bytes(
(std::mem::size_of::<T::Native>() * BATCH_SIZE) as u64,
))
.bench_function("sum nonnull", |b| b.iter(|| sum(&nonnull_array)))
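
Pieced together, the benchmark pattern is roughly the following self-contained sketch (BATCH_SIZE and the summation are simplified stand-ins for the PR's generic code):

use criterion::{criterion_group, criterion_main, Criterion, Throughput};

const BATCH_SIZE: usize = 64 * 1024;

fn bench_sum(c: &mut Criterion) {
    let data: Vec<i64> = (0..BATCH_SIZE as i64).collect();
    let mut group = c.benchmark_group("aggregation");
    // Report throughput in bytes/s so results compare across data types.
    group.throughput(Throughput::Bytes(
        (std::mem::size_of::<i64>() * BATCH_SIZE) as u64,
    ));
    group.bench_function("sum nonnull", |b| {
        b.iter(|| data.iter().copied().fold(0i64, i64::wrapping_add))
    });
    group.finish();
}

criterion_group!(benches, bench_sum);
criterion_main!(benches);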
Contributor:

I'm surprised this isn't overflowing, unless sum always wraps?

jhorstmann (Contributor, Author):

It is indeed always wrapping: the scalar version goes through ArrowNativeTypeOp::add_wrapping, and I guess the SIMD version wraps by default. There is a separate sum_checked kernel; I'm not sure yet whether that could be vectorized.
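
To illustrate the distinction in plain Rust (not the arrow kernels themselves): wrapping addition is branch-free and vectorizes easily, while the checked variant must be able to stop at the first overflow, which hinders autovectorization.

fn sum_wrapping(values: &[i32]) -> i32 {
    // Branch-free; wraps on overflow.
    values.iter().fold(0i32, |acc, &v| acc.wrapping_add(v))
}

fn sum_checked(values: &[i32]) -> Option<i32> {
    // Early-exits on the first overflow.
    values.iter().try_fold(0i32, |acc, &v| acc.checked_add(v))
}

fn main() {
    let v = [i32::MAX, 1];
    assert_eq!(sum_wrapping(&v), i32::MIN); // wrapped around
    assert_eq!(sum_checked(&v), None); // overflow detected
}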

@tustvold merged commit 61da64a into apache:master on Nov 18, 2023
23 checks passed