chacha20: Use criterion for benchmarking #62

tarcieri · 2019-10-19T17:20:17Z

...with the criterion-cycles-per-byte plugin

This unfortunately means we can no longer run tests for chacha20 against Rust 1.27.0, since it adds 2018 edition `dev-dependencies`. However, we can still confirm that release builds work against this version.

As we're starting to reach a microoptimization stage, I think criterion will be extremely helpful in determining if our microoptimizations are actually improving performance.

Currently I'm getting between 5.1 and 5.8 cpb across the tests on a Kaby Lake i7. Curiously, `-Ctarget-cpu=native` seems to negatively impact performance. I'm not seeing much difference between the software and SSE2 backends: +5% on the `chacha20/apply_keystream/1024` benchmark, negligible on the others (+1-2%).

tarcieri · 2019-10-19T17:20:46Z

@srijs this should unblock using other dev-dependencies which require a recent Rust, I hope


srijs · 2019-10-20T01:04:51Z

It's weird that you see so little difference here...

On my machine, across all input sizes, criterion shows around a +25% improvement in throughput between sse2 and the scalar version, with scalar at ~4.3 cpb and sse2 at ~3.4 cpb.

tarcieri · 2019-10-20T01:08:47Z

@srijs are you using something other than RUSTFLAGS="-Ctarget-feature=+sse2"?

srijs · 2019-10-20T01:10:18Z

I'm not using any additional RUSTFLAGS right now, just plain cargo +nightly bench.

Let me see what happens if I add this flag explicitly (although afaik rustc should enable the sse2 feature for my target by default).

tarcieri · 2019-10-20T01:16:18Z

How are you switching between the two implementations then? Setting target-cpu? Something else?

srijs · 2019-10-20T01:18:56Z

Applying the flag does not change the outcome for me.

To switch between implementations, I've just been reverting my commit. Perhaps we could add an on-by-default feature flag that can be turned off to opt-out of simd, which may come in useful for these types of comparative benchmarks.

If you're using the flag to switch, are you sure you're not benchmarking the same sse2 implementation both times? Afaik sse2 is enabled by default on a few targets (mine included).
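To make the point about default target features concrete, here is a minimal sketch (not the crate's actual dispatch code) that reports which backend a build would pick if selection were based purely on compile-time `target_feature` checks. The function name `selected_backend` and the backend labels are hypothetical and only for illustration.

```rust
/// Hypothetical helper: reports which backend a compile-time dispatch
/// keyed on `target_feature = "sse2"` would select for this build.
/// This is an illustration, not the chacha20 crate's real selection logic.
fn selected_backend() -> &'static str {
    // `cfg!` is evaluated at compile time. On targets where SSE2 is part of
    // the baseline (for example x86_64), this is true even when no
    // `-Ctarget-feature=+sse2` flag is passed via RUSTFLAGS.
    if cfg!(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        target_feature = "sse2"
    )) {
        "sse2"
    } else {
        "software"
    }
}

fn main() {
    println!("backend selected at compile time: {}", selected_backend());
}
```

If this prints `sse2` both with and without the flag, then toggling the flag does not switch implementations, and both benchmark runs measure the same code.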

tarcieri · 2019-10-20T01:22:00Z

> If you're using the flag to switch, are you sure you're not benchmarking the same sse2 implementation both times?

That seems like the most likely explanation, thanks

tarcieri · 2019-10-20T03:25:20Z

> On my machine, across all input sizes, criterion shows around a +25% improvement in throughput between sse2 and the scalar version, with scalar at ~4.3 cpb and sse2 at ~3.4 cpb.

Just tried reverting the commit and I'm seeing similar numbers: ~6.7 cpb without the SSE2 commit and ~5.3 cpb with it, for a ~25% speedup.
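For reference, a small sketch of how the cycles-per-byte figures quoted in this thread translate into a relative speedup. The helper names `throughput_gain` and `cycles_saved` are made up for this example. Throughput is inversely proportional to cpb, so the gain is `before / after - 1`.

```rust
/// Relative throughput gain when cycles per byte drop from `before_cpb`
/// to `after_cpb` (throughput is proportional to 1 / cpb).
fn throughput_gain(before_cpb: f64, after_cpb: f64) -> f64 {
    before_cpb / after_cpb - 1.0
}

/// Fraction of cycles saved per byte, relative to the slower version.
fn cycles_saved(before_cpb: f64, after_cpb: f64) -> f64 {
    (before_cpb - after_cpb) / before_cpb
}

fn main() {
    // (label, cpb without SSE2, cpb with SSE2), as reported in the thread.
    let measurements = [("tarcieri", 6.7, 5.3), ("srijs", 4.3, 3.4)];
    for (who, before, after) in measurements {
        println!(
            "{}: throughput +{:.1}%, cycles saved {:.1}%",
            who,
            100.0 * throughput_gain(before, after),
            100.0 * cycles_saved(before, after)
        );
    }
}
```

Both machines come out at roughly a 26% throughput gain (about 21% fewer cycles per byte), which matches the "~25%" quoted above even though the absolute cpb values differ.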

newpavlov · 2019-11-18T01:41:07Z

> Curiously -Ctarget-cpu=native seems to negatively impact performance.

Yeah, I've observed the same effect several times on different crates as well. Initially I thought the cause was AVX2 frequency scaling, but now I think that LLVM does not work that well with `target-cpu=native` for some reason.

tarcieri force-pushed the chacha20/criterion branch from 2c55331 to 9562b98 Compare October 19, 2019 17:30

tarcieri merged commit 033af9f into master Oct 19, 2019

tarcieri deleted the chacha20/criterion branch October 19, 2019 17:43
