Add historical performance benchmark #4083
Conversation
```rust
#[cfg(not(feature = "s2n-only"))]
{
    bench_handshake_for_library::<RustlsHarness>(
        &mut bench_group,
        "rustls",
        handshake_type,
        ec_group,
    );
    bench_handshake_for_library::<OpenSslHarness>(
        &mut bench_group,
        "openssl",
        handshake_type,
        ec_group,
    );
}
```
Rather than doing this as a feature, is it possible to have three different bench harnesses (one for each TLS provider)? Ideally you could also run only OpenSSL or only rustls:

```shell
cargo bench -- s2n --exact --no-fail-fast
```
Talked offline; we decided that keeping the feature was ultimately worth it because the historical performance bench builds and runs faster without compiling the rustls/openssl benches. However, we'll rename the feature to `historical-perf` to more accurately reflect its purpose, and call out what it does in the README.
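As a rough sketch of what that gating might look like (the feature name matches the discussion, but the crate layout and dependency versions here are assumptions, not taken from the actual diff), the bench crate's Cargo.toml could make the comparison harnesses optional so `historical-perf` builds skip them:

```toml
# hypothetical excerpt of bindings/rust/bench/Cargo.toml
[features]
default = []
# historical-perf: compile only the s2n-tls benches so old versions
# of the library build and run faster; code gated with
# #[cfg(not(feature = "historical-perf"))] skips rustls/openssl
historical-perf = []

[dependencies]
rustls = { version = "0.21", optional = true }
openssl = { version = "0.10", optional = true }
```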
```rust
// generate all inputs (TlsBenchHarness structs) before benchmarking handshakes
// timing only includes negotiation, not config/connection initialization
```
Suggested change:

```rust
// Generate all harnesses (TlsBenchHarness structs) so that benchmarks only include
// negotiation and not config/connection initialization
```
How much time does conn/config initialization take? It should be pretty small, since it should be using `s2n_config_new_minimal` by default now. We had a regression with that, so it might actually be good to have a separate benchmark for it.
Good idea, but that would require a bit of a refactor of the current benching harness, especially to separate config initialization from connection initialization. I have a refactor planned that includes separating those two out, and I could possibly add a bench for that then.
I'm pretty sure that the initialization took quite a bit of time? Taking out the initialization part was one of the first things I did, so it's been a while since I benched with it.
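For what it's worth, a quick way to see how initialization cost compares to negotiation, before any harness refactor, is to time the two phases separately. This is an illustrative sketch only; the closure contents stand in for the bench harness's real setup and handshake code:

```rust
use std::time::{Duration, Instant};

/// Time a setup closure separately from the work it feeds, so that a
/// config/connection-initialization regression shows up on its own.
/// (Sketch, not the bench harness's actual interface.)
pub fn time_phases<S, W, T>(mut setup: S, mut work: W) -> (Duration, Duration)
where
    S: FnMut() -> T,
    W: FnMut(T),
{
    let start = Instant::now();
    let state = setup(); // e.g. config + connection creation
    let setup_time = start.elapsed();

    let start = Instant::now();
    work(state); // e.g. the handshake itself
    let work_time = start.elapsed();

    (setup_time, work_time)
}
```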
```rust
// if harness invalid, do nothing but don't panic
// useful for historical performance bench to ignore configs
// invalid only for past versions of s2n-tls
```
Suggested change:

```rust
// Ignore failure, since some past versions of s2n-tls have a different API and
// therefore fail to build. Since the results are graphed as part of historical
// perf, it's possible to extrapolate any missing data.
```
This fail-safe part of the code isn't because of API changes causing build failures, but rather API changes that cause a runtime error on initialization (namely, trying to use security policies that were only added recently). I'm going to change the comment to "harnesses with certain parameters fail to initialize for some past versions of s2n-tls, but missing data can be visually interpolated in the historical performance graph".

nit: I've been generally using lowercase for inline comments with `// ...` but capitalizing sentences for doc comments with `/// ...`. Is capitalizing multiline inline comments a specific style I should be following, personal preference, a mix of those, or something else?
```shell
# immediately bail if any command fails
set -e
```
can you move this to right next to the `bin/bash` shebang?
I think the shebang needs to be the first thing, but I can move the `set -e` above the stdout suppression.
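A minimal sketch of that ordering (the `quiet` helper is hypothetical; the actual script's redirection may differ):

```shell
#!/usr/bin/env bash
# the shebang must be the very first line of the file;
# set -e goes immediately after it so early failures still abort the script
set -e

# suppress stdout for a single command while keeping stderr and the exit status
quiet() {
    "$@" > /dev/null
}
```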
```rust
};

/// Return (f64, f64) of (mean, standard error) from Criterion result JSON
fn process_single_json(path: &Path) -> (f64, f64) {
```
To make this more "rusty", you can define a custom data structure:

```rust
struct BenchOutput {
    mean: f64,
    std_err: f64,
}
```

You can also define a custom error and then return a `Result<BenchOutput, CustomErr>`. For custom errors, check out https://github.com/dtolnay/thiserror
This could also replace your `BenchGroupData`.
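Roughly, the suggested shape could look like the following. `to_bench_output` and `BenchParseError` are illustrative stand-ins for the Criterion JSON parsing inside `process_single_json`, and the error type is hand-rolled here where `thiserror`'s derive macro would remove the `Display`/`Error` boilerplate:

```rust
use std::fmt;

// the suggested output struct, replacing a bare (f64, f64) tuple
#[derive(Debug, PartialEq)]
pub struct BenchOutput {
    pub mean: f64,
    pub std_err: f64,
}

// hand-rolled custom error; thiserror's #[derive(Error)] would generate this
#[derive(Debug)]
pub enum BenchParseError {
    MissingField(&'static str),
}

impl fmt::Display for BenchParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BenchParseError::MissingField(name) => write!(f, "missing field: {name}"),
        }
    }
}

impl std::error::Error for BenchParseError {}

// hypothetical constructor over already-extracted JSON values, standing in
// for the parsing done by process_single_json
pub fn to_bench_output(
    mean: Option<f64>,
    std_err: Option<f64>,
) -> Result<BenchOutput, BenchParseError> {
    Ok(BenchOutput {
        mean: mean.ok_or(BenchParseError::MissingField("mean"))?,
        std_err: std_err.ok_or(BenchParseError::MissingField("std_err"))?,
    })
}
```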
```rust
    path::Path,
};

/// Return (f64, f64) of (mean, standard error) from Criterion result JSON
```
comments should just use `//`
I've been using `///` for doc comments, including on private methods, for readability. Even if it should be `//`, the rest of the bench codebase already uses `///` for all function descriptions (but does use `//` everywhere else).
```rust
}

/// Plots given data with given chart parameters
fn plot_bench_groups<F: Fn(&f64) -> String>(
```
looks like you are parsing and plotting the data here. You could separate these steps to make the code easier to understand and reason about.
can you move this to `bench/scripts/bench_historical_perf.sh`?
I have the script in the separate `historical-perf` folder mainly as a place to hold the .svg artifacts that the script outputs. If I move the script into a `scripts` folder, where would the .svg outputs live?

Also, if there were a `scripts` folder, the `use-awslc-*.sh` scripts, `memory/bench-memory.sh` from the memory bench PR, and `certs/generate_certs.sh` (from `certs/`) should maybe all be moved there too? Overall, I feel like these scripts don't have a lot in common with each other, and it'd make sense to keep them separate.

It probably isn't ideal to have so many scripts strewn everywhere, but a lot of different tools and build configs need to be cobbled together, and just using cargo hasn't let me do what we need to bench everything.
The .svg can be output to the target folder, since that's where artifacts go. Like you said, there are a lot of scripts, and it's good to have them in one place. I would say yeah, all the scripts should move to the same folder.
I'll wait for a refactor PR (which should come right after all the current bench PRs get merged) to change this, since it would span multiple PRs and current work.
```rust
let versions = get_unique_versions(&handshake_data)
    .into_iter()
    .chain(get_unique_versions(&throughput_data).into_iter())
    .chain((15..16).chain(30..38).map(|p| Version::new(1, 3, p)))
    .collect::<BTreeSet<Version>>()
    .into_iter()
    .collect::<Vec<Version>>();

// map versions to x coordinates
let version_to_x = versions
    .iter()
    .enumerate()
    .map(|(i, version)| (version, i as i32))
    .collect::<HashMap<&Version, i32>>();
```
nit: I wonder if it's possible to simplify this. Try to avoid doing `.collect` more than once, since that allocates. The cool thing about operations on iterators (map, chain, filter) is that they are lazy, meaning a collection isn't created at each step. A collection is only created when you call `collect`, so it's ideal to have only one.
I chained the `collect::<BTreeSet<Version>>()` and `collect::<Vec<Version>>()` calls because I need to sort and remove duplicates from the iterator, but I think extending a set from one `get_unique_versions()` call would be better; just changed it.
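A single-`collect` version of the snippet above might look like the following sketch. Plain `(u64, u64, u64)` tuples stand in for `semver::Version` here (their derived `Ord` sorts correctly for a fixed major/minor), and the input slices are illustrative, not the real bench data:

```rust
use std::collections::{BTreeMap, BTreeSet};

// hypothetical stand-in for semver::Version: (major, minor, patch)
pub type Version = (u64, u64, u64);

// One pass into a BTreeSet dedups and sorts without intermediate collects:
// chain stays lazy, and the single `collect` builds the final collection.
pub fn unique_sorted_versions(
    handshake: &[Version],
    throughput: &[Version],
) -> BTreeSet<Version> {
    handshake
        .iter()
        .copied()
        .chain(throughput.iter().copied())
        .chain((15..16).chain(30..38).map(|p| (1, 3, p)))
        .collect()
}

// Map each version to an x coordinate in sorted order.
pub fn version_to_x(versions: &BTreeSet<Version>) -> BTreeMap<Version, i32> {
    versions
        .iter()
        .enumerate()
        .map(|(i, v)| (*v, i as i32))
        .collect()
}
```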
bindings/rust/bench/README.md
Outdated
```markdown
- Since the benches are run over a long time, noise on the machine can cause variability, as seen in the throughput graph.
- The variability can be seen with throughput especially because it is calculated as the inverse of time taken.

![historical-perf-handshake](https://github.com/tinzh/s2n-tls/assets/76919968/b6448634-e6d1-4724-ab91-7efc26485274)
```
Putting the charts in the markdown makes sense I think, since you're able to add some context about the results here and it isn't just an image checked in by itself. However, in this case, I think I'd rather the images be in the s2n-tls repo than linked from your fork. Maybe in bench/images or something?
Ok, got it. Do you see all of the images, including the ones from memory benching, living here? Or just these particular images, because they're in the readme (and take a while to generate)?
I don't think it necessarily makes sense to have standalone images committed into the repo. It seems better if they're included as part of the documentation, so you can give some context for them like you did in this readme. So I think only images that are used in the readme should be included.

I think it also makes sense particularly for the historical benchmarking, because the single snapshot is useful. For other benchmarking, showing results from old versions (after we create releases and don't update the charts) doesn't seem as useful.
Ok, got it. I'll add *.svg back into the gitignore but specifically allowlist the historical-perf svgs. Maybe other generated images can still be put into `images/`, but they just won't be checked in.
```shell
# make Cargo.toml point s2n-tls to the cloned old version
sed -i "s|s2n-tls = .*|s2n-tls = { path = \"target/s2n-tls/bindings/rust/s2n-tls\" }|" Cargo.toml

# ensure Cargo.toml gets changed back on exit; retains original exit status
trap "{ status=$?; sed -i 's|s2n-tls = .*|s2n-tls = { path = \"../s2n-tls\" }|' $bench_path/Cargo.toml; exit $status; }" EXIT
```
Ahh, I missed this. Modifying the source is pretty hacky. If you really want to do this, then I would suggest a `Cargo.template` which you modify, and we only run the project from the shell scripts.
How would the Cargo.template work? Is it just a copy of the Cargo.toml that's modified with the new path? Or is it a special cargo feature?
You can specify a different `Cargo.toml`, but `--manifest-path` requires the different toml file to be called exactly `Cargo.toml`, so it'd need to be stored in a different directory. However, storing the `Cargo.toml` in a different directory changes all of the paths in the `Cargo.toml`, since they're all relative to where the `Cargo.toml` lives. I really couldn't find a better option. Comment above talking about this: #4083 (comment)

Also, I wanted everything to run with `cargo bench` without a separate run script, since that makes the most sense for the user experience. Yes, this is a little hacky, but I think it's well worth the ease of use, and it seems to be the only way to do it.
Commits merged in from upstream:

```
eafb8a2  Sam Clark        Mon Jul 31 18:05:50 2023 -0400  shell -> bash
12071b8  Sam Clark        Mon Jul 31 17:59:09 2023 -0400  add ubuntu quickstart back to readme
10bf557  Sam Clark        Mon Jul 31 17:52:06 2023 -0400  fixes
74adf8d  Sam Clark        Mon Jul 31 16:46:19 2023 -0400  fixes
0548d07  Sam Clark        Mon Jul 31 16:43:08 2023 -0400  consolidate
cbe8f2d  Sam Clark        Mon Jul 31 14:55:30 2023 -0400  remove old doc sections
f194321  Sam Clark        Mon Jul 31 12:25:28 2023 -0400  more content
882eb1d  Sam Clark        Mon Jul 31 09:08:24 2023 -0400  fixes
ce37d0e  Sam Clark        Mon Jul 31 09:03:45 2023 -0400  fixes
011d15f  Sam Clark        Sat Jul 29 22:59:51 2023 -0400  cmake consuming
7feadc1  Sam Clark        Sat Jul 29 22:27:02 2023 -0400  fixes
2914950  Sam Clark        Sat Jul 29 21:34:24 2023 -0400  traditional make
02f9841  Sam Clark        Sat Jul 29 19:45:43 2023 -0400  s2n-tls build section
86c4983  Sam Clark        Sat Jul 29 11:56:32 2023 -0400  Update build documentation
ea6d02a  Sam Clark        Fri Jul 28 16:49:21 2023 -0400  bindings: release 0.0.35 (aws#4122)
35d08ba  Justin Zhang     Fri Jul 28 12:31:21 2023 -0700  refactor(bench): separate out client and server connections in benching harness (aws#4113)
                                                          Enables better control of connections for benching experiments
65e74ca  Lindsay Stewart  Wed Jul 26 02:26:40 2023 -0700  Print error for 32bit test (aws#4107)
b0b253e  toidiu           Wed Jul 26 00:30:44 2023 -0700  ktls: set keys on socket and enable ktls (aws#4071)
403d5e6  Lindsay Stewart  Tue Jul 25 16:03:09 2023 -0700  Trying to use an invalid ticket should not mutate state (aws#4110)
bce2b1a  James Mayclin    Tue Jul 25 14:44:33 2023 -0700  fix: get_session behavior for TLS 1.3 (aws#4104)
6881358  Justin Zhang     Tue Jul 25 10:10:21 2023 -0700  feat(bench): add different certificate signature algorithms to benchmarks (aws#4080)
aab13d5  Justin Zhang     Mon Jul 24 18:17:30 2023 -0700  feat(bench): add memory bench with valgrind/massif (aws#4081)
20b0174  Justin Zhang     Mon Jul 24 13:26:32 2023 -0700  feat(bench): add historical performance benchmark (aws#4083)
5cc827d  Doug Chapman     Thu Jul 20 11:50:50 2023 -0700  nix: pin corretto version (aws#4103)
```
Description of changes:
Add a historical performance benchmark up to v1.3.16 (Jun 2022), before which there are breaking API changes that would require a refactor of the bench harness. The script checks out old versions of s2n-tls, runs the benches, and stores results in a csv. The `graph_perf` binary parses the results csv and generates a graph.
Testing:
This adds no library code, so no new tests are needed. To test, just run the benchmarks.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.