Add historical performance benchmark #4083
Conversation
```rust
#[cfg(not(feature = "s2n-only"))]
{
    bench_handshake_for_library::<RustlsHarness>(
        &mut bench_group,
        "rustls",
        handshake_type,
        ec_group,
    );
    bench_handshake_for_library::<OpenSslHarness>(
        &mut bench_group,
        "openssl",
        handshake_type,
        ec_group,
    );
}
```
Rather than doing this as a feature, is it possible to have three different bench harnesses (one for each TLS provider)? Ideally you could also run only OpenSSL or only rustls:

```shell
cargo bench -- s2n --exact --no-fail-fast
```
Talked offline; we decided that keeping the feature was ultimately worth it because the historical performance bench builds and runs faster without compiling the rustls/openssl benches. However, we'll rename the feature to `historical-perf` to more accurately reflect its purpose, and call out what it does in the README.
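As a rough sketch of what that gating might look like (the feature name matches the discussion, but the crate layout and dependency versions here are assumptions, not taken from the actual diff), the bench crate's Cargo.toml could make the comparison harnesses optional so `historical-perf` builds skip them:

```toml
# hypothetical excerpt of bindings/rust/bench/Cargo.toml
[features]
default = []
# historical-perf: compile only the s2n-tls benches so old versions
# of the library build and run faster; code gated with
# #[cfg(not(feature = "historical-perf"))] skips rustls/openssl
historical-perf = []

[dependencies]
rustls = { version = "0.21", optional = true }
openssl = { version = "0.10", optional = true }
```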
```rust
// generate all inputs (TlsBenchHarness structs) before benchmarking handshakes
// timing only includes negotiation, not config/connection initialization
```
Suggested change:

```rust
// Generate all harnesses (TlsBenchHarness structs) so that benchmarks only include
// negotiation and not config/connection initialization
```
How much time does conn/config initialization take? It should be pretty small, since it should be using `s2n_config_new_minimal` by default now. We had a regression with that, so it might actually be good to have a separate benchmark for it.
Good idea, but that would require a bit of a refactor of the current benching harness, especially to separate config initialization from connection initialization. I have a refactor planned that includes separating those two out, and I could possibly add a bench for that then.
I'm pretty sure that the initialization took quite a bit of time? Taking out the initialization part was one of the first things I did, so it's been a while since I benched with it.
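For what it's worth, a quick way to see how initialization cost compares to negotiation, before any harness refactor, is to time the two phases separately. This is an illustrative sketch only; the closure contents stand in for the bench harness's real setup and handshake code:

```rust
use std::time::{Duration, Instant};

/// Time a setup closure separately from the work it feeds, so that a
/// config/connection-initialization regression shows up on its own.
/// (Sketch, not the bench harness's actual interface.)
pub fn time_phases<S, W, T>(mut setup: S, mut work: W) -> (Duration, Duration)
where
    S: FnMut() -> T,
    W: FnMut(T),
{
    let start = Instant::now();
    let state = setup(); // e.g. config + connection creation
    let setup_time = start.elapsed();

    let start = Instant::now();
    work(state); // e.g. the handshake itself
    let work_time = start.elapsed();

    (setup_time, work_time)
}
```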
```rust
// if harness invalid, do nothing but don't panic
// useful for historical performance bench to ignore configs
// invalid only for past versions of s2n-tls
```
Suggested change:

```rust
// Ignore failure, since some past versions of s2n-tls have a different API and
// therefore fail to build. Since the results are graphed as part of historical
// perf, it's possible to extrapolate any missing data.
```
This fail-safe part of the code isn't because of API changes causing build failures, but rather API changes that cause a runtime error on initialization (namely, trying to use security policies that were only added recently). I'm going to change the comment to "harnesses with certain parameters fail to initialize for some past versions of s2n-tls, but missing data can be visually interpolated in the historical performance graph".

nit: I've been generally using lowercase for inline comments with `// ...` but capitalizing sentences for doc comments with `/// ...`. Is capitalizing multiline inline comments a specific style I should be following, personal preference, a mix of those, or something else?
```shell
# immediately bail if any command fails
set -e
```
can you move this to right next to the `bin/bash` shebang?
I think the shebang needs to be the first thing, but I can move the `set -e` above the stdout suppression.
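A minimal sketch of that ordering (the `quiet` helper is hypothetical; the actual script's redirection may differ):

```shell
#!/usr/bin/env bash
# the shebang must be the very first line of the file;
# set -e goes immediately after it so early failures still abort the script
set -e

# suppress stdout for a single command while keeping stderr and the exit status
quiet() {
    "$@" > /dev/null
}
```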
```rust
};

/// Return (f64, f64) of (mean, standard error) from Criterion result JSON
fn process_single_json(path: &Path) -> (f64, f64) {
```
To make this more "rusty", you can define a custom data structure:

```rust
struct BenchOutput {
    mean: f64,
    std_err: f64,
}
```

You can also define a custom error and then return a `Result<BenchOutput, CustomErr>`. For custom errors, check out https://github.com/dtolnay/thiserror
This could also replace your `BenchGroupData`.
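Roughly, the suggested shape could look like the following. `to_bench_output` and `BenchParseError` are illustrative stand-ins for the Criterion JSON parsing inside `process_single_json`, and the error type is hand-rolled here where `thiserror`'s derive macro would remove the `Display`/`Error` boilerplate:

```rust
use std::fmt;

// the suggested output struct, replacing a bare (f64, f64) tuple
#[derive(Debug, PartialEq)]
pub struct BenchOutput {
    pub mean: f64,
    pub std_err: f64,
}

// hand-rolled custom error; thiserror's #[derive(Error)] would generate this
#[derive(Debug)]
pub enum BenchParseError {
    MissingField(&'static str),
}

impl fmt::Display for BenchParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BenchParseError::MissingField(name) => write!(f, "missing field: {name}"),
        }
    }
}

impl std::error::Error for BenchParseError {}

// hypothetical constructor over already-extracted JSON values, standing in
// for the parsing done by process_single_json
pub fn to_bench_output(
    mean: Option<f64>,
    std_err: Option<f64>,
) -> Result<BenchOutput, BenchParseError> {
    Ok(BenchOutput {
        mean: mean.ok_or(BenchParseError::MissingField("mean"))?,
        std_err: std_err.ok_or(BenchParseError::MissingField("std_err"))?,
    })
}
```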
```rust
    path::Path,
};

/// Return (f64, f64) of (mean, standard error) from Criterion result JSON
```
comments should just use `//`
I've been using `///` for doc comments, including on private methods, for readability. Even if it should be `//`, the rest of the bench codebase already uses `///` for all function descriptions (but does use `//` everywhere else).
```rust
}

/// Plots given data with given chart parameters
fn plot_bench_groups<F: Fn(&f64) -> String>(
```
looks like you are parsing and plotting the data here. You could separate these steps to make the code easier to understand and reason about.
can you move this to `bench/scripts/bench_historical_perf.sh`?
I have the script in the separate `historical-perf` folder mainly as a place to hold the .svg artifacts that the script outputs. If I move the script into a `scripts` folder, where would the .svg outputs live?

Also, if there were a `scripts` folder, the `use-awslc-*.sh` scripts, `memory/bench-memory.sh` from the memory bench PR, and `certs/generate_certs.sh` (from `certs/`) should maybe all be moved there too? Overall, I feel like these scripts don't have a lot in common with each other, and it'd make sense to keep them separate.

It probably isn't ideal to have so many scripts strewn everywhere, but a lot of different tools and build configs need to be cobbled together, and just using cargo hasn't let me do what we need to bench everything.
The .svg can be output to the target folder, since that's where artifacts go. Like you said, there are a lot of scripts, and it's good to have them in one place. I would say yeah, all the scripts should move to the same folder.
I'll wait for a refactor PR (which should come right after all the current bench PRs get merged) to change this, since it would span multiple PRs and current work.
```rust
let versions = get_unique_versions(&handshake_data)
    .into_iter()
    .chain(get_unique_versions(&throughput_data).into_iter())
    .chain((15..16).chain(30..38).map(|p| Version::new(1, 3, p)))
    .collect::<BTreeSet<Version>>()
    .into_iter()
    .collect::<Vec<Version>>();

// map versions to x coordinates
let version_to_x = versions
    .iter()
    .enumerate()
    .map(|(i, version)| (version, i as i32))
    .collect::<HashMap<&Version, i32>>();
```
nit: I wonder if it's possible to simplify this. Try to avoid doing `.collect` more than once, since that allocates. The cool thing about operations on iterators (map, chain, filter) is that they are lazy, meaning a collection isn't created at each step. A collection is only created when you call `collect`, so it's ideal to have only one.
I chained the `collect::<BTreeSet<Version>>()` and `collect::<Vec<Version>>()` calls because I need to sort and remove duplicates from the iterator, but I think extending a set from one `get_unique_versions()` call would be better; just changed it.
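A single-`collect` version of the snippet above might look like the following sketch. Plain `(u64, u64, u64)` tuples stand in for `semver::Version` here (their derived `Ord` sorts correctly for a fixed major/minor), and the input slices are illustrative, not the real bench data:

```rust
use std::collections::{BTreeMap, BTreeSet};

// hypothetical stand-in for semver::Version: (major, minor, patch)
pub type Version = (u64, u64, u64);

// One pass into a BTreeSet dedups and sorts without intermediate collects:
// chain stays lazy, and the single `collect` builds the final collection.
pub fn unique_sorted_versions(
    handshake: &[Version],
    throughput: &[Version],
) -> BTreeSet<Version> {
    handshake
        .iter()
        .copied()
        .chain(throughput.iter().copied())
        .chain((15..16).chain(30..38).map(|p| (1, 3, p)))
        .collect()
}

// Map each version to an x coordinate in sorted order.
pub fn version_to_x(versions: &BTreeSet<Version>) -> BTreeMap<Version, i32> {
    versions
        .iter()
        .enumerate()
        .map(|(i, v)| (*v, i as i32))
        .collect()
}
```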
bindings/rust/bench/README.md
Outdated
```markdown
- Since the benches are run over a long time, noise on the machine can cause variability, as seen in the throughput graph.
- The variability can be seen with throughput especially because it is calculated as the inverse of time taken.

![historical-perf-handshake](https://github.com/tinzh/s2n-tls/assets/76919968/b6448634-e6d1-4724-ab91-7efc26485274)
```
Putting the charts in the markdown makes sense I think, since you're able to add some context about the results here and it isn't just an image checked in by itself. However, in this case, I think I'd rather the images be in the s2n-tls repo than linked from your fork. Maybe in bench/images or something?
Ok, got it. Do you see all of the images, including the ones from memory benching, living here? Or just these particular images, because they're in the readme (and take a while to generate)?
I don't think it necessarily makes sense to have standalone images committed into the repo. It seems better if they're included as part of the documentation, so you can give some context for them like you did in this readme. So I think only images that are used in the readme should be included.

I think it also makes sense particularly for the historical benchmarking, because the single snapshot is useful. For other benchmarking, showing results from old versions (after we create releases and don't update the charts) doesn't seem as useful.
Ok, got it. I'll add *.svg back into the gitignore but specifically allowlist the historical-perf svgs. Maybe other generated images can still be put into `images/`, but they just won't be checked in.
```shell
# make Cargo.toml point s2n-tls to the cloned old version
sed -i "s|s2n-tls = .*|s2n-tls = { path = \"target/s2n-tls/bindings/rust/s2n-tls\" }|" Cargo.toml

# ensure Cargo.toml gets changed back on exit; retains original exit status
trap "{ status=$?; sed -i 's|s2n-tls = .*|s2n-tls = { path = \"../s2n-tls\" }|' $bench_path/Cargo.toml; exit $status; }" EXIT
```
Ahh, I missed this. Modifying the source is pretty hacky. If you really want to do this, then I would suggest a `Cargo.template` which you modify, and we only run the project from the shell scripts.
How would the Cargo.template work? Is it just a copy of the Cargo.toml that's modified with the new path? Or is it a special cargo feature?
You can specify a different `Cargo.toml`, but `--manifest-path` requires the different toml file to be called exactly `Cargo.toml`, so it'd need to be stored in a different directory. However, storing the `Cargo.toml` in a different directory changes all of the paths in the `Cargo.toml`, since they're all relative to where the `Cargo.toml` lives. I really couldn't find a better option. Comment above talking about this: #4083 (comment)

Also, I wanted everything to run with `cargo bench` without a separate run script, since that makes the most sense for the user experience. Yes, this is a little hacky, but I think it's well worth the ease of use, and it seems to be the only way to do it.
Commits merged in from upstream:

```
eafb8a2  Sam Clark        Mon Jul 31 18:05:50 2023 -0400  shell -> bash
12071b8  Sam Clark        Mon Jul 31 17:59:09 2023 -0400  add ubuntu quickstart back to readme
10bf557  Sam Clark        Mon Jul 31 17:52:06 2023 -0400  fixes
74adf8d  Sam Clark        Mon Jul 31 16:46:19 2023 -0400  fixes
0548d07  Sam Clark        Mon Jul 31 16:43:08 2023 -0400  consolidate
cbe8f2d  Sam Clark        Mon Jul 31 14:55:30 2023 -0400  remove old doc sections
f194321  Sam Clark        Mon Jul 31 12:25:28 2023 -0400  more content
882eb1d  Sam Clark        Mon Jul 31 09:08:24 2023 -0400  fixes
ce37d0e  Sam Clark        Mon Jul 31 09:03:45 2023 -0400  fixes
011d15f  Sam Clark        Sat Jul 29 22:59:51 2023 -0400  cmake consuming
7feadc1  Sam Clark        Sat Jul 29 22:27:02 2023 -0400  fixes
2914950  Sam Clark        Sat Jul 29 21:34:24 2023 -0400  traditional make
02f9841  Sam Clark        Sat Jul 29 19:45:43 2023 -0400  s2n-tls build section
86c4983  Sam Clark        Sat Jul 29 11:56:32 2023 -0400  Update build documentation
ea6d02a  Sam Clark        Fri Jul 28 16:49:21 2023 -0400  bindings: release 0.0.35 (aws#4122)
35d08ba  Justin Zhang     Fri Jul 28 12:31:21 2023 -0700  refactor(bench): separate out client and server connections in benching harness (aws#4113)
                                                          Enables better control of connections for benching experiments
65e74ca  Lindsay Stewart  Wed Jul 26 02:26:40 2023 -0700  Print error for 32bit test (aws#4107)
b0b253e  toidiu           Wed Jul 26 00:30:44 2023 -0700  ktls: set keys on socket and enable ktls (aws#4071)
403d5e6  Lindsay Stewart  Tue Jul 25 16:03:09 2023 -0700  Trying to use an invalid ticket should not mutate state (aws#4110)
bce2b1a  James Mayclin    Tue Jul 25 14:44:33 2023 -0700  fix: get_session behavior for TLS 1.3 (aws#4104)
6881358  Justin Zhang     Tue Jul 25 10:10:21 2023 -0700  feat(bench): add different certificate signature algorithms to benchmarks (aws#4080)
aab13d5  Justin Zhang     Mon Jul 24 18:17:30 2023 -0700  feat(bench): add memory bench with valgrind/massif (aws#4081)
20b0174  Justin Zhang     Mon Jul 24 13:26:32 2023 -0700  feat(bench): add historical performance benchmark (aws#4083)
5cc827d  Doug Chapman     Thu Jul 20 11:50:50 2023 -0700  nix: pin corretto version (aws#4103)
```
Description of changes:
Add a historical performance benchmark up to v1.3.16 (Jun 2022), before which there are breaking API changes that would require a refactor of the bench harness. The script checks out old versions of s2n-tls, runs the benches, and stores results in a csv. The `graph_perf` binary parses the results csv and generates a graph.
Testing:
This adds no library code, so no new tests are needed. To test, just run the benchmarks.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.