Better Shuttle Benchmarks #189

Merged
sarsko merged 1 commit into awslabs:main from dylanjwolff:better-benchmarks
Aug 6, 2025

@dylanjwolff
Contributor

This PR improves the benchmarking suite of Shuttle.

(A) The benchmarks can now be run without vector-clock overhead by using the bench-no-vector-clocks feature. This feature is now enabled in the benchmarking GitHub action.
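
As a rough illustration of what a benchmark-only feature gate like this can look like, here is a minimal sketch. It is not Shuttle's actual code: `VectorClock` and `record_event` are hypothetical names invented for this example, and only the feature name `bench-no-vector-clocks` comes from this PR. The sketch assumes the feature is declared in the crate's Cargo.toml and is only ever switched on from the command line (for example through cargo's `--features` flag), never from a checked-in config file.

```rust
/// Hypothetical stand-in for a per-task vector clock (not Shuttle's real type).
#[derive(Debug, Default, Clone, PartialEq)]
struct VectorClock(Vec<u32>);

impl VectorClock {
    /// Bumps the entry for `task`, growing the clock if the task is new.
    fn increment(&mut self, task: usize) {
        if self.0.len() <= task {
            self.0.resize(task + 1, 0);
        }
        self.0[task] += 1;
    }
}

/// Records one event for `task`. When the benchmark-only feature is enabled,
/// the clock bookkeeping is skipped entirely, so enabling the feature removes
/// behaviour instead of adding it (which is why it is not an additive feature).
fn record_event(clock: &mut VectorClock, task: usize) {
    if cfg!(feature = "bench-no-vector-clocks") {
        return;
    }
    clock.increment(task);
}

fn main() {
    let mut clock = VectorClock::default();
    for task in [0, 1, 1] {
        record_event(&mut clock, task);
    }
    println!("{:?}", clock);
}
```

Using `cfg!` keeps both branches type-checked in every build; a `#[cfg(...)]` attribute on the bookkeeping code would be an equally plausible choice.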

(B) The benchmarks have also been expanded and parameterized by the number of tasks and the total number of events across all threads. More specifically:

  1. Creation and Startup: create.rs has been added to give a rough gauge of thread creation time and Shuttle startup overhead.
  2. Parameterization: The existing benchmarks in counter.rs and lock.rs have been parameterized and set to run in a "narrow" [5 tasks] and a "wide" [100 tasks] configuration. The total number of events/operations remains constant [10000 events] across these configurations (the number of events per task is computed from the number of tasks and the total number of events).
  3. Scaling: An additional set of benchmark groups has been added to show scaling as the number of tasks and events increases. The sampling rate and warmup time have been greatly reduced for these so that the many new configurations (3*5*2=30) can be run in a reasonable amount of time. As such, statistical comparisons on individual data points from these scaling benchmarks are not particularly informative. Unfortunately, Criterion doesn't have an automatic way to visualize trends, so the reports generated for these are not easy to interpret. In the future, it might be useful to generate custom reports from the raw data instead.
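
The parameterization in item 2 can be sketched roughly as follows. This is an illustration only, not the code in counter.rs or lock.rs: it uses plain `std::thread` in place of Shuttle tasks, the names `events_per_task` and `run_counter` are hypothetical, and rounding down with integer division is an assumption. Only the numbers (5 tasks, 100 tasks, 10000 events) come from the description above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Values taken from the PR description.
const TOTAL_EVENTS: usize = 10_000;
const NARROW_TASKS: usize = 5;
const WIDE_TASKS: usize = 100;

/// Splits a fixed event budget evenly across tasks.
/// Assumption: any remainder is dropped by integer division.
fn events_per_task(total_events: usize, num_tasks: usize) -> usize {
    assert!(num_tasks > 0, "need at least one task");
    total_events / num_tasks
}

/// Hypothetical counter workload: each task increments a shared counter
/// `events_per_task` times; returns the final count.
fn run_counter(num_tasks: usize, total_events: usize) -> usize {
    let per_task = events_per_task(total_events, num_tasks);
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..num_tasks)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_task {
                    counter.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    for (name, tasks) in [("narrow", NARROW_TASKS), ("wide", WIDE_TASKS)] {
        let per_task = events_per_task(TOTAL_EVENTS, tasks);
        let done = run_counter(tasks, TOTAL_EVENTS);
        println!("{name}: {tasks} tasks x {per_task} events/task = {done} events");
    }
}
```

Holding the total event count fixed means the "narrow" and "wide" runs do the same amount of work, so any difference between them reflects the cost of having more tasks rather than more operations.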

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Comment thread .github/workflows/tests.yml
@sarsko
Contributor

sarsko commented Aug 1, 2025

Do we want an "even wider" bench (e.g. 1000 or 10000 tasks)? 100 is just not that many.

@sarsko
Contributor

sarsko commented Aug 1, 2025

Just noting that bench-no-vector-clocks is fine by me (rumor has it that I suggested it), but it is also kinda bad style. Features are supposed to be additive, and this breaks that. We should have the feature separated from the rest, with a comment explaining why, and also a comment that this feature should never ever be enabled in a Cargo.toml (or other config file), i.e. it should only be invoked as we do here, as a part of a cargo ... command.

@dylanjwolff dylanjwolff force-pushed the better-benchmarks branch 2 times, most recently from 17fa1b5 to 2a476c2 Compare August 1, 2025 03:54
@dylanjwolff
Contributor Author

> Do we want an "even wider" bench (e.g. 1000 or 10000 tasks)? 100 is just not that many.

I added an instance of the scaling benchmark with 1024 tasks. I'd guess that any issues with scaling threads should be visible at that size.

> Just noting that bench-no-vector-clocks is fine by me (rumor has it that I suggested it), but it is also kinda bad style. Features are supposed to be additive, and this breaks that. We should have the feature separated from the rest, with a comment explaining why, and also a comment that this feature should never ever be enabled in a Cargo.toml (or other config file), i.e. it should only be invoked as we do here, as a part of a cargo ... command.

Yeah, it's a bit unfortunate, but I do suspect mostly it won't be noticed or used. I've added a comment to Cargo.toml.

Comment thread shuttle/src/sync/once.rs
@sarsko sarsko merged commit 39581cb into awslabs:main Aug 6, 2025
5 checks passed
2 participants