fix(bench): eliminate per-sample scheduler setup cost in history benchmarks#55

Merged: deepjoy merged 5 commits into main from improve-bench on Mar 19, 2026
Conversation

@deepjoy (Owner) commented Mar 19, 2026

Summary

  • Move build_scheduler_with_history outside bench_with_input so the
    scheduler is seeded once per history_size, not once per Criterion sample.
  • Cache the critcmp binary between CI runs to avoid recompiling it on
    every workflow execution.

Problem

Each history benchmark called build_scheduler_with_history(n) inside
iter_custom, so Criterion's 20 samples for history_size=5000 triggered
100,000 task completions of setup work per benchmark group. This blew past
Criterion's default measurement-time budget and produced "unable to complete
N samples" warnings in CI.

Solution

Build the scheduler once with rt.block_on(...) before bench_with_input,
clone the TaskStore handle, and pass it into the closure. Only the actual
query (history / history_stats / history_by_type) is measured in the loop.
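A minimal sketch of that restructuring in plain Rust. The `TaskStore` and `build_scheduler_with_history` names follow the PR, but the `Arc`-backed store and the `Instant` timing loop below are stand-ins for the real async scheduler and Criterion's `iter_custom`, not the crate's actual code:

```rust
use std::sync::Arc;
use std::time::{Duration, Instant};

// Stand-in for the real store: cloning the handle is a cheap Arc
// refcount bump, unlike re-seeding the scheduler.
#[derive(Clone)]
struct TaskStore(Arc<Vec<u64>>);

impl TaskStore {
    // The measured query, analogous to the benchmark's `history` call.
    fn history(&self) -> usize {
        self.0.len()
    }
}

// Expensive setup: the real version completes `n` tasks on a Tokio
// runtime via `rt.block_on(...)`.
fn build_scheduler_with_history(n: usize) -> TaskStore {
    TaskStore(Arc::new((0..n as u64).collect()))
}

// Shape of the fixed benchmark: build once per history_size, then time
// only the query inside the sampling loop (mimicking `iter_custom`).
fn bench_history(history_size: usize, samples: u32) -> Duration {
    let store = build_scheduler_with_history(history_size); // once, outside
    let mut total = Duration::ZERO;
    for _ in 0..samples {
        let store = store.clone(); // cheap handle clone per sample
        let start = Instant::now();
        let len = store.history(); // only this span is measured
        total += start.elapsed();
        assert_eq!(len, history_size);
    }
    total
}
```

Before the fix, the `build_scheduler_with_history` call sat inside the per-sample loop, so its cost was multiplied by Criterion's sample count.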

deepjoy added 3 commits March 18, 2026 19:18
… setup cost

Seeding N tasks was inside iter_custom, so it ran 20× per configuration.
For history_size=5000 that meant 100k task completions just for setup,
blowing Criterion's measurement time and causing "unable to complete" warnings.
Move the build outside bench_with_input so setup runs once per size.
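The blow-up described above is plain multiplication; `setup_completions` below is a hypothetical helper to make the quoted numbers concrete, not part of the crate:

```rust
/// Total task completions spent on seeding: one build per Criterion
/// sample under the old layout, one build per configuration after the
/// fix. (Hypothetical helper for illustration only.)
fn setup_completions(builds: u64, history_size: u64) -> u64 {
    builds * history_size
}
```

Under the old layout, 20 samples at history_size=5000 means `setup_completions(20, 5_000)` = 100,000 completions of pure setup; after hoisting the build it drops to 5,000.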
…te warnings

Criterion's default 5s measurement window is too tight for CI runners:
with sample_size=20, each sample must finish in ≤250ms. On a 2-core
GitHub Actions runner the async overhead per sample can exceed this,
triggering "unable to complete N samples in 5.0s". Setting 30s gives
Criterion a comfortable window without making CI unreasonably slow.
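The per-sample budget math behind that choice, as a back-of-envelope sketch (Criterion's actual scheduling of iterations within the measurement window is more involved; `per_sample_budget` is a hypothetical helper):

```rust
use std::time::Duration;

/// Rough per-sample time budget: measurement window / sample count.
fn per_sample_budget(window: Duration, sample_size: u32) -> Duration {
    window / sample_size
}
```

With the default 5 s window and sample_size=20 the budget works out to 250 ms per sample; widening the window to 30 s raises it to 1.5 s, which is the headroom the commit above relies on.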
@github-actions (Contributor) commented

Benchmark Comparison

group                                       current
-----                                       -------
backoff_delay/constant                      1.00     50.4±0.03ns        ? ?/sec
backoff_delay/exponential                   1.00    219.5±0.92ns        ? ?/sec
backoff_delay/exponential_jitter            1.00    386.4±1.02ns        ? ?/sec
backoff_delay/linear                        1.00     82.0±0.23ns        ? ?/sec
batch_submit_1000                           1.00     38.6±2.08ms        ? ?/sec
byte_progress/byte_reporting_500            1.00    637.5±7.66ms        ? ?/sec
byte_progress/noop_500                      1.00   601.0±10.99ms        ? ?/sec
byte_progress_snapshot_100_tasks            1.00    191.0±2.63ms        ? ?/sec
concurrency_scaling/1                       1.00   723.7±15.12ms        ? ?/sec
concurrency_scaling/2                       1.00   592.0±12.14ms        ? ?/sec
concurrency_scaling/4                       1.00    593.1±9.96ms        ? ?/sec
concurrency_scaling/8                       1.00   596.9±10.46ms        ? ?/sec
count_by_tags/100                           1.00    120.4±5.05µs        ? ?/sec
count_by_tags/1000                          1.00    195.3±8.16µs        ? ?/sec
count_by_tags/5000                          1.00   697.6±20.41µs        ? ?/sec
dep_chain_dispatch/10                       1.00     26.9±0.45ms        ? ?/sec
dep_chain_dispatch/25                       1.00     60.9±1.23ms        ? ?/sec
dep_chain_dispatch/50                       1.00    124.0±2.35ms        ? ?/sec
dep_chain_submit/10                         1.00     11.5±0.30ms        ? ?/sec
dep_chain_submit/200                        1.00   537.2±11.58ms        ? ?/sec
dep_chain_submit/50                         1.00     51.3±2.81ms        ? ?/sec
dep_fan_in_dispatch/10                      1.00     24.2±0.29ms        ? ?/sec
dep_fan_in_dispatch/100                     1.00    143.4±3.98ms        ? ?/sec
dep_fan_in_dispatch/50                      1.00     76.8±1.77ms        ? ?/sec
dispatch_and_complete_1000                  1.00  1192.5±15.93ms        ? ?/sec
dispatch_group_scaling/1                    1.00   633.7±11.57ms        ? ?/sec
dispatch_group_scaling/10                   1.00   630.4±10.49ms        ? ?/sec
dispatch_group_scaling/100                  1.00   635.0±11.45ms        ? ?/sec
dispatch_group_scaling/50                   1.00   636.6±12.57ms        ? ?/sec
dispatch_no_groups_500                      1.00   590.2±12.87ms        ? ?/sec
dispatch_one_group_500                      1.00   635.5±12.33ms        ? ?/sec
dispatch_permanent_failure_500              1.00   542.1±11.54ms        ? ?/sec
history_by_type/100                         1.00  1108.5±14.94µs        ? ?/sec
history_by_type/1000                        1.00  1160.7±23.89µs        ? ?/sec
history_by_type/5000                        1.00  1197.5±26.51µs        ? ?/sec
history_query/100                           1.00    667.0±9.64µs        ? ?/sec
history_query/1000                          1.00    683.1±9.82µs        ? ?/sec
history_query/5000                          1.00    719.5±7.83µs        ? ?/sec
history_stats/100                           1.00    133.2±2.88µs        ? ?/sec
history_stats/1000                          1.00    328.5±2.11µs        ? ?/sec
history_stats/5000                          1.00   1205.7±5.88µs        ? ?/sec
mixed_priority_dispatch_500                 1.00   600.3±11.59ms        ? ?/sec
peek_next/100                               1.00     46.7±0.99ms        ? ?/sec
peek_next/1000                              1.00    194.9±5.01ms        ? ?/sec
peek_next/5000                              1.00   862.9±14.70ms        ? ?/sec
query_by_tags/100                           1.00  1376.2±98.56µs        ? ?/sec
query_by_tags/1000                          1.00     12.5±1.16ms        ? ?/sec
query_by_tags/5000                          1.00     63.8±6.42ms        ? ?/sec
retryable_dead_letter/constant              1.00    293.2±7.54ms        ? ?/sec
retryable_dead_letter/exponential           1.00    296.4±6.74ms        ? ?/sec
retryable_dead_letter/exponential_jitter    1.00    293.8±8.51ms        ? ?/sec
retryable_dead_letter/linear                1.00    296.1±6.36ms        ? ?/sec
submit_1000_tasks                           1.00    175.1±5.40ms        ? ?/sec
submit_dedup_hit_1000                       1.00    225.2±7.15ms        ? ?/sec
submit_with_tags/0                          1.00     91.4±3.26ms        ? ?/sec
submit_with_tags/10                         1.00    221.3±7.19ms        ? ?/sec
submit_with_tags/20                         1.00   352.2±11.02ms        ? ?/sec
submit_with_tags/5                          1.00    156.6±5.08ms        ? ?/sec
tag_values/100                              1.00    126.3±5.06µs        ? ?/sec
tag_values/1000                             1.00    188.4±6.36µs        ? ?/sec
tag_values/5000                             1.00   536.8±22.89µs        ? ?/sec

Switch all benches from iter() to iter_custom(), constructing the
scheduler/store outside the timed region so only the measured workload
is counted. Also adds pprof flamegraph support to the dep_chain_submit
group and a new profile_dep_chain example for one-shot timing breakdowns.
@deepjoy deepjoy enabled auto-merge (squash) March 19, 2026 03:57
@deepjoy deepjoy merged commit 6f2ba74 into main Mar 19, 2026
1 of 2 checks passed
@github-actions github-actions Bot mentioned this pull request Mar 19, 2026
deepjoy pushed a commit that referenced this pull request Mar 19, 2026
## 🤖 New release

* `taskmill`: 0.5.0 -> 0.5.1 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.5.1](v0.5.0...v0.5.1) - 2026-03-19

### Fixed

- *(bench)* eliminate per-sample scheduler setup cost in history
benchmarks ([#55](#55))
- *(bench)* remove premature cancellation token call in history
benchmark setup ([#54](#54))
- *(ci)* bootstrap _benchmarks branch on first push to main
([#53](#53))
- *(ci)* restore stderr capture for benchmark output on main
([#51](#51))
- *(ci)* exclude lib target from cargo bench to fix benchmark CI
([#49](#49))

### Other

- decompose internal god objects into focused, single-responsibility
modules ([#56](#56))
- eliminate stringly-typed history status and DRY violations
([#52](#52))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
