perf!: improve benchmark throughput across submit, dispatch, retry, and failure paths#80

Merged: deepjoy merged 8 commits into main from improve-benchmark-performance on Mar 24, 2026
deepjoy commented Mar 24, 2026

Summary

Reduces per-task overhead across every scheduler hot path — submit, dispatch, retry, completion, and failure — through transaction coalescing, lazy data population, inline zero-delay retries, and query-only-what-you-need optimizations. Also brings all markdown docs and rustdoc up to date with the 0.6 API.

Breaking change

  • tasks_by_tags() → task_ids_by_tags() and tasks_by_tag_key_prefix() → task_ids_by_tag_key_prefix(); both now return Vec<i64> instead of Vec<TaskRecord>. Callers that need full records must follow up with task_by_id().
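
For callers affected by the rename, the migration amounts to an ID query followed by per-ID lookups. The sketch below illustrates that shape against a hypothetical in-memory `Store`; the `TaskRecord` fields, the filter argument type, and the synchronous, infallible signatures are assumptions made for illustration, and the real SQLite-backed methods may well be async and return `Result`.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for the crate's `TaskRecord`.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskRecord {
    pub id: i64,
    pub task_type: String,
    pub tags: HashMap<String, String>,
}

/// Hypothetical in-memory store used only to show the call pattern.
#[derive(Default)]
pub struct Store {
    tasks: Vec<TaskRecord>,
}

impl Store {
    pub fn insert(&mut self, record: TaskRecord) {
        self.tasks.push(record);
    }

    /// New shape: returns only the IDs of tasks matching every (key, value) filter.
    pub fn task_ids_by_tags(&self, filters: &[(&str, &str)]) -> Vec<i64> {
        self.tasks
            .iter()
            .filter(|t| {
                filters
                    .iter()
                    .all(|(k, v)| t.tags.get(*k).map(String::as_str) == Some(*v))
            })
            .map(|t| t.id)
            .collect()
    }

    /// Full-record lookup for a single ID.
    pub fn task_by_id(&self, id: i64) -> Option<TaskRecord> {
        self.tasks.iter().find(|t| t.id == id).cloned()
    }
}

/// Migration helper for callers that still need full records:
/// fetch IDs first, then resolve each one, skipping IDs that no longer exist.
pub fn full_records_by_tags(store: &Store, filters: &[(&str, &str)]) -> Vec<TaskRecord> {
    store
        .task_ids_by_tags(filters)
        .into_iter()
        .filter_map(|id| store.task_by_id(id))
        .collect()
}
```

Callers that only cancel or count tasks can stop at `task_ids_by_tags` and skip the second step entirely, which is where the saving comes from.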

Performance improvements

  • Submit coalescing: concurrent submit() calls are batched into a single SQLite transaction via leader election; uncontended callers take a zero-overhead fast path
  • Skip requeue on fresh stores: dedup-hit path elides the requeue UPDATE when no task has ever been dispatched (has_running flag)
  • Batch terminal failures: parentless terminal failures are coalesced through an unbounded channel (mirroring completion coalescing), amortizing WAL sync; parent failures still process inline to preserve fail-fast cascade ordering
  • Inline zero-delay retries: retries with zero delay re-execute in the same spawned task instead of requeueing through SQLite
  • Lazy tag population: pop_next/peek_next/history list queries no longer JOIN tags by default; callers opt in via populate_tags() / populate_history_tags()
  • Covering index for history: new idx_history_type(task_type, completed_at DESC) speeds up history_stats, history_by_type, and avg_throughput
  • Widen completion coalescing window: yield_now() before leader-election drain lets more completions accumulate per batch
  • Skip no-subscriber broadcasts: gate event_tx.send() behind receiver_count() > 0

Documentation

  • Updated all code examples across 13 markdown files and lib.rs rustdoc for the 0.6 API (DomainTaskContext, spawn_child_with, child_of)
  • Documented tag query rename, covering index, dead_letter history status, and inline retry flow
  • Bumped stale version strings (0.3/0.4/0.5 → 0.6)

deepjoy added 8 commits March 23, 2026 17:07
Rename `tasks_by_tags` → `task_ids_by_tags` and
`tasks_by_tag_key_prefix` → `task_ids_by_tag_key_prefix` across store,
scheduler, domain, and module layers. Queries now SELECT only `t.id`,
skip `populate_tags`, and drop the ORDER BY clause — avoiding full row
deserialization and N+1 tag lookups when callers (cancel_by_tag,
cancel_by_tag_key_prefix) only need the ID. Extracts shared join-building
logic into `build_tag_join_sql`.
…g channel

Mirror the existing completion coalescing pattern for task failures.
Parentless terminal failures are sent through an unbounded channel and
drained in batches (by leader election or the run loop), amortizing
SQLite WAL sync overhead. Failures with parents still process inline
to preserve fail-fast cascade ordering.

Also fix has_paused_tasks to start false and let the builder set it
only when the persistent store actually contains paused tasks.
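
The channel-plus-leader-election pattern described in this commit can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `FailureBatcher` type, its method names, the use of `Mutex::try_lock` as the election mechanism, and the `committed_batches` vector standing in for one SQLite transaction per batch. The actual scheduler is async and its internals are not shown in this PR text.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical batcher: failures go into an unbounded channel and whoever
/// wins the lock drains everything queued so far as one batch.
pub struct FailureBatcher {
    tx: Sender<i64>,
    rx: Mutex<Receiver<i64>>,
    /// Each inner Vec stands in for one committed transaction.
    pub committed_batches: Mutex<Vec<Vec<i64>>>,
}

impl FailureBatcher {
    pub fn new() -> Self {
        let (tx, rx) = channel();
        FailureBatcher {
            tx,
            rx: Mutex::new(rx),
            committed_batches: Mutex::new(Vec::new()),
        }
    }

    /// Queue a terminal failure without touching the database.
    pub fn enqueue(&self, task_id: i64) {
        self.tx.send(task_id).expect("receiver is owned by self");
    }

    /// Leader election: the caller that acquires the receiver lock drains the
    /// channel and commits one batch; everyone else returns immediately.
    /// Returns the number of failures this call committed.
    pub fn drain_if_leader(&self) -> usize {
        match self.rx.try_lock() {
            Ok(rx) => {
                let batch: Vec<i64> = rx.try_iter().collect();
                let n = batch.len();
                if n > 0 {
                    self.committed_batches.lock().unwrap().push(batch);
                }
                n
            }
            // Another caller is the leader (or the lock is poisoned).
            Err(_) => 0,
        }
    }

    /// What a failing task would call: enqueue, then try to become leader.
    pub fn record_failure(&self, task_id: i64) {
        self.enqueue(task_id);
        self.drain_if_leader();
    }
}
```

A follower that enqueues just after the leader finished draining leaves its item in the channel until the next drain, which is presumably why the commit also drains from the run loop.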
Move tag population out of pop_next/peek_next into explicit caller
sites so the JOIN is only paid when tags are actually needed.

Add inline retry loop for zero-delay retries: instead of requeueing
to pending and re-popping through SQLite, re-execute the task
directly in the same spawned future via increment_retry().
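
As a rough illustration of the inline retry loop, the sketch below re-executes a failing task in place while the backoff delay is zero and hands it back for requeueing otherwise. The `Outcome` enum, the function signature, and the closure-based executor and backoff are hypothetical; the increment of `retry_count` stands in for the `increment_retry()` call named in the commit, whose real signature is not shown here.

```rust
use std::time::Duration;

/// Hypothetical result of driving one task to a decision point.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Completed { attempts: u32 },
    /// Non-zero delay: the task goes back to the store to wait.
    Requeued { retry_count: u32, delay: Duration },
    /// Retries exhausted.
    DeadLettered { attempts: u32 },
}

/// Runs `execute` and retries inline for as long as `backoff` returns zero.
pub fn run_with_inline_retries<F, B>(mut execute: F, backoff: B, max_retries: u32) -> Outcome
where
    F: FnMut(u32) -> Result<(), String>,
    B: Fn(u32) -> Duration,
{
    let mut retry_count = 0;
    loop {
        match execute(retry_count) {
            Ok(()) => return Outcome::Completed { attempts: retry_count + 1 },
            Err(_) if retry_count >= max_retries => {
                return Outcome::DeadLettered { attempts: retry_count + 1 };
            }
            Err(_) => {
                // Stand-in for `increment_retry()`.
                retry_count += 1;
                let delay = backoff(retry_count);
                if !delay.is_zero() {
                    // Delayed retries still take the requeue path.
                    return Outcome::Requeued { retry_count, delay };
                }
                // Zero delay: fall through and re-execute in the same loop.
            }
        }
    }
}
```

The saving comes from the zero-delay branch: no write to mark the task pending and no subsequent pop to pick it up again.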
Make tag population opt-in for list queries (history, history_by_type,
history_by_key, dead_letter_tasks, failed_tasks) to avoid N+1 tag
lookups when callers don't need tags. Add idx_history_type covering
index on (task_type, completed_at DESC) to speed up history_stats,
history_by_type, and avg_throughput queries. Refactor history benchmarks
to populate via store directly instead of spinning up a full scheduler.
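
The opt-in tag population can be pictured as a two-step call: a list query that returns records with empty tags, then an explicit populate step that fills them in one pass. The sketch below assumes a hypothetical in-memory `Store` and a slice-based `populate_history_tags` signature; only the method names `history_by_type` and `populate_history_tags` come from this PR, and their real signatures may differ.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical, trimmed-down history row.
#[derive(Debug, Clone)]
pub struct HistoryRecord {
    pub id: i64,
    pub task_type: String,
    /// Left empty by list queries until the caller opts in.
    pub tags: HashMap<String, String>,
}

/// Hypothetical in-memory stand-in for the SQLite-backed store.
pub struct Store {
    history: Vec<(i64, String)>,          // (id, task_type)
    tag_rows: Vec<(i64, String, String)>, // (task_id, key, value)
    /// Counts passes over the tag rows so the opt-in cost is visible.
    pub tag_scans: Cell<usize>,
}

impl Store {
    pub fn new(history: Vec<(i64, String)>, tag_rows: Vec<(i64, String, String)>) -> Self {
        Store { history, tag_rows, tag_scans: Cell::new(0) }
    }

    /// List query: reads only history rows and never touches the tag rows.
    pub fn history_by_type(&self, task_type: &str) -> Vec<HistoryRecord> {
        self.history
            .iter()
            .filter(|(_, ty)| ty.as_str() == task_type)
            .map(|(id, ty)| HistoryRecord {
                id: *id,
                task_type: ty.clone(),
                tags: HashMap::new(),
            })
            .collect()
    }

    /// Opt-in step: a single pass over the tag rows fills every record given.
    pub fn populate_history_tags(&self, records: &mut [HistoryRecord]) {
        self.tag_scans.set(self.tag_scans.get() + 1);
        let mut by_id: HashMap<i64, &mut HistoryRecord> =
            records.iter_mut().map(|r| (r.id, r)).collect();
        for (task_id, key, value) in &self.tag_rows {
            if let Some(record) = by_id.get_mut(task_id) {
                record.tags.insert(key.clone(), value.clone());
            }
        }
    }
}
```

Callers that only need counts, timings, or task types skip the second call and never pay for the tag lookup.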
Two submit-path optimizations from plan 043:

1. Submit coalescing (Option 1): TaskStore::submit() now uses a leader-
   election pattern (mirroring completion/failure coalescing). Concurrent
   callers are batched into a single BEGIN/COMMIT transaction. An
   uncontended fast path avoids channel overhead entirely so sequential
   callers see zero regression.

2. Skip requeue UPDATE (Option 2): Added `has_running` atomic flag to
   TaskStore. When no task has ever been dispatched (common during bulk
   submit-then-run), the requeue UPDATE in skip_existing() is elided,
   saving one SQL round-trip per dedup hit.

Benchmark impact vs baseline:
- submit_dedup_hit/1000: -8% (skip-requeue)
- batch_submit/1000:     -14% (skip-requeue)
- submit_tasks/1000:     neutral (fast path)
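
The skip-requeue optimization (Option 2) reduces to a sticky atomic flag checked on the dedup-hit path. The sketch below is a minimal model under stated assumptions: `DedupPath`, `mark_dispatched`, and `on_dedup_hit` are hypothetical names, and the counter stands in for the SQL UPDATE that the real `skip_existing()` would issue. Only the `has_running` flag name comes from the commit.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical model of the dedup-hit path.
pub struct DedupPath {
    /// Set once the first task is dispatched and never cleared.
    has_running: AtomicBool,
    /// Counts simulated requeue UPDATE statements.
    pub requeue_updates: AtomicUsize,
}

impl DedupPath {
    pub fn new() -> Self {
        DedupPath {
            has_running: AtomicBool::new(false),
            requeue_updates: AtomicUsize::new(0),
        }
    }

    /// Called by the dispatcher when it hands out a task.
    pub fn mark_dispatched(&self) {
        self.has_running.store(true, Ordering::Release);
    }

    /// Called when a submitted task collides with an existing one.
    /// Returns true if the requeue UPDATE was issued.
    pub fn on_dedup_hit(&self) -> bool {
        if !self.has_running.load(Ordering::Acquire) {
            // Nothing has ever been dispatched, so the UPDATE is skipped.
            return false;
        }
        self.requeue_updates.fetch_add(1, Ordering::Relaxed);
        true
    }
}
```

Because the flag only ever moves from false to true, a stale read can at worst cause one unnecessary UPDATE to be skipped around the moment of the first dispatch; how the real store handles that window is not described in the commit.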
…casts

Add yield_now() before leader-election drain so more completions
accumulate per batch. Gate all event_tx.send() calls behind
receiver_count() > 0 to avoid broadcast channel overhead when no
subscribers exist.

dispatch_and_complete/1000: 169µs → 149µs/task (−17%, +21% throughput)
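
The broadcast gating can be sketched as a check on the subscriber count before any event is built or sent. The commit's `event_tx.send()` and `receiver_count()` match the method names on `tokio::sync::broadcast::Sender`, though the PR text does not name the channel type; the sketch below uses a hypothetical std-only `EventBus` so it runs without dependencies.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical std-only event bus standing in for a broadcast channel.
pub struct EventBus {
    subscribers: Vec<Sender<String>>,
    /// Counts how many events were actually constructed.
    pub events_built: usize,
}

impl EventBus {
    pub fn new() -> Self {
        EventBus { subscribers: Vec::new(), events_built: 0 }
    }

    pub fn subscribe(&mut self) -> Receiver<String> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    pub fn receiver_count(&self) -> usize {
        self.subscribers.len()
    }

    /// Builds and sends the event only when someone is listening.
    pub fn emit_with(&mut self, build: impl FnOnce() -> String) {
        if self.receiver_count() == 0 {
            // No subscribers: skip both event construction and the send.
            return;
        }
        let event = build();
        self.events_built += 1;
        for tx in &self.subscribers {
            // A send error means that subscriber hung up; ignore it here.
            let _ = tx.send(event.clone());
        }
    }
}
```

Taking the event as a closure means the gate also avoids allocating the payload, not just the channel send.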
- Rename tasks_by_tags → task_ids_by_tags, tasks_by_tag_key_prefix →
  task_ids_by_tag_key_prefix (now return Vec<i64>) in query-apis and
  multi-module-apps docs
- Add inline zero-delay retry path to design.md retry flow
- Document new idx_history_type covering index and missing dead_letter
  history status in persistence-and-recovery.md
- Replace &TaskContext with DomainTaskContext<'_, D> in all code
  examples across 10 doc files and lib.rs rustdoc
- Update spawn_child → spawn_child_with, .parent() → .child_of(&ctx)
  in quick-start.md
- Bump stale version strings (0.3/0.4/0.5 → 0.6) in Cargo.toml
  snippets
- Mark raw_executor as removed in configuration.md Domain builder table
github-actions Bot commented Mar 24, 2026

Benchmark Comparison

group                                       main                                    pr
-----                                       ----                                    --
backoff_delay/constant                      1.07     46.6±0.55ns 409.7 MElem/sec    1.00     43.6±0.09ns 437.8 MElem/sec
backoff_delay/exponential                   1.00    182.5±3.34ns 104.5 MElem/sec    1.02    186.2±1.52ns 102.4 MElem/sec
backoff_delay/exponential_jitter            1.00    267.0±0.65ns 71.4 MElem/sec     1.53    407.7±1.56ns 46.8 MElem/sec
backoff_delay/linear                        1.08     81.9±0.42ns 232.9 MElem/sec    1.00     76.0±0.54ns 251.0 MElem/sec
batch_submit/1000                           1.03     36.4±2.85ms 26.9 KElem/sec     1.00     35.4±3.33ms 27.6 KElem/sec
byte_progress/byte_reporting_500            1.28    258.1±5.51ms  1937 Elem/sec     1.00    202.0±4.51ms  2.4 KElem/sec
byte_progress/noop_500                      1.30    255.9±6.36ms  1954 Elem/sec     1.00    196.1±4.87ms  2.5 KElem/sec
byte_progress_snapshot/100_tasks            1.18     86.7±2.44ms  1153 Elem/sec     1.00     73.2±2.77ms  1366 Elem/sec
concurrency_scaling/1                       1.00    393.5±7.23ms  1270 Elem/sec     1.01    397.4±6.35ms  1258 Elem/sec
concurrency_scaling/2                       1.13    329.3±5.37ms  1518 Elem/sec     1.00    290.2±5.42ms  1722 Elem/sec
concurrency_scaling/4                       1.17    287.3±6.46ms  1740 Elem/sec     1.00    244.8±6.64ms  2042 Elem/sec
concurrency_scaling/8                       1.31    257.4±4.45ms  1942 Elem/sec     1.00    196.3±4.85ms  2.5 KElem/sec
count_by_tags/100                           1.00    133.2±4.52µs  7.3 KElem/sec     1.01    134.2±4.72µs  7.3 KElem/sec
count_by_tags/1000                          1.02    223.7±7.49µs  4.4 KElem/sec     1.00    220.1±4.77µs  4.4 KElem/sec
count_by_tags/5000                          1.00    615.4±6.61µs  1625 Elem/sec     1.00   616.0±10.60µs  1623 Elem/sec
dep_chain_dispatch/10                       1.01     11.4±0.12ms   875 Elem/sec     1.00     11.3±0.20ms   882 Elem/sec
dep_chain_dispatch/25                       1.01     28.1±0.44ms   888 Elem/sec     1.00     27.8±0.52ms   898 Elem/sec
dep_chain_dispatch/50                       1.02     56.4±0.83ms   885 Elem/sec     1.00     55.5±0.90ms   900 Elem/sec
dep_chain_submit/10                         1.00      3.1±0.15ms  3.2 KElem/sec     1.01      3.1±0.13ms  3.1 KElem/sec
dep_chain_submit/200                        1.00     80.9±5.47ms  2.4 KElem/sec     1.01     81.5±5.31ms  2.4 KElem/sec
dep_chain_submit/50                         1.00     17.4±0.91ms  2.8 KElem/sec     1.02     17.7±1.16ms  2.8 KElem/sec
dep_fan_in_dispatch/10                      1.17      7.3±0.07ms  1514 Elem/sec     1.00      6.2±0.13ms  1767 Elem/sec
dep_fan_in_dispatch/100                     1.27     55.4±1.00ms  1821 Elem/sec     1.00     43.7±1.02ms  2.3 KElem/sec
dep_fan_in_dispatch/50                      1.26     28.8±1.02ms  1773 Elem/sec     1.00     22.9±0.46ms  2.2 KElem/sec
dispatch_and_complete/1000                  1.29   512.7±11.82ms  1950 Elem/sec     1.00    397.7±8.55ms  2.5 KElem/sec
dispatch_group_scaling/1                    1.01    442.8±9.29ms  1129 Elem/sec     1.00    438.0±5.83ms  1141 Elem/sec
dispatch_group_scaling/10                   1.01    445.3±6.23ms  1122 Elem/sec     1.00    440.1±6.50ms  1136 Elem/sec
dispatch_group_scaling/100                  1.01    443.7±6.06ms  1126 Elem/sec     1.00    438.7±5.68ms  1139 Elem/sec
dispatch_group_scaling/50                   1.01    445.4±6.14ms  1122 Elem/sec     1.00    441.2±7.08ms  1133 Elem/sec
dispatch_no_groups/500                      1.30    258.1±5.59ms  1937 Elem/sec     1.00    198.3±4.93ms  2.5 KElem/sec
dispatch_one_group/500                      1.02    450.8±6.51ms  1109 Elem/sec     1.00    440.8±8.57ms  1134 Elem/sec
dispatch_permanent_failure/500              1.02    390.5±5.03ms  1280 Elem/sec     1.00    381.9±7.81ms  1309 Elem/sec
history_by_type/100                         3.99   885.2±22.62µs  1129 Elem/sec     1.00    221.6±5.58µs  4.4 KElem/sec
history_by_type/1000                        1.16   923.8±49.47µs  1082 Elem/sec     1.00   794.2±41.77µs  1259 Elem/sec
history_by_type/5000                        1.12   919.6±28.29µs  1087 Elem/sec     1.00   824.7±54.86µs  1212 Elem/sec
history_query/100                           1.23   540.5±11.45µs  1850 Elem/sec     1.00   440.1±14.65µs  2.2 KElem/sec
history_query/1000                          1.28    551.1±8.16µs  1814 Elem/sec     1.00   429.0±17.59µs  2.3 KElem/sec
history_query/5000                          1.31    558.5±8.98µs  1790 Elem/sec     1.00   427.7±15.19µs  2.3 KElem/sec
history_stats/100                           1.17    155.5±1.94µs  6.3 KElem/sec     1.00    132.4±1.54µs  7.4 KElem/sec
history_stats/1000                          1.85    369.4±1.28µs  2.6 KElem/sec     1.00    199.4±2.22µs  4.9 KElem/sec
history_stats/5000                          2.75   1331.9±6.86µs   750 Elem/sec     1.00    484.0±3.31µs  2.0 KElem/sec
mixed_priority_dispatch/500                 1.17    285.2±5.47ms  1753 Elem/sec     1.00    244.1±6.46ms  2.0 KElem/sec
peek_next/100                               1.00    126.4±4.47µs  7.7 KElem/sec     1.00    126.7±4.48µs  7.7 KElem/sec
peek_next/1000                              1.00    126.1±5.19µs  7.7 KElem/sec     1.02    128.4±4.79µs  7.6 KElem/sec
peek_next/5000                              1.00    126.7±4.20µs  7.7 KElem/sec     1.00    126.6±4.81µs  7.7 KElem/sec
query_by_tags/100                           1.00  1262.1±132.83µs   792 Elem/sec  
query_by_tags/1000                          1.00      9.9±0.42ms   101 Elem/sec   
query_by_tags/5000                          1.00     53.2±3.25ms    18 Elem/sec   
query_ids_by_tags/100                                                               1.00    187.6±6.29µs  5.2 KElem/sec
query_ids_by_tags/1000                                                              1.00   840.4±20.21µs  1189 Elem/sec
query_ids_by_tags/5000                                                              1.00      3.7±0.08ms   270 Elem/sec
retryable_dead_letter/constant              1.18    138.1±0.69ms   724 Elem/sec     1.00    116.7±1.04ms   856 Elem/sec
retryable_dead_letter/exponential           1.17    137.5±1.33ms   727 Elem/sec     1.00    117.2±1.18ms   852 Elem/sec
retryable_dead_letter/exponential_jitter    1.18    138.6±2.10ms   721 Elem/sec     1.00    117.5±1.41ms   851 Elem/sec
retryable_dead_letter/linear                1.18    137.4±1.34ms   727 Elem/sec     1.00    116.8±0.85ms   856 Elem/sec
submit_dedup_hit/1000                       1.15   250.6±12.71ms  3.9 KElem/sec     1.00    217.6±9.61ms  4.5 KElem/sec
submit_tasks/1000                           1.00    188.7±7.85ms  5.2 KElem/sec     1.01    190.9±7.18ms  5.1 KElem/sec
submit_with_tags/0                          1.00     93.1±4.23ms  5.2 KElem/sec     1.03     96.0±4.62ms  5.1 KElem/sec
submit_with_tags/10                         1.01   257.4±15.16ms  1942 Elem/sec     1.00   255.5±14.61ms  1957 Elem/sec
submit_with_tags/20                         1.00   415.3±22.49ms  1204 Elem/sec     1.00   416.3±23.32ms  1201 Elem/sec
submit_with_tags/5                          1.01    175.4±9.92ms  2.8 KElem/sec     1.00    173.4±8.90ms  2.8 KElem/sec
tag_values/100                              1.00    138.6±5.52µs  7.0 KElem/sec     1.02    140.7±5.46µs  6.9 KElem/sec
tag_values/1000                             1.01    200.0±6.47µs  4.9 KElem/sec     1.00    198.6±5.70µs  4.9 KElem/sec
tag_values/5000                             1.03    479.7±6.81µs  2.0 KElem/sec     1.00    466.7±7.29µs  2.1 KElem/sec

deepjoy merged commit dc7823a into main on Mar 24, 2026
2 checks passed
github-actions Bot mentioned this pull request Mar 23, 2026