perf!: improve benchmark throughput across submit, dispatch, retry, and failure paths by deepjoy · Pull Request #80 · deepjoy/taskmill

deepjoy · 2026-03-24T04:25:31Z

Summary

Reduces per-task overhead across every scheduler hot path (submit, dispatch, retry, completion, and failure) through transaction coalescing, lazy data population, inline zero-delay retries, and queries that fetch only the data callers need. Also brings all markdown docs and rustdoc up to date with the 0.6 API.

Breaking change

`tasks_by_tags()` → `task_ids_by_tags()` and `tasks_by_tag_key_prefix()` → `task_ids_by_tag_key_prefix()`: both now return `Vec<i64>` instead of `Vec<TaskRecord>`. Callers that need full records must follow up with `task_by_id()`.
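For callers migrating off the old functions, the follow-up lookup looks roughly like this. It is a minimal synchronous mock: `MockStore`, the `TaskRecord` fields, and the `(key, value)` filter shape are assumptions for illustration, and the real taskmill methods likely differ (for example, they may be async and fallible).

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a task record; the real `TaskRecord` has more fields.
#[derive(Debug, Clone, PartialEq)]
struct TaskRecord {
    id: i64,
    task_type: String,
}

// Hypothetical in-memory mock of the two store methods involved in the migration.
struct MockStore {
    tasks: HashMap<i64, TaskRecord>,
    tags: HashMap<i64, Vec<(String, String)>>,
}

impl MockStore {
    /// Post-rename shape: returns only the IDs of tasks carrying *all* given tags.
    fn task_ids_by_tags(&self, filters: &[(&str, &str)]) -> Vec<i64> {
        let mut ids: Vec<i64> = self
            .tags
            .iter()
            .filter(|(_, tags)| {
                filters
                    .iter()
                    .all(|(k, v)| tags.iter().any(|(tk, tv)| tk == k && tv == v))
            })
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable(); // HashMap order is arbitrary; sort for stable output
        ids
    }

    fn task_by_id(&self, id: i64) -> Option<TaskRecord> {
        self.tasks.get(&id).cloned()
    }
}

/// Caller-side replacement for the old `tasks_by_tags()`: fetch IDs first,
/// then load full records only when the caller actually needs them.
fn records_by_tags(store: &MockStore, filters: &[(&str, &str)]) -> Vec<TaskRecord> {
    store
        .task_ids_by_tags(filters)
        .into_iter()
        .filter_map(|id| store.task_by_id(id))
        .collect()
}

fn main() {
    let mut tasks = HashMap::new();
    let mut tags = HashMap::new();
    for (id, bucket) in [(1i64, "a"), (2, "b"), (3, "a")] {
        tasks.insert(id, TaskRecord { id, task_type: "upload".into() });
        tags.insert(id, vec![("bucket".to_string(), bucket.to_string())]);
    }
    let store = MockStore { tasks, tags };

    // ID-only callers (e.g. a cancel-by-tag path) stop here.
    println!("ids = {:?}", store.task_ids_by_tags(&[("bucket", "a")])); // ids = [1, 3]
    let records = records_by_tags(&store, &[("bucket", "a")]);
    println!("{} full records", records.len()); // 2 full records
}
```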

Performance improvements

- Submit coalescing: concurrent `submit()` calls are batched into a single SQLite transaction via leader election; uncontended callers take a zero-overhead fast path
- Skip requeue on fresh stores: the dedup-hit path elides the requeue UPDATE when no task has ever been dispatched (`has_running` flag)
- Batch terminal failures: parentless terminal failures are coalesced through an unbounded channel (mirroring completion coalescing), amortizing WAL sync; parent failures still process inline to preserve fail-fast cascade ordering
- Inline zero-delay retries: retries with zero delay re-execute in the same spawned task instead of requeueing through SQLite
- Lazy tag population: `pop_next`/`peek_next`/history list queries no longer JOIN tags by default; callers opt in via `populate_tags()` / `populate_history_tags()`
- Covering index for history: new `idx_history_type(task_type, completed_at DESC)` speeds up `history_stats`, `history_by_type`, and `avg_throughput`
- Widen completion coalescing window: `yield_now()` before the leader-election drain lets more completions accumulate per batch
- Skip no-subscriber broadcasts: gate `event_tx.send()` behind `receiver_count() > 0`

Documentation

- Updated all code examples across 13 markdown files and lib.rs rustdoc for the 0.6 API (`DomainTaskContext`, `spawn_child_with`, `child_of`)
- Documented the tag query rename, covering index, `dead_letter` history status, and inline retry flow
- Bumped stale version strings (0.3/0.4/0.5 → 0.6)

Rename `tasks_by_tags` → `task_ids_by_tags` and `tasks_by_tag_key_prefix` → `task_ids_by_tag_key_prefix` across store, scheduler, domain, and module layers. Queries now SELECT only `t.id`, skip `populate_tags`, and drop the ORDER BY clause — avoiding full row deserialization and N+1 tag lookups when callers (cancel_by_tag, cancel_by_tag_key_prefix) only need the ID. Extracts shared join-building logic into `build_tag_join_sql`.
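A rough sketch of a shared join builder and the ID-only query it enables. The table and column names (`tasks`, `task_tags`, `task_id`, `key`, `value`), the one-JOIN-per-filter strategy, and the `count_by_tags_sql` second caller are assumptions; the actual `build_tag_join_sql` signature and SQL are not shown in this PR.

```rust
/// Hypothetical shared helper: one JOIN per (key, value) tag filter.
/// Schema names (`task_tags`, `task_id`, `key`, `value`) are assumptions.
fn build_tag_join_sql(num_filters: usize) -> String {
    (0..num_filters)
        .map(|i| {
            format!(
                " JOIN task_tags tg{i} ON tg{i}.task_id = t.id \
                 AND tg{i}.key = ? AND tg{i}.value = ?"
            )
        })
        .collect()
}

/// ID-only lookup: selects just `t.id`, with no tag population and no ORDER BY.
fn task_ids_by_tags_sql(num_filters: usize) -> String {
    format!("SELECT t.id FROM tasks t{}", build_tag_join_sql(num_filters))
}

/// Hypothetical second caller reusing the same join logic.
fn count_by_tags_sql(num_filters: usize) -> String {
    format!("SELECT COUNT(*) FROM tasks t{}", build_tag_join_sql(num_filters))
}

fn main() {
    // Two filters produce two JOINs and four `?` placeholders to bind.
    println!("{}", task_ids_by_tags_sql(2));
    println!("{}", count_by_tags_sql(1));
}
```

Because the ID query never deserializes full rows or fetches tags, its cost scales with the JOIN alone, which is what the `cancel_by_tag` paths need.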

…g channel

Mirror the existing completion coalescing pattern for task failures. Parentless terminal failures are sent through an unbounded channel and drained in batches (by leader election or the run loop), amortizing SQLite WAL sync overhead. Failures with parents still process inline to preserve fail-fast cascade ordering. Also fix `has_paused_tasks` to start false and let the builder set it only when the persistent store actually contains paused tasks.
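The coalescing pattern can be sketched with only the standard library. `FailureCoalescer` is hypothetical: `std::sync::mpsc::channel` plays the unbounded channel, `try_lock` on the receiver plays leader election, and each pushed `Vec` stands in for one SQLite transaction. A failure enqueued just after a leader finishes draining stays in the channel until the next leader or the run-loop drain (`flush` here) picks it up.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical failure coalescer. Each inner `Vec` in `commits` stands in
/// for one SQLite BEGIN/COMMIT (and therefore one WAL sync).
struct FailureCoalescer {
    tx: Sender<i64>,
    rx: Mutex<Receiver<i64>>,
    commits: Mutex<Vec<Vec<i64>>>,
}

impl FailureCoalescer {
    fn new() -> Self {
        let (tx, rx) = channel();
        Self { tx, rx: Mutex::new(rx), commits: Mutex::new(Vec::new()) }
    }

    fn commit(&self, batch: Vec<i64>) {
        self.commits.lock().unwrap().push(batch);
    }

    /// Record a terminal failure for `task_id`.
    fn fail(&self, task_id: i64, has_parent: bool) {
        if has_parent {
            // Parent failures stay inline to keep fail-fast cascades ordered.
            self.commit(vec![task_id]);
            return;
        }
        self.tx.send(task_id).expect("receiver is owned by self");
        // Leader election: whoever wins the receiver lock drains the queue.
        // Losers return at once; their failure is already enqueued.
        if let Ok(rx) = self.rx.try_lock() {
            self.drain(&rx);
        }
    }

    /// Commit everything currently queued as a single batch.
    fn drain(&self, rx: &Receiver<i64>) {
        let batch: Vec<i64> = rx.try_iter().collect();
        if !batch.is_empty() {
            self.commit(batch);
        }
    }

    /// Run-loop drain: catches failures enqueued after the last leader finished.
    fn flush(&self) {
        let rx = self.rx.lock().unwrap();
        self.drain(&rx);
    }
}

fn main() {
    let c = Arc::new(FailureCoalescer::new());
    let workers: Vec<_> = (0..4i64)
        .map(|w| {
            let c = Arc::clone(&c);
            thread::spawn(move || {
                for i in 0..100 {
                    c.fail(w * 100 + i, false);
                }
            })
        })
        .collect();
    for h in workers {
        h.join().unwrap();
    }
    c.fail(-1, true); // committed inline as its own transaction
    c.flush();

    let commits = c.commits.lock().unwrap();
    let total: usize = commits.iter().map(Vec::len).sum();
    // Always 401 failures; the commit count varies with contention.
    println!("{total} failures in {} commits", commits.len());
}
```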

Move tag population out of pop_next/peek_next into explicit caller sites so the JOIN is only paid when tags are actually needed. Add inline retry loop for zero-delay retries: instead of requeueing to pending and re-popping through SQLite, re-execute the task directly in the same spawned future via increment_retry().
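A sketch of the inline retry decision, assuming a backoff function that maps the retry count to a delay. `Outcome`, `execute_with_inline_retries`, and the exhaustion rule are illustrative, not taskmill's API; the point is that a zero delay loops in place while a non-zero delay hands the task back for requeueing.

```rust
use std::time::Duration;

/// Hypothetical result handed back to the scheduler after running a task.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed { attempts: u32 },
    /// Non-zero backoff: requeue the task as pending in the store.
    Requeue { retry_count: u32, delay: Duration },
    /// Retries exhausted.
    DeadLetter { attempts: u32 },
}

/// `backoff` maps a retry count to a delay; `run` is one execution attempt.
fn execute_with_inline_retries<F, B>(max_retries: u32, backoff: B, mut run: F) -> Outcome
where
    F: FnMut() -> Result<(), String>,
    B: Fn(u32) -> Duration,
{
    let mut retry_count = 0;
    loop {
        match run() {
            Ok(()) => return Outcome::Completed { attempts: retry_count + 1 },
            Err(_) if retry_count >= max_retries => {
                return Outcome::DeadLetter { attempts: retry_count + 1 }
            }
            Err(_) => {
                retry_count += 1; // plays the role of increment_retry()
                let delay = backoff(retry_count);
                if !delay.is_zero() {
                    return Outcome::Requeue { retry_count, delay };
                }
                // Zero delay: re-execute right here, with no SQLite round-trip.
            }
        }
    }
}

fn main() {
    let mut attempts = 0;
    let out = execute_with_inline_retries(3, |_| Duration::ZERO, || {
        attempts += 1;
        if attempts < 3 { Err("transient".to_string()) } else { Ok(()) }
    });
    println!("{out:?}"); // Completed { attempts: 3 }
}
```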

Make tag population opt-in for list queries (history, history_by_type, history_by_key, dead_letter_tasks, failed_tasks) to avoid N+1 tag lookups when callers don't need tags. Add idx_history_type covering index on (task_type, completed_at DESC) to speed up history_stats, history_by_type, and avg_throughput queries. Refactor history benchmarks to populate via store directly instead of spinning up a full scheduler.
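The opt-in shape can be illustrated with an in-memory mock: the list query returns records with empty tags, and only callers that ask for tags pay for the per-record lookups. `MockStore`, `HistoryRecord`, and the lookup counter are hypothetical; only the `populate_history_tags` name comes from this PR.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical history row; `tags` stays empty unless explicitly populated.
#[derive(Debug, Clone)]
struct HistoryRecord {
    id: i64,
    task_type: String,
    tags: HashMap<String, String>,
}

/// Hypothetical store with history rows and tag rows kept separately.
struct MockStore {
    history: Vec<HistoryRecord>,
    tag_rows: Vec<(i64, String, String)>, // (task_id, key, value)
    tag_lookups: Cell<usize>,             // counts per-record tag lookups
}

impl MockStore {
    /// List query: never touches the tag rows.
    fn history_by_type(&self, task_type: &str) -> Vec<HistoryRecord> {
        self.history
            .iter()
            .filter(|r| r.task_type == task_type)
            .cloned()
            .collect()
    }

    /// Opt-in enrichment, one lookup per record.
    fn populate_history_tags(&self, records: &mut [HistoryRecord]) {
        for rec in records.iter_mut() {
            self.tag_lookups.set(self.tag_lookups.get() + 1);
            let id = rec.id;
            rec.tags = self
                .tag_rows
                .iter()
                .filter(|(task_id, _, _)| *task_id == id)
                .map(|(_, k, v)| (k.clone(), v.clone()))
                .collect();
        }
    }
}

fn main() {
    let row = |id: i64, ty: &str| HistoryRecord { id, task_type: ty.into(), tags: HashMap::new() };
    let store = MockStore {
        history: vec![row(1, "upload"), row(2, "scan"), row(3, "upload")],
        tag_rows: vec![(1, "bucket".into(), "a".into())],
        tag_lookups: Cell::new(0),
    };

    // Callers that only need counts or timings pay no tag cost.
    let mut rows = store.history_by_type("upload");
    println!("{} rows, {} tag lookups", rows.len(), store.tag_lookups.get()); // 2 rows, 0 tag lookups

    // Callers that render tags opt in explicitly.
    store.populate_history_tags(&mut rows);
    println!("{:?}", rows[0].tags); // {"bucket": "a"}
}
```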

Two submit-path optimizations from plan 043:

1. Submit coalescing (Option 1): `TaskStore::submit()` now uses a leader-election pattern (mirroring completion/failure coalescing). Concurrent callers are batched into a single BEGIN/COMMIT transaction. An uncontended fast path avoids channel overhead entirely, so sequential callers see zero regression.
2. Skip requeue UPDATE (Option 2): Added a `has_running` atomic flag to `TaskStore`. When no task has ever been dispatched (common during bulk submit-then-run), the requeue UPDATE in `skip_existing()` is elided, saving one SQL round-trip per dedup hit.

Benchmark impact vs baseline:

- submit_dedup_hit/1000: -8% (skip-requeue)
- batch_submit/1000: -14% (skip-requeue)
- submit_tasks/1000: neutral (fast path)
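The skip-requeue check reduces to a single atomic load on the dedup-hit path. This sketch assumes the flag is set on first dispatch and never cleared; `Store`, `mark_dispatched`, and the `requeue_updates` counter (standing in for the elided SQL UPDATE) are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical slice of a task store showing only the dedup-hit path.
struct Store {
    /// Flipped on the first dispatch; never cleared in this sketch.
    has_running: AtomicBool,
    /// Stands in for the requeue UPDATE statements actually issued.
    requeue_updates: AtomicUsize,
}

impl Store {
    fn new() -> Self {
        Self { has_running: AtomicBool::new(false), requeue_updates: AtomicUsize::new(0) }
    }

    /// Called when the scheduler dispatches a task.
    fn mark_dispatched(&self) {
        self.has_running.store(true, Ordering::SeqCst);
    }

    /// Dedup hit: a task with the same key already exists.
    fn skip_existing(&self) {
        if self.has_running.load(Ordering::SeqCst) {
            // A task may be running, so keep the original requeue UPDATE.
            self.requeue_updates.fetch_add(1, Ordering::SeqCst);
        }
        // Otherwise nothing was ever dispatched: skip the SQL round-trip.
    }
}

fn main() {
    let store = Store::new();
    for _ in 0..1000 {
        store.skip_existing(); // bulk submit before any dispatch
    }
    // prints "before dispatch: 0 UPDATEs"
    println!("before dispatch: {} UPDATEs", store.requeue_updates.load(Ordering::SeqCst));
    store.mark_dispatched();
    store.skip_existing();
    // prints "after dispatch: 1 UPDATEs"
    println!("after dispatch: {} UPDATEs", store.requeue_updates.load(Ordering::SeqCst));
}
```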

…casts

Add `yield_now()` before the leader-election drain so more completions accumulate per batch. Gate all `event_tx.send()` calls behind `receiver_count() > 0` to avoid broadcast channel overhead when no subscribers exist.

dispatch_and_complete/1000: 169µs → 149µs/task (−17%, +21% throughput)
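A std-only sketch of the subscriber gate. `EventBus` is hypothetical and approximates `tokio::sync::broadcast::Sender`, whose `receiver_count()` the real check uses; building the event inside a closure, so that construction is skipped too, is an extra assumption of this sketch rather than something the PR states.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Debug, Clone, PartialEq)]
enum Event {
    Completed(i64),
}

/// Hypothetical stand-in for a broadcast sender. Unlike tokio's
/// `broadcast::Sender`, it does not notice receivers that were dropped.
struct EventBus {
    subscribers: Vec<Sender<Event>>,
}

impl EventBus {
    fn subscribe(&mut self) -> Receiver<Event> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    fn receiver_count(&self) -> usize {
        self.subscribers.len()
    }

    /// Returns false (and never builds the event) when nobody is listening.
    fn emit(&self, make: impl FnOnce() -> Event) -> bool {
        if self.receiver_count() == 0 {
            return false;
        }
        let event = make();
        for tx in &self.subscribers {
            let _ = tx.send(event.clone()); // a hung-up receiver is not an error here
        }
        true
    }
}

fn main() {
    let mut bus = EventBus { subscribers: Vec::new() };
    println!("{}", bus.emit(|| Event::Completed(1))); // false: no subscribers
    let rx = bus.subscribe();
    bus.emit(|| Event::Completed(2));
    println!("{:?}", rx.recv().unwrap()); // Completed(2)
}
```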

- Rename `tasks_by_tags` → `task_ids_by_tags` and `tasks_by_tag_key_prefix` → `task_ids_by_tag_key_prefix` (now return `Vec<i64>`) in the query-apis and multi-module-apps docs
- Add the inline zero-delay retry path to the design.md retry flow
- Document the new `idx_history_type` covering index and the missing `dead_letter` history status in persistence-and-recovery.md
- Replace `&TaskContext` with `DomainTaskContext<'_, D>` in all code examples across 10 doc files and the lib.rs rustdoc
- Update `spawn_child` → `spawn_child_with` and `.parent()` → `.child_of(&ctx)` in quick-start.md
- Bump stale version strings (0.3/0.4/0.5 → 0.6) in Cargo.toml snippets
- Mark `raw_executor` as removed in the configuration.md Domain builder table

github-actions · 2026-03-24T04:50:24Z

Benchmark Comparison

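For each branch, the columns show the time relative to the faster branch (1.00 = faster), the measured time with its spread, and throughput. `query_by_tags` appears only on `main` and `query_ids_by_tags` only on this PR, reflecting the tag query rename.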

group                                       main                                    pr
-----                                       ----                                    --
backoff_delay/constant                      1.07     46.6±0.55ns 409.7 MElem/sec    1.00     43.6±0.09ns 437.8 MElem/sec
backoff_delay/exponential                   1.00    182.5±3.34ns 104.5 MElem/sec    1.02    186.2±1.52ns 102.4 MElem/sec
backoff_delay/exponential_jitter            1.00    267.0±0.65ns 71.4 MElem/sec     1.53    407.7±1.56ns 46.8 MElem/sec
backoff_delay/linear                        1.08     81.9±0.42ns 232.9 MElem/sec    1.00     76.0±0.54ns 251.0 MElem/sec
batch_submit/1000                           1.03     36.4±2.85ms 26.9 KElem/sec     1.00     35.4±3.33ms 27.6 KElem/sec
byte_progress/byte_reporting_500            1.28    258.1±5.51ms  1937 Elem/sec     1.00    202.0±4.51ms  2.4 KElem/sec
byte_progress/noop_500                      1.30    255.9±6.36ms  1954 Elem/sec     1.00    196.1±4.87ms  2.5 KElem/sec
byte_progress_snapshot/100_tasks            1.18     86.7±2.44ms  1153 Elem/sec     1.00     73.2±2.77ms  1366 Elem/sec
concurrency_scaling/1                       1.00    393.5±7.23ms  1270 Elem/sec     1.01    397.4±6.35ms  1258 Elem/sec
concurrency_scaling/2                       1.13    329.3±5.37ms  1518 Elem/sec     1.00    290.2±5.42ms  1722 Elem/sec
concurrency_scaling/4                       1.17    287.3±6.46ms  1740 Elem/sec     1.00    244.8±6.64ms  2042 Elem/sec
concurrency_scaling/8                       1.31    257.4±4.45ms  1942 Elem/sec     1.00    196.3±4.85ms  2.5 KElem/sec
count_by_tags/100                           1.00    133.2±4.52µs  7.3 KElem/sec     1.01    134.2±4.72µs  7.3 KElem/sec
count_by_tags/1000                          1.02    223.7±7.49µs  4.4 KElem/sec     1.00    220.1±4.77µs  4.4 KElem/sec
count_by_tags/5000                          1.00    615.4±6.61µs  1625 Elem/sec     1.00   616.0±10.60µs  1623 Elem/sec
dep_chain_dispatch/10                       1.01     11.4±0.12ms   875 Elem/sec     1.00     11.3±0.20ms   882 Elem/sec
dep_chain_dispatch/25                       1.01     28.1±0.44ms   888 Elem/sec     1.00     27.8±0.52ms   898 Elem/sec
dep_chain_dispatch/50                       1.02     56.4±0.83ms   885 Elem/sec     1.00     55.5±0.90ms   900 Elem/sec
dep_chain_submit/10                         1.00      3.1±0.15ms  3.2 KElem/sec     1.01      3.1±0.13ms  3.1 KElem/sec
dep_chain_submit/200                        1.00     80.9±5.47ms  2.4 KElem/sec     1.01     81.5±5.31ms  2.4 KElem/sec
dep_chain_submit/50                         1.00     17.4±0.91ms  2.8 KElem/sec     1.02     17.7±1.16ms  2.8 KElem/sec
dep_fan_in_dispatch/10                      1.17      7.3±0.07ms  1514 Elem/sec     1.00      6.2±0.13ms  1767 Elem/sec
dep_fan_in_dispatch/100                     1.27     55.4±1.00ms  1821 Elem/sec     1.00     43.7±1.02ms  2.3 KElem/sec
dep_fan_in_dispatch/50                      1.26     28.8±1.02ms  1773 Elem/sec     1.00     22.9±0.46ms  2.2 KElem/sec
dispatch_and_complete/1000                  1.29   512.7±11.82ms  1950 Elem/sec     1.00    397.7±8.55ms  2.5 KElem/sec
dispatch_group_scaling/1                    1.01    442.8±9.29ms  1129 Elem/sec     1.00    438.0±5.83ms  1141 Elem/sec
dispatch_group_scaling/10                   1.01    445.3±6.23ms  1122 Elem/sec     1.00    440.1±6.50ms  1136 Elem/sec
dispatch_group_scaling/100                  1.01    443.7±6.06ms  1126 Elem/sec     1.00    438.7±5.68ms  1139 Elem/sec
dispatch_group_scaling/50                   1.01    445.4±6.14ms  1122 Elem/sec     1.00    441.2±7.08ms  1133 Elem/sec
dispatch_no_groups/500                      1.30    258.1±5.59ms  1937 Elem/sec     1.00    198.3±4.93ms  2.5 KElem/sec
dispatch_one_group/500                      1.02    450.8±6.51ms  1109 Elem/sec     1.00    440.8±8.57ms  1134 Elem/sec
dispatch_permanent_failure/500              1.02    390.5±5.03ms  1280 Elem/sec     1.00    381.9±7.81ms  1309 Elem/sec
history_by_type/100                         3.99   885.2±22.62µs  1129 Elem/sec     1.00    221.6±5.58µs  4.4 KElem/sec
history_by_type/1000                        1.16   923.8±49.47µs  1082 Elem/sec     1.00   794.2±41.77µs  1259 Elem/sec
history_by_type/5000                        1.12   919.6±28.29µs  1087 Elem/sec     1.00   824.7±54.86µs  1212 Elem/sec
history_query/100                           1.23   540.5±11.45µs  1850 Elem/sec     1.00   440.1±14.65µs  2.2 KElem/sec
history_query/1000                          1.28    551.1±8.16µs  1814 Elem/sec     1.00   429.0±17.59µs  2.3 KElem/sec
history_query/5000                          1.31    558.5±8.98µs  1790 Elem/sec     1.00   427.7±15.19µs  2.3 KElem/sec
history_stats/100                           1.17    155.5±1.94µs  6.3 KElem/sec     1.00    132.4±1.54µs  7.4 KElem/sec
history_stats/1000                          1.85    369.4±1.28µs  2.6 KElem/sec     1.00    199.4±2.22µs  4.9 KElem/sec
history_stats/5000                          2.75   1331.9±6.86µs   750 Elem/sec     1.00    484.0±3.31µs  2.0 KElem/sec
mixed_priority_dispatch/500                 1.17    285.2±5.47ms  1753 Elem/sec     1.00    244.1±6.46ms  2.0 KElem/sec
peek_next/100                               1.00    126.4±4.47µs  7.7 KElem/sec     1.00    126.7±4.48µs  7.7 KElem/sec
peek_next/1000                              1.00    126.1±5.19µs  7.7 KElem/sec     1.02    128.4±4.79µs  7.6 KElem/sec
peek_next/5000                              1.00    126.7±4.20µs  7.7 KElem/sec     1.00    126.6±4.81µs  7.7 KElem/sec
query_by_tags/100                           1.00  1262.1±132.83µs   792 Elem/sec  
query_by_tags/1000                          1.00      9.9±0.42ms   101 Elem/sec   
query_by_tags/5000                          1.00     53.2±3.25ms    18 Elem/sec   
query_ids_by_tags/100                                                               1.00    187.6±6.29µs  5.2 KElem/sec
query_ids_by_tags/1000                                                              1.00   840.4±20.21µs  1189 Elem/sec
query_ids_by_tags/5000                                                              1.00      3.7±0.08ms   270 Elem/sec
retryable_dead_letter/constant              1.18    138.1±0.69ms   724 Elem/sec     1.00    116.7±1.04ms   856 Elem/sec
retryable_dead_letter/exponential           1.17    137.5±1.33ms   727 Elem/sec     1.00    117.2±1.18ms   852 Elem/sec
retryable_dead_letter/exponential_jitter    1.18    138.6±2.10ms   721 Elem/sec     1.00    117.5±1.41ms   851 Elem/sec
retryable_dead_letter/linear                1.18    137.4±1.34ms   727 Elem/sec     1.00    116.8±0.85ms   856 Elem/sec
submit_dedup_hit/1000                       1.15   250.6±12.71ms  3.9 KElem/sec     1.00    217.6±9.61ms  4.5 KElem/sec
submit_tasks/1000                           1.00    188.7±7.85ms  5.2 KElem/sec     1.01    190.9±7.18ms  5.1 KElem/sec
submit_with_tags/0                          1.00     93.1±4.23ms  5.2 KElem/sec     1.03     96.0±4.62ms  5.1 KElem/sec
submit_with_tags/10                         1.01   257.4±15.16ms  1942 Elem/sec     1.00   255.5±14.61ms  1957 Elem/sec
submit_with_tags/20                         1.00   415.3±22.49ms  1204 Elem/sec     1.00   416.3±23.32ms  1201 Elem/sec
submit_with_tags/5                          1.01    175.4±9.92ms  2.8 KElem/sec     1.00    173.4±8.90ms  2.8 KElem/sec
tag_values/100                              1.00    138.6±5.52µs  7.0 KElem/sec     1.02    140.7±5.46µs  6.9 KElem/sec
tag_values/1000                             1.01    200.0±6.47µs  4.9 KElem/sec     1.00    198.6±5.70µs  4.9 KElem/sec
tag_values/5000                             1.03    479.7±6.81µs  2.0 KElem/sec     1.00    466.7±7.29µs  2.1 KElem/sec

deepjoy added 8 commits March 23, 2026 17:07

style: reformat emit_event calls to block argument style (99f36e5)

deepjoy merged commit dc7823a into main Mar 24, 2026
2 checks passed

github-actions Bot mentioned this pull request Mar 23, 2026: chore: release v0.7.0 #70 (Merged)

deepjoy mentioned this pull request Mar 24, 2026: Feature: Observability metrics export (metrics crate integration) #20 (Closed)


deepjoy merged 8 commits into main from improve-benchmark-performance
