feat!: implement priority aging and weighted fair scheduling #84

Merged: deepjoy merged 3 commits into `main` from `fair-scheduling` on Mar 24, 2026

@deepjoy deepjoy commented Mar 24, 2026

Summary

  • Priority aging (phase 1): Tasks waiting longer than a configurable grace period are gradually promoted in effective priority at dispatch time, preventing starvation of low-priority work under sustained high-priority load. The stored priority is never mutated — effective priority is computed in SQL via the aging formula. Pause duration is excluded from the aging clock so paused tasks don't get unfair promotion.
  • Weighted fair scheduling (phase 2): A three-pass dispatch loop allocates slots in proportion to per-group weights, fills remaining capacity greedily, and dispatches urgently aged tasks as a safety valve. Ungrouped tasks compete as a virtual group with the default weight. Min-slots guarantees and concurrency caps are respected. The loop is work-conserving by construction.
  • Composes with existing features: Rate limits, group pause, concurrency caps, preemption, and backpressure all interact correctly. Fast dispatch is disabled when aging or weights are configured.

Closes #37

New public API

| Builder | Runtime | Event |
| --- | --- | --- |
| `priority_aging(AgingConfig)` | | `base_priority` / `effective_priority` on `TaskEventHeader` |
| `group_weight(group, weight)` | `set_group_weight(group, weight)` | `GroupWeightChanged` |
| `default_group_weight(weight)` | `remove_group_weight(group)` | |
| `group_minimum_slots(group, slots)` | `reset_group_weights()` | |
| | `set_group_minimum_slots(group, slots)` | |

New store queries

peek_next_in_group(), peek_next_ungrouped(), running_counts_per_group(), pending_counts_per_group(), peek_next_urgent()

Schema changes (pre-1.0, inline)

pause_duration_ms INTEGER NOT NULL DEFAULT 0 and paused_at_ms INTEGER DEFAULT NULL added to tasks table for aging clock management.
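
To illustrate how the two columns can freeze the aging clock, here is a minimal in-memory sketch. It is not the crate's code: `TaskClock`, `created_at_ms`, and the method names are hypothetical, and subtracting a still-open pause at read time is an assumption about how the SQL handles currently paused tasks.

```rust
/// Hypothetical in-memory mirror of the two new columns, plus an assumed
/// creation timestamp. Not the crate's actual row type.
#[derive(Debug, Clone, PartialEq)]
struct TaskClock {
    created_at_ms: i64,        // assumed column name
    pause_duration_ms: i64,    // INTEGER NOT NULL DEFAULT 0
    paused_at_ms: Option<i64>, // INTEGER DEFAULT NULL
}

impl TaskClock {
    fn new(created_at_ms: i64) -> Self {
        Self { created_at_ms, pause_duration_ms: 0, paused_at_ms: None }
    }

    /// Record when the pause started. Pausing an already paused task keeps
    /// the original timestamp so the open pause is not shortened.
    fn pause(&mut self, now_ms: i64) {
        if self.paused_at_ms.is_none() {
            self.paused_at_ms = Some(now_ms);
        }
    }

    /// Fold the finished pause into the running total and clear the marker.
    /// Resuming a task that is not paused does nothing.
    fn resume(&mut self, now_ms: i64) {
        if let Some(paused_at) = self.paused_at_ms.take() {
            self.pause_duration_ms += (now_ms - paused_at).max(0);
        }
    }

    /// Wall-clock wait minus all paused time, including a pause that is
    /// still open (assumption). This is the value an aging formula would use.
    fn aging_wait_ms(&self, now_ms: i64) -> i64 {
        let open_pause = self.paused_at_ms.map_or(0, |p| (now_ms - p).max(0));
        (now_ms - self.created_at_ms - self.pause_duration_ms - open_pause).max(0)
    }
}
```

Under these assumptions, a task created at 0 ms, paused at 1000 ms, and resumed at 4000 ms has accumulated 7000 ms of aging wait at 10000 ms, because the 3000 ms pause is excluded.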

deepjoy added 2 commits March 24, 2026 00:53
…ase 1)

Add dispatch-time priority aging that gradually promotes tasks waiting
longer than a configurable grace period. Effective priority is computed
in SQL (no write amplification) and capped at `max_effective_priority`.

- New `AgingConfig` type with grace_period, aging_interval,
  max_effective_priority, and urgent_threshold
- Modified peek_next/pop_next/pop_next_batch with aging ORDER BY clause
- Schema: pause_duration_ms and paused_at_ms columns for clock freezing
- All pause/resume paths accumulate pause_duration_ms correctly
- Crash recovery accumulates stale pause duration
- TaskEventHeader carries base_priority and effective_priority
- SchedulerSnapshot exposes aging_config
- Child tasks inherit parent's effective priority when aging enabled
- SchedulerBuilder::priority_aging() for opt-in configuration
- Zero overhead when aging is disabled (original query preserved)
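
The commit message names the `AgingConfig` fields but not the formula itself. The sketch below shows one plausible shape of the arithmetic in Rust rather than SQL. The field types, the step-per-interval rule, and the convention that larger numbers mean higher priority are all assumptions; the crate may order priorities the other way.

```rust
use std::time::Duration;

/// Hypothetical mirror of the `AgingConfig` fields listed above.
/// Field types are assumptions.
#[derive(Debug, Clone)]
struct AgingConfig {
    grace_period: Duration,
    aging_interval: Duration,
    max_effective_priority: u8,
    urgent_threshold: u8,
}

/// Compute the effective priority for a task that has waited `waited`
/// (pause time already excluded). Assumes larger numbers mean higher
/// priority. The stored `base` value is never modified.
fn effective_priority(base: u8, waited: Duration, cfg: &AgingConfig) -> u8 {
    // A task already at or above the cap is never demoted by aging.
    if base >= cfg.max_effective_priority {
        return base;
    }
    // Inside the grace period there is no promotion at all.
    let Some(over) = waited.checked_sub(cfg.grace_period) else {
        return base;
    };
    // Guard against division by zero for a degenerate interval.
    if cfg.aging_interval.is_zero() {
        return base;
    }
    // One promotion step per full aging interval past the grace period.
    let steps = over.as_millis() / cfg.aging_interval.as_millis();
    let promoted = u128::from(base).saturating_add(steps);
    promoted.min(u128::from(cfg.max_effective_priority)) as u8
}

/// Whether a task qualifies for the urgent safety-valve pass (assumed rule).
fn is_urgent(effective: u8, cfg: &AgingConfig) -> bool {
    effective >= cfg.urgent_threshold
}
```

With a 60 s grace period and a 30 s interval, a task with base priority 100 that has waited 150 s has completed three intervals past the grace period, so this sketch yields 103. A task that has waited long enough is clamped at `max_effective_priority`.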
…, phase 2)

Add three-pass fair dispatch loop that allocates slots proportional to
group weights, fills remaining capacity greedily, and dispatches
urgently-aged tasks as a safety valve. Groups without explicit weights
use the default weight; ungrouped tasks compete as a virtual group.

New builder API: group_weight(), default_group_weight(),
group_minimum_slots(). Runtime API: set_group_weight(),
remove_group_weight(), reset_group_weights(),
set_group_minimum_slots(). GroupWeightChanged event emitted on
runtime changes. SchedulerSnapshot includes group_allocations.

Store queries added: peek_next_in_group(), peek_next_ungrouped(),
running/pending_counts_per_group(), peek_next_urgent().

Composes with phase 1 aging, rate limits, group pause, and
concurrency caps. Fast dispatch disabled when weights configured.
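
The first two passes described above (weighted share, then greedy fill) can be sketched as a pure planning function. This is an illustration under stated assumptions, not the code in `fair.rs`: the `Group` struct, `plan_dispatch`, the floor-rounded share, and the slice-order tie-breaking are all hypothetical. The urgent pass is left out because its exact rules are not described here. Ungrouped tasks would be passed in as one more `Group` carrying the default weight.

```rust
/// Hypothetical per-group scheduling state for one dispatch cycle.
#[derive(Debug, Clone)]
struct Group {
    weight: u32,
    min_slots: usize,   // guaranteed minimum, if there is demand
    cap: Option<usize>, // concurrency cap, None = uncapped
    running: usize,
    pending: usize,
}

/// How many more tasks this group can start, given `planned` already
/// allocated in this cycle: limited by its cap and by its pending count.
fn headroom(g: &Group, planned: usize) -> usize {
    let under_cap = g.cap.map_or(usize::MAX, |c| c.saturating_sub(g.running + planned));
    under_cap.min(g.pending.saturating_sub(planned))
}

/// Returns the number of tasks to dispatch per group, in input order.
fn plan_dispatch(total_slots: usize, groups: &[Group]) -> Vec<usize> {
    let running: usize = groups.iter().map(|g| g.running).sum();
    let mut free = total_slots.saturating_sub(running);
    let mut plan = vec![0usize; groups.len()];

    // Only groups with running or pending work compete for a share.
    let active_weight: u64 = groups
        .iter()
        .filter(|g| g.running + g.pending > 0)
        .map(|g| u64::from(g.weight))
        .sum();

    // Pass 1: bring each group up to its weighted share (floor-rounded),
    // or its minimum-slots guarantee if that is larger.
    if active_weight > 0 {
        for (i, g) in groups.iter().enumerate() {
            let share = (total_slots as u64 * u64::from(g.weight) / active_weight) as usize;
            let target = share.max(g.min_slots);
            let n = target.saturating_sub(g.running).min(headroom(g, 0)).min(free);
            plan[i] = n;
            free -= n;
        }
    }

    // Pass 2: hand leftover slots to any group that still has pending work
    // under its cap, so capacity is not left idle (work-conserving).
    for (i, g) in groups.iter().enumerate() {
        let n = headroom(g, plan[i]).min(free);
        plan[i] += n;
        free -= n;
    }
    plan
}
```

For 10 free slots and two backlogged groups with weights 3 and 1, pass 1 in this sketch allocates 7 and 2, and pass 2 gives the remaining slot to the first group, for a plan of 8 and 2. If the weight-3 group has only one pending task, the other group absorbs the unused share and the plan becomes 1 and 9.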

github-actions Bot commented Mar 24, 2026

Benchmark Comparison

group                                       main                                    pr
-----                                       ----                                    --
backoff_delay/constant                      1.08     47.6±0.09ns 400.7 MElem/sec    1.00     44.3±0.35ns 430.9 MElem/sec
backoff_delay/exponential                   1.00    187.9±1.79ns 101.5 MElem/sec    1.02    191.0±1.22ns 99.8 MElem/sec
backoff_delay/exponential_jitter            1.00    270.1±0.96ns 70.6 MElem/sec     1.51    407.4±3.95ns 46.8 MElem/sec
backoff_delay/linear                        1.05     80.9±0.20ns 235.6 MElem/sec    1.00     76.7±4.52ns 248.6 MElem/sec
batch_submit/1000                           1.00     32.6±2.34ms 29.9 KElem/sec     1.13     36.7±3.10ms 26.6 KElem/sec
byte_progress/byte_reporting_500            1.00    193.1±3.35ms  2.5 KElem/sec     1.05    203.4±5.63ms  2.4 KElem/sec
byte_progress/noop_500                      1.00    179.1±5.23ms  2.7 KElem/sec     1.14    204.5±6.15ms  2.4 KElem/sec
byte_progress_snapshot/100_tasks            1.00     80.7±2.16ms  1239 Elem/sec     1.06     85.7±4.44ms  1166 Elem/sec
concurrency_scaling/1                       1.00    374.2±3.88ms  1336 Elem/sec     1.08    403.5±6.79ms  1239 Elem/sec
concurrency_scaling/2                       1.00    277.0±6.30ms  1804 Elem/sec     1.05    291.7±5.93ms  1714 Elem/sec
concurrency_scaling/4                       1.00    230.5±9.35ms  2.1 KElem/sec     1.07    247.6±7.16ms  2019 Elem/sec
concurrency_scaling/8                       1.00    178.0±4.77ms  2.7 KElem/sec     1.12    199.9±6.79ms  2.4 KElem/sec
count_by_tags/100                           1.00    126.6±2.98µs  7.7 KElem/sec     1.06    134.8±5.84µs  7.2 KElem/sec
count_by_tags/1000                          1.00    215.1±2.97µs  4.5 KElem/sec     1.04    223.8±4.78µs  4.4 KElem/sec
count_by_tags/5000                          1.04   654.8±27.32µs  1527 Elem/sec     1.00    628.1±8.71µs  1592 Elem/sec
dep_chain_dispatch/10                       1.00     10.8±0.22ms   922 Elem/sec     1.07     11.6±0.17ms   862 Elem/sec
dep_chain_dispatch/25                       1.00     26.4±0.37ms   948 Elem/sec     1.09     28.7±0.61ms   869 Elem/sec
dep_chain_dispatch/50                       1.00     52.8±0.74ms   947 Elem/sec     1.09     57.5±0.72ms   869 Elem/sec
dep_chain_submit/10                         1.00      3.0±0.11ms  3.3 KElem/sec     1.06      3.2±0.14ms  3.1 KElem/sec
dep_chain_submit/200                        1.00     76.3±3.60ms  2.6 KElem/sec     1.09     83.4±5.94ms  2.3 KElem/sec
dep_chain_submit/50                         1.00     16.4±0.72ms  3.0 KElem/sec     1.08     17.8±1.15ms  2.7 KElem/sec
dep_fan_in_dispatch/10                      1.00      5.9±0.27ms  1869 Elem/sec     1.09      6.4±0.14ms  1716 Elem/sec
dep_fan_in_dispatch/100                     1.00     40.5±1.05ms  2.4 KElem/sec     1.09     44.2±1.19ms  2.2 KElem/sec
dep_fan_in_dispatch/50                      1.00     21.1±0.43ms  2.4 KElem/sec     1.11     23.4±0.69ms  2.1 KElem/sec
dispatch_and_complete/1000                  1.00    357.9±4.71ms  2.7 KElem/sec     1.10    395.5±8.86ms  2.5 KElem/sec
dispatch_group_scaling/1                    1.00    417.4±7.80ms  1197 Elem/sec     1.08    452.8±9.16ms  1104 Elem/sec
dispatch_group_scaling/10                   1.00    413.2±6.55ms  1210 Elem/sec     1.10    453.0±7.18ms  1103 Elem/sec
dispatch_group_scaling/100                  1.00   415.9±11.16ms  1202 Elem/sec     1.09    451.7±8.26ms  1107 Elem/sec
dispatch_group_scaling/50                   1.00    414.6±7.38ms  1206 Elem/sec     1.09    451.4±7.84ms  1107 Elem/sec
dispatch_no_groups/500                      1.00    180.1±5.06ms  2.7 KElem/sec     1.12    201.3±6.47ms  2.4 KElem/sec
dispatch_one_group/500                      1.00    416.4±6.62ms  1200 Elem/sec     1.08    450.7±7.81ms  1109 Elem/sec
dispatch_permanent_failure/500              1.00    347.7±7.67ms  1437 Elem/sec     1.10    382.6±6.77ms  1306 Elem/sec
history_by_type/100                         1.00    220.1±5.59µs  4.4 KElem/sec     1.01    223.3±6.22µs  4.4 KElem/sec
history_by_type/1000                        1.00   815.2±30.52µs  1226 Elem/sec     1.02   827.5±42.08µs  1208 Elem/sec
history_by_type/5000                        1.00   814.9±59.75µs  1227 Elem/sec     1.04   844.4±68.40µs  1184 Elem/sec
history_query/100                           1.00   433.9±18.15µs  2.3 KElem/sec     1.01   438.3±16.14µs  2.2 KElem/sec
history_query/1000                          1.00   431.6±19.66µs  2.3 KElem/sec     1.07   459.9±24.07µs  2.1 KElem/sec
history_query/5000                          1.01   432.4±23.76µs  2.3 KElem/sec     1.00   426.3±23.79µs  2.3 KElem/sec
history_stats/100                           1.00    126.5±1.00µs  7.7 KElem/sec     1.06    133.8±1.12µs  7.3 KElem/sec
history_stats/1000                          1.00    192.2±2.49µs  5.1 KElem/sec     1.04    199.5±0.97µs  4.9 KElem/sec
history_stats/5000                          1.00    475.0±2.09µs  2.1 KElem/sec     1.03    490.2±4.76µs  2039 Elem/sec
mixed_priority_dispatch/500                 1.00    230.0±8.53ms  2.1 KElem/sec     1.08    247.9±7.20ms  2016 Elem/sec
peek_next/100                               1.00    119.8±7.86µs  8.2 KElem/sec     1.07    127.6±5.68µs  7.7 KElem/sec
peek_next/1000                              1.00    119.9±2.80µs  8.1 KElem/sec     1.08    129.2±6.84µs  7.6 KElem/sec
peek_next/5000                              1.00    119.7±2.25µs  8.2 KElem/sec     1.07    128.1±5.01µs  7.6 KElem/sec
query_ids_by_tags/100                       1.00    186.0±1.88µs  5.2 KElem/sec     1.03    191.6±5.91µs  5.1 KElem/sec
query_ids_by_tags/1000                      1.00    807.7±4.98µs  1238 Elem/sec     1.05   847.2±19.42µs  1180 Elem/sec
query_ids_by_tags/5000                      1.00      3.5±0.05ms   282 Elem/sec     1.04      3.7±0.06ms   272 Elem/sec
retryable_dead_letter/constant              1.00    105.4±1.03ms   949 Elem/sec     1.13    118.9±1.14ms   840 Elem/sec
retryable_dead_letter/exponential           1.00    105.1±0.80ms   951 Elem/sec     1.12    117.7±1.68ms   849 Elem/sec
retryable_dead_letter/exponential_jitter    1.00    105.6±0.93ms   946 Elem/sec     1.12    118.2±2.29ms   845 Elem/sec
retryable_dead_letter/linear                1.00    104.8±0.88ms   954 Elem/sec     1.12    117.0±1.33ms   854 Elem/sec
submit_dedup_hit/1000                       1.00    208.2±8.20ms  4.7 KElem/sec     1.06    220.5±9.71ms  4.4 KElem/sec
submit_tasks/1000                           1.00    180.8±4.46ms  5.4 KElem/sec     1.06    192.2±8.75ms  5.1 KElem/sec
submit_with_tags/0                          1.00     90.0±3.34ms  5.4 KElem/sec     1.09     98.1±5.95ms  5.0 KElem/sec
submit_with_tags/10                         1.00   239.2±10.80ms  2.0 KElem/sec     1.09   259.7±15.50ms  1925 Elem/sec
submit_with_tags/20                         1.00   394.3±18.56ms  1268 Elem/sec     1.06   417.2±26.96ms  1198 Elem/sec
submit_with_tags/5                          1.00    165.8±6.84ms  2.9 KElem/sec     1.07   177.9±11.43ms  2.7 KElem/sec
tag_values/100                              1.00    133.4±3.09µs  7.3 KElem/sec     1.07    142.9±5.72µs  6.8 KElem/sec
tag_values/1000                             1.00    193.6±3.45µs  5.0 KElem/sec     1.05    202.6±6.46µs  4.8 KElem/sec
tag_values/5000                             1.00    455.1±4.96µs  2.1 KElem/sec     1.03    469.1±8.15µs  2.1 KElem/sec

- Add priority aging and weighted fair scheduling sections to priorities-and-preemption.md
- Add AgingConfig, group weight, and fair scheduling builder methods to configuration.md
- Add aging.rs, fair.rs, rate_limit.rs to module map and update dispatch cycle in design.md
- Add glossary entries: effective priority, priority aging, group weight, fair scheduling, urgent threshold
- Document GroupWeightChanged event, updated TaskEventHeader fields, and snapshot fields in progress-and-events.md
- Add pause_duration_ms and paused_at_ms columns to schema docs in persistence-and-recovery.md
- Update module starvation guidance in multi-module-apps.md to recommend aging and group weights
- Update snapshot field listing in query-apis.md
- Add priority aging and fair scheduling to lib.rs crate-level docs and feature list
- Fix broken TaskEventHeader rustdoc link in task/mod.rs
@deepjoy deepjoy enabled auto-merge (squash) March 24, 2026 13:52
@deepjoy deepjoy merged commit cb48f0c into main Mar 24, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request Mar 24, 2026
Linked issue: Feature: Priority aging and weighted fair scheduling across groups
