Summary
Add two interrelated scheduling fairness mechanisms:
- Priority aging — gradually escalate a task's effective priority the longer it waits, preventing starvation of low-priority work
- Weighted fair scheduling — distribute scheduler capacity proportionally across groups based on configurable weights, preventing one group from monopolizing the scheduler
These are two facets of the same problem: ensuring all submitted work eventually gets a fair share of execution capacity, regardless of priority level or group membership.
Motivation
In a long-running sync engine (mantle watch), the scheduler runs continuously with mixed workloads:
| Source | Priority | Volume | Group |
|---|---|---|---|
| User-triggered sync | HIGH | 50 files | `s3://prod` |
| Background watch (profile A) | LOW | 50,000 files | `s3://prod` |
| Background watch (profile B) | LOW | 200 files | `s3://b2-backup` |
| Maintenance tasks | LOW | 5 tasks | (various) |
Problem 1: Priority starvation
If HIGH-priority tasks keep arriving (user triggers multiple syncs), LOW-priority background tasks never run. In the worst case, a watch-mode sync that detected 50,000 changes sits in the queue indefinitely, its TTL expires, and the entire diff is lost — requiring a full re-scan on the next poll cycle.
Problem 2: Group monopolization
Group concurrency limits are caps ("endpoint X gets at most 6 slots"), but they don't guarantee minimum throughput. Without fair scheduling:
```
Scheduler capacity: 16 concurrent tasks
Profile A submits: 50,000 tasks to group "s3://prod"
Profile B submits: 200 tasks to group "s3://b2-backup"

Result: Profile A fills all 16 slots (within its group cap of 16).
        Profile B gets nothing until A's queue drains.
```
With weighted fair scheduling:
```
Profile A weight: 3 (75% of capacity → 12 slots)
Profile B weight: 1 (25% of capacity → 4 slots)

Result: Both profiles make progress proportionally.
```
Why these are interrelated
Priority aging and fair scheduling interact in subtle ways. Consider:
- A LOW-priority task in a high-weight group vs. a HIGH-priority task in a low-weight group — which runs first?
- If aging promotes a task to HIGH priority, should it bypass the group's weight allocation?
- If a group's weight allocation is exhausted but the global scheduler has idle capacity, should the excess go to overweight groups or be reserved?
These questions need a unified answer, not two independent mechanisms bolted together.
Proposed Behavior
Priority Aging
```rust
let scheduler = Scheduler::builder()
    .priority_aging(AgingConfig {
        // After this duration in the queue, start aging
        grace_period: Duration::from_secs(300), // 5 min
        // Promote one priority level per interval
        aging_interval: Duration::from_secs(60), // every 60s after grace
        // Never age above this ceiling (don't let background tasks
        // compete with CRITICAL priority)
        max_effective_priority: Priority::HIGH,
    })
    .build();
```
Effective priority is computed dynamically at dispatch time:
```
effective_priority = min(
    base_priority + (time_in_queue - grace_period) / aging_interval,
    max_effective_priority
)
```
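A minimal sketch of this calculation, assuming a simplified priority ladder and an `AgingConfig` reduced to the fields used here (the `NORMAL` level and the rule that aging never lowers a task below its base priority are assumptions, not part of the proposal):

```rust
use std::time::Duration;

// Illustrative only: a simplified Priority ladder and a trimmed-down AgingConfig.
// Variant names follow the proposal's LOW/HIGH/CRITICAL style; the real enum may differ.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Priority { LOW = 0, NORMAL = 1, HIGH = 2, CRITICAL = 3 }

struct AgingConfig {
    grace_period: Duration,
    aging_interval: Duration,
    max_effective_priority: Priority,
}

fn effective_priority(base: Priority, time_in_queue: Duration, cfg: &AgingConfig) -> Priority {
    // No promotion until the grace period has fully elapsed.
    let aged_for = match time_in_queue.checked_sub(cfg.grace_period) {
        Some(d) => d,
        None => return base,
    };
    // One promotion per full aging interval, clamped to the configured ceiling.
    let promotions = (aged_for.as_secs() / cfg.aging_interval.as_secs().max(1)) as u32;
    let level = (base as u32 + promotions)
        .min(cfg.max_effective_priority as u32)
        .max(base as u32); // assumption: the ceiling never demotes an already-high base priority
    match level {
        0 => Priority::LOW,
        1 => Priority::NORMAL,
        2 => Priority::HIGH,
        _ => Priority::CRITICAL,
    }
}
```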
- The task's stored priority never changes — aging is a dispatch-time calculation
- Aged priority is visible in `SchedulerSnapshot` and events as `effective_priority` alongside `base_priority`
- When a task is superseded or resubmitted, the aging clock resets
- Recurring task instances start with a fresh aging clock each occurrence
Edge cases:
- Tasks with TTL: aging and TTL interact — a task could age to HIGH priority right before its TTL expires. This is correct behavior (last-ditch attempt to run before expiry)
- Retried tasks: should aging clock reset on retry, or continue from original submission? Recommend: continue from original submission (the task has been waiting even longer)
- Paused tasks: aging clock should freeze while paused (same as TTL freeze); one way of tracking this is sketched after this list
- Child tasks: inherit the parent's current effective priority, not base priority
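To make the retry and pause rules above concrete, one option is to keep the original submission time across retries and accumulate paused time separately, subtracting it from the raw queue time. A sketch under those assumptions (type and field names are illustrative, not an existing API):

```rust
use std::time::{Duration, Instant};

// Hypothetical per-task bookkeeping for the aging clock.
struct AgingClock {
    submitted_at: Instant,          // kept across retries: the clock continues from original submission
    paused_total: Duration,         // accumulated time spent paused
    paused_since: Option<Instant>,  // set while the task (or its group) is paused
}

impl AgingClock {
    fn pause(&mut self, now: Instant) {
        self.paused_since.get_or_insert(now);
    }

    fn resume(&mut self, now: Instant) {
        if let Some(start) = self.paused_since.take() {
            self.paused_total += now - start;
        }
    }

    /// Queue time that counts toward aging: wall time minus time spent paused.
    fn time_in_queue(&self, now: Instant) -> Duration {
        (now - self.submitted_at).saturating_sub(self.paused_total)
    }
}
```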
Weighted Fair Scheduling
```rust
let scheduler = Scheduler::builder()
    .max_concurrent_tasks(16)
    // Weights are relative — these give A 75%, B 25% of capacity
    .group_weight("s3://prod", 3)
    .group_weight("s3://b2-backup", 1)
    // Groups without explicit weight get the default
    .default_group_weight(1)
    // Minimum guaranteed slots per group (regardless of weight)
    .group_minimum_slots("s3://b2-backup", 2)
    .build();
```
Slot allocation algorithm:
- Each group gets `min_slots` guaranteed (if configured)
- Remaining capacity is distributed by weight ratio
- If a group doesn't need its full allocation (fewer pending tasks than slots), excess capacity is redistributed to groups that can use it (work-conserving); a code sketch of this pass follows the worked example below
- Group concurrency limits (caps) still apply on top of weight allocation
```
Total capacity: 16 slots
Group A weight: 3, cap: 12, pending: 50,000
Group B weight: 1, cap: 6, pending: 200, min_slots: 2

Step 1: Guarantee minimums → B gets 2, remaining = 14
Step 2: Distribute by weight → A gets 14 × (3/4) = 10.5 → 10, B gets 14 × (1/4) = 3.5 → 4
Step 3: Apply caps → A: min(10, 12) = 10, B: min(2 + 4, 6) = 6
Step 4: Total = 16 ✓

If B only has 3 pending tasks:
  B uses 3 slots, the 3 excess slots are redistributed to A
  A would get 13, but its cap is 12, so it is capped at 12
  1 slot stays idle (all groups at cap or drained)
```
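A minimal sketch of this allocation pass under the assumptions above (per-group weight, cap, minimum, and pending count; names like `GroupState` and `allocate_slots` are illustrative, not an existing API). It floors the fractional weight shares and hands leftovers out one slot at a time, so how the two half-slots in the example are rounded is a policy choice:

```rust
// Illustrative per-group state; ignores tasks already running and rate limits.
struct GroupState {
    weight: u32,
    cap: u32,       // existing group concurrency limit
    min_slots: u32, // guaranteed minimum (0 if not configured)
    pending: u64,   // number of queued tasks in this group
}

/// How many slots each group may use this dispatch cycle (same order as `groups`).
fn allocate_slots(groups: &[GroupState], capacity: u32) -> Vec<u32> {
    let mut alloc = vec![0u32; groups.len()];
    let mut remaining = capacity;

    // Step 1: guarantee minimums (never more than the cap, pending work, or capacity).
    for (i, g) in groups.iter().enumerate() {
        let give = g.min_slots.min(usable(g, 0)).min(remaining);
        alloc[i] += give;
        remaining -= give;
    }

    // Step 2: distribute the remainder by weight ratio (floor division).
    let total_weight: u32 = groups.iter().map(|g| g.weight).sum();
    if total_weight > 0 {
        let pool = remaining;
        for (i, g) in groups.iter().enumerate() {
            let give = (pool * g.weight / total_weight).min(usable(g, alloc[i])).min(remaining);
            alloc[i] += give;
            remaining -= give;
        }
    }

    // Steps 3-4: work-conserving redistribution of whatever is left, one slot at
    // a time, to groups that are still under their cap and still have pending work.
    loop {
        let mut gave = false;
        for (i, g) in groups.iter().enumerate() {
            if remaining == 0 { break; }
            if usable(g, alloc[i]) > 0 {
                alloc[i] += 1;
                remaining -= 1;
                gave = true;
            }
        }
        if !gave || remaining == 0 { break; }
    }
    alloc
}

// Slots the group could still put to use beyond the `already` allocated ones.
fn usable(g: &GroupState, already: u32) -> u32 {
    let by_cap = g.cap.saturating_sub(already);
    let by_pending = g.pending.saturating_sub(already as u64).min(u32::MAX as u64) as u32;
    by_cap.min(by_pending)
}
```

Running this on the example numbers yields a 16-slot split that differs from the table above only in how the two half-slots are rounded.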
Runtime adjustment:
```rust
// Dynamically adjust weights (e.g., during business hours, prioritize prod)
scheduler.set_group_weight("s3://prod", 5);
scheduler.set_group_weight("s3://b2-backup", 1);

// Reset to equal weights
scheduler.reset_group_weights();
```
Interaction Between Aging and Fair Scheduling
The dispatch loop should work as follows:
- Slot allocation: Determine how many slots each group gets (via weights)
- Within each group's allocation: Pick the highest effective priority task (with aging applied)
- Cross-group priority override: If a task has aged past a configurable "urgent threshold" (`Priority::CRITICAL` in the example below), it can be dispatched even when its group's weight allocation is exhausted, borrowing a slot that would otherwise go to another group. This prevents truly starved tasks from being blocked by fair scheduling. A sketch of the combined dispatch cycle follows the config snippet below.
```rust
.priority_aging(AgingConfig {
    // ...
    // When effective priority reaches CRITICAL, task can bypass group weights
    urgent_threshold: Some(Priority::CRITICAL),
})
```
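Putting the pieces together, one dispatch cycle might look roughly like this. It reuses the illustrative `allocate_slots` and `effective_priority` helpers sketched earlier; `Task` and the `dispatch` callback are placeholders for the real types:

```rust
// Sketch only: assumes the GroupState, AgingConfig, Priority, allocate_slots, and
// effective_priority definitions from the earlier sketches are in scope.
struct Task {
    base_priority: Priority,
    waited: std::time::Duration, // time in queue, with paused time already excluded
}

fn dispatch_cycle(
    groups: &[GroupState],
    queues: &mut [Vec<Task>], // pending tasks per group, same order as `groups`
    capacity: u32,
    aging: &AgingConfig,
    urgent_threshold: Option<Priority>,
    dispatch: &mut dyn FnMut(Task),
) {
    // 1. Slot allocation: how many slots each group gets this cycle.
    let alloc = allocate_slots(groups, capacity);
    let mut used = 0u32;

    // 2. Within each group's allocation, run the highest effective priority first.
    for (i, queue) in queues.iter_mut().enumerate() {
        queue.sort_by_key(|t| std::cmp::Reverse(effective_priority(t.base_priority, t.waited, aging)));
        let take = (alloc[i] as usize).min(queue.len());
        for task in queue.drain(..take) {
            dispatch(task);
            used += 1;
        }
    }

    // 3. Cross-group override: if capacity is still free, tasks that have aged to
    //    the urgent threshold run even though their group's allocation is spent.
    if let Some(threshold) = urgent_threshold {
        for queue in queues.iter_mut() {
            while used < capacity && !queue.is_empty() {
                // Queues are already sorted by effective priority, so only the front matters.
                let urgent =
                    effective_priority(queue[0].base_priority, queue[0].waited, aging) >= threshold;
                if !urgent {
                    break;
                }
                dispatch(queue.remove(0));
                used += 1;
            }
        }
    }
}
```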
Metrics
| Metric | Type | Description |
|---|---|---|
| `taskmill_task_queue_wait_seconds` | Histogram | Time from submission to dispatch (tracks starvation) |
| `taskmill_task_effective_priority` | Gauge | Current effective priority distribution |
| `taskmill_aging_promotions_total` | Counter | How many times tasks were aged up |
| `taskmill_group_weight_allocation` | Gauge | Current slot allocation per group |
| `taskmill_group_weight_utilization` | Gauge | Slots used / slots allocated per group |
| `taskmill_group_excess_redistributed` | Counter | Unused slots redistributed to other groups |
Events
```rust
SchedulerEvent::TaskAged {
    task_id: TaskId,
    base_priority: Priority,
    effective_priority: Priority,
    wait_duration: Duration,
}

SchedulerEvent::GroupAllocationChanged {
    group: String,
    previous_slots: u32,
    new_slots: u32,
    reason: AllocationReason, // WeightChange, Rebalance, GroupDrained
}
```
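As a usage illustration, a consumer could log both events. The subscription mechanism here is hypothetical, and `Debug` output is assumed for the field types; only the two event shapes come from this proposal:

```rust
// Hypothetical consumer: `events` stands in for whatever stream or channel the
// scheduler already exposes for SchedulerEvent values.
fn log_fairness_events(events: impl IntoIterator<Item = SchedulerEvent>) {
    for event in events {
        match event {
            SchedulerEvent::TaskAged { task_id, base_priority, effective_priority, wait_duration } => {
                eprintln!(
                    "task {task_id:?} aged {base_priority:?} -> {effective_priority:?} after {wait_duration:?}"
                );
            }
            SchedulerEvent::GroupAllocationChanged { group, previous_slots, new_slots, reason } => {
                eprintln!("group {group}: {previous_slots} -> {new_slots} slots ({reason:?})");
            }
            _ => {} // other SchedulerEvent variants are unchanged by this proposal
        }
    }
}
```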
Design Considerations
- Performance: Aging should not require updating SQLite on every aging interval. Compute effective priority at dispatch time from `submitted_at` and the aging config. This is O(1) per task.
- Weight rebalancing frequency: Weights should be rebalanced on every dispatch cycle (when a slot opens up), not on a timer. This ensures immediate response to changing conditions.
- Empty groups: A group with weight but no pending tasks should release its allocation immediately (work-conserving). When new tasks arrive, it should get its share back on the next rebalance.
- Many groups: With hundreds of groups (one per endpoint), weight calculation should be efficient. Pre-compute allocations on group membership change, not per-dispatch.
- Interaction with rate limits: If a group has available weight-allocated slots but is rate-limited, those slots should be temporarily available to other groups, not wasted.
- Interaction with pause: Paused groups release their weight allocation. On resume, they get their share back on the next rebalance.
- Backward compatibility: Both features should be opt-in. Without aging config, no aging. Without weights, all groups share equally (current behavior). Existing group concurrency limits continue to work as caps, independent of weights.
- Starvation guarantee: With aging enabled, every task is guaranteed to eventually reach `max_effective_priority` (or the urgent threshold). Combined with work-conserving fair scheduling, this provides a bounded worst-case wait time that can be calculated from the config:

  ```
  max_wait = grace_period + (urgent_threshold - base_priority) * aging_interval
  ```

  This is useful for capacity planning and SLA reasoning.
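  For instance, with the example config above (grace_period of 5 minutes, aging_interval of 60 seconds) and assuming the priority ladder LOW < NORMAL < HIGH < CRITICAL, a LOW task needs three promotions to reach the CRITICAL urgent threshold, so max_wait = 300 s + 3 × 60 s = 480 s: the task is guaranteed to bypass group weights within eight minutes of submission.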