Feature: Priority aging and weighted fair scheduling across groups #37

@deepjoy

Description

Summary

Add two interrelated scheduling fairness mechanisms:

  1. Priority aging — gradually escalate a task's effective priority the longer it waits, preventing starvation of low-priority work
  2. Weighted fair scheduling — distribute scheduler capacity proportionally across groups based on configurable weights, preventing one group from monopolizing the scheduler

These are two facets of the same problem: ensuring all submitted work eventually gets a fair share of execution capacity, regardless of priority level or group membership.

Motivation

In a long-running sync engine (mantle watch), the scheduler runs continuously with mixed workloads:

Source                         Priority  Volume        Group
User-triggered sync            HIGH      50 files      s3://prod
Background watch (profile A)   LOW       50,000 files  s3://prod
Background watch (profile B)   LOW       200 files     s3://b2-backup
Maintenance tasks              LOW       5 tasks       (various)

Problem 1: Priority starvation

If HIGH-priority tasks keep arriving (user triggers multiple syncs), LOW-priority background tasks never run. In the worst case, a watch-mode sync that detected 50,000 changes sits in the queue indefinitely, its TTL expires, and the entire diff is lost — requiring a full re-scan on the next poll cycle.

Problem 2: Group monopolization

Group concurrency limits are caps ("endpoint X gets at most 6 slots"), but they don't guarantee minimum throughput. Without fair scheduling:

Scheduler capacity: 16 concurrent tasks
Profile A submits: 50,000 tasks to group "s3://prod"  
Profile B submits: 200 tasks to group "s3://b2-backup"

Result: Profile A fills all 16 slots (within its group cap of 16).
        Profile B gets nothing until A's queue drains.

With weighted fair scheduling:

Profile A weight: 3  (75% of capacity → 12 slots)
Profile B weight: 1  (25% of capacity → 4 slots)

Result: Both profiles make progress proportionally.

Why these are interrelated

Priority aging and fair scheduling interact in subtle ways. Consider:

  • A LOW-priority task in a high-weight group vs. a HIGH-priority task in a low-weight group — which runs first?
  • If aging promotes a task to HIGH priority, should it bypass the group's weight allocation?
  • If a group's weight allocation is exhausted but the global scheduler has idle capacity, should the excess go to overweight groups or be reserved?

These questions need a unified answer, not two independent mechanisms bolted together.

Proposed Behavior

Priority Aging

let scheduler = Scheduler::builder()
    .priority_aging(AgingConfig {
        // After this duration in the queue, start aging
        grace_period: Duration::from_secs(300),  // 5 min
        // Promote one priority level per interval
        aging_interval: Duration::from_secs(60), // every 60s after grace
        // Never age above this ceiling (don't let background tasks
        // compete with CRITICAL priority)
        max_effective_priority: Priority::HIGH,
    })
    .build();

Effective priority is computed dynamically at dispatch time:

effective_priority = min(
    base_priority + (time_in_queue - grace_period) / aging_interval,
    max_effective_priority
)
  • The task's stored priority never changes — aging is a dispatch-time calculation
  • Aged priority is visible in SchedulerSnapshot and events as effective_priority alongside base_priority
  • When a task is superseded or resubmitted, the aging clock resets
  • Recurring task instances start with a fresh aging clock each occurrence

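A minimal sketch of that dispatch-time calculation, assuming a four-level priority ladder LOW < NORMAL < HIGH < CRITICAL (NORMAL is an assumption; only LOW, HIGH, and CRITICAL appear above) and illustrative helper names rather than final API:

use std::time::Duration;

#[allow(non_camel_case_types)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority { LOW = 0, NORMAL = 1, HIGH = 2, CRITICAL = 3 }

impl Priority {
    fn from_level(level: u64) -> Priority {
        match level {
            0 => Priority::LOW,
            1 => Priority::NORMAL,
            2 => Priority::HIGH,
            _ => Priority::CRITICAL,
        }
    }
}

struct AgingConfig {
    grace_period: Duration,
    aging_interval: Duration,
    max_effective_priority: Priority,
}

// Dispatch-time calculation: the task's stored priority is never mutated.
fn effective_priority(base: Priority, time_in_queue: Duration, cfg: &AgingConfig) -> Priority {
    // Within the grace period the task keeps its base priority.
    let Some(excess) = time_in_queue.checked_sub(cfg.grace_period) else {
        return base;
    };
    // One promotion per full aging_interval elapsed after the grace period.
    let promotions = excess.as_secs() / cfg.aging_interval.as_secs().max(1);
    // Never age above the configured ceiling.
    Priority::from_level(base as u64 + promotions).min(cfg.max_effective_priority)
}

Because this is a pure function of submitted_at and the aging config, no queue entry has to be rewritten as tasks age (see design consideration 1 below).
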
Edge cases:

  • Tasks with TTL: aging and TTL interact — a task could age to HIGH priority right before its TTL expires. This is correct behavior (last-ditch attempt to run before expiry)
  • Retried tasks: should aging clock reset on retry, or continue from original submission? Recommend: continue from original submission (the task has been waiting even longer)
  • Paused tasks: aging clock should freeze while paused (same as TTL freeze)
  • Child tasks: inherit the parent's current effective priority, not base priority

Weighted Fair Scheduling

let scheduler = Scheduler::builder()
    .max_concurrent_tasks(16)
    // Weights are relative — these give A 75%, B 25% of capacity
    .group_weight("s3://prod", 3)
    .group_weight("s3://b2-backup", 1)
    // Groups without explicit weight get the default
    .default_group_weight(1)
    // Minimum guaranteed slots per group (regardless of weight)
    .group_minimum_slots("s3://b2-backup", 2)
    .build();

Slot allocation algorithm:

  1. Each group gets min_slots guaranteed (if configured)
  2. Remaining capacity is distributed by weight ratio
  3. If a group doesn't need its full allocation (fewer pending tasks than slots), excess capacity is redistributed to groups that can use it (work-conserving)
  4. Group concurrency limits (caps) still apply on top of weight allocation

Worked example:

Total capacity: 16 slots
Group A weight: 3, cap: 12, pending: 50,000
Group B weight: 1, cap: 6, pending: 200, min_slots: 2

Step 1: Guarantee minimums → B gets 2, remaining = 14
Step 2: Distribute by weight → A gets 14*(3/4)=10.5, floored to 10; B gets the remaining 14-10=4
Step 3: Apply caps → A: min(10, 12)=10, B: min(2+4, 6)=6
Step 4: Total=16 ✓

If B only has 3 pending tasks:
  B uses 3 slots, 3 excess slots are redistributed to A
  A would then get 13, but is capped at 12
  1 slot stays idle (all groups at cap or drained)
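
A sketch of one allocation pass implementing those four steps. The names (GroupState, allocate_slots) are illustrative, not proposed API, and the treatment of fractional weight shares is a policy choice, so the split can differ from the worked example above by a slot:

struct GroupState {
    weight: u32,
    cap: u32,       // existing group concurrency limit
    min_slots: u32, // guaranteed minimum, 0 if unset
    pending: u64,   // tasks currently queued in this group
}

// One allocation pass over `total` scheduler slots.
// Returns a slot count per group, in the same order as `groups`.
fn allocate_slots(total: u32, groups: &[GroupState]) -> Vec<u32> {
    // Step 1: guarantee minimums (never above the group's own cap).
    let mut alloc: Vec<u32> = groups.iter().map(|g| g.min_slots.min(g.cap)).collect();
    let mut remaining = total.saturating_sub(alloc.iter().sum());

    // Step 2: distribute the rest by weight ratio (floored; remainder handled in step 4).
    let weight_sum: u32 = groups.iter().map(|g| g.weight).sum::<u32>().max(1);
    for (a, g) in alloc.iter_mut().zip(groups) {
        *a += remaining * g.weight / weight_sum;
    }

    // Step 3: clamp to each group's cap and to its actual demand (work-conserving).
    for (a, g) in alloc.iter_mut().zip(groups) {
        let demand = g.pending.min(g.cap as u64) as u32;
        *a = (*a).min(demand);
    }

    // Step 4: hand leftover slots, one at a time, to groups that can still absorb them;
    // if none can, the remainder stays idle (all groups at cap or drained).
    remaining = total.saturating_sub(alloc.iter().sum());
    while remaining > 0 {
        let mut gave = false;
        for (a, g) in alloc.iter_mut().zip(groups) {
            let demand = g.pending.min(g.cap as u64) as u32;
            if remaining > 0 && *a < demand {
                *a += 1;
                remaining -= 1;
                gave = true;
            }
        }
        if !gave { break; }
    }
    alloc
}

Rebalancing this on every dispatch cycle (design consideration 2) stays cheap because the pass is linear in the number of groups.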

Runtime adjustment:

// Dynamically adjust weights (e.g., during business hours, prioritize prod)
scheduler.set_group_weight("s3://prod", 5);
scheduler.set_group_weight("s3://b2-backup", 1);

// Reset to equal weights
scheduler.reset_group_weights();

Interaction Between Aging and Fair Scheduling

The dispatch loop should work as follows:

  1. Slot allocation: Determine how many slots each group gets (via weights)
  2. Within each group's allocation: Pick the highest effective priority task (with aging applied)
  3. Cross-group priority override: If a task has aged past a configurable "urgent threshold" (e.g. Priority::CRITICAL), it can borrow a slot from outside its group's weight allocation. This prevents truly starved tasks from being blocked by fair scheduling
.priority_aging(AgingConfig {
    // ...
    // When effective priority reaches CRITICAL, task can bypass group weights
    urgent_threshold: Some(Priority::CRITICAL),
})
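
A rough sketch of that dispatch pass, reusing Priority, AgingConfig, and effective_priority from the aging sketch above. Whether the borrowed slot comes from global idle capacity (as here) or from another group's share is an open design question; QueuedTask and pick_for_dispatch are illustrative names:

// Duration, Priority, AgingConfig, effective_priority as in the aging sketch above.
struct QueuedTask {
    id: u64,
    group: usize, // index into the allocation produced by allocate_slots
    base: Priority,
    time_in_queue: Duration,
}

// One dispatch pass: fill each group's allocation with its highest
// effective-priority tasks, then let critically aged tasks borrow idle capacity.
fn pick_for_dispatch(
    alloc: &[u32],               // per-group slot allocation from step 1
    queue: &mut Vec<QueuedTask>, // pending tasks, all groups mixed
    cfg: &AgingConfig,
    urgent_threshold: Option<Priority>,
    mut idle_slots: u32,         // capacity not claimed by any group this cycle
) -> Vec<u64> {
    // Highest effective priority first; longer wait breaks ties.
    queue.sort_by(|a, b| {
        effective_priority(b.base, b.time_in_queue, cfg)
            .cmp(&effective_priority(a.base, a.time_in_queue, cfg))
            .then(b.time_in_queue.cmp(&a.time_in_queue))
    });

    let mut budget: Vec<u32> = alloc.to_vec();
    let mut picked = Vec::new();
    queue.retain(|t| {
        let eff = effective_priority(t.base, t.time_in_queue, cfg);
        let urgent = urgent_threshold.map_or(false, |u| eff >= u);
        if budget[t.group] > 0 {
            budget[t.group] -= 1;    // step 2: within the group's weight allocation
        } else if urgent && idle_slots > 0 {
            idle_slots -= 1;         // step 3: cross-group override for starved tasks
        } else {
            return true;             // stays queued until a later cycle
        }
        picked.push(t.id);
        false                        // removed from the queue, will be dispatched
    });
    picked
}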

Metrics

Metric                                Type       Description
taskmill_task_queue_wait_seconds      Histogram  Time from submission to dispatch (tracks starvation)
taskmill_task_effective_priority      Gauge      Current effective priority distribution
taskmill_aging_promotions_total       Counter    How many times tasks were aged up
taskmill_group_weight_allocation      Gauge      Current slot allocation per group
taskmill_group_weight_utilization     Gauge      Slots used / slots allocated per group
taskmill_group_excess_redistributed   Counter    Unused slots redistributed to other groups

Events

SchedulerEvent::TaskAged {
    task_id: TaskId,
    base_priority: Priority,
    effective_priority: Priority,
    wait_duration: Duration,
}

SchedulerEvent::GroupAllocationChanged {
    group: String,
    previous_slots: u32,
    new_slots: u32,
    reason: AllocationReason,  // WeightChange, Rebalance, GroupDrained
}
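
For illustration only, a monitoring consumer could watch these events. How they are delivered (channel, callback, tracing) is not decided here, so the mpsc receiver, the log_scheduler_events name, and the TaskId alias below are assumptions:

use std::sync::mpsc::Receiver;

// Duration and Priority as in the aging sketch above.
type TaskId = u64; // placeholder for the real TaskId type

enum AllocationReason { WeightChange, Rebalance, GroupDrained }

enum SchedulerEvent {
    TaskAged {
        task_id: TaskId,
        base_priority: Priority,
        effective_priority: Priority,
        wait_duration: Duration,
    },
    GroupAllocationChanged {
        group: String,
        previous_slots: u32,
        new_slots: u32,
        reason: AllocationReason,
    },
}

// Log aging promotions and allocation changes as they arrive.
fn log_scheduler_events(rx: Receiver<SchedulerEvent>) {
    for event in rx {
        match event {
            SchedulerEvent::TaskAged { task_id, wait_duration, .. } => {
                eprintln!("task {task_id} aged after waiting {wait_duration:?}");
            }
            SchedulerEvent::GroupAllocationChanged { group, previous_slots, new_slots, .. } => {
                eprintln!("group {group}: {previous_slots} -> {new_slots} slots");
            }
        }
    }
}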

Design Considerations

  1. Performance: Aging should not require updating SQLite on every aging interval. Compute effective priority at dispatch time from submitted_at and the aging config. This is O(1) per task.

  2. Weight rebalancing frequency: Weights should be rebalanced on every dispatch cycle (when a slot opens up), not on a timer. This ensures immediate response to changing conditions.

  3. Empty groups: A group with weight but no pending tasks should release its allocation immediately (work-conserving). When new tasks arrive, it should get its share back on the next rebalance.

  4. Many groups: With hundreds of groups (one per endpoint), weight calculation should be efficient. Pre-compute allocations on group membership change, not per-dispatch.

  5. Interaction with rate limits: If a group has available weight-allocated slots but is rate-limited, those slots should be temporarily available to other groups, not wasted.

  6. Interaction with pause: Paused groups release their weight allocation. On resume, they get their share back on the next rebalance.

  7. Backward compatibility: Both features should be opt-in. Without aging config, no aging. Without weights, all groups share equally (current behavior). Existing group concurrency limits continue to work as caps independent of weights.

  8. Starvation guarantee: With aging enabled, every task is guaranteed to eventually reach max_effective_priority (or the urgent threshold). Combined with work-conserving fair scheduling, this provides a bounded worst-case wait time that can be calculated from the config:

    max_wait = grace_period + (urgent_threshold - base_priority) * aging_interval
    

    This is useful for capacity planning and SLA reasoning.
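
    For example, with the aging config sketched earlier (grace_period = 300 s, aging_interval = 60 s) and assuming the four-level ladder LOW < NORMAL < HIGH < CRITICAL, a LOW-priority task aged toward an urgent_threshold of CRITICAL waits at most:

        max_wait = 300 s + 3 * 60 s = 480 s  (8 minutes)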
