Feature: Priority aging and weighted fair scheduling across groups #37

@deepjoy

Description

Summary

Add two interrelated scheduling fairness mechanisms:

  1. Priority aging — gradually escalate a task's effective priority the longer it waits, preventing starvation of low-priority work
  2. Weighted fair scheduling — distribute scheduler capacity proportionally across groups based on configurable weights, preventing one group from monopolizing the scheduler

These are two facets of the same problem: ensuring all submitted work eventually gets a fair share of execution capacity, regardless of priority level or group membership.

Motivation

In a long-running sync engine (mantle watch), the scheduler runs continuously with mixed workloads:

Source                         Priority  Volume        Group
User-triggered sync            HIGH      50 files      s3://prod
Background watch (profile A)   LOW       50,000 files  s3://prod
Background watch (profile B)   LOW       200 files     s3://b2-backup
Maintenance tasks              LOW       5 tasks       (various)

Problem 1: Priority starvation

If HIGH-priority tasks keep arriving (user triggers multiple syncs), LOW-priority background tasks never run. In the worst case, a watch-mode sync that detected 50,000 changes sits in the queue indefinitely, its TTL expires, and the entire diff is lost — requiring a full re-scan on the next poll cycle.

Problem 2: Group monopolization

Group concurrency limits are caps ("endpoint X gets at most 6 slots"), but they don't guarantee minimum throughput. Without fair scheduling:

Scheduler capacity: 16 concurrent tasks
Profile A submits: 50,000 tasks to group "s3://prod"  
Profile B submits: 200 tasks to group "s3://b2-backup"

Result: Profile A fills all 16 slots (within its group cap of 16).
        Profile B gets nothing until A's queue drains.

With weighted fair scheduling:

Profile A weight: 3  (75% of capacity → 12 slots)
Profile B weight: 1  (25% of capacity → 4 slots)

Result: Both profiles make progress proportionally.

Why these are interrelated

Priority aging and fair scheduling interact in subtle ways. Consider:

  • A LOW-priority task in a high-weight group vs. a HIGH-priority task in a low-weight group — which runs first?
  • If aging promotes a task to HIGH priority, should it bypass the group's weight allocation?
  • If a group's weight allocation is exhausted but the global scheduler has idle capacity, should the excess go to overweight groups or be reserved?

These questions need a unified answer, not two independent mechanisms bolted together.

Proposed Behavior

Priority Aging

let scheduler = Scheduler::builder()
    .priority_aging(AgingConfig {
        // After this duration in the queue, start aging
        grace_period: Duration::from_secs(300),  // 5 min
        // Promote one priority level per interval
        aging_interval: Duration::from_secs(60), // every 60s after grace
        // Never age above this ceiling (don't let background tasks
        // compete with CRITICAL priority)
        max_effective_priority: Priority::HIGH,
    })
    .build();

Effective priority is computed dynamically at dispatch time:

effective_priority = min(
    base_priority + (time_in_queue - grace_period) / aging_interval,
    max_effective_priority
)
  • The task's stored priority never changes — aging is a dispatch-time calculation
  • Aged priority is visible in SchedulerSnapshot and events as effective_priority alongside base_priority
  • When a task is superseded or resubmitted, the aging clock resets
  • Recurring task instances start with a fresh aging clock each occurrence

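A minimal sketch of that dispatch-time calculation, assuming a four-level priority ladder LOW < NORMAL < HIGH < CRITICAL (NORMAL is an assumption; only LOW, HIGH, and CRITICAL appear above) and illustrative helper names rather than final API:

use std::time::Duration;

#[allow(non_camel_case_types)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority { LOW = 0, NORMAL = 1, HIGH = 2, CRITICAL = 3 }

impl Priority {
    fn from_level(level: u64) -> Priority {
        match level {
            0 => Priority::LOW,
            1 => Priority::NORMAL,
            2 => Priority::HIGH,
            _ => Priority::CRITICAL,
        }
    }
}

struct AgingConfig {
    grace_period: Duration,
    aging_interval: Duration,
    max_effective_priority: Priority,
}

// Dispatch-time calculation: the task's stored priority is never mutated.
fn effective_priority(base: Priority, time_in_queue: Duration, cfg: &AgingConfig) -> Priority {
    // Within the grace period the task keeps its base priority.
    let Some(excess) = time_in_queue.checked_sub(cfg.grace_period) else {
        return base;
    };
    // One promotion per full aging_interval elapsed after the grace period.
    let promotions = excess.as_secs() / cfg.aging_interval.as_secs().max(1);
    // Never age above the configured ceiling.
    Priority::from_level(base as u64 + promotions).min(cfg.max_effective_priority)
}

Because this is a pure function of submitted_at and the aging config, no queue entry has to be rewritten as tasks age (see design consideration 1 below).
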
Edge cases:

  • Tasks with TTL: aging and TTL interact — a task could age to HIGH priority right before its TTL expires. This is correct behavior (last-ditch attempt to run before expiry)
  • Retried tasks: should aging clock reset on retry, or continue from original submission? Recommend: continue from original submission (the task has been waiting even longer)
  • Paused tasks: aging clock should freeze while paused (same as TTL freeze)
  • Child tasks: inherit the parent's current effective priority, not base priority

Weighted Fair Scheduling

let scheduler = Scheduler::builder()
    .max_concurrent_tasks(16)
    // Weights are relative — these give A 75%, B 25% of capacity
    .group_weight("s3://prod", 3)
    .group_weight("s3://b2-backup", 1)
    // Groups without explicit weight get the default
    .default_group_weight(1)
    // Minimum guaranteed slots per group (regardless of weight)
    .group_minimum_slots("s3://b2-backup", 2)
    .build();

Slot allocation algorithm:

  1. Each group gets min_slots guaranteed (if configured)
  2. Remaining capacity is distributed by weight ratio
  3. If a group doesn't need its full allocation (fewer pending tasks than slots), excess capacity is redistributed to groups that can use it (work-conserving)
  4. Group concurrency limits (caps) still apply on top of weight allocation

Worked example:

Total capacity: 16 slots
Group A weight: 3, cap: 12, pending: 50,000
Group B weight: 1, cap: 6, pending: 200, min_slots: 2

Step 1: Guarantee minimums → B gets 2, remaining = 14
Step 2: Distribute by weight → A gets 14*(3/4)=10.5, floored to 10; B gets the remaining 14-10=4
Step 3: Apply caps → A: min(10, 12)=10, B: min(2+4, 6)=6
Step 4: Total=16 ✓

If B only has 3 pending tasks:
  B uses 3 slots, 3 excess slots are redistributed to A
  A would then get 13, but is capped at 12
  1 slot stays idle (all groups at cap or drained)
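
A sketch of one allocation pass implementing those four steps. The names (GroupState, allocate_slots) are illustrative, not proposed API, and the treatment of fractional weight shares is a policy choice, so the split can differ from the worked example above by a slot:

struct GroupState {
    weight: u32,
    cap: u32,       // existing group concurrency limit
    min_slots: u32, // guaranteed minimum, 0 if unset
    pending: u64,   // tasks currently queued in this group
}

// One allocation pass over `total` scheduler slots.
// Returns a slot count per group, in the same order as `groups`.
fn allocate_slots(total: u32, groups: &[GroupState]) -> Vec<u32> {
    // Step 1: guarantee minimums (never above the group's own cap).
    let mut alloc: Vec<u32> = groups.iter().map(|g| g.min_slots.min(g.cap)).collect();
    let mut remaining = total.saturating_sub(alloc.iter().sum());

    // Step 2: distribute the rest by weight ratio (floored; remainder handled in step 4).
    let weight_sum: u32 = groups.iter().map(|g| g.weight).sum::<u32>().max(1);
    for (a, g) in alloc.iter_mut().zip(groups) {
        *a += remaining * g.weight / weight_sum;
    }

    // Step 3: clamp to each group's cap and to its actual demand (work-conserving).
    for (a, g) in alloc.iter_mut().zip(groups) {
        let demand = g.pending.min(g.cap as u64) as u32;
        *a = (*a).min(demand);
    }

    // Step 4: hand leftover slots, one at a time, to groups that can still absorb them;
    // if none can, the remainder stays idle (all groups at cap or drained).
    remaining = total.saturating_sub(alloc.iter().sum());
    while remaining > 0 {
        let mut gave = false;
        for (a, g) in alloc.iter_mut().zip(groups) {
            let demand = g.pending.min(g.cap as u64) as u32;
            if remaining > 0 && *a < demand {
                *a += 1;
                remaining -= 1;
                gave = true;
            }
        }
        if !gave { break; }
    }
    alloc
}

Rebalancing this on every dispatch cycle (design consideration 2) stays cheap because the pass is linear in the number of groups.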

Runtime adjustment:

// Dynamically adjust weights (e.g., during business hours, prioritize prod)
scheduler.set_group_weight("s3://prod", 5);
scheduler.set_group_weight("s3://b2-backup", 1);

// Reset to equal weights
scheduler.reset_group_weights();

Interaction Between Aging and Fair Scheduling

The dispatch loop should work as follows:

  1. Slot allocation: Determine how many slots each group gets (via weights)
  2. Within each group's allocation: Pick the highest effective priority task (with aging applied)
  3. Cross-group priority override: If a task has aged past a configurable "urgent threshold" (e.g. Priority::CRITICAL), it can borrow a slot from outside its group's weight allocation. This prevents truly starved tasks from being blocked by fair scheduling
.priority_aging(AgingConfig {
    // ...
    // When effective priority reaches CRITICAL, task can bypass group weights
    urgent_threshold: Some(Priority::CRITICAL),
})
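
A rough sketch of that dispatch pass, reusing Priority, AgingConfig, and effective_priority from the aging sketch above. Whether the borrowed slot comes from global idle capacity (as here) or from another group's share is an open design question; QueuedTask and pick_for_dispatch are illustrative names:

// Duration, Priority, AgingConfig, effective_priority as in the aging sketch above.
struct QueuedTask {
    id: u64,
    group: usize, // index into the allocation produced by allocate_slots
    base: Priority,
    time_in_queue: Duration,
}

// One dispatch pass: fill each group's allocation with its highest
// effective-priority tasks, then let critically aged tasks borrow idle capacity.
fn pick_for_dispatch(
    alloc: &[u32],               // per-group slot allocation from step 1
    queue: &mut Vec<QueuedTask>, // pending tasks, all groups mixed
    cfg: &AgingConfig,
    urgent_threshold: Option<Priority>,
    mut idle_slots: u32,         // capacity not claimed by any group this cycle
) -> Vec<u64> {
    // Highest effective priority first; longer wait breaks ties.
    queue.sort_by(|a, b| {
        effective_priority(b.base, b.time_in_queue, cfg)
            .cmp(&effective_priority(a.base, a.time_in_queue, cfg))
            .then(b.time_in_queue.cmp(&a.time_in_queue))
    });

    let mut budget: Vec<u32> = alloc.to_vec();
    let mut picked = Vec::new();
    queue.retain(|t| {
        let eff = effective_priority(t.base, t.time_in_queue, cfg);
        let urgent = urgent_threshold.map_or(false, |u| eff >= u);
        if budget[t.group] > 0 {
            budget[t.group] -= 1;    // step 2: within the group's weight allocation
        } else if urgent && idle_slots > 0 {
            idle_slots -= 1;         // step 3: cross-group override for starved tasks
        } else {
            return true;             // stays queued until a later cycle
        }
        picked.push(t.id);
        false                        // removed from the queue, will be dispatched
    });
    picked
}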

Metrics

Metric                                Type       Description
taskmill_task_queue_wait_seconds      Histogram  Time from submission to dispatch (tracks starvation)
taskmill_task_effective_priority      Gauge      Current effective priority distribution
taskmill_aging_promotions_total       Counter    How many times tasks were aged up
taskmill_group_weight_allocation      Gauge      Current slot allocation per group
taskmill_group_weight_utilization     Gauge      Slots used / slots allocated per group
taskmill_group_excess_redistributed   Counter    Unused slots redistributed to other groups

Events

SchedulerEvent::TaskAged {
    task_id: TaskId,
    base_priority: Priority,
    effective_priority: Priority,
    wait_duration: Duration,
}

SchedulerEvent::GroupAllocationChanged {
    group: String,
    previous_slots: u32,
    new_slots: u32,
    reason: AllocationReason,  // WeightChange, Rebalance, GroupDrained
}
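
For illustration only, a monitoring consumer could watch these events. How they are delivered (channel, callback, tracing) is not decided here, so the mpsc receiver, the log_scheduler_events name, and the TaskId alias below are assumptions:

use std::sync::mpsc::Receiver;

// Duration and Priority as in the aging sketch above.
type TaskId = u64; // placeholder for the real TaskId type

enum AllocationReason { WeightChange, Rebalance, GroupDrained }

enum SchedulerEvent {
    TaskAged {
        task_id: TaskId,
        base_priority: Priority,
        effective_priority: Priority,
        wait_duration: Duration,
    },
    GroupAllocationChanged {
        group: String,
        previous_slots: u32,
        new_slots: u32,
        reason: AllocationReason,
    },
}

// Log aging promotions and allocation changes as they arrive.
fn log_scheduler_events(rx: Receiver<SchedulerEvent>) {
    for event in rx {
        match event {
            SchedulerEvent::TaskAged { task_id, wait_duration, .. } => {
                eprintln!("task {task_id} aged after waiting {wait_duration:?}");
            }
            SchedulerEvent::GroupAllocationChanged { group, previous_slots, new_slots, .. } => {
                eprintln!("group {group}: {previous_slots} -> {new_slots} slots");
            }
        }
    }
}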

Design Considerations

  1. Performance: Aging should not require updating SQLite on every aging interval. Compute effective priority at dispatch time from submitted_at and the aging config. This is O(1) per task.

  2. Weight rebalancing frequency: Weights should be rebalanced on every dispatch cycle (when a slot opens up), not on a timer. This ensures immediate response to changing conditions.

  3. Empty groups: A group with weight but no pending tasks should release its allocation immediately (work-conserving). When new tasks arrive, it should get its share back on the next rebalance.

  4. Many groups: With hundreds of groups (one per endpoint), weight calculation should be efficient. Pre-compute allocations on group membership change, not per-dispatch.

  5. Interaction with rate limits: If a group has available weight-allocated slots but is rate-limited, those slots should be temporarily available to other groups, not wasted.

  6. Interaction with pause: Paused groups release their weight allocation. On resume, they get their share back on the next rebalance.

  7. Backward compatibility: Both features should be opt-in. Without aging config, no aging. Without weights, all groups share equally (current behavior). Existing group concurrency limits continue to work as caps independent of weights.

  8. Starvation guarantee: With aging enabled, every task is guaranteed to eventually reach max_effective_priority (or the urgent threshold). Combined with work-conserving fair scheduling, this provides a bounded worst-case wait time that can be calculated from the config:

    max_wait = grace_period + (urgent_threshold - base_priority) * aging_interval
    

    This is useful for capacity planning and SLA reasoning.
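
    For example, with the aging config sketched earlier (grace_period = 300 s, aging_interval = 60 s) and assuming the four-level ladder LOW < NORMAL < HIGH < CRITICAL, a LOW-priority task aged toward an urgent_threshold of CRITICAL waits at most:

        max_wait = 300 s + 3 * 60 s = 480 s  (8 minutes)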
