Description
scheduler_job_runner.py emits gauges for pool slot states (pool.open_slots, pool.queued_slots, pool.running_slots, pool.starving_tasks). On most backends, gauges are last-write-wins: only the most recent value written before a scrape is retained. A spike in pool pressure that spans a few scheduler loop iterations between two scrapes is therefore either missed entirely or seen as a single value, and the distribution between scrapes is lost.
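To illustrate the difference, here is a minimal sketch with made-up numbers. It is not Airflow code: the sample values are hypothetical, and the gauge and histogram are modelled with plain Python variables rather than a real metrics client.

```python
import statistics

# Hypothetical pool.queued_slots readings from six scheduler loop iterations
# that all fall inside a single scrape interval (illustrative numbers only).
samples = [2, 3, 40, 38, 4, 2]

# Gauge semantics: each write overwrites the previous one, so a scrape at the
# end of the interval only observes the final value.
gauge_value = None
for value in samples:
    gauge_value = value

# Histogram semantics: every observation is kept until the flush, so summary
# statistics over the whole interval remain available.
histogram_values = list(samples)
summary = {
    "max": max(histogram_values),
    "median": statistics.median(histogram_values),
}

print(f"gauge sees: {gauge_value}")
print(f"histogram sees: {summary}")
```

In this sketch the gauge ends the interval at 2, while the retained observations still show the spike to 40.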
Use case / motivation
Backend operators sizing pools want p50/p95/p99 of pool utilization, not just point-in-time gauge samples. Today there's no way to see the spread.
Proposal
Alongside each existing pool slot gauge emission, also emit a histogram with the same value. Four Stats.histogram(...) additions in scheduler_job_runner.py, same call sites as the existing gauges. Nothing removed — gauges stay for backwards-compatible scrapers.
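A rough sketch of what the dual emission could look like. Everything here is an assumption for illustration: `InMemoryStats` is a hypothetical stand-in for the stats client, `emit_pool_metrics` is a hypothetical helper rather than the actual code in scheduler_job_runner.py, and the `pool.<metric>.<pool_name>` naming and the reuse of the gauge name for the histogram are not specified by this proposal.

```python
from collections import defaultdict


class InMemoryStats:
    """Hypothetical stand-in for a stats client, used only in this sketch."""

    def __init__(self):
        self.gauges = {}
        self.histograms = defaultdict(list)

    def gauge(self, name, value):
        # Last write wins: the previous value is overwritten.
        self.gauges[name] = value

    def histogram(self, name, value):
        # Every observation is retained until the backend flushes.
        self.histograms[name].append(value)


# The four pool slot states named in this issue.
POOL_METRICS = ("open_slots", "queued_slots", "running_slots", "starving_tasks")


def emit_pool_metrics(stats, pool_name, slot_stats):
    """Emit each pool value as a gauge and, alongside it, as a histogram."""
    for key in POOL_METRICS:
        # Assumed naming scheme; the real metric names may differ.
        name = f"pool.{key}.{pool_name}"
        value = slot_stats[key]
        stats.gauge(name, value)      # existing behaviour, kept unchanged
        stats.histogram(name, value)  # proposed addition, same call site


stats = InMemoryStats()
# Two simulated scheduler loop iterations with hypothetical values.
emit_pool_metrics(stats, "default_pool", {
    "open_slots": 10, "queued_slots": 0, "running_slots": 6, "starving_tasks": 0,
})
emit_pool_metrics(stats, "default_pool", {
    "open_slots": 0, "queued_slots": 25, "running_slots": 16, "starving_tasks": 9,
})
```

After the two simulated iterations, the gauge for each metric holds only the second value, while the histogram holds both observations. Whether a backend accepts a gauge and a histogram under the same name varies, so a distinct histogram name may be needed in practice.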
Are you willing to submit a PR?
Code of Conduct