feat: add Prometheus alerting rules, Grafana alerts row, and monitori…#17

Merged
alez007 merged 1 commit into main from feat/prometheus-alerting on Apr 13, 2026
Conversation

alez007 (Owner) commented on Apr 13, 2026

…ng docs

Add standalone prometheus-alerts.yml with 4 critical and 6 warning alerting rules covering error rate, model availability, KV cache exhaustion, latency, queue depth, preemptions, disconnects, GPU memory, and TTFT. Add 7th "Alerts" row to the Grafana dashboard with threshold-annotated panels. Extend monitoring.md with import instructions, alert reference table, and tuning guidance.
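For orientation, a rules file of this kind might look roughly like the sketch below. It is not the contents of prometheus-alerts.yml: the group name, alert names, metric names (`example_requests_total`, `example_queue_depth`), thresholds, and durations are all hypothetical placeholders, chosen only to show one critical and one warning rule in the standard Prometheus rule-file layout.

```yaml
# Illustrative sketch only. Every name and threshold here is an assumption,
# not taken from the merged prometheus-alerts.yml.
groups:
  - name: example-inference-alerts        # hypothetical group name
    rules:
      # Critical: fraction of failed requests over the last 5 minutes.
      - alert: ExampleHighErrorRate       # hypothetical alert name
        expr: |
          sum(rate(example_requests_total{status="error"}[5m]))
            /
          sum(rate(example_requests_total[5m])) > 0.05
        for: 5m                           # must hold for 5m before firing
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% for 5 minutes"
          description: "Current error ratio: {{ $value }}"

      # Warning: requests waiting in the queue stay above a threshold.
      - alert: ExampleHighQueueDepth      # hypothetical alert name
        expr: example_queue_depth > 50    # hypothetical metric and threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Queue depth above 50 for 10 minutes"
```

A file in this shape is typically loaded through the `rule_files` list in the Prometheus configuration and can be syntax-checked with `promtool check rules <file>` before deployment.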

@alez007 alez007 merged commit df4c67e into main Apr 13, 2026
2 checks passed