chore(deps): update helm release kube-prometheus-stack to v55.7.1 (#78)
Conversation
PR Analysis
PR Feedback 💡 General suggestions: The PR is straightforward and doesn't require any major changes. However, it would be beneficial to include a brief explanation of the changes or improvements introduced in the updated version of the Helm chart, even if it's a minor version update. ✨ Usage guide — Overview: With a configuration file, use the following template:
See the review usage page for a comprehensive guide on using this tool.
Defense-in-depth for task #78 (ext_proc gRPC stream cold-connect drops first request after CDS update / long idle). The aggressive PING schedule shortens the window during which a cached H2 stream sits idle long enough for the upstream — or cilium-envoy itself — to half-close it without our side noticing. The first prompt after an idle period should now find a fresh, validated stream more often. Won't fix the underlying `clearRouteCache: false` race in SR v0.2.0's request-body callback (the structural cause); that still needs an upstream PR or a Lua filter in this CEC. Comment in the cluster definition spells out the rationale + boundary.
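The "aggressive PING schedule" above corresponds to Envoy's HTTP/2 connection keepalive knobs on the ext_proc cluster. A minimal sketch of what that cluster stanza could look like — cluster name and intervals are illustrative, not the committed values:

```yaml
# Hypothetical sketch: H2 keepalive PINGs on the ext_proc upstream so a
# half-closed cached stream is detected before the next request rides it.
clusters:
- name: semantic-router-extproc   # illustrative name
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options:
          connection_keepalive:
            interval: 5s   # send a PING every 5s on an otherwise-idle connection
            timeout: 2s    # tear the connection down if the PING ack is late
```

The shorter the interval, the smaller the window in which an idle stream can go stale unnoticed — at the cost of a little extra control-frame traffic.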
Definitive fix for task #78 (ext_proc gRPC stream cold-connect drops first request → catch-all-route 503s). The CEC's own header comment flagged the two viable paths: a) upstream SR clearRouteCache=true PR; b) in-tree Lua filter copying x-selected-model. Going with (b) — no upstream dependency, narrower blast radius, full per-model dispatch. Three changes:
1. Lua filter inserted between ext_proc and router. envoy_on_request reads x-selected-model (set by SR's body callback) and, if non-empty, calls request_handle:clearRouteCache() so Envoy re-evaluates routes against the post-mutation headers.
2. Per-model header-match routes (5 entries — qwen-coder, qwen-coder-fim, qwen3-8b, llamaguard3-1b, phi4-mini) replacing the prior single catch-all. A final no-header catch-all falls back to phi4-mini for the SR-degraded path (failure_mode_allow=true + classification timeout → no x-selected-model → no clearRouteCache → catch-all).
3. EDS clusters for the 4 models that didn't have one (the prior single-cluster baseline only had phi4-mini). Cilium populates endpoints from spec.backendServices.
Header reference + ordering rationale + degraded-path semantics all captured inline in the file's top-of-file and per-filter comments.
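Change (1) above could be sketched roughly as the following HTTP filter entry; the surrounding CEC field paths are elided and the placement (between ext_proc and the router) is per the description:

```yaml
# Hedged sketch of the Lua filter described above; not the committed file.
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    default_source_code:
      inline_string: |
        function envoy_on_request(request_handle)
          -- x-selected-model is set by the semantic-router body callback
          local model = request_handle:headers():get("x-selected-model")
          if model ~= nil and model ~= "" then
            -- re-run route matching against the mutated headers
            request_handle:clearRouteCache()
          end
        end
```

The nil/empty guard is what preserves the degraded path: no header means no clearRouteCache, so the request falls through to the catch-all route.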
Course-correct on 678a687: cilium-envoy is a slimmed Envoy build that does NOT include envoy.filters.http.lua. The listener was REJECTED with:
"Error adding/updating listener(s) llm/llm-ai-gateway/...: Didn't find a registered implementation for 'envoy.filters.http.lua' with type URL: 'envoy.extensions.filters.http.lua.v3.Lua'"
So the Lua-driven clearRouteCache approach is structurally blocked by Cilium's Envoy compile-time config. The per-model header-match routes and per-model EDS clusters from 678a687 are KEPT — they work for the client-deterministic path (clients that set `x-selected-model` directly). What still doesn't work: the MoM auto-routing path (SR classifies → sets header at body callback → route was already picked at headers phase → 503 on catch-all to phi4-mini if it's not running). That needs one of: (a) an upstream SR PR for clearRouteCache=true; (b) a custom cilium-envoy with a Lua/wasm filter compiled in; (c) forking SR and patching buildRequestBodyContinueResponse. Task #78 stays pending. Comment block updated to describe the partial state honestly.
Brainstorm output for fixing task #78 root cause. The Envoy ext_proc + cilium-envoy approach is structurally blocked by:
1. SR v0.2.0 hard-coding clearRouteCache=false in buildRequestBodyContinueResponse — defeats Envoy's body-callback header-mutation re-routing.
2. cilium-envoy's slim build (no envoy.filters.http.lua) — kills the standard "Lua filter calls clearRouteCache after ext_proc" workaround. Verified empirically: listener rejected with "Didn't find a registered implementation".
3. The cilium.l7policy filter on upstream filter chains — denies traffic to per-model EDS clusters with 403 even from CNP-allowed sources.
The design replaces the entire CEC + ext_proc chain with a small custom HTTP proxy (~250 LOC of Go) deployed in the llm namespace. The proxy reads the body's model field directly and:
- For client-deterministic requests (model: xplane-*): fast path, forward to that Service. No SR roundtrip.
- For SR-classified requests (model: MoM): call SR's HTTP classify API, rewrite body.model, forward. Same UX as the broken ext_proc path, but it actually works.
Both OpenCode subagent dispatch (per-agent model assignment) and OpenWebUI MoM auto-routing flow through the same proxy. A single provider URL stays for all clients — no client-side changes needed. Spec sections cover goal/success criteria, architecture, component design, streaming behavior, deployment plan, phased rollout (P0-P7), risks (SSE, single point of failure, SR endpoint contract), and explicit out-of-scope items (no auth/cache/circuit-breaking — the proxy is a thin forwarder, not a control plane). Implementation plan ships separately. Targets a follow-on PR after #1434 merges.
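The proxy's two-path dispatch rule can be sketched as a single Go function. Service URLs, the `llm.svc.cluster.local` suffix, and the injected `classify` callback are illustrative stand-ins — the real proxy would call Iris's HTTP classify API and stream bodies rather than return a string:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveBackend sketches the dispatch rule described above:
// client-deterministic models go straight to their Service, MoM goes
// through the classifier, anything else falls back to the default model.
func resolveBackend(model string, classify func() string) string {
	if strings.HasPrefix(model, "xplane-") {
		// Client-deterministic fast path: no SR roundtrip.
		return fmt.Sprintf("http://%s.llm.svc.cluster.local",
			strings.TrimPrefix(model, "xplane-"))
	}
	if model == "MoM" {
		// SR-classified path: ask the classifier, then forward.
		return fmt.Sprintf("http://%s.llm.svc.cluster.local", classify())
	}
	// Unknown model: fall back to the default backend.
	return "http://phi4-mini.llm.svc.cluster.local"
}

func main() {
	fmt.Println(resolveBackend("xplane-qwen3-8b", nil))
	fmt.Println(resolveBackend("MoM", func() string { return "phi4-mini" }))
}
```

Keeping the decision in one pure function keeps the "thin forwarder, not a control plane" boundary easy to test in isolation.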
Phased TDD-shaped plan to deliver the design at 2026-05-05-ai-gateway-redesign-design.md (commit 060c02e). 5 phases, each independently mergeable: P1 smoke (dedicated Envoy + qwen3-8b route), P2 SR ext_proc via EnvoyExtensionPolicy with filter-ordering verification (Lua fallback documented), P3 full fleet routing (Service-backed), P4 InferencePool + EPP per claim (folds task #76), P5 demolition (delete CEC + llm-router-proxy + GHCR workflow, repoint Tailscale, close #78). Cross-cutting verification table + per-phase rollback playbook + open items for implementation-time discovery.
…AI Gateway: Drop the EnvoyExtensionPolicy that wired Iris as an ext_proc filter ahead of the AI Gateway's body parser. The 2026-05-06 foundation-showcase design replaces this path: the AIGatewayRoute body parser sets x-ai-eg-model directly from body.model for client-deterministic requests (`model: xplane-<name>`); Iris is consulted via its HTTP classifier endpoint only for cascade-routed `model: MoM` requests. Removes the cilium-envoy slim-build constraints (no Lua), the ext_proc cold-connect 404 (#78), and SR v0.2.0's clearRouteCache: false fragility. CNP updated: drop the dead :50051 ingress rule (gRPC ext_proc port, no longer used); replace with :8080 ingress for the HTTP classifier endpoint, allowed from envoy-ai-gateway-system.
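The AIGatewayRoute shape this implies could look roughly like the trimmed sketch below. Field names follow the Envoy AI Gateway v1alpha1 CRD as I understand it, and the metadata, backend name, and model value are illustrative — verify against the installed CRD version before relying on it:

```yaml
# Hedged, trimmed sketch — not the committed manifest.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm          # illustrative
  namespace: llm
spec:
  schema:
    name: OpenAI
  rules:
  # The body parser sets x-ai-eg-model from body.model, so a plain
  # `model: xplane-qwen3-8b` request matches here with no ext_proc hop.
  - matches:
    - headers:
      - type: Exact
        name: x-ai-eg-model
        value: xplane-qwen3-8b
    backendRefs:
    - name: qwen3-8b   # illustrative backend name
```

Because the header is derived from the body by the gateway itself, route selection happens after the body is parsed — which is exactly what the old headers-phase CEC routing could not do.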
This PR contains the following updates:
55.7.0 -> 55.7.1
Release Notes
prometheus-community/helm-charts (kube-prometheus-stack)
v55.7.1 (Compare Source)
kube-prometheus-stack collects Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.
What's Changed
New Contributors
Full Changelog: prometheus-community/helm-charts@prometheus-systemd-exporter-0.1.0...kube-prometheus-stack-55.7.1
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Mend Renovate. View repository job log here.