Use backend-listen active WS connections for autoscaling #5595
Greptile Summary

This PR migrates the backend-listen HPA from request-based autoscaling (requestsPerPod) to scaling on active WebSocket connections per pod.
Confidence Score: 3/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant Pod as backend-listen Pod
    participant Prom as Prometheus
    participant Adapter as Prometheus Adapter
    participant HPA as Kubernetes HPA
    Pod->>Prom: Expose /metrics<br/>backend_listen_active_ws_connections (Gauge)
    Note over Pod,Prom: ACTIVE_WS_CONNECTIONS.inc() on connect<br/>ACTIVE_WS_CONNECTIONS.dec() on disconnect
    Prom->>Adapter: Scrape raw metric<br/>backend_listen_active_ws_connections
    Note over Adapter: metricsQuery: avg(backend_listen_active_ws_connections)<br/>Exposed as: backend_listen_active_ws_connections_per_pod
    HPA->>Adapter: GET /apis/external.metrics.k8s.io/v1beta1/<br/>backend_listen_active_ws_connections_per_pod
    Adapter-->>HPA: Return avg value (e.g. 25)
    Note over HPA: type: Value, target: 20<br/>stabilizationWindowSeconds: 120 (scale-up)<br/>desiredReplicas = ceil(current * 25/20)
    HPA->>Pod: Scale deployment<br/>(up to maxReplicas: 50, down to minReplicas: 26)
```
Last reviewed commit: cb9be57
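For reference, here is a minimal sketch of the HPA spec the diagram implies, assuming a standard autoscaling/v2 HorizontalPodAutoscaler. The resource name and the scaleTargetRef are hypothetical; only the values shown in the diagram (target 20, 120s scale-up window, 26–50 replicas) come from this PR:

```yaml
# Sketch only: a plain autoscaling/v2 HPA matching the values in the sequence
# diagram. The chart in this repo may template this differently.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: omi-backend-listen  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: omi-backend-listen  # hypothetical name
  minReplicas: 26
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: backend_listen_active_ws_connections_per_pod
        target:
          type: Value  # compares the adapter's avg() directly against 20
          value: "20"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120
```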
```diff
      selectPolicy: Min
    scaleUp:
-      stabilizationWindowSeconds: 30
+      stabilizationWindowSeconds: 120
```
Scale-up stabilization window increased 4x without justification
stabilizationWindowSeconds for scale-up was increased from 30 to 120. For a WebSocket server, this means the HPA will observe metric values for 2 full minutes before issuing any scale-up decision. If there is a sudden influx of WebSocket connections, all existing pods remain under load during this window.
The previous value of 30 seconds was already quite defensive for active WS connection spikes. At 26 minimum pods and 20 connections/pod as the target, a connection surge would push the metric over the threshold quickly, but scaling would be deferred for 2 minutes, during which each pod may be handling significantly more than the target 20 connections.
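To make the cost concrete: with 26 pods and the metric at 25 (the example value in the sequence diagram), the HPA would compute desiredReplicas = ceil(26 × 25 / 20) = 33, but would hold that scale-up for the full 120-second window while each pod keeps absorbing the excess connections.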
Consider reverting to a shorter stabilization window (e.g. 30–60s) for scale-up, or document the reasoning (e.g. if there is evidence of flapping/oscillation that justifies this delay).
Suggested change:

```diff
-      stabilizationWindowSeconds: 120
+      stabilizationWindowSeconds: 60
```
```diff
+- name:
+    as: "backend_listen_active_ws_connections_per_pod"
+  seriesQuery: 'backend_listen_active_ws_connections'
+  metricsQuery: 'avg(backend_listen_active_ws_connections)'
+  resources:
+    overrides:
+      namespace: { resource: "namespace" }
```
avg() metric with HPA type: Value may react to scrape gaps, not real load
The metricsQuery computes avg(backend_listen_active_ws_connections) — the average connections per pod. When any pod temporarily stops reporting (restart, scrape failure, etc.), the average is computed over fewer samples, artificially inflating the metric. This can trigger premature scale-up events.
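For example, with 26 pods each holding 20 connections, avg() is 20; if one pod restarts and its 20 clients reconnect across the remaining 25 pods, the total is still 520 connections but the query now averages 520 / 25 ≈ 20.8, nudging the HPA toward scaling up even though aggregate load is unchanged.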
A more robust alternative is to expose the sum of all connections and let Kubernetes compute the per-replica average itself, which requires switching the HPA target type from Value to AverageValue.
As-is, the avg() approach is functionally correct under normal conditions, but it is worth noting that the absence of a pod from Prometheus scraping will cause the metric to transiently over-read.
The same pattern exists in both prod_omi_prometheus_adapter.yaml (line 9) and dev_omi_prometheus_adapter.yaml (line 9).
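A hedged sketch of that alternative, assuming the same adapter rule format and a standard autoscaling/v2 HPA. Only the raw metric name comes from this PR; the `_total` name and the values are illustrative:

```yaml
# Adapter rule: expose the total rather than the average (sketch, not the PR's code).
- name:
    as: "backend_listen_active_ws_connections_total"  # hypothetical name
  seriesQuery: 'backend_listen_active_ws_connections'
  metricsQuery: 'sum(backend_listen_active_ws_connections)'
  resources:
    overrides:
      namespace: { resource: "namespace" }
```

```yaml
# Corresponding HPA metric block: Kubernetes divides the total by the
# current replica count, so the per-pod math no longer depends on how
# many series Prometheus happens to return.
- type: External
  external:
    metric:
      name: backend_listen_active_ws_connections_total  # hypothetical name
    target:
      type: AverageValue
      averageValue: "20"
```

With AverageValue the divisor is the HPA's own replica count, so a missing scrape lowers the reported total (briefly biasing toward scale-down) instead of inflating the per-pod average.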
```diff
-  requestsPerPod: 10
+  # requestsPerPod: 10
+  # failedResponseCode: 10
+  activeConnectionsPerPod: 20
```
New autoscaling metric not tested in dev environment
The dev prometheus adapter (dev_omi_prometheus_adapter.yaml) now registers the backend_listen_active_ws_connections_per_pod metric, but dev_omi_backend_listen_values.yaml does not include an activeConnectionsPerPod entry in its autoscaling section. This means the dev HPA does not exercise the new metric, so the new autoscaling behavior goes to production without a dev validation path.
Consider adding activeConnectionsPerPod to the dev values file (possibly with a lower threshold) to test the end-to-end prometheus-adapter → HPA pipeline before relying on it in production.
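A minimal sketch of what that addition could look like in dev_omi_backend_listen_values.yaml, mirroring the prod values file; the threshold of 10 is an assumed lower dev value, not taken from this PR:

```yaml
autoscaling:
  # requestsPerPod: 10
  # failedResponseCode: 10
  activeConnectionsPerPod: 10  # assumed lower dev threshold; prod uses 20
```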
lgtm