
Use backend-listen active WS connections for autoscaling#5595

Merged
beastoin merged 1 commit into main from task/use-backend-listen-active-ws-for-auto-scaling
Mar 13, 2026

Conversation

@thainguyensunya
Collaborator

Changes:

  • Use backend_listen_active_ws_connections_per_pod for autoscaling
  • Modify scaleUp and scaleDown behaviors to be compatible with the new autoscaling metric

@greptile-apps
Contributor

greptile-apps Bot commented Mar 13, 2026

Greptile Summary

This PR migrates the backend-listen HPA autoscaling signal from HTTP request-rate (requestsPerPod) to the number of active WebSocket connections per pod (activeConnectionsPerPod: 20). The new Prometheus Adapter rule (avg(backend_listen_active_ws_connections)) is registered in both dev and prod adapters, and the HPA template now supports the new external metric. Scaling behavior is tuned to be more conservative (slower scale-up and scale-down).

Key changes and concerns:

  • Scale-up stabilization window increased from 30s → 120s: For a real-time audio/WebSocket workload, this 4× increase means the HPA waits 2 minutes before reacting to connection spikes, which could leave pods overloaded during traffic surges.
  • avg() metric is sensitive to scrape gaps: avg(backend_listen_active_ws_connections) will be computed over fewer samples if any pod temporarily stops reporting, transiently inflating the metric and potentially causing unnecessary scale-up events.
  • Dev HPA does not enable the new metric: dev_omi_backend_listen_values.yaml does not include activeConnectionsPerPod, so the full prometheus-adapter → HPA pipeline for this new signal is untested in the dev environment before it goes to production.
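Based on the changes described, the new signal pairing likely looks like the following sketch. The exact structure of the rendered HPA template is an assumption; only the metric name and the target of 20 come from the PR:

```yaml
# Sketch of the new external metric block rendered by templates/hpa.yaml.
# The adapter's avg() query already returns a per-pod average, so the HPA
# compares it directly against a fixed target using type: Value.
metrics:
  - type: External
    external:
      metric:
        name: backend_listen_active_ws_connections_per_pod
      target:
        type: Value
        value: "20"   # activeConnectionsPerPod from the values file
```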

Confidence Score: 3/5

  • Mostly safe to merge, but the 4× increase in scale-up stabilization window and lack of dev-environment validation of the new metric signal introduce production risk for a latency-sensitive WebSocket service.
  • The core idea (scaling on active WS connections) is sound and the implementation is technically correct. The Prometheus Adapter avg() + HPA type: Value pairing is valid. However, the scale-up stabilization window change (30s → 120s) is a significant behavior regression for WebSocket spike scenarios, and the new autoscaling signal is being deployed directly to production without being exercised in the dev environment first.
  • backend/charts/backend-listen/prod_omi_backend_listen_values.yaml — specifically the scale-up stabilizationWindowSeconds and the absence of dev validation.

Important Files Changed

Filename Overview
backend/charts/backend-listen/prod_omi_backend_listen_values.yaml Switches autoscaling from requestsPerPod (HTTP/WS upgrade request rate) to activeConnectionsPerPod (live WS gauge at 20). Scale-up stabilization window increased 4x to 120s, and both scale-up policies slowed significantly — these changes may delay capacity response during connection spikes.
backend/charts/backend-listen/templates/hpa.yaml Adds new HPA metric block for backend_listen_active_ws_connections_per_pod using type: Value — correct pairing with the avg()-based prometheus adapter query. Template logic is straightforward.
backend/charts/monitoring/prometheus-adapter/prod_omi_prometheus_adapter.yaml Registers backend_listen_active_ws_connections_per_pod external metric using avg(backend_listen_active_ws_connections). The avg() approach is functionally correct but susceptible to producing an inflated value if any pod temporarily drops out of Prometheus scraping.
backend/charts/monitoring/prometheus-adapter/dev_omi_prometheus_adapter.yaml Mirrors prod prometheus adapter change — registers the backend_listen_active_ws_connections_per_pod metric rule. Dev backend-listen values do not yet enable this metric in the HPA, so there is no dev-env validation of the full pipeline.

Sequence Diagram

sequenceDiagram
    participant Pod as backend-listen Pod
    participant Prom as Prometheus
    participant Adapter as Prometheus Adapter
    participant HPA as Kubernetes HPA

    Pod->>Prom: Expose /metrics<br/>backend_listen_active_ws_connections (Gauge)
    Note over Pod,Prom: ACTIVE_WS_CONNECTIONS.inc() on connect<br/>ACTIVE_WS_CONNECTIONS.dec() on disconnect

    Prom->>Adapter: Scrape raw metric<br/>backend_listen_active_ws_connections
    Note over Adapter: metricsQuery: avg(backend_listen_active_ws_connections)<br/>Exposed as: backend_listen_active_ws_connections_per_pod

    HPA->>Adapter: GET /apis/external.metrics.k8s.io/v1beta1/<br/>backend_listen_active_ws_connections_per_pod
    Adapter-->>HPA: Return avg value (e.g. 25)

    Note over HPA: type: Value, target: 20<br/>stabilizationWindowSeconds: 120 (scale-up)<br/>desiredReplicas = ceil(current * 25/20)

    HPA->>Pod: Scale deployment<br/>(up to maxReplicas: 50, down to minReplicas: 26)

Last reviewed commit: cb9be57

```diff
     selectPolicy: Min
   scaleUp:
-    stabilizationWindowSeconds: 30
+    stabilizationWindowSeconds: 120
```
Contributor


Scale-up stabilization window increased 4x without justification

stabilizationWindowSeconds for scale-up was increased from 30 to 120. For a WebSocket server, this means the HPA will observe metric values for 2 full minutes before issuing any scale-up decision. If there is a sudden influx of WebSocket connections, all existing pods remain under load during this window.

The previous value of 30 seconds was already quite defensive for active WS connection spikes. At 26 minimum pods and 20 connections/pod as the target, a connection surge would push the metric over the threshold quickly, but scaling would be deferred for 2 minutes, during which each pod may be handling significantly more than the target 20 connections.

Consider reverting to a shorter stabilization window (e.g. 30–60s) for scale-up, or document the reasoning (e.g. if there is evidence of flapping/oscillation that justifies this delay).

Suggested change

```diff
-stabilizationWindowSeconds: 120
+stabilizationWindowSeconds: 60
```
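If flapping is a concern, a shorter window can still be damped through the scale-up policies themselves rather than the stabilization window. A sketch, assuming the standard autoscaling/v2 behavior schema; the Percent/Pods policy values are illustrative, not taken from this PR:

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60   # compromise between 30s and 120s
    selectPolicy: Max                # react with the more aggressive policy
    policies:
      - type: Percent
        value: 50                    # illustrative: grow by up to 50% per minute
        periodSeconds: 60
      - type: Pods
        value: 4                     # illustrative: or by up to 4 pods per minute
        periodSeconds: 60
```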

Comment on lines +6 to +12
```yaml
- name:
    as: "backend_listen_active_ws_connections_per_pod"
  seriesQuery: 'backend_listen_active_ws_connections'
  metricsQuery: 'avg(backend_listen_active_ws_connections)'
  resources:
    overrides:
      namespace: { resource: "namespace" }
```
Contributor


avg() metric with HPA type: Value may react to scrape gaps, not real load

The metricsQuery computes avg(backend_listen_active_ws_connections) — the average connections per pod. When any pod temporarily stops reporting (restart, scrape failure, etc.), the average is computed over fewer samples, artificially inflating the metric. This can trigger premature scale-up events.

A more robust alternative is to export the sum of all connections and let Kubernetes compute the per-replica average itself, which would require changing the HPA target from type: Value to type: AverageValue.

As-is, the avg() approach is functionally correct under normal conditions, but it is worth noting that the absence of a pod from Prometheus scraping will cause the metric to transiently over-read.

The same pattern exists in both prod_omi_prometheus_adapter.yaml (line 9) and dev_omi_prometheus_adapter.yaml (line 9).

```diff
-requestsPerPod: 10
+# requestsPerPod: 10
+# failedResponseCode: 10
+activeConnectionsPerPod: 20
```
Contributor


New autoscaling metric not tested in dev environment

The dev prometheus adapter (dev_omi_prometheus_adapter.yaml) now registers the backend_listen_active_ws_connections_per_pod metric, but dev_omi_backend_listen_values.yaml does not include an activeConnectionsPerPod entry in its autoscaling section. This means the dev HPA does not exercise the new metric, so the new autoscaling behavior goes to production without a dev validation path.

Consider adding activeConnectionsPerPod to the dev values file (possibly with a different — perhaps lower — threshold) to test the end-to-end prometheus-adapter → HPA pipeline before relying on it in production.
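For example, the dev values file could enable the metric with a lower threshold. A sketch only: the `activeConnectionsPerPod` key mirrors the prod values file, while the replica counts and the threshold of 5 are illustrative dev sizing, not from this PR:

```yaml
# dev_omi_backend_listen_values.yaml (sketch)
autoscaling:
  enabled: true
  minReplicas: 1               # illustrative dev sizing
  maxReplicas: 3
  activeConnectionsPerPod: 5   # illustrative lower threshold for dev testing
```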

@beastoin
Collaborator

lgtm

@beastoin beastoin merged commit 03c6acf into main Mar 13, 2026
2 checks passed
@beastoin beastoin deleted the task/use-backend-listen-active-ws-for-auto-scaling branch March 13, 2026 10:05
Glucksberg pushed a commit to Glucksberg/omi-local that referenced this pull request Apr 28, 2026