Silence autoscaler empty-prom error #388
Bump pgx/v5 for memory-safety CVE
Updates to Preview Branch (work/autoscaler-promql-vector0)
Tasks are run on every commit, but only new migration files are pushed.
View logs for this Workflow Run.
Release Versions: App patch
Changelog: Fixed, Security
🧹 Nitpick comments (1)
fly.autoscaler-worker.toml (1)
35-38: Add a dedicated outage signal now that gaps are masked.
Because lines 35-38 intentionally convert outage-like gaps into 0, consider alerting on a direct Redis health metric (or broker probe heartbeat) so outages remain immediately visible, independent of backlog maths.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@fly.autoscaler-worker.toml` around lines 35-38: the backlog masking currently converts outage-like gaps to 0, which hides Redis outages. Add a dedicated outage signal by instrumenting a direct Redis/broker health metric (e.g., export a redis_up gauge or a broker_probe_heartbeat TTL counter) alongside the existing autoscaler backlog logic, then add an alerting rule that fires when redis_up == 0 or broker_probe_heartbeat has not been updated for X seconds. Keep the existing gap-to-0 behavior in the autoscaler (the "gap masking" logic), but ensure monitoring/alerting uses the new redis_up/broker_probe_heartbeat metric so outages remain immediately visible and actionable.
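The suggested alerting could look roughly like the following Prometheus rule file. This is a sketch only: the metric names `redis_up` and `broker_probe_heartbeat_timestamp_seconds` are assumptions from the review comment, not metrics that exist in this repo.

```yaml
groups:
  - name: broker-outage
    rules:
      # Fires when the (hypothetical) Redis health gauge reports down.
      - alert: RedisDown
        expr: redis_up == 0
        for: 2m
        annotations:
          summary: "Redis is unreachable; backlog gauges are masked to 0."
      # Fires when the (hypothetical) broker probe heartbeat goes stale.
      - alert: BrokerProbeStale
        expr: time() - broker_probe_heartbeat_timestamp_seconds > 120
        for: 2m
        annotations:
          summary: "Broker probe heartbeat not updated for over 2 minutes."
```

With rules like these, the autoscaler can keep its gap-to-0 masking while outage visibility comes from a signal that does not depend on backlog values.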
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 223c7f38-d49c-4908-aa13-663fd2f24d2d
⛔ Files ignored due to path filters (2)
- go.sum is excluded by !**/*.sum
- webflow-designer-extension-cli/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (5)
- CHANGELOG.md
- fly.autoscaler-analysis.toml
- fly.autoscaler-worker.toml
- go.mod
- webflow-designer-extension-cli/package.json
Codecov Report: ✅ All modified and coverable lines are covered by tests.
🐝 Review App Deployed. Homepage: https://hover-pr-388.fly.dev
Summary
Wraps the fly-autoscaler PromQL in both `fly.autoscaler-worker.toml` and `fly.autoscaler-analysis.toml` with `or on() vector(0)`, so an empty result collapses to zero rather than logging `metrics collection failed: empty prometheus result` once per minute.
The queried broker gauges (`bee_broker_stream_length`, `bee_broker_scheduled_zset_depth`) are synchronous OTel `Int64Gauge`s — `LastValue` aggregation only emits a sample when `Record()` is called inside a collect interval, so the series goes stale in Fly's managed Prometheus during idle. Grafana Cloud confirms the same gappy pattern across all three queried metrics.
Trade-off (documented inline + in CHANGELOG)
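This trade-off follows from the `or on() vector(0)` wrapper itself. A minimal sketch of the pattern (the surrounding expression is assumed, since the diff is not shown here; only the metric name comes from this PR):

```promql
# Assumed shape of the wrapped backlog query, not verbatim from the PR.
# Without the wrapper, a stale gauge yields an empty instant vector and
# fly-autoscaler logs "metrics collection failed: empty prometheus result".
sum(bee_broker_stream_length) or on() vector(0)
```

`vector(0)` produces a single sample with value 0 and no labels; with `on()`, the `or` keeps that sample only when the left-hand side returns nothing, so an empty result becomes a literal 0 instead of a gap.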
A genuine Redis outage previously produced a series gap (the autoscaler holds the machine count). With this change the gap collapses to 0, so the autoscaler scales to MIN=1. This is acceptable because idle workers can't crawl during an outage anyway and restart cleanly once Redis recovers.
The proper fix — converting the broker gauges to async `Int64ObservableGauge`s so they always emit at every collect — will be tracked in a follow-up issue.
Test plan
- Watch `flyctl logs -a hover-autoscaler-worker` and `-a hover-autoscaler-analysis` for ~30 min during idle — `empty prometheus result` log lines should drop to zero.
- `flyctl status -a hover-worker` still shows 1 started machine; `-a hover-analysis` still shows 1 started.
Need help on this PR? Tag @codesmith with what you need.
Summary by CodeRabbit
Bug Fixes
Security
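As context for the follow-up fix mentioned in the description (converting the synchronous gauges to async observable gauges), here is a small self-contained model of the staleness mechanism. The class names are hypothetical stand-ins, not the OpenTelemetry SDK API.

```python
# Minimal model (hypothetical classes, not the OTel SDK) of why a
# synchronous last-value gauge goes stale between collects while an
# observable (callback-based) gauge emits on every collect.

class SyncGauge:
    """Emits a sample only if record() was called since the last collect."""
    def __init__(self):
        self._pending = None

    def record(self, value):
        self._pending = value

    def collect(self):
        sample, self._pending = self._pending, None
        return sample  # None -> no sample -> empty Prometheus result


class ObservableGauge:
    """Pulls the current value via a callback on every collect."""
    def __init__(self, callback):
        self._callback = callback

    def collect(self):
        return self._callback()


backlog = [1, 2, 3]
sync = SyncGauge()
obs = ObservableGauge(lambda: len(backlog))

sync.record(len(backlog))
print(sync.collect())  # 3 -- a sample was recorded this interval
print(sync.collect())  # None -- idle interval, the series goes stale
print(obs.collect())   # 3
print(obs.collect())   # 3 -- always emits, no empty intervals
```

An async observable gauge sidesteps the problem because the exporter pulls a fresh value on every collect, so there are never empty intervals for the PromQL wrapper to mask.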