Move Alloy metrics agent to hover sidecar #334

Merged
simonsmallchua merged 4 commits into main from feat/alloy-sidecar
Apr 18, 2026

@simonsmallchua simonsmallchua commented Apr 18, 2026

Summary

  • Removes the separate bee-observability Fly app and runs Grafana Alloy as a sidecar process inside the hover VM
  • Alloy scrapes localhost:9464 (same machine, no network hop) and pushes to Grafana Cloud
  • Credentials moved from hardcoded config to Fly secrets, sourced from 1Password in CI
  • Alloy only starts if GRAFANA_CLOUD_API_KEY is present — review apps automatically skip it unless the secret is set (it now is, so all envs get metrics)
  • WAL capped at 1h, scrape interval increased to 60s, batch size limited to 500 samples — prevents the OOM death spiral seen in bee-observability

Changes

  • Dockerfile — adds Alloy binary from grafana/alloy:latest build stage
  • alloy.river — new config (was gitignored when it contained hardcoded secrets; now uses env())
  • scripts/start.sh — new startup script; handles ulimit, starts Alloy in background, then exec ./main
  • fly.toml — process updated to ./start.sh
  • fly-deploy.yml + review-apps.yml: GRAFANA_CLOUD_USER and GRAFANA_CLOUD_API_KEY added from 1Password hover-runtime
  • .gitignore — removed alloy.river exclusion

After merging

Suspend the old app once production is confirmed healthy:

flyctl apps suspend bee-observability

Summary by CodeRabbit

  • New Features

    • Enabled observability: app metrics are now forwarded to Grafana Cloud for improved monitoring and visibility.
    • Startup now runs a background metrics agent alongside the app when credentials are available.
  • Chores

    • Deployment workflows updated to provision Grafana Cloud credentials to review and production deployments.
    • Observability configuration is now tracked in source control.

supabase Bot commented Apr 18, 2026

Updates to Preview Branch (feat/alloy-sidecar)

Deployments Status Updated
Database Sat, 18 Apr 2026 23:48:30 UTC
Services Sat, 18 Apr 2026 23:48:30 UTC
APIs Sat, 18 Apr 2026 23:48:30 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks Status Updated
Configurations Sat, 18 Apr 2026 23:48:32 UTC
Migrations Sat, 18 Apr 2026 23:48:34 UTC
Seeding Sat, 18 Apr 2026 23:48:35 UTC
Edge Functions Sat, 18 Apr 2026 23:48:36 UTC


coderabbitai Bot commented Apr 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d1e68061-04bd-41da-98aa-04d2a84a0178

📥 Commits

Reviewing files that changed from the base of the PR and between cfde0e8 and 9d04368.

📒 Files selected for processing (1)
  • Dockerfile

📝 Walkthrough

Adds Grafana Alloy observability: new River config tracked in repo, image and entrypoint updated to include an Alloy sidecar, startup script to manage processes and signals, and CI workflows updated to load and sync Grafana Cloud credentials to Fly.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Observability config: alloy.river | Adds Prometheus scrape for localhost:9464 and a prometheus.remote_write to Grafana Cloud using sys.env("GRAFANA_CLOUD_USER") and sys.env("GRAFANA_CLOUD_API_KEY"), plus WAL and queue settings. |
| Entrypoint script: scripts/start.sh | New startup script that sets ulimit, conditionally launches Alloy sidecar when Grafana creds exist, traps SIGINT/SIGTERM to forward to Alloy and main app, runs ./main and exits with its status. |
| Container & runtime: Dockerfile, fly.toml | Dockerfile adds an Alloy build stage, copies alloy and alloy.river, installs gcompat/ca-certificates, and makes start.sh executable. fly.toml updated to run ./start.sh for the app process. |
| CI/CD workflows: .github/workflows/fly-deploy.yml, .github/workflows/review-apps.yml | Both workflows load GRAFANA_CLOUD_USER and GRAFANA_CLOUD_API_KEY from 1Password into job env and include them in flyctl secrets set so Fly receives the Grafana Cloud credentials. |
| VCS: .gitignore | Removed ignore entry for alloy.river, allowing the Alloy configuration to be committed. |

sequenceDiagram
    participant Start as Start script (start.sh)
    participant App as Application (./main)
    participant Alloy as Grafana Alloy agent
    participant Grafana as Grafana Cloud

    Start->>Start: set ulimit, check GRAFANA_CLOUD_USER/API_KEY
    alt Grafana creds present
        Start->>Alloy: launch Alloy with /app/alloy.river (background)
        Alloy->>Grafana: remote_write (HTTPS + basic auth)
    else no creds
        Start->>Start: log "skipping Alloy"
    end
    Start->>App: start ./main (background)
    App->>Start: expose metrics on localhost:9464
    Start->>Start: wait for App exit
    Start->>Alloy: forward termination signal and wait (if running)
    Start->>Start: exit with App status
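
The shutdown path in the diagram could be sketched in POSIX shell roughly as follows. This is not the PR's actual scripts/start.sh: `supervise` and `stop_pid` are hypothetical helper names, and the sketch assumes the sidecar was already started in the background so that its PID can be passed in.

```shell
#!/bin/sh
# Hypothetical helper: send TERM to a PID if one was given.
# A process that has already exited is not treated as an error.
stop_pid() {
    if [ -n "$1" ]; then
        kill -TERM "$1" 2>/dev/null || true
    fi
}

# supervise SIDECAR_PID CMD [ARGS...]   (hypothetical helper)
# Runs CMD in the background, forwards TERM/INT to CMD and the sidecar,
# stops the sidecar once CMD exits, and returns CMD's exit status.
# SIDECAR_PID may be empty when no sidecar was started.
supervise() {
    sidecar_pid=$1
    shift
    "$@" &
    main_pid=$!
    trap 'stop_pid "$main_pid"; stop_pid "$sidecar_pid"' TERM INT

    status=0
    wait "$main_pid" || status=$?
    # A trapped signal interrupts wait with a status above 128 while CMD may
    # still be shutting down; wait once more to collect its real exit status.
    if [ "$status" -gt 128 ] && kill -0 "$main_pid" 2>/dev/null; then
        status=0
        wait "$main_pid" || status=$?
    fi

    # The main app is gone, so stop the sidecar and reap it.
    stop_pid "$sidecar_pid"
    if [ -n "$sidecar_pid" ]; then
        wait "$sidecar_pid" 2>/dev/null || true
    fi
    trap - TERM INT
    return "$status"
}
```

In a startup script this might be used as `/usr/local/bin/alloy run --storage.path=/tmp/alloy-wal /app/alloy.river &` followed by `supervise "$!" ./main` and `exit $?`. Because the script does not `exec` the app, it stays alive to coordinate shutdown of both processes.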

Possibly related PRs

  • Add cold storage archival system #299 — Modifies the same GitHub Actions deployment and review-apps workflows to add 1Password-loaded Grafana Cloud env vars and include them in flyctl secrets set.
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarises the main change: moving the Alloy metrics agent from a separate app to a sidecar process within the hover service. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions Bot commented Apr 18, 2026

Release Versions

App patch: v0.32.7 → v0.32.8

codecov Bot commented Apr 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@github-actions

🐝 Review App Deployed

Homepage: https://hover-pr-334.fly.dev
Dashboard: https://hover-pr-334.fly.dev/dashboard

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@alloy.river`:
- Around line 13-16: Replace calls to the non-existent env() with sys.env() for
all environment variable lookups in the basic_auth block (update username and
password assignments to use sys.env("GRAFANA_CLOUD_USER") and
sys.env("GRAFANA_CLOUD_API_KEY")), and move the wal block out of the endpoint
block so that wal is a direct child of the prometheus.remote_write component
(i.e., place wal at the same level as endpoint rather than nested inside
endpoint) to satisfy Alloy schema validation.

In `@Dockerfile`:
- Line 2: Replace the non-deterministic base image declaration "FROM
grafana/alloy:latest AS alloy" with a pinned image by specifying an explicit tag
and digest (e.g., use "grafana/alloy:<version>@sha256:<digest>") so builds are
reproducible; update the FROM line in the Dockerfile to reference the chosen
version and its sha256 digest (obtain the digest from the image registry) and
ensure the alias "AS alloy" remains unchanged.
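
One way to keep such a pin from regressing is a small CI check. The sketch below is an assumption, not part of this PR: `unpinned_from_lines` is a hypothetical helper that reports `FROM` lines lacking an `@sha256:` digest.

```shell
#!/bin/sh
# Hypothetical helper: print FROM lines in a Dockerfile that carry no
# sha256 digest. Returns 1 if any are found, 0 otherwise.
# Limitation of this sketch: "FROM scratch" and references to earlier
# build stages are reported too, since they never have a digest.
unpinned_from_lines() {
    dockerfile=$1
    # Select FROM instructions, then drop those already pinned by digest.
    unpinned=$(grep -iE '^[[:space:]]*FROM[[:space:]]' "$dockerfile" \
        | grep -vE '@sha256:[0-9a-f]{64}([[:space:]]|$)' || true)
    if [ -n "$unpinned" ]; then
        printf '%s\n' "$unpinned"
        return 1
    fi
    return 0
}
```

Run against the original Dockerfile, this would report the `FROM grafana/alloy:latest AS alloy` line; once a tag and digest are added, that line is no longer reported.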

In `@scripts/start.sh`:
- Around line 7-11: The startup gate currently only checks GRAFANA_CLOUD_API_KEY
so Alloy may start without the required user credential; change the condition in
scripts/start.sh to require both GRAFANA_CLOUD_API_KEY and GRAFANA_CLOUD_USER
before launching the alloy process (the branch that echoes "Starting Alloy
metrics agent" and runs "/usr/local/bin/alloy run --storage.path=/tmp/alloy-wal
/app/alloy.river &"). If either GRAFANA_CLOUD_USER or GRAFANA_CLOUD_API_KEY is
missing, skip starting alloy and log a clear message indicating which
credential(s) are absent.
- Around line 6-15: The script starts Alloy in background and then execs the
main app, so signals aren't forwarded and Alloy can't be shut down gracefully;
modify scripts/start.sh to capture Alloy's PID (e.g., after launching
/usr/local/bin/alloy ... &), install a shell trap for TERM/INT that forwards
signals to the Alloy PID (kill -TERM $ALLOY_PID) and waits for it before
exiting, and run the main application without using exec so the script remains
PID 1 to coordinate shutdown (or ensure the trap also forwards signals to the
main app PID and waits for both); update references around the alloy launch line
and the final exec ./main invocation to implement PID tracking, kill in trap,
and wait.
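
The first scripts/start.sh finding above (requiring both credentials and naming the absent ones) could look something like the sketch below. `missing_grafana_creds` is a hypothetical function name; the variable names and the Alloy command line are the ones quoted in the review comment.

```shell
#!/bin/sh
# Hypothetical helper: print the names of any required Grafana Cloud
# variables that are unset or empty.
# Returns 0 when both are present, 1 otherwise.
missing_grafana_creds() {
    missing=""
    if [ -z "${GRAFANA_CLOUD_USER:-}" ]; then
        missing="GRAFANA_CLOUD_USER"
    fi
    if [ -z "${GRAFANA_CLOUD_API_KEY:-}" ]; then
        missing="${missing:+$missing }GRAFANA_CLOUD_API_KEY"
    fi
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing"
        return 1
    fi
    return 0
}

# Possible use in the startup script (left as a comment so that this
# sketch has no side effects when sourced):
#   if absent=$(missing_grafana_creds); then
#       echo "Starting Alloy metrics agent"
#       /usr/local/bin/alloy run --storage.path=/tmp/alloy-wal /app/alloy.river &
#   else
#       echo "Skipping Alloy: missing $absent"
#   fi
```

Returning the missing names rather than a bare true/false lets the log message say exactly which secret a review app lacks.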

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6cfe3e41-0d8d-4b83-b83d-40b6d06bad1e

📥 Commits

Reviewing files that changed from the base of the PR and between e83157f and eae5aa4.

📒 Files selected for processing (7)
  • .github/workflows/fly-deploy.yml
  • .github/workflows/review-apps.yml
  • .gitignore
  • Dockerfile
  • alloy.river
  • fly.toml
  • scripts/start.sh
💤 Files with no reviewable changes (1)
  • .gitignore

Comment thread alloy.river
Comment thread Dockerfile Outdated
Comment thread scripts/start.sh Outdated
Comment thread scripts/start.sh Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Dockerfile`:
- Around line 29-30: The RUN installing ca-certificates and gcompat currently
uses unpinned packages—update that Dockerfile RUN (the line with "apk --no-cache
add ca-certificates gcompat") to pin each package to the Alpine v3.19 package
versions (e.g., ca-certificates=<version> and gcompat=<version>) so rebuilds are
reproducible; use the exact version strings from Alpine v3.19 package index and
keep the same apk flags (--no-cache) when replacing the unpinned names.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5678eea7-6432-47f2-a527-153db354cedd

📥 Commits

Reviewing files that changed from the base of the PR and between 1201aa4 and cfde0e8.

📒 Files selected for processing (3)
  • Dockerfile
  • alloy.river
  • scripts/start.sh

Comment thread Dockerfile Outdated

@simonsmallchua simonsmallchua merged commit fdc3982 into main Apr 18, 2026
11 checks passed
@simonsmallchua simonsmallchua deleted the feat/alloy-sidecar branch April 18, 2026 23:51
@coderabbitai coderabbitai Bot mentioned this pull request Apr 26, 2026
5 tasks