Skip to content

ci: add metrics DB review label gate#1425

Merged
suisuss merged 1 commit into
stagingfrom
ci/KEEP-680-metrics-db-review-gate
Jun 1, 2026
Merged

ci: add metrics DB review label gate#1425
suisuss merged 1 commit into
stagingfrom
ci/KEEP-680-metrics-db-review-gate

Conversation

@suisuss
Copy link
Copy Markdown

@suisuss suisuss commented Jun 1, 2026

Follow-up from the 2026-05-29 DB CPU spike (P1). The incident-class change - a heavy aggregate query added without index analysis, or the metrics DB pool concurrency raised without measurement - only manifests at prod data volume and cannot be caught by tests, so the guardrail lives in CI.

What this adds

A required-status-check-shaped workflow (metrics-db-review-gate.yml) that forces a human sign-off when a PR changes metrics code:

  • Always runs (no path filter on the trigger) so it never sits pending-and-blocking as a skipped required check. Path match is computed inside the job, mirroring pr-title-check.yml.
  • Triggers on PRs to staging and prod, with labeled/unlabeled types so the check re-evaluates the instant the label is flipped (not just on the next push).
  • git diff over app/api/metrics, lib/metrics, and lib/db/index.ts. The metrics trees are gated wholesale; lib/db/index.ts is gated as a single file because it holds the metricsDb pool and its concurrency cap (the incident lever) while the rest of lib/db (schema, query helpers) churns too often to gate without turning the label into a rubber stamp.
  • No matching changes: passes as a no-op.
  • Matching changes: fails until the metrics-db-reviewed label is present AND was applied by someone other than the PR author. The non-author check resolves the labeler from the issue timeline (the webhook only names the actor on the labeled event, and a later push would erase that signal).

The label asserts a reviewer confirmed any new/changed aggregate queries are optimised, the scanned tables are appropriately indexed, and any metricsDb pool concurrency change was measured against the scrape timeout. It is a pure attestation (no EXPLAIN artifact required).

Activation (post-merge, manual)

This workflow is inert until it is marked a required check in branch protection for staging and prod. The check name is metrics-db-review-gate. It only appears in the required-checks list after it has run on at least one PR, so the order is: merge -> let it run once -> add as required.

The metrics-db-reviewed label has been created in the repo.

Gate PRs touching app/api/metrics, lib/metrics, and lib/db/index.ts
(the metricsDb pool concurrency cap) behind a metrics-db-reviewed label
applied by a non-author. Guardrail for the 2026-05-29 DB CPU incident
class, which only manifests at prod data volume.
@suisuss suisuss force-pushed the ci/KEEP-680-metrics-db-review-gate branch from 92faf19 to cc3a136 Compare June 1, 2026 07:25
@suisuss suisuss changed the title ci: KEEP-680 add metrics DB review label gate ci: add metrics DB review label gate Jun 1, 2026
@suisuss suisuss merged commit c8538d1 into staging Jun 1, 2026
43 checks passed
@suisuss suisuss deleted the ci/KEEP-680-metrics-db-review-gate branch June 1, 2026 07:30
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

🧹 PR Environment Cleaned Up

The PR environment has been successfully deleted.

Deleted Resources:

  • Namespace: pr-1425
  • All Helm releases (Keeperhub, Scheduler, Event services)
  • PostgreSQL Database (including data)
  • LocalStack, Redis
  • All associated secrets and configs

All resources have been cleaned up and will no longer incur costs.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

ℹ️ No PR Environment to Clean Up

No PR environment was found for this PR. This is expected if:

  • The PR never had the deploy-pr-environment label
  • The environment was already cleaned up
  • The deployment never completed successfully

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant