fix(selfhost): stop logging routine maintenance-admission backpressure as a warning #3310
…e as a warning The maintenance-admission-deferred event fires whenever the queue defers a background job under normal pressure -- expected steady-state behavior, not an operational problem, so it should not sit at warn level next to real failures. Downgrade both the SQLite and Postgres queue drivers to log it at info, matching how other routine/expected events are leveled elsewhere. Also adds a "Maintenance Admission Deferrals (total)" panel next to the existing by-reason breakdown in the Runtime Pressure & Maintenance section, using the two counters already recorded alongside this log line.
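As a rough illustration of the downgrade described in the commit message, the sketch below builds the deferred-admission entry and emits it through `console.log` at `level: "info"`. The function names, the one-JSON-object-per-line serialization, and the sample `jobType`/`reason` values are assumptions made for this example; only the event name and field names come from the PR text, and the real drivers may be structured differently.

```typescript
// Field names (event, jobType, reason, retry_after_ms, level) come from the
// PR description; the type and function names here are hypothetical.
type DeferredAdmissionLog = {
  level: "info";
  event: "selfhost_queue_maintenance_admission_deferred";
  jobType: string;
  reason: string;
  retry_after_ms: number;
};

// Builds the structured entry. There is deliberately no `error` field.
function buildDeferredAdmissionLog(
  jobType: string,
  reason: string,
  retryAfterMs: number,
): DeferredAdmissionLog {
  return {
    level: "info",
    event: "selfhost_queue_maintenance_admission_deferred",
    jobType,
    reason,
    retry_after_ms: retryAfterMs,
  };
}

// Emits via console.log (info) rather than console.warn.
// Assumption: entries are serialized as one JSON object per line.
function logDeferredAdmission(
  jobType: string,
  reason: string,
  retryAfterMs: number,
): void {
  console.log(
    JSON.stringify(buildDeferredAdmissionLog(jobType, reason, retryAfterMs)),
  );
}

// Sample values are illustrative only, not real job types or reasons.
logDeferredAdmission("backfill", "host_load", 5000);
```

Keeping the entry builder separate from the emit call is a choice made for this sketch so the shape can be checked without touching the console.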
Superagent didn't find any vulnerabilities or security issues in this PR.
⏸️ Gittensory review result: manual review recommended
Review updated: 2026-07-05 02:49:19 UTC
Suggested action: manual review
Review summary: Nits, 3 non-blocking
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main    #3310   +/-   ##
=======================================
  Coverage   94.14%   94.14%
=======================================
  Files         276      276
  Lines       30246    30246
  Branches    11021    11021
=======================================
  Hits        28474    28474
  Misses       1127     1127
  Partials      645      645
Summary
- `selfhost_queue_maintenance_admission_deferred` was logged at `console.warn` / `level: "warn"` in both the SQLite (`src/selfhost/sqlite-queue.ts`) and Postgres (`src/selfhost/pg-queue.ts`) queue drivers, even though it fires on ordinary, expected backpressure -- a maintenance-lane job getting pushed back because the live/maintenance/backlog lanes or host load are busy. That's routine steady-state behavior, not something worth flagging at warn severity next to real failures, so operators grepping warn-level logs see noise from a well-functioning system. This PR downgrades both call sites to `console.log` / `level: "info"`, matching how other routine/expected events in this codebase are leveled (e.g. `check_run_cross_app_repost` in `src/github/app.ts`, and the un-leveled `console.log` backfill/recovery events in `sqlite-queue.ts`). The metric-recording lines directly above each log call (`gittensory_jobs_maintenance_admission_deferred_total` and `gittensory_jobs_maintenance_admission_deferred_by_reason_total`) are untouched -- no new metric was needed; both already exist and are already asserted in the unit suites.
- The existing log panel (`grafana/dashboards/gittensory.json`) filters on a populated top-level JSON field literally named `error` (`| json eventf="event", errf="error" | errf != ""`). This log line never sets an `error` field -- only `event`/`jobType`/`reason`/`retry_after_ms` -- so it never actually matched that panel, even while it sat at warn. That panel was not polluted and is not being changed for that reason; the log-level change is about the log's own severity being wrong for raw log inspection (`journalctl` / `docker logs` / any warn-level grep), independent of that panel.
- Adds a "Maintenance Admission Deferrals (total)" panel to the Runtime Pressure & Maintenance section so the `gittensory_jobs_maintenance_admission_deferred_total` rate is visible alongside the by-reason breakdown that already existed. No new metric was added -- both panels chart counters the queue drivers already record.

Scope
- … `type(scope): short summary` Conventional Commit format, for example `fix(api): restore profile access checks`.
- … `CONTRIBUTING.md` and does not reintroduce GitHub Pages, VitePress, `site/`, or `CNAME`.
- … `preferred` linked-issue policy.)

Validation
- `git diff --check`
- `npm run actionlint`
- `npm run typecheck`
- `npm run test:coverage` locally; `codecov/patch` requires ≥99% coverage of the lines AND branches you changed (aim for 100% on your diff so CI variance does not fail near the threshold). Global coverage is a non-blocking trend with a loose 90% backstop, not the gate.
- `npm run test:workers`
- `npm run build:mcp`
- `npm run test:mcp-pack`
- `npm run ui:openapi:check`
- `npm run ui:lint`
- `npm run ui:typecheck`
- `npm run ui:build`
- `npm audit --audit-level=moderate`

Ran the full local gate via `npm run test:ci` (aggregates all of the above plus `db:migrations:check`, `test:coverage` unsharded, `ui:test`, `ui:openapi`, and more) -- fully green, plus `npm audit --audit-level=moderate` clean. Also confirmed via `npm run ui:openapi:check` and `npm run cf-typegen:check` that neither generated artifact needed regeneration (no API/schema or wrangler binding changes here), and `npm run db:migrations:check` confirms no DB migration was needed.

Added a targeted regression test in each of `test/unit/selfhost-sqlite-queue.test.ts` and `test/unit/selfhost-pg-queue.test.ts` that spies on `console.log`/`console.warn` and asserts the deferred-admission event is emitted at `level: "info"` via `console.log` and never via `console.warn`. Added a test in `test/unit/selfhost-grafana-dashboard.test.ts` asserting both the by-reason panel and the new total-deferrals panel are present with their expected PromQL expressions.

Safety
- … `apps/gittensory-ui` change; this is a Grafana dashboard JSON + backend log level.)
- … `UI Evidence` section below with screenshots. (N/A -- this changes a Grafana dashboard definition file and a backend log statement, not the product UI; no screenshot evidence applies.)

Notes
- The two queue drivers (`sqlite-queue.ts` and `pg-queue.ts`) are parallel queue backends implementing the same admission logic, so the log-level change was made identically in both, mirroring the existing pattern where both files carry duplicated logging/metric blocks for the same events.
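Because both backends carry the same logging block, the same level assertion can be applied to each. The sketch below shows one way such a check might look without any test framework: it temporarily swaps `console.log` and `console.warn`, runs an emit function, and verifies the event appears once at info and never at warn. The helper names, the two stand-in emitters, the JSON-per-line format, and the sample field values are all assumptions for illustration; the actual regression tests live in the unit suites named above and may use the repository's own spying utilities.

```typescript
// Event name taken from the PR description.
const EVENT = "selfhost_queue_maintenance_admission_deferred";

type Captured = { log: unknown[][]; warn: unknown[][] };

// Temporarily replaces console.log/console.warn, runs `fn`, then restores them.
function captureConsole(fn: () => void): Captured {
  const captured: Captured = { log: [], warn: [] };
  const originalLog = console.log;
  const originalWarn = console.warn;
  console.log = (...args: unknown[]) => { captured.log.push(args); };
  console.warn = (...args: unknown[]) => { captured.warn.push(args); };
  try {
    fn();
  } finally {
    console.log = originalLog;
    console.warn = originalWarn;
  }
  return captured;
}

// True when the first console argument is a JSON string carrying the event
// (assumed log format), optionally also matching the given level.
function carriesEvent(args: unknown[], level?: string): boolean {
  if (typeof args[0] !== "string") return false;
  try {
    const entry = JSON.parse(args[0]) as { event?: unknown; level?: unknown };
    return entry.event === EVENT && (level === undefined || entry.level === level);
  } catch {
    return false; // not a structured entry, so not the event
  }
}

// Passes only if the event is logged exactly once at info and never warned.
function emitsAtInfoOnly(emit: () => void): boolean {
  const { log, warn } = captureConsole(emit);
  const infoHits = log.filter((args) => carriesEvent(args, "info")).length;
  const warnHits = warn.filter((args) => carriesEvent(args)).length;
  return infoHits === 1 && warnHits === 0;
}

// Hypothetical stand-ins for the SQLite and Postgres emit paths.
const sampleEntry = {
  level: "info",
  event: EVENT,
  jobType: "backfill", // illustrative value
  reason: "host_load", // illustrative value
  retry_after_ms: 5000,
};
const emitFromSqlite = (): void => console.log(JSON.stringify(sampleEntry));
const emitFromPg = (): void => console.log(JSON.stringify(sampleEntry));

for (const [name, emit] of Object.entries({ sqlite: emitFromSqlite, pg: emitFromPg })) {
  if (!emitsAtInfoOnly(emit)) {
    throw new Error(`${name}: deferred-admission event was not info-only`);
  }
}
```

Restoring the console in a `finally` block keeps a failing emitter from leaving the replacements in place for later checks.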