fix: Add diagnostic context to scheduler error logs#69
Conversation
All four scheduler ErrorLogger.error calls were missing context objects, making it hard to diagnose failures from the admin UI. Now each includes errorMessage and relevant operational state (activeChecks, phase, table). https://claude.ai/code/session_01LMwiPdJg4AjbhSMTKEp1AX
Cover the String(error) coercion path when non-Error values are thrown in all 4 scheduler error handlers, and verify activeChecks reflects in-flight checks when the iteration fails. https://claude.ai/code/session_01LMwiPdJg4AjbhSMTKEp1AX
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughEnhanced scheduler error logging: catch blocks now attach an Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/services/scheduler.test.ts`:
- Around line 258-260: The test resolves the in-flight check by calling
resolver!() but does not wait for async cleanup, allowing module-level
activeChecks (used by runCheckWithLimit) to remain incremented and leak into
subsequent tests; after calling resolver!() in the test, await a microtask tick
(e.g., await Promise.resolve() or await new Promise(r=>setImmediate(r))) so the
runCheckWithLimit(...).finally handler runs and activeChecks is decremented
before the test completes.
In `@server/services/scheduler.ts`:
- Around line 75-79: The scheduler currently persists raw exception text into
context.errorMessage via the ErrorLogger.error calls (see usages in the
scheduler iteration and the other instances), which can leak internal DB/network
details; add and use a small helper like sanitizeAndTruncateErrorMessage(err:
unknown): string that strips sensitive details (no stack, no full DB messages),
replaces newlines, and truncates to a safe length, then pass its output instead
of raw error.message to ErrorLogger.error and any context.errorMessage
assignments (update uses around ErrorLogger.error and the code paths referenced
at the other occurrences in the file).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 48a20023-35ae-45d5-8394-1301e53fc30a
📒 Files selected for processing (2)
server/services/scheduler.test.tsserver/services/scheduler.ts
| await ErrorLogger.error("scheduler", "Scheduler iteration failed", error instanceof Error ? error : null, { | ||
| errorMessage: error instanceof Error ? error.message : String(error), | ||
| activeChecks, | ||
| phase: "fetching active monitors", | ||
| }); |
There was a problem hiding this comment.
Avoid persisting raw exception messages in scheduler context.
Line 76/89/96/113 stores raw exception text in context.errorMessage. That can expose internal DB/network details in admin-visible logs. Prefer a sanitized/truncated message helper before persistence.
🔒 Proposed hardening diff
+function toSafeErrorMessage(error: unknown): string {
+ const raw = error instanceof Error ? error.message : String(error);
+ return raw
+ .replace(/(api[_-]?key|token|secret|password)\s*[:=]\s*\S+/gi, "$1=[REDACTED]")
+ .replace(/(postgres(?:ql)?:\/\/)[^\s]+/gi, "$1[REDACTED]")
+ .replace(/([A-Za-z]:\\|\/)[^\s]*\.(ts|js|sql)/g, "[REDACTED_PATH]")
+ .slice(0, 300);
+}
...
- errorMessage: error instanceof Error ? error.message : String(error),
+ errorMessage: toSafeErrorMessage(error),
...
- errorMessage: error instanceof Error ? error.message : String(error),
+ errorMessage: toSafeErrorMessage(error),
...
- errorMessage: error instanceof Error ? error.message : String(error),
+ errorMessage: toSafeErrorMessage(error),
...
- errorMessage: error instanceof Error ? error.message : String(error),
+ errorMessage: toSafeErrorMessage(error),As per coding guidelines, "Check that error responses do not leak internal details (stack traces, DB errors, file paths)."
Also applies to: 88-90, 95-97, 112-116
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/services/scheduler.ts` around lines 75 - 79, The scheduler currently
persists raw exception text into context.errorMessage via the ErrorLogger.error
calls (see usages in the scheduler iteration and the other instances), which can
leak internal DB/network details; add and use a small helper like
sanitizeAndTruncateErrorMessage(err: unknown): string that strips sensitive
details (no stack, no full DB messages), replaces newlines, and truncates to a
safe length, then pass its output instead of raw error.message to
ErrorLogger.error and any context.errorMessage assignments (update uses around
ErrorLogger.error and the code paths referenced at the other occurrences in the
file).
The in-flight check test calls resolver!() but does not await a microtask tick, so runCheckWithLimit().finally never decrements module-level activeChecks before the test ends. This can leak state into subsequent tests. https://claude.ai/code/session_01LMwiPdJg4AjbhSMTKEp1AX
Summary
The four
ErrorLogger.errorcalls in the scheduler (Scheduler iteration failed,Queued notification processing failed,Digest processing failed,monitor_metrics cleanup failed) were logged without acontextobject. This meant the admin error log UI showed only the error message and stack trace, with no structured diagnostic data to help triage failures. This PR adds context objects to all four calls, surfacing the root cause error message and relevant operational state directly in the admin UI's expandable context panel.Changes
Scheduler error context (
server/services/scheduler.ts):Scheduler iteration failed: now includeserrorMessage,activeChecks(in-flight concurrency count), andphaseQueued notification processing failed: now includeserrorMessageDigest processing failed: now includeserrorMessagemonitor_metrics cleanup failed: now includeserrorMessage,retentionDays, andtableTest coverage (
server/services/scheduler.test.ts):activeChecks > 0when prior checks are still in-flight during iteration failureHow to test
npm run test— all 29 scheduler tests should pass (698 total)/admin/errorserrorMessage,activeChecks, andphasefields are presenthttps://claude.ai/code/session_01LMwiPdJg4AjbhSMTKEp1AX
Summary by CodeRabbit
Tests
Chores