fix: Add diagnostic context to scheduler error logs #69

Merged
bd73-com merged 3 commits into main from claude/improve-error-handling-coXwU
Mar 4, 2026
@bd73-com bd73-com commented Mar 4, 2026

Summary

The four ErrorLogger.error calls in the scheduler (Scheduler iteration failed, Queued notification processing failed, Digest processing failed, monitor_metrics cleanup failed) were logged without a context object. This meant the admin error log UI showed only the error message and stack trace, with no structured diagnostic data to help triage failures. This PR adds context objects to all four calls, surfacing the root cause error message and relevant operational state directly in the admin UI's expandable context panel.

Changes

Scheduler error context (server/services/scheduler.ts):

  • Scheduler iteration failed: now includes errorMessage, activeChecks (in-flight concurrency count), and phase
  • Queued notification processing failed: now includes errorMessage
  • Digest processing failed: now includes errorMessage
  • monitor_metrics cleanup failed: now includes errorMessage, retentionDays, and table
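
The context objects listed above could be assembled along these lines. This is a minimal sketch, not the code in server/services/scheduler.ts: the `ErrorContext` type and the helpers `toErrorMessage`, `buildIterationContext`, and `buildCleanupContext` are hypothetical names, and the sample values (`2`, `90`, `"monitor_metrics"`) are assumptions chosen for illustration. Only the field names come from this PR.

```typescript
// Hypothetical shape for the context object passed to ErrorLogger.error.
type ErrorContext = Record<string, string | number>;

// Same coercion the PR describes: Error instances contribute their message,
// any other thrown value is stringified.
function toErrorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Context for "Scheduler iteration failed".
function buildIterationContext(error: unknown, activeChecks: number): ErrorContext {
  return {
    errorMessage: toErrorMessage(error),
    activeChecks, // in-flight concurrency count at the time of failure
    phase: "fetching active monitors",
  };
}

// Context for "monitor_metrics cleanup failed".
function buildCleanupContext(error: unknown, retentionDays: number, table: string): ErrorContext {
  return { errorMessage: toErrorMessage(error), retentionDays, table };
}

// Sample values below are assumptions, not taken from the real scheduler.
const iterationContext = buildIterationContext(new Error("connection refused"), 2);
const cleanupContext = buildCleanupContext("disk full", 90, "monitor_metrics");
```

Centralizing the coercion in one helper keeps the four call sites consistent; the merged code may well inline the ternary instead.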

Test coverage (server/services/scheduler.test.ts):

  • Updated 5 existing test assertions to verify the new context objects
  • Added 5 new tests:
    • Non-Error thrown in scheduler iteration (String coercion path)
    • activeChecks > 0 when prior checks are still in-flight during iteration failure
    • Non-Error thrown in notification processing
    • Non-Error thrown in digest processing
    • Non-Error thrown in metrics cleanup
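
The non-Error tests above exercise the `String(error)` branch. A framework-free sketch of that idea follows. `fakeLogger`, `LoggedCall`, and `processDigests` are hypothetical stand-ins rather than the real ErrorLogger or scheduler code, and the actual tests in scheduler.test.ts presumably use the project's test runner and mocks instead.

```typescript
// Hypothetical record of one logger call, used only for this sketch.
interface LoggedCall {
  source: string;
  message: string;
  error: Error | null;
  context: Record<string, unknown>;
}

const calls: LoggedCall[] = [];

// Stand-in for ErrorLogger that records calls instead of persisting them.
const fakeLogger = {
  error(source: string, message: string, error: Error | null, context: Record<string, unknown>): void {
    calls.push({ source, message, error, context });
  },
};

// Simplified handler: runs a task and logs with context if it throws.
function processDigests(task: () => void): void {
  try {
    task();
  } catch (error) {
    fakeLogger.error("scheduler", "Digest processing failed", error instanceof Error ? error : null, {
      errorMessage: error instanceof Error ? error.message : String(error),
    });
  }
}

// Throw a plain string to hit the non-Error path.
processDigests(() => {
  throw "digest queue unavailable";
});
```

After this runs, `calls[0].error` is `null` and `calls[0].context.errorMessage` is the thrown string, which is the behavior the new tests assert.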

How to test

  1. Run npm run test — all 29 scheduler tests should pass (698 total)
  2. Trigger a scheduler error (e.g., database downtime) and check the admin error log UI at /admin/errors
  3. Expand the context panel on a "Scheduler iteration failed" entry — verify errorMessage, activeChecks, and phase fields are present

https://claude.ai/code/session_01LMwiPdJg4AjbhSMTKEp1AX

Summary by CodeRabbit

  • Tests

    • Expanded coverage for non-Error rejections and combined-failure scenarios, ensuring structured error payloads include errorMessage and contextual fields where applicable.
  • Chores

    • Improved scheduler logging to consistently capture errorMessage and operation-specific context (phases, counts, retention metadata) for better diagnostics.

claude added 2 commits March 4, 2026 08:43
All four scheduler ErrorLogger.error calls were missing context objects,
making it hard to diagnose failures from the admin UI. Now each includes
errorMessage and relevant operational state (activeChecks, phase, table).

https://claude.ai/code/session_01LMwiPdJg4AjbhSMTKEp1AX
Cover the String(error) coercion path when non-Error values are thrown
in all 4 scheduler error handlers, and verify activeChecks reflects
in-flight checks when the iteration fails.

https://claude.ai/code/session_01LMwiPdJg4AjbhSMTKEp1AX
@github-actions github-actions bot added the fix label Mar 4, 2026
coderabbitai bot commented Mar 4, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 688de773-aac2-44cb-8d85-351aa30c0648

📥 Commits

Reviewing files that changed from the base of the PR and between 85d6fc6 and a59b356.

📒 Files selected for processing (1)
  • server/services/scheduler.test.ts

📝 Walkthrough

Walkthrough

Enhanced scheduler error logging: catch blocks now attach an errorMessage (stringified non-Error rejections) and contextual fields (e.g., phase, activeChecks, retentionDays, table). Tests expanded to assert these structured error payloads across scheduler tasks and failure scenarios.
Security note: augmented logs may surface sensitive data; review log retention and redaction.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Scheduler Error Logging: server/services/scheduler.ts | Catch blocks updated to include errorMessage for non-Error rejections and contextual metadata: phase, activeChecks (monitor fetching), retentionDays, and table (metrics cleanup). |
| Scheduler Error Handling Tests: server/services/scheduler.test.ts | Expanded tests to cover string/non-Error rejections, verify errorMessage presence, assert activeChecks behavior during concurrent/in-flight checks, extend metrics cleanup and cron-related tests, and add combined-failure scenario assertions. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and directly describes the main change: adding diagnostic context to scheduler error logs. It accurately reflects the core objective of enriching error payloads with structured metadata. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/services/scheduler.test.ts`:
- Around line 258-260: The test resolves the in-flight check by calling
resolver!() but does not wait for async cleanup, allowing module-level
activeChecks (used by runCheckWithLimit) to remain incremented and leak into
subsequent tests; after calling resolver!() in the test, yield to the event loop
(e.g., await Promise.resolve() for a microtask tick, or await new
Promise(r=>setImmediate(r)) for a full macrotask turn) so the
runCheckWithLimit(...).finally handler runs and activeChecks is decremented
before the test completes.

In `@server/services/scheduler.ts`:
- Around line 75-79: The scheduler currently persists raw exception text into
context.errorMessage via the ErrorLogger.error calls (see usages in the
scheduler iteration and the other instances), which can leak internal DB/network
details; add and use a small helper like sanitizeAndTruncateErrorMessage(err:
unknown): string that strips sensitive details (no stack, no full DB messages),
replaces newlines, and truncates to a safe length, then pass its output instead
of raw error.message to ErrorLogger.error and any context.errorMessage
assignments (update uses around ErrorLogger.error and the code paths referenced
at the other occurrences in the file).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 48a20023-35ae-45d5-8394-1301e53fc30a

📥 Commits

Reviewing files that changed from the base of the PR and between 72c010b and 85d6fc6.

📒 Files selected for processing (2)
  • server/services/scheduler.test.ts
  • server/services/scheduler.ts

Comment on lines +75 to +79
await ErrorLogger.error("scheduler", "Scheduler iteration failed", error instanceof Error ? error : null, {
  errorMessage: error instanceof Error ? error.message : String(error),
  activeChecks,
  phase: "fetching active monitors",
});

⚠️ Potential issue | 🟠 Major

Avoid persisting raw exception messages in scheduler context.

Lines 76, 89, 96, and 113 store raw exception text in context.errorMessage. That can expose internal DB/network details in admin-visible logs. Prefer a sanitized/truncated message helper before persistence.

🔒 Proposed hardening diff
+function toSafeErrorMessage(error: unknown): string {
+  const raw = error instanceof Error ? error.message : String(error);
+  return raw
+    .replace(/(api[_-]?key|token|secret|password)\s*[:=]\s*\S+/gi, "$1=[REDACTED]")
+    .replace(/(postgres(?:ql)?:\/\/)[^\s]+/gi, "$1[REDACTED]")
+    .replace(/([A-Za-z]:\\|\/)[^\s]*\.(ts|js|sql)/g, "[REDACTED_PATH]")
+    .slice(0, 300);
+}
...
-        errorMessage: error instanceof Error ? error.message : String(error),
+        errorMessage: toSafeErrorMessage(error),
...
-        errorMessage: error instanceof Error ? error.message : String(error),
+        errorMessage: toSafeErrorMessage(error),
...
-        errorMessage: error instanceof Error ? error.message : String(error),
+        errorMessage: toSafeErrorMessage(error),
...
-        errorMessage: error instanceof Error ? error.message : String(error),
+        errorMessage: toSafeErrorMessage(error),

As per coding guidelines, "Check that error responses do not leak internal details (stack traces, DB errors, file paths)."

Also applies to: 88-90, 95-97, 112-116

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.ts` around lines 75 - 79, The scheduler currently
persists raw exception text into context.errorMessage via the ErrorLogger.error
calls (see usages in the scheduler iteration and the other instances), which can
leak internal DB/network details; add and use a small helper like
sanitizeAndTruncateErrorMessage(err: unknown): string that strips sensitive
details (no stack, no full DB messages), replaces newlines, and truncates to a
safe length, then pass its output instead of raw error.message to
ErrorLogger.error and any context.errorMessage assignments (update uses around
ErrorLogger.error and the code paths referenced at the other occurrences in the
file).
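
The prompt above asks for newline replacement in addition to redaction and truncation, which the proposed diff does not cover. A minimal sketch of such a helper is shown here. The function name comes from the review prompt; the single redaction pattern and the 300-character limit are borrowed from the proposed diff; the sample input is invented. The pattern list is illustrative and far from exhaustive, and nothing here is confirmed to be in the merged code.

```typescript
// Limit borrowed from the .slice(0, 300) in the proposed diff.
const MAX_ERROR_MESSAGE_LENGTH = 300;

function sanitizeAndTruncateErrorMessage(err: unknown): string {
  const raw = err instanceof Error ? err.message : String(err);
  return raw
    // Redact everything after the scheme in Postgres connection strings.
    .replace(/(postgres(?:ql)?:\/\/)\S+/gi, "$1[REDACTED]")
    // Collapse line breaks (and surrounding whitespace) into a single space.
    .replace(/\s*[\r\n]+\s*/g, " ")
    // Cap the length so one failure cannot bloat the stored context.
    .slice(0, MAX_ERROR_MESSAGE_LENGTH);
}

// Invented sample input, for illustration only.
const sample = sanitizeAndTruncateErrorMessage(
  new Error("connect failed: postgres://admin:hunter2@db.internal:5432/app\n  at retry 3"),
);
// sample === "connect failed: postgres://[REDACTED] at retry 3"
```

Redaction runs before truncation so that a secret straddling the length limit is not cut in half and left partially visible.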

The in-flight check test calls resolver!() but does not await a
microtask tick, so runCheckWithLimit().finally never decrements
module-level activeChecks before the test ends. This can leak
state into subsequent tests.

https://claude.ai/code/session_01LMwiPdJg4AjbhSMTKEp1AX
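
The leak described in this commit message can be reproduced in isolation. The sketch below assumes a simplified `runCheckWithLimit` that only increments and decrements a counter; the real function in scheduler.ts presumably also enforces a concurrency limit. `createDeferred` and `demo` are hypothetical helpers. It waits with `setTimeout(..., 0)`, which drains all pending microtasks, rather than relying on a single microtask tick.

```typescript
// Module-level counter, as described in the review comment.
let activeChecks = 0;

// Simplified stand-in: the decrement lives in a .finally handler,
// so it only runs once the event loop processes the settled promise.
function runCheckWithLimit(check: () => Promise<void>): Promise<void> {
  activeChecks++;
  return check().finally(() => {
    activeChecks--;
  });
}

// Hypothetical helper that exposes a promise's resolve function.
function createDeferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve!: () => void;
  const promise = new Promise<void>((r) => {
    resolve = r;
  });
  return { promise, resolve };
}

async function demo(): Promise<{ during: number; after: number }> {
  const deferred = createDeferred();
  void runCheckWithLimit(() => deferred.promise);
  const during = activeChecks; // 1 while the check is in flight

  deferred.resolve();
  // Yield to the event loop so the .finally handler can decrement the counter.
  await new Promise<void>((resolve) => setTimeout(resolve, 0));
  return { during, after: activeChecks };
}
```

Without the final `await`, `activeChecks` would still read 1 when `demo` returns, which is the state that leaked into later tests.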
@bd73-com bd73-com merged commit 32bf75b into main Mar 4, 2026
1 of 2 checks passed
@bd73-com bd73-com deleted the claude/improve-error-handling-coXwU branch March 4, 2026 09:00