Skip to content

fix: reduce scraper and scheduler error log noise#267

Merged
bd73-com merged 6 commits intomainfrom
claude/fix-scraper-scheduler-errors-smjDW
Mar 26, 2026
Merged

fix: reduce scraper and scheduler error log noise#267
bd73-com merged 6 commits intomainfrom
claude/fix-scraper-scheduler-errors-smjDW

Conversation

@bd73-com
Copy link
Owner

@bd73-com bd73-com commented Mar 25, 2026

Summary

Reduces error log noise in the scraper and scheduler by eliminating duplicate logging, downgrading transient/expected errors from ERROR to WARNING, and adding retry wrappers to DB operations that lacked them. This dramatically improves the signal-to-noise ratio in the admin error dashboard — genuine problems (schema errors, SSRF attempts, persistent failures) stay at ERROR level while recoverable conditions (connection drops, timeouts, pool exhaustion) become warnings.

Changes

Error logging consolidation

  • Remove duplicate ErrorLogger.error from extractWithBrowserless — caller (checkMonitor) already logs with fuller context
  • Remove premature ErrorLogger.error from fetchWithCurl — this is a fallback; the pipeline continues to Browserless

Transient error classification

  • Extract isTransientDbError() to shared server/utils/dbErrors.ts — used by both scheduler and scraper for consistent classification
  • Extract logSchedulerError() helper in scheduler — encapsulates the transient-vs-permanent branching pattern (was repeated 7x)
  • Separate SSRF-blocked errors from network errors in classifyOuterError() — SSRF blocks get dedicated "ssrf_blocked" logContext and stay at ERROR level
  • Exclude permanent network errors (ENOTFOUND, certificate/SSL/TLS) from transient downgrade — these indicate misconfigurations, not temporary conditions
  • "database error" (schema/constraint) stays at ERROR level; only "database connection error" is transient

Retry wrappers

  • Wrap processQueuedNotifications() and processDigestCron() in withDbRetry
  • Wrap all 3 daily cleanup operations (monitor_metrics, delivery_log, notification_queue) in withDbRetry

Hardening

  • Add try/catch to logSchedulerError so logging failures during DB outage don't mask the original error
  • DB save failure in scraper now uses isTransientDbError — transient → warning, non-transient → error

Tests

  • 8 new tests covering transient warning paths for notification, digest, webhook, and cleanup
  • Tests for SSRF classification, permanent network error exclusion, classifyOuterError categories
  • Updated existing tests to match new severity levels and message formats

Test plan

  • npm run check passes (TypeScript)
  • npm run test passes (all 1765 tests)
  • npm run build passes (production build)
  • Deploy and monitor admin error dashboard — expect significantly fewer ERROR entries
  • Verify SSRF-blocked URLs still appear at ERROR level
  • Verify permanent failures (bad domain, expired cert) still appear at ERROR level

https://claude.ai/code/session_01CAV9VUnHbJ66A4oKNbr9Kk

Summary by CodeRabbit

  • Bug Fixes
    • Improved resilience of scheduled cleanup and notification tasks by automatically retrying on temporary database connection failures.
    • Enhanced error handling to distinguish transient database issues from permanent failures, reducing false error alerts.
    • Refined web scraping error classification and recovery logic for better resilience against temporary network and database issues.

- Eliminate duplicate Browserless error logging (extractWithBrowserless
  no longer logs since the caller checkMonitor already logs with fuller
  context)
- Remove premature error logging from fetchWithCurl (it's a fallback —
  the pipeline continues to Browserless)
- Downgrade transient network/DB errors in scraper outer catch from
  ERROR to WARNING (they're retried automatically)
- Downgrade "rendered page extraction failed" to WARNING (expected for
  sites that block headless browsers, handled by circuit breaker)
- Downgrade "check succeeded but failed to save result" to WARNING
  (marked for accelerated retry, self-heals on next cycle)
- Wrap scheduler notification, digest, webhook, and cleanup DB
  operations in withDbRetry for transient connection errors
- Downgrade all scheduler transient DB failures to WARNING level
- Downgrade daily cleanup failures to WARNING (non-critical background
  tasks that retry tomorrow)

https://claude.ai/code/session_01CAV9VUnHbJ66A4oKNbr9Kk
@github-actions github-actions bot added the fix label Mar 25, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Walkthrough

Scheduler and scraper error handling were changed to detect transient DB/network failures and log them as warnings with a “(transient, will retry)” suffix; DB-affecting scheduler operations were wrapped with withDbRetry(...); several extraction/save paths now rethrow instead of logging so callers handle logging.

Changes

Cohort / File(s) Summary
Scheduler: core & tests
server/services/scheduler.ts, server/services/scheduler.test.ts
Replaced file-local transient DB classifier with shared isTransientDbError(...); added logSchedulerError(...) to route transient cases to ErrorLogger.warning (with “(transient, will retry)” text) and non-transient to ErrorLogger.error; wrapped DB cleanup ops in withDbRetry(...); tests updated to expect warning-level logs and revised metadata/messages.
Scraper: runtime & tests
server/services/scraper.ts, server/services/scraper.test.ts
Added explicit SSRF classification (logContext: "ssrf_blocked"); removed direct logging from extractWithBrowserless/fetchWithCurl (they rethrow now); checkMonitor centralizes logging and classifies retryable DB/network failures via isTransientDbError(...), logging transient cases as warnings and others as errors; tests adjusted to new classifications and rethrow behavior.
DB error utilities
server/utils/dbErrors.ts
New exported isTransientDbError(err: unknown): boolean detecting Postgres-style connection codes and common connection/pool/timeout substrings; used by scheduler/scraper to decide warning vs error.
Tests: expanded transient cases
server/services/*test.ts (scheduler & scraper)
Expanded/modified tests to assert ErrorLogger.warning for transient DB/network failures across daily cleanup, queued notifications, digest cron, webhook retry processing, and re-entrancy/retry scenarios; retained ErrorLogger.error assertions for non-transient failures.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as Scheduler
    participant Retry as withDbRetry
    participant DB as Database
    participant Logger as ErrorLogger
    participant Scraper as Scraper

    Scheduler->>Retry: invoke DB cleanup via withDbRetry(...)
    Retry->>DB: perform delete/cleanup operation
    alt DB success
        DB-->>Retry: success
        Retry-->>Scheduler: continue
    else Transient DB error
        DB-->>Retry: transient error
        Retry->>Logger: ErrorLogger.warning("... (transient, will retry)", { errorMessage, ... })
        Retry-->>Scheduler: backoff/retry flow
    else Non-transient DB error
        DB-->>Retry: fatal error
        Retry->>Logger: ErrorLogger.error("... failed", error, context)
        Retry-->>Scheduler: handle/propagate failure
    end

    Scheduler->>Scraper: call extract/fetch (rethrows on failure)
    Scraper->>Scheduler: throws classified error (e.g., ssrf_blocked / network / db)
    alt ssrf_blocked
        Scheduler->>Logger: ErrorLogger.error("Browserless SSRF blocked", error, { logContext:"ssrf_blocked" })
    else transient network/db
        Scheduler->>Logger: ErrorLogger.warning("... (transient, will retry)", { errorMessage,... })
    else non-transient
        Scheduler->>Logger: ErrorLogger.error("... failed", error, context)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Security note: lowering log severity for retryable failures can reduce visibility of recurring systemic issues; verify monitoring/alerting still surfaces persistent or repeated warnings.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: reduce scraper and scheduler error log noise' accurately reflects the main objective of this changeset: reducing ERROR-level logging by downgrading recoverable failures to WARNING and removing duplicate logs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/fix-scraper-scheduler-errors-smjDW

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
server/services/scheduler.ts (1)

327-349: ⚠️ Potential issue | 🟠 Major

Webhook processing still skips retry for several DB calls.

This block now claims transient DB errors are retried before logging, but getMonitor(), getMonitorChannels(), getMonitorChangeById(), and the early updateDeliveryLog(...failed) branches still bypass withDbRetry(). A connection blip in any of those calls aborts the batch and logs after zero retries.

Suggested fix
-          const monitor = await storage.getMonitor(entry.monitorId);
+          const monitor = await withDbRetry(() => storage.getMonitor(entry.monitorId));
           if (!monitor) {
-            await storage.updateDeliveryLog(entry.id, { status: "failed" });
+            await withDbRetry(() => storage.updateDeliveryLog(entry.id, { status: "failed" }));
             continue;
           }

-          const channels = await storage.getMonitorChannels(monitor.id);
+          const channels = await withDbRetry(() => storage.getMonitorChannels(monitor.id));
           const webhookChannel = channels.find((c) => c.channel === "webhook" && c.enabled);
           if (!webhookChannel) {
-            await storage.updateDeliveryLog(entry.id, { status: "failed" });
+            await withDbRetry(() => storage.updateDeliveryLog(entry.id, { status: "failed" }));
             continue;
           }

-          const change = await storage.getMonitorChangeById(entry.changeId);
+          const change = await withDbRetry(() => storage.getMonitorChangeById(entry.changeId));
           if (!change || change.monitorId !== monitor.id) {
-            await storage.updateDeliveryLog(entry.id, { status: "failed" });
+            await withDbRetry(() => storage.updateDeliveryLog(entry.id, { status: "failed" }));
             continue;
           }

Also applies to: 386-394

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.ts` around lines 327 - 349, The code is calling
storage.getMonitor, storage.getMonitorChannels, storage.getMonitorChangeById and
early storage.updateDeliveryLog directly which bypasses the retry logic; wrap
these DB calls with the existing withDbRetry utility (e.g., call withDbRetry(()
=> storage.getMonitor(entry.monitorId)), withDbRetry(() =>
storage.getMonitorChannels(monitor.id)), withDbRetry(() =>
storage.getMonitorChangeById(entry.changeId)), and use withDbRetry for the
updateDeliveryLog(...) calls used in the early-fail branches) so transient DB
errors are retried instead of immediately marking delivery logs failed; apply
same change for the similar block around lines 386-394.
server/services/scraper.test.ts (1)

4217-4229: ⚠️ Potential issue | 🟠 Major

Don't log raw extracted values on save failures.

This expectation turns arbitrary scraped page content into part of the warning-log contract. Because selectors are user-controlled, extractedValue and previousValue can contain sensitive user data; log redacted metadata instead of raw values.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scraper.test.ts` around lines 4217 - 4229, The test is
asserting raw scraped values are logged; change the expectation to assert
redacted metadata instead of raw values: update the ErrorLogger.warning
assertion (for "scraper") to check for keys like monitorId, changed, dbError,
retryError and a redaction marker (e.g., extractedValue:
expect.stringContaining("[REDACTED]") or
expect.objectContaining({extractedValueRedacted: true}) ) rather than "$49.99"
and "$39.99"; keep the message check (expect.stringContaining("check succeeded
but failed to save result")) and other fields (monitorId, changed, dbError,
retryError) intact so the contract validates redaction rather than raw content.
server/services/scheduler.test.ts (2)

932-949: ⚠️ Potential issue | 🟡 Minor

Also assert no legacy Scheduler iteration failed ERROR is emitted.

Line 944 only checks for a warning. It does not fail if the old error-level log is still emitted alongside the warning.

Suggested assertion addition
   expect(ErrorLogger.warning).toHaveBeenCalledWith(
     "scheduler",
     expect.stringContaining("Scheduler iteration skipped"),
     expect.objectContaining({ activeChecks: 0 })
   );
+  expect(ErrorLogger.error).not.toHaveBeenCalledWith(
+    "scheduler",
+    "Scheduler iteration failed",
+    expect.anything(),
+    expect.anything()
+  );

As per coding guidelines, "/*.test.ts: Tests must cover edge cases and error paths."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.test.ts` around lines 932 - 949, The test "logs
warning when retry also fails on transient error" currently asserts a warning
but doesn't verify the legacy error-level log isn't emitted; update the test
that calls runCron("* * * * *") (the it block for this behavior) to also assert
ErrorLogger.error was not called with the legacy "Scheduler iteration failed"
message (or not called at all for "scheduler" context), e.g. add an assertion
referencing mockGetAllActiveMonitors and ErrorLogger (the existing symbols) to
ensure ErrorLogger.error was not invoked with expect.stringContaining("Scheduler
iteration failed") or that ErrorLogger.error.mock.calls contain no entry for the
"scheduler" context. Ensure you use the same test scope and mocks so the
negative assertion runs after the existing expect(ErrorLogger.warning) check.

667-697: ⚠️ Potential issue | 🟡 Minor

Assert the ERROR path is absent for cleanup failures.

These tests only verify ErrorLogger.warning. If the implementation logs both warning and error, this still passes and reintroduces the noise this PR is intended to remove.

Suggested test hardening
   expect(ErrorLogger.warning).toHaveBeenCalledWith(
     "scheduler",
     "monitor_metrics cleanup failed (will retry tomorrow)",
     expect.objectContaining({
       errorMessage: "DB timeout",
       retentionDays: 90,
       table: "monitor_metrics",
     })
   );
+  expect(ErrorLogger.error).not.toHaveBeenCalled();
   expect(ErrorLogger.warning).toHaveBeenCalledWith(
     "scheduler",
     "monitor_metrics cleanup failed (will retry tomorrow)",
     expect.objectContaining({
       errorMessage: "disk full",
       retentionDays: 90,
       table: "monitor_metrics",
     })
   );
+  expect(ErrorLogger.error).not.toHaveBeenCalled();

As per coding guidelines, "/*.test.ts: Tests must cover edge cases and error paths."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.test.ts` around lines 667 - 697, The tests
currently only assert ErrorLogger.warning is called on cleanup failures; update
both tests ("logs warning when cleanup query fails..." and "handles non-Error
thrown...") to also assert that ErrorLogger.error is NOT called — e.g. after
runCron(...) assert ErrorLogger.error was not invoked — so you ensure no ERROR
path is logged for non-critical cleanup failures; locate assertions around
startScheduler(), mockDbExecute.mockRejectedValueOnce(...) and runCron(...) and
add the negative assertion against ErrorLogger.error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/services/scheduler.ts`:
- Around line 230-244: The isTransientDbError function is missing
pool-exhaustion cases so PostgreSQL errors like code "53300" and "57P03" and
messages such as "too many clients" or "remaining connection slots are reserved
for non-replication superuser connections" are being treated as non-transient;
update isTransientDbError to also return true for error.code === "53300" and
error.code === "57P03" and to match message patterns /too many clients/i and
/remaining connection slots are reserved/i (or similar case-insensitive
substring checks) so the scheduler logs these as warnings (as in the existing
ErrorLogger.warning path) instead of errors.

In `@server/services/scraper.test.ts`:
- Around line 5833-5862: The tests currently only assert ErrorLogger.error
wasn't called; instead ensure ErrorLogger emits no calls across any of its
methods: in the three tests that use mockConnectOverCDP and call
extractWithBrowserless, replace the single error-only assertion with a check
that none of ErrorLogger's functions were invoked (e.g., iterate
Object.values(ErrorLogger) and assert each mock function's call count is zero).
Reference ErrorLogger and extractWithBrowserless (and the mock
mockConnectOverCDP) to locate the assertions to update so the suite fails if any
ErrorLogger method (error, warn, info, debug, etc.) is called.

In `@server/services/scraper.ts`:
- Around line 1196-1199: Update the downgrade-to-warning call so logging
failures can't propagate: locate the Browserless downgrade block that uses
classifyBrowserlessError(lastBrowserlessErr...) and ErrorLogger.warning and
append a defensive .catch(() => {}) to the ErrorLogger.warning promise (e.g.,
replace the awaited call with ErrorLogger.warning(...).catch(() => {}) or await
ErrorLogger.warning(...).catch(() => {})); keep the same message/metadata
(monitor, monitorId, url, selector) and do not change classification logic.

---

Outside diff comments:
In `@server/services/scheduler.test.ts`:
- Around line 932-949: The test "logs warning when retry also fails on transient
error" currently asserts a warning but doesn't verify the legacy error-level log
isn't emitted; update the test that calls runCron("* * * * *") (the it block for
this behavior) to also assert ErrorLogger.error was not called with the legacy
"Scheduler iteration failed" message (or not called at all for "scheduler"
context), e.g. add an assertion referencing mockGetAllActiveMonitors and
ErrorLogger (the existing symbols) to ensure ErrorLogger.error was not invoked
with expect.stringContaining("Scheduler iteration failed") or that
ErrorLogger.error.mock.calls contain no entry for the "scheduler" context.
Ensure you use the same test scope and mocks so the negative assertion runs
after the existing expect(ErrorLogger.warning) check.
- Around line 667-697: The tests currently only assert ErrorLogger.warning is
called on cleanup failures; update both tests ("logs warning when cleanup query
fails..." and "handles non-Error thrown...") to also assert that
ErrorLogger.error is NOT called — e.g. after runCron(...) assert
ErrorLogger.error was not invoked — so you ensure no ERROR path is logged for
non-critical cleanup failures; locate assertions around startScheduler(),
mockDbExecute.mockRejectedValueOnce(...) and runCron(...) and add the negative
assertion against ErrorLogger.error.

In `@server/services/scheduler.ts`:
- Around line 327-349: The code is calling storage.getMonitor,
storage.getMonitorChannels, storage.getMonitorChangeById and early
storage.updateDeliveryLog directly which bypasses the retry logic; wrap these DB
calls with the existing withDbRetry utility (e.g., call withDbRetry(() =>
storage.getMonitor(entry.monitorId)), withDbRetry(() =>
storage.getMonitorChannels(monitor.id)), withDbRetry(() =>
storage.getMonitorChangeById(entry.changeId)), and use withDbRetry for the
updateDeliveryLog(...) calls used in the early-fail branches) so transient DB
errors are retried instead of immediately marking delivery logs failed; apply
same change for the similar block around lines 386-394.

In `@server/services/scraper.test.ts`:
- Around line 4217-4229: The test is asserting raw scraped values are logged;
change the expectation to assert redacted metadata instead of raw values: update
the ErrorLogger.warning assertion (for "scraper") to check for keys like
monitorId, changed, dbError, retryError and a redaction marker (e.g.,
extractedValue: expect.stringContaining("[REDACTED]") or
expect.objectContaining({extractedValueRedacted: true}) ) rather than "$49.99"
and "$39.99"; keep the message check (expect.stringContaining("check succeeded
but failed to save result")) and other fields (monitorId, changed, dbError,
retryError) intact so the contract validates redaction rather than raw content.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4c60e2ee-453d-4c0d-bfd5-d52335481db1

📥 Commits

Reviewing files that changed from the base of the PR and between 5c1b85d and 59559fc.

📒 Files selected for processing (4)
  • server/services/scheduler.test.ts
  • server/services/scheduler.ts
  • server/services/scraper.test.ts
  • server/services/scraper.ts

Comment on lines +5833 to +5862
it("does not log to ErrorLogger (caller handles logging)", async () => {
mockConnectOverCDP.mockRejectedValue(new Error("Navigation timeout of 30000ms exceeded"));

await expect(
extractWithBrowserless("https://example.com", ".price", 1, "My Monitor")
).rejects.toThrow();

expect(ErrorLogger.error).toHaveBeenCalledWith(
"scraper",
expect.stringContaining("took too long"),
expect.any(Error),
expect.objectContaining({ url: "https://example.com" }),
);
// extractWithBrowserless no longer logs — the caller (checkMonitor) logs
// with fuller context to avoid duplicate error entries.
expect(ErrorLogger.error).not.toHaveBeenCalled();
});

it("logs classified ECONNREFUSED message to ErrorLogger", async () => {
it("re-throws ECONNREFUSED without logging", async () => {
mockConnectOverCDP.mockRejectedValue(new Error("connect ECONNREFUSED 127.0.0.1:443"));

await expect(
extractWithBrowserless("https://example.com", ".price")
).rejects.toThrow();

expect(ErrorLogger.error).toHaveBeenCalledWith(
"scraper",
expect.stringContaining("refused the connection"),
expect.any(Error),
expect.objectContaining({ url: "https://example.com" }),
);
expect(ErrorLogger.error).not.toHaveBeenCalled();
});

it("includes monitor name in error label when provided", async () => {
it("re-throws errors without logging even when monitor name provided", async () => {
mockConnectOverCDP.mockRejectedValue(new Error("some error"));

await expect(
extractWithBrowserless("https://example.com", ".price", 1, "Price Tracker")
).rejects.toThrow();

expect(ErrorLogger.error).toHaveBeenCalledWith(
"scraper",
expect.stringContaining('"Price Tracker"'),
expect.any(Error),
expect.objectContaining({ monitorName: "Price Tracker", monitorId: 1 }),
);
expect(ErrorLogger.error).not.toHaveBeenCalled();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assert that Browserless emits no ErrorLogger calls at all.

These cases only guard ErrorLogger.error. A regression to warning() or info() would still reintroduce duplicate scraper logs while this suite stays green.

Suggested test hardening
     expect(ErrorLogger.error).not.toHaveBeenCalled();
+    expect(ErrorLogger.warning).not.toHaveBeenCalled();
+    expect(ErrorLogger.info).not.toHaveBeenCalled();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scraper.test.ts` around lines 5833 - 5862, The tests
currently only assert ErrorLogger.error wasn't called; instead ensure
ErrorLogger emits no calls across any of its methods: in the three tests that
use mockConnectOverCDP and call extractWithBrowserless, replace the single
error-only assertion with a check that none of ErrorLogger's functions were
invoked (e.g., iterate Object.values(ErrorLogger) and assert each mock
function's call count is zero). Reference ErrorLogger and extractWithBrowserless
(and the mock mockConnectOverCDP) to locate the assertions to update so the
suite fails if any ErrorLogger method (error, warn, info, debug, etc.) is
called.

claude added 2 commits March 25, 2026 17:59
Add tests verifying that transient DB errors in scheduler (notification,
digest, webhook processing) and scraper (outer catch classification)
are correctly downgraded to warnings instead of errors. Update existing
tests to match new warning-level logging for save failures and cleanup.

https://claude.ai/code/session_01CAV9VUnHbJ66A4oKNbr9Kk
… error downgrades

Security fixes from review:
- SSRF-blocked errors now get dedicated "ssrf_blocked" logContext so they
  stay at error level instead of being downgraded as transient network errors
- DB save failures in scraper apply isTransient check: only transient
  connection errors are warnings, schema/constraint errors remain errors
- Cleanup operations in scheduler apply isTransientDbError: transient → warning,
  non-transient → error (was unconditionally warning)

https://claude.ai/code/session_01CAV9VUnHbJ66A4oKNbr9Kk
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/services/scheduler.test.ts (1)

672-680: ⚠️ Potential issue | 🟡 Minor

Assert that cleanup downgrades never emit an ERROR.

These tests only prove a warning exists. They still pass if the cleanup path logs both a warning and an error, which would bring back the exact noise this PR is trying to remove.

Suggested assertion tightening
     expect(ErrorLogger.warning).toHaveBeenCalledWith(
       "scheduler",
       "monitor_metrics cleanup failed (will retry tomorrow)",
       expect.objectContaining({
         errorMessage: "DB timeout",
         retentionDays: 90,
         table: "monitor_metrics",
       })
     );
+    expect(ErrorLogger.error).not.toHaveBeenCalled();

Apply the same assertion to the non-Error case below.

As per coding guidelines, **/*.test.ts: Tests must cover edge cases and error paths, and assertions are specific (not just truthy/falsy).

Also applies to: 688-696

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.test.ts` around lines 672 - 680, The test currently
only asserts ErrorLogger.warning was called for the "monitor_metrics cleanup
failed (will retry tomorrow)" case; add a negative assertion to ensure no
ERROR-level log was emitted by asserting ErrorLogger.error was not called with
the same service/message/payload (use ErrorLogger.error and the same
expect.objectContaining({ errorMessage: "DB timeout", retentionDays: 90, table:
"monitor_metrics" })) and likewise add the corresponding negative assertion for
the non-`Error` case later in the file (the block around lines 688-696) so both
branches verify a warning-only downgrade.
♻️ Duplicate comments (1)
server/services/scraper.test.ts (1)

5849-5879: ⚠️ Potential issue | 🟡 Minor

“Without logging” assertions are incomplete (only .error is checked).

Line 5858, Line 5868, and Line 5878 only assert ErrorLogger.error was not called. A regression to ErrorLogger.warning or ErrorLogger.info would still pass and reintroduce duplicate scraper logs.

🛡️ Tighten assertions to enforce no ErrorLogger calls
-    expect(ErrorLogger.error).not.toHaveBeenCalled();
+    expect(ErrorLogger.error).not.toHaveBeenCalled();
+    expect(ErrorLogger.warning).not.toHaveBeenCalled();
+    expect(ErrorLogger.info).not.toHaveBeenCalled();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scraper.test.ts` around lines 5849 - 5879, The tests
currently only assert ErrorLogger.error wasn’t called, which misses regressions
that call ErrorLogger.warn/info; update the three tests ("does not log to
ErrorLogger (caller handles logging)", "re-throws ECONNREFUSED without logging",
"re-throws errors without logging even when monitor name provided") to assert
that no ErrorLogger methods were invoked by either (a) asserting
ErrorLogger.error, ErrorLogger.warn, and ErrorLogger.info were not called, or
(b) asserting the ErrorLogger spy was called 0 times (e.g.,
toHaveBeenCalledTimes(0)); locate the assertions in the tests that reference
extractWithBrowserless and mockConnectOverCDP and replace the single
expect(ErrorLogger.error).not.toHaveBeenCalled() checks accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/services/scheduler.test.ts`:
- Around line 844-849: Replace the weak partial call check with a strict "no
calls" assertion: change the
expect(ErrorLogger.error).not.toHaveBeenCalledWith(...) in
server/services/scheduler.test.ts to
expect(ErrorLogger.error).not.toHaveBeenCalled() (or
expect(ErrorLogger.error).toHaveBeenCalledTimes(0)) so any call (different
message, arity, or args) will fail; apply the same replacement for the two other
occurrences mentioned (the digest and webhook cases).
- Around line 1000-1004: The test's assertion on ErrorLogger.warning is too
loose (it only checks activeChecks) so update the assertion for the retried
scheduler failure to also assert the warning payload contains an errorMessage
and that the logger behavior is correct: modify the
expect(ErrorLogger.warning).toHaveBeenCalledWith(...) call to use
expect.objectContaining({ activeChecks: 0, errorMessage:
expect.stringContaining("Scheduler") }) (or a substring from the thrown error)
and keep the expect.stringContaining("Scheduler iteration skipped") for the
message; additionally assert ErrorLogger.error was not called
(expect(ErrorLogger.error).not.toHaveBeenCalled()) to ensure we didn't log an
error after the transient retry; references: ErrorLogger.warning,
ErrorLogger.error, and the "Scheduler iteration skipped" message.

In `@server/services/scraper.test.ts`:
- Around line 4264-4279: Tests for classifyOuterError are duplicated across
suites; consolidate by removing the repeated individual it(...) cases and adding
their scenarios into the existing describe("classifyOuterError") table-driven
test. Update the test file to centralize checks for classifyOuterError (network
error, database error, unclassified error) into a single parameterized test
(using an array of {input, expectedLogContext, expectedUserMessage} cases) that
references classifyOuterError so the three scenarios are covered once and
duplicates at lines around the current it(...) blocks are deleted.

---

Outside diff comments:
In `@server/services/scheduler.test.ts`:
- Around line 672-680: The test currently only asserts ErrorLogger.warning was
called for the "monitor_metrics cleanup failed (will retry tomorrow)" case; add
a negative assertion to ensure no ERROR-level log was emitted by asserting
ErrorLogger.error was not called with the same service/message/payload (use
ErrorLogger.error and the same expect.objectContaining({ errorMessage: "DB
timeout", retentionDays: 90, table: "monitor_metrics" })) and likewise add the
corresponding negative assertion for the non-`Error` case later in the file (the
block around lines 688-696) so both branches verify a warning-only downgrade.

---

Duplicate comments:
In `@server/services/scraper.test.ts`:
- Around line 5849-5879: The tests currently only assert ErrorLogger.error
wasn’t called, which misses regressions that call ErrorLogger.warn/info; update
the three tests ("does not log to ErrorLogger (caller handles logging)",
"re-throws ECONNREFUSED without logging", "re-throws errors without logging even
when monitor name provided") to assert that no ErrorLogger methods were invoked
by either (a) asserting ErrorLogger.error, ErrorLogger.warn, and
ErrorLogger.info were not called, or (b) asserting the ErrorLogger spy was
called 0 times (e.g., toHaveBeenCalledTimes(0)); locate the assertions in the
tests that reference extractWithBrowserless and mockConnectOverCDP and replace
the single expect(ErrorLogger.error).not.toHaveBeenCalled() checks accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1a59c9c4-df32-47dd-a557-370bde1aef83

📥 Commits

Reviewing files that changed from the base of the PR and between 59559fc and 8d03301.

📒 Files selected for processing (2)
  • server/services/scheduler.test.ts
  • server/services/scraper.test.ts

Comment on lines +844 to +849
expect(ErrorLogger.error).not.toHaveBeenCalledWith(
"scheduler",
expect.stringContaining("notification"),
expect.anything(),
expect.anything()
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use a full no-error assertion here.

not.toHaveBeenCalledWith(...) is too weak for these warning-only paths. A different message, different arity, or a null error argument will still let the test pass even if ErrorLogger.error(...) regresses.

Suggested assertion tightening
-    expect(ErrorLogger.error).not.toHaveBeenCalledWith(
-      "scheduler",
-      expect.stringContaining("notification"),
-      expect.anything(),
-      expect.anything()
-    );
+    expect(ErrorLogger.error).not.toHaveBeenCalled();

Apply the same change to the digest and webhook cases.

As per coding guidelines, **/*.test.ts: Tests must cover edge cases and error paths, and assertions are specific (not just truthy/falsy).

Also applies to: 870-875, 1123-1128

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.test.ts` around lines 844 - 849, Replace the weak
partial call check with a strict "no calls" assertion: change the
expect(ErrorLogger.error).not.toHaveBeenCalledWith(...) in
server/services/scheduler.test.ts to
expect(ErrorLogger.error).not.toHaveBeenCalled() (or
expect(ErrorLogger.error).toHaveBeenCalledTimes(0)) so any call (different
message, arity, or args) will fail; apply the same replacement for the two other
occurrences mentioned (the digest and webhook cases).

Comment on lines +4264 to +4279
it("classifyOuterError returns 'network error' for transient network errors", () => {
// Verify the classification that drives the transient/non-transient logging split
const { userMessage, logContext } = classifyOuterError(new Error("Connection terminated due to connection timeout"));
expect(logContext).toBe("network error");
expect(userMessage).toBe("Page took too long to respond");
});

it("classifyOuterError returns 'database error' for DB-specific errors", () => {
const { logContext } = classifyOuterError(new Error("relation 'monitors' does not exist"));
expect(logContext).toBe("database error");
});

it("classifyOuterError returns 'unclassified error' for non-transient errors", () => {
const { logContext } = classifyOuterError(new Error("Something totally unexpected"));
expect(logContext).toBe("unclassified error");
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consolidate duplicated classifyOuterError coverage.

Line 4264-Line 4279 repeats cases already covered in the dedicated describe("classifyOuterError") block (network/database/unclassified). Consider merging into one table-driven suite to reduce maintenance churn.

♻️ Suggested consolidation approach
-  it("classifyOuterError returns 'network error' for transient network errors", () => {
-    const { userMessage, logContext } = classifyOuterError(new Error("Connection terminated due to connection timeout"));
-    expect(logContext).toBe("network error");
-    expect(userMessage).toBe("Page took too long to respond");
-  });
-
-  it("classifyOuterError returns 'database error' for DB-specific errors", () => {
-    const { logContext } = classifyOuterError(new Error("relation 'monitors' does not exist"));
-    expect(logContext).toBe("database error");
-  });
-
-  it("classifyOuterError returns 'unclassified error' for non-transient errors", () => {
-    const { logContext } = classifyOuterError(new Error("Something totally unexpected"));
-    expect(logContext).toBe("unclassified error");
-  });
+  it.each([
+    ["Connection terminated due to connection timeout", "network error"],
+    ["relation 'monitors' does not exist", "database error"],
+    ["Something totally unexpected", "unclassified error"],
+  ])("classifyOuterError maps '%s' -> %s", (message, expectedContext) => {
+    const { logContext } = classifyOuterError(new Error(message));
+    expect(logContext).toBe(expectedContext);
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scraper.test.ts` around lines 4264 - 4279, Tests for
classifyOuterError are duplicated across suites; consolidate by removing the
repeated individual it(...) cases and adding their scenarios into the existing
describe("classifyOuterError") table-driven test. Update the test file to
centralize checks for classifyOuterError (network error, database error,
unclassified error) into a single parameterized test (using an array of {input,
expectedLogContext, expectedUserMessage} cases) that references
classifyOuterError so the three scenarios are covered once and duplicates at
lines around the current it(...) blocks are deleted.

claude added 2 commits March 25, 2026 18:10
…fication

Architecture review fixes:
- Extract logSchedulerError() helper to eliminate duplicated
  isTransientDbError + conditional log level pattern (was repeated 7x)
- Remove "database error" from transient classification in scraper outer
  catch — schema/constraint errors are NOT transient and must stay at
  error level; only "database connection error" is truly transient

https://claude.ai/code/session_01CAV9VUnHbJ66A4oKNbr9Kk
Skeptic review fixes:
- Extract isTransientDbError() to shared server/utils/dbErrors.ts to
  prevent drift between scraper inline regex and scheduler function
- ENOTFOUND and certificate/ssl/tls errors are permanent misconfigs,
  not transient — exclude from warning downgrade in scraper outer catch
- Add try/catch to logSchedulerError so logging failures during DB
  outage don't mask the original error

https://claude.ai/code/session_01CAV9VUnHbJ66A4oKNbr9Kk
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/services/scraper.ts (1)

1338-1355: ⚠️ Potential issue | 🟠 Major

Gate the save retry on the first error being transient.

isTransientDbError() only affects log level right now. The second updateMonitor() attempt still runs after every save failure, so schema/constraint errors take the extra 1s sleep and duplicate write even though this PR is supposed to retry only transient DB connection failures.

Required fix
       } catch (dbError) {
+        const isTransientSave = isTransientDbError(dbError);
         // Retry once after a short delay for transient DB errors.
         try {
+          if (!isTransientSave) throw dbError;
           await new Promise(r => setTimeout(r, 1000));
           await storage.updateMonitor(monitor.id, {
             lastChecked: new Date(),
             currentValue: newValue,
             lastStatus: finalStatus,
             lastError: null,
             consecutiveFailures: 0,
           });
         } catch (retryError) {
@@
-          const isTransientSave = isTransientDbError(retryError);
+          const isTransientRetry = isTransientSave || isTransientDbError(retryError);
@@
-          if (isTransientSave) {
+          if (isTransientRetry) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scraper.ts` around lines 1338 - 1355, The retry logic
currently always waits and calls storage.updateMonitor a second time even when
the original dbError is non-transient; change it to first check
isTransientDbError(dbError) and only perform the 1s delay + second call to
storage.updateMonitor when that returns true, otherwise skip the retry and log
the original error as non-transient (use dbError for message/context) and avoid
the duplicate write; keep the existing retryError handling and logging for the
transient-path.
♻️ Duplicate comments (5)
server/utils/dbErrors.ts (1)

10-20: ⚠️ Potential issue | 🟠 Major

Treat PostgreSQL saturation codes as transient.

53300 / 57P03 and the standard "too many clients" / "remaining connection slots" messages still fall through as non-transient here. That means withDbRetry() and the new warning downgrade will never trigger during connection saturation, even though this helper is supposed to cover pool-exhaustion cases.

What PostgreSQL SQLSTATE values correspond to "too many clients already" / "remaining connection slots are reserved" and "cannot connect now"? Are `53300` and `57P03` transient conditions that are typically safe to retry?
Required fix
-  if (typeof code === "string" && (/^08/.test(code) || code === "57P01")) return true;
+  if (
+    typeof code === "string"
+    && (/^08/.test(code) || code === "57P01" || code === "57P03" || code === "53300")
+  ) return true;
   const msg = err.message.toLowerCase();
   return msg.includes("connection terminated")
     || msg.includes("connection timeout")
     || msg.includes("connection refused")
     || msg.includes("econnreset")
     || msg.includes("econnrefused")
     || msg.includes("cannot acquire")
-    || msg.includes("timeout expired");
+    || msg.includes("timeout expired")
+    || msg.includes("too many clients")
+    || msg.includes("remaining connection slots");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/utils/dbErrors.ts` around lines 10 - 20, The transient-detection logic
in dbErrors.ts currently only treats 08xxx and 57P01 as transient and misses
pool-saturation codes; update the code path that inspects (err as any).code and
err.message so that SQLSTATE values "53300" and "57P03" are also considered
transient (add them to the code check alongside /^08/ and "57P01") and extend
the message checks to include "too many clients", "remaining connection slots",
and "cannot connect now" (in addition to the existing strings) so that
withDbRetry()/warning downgrade will trigger on connection saturation.
server/services/scraper.test.ts (2)

4264-4285: 🧹 Nitpick | 🔵 Trivial

Remove the duplicated classifyOuterError coverage.

These scenarios are already exercised in the dedicated describe("classifyOuterError") block above. Keeping them in two places makes future classification changes drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scraper.test.ts` around lines 4264 - 4285, The four
duplicated test cases that call classifyOuterError ("classifyOuterError returns
'network error' for transient network errors", "'database error' for DB-specific
errors", "'unclassified error' for non-transient errors", and "'ssrf_blocked'
for SSRF errors (not grouped with network errors)") are redundant with the
dedicated describe("classifyOuterError") suite; remove this duplicated it(...)
block so only the authoritative describe("classifyOuterError") tests exercise
classifyOuterError and avoid drift when classification logic changes.

5855-5885: ⚠️ Potential issue | 🟡 Minor

Assert that Browserless emits no logger calls at all.

These tests only guard ErrorLogger.error. A regression to warning() or info() would still reintroduce duplicate scraper logs and this suite would stay green.

Suggested hardening
-    expect(ErrorLogger.error).not.toHaveBeenCalled();
+    expect(ErrorLogger.error).not.toHaveBeenCalled();
+    expect(ErrorLogger.warning).not.toHaveBeenCalled();
+    expect(ErrorLogger.info).not.toHaveBeenCalled();

Apply the same assertion pattern to each of the three no-logging cases in this block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scraper.test.ts` around lines 5855 - 5885, The tests
currently only assert ErrorLogger.error isn't called; update the three
"no-logging" test cases that call extractWithBrowserless (the ones using
mockConnectOverCDP rejection) to assert no ErrorLogger methods are called at all
by adding assertions that ErrorLogger.warn and ErrorLogger.info (in addition to
ErrorLogger.error) were not called (i.e., assert none of these logger methods
were invoked after extractWithBrowserless rejects). Locate the three tests
referencing extractWithBrowserless, mockConnectOverCDP, and ErrorLogger and add
the additional expect checks for ErrorLogger.warn and ErrorLogger.info in each
test.
server/services/scheduler.test.ts (2)

1011-1027: ⚠️ Potential issue | 🟡 Minor

Assert the full transient-failure payload and no error-level fallback.

At Line 1026, asserting only activeChecks is too loose. This can miss regressions where errorMessage is dropped or an error log is emitted after retry exhaustion.

Suggested assertion tightening
     expect(ErrorLogger.warning).toHaveBeenCalledWith(
       "scheduler",
       expect.stringContaining("Scheduler iteration failed (transient, will retry)"),
-      expect.objectContaining({ activeChecks: 0 })
+      expect.objectContaining({
+        activeChecks: 0,
+        errorMessage: "Connection terminated again",
+      })
     );
+    expect(ErrorLogger.error).not.toHaveBeenCalled();

As per coding guidelines, **/*.test.ts should use specific assertions for error-path behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.test.ts` around lines 1011 - 1027, Tighten the test
to assert the full transient-failure payload and verify no error-level fallback:
after calling startScheduler/runCron and advancing timers, assert
mockGetAllActiveMonitors was called twice, then check ErrorLogger.warning was
called with the scheduler context and a payload that includes both activeChecks:
0 and the expected errorMessage (e.g., stringContaining "Connection
terminated"), and also assert ErrorLogger.error was not called; reference the
mocks and helpers by name (mockGetAllActiveMonitors, startScheduler, runCron,
ErrorLogger.warning, ErrorLogger.error) to locate the assertions to update.

867-872: ⚠️ Potential issue | 🟡 Minor

Replace weak partial negative checks with strict no-error assertions.

At Line 867, Line 893, and Line 1146, not.toHaveBeenCalledWith(...) is too weak: an error log with different args still passes. These warning-only transient paths should assert ErrorLogger.error was never called.

Suggested fix
-    expect(ErrorLogger.error).not.toHaveBeenCalledWith(
-      "scheduler",
-      expect.stringContaining("notification"),
-      expect.anything(),
-      expect.anything()
-    );
+    expect(ErrorLogger.error).not.toHaveBeenCalled();
@@
-    expect(ErrorLogger.error).not.toHaveBeenCalledWith(
-      "scheduler",
-      expect.stringContaining("Digest"),
-      expect.anything(),
-      expect.anything()
-    );
+    expect(ErrorLogger.error).not.toHaveBeenCalled();
@@
-    expect(ErrorLogger.error).not.toHaveBeenCalledWith(
-      "scheduler",
-      expect.stringContaining("Webhook"),
-      expect.anything(),
-      expect.anything()
-    );
+    expect(ErrorLogger.error).not.toHaveBeenCalled();

As per coding guidelines, **/*.test.ts requires specific assertions on error paths.

Also applies to: 893-898, 1146-1151

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.test.ts` around lines 867 - 872, The test uses weak
partial negative checks via not.toHaveBeenCalledWith(...) which can miss other
error calls; replace each occurrence that checks ErrorLogger.error with
not.toHaveBeenCalledWith(...) (specifically the three occurrences checking for
"scheduler" and "notification") by asserting that ErrorLogger.error was never
called at all — e.g., replace with
expect(ErrorLogger.error).not.toHaveBeenCalled() or
expect(ErrorLogger.error).toHaveBeenCalledTimes(0); update the assertions around
ErrorLogger.error in the tests that reference the "scheduler"/"notification"
warnings (the three failing spots) to use this strict no-call assertion and
remove the partial toHaveBeenCalledWith usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/services/scheduler.test.ts`:
- Around line 667-670: Replace the ambiguous "DB timeout" error fixture with a
clearly non-transient DB/schema error in the test that uses startScheduler,
mockDbExecute, and runCron; e.g., have mockDbExecute.mockRejectedValueOnce(new
Error('duplicate key value violates unique constraint "users_pkey"')) so the
test unambiguously exercises the non-transient error path.
- Around line 684-702: The test currently only checks ErrorLogger.warning
payload; tighten it to also assert the retry happened and no fatal error was
logged: after running startScheduler()/runCron and advancing timers, assert that
mockDbExecute was called twice (expect(mockDbExecute).toHaveBeenCalledTimes(2))
to verify the retry, and assert ErrorLogger.error was not called
(expect(ErrorLogger.error).not.toHaveBeenCalled()). Keep the existing
ErrorLogger.warning assertion and the mocked rejections on mockDbExecute to
preserve the transient-failure setup.

In `@server/services/scheduler.ts`:
- Around line 253-261: The retry currently wraps the entire workflows
processQueuedNotifications/processDigestCron causing duplicate sends if
markQueueEntryDelivered fails; update the code so either (A) move withDbRetry to
only wrap the DB mutation calls (markQueueEntryDelivered and the equivalent in
processDigestBatch) while keeping delivery calls outside the retry, or (B) add a
pre-send idempotency check inside
deliverToChannels/sendNotificationEmail/deliverWebhook that queries delivery_log
for an existing successful entry (matching monitorId, changeId, channel) and
skips sending if found; reference functions: withDbRetry,
processQueuedNotifications, getReadyQueueEntries, markQueueEntryDelivered,
deliverToChannels, processDigestCron, processDigestBatch, sendNotificationEmail,
deliverWebhook and the delivery_log table when implementing the chosen fix.

In `@server/services/scraper.test.ts`:
- Around line 4217-4229: The test currently asserts ErrorLogger.warning was
called for the transient DB save path but doesn't assert ERROR wasn't emitted;
update the test in scraper.test.ts (the block referencing ErrorLogger.warning)
to also assert that ErrorLogger.error was not called for this case (e.g., add an
assertion using ErrorLogger.error and .not.toHaveBeenCalled() or the equivalent)
so the suite fails if the implementation logs an ERROR in addition to the
warning.

In `@server/services/scraper.ts`:
- Around line 1202-1205: The code currently always downgrades browserless
failures to a warning; instead, update the branch around
classifyBrowserlessError(...) to check the returned classification (from
classifyBrowserlessError) and if it equals "ssrf_blocked" (or whatever sentinel
classifyBrowserlessError uses for SSRF) call ErrorLogger.error (preserving the
same context fields: monitorId, monitorName, url, selector) and include the
original error/message, otherwise keep the existing ErrorLogger.warning
behavior; locate this logic near the classifyBrowserlessError call that uses
lastBrowserlessErr and the ErrorLogger.warning call, and ensure
withBrowserlessPage-related SSRF errors are not masked by the generic
rendered-page warning.
- Around line 1468-1475: The regex used to classify permanent network errors
currently includes EAI_AGAIN (see errMsg and isPermanentNetworkError), which is
transient; remove EAI_AGAIN from the /ENOTFOUND|EAI_AGAIN|certificate|ssl|tls/i
pattern so that temporary DNS lookup failures are not treated as permanent,
leaving isTransient logic (which uses logContext and isPermanentNetworkError)
unchanged; ensure logMessage (which references monitor.name) will now result in
WARNING for EAI_AGAIN cases instead of ERROR.

---

Outside diff comments:
In `@server/services/scraper.ts`:
- Around line 1338-1355: The retry logic currently always waits and calls
storage.updateMonitor a second time even when the original dbError is
non-transient; change it to first check isTransientDbError(dbError) and only
perform the 1s delay + second call to storage.updateMonitor when that returns
true, otherwise skip the retry and log the original error as non-transient (use
dbError for message/context) and avoid the duplicate write; keep the existing
retryError handling and logging for the transient-path.

---

Duplicate comments:
In `@server/services/scheduler.test.ts`:
- Around line 1011-1027: Tighten the test to assert the full transient-failure
payload and verify no error-level fallback: after calling startScheduler/runCron
and advancing timers, assert mockGetAllActiveMonitors was called twice, then
check ErrorLogger.warning was called with the scheduler context and a payload
that includes both activeChecks: 0 and the expected errorMessage (e.g.,
stringContaining "Connection terminated"), and also assert ErrorLogger.error was
not called; reference the mocks and helpers by name (mockGetAllActiveMonitors,
startScheduler, runCron, ErrorLogger.warning, ErrorLogger.error) to locate the
assertions to update.
- Around line 867-872: The test uses weak partial negative checks via
not.toHaveBeenCalledWith(...) which can miss other error calls; replace each
occurrence that checks ErrorLogger.error with not.toHaveBeenCalledWith(...)
(specifically the three occurrences checking for "scheduler" and "notification")
by asserting that ErrorLogger.error was never called at all — e.g., replace with
expect(ErrorLogger.error).not.toHaveBeenCalled() or
expect(ErrorLogger.error).toHaveBeenCalledTimes(0); update the assertions around
ErrorLogger.error in the tests that reference the "scheduler"/"notification"
warnings (the three failing spots) to use this strict no-call assertion and
remove the partial toHaveBeenCalledWith usage.

In `@server/services/scraper.test.ts`:
- Around line 4264-4285: The four duplicated test cases that call
classifyOuterError ("classifyOuterError returns 'network error' for transient
network errors", "'database error' for DB-specific errors", "'unclassified
error' for non-transient errors", and "'ssrf_blocked' for SSRF errors (not
grouped with network errors)") are redundant with the dedicated
describe("classifyOuterError") suite; remove this duplicated it(...) block so
only the authoritative describe("classifyOuterError") tests exercise
classifyOuterError and avoid drift when classification logic changes.
- Around line 5855-5885: The tests currently only assert ErrorLogger.error isn't
called; update the three "no-logging" test cases that call
extractWithBrowserless (the ones using mockConnectOverCDP rejection) to assert
no ErrorLogger methods are called at all by adding assertions that
ErrorLogger.warn and ErrorLogger.info (in addition to ErrorLogger.error) were
not called (i.e., assert none of these logger methods were invoked after
extractWithBrowserless rejects). Locate the three tests referencing
extractWithBrowserless, mockConnectOverCDP, and ErrorLogger and add the
additional expect checks for ErrorLogger.warn and ErrorLogger.info in each test.

In `@server/utils/dbErrors.ts`:
- Around line 10-20: The transient-detection logic in dbErrors.ts currently only
treats 08xxx and 57P01 as transient and misses pool-saturation codes; update the
code path that inspects (err as any).code and err.message so that SQLSTATE
values "53300" and "57P03" are also considered transient (add them to the code
check alongside /^08/ and "57P01") and extend the message checks to include "too
many clients", "remaining connection slots", and "cannot connect now" (in
addition to the existing strings) so that withDbRetry()/warning downgrade will
trigger on connection saturation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9f73432d-05e4-470b-8045-ea4de91a7c61

📥 Commits

Reviewing files that changed from the base of the PR and between 8d03301 and a3ad14f.

📒 Files selected for processing (5)
  • server/services/scheduler.test.ts
  • server/services/scheduler.ts
  • server/services/scraper.test.ts
  • server/services/scraper.ts
  • server/utils/dbErrors.ts

Comment on lines +684 to +702
it("logs warning when cleanup fails with transient DB error", async () => {
await startScheduler();
mockDbExecute
.mockRejectedValueOnce(new Error("Connection terminated"))
.mockRejectedValueOnce(new Error("Connection terminated"));
const cronPromise = runCron("0 3 * * *");
await vi.advanceTimersByTimeAsync(2000);
await cronPromise;

expect(ErrorLogger.warning).toHaveBeenCalledWith(
"scheduler",
"monitor_metrics cleanup failed (transient, will retry)",
expect.objectContaining({
errorMessage: "Connection terminated",
retentionDays: 90,
table: "monitor_metrics",
})
);
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Tighten transient cleanup assertions (retry + no error log).

This case currently validates only the warning payload. It still passes if retry is removed or if ErrorLogger.error is also emitted.

Suggested assertion tightening
     expect(ErrorLogger.warning).toHaveBeenCalledWith(
       "scheduler",
       "monitor_metrics cleanup failed (transient, will retry)",
       expect.objectContaining({
         errorMessage: "Connection terminated",
         retentionDays: 90,
         table: "monitor_metrics",
       })
     );
+    expect(mockDbExecute).toHaveBeenCalledTimes(2);
+    expect(ErrorLogger.error).not.toHaveBeenCalled();

As per coding guidelines, **/*.test.ts must have specific assertions for error paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.test.ts` around lines 684 - 702, The test currently
only checks ErrorLogger.warning payload; tighten it to also assert the retry
happened and no fatal error was logged: after running startScheduler()/runCron
and advancing timers, assert that mockDbExecute was called twice
(expect(mockDbExecute).toHaveBeenCalledTimes(2)) to verify the retry, and assert
ErrorLogger.error was not called
(expect(ErrorLogger.error).not.toHaveBeenCalled()). Keep the existing
ErrorLogger.warning assertion and the mocked rejections on mockDbExecute to
preserve the transient-failure setup.

@bd73-com bd73-com changed the title fix: reduce scraper/scheduler error log noise fix: reduce scraper and scheduler error log noise Mar 26, 2026
- Add PG pool exhaustion codes (53300, 57P03) and messages to
  isTransientDbError for complete transient error coverage
- Remove EAI_AGAIN from permanent network errors (it's transient)
- Add SSRF check in Browserless failure path — SSRF blocks stay at
  error level even in the rendered extraction warning path
- Remove withDbRetry from processQueuedNotifications/processDigestCron
  to prevent duplicate notification deliveries on retry
- Tighten test assertions: also verify ErrorLogger.error/warning not
  called where the opposite level is expected
- Use unambiguous non-transient error message in cleanup test

https://claude.ai/code/session_01CAV9VUnHbJ66A4oKNbr9Kk
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (5)
server/services/scheduler.test.ts (2)

684-709: 🧹 Nitpick | 🔵 Trivial

Add assertion to verify retry behavior.

The test correctly validates the warning path for transient errors, but it doesn't confirm that withDbRetry actually attempted the retry. Adding expect(mockDbExecute).toHaveBeenCalledTimes(2) would strengthen this test by verifying the retry mechanism fired.

Suggested assertion addition
     expect(ErrorLogger.warning).toHaveBeenCalledWith(
       "scheduler",
       "monitor_metrics cleanup failed (transient, will retry)",
       expect.objectContaining({
         errorMessage: "Connection terminated",
         retentionDays: 90,
         table: "monitor_metrics",
       })
     );
+    // Verify withDbRetry attempted the retry
+    expect(mockDbExecute).toHaveBeenCalledTimes(2);
     // Verify the monitor_metrics cleanup itself didn't log an error (other cleanup tasks may)
     expect(ErrorLogger.error).not.toHaveBeenCalledWith(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.test.ts` around lines 684 - 709, The test lacks an
assertion that the retry actually occurred; after running startScheduler() and
awaiting cronPromise in the "logs warning when cleanup fails with transient DB
error" test, add an assertion that mockDbExecute was invoked twice to verify
withDbRetry retried (expect(mockDbExecute).toHaveBeenCalledTimes(2)); reference
the existing mockDbExecute, withDbRetry, runCron and startScheduler in the same
test and place this assertion before the ErrorLogger checks so the retry
behavior is validated.

1133-1138: 🧹 Nitpick | 🔵 Trivial

Use stricter error assertion for webhook transient failure test.

The not.toHaveBeenCalledWith(...) pattern is weaker than necessary here. Since this test exercises a transient failure path, no errors should be logged at all. Use not.toHaveBeenCalled() for a definitive assertion.

Suggested assertion tightening
-    expect(ErrorLogger.error).not.toHaveBeenCalledWith(
-      "scheduler",
-      expect.stringContaining("Webhook"),
-      expect.anything(),
-      expect.anything()
-    );
+    expect(ErrorLogger.error).not.toHaveBeenCalled();

As per coding guidelines, **/*.test.ts: assertions should be specific (not just truthy/falsy), and error paths should be thoroughly tested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scheduler.test.ts` around lines 1133 - 1138, Replace the weak
call-site assertion that checks ErrorLogger.error was not called with specific
args by asserting it was never called at all: locate the test exercising the
webhook transient failure (the expectation using ErrorLogger.error and
not.toHaveBeenCalledWith with "scheduler" / "Webhook"), and change that
assertion to expect(ErrorLogger.error).not.toHaveBeenCalled() so the test
definitively verifies no error logging occurred.
server/services/scraper.ts (1)

1201-1216: ⚠️ Potential issue | 🟡 Minor

Add logContext: "ssrf_blocked" to the SSRF error context.

The SSRF detection correctly logs at ERROR level, but the context object doesn't include logContext: "ssrf_blocked". This field enables filtering and alerting on SSRF-specific events in log aggregation systems.

Suggested fix
           await ErrorLogger.error(
             "scraper",
             `"${monitor.name}" — rendered page extraction blocked by SSRF protection`,
             lastBrowserlessErr instanceof Error ? lastBrowserlessErr : null,
-            { monitorId: monitor.id, monitorName: monitor.name, url: monitor.url, selector: monitor.selector },
+            { monitorId: monitor.id, monitorName: monitor.name, url: monitor.url, selector: monitor.selector, logContext: "ssrf_blocked" },
           ).catch(() => {});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scraper.ts` around lines 1201 - 1216, The SSRF error branch
currently calls ErrorLogger.error with a context object missing a logContext
field; update the call in the lastBrowserlessErr SSRF branch (where
lastBrowserlessErr is checked and ErrorLogger.error is invoked) to include
logContext: "ssrf_blocked" in the context object alongside monitorId,
monitorName, url, and selector so SSRF events can be filtered/alerted; keep the
rest of the error handling (message and passed error) unchanged.
server/services/scraper.test.ts (2)

4265-4285: 🧹 Nitpick | 🔵 Trivial

Remove the duplicated classifyOuterError cases.

These assertions repeat mappings already covered in the dedicated describe("classifyOuterError") block above. Keeping the same classification contract in two places makes future changes drift-prone.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scraper.test.ts` around lines 4265 - 4285, The test contains
duplicated assertions for classifyOuterError; remove the repeated it blocks (the
cases for "network error", "database error", "unclassified error", and
"ssrf_blocked") from server/services/scraper.test.ts so only the dedicated
describe("classifyOuterError") block defines these mappings; locate the
duplicated it(...) tests invoking classifyOuterError and delete them to avoid
drift and maintain a single source of truth for the classification contract.

5856-5888: ⚠️ Potential issue | 🟡 Minor

Assert that extractWithBrowserless stays silent across all mocked logger methods.

The mock includes ErrorLogger.info, but these tests only forbid error and warning. A regression to info(...) would still reintroduce duplicate scraper log noise while this suite stays green.

Suggested hardening
     expect(ErrorLogger.error).not.toHaveBeenCalled();
     expect(ErrorLogger.warning).not.toHaveBeenCalled();
+    expect(ErrorLogger.info).not.toHaveBeenCalled();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/services/scraper.test.ts` around lines 5856 - 5888, The tests for
extractWithBrowserless currently assert ErrorLogger.error and
ErrorLogger.warning were not called but miss ErrorLogger.info; update the three
test cases (the ones titled "does not log to ErrorLogger (caller handles
logging)", "re-throws ECONNREFUSED without logging", and "re-throws errors
without logging even when monitor name provided") to also assert
ErrorLogger.info was not called (i.e., include
expect(ErrorLogger.info).not.toHaveBeenCalled()); alternatively add a shared
afterEach or helper assertion that verifies ErrorLogger.error,
ErrorLogger.warning, and ErrorLogger.info were all not called to prevent
regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@server/services/scheduler.test.ts`:
- Around line 684-709: The test lacks an assertion that the retry actually
occurred; after running startScheduler() and awaiting cronPromise in the "logs
warning when cleanup fails with transient DB error" test, add an assertion that
mockDbExecute was invoked twice to verify withDbRetry retried
(expect(mockDbExecute).toHaveBeenCalledTimes(2)); reference the existing
mockDbExecute, withDbRetry, runCron and startScheduler in the same test and
place this assertion before the ErrorLogger checks so the retry behavior is
validated.
- Around line 1133-1138: Replace the weak call-site assertion that checks
ErrorLogger.error was not called with specific args by asserting it was never
called at all: locate the test exercising the webhook transient failure (the
expectation using ErrorLogger.error and not.toHaveBeenCalledWith with
"scheduler" / "Webhook"), and change that assertion to
expect(ErrorLogger.error).not.toHaveBeenCalled() so the test definitively
verifies no error logging occurred.

In `@server/services/scraper.test.ts`:
- Around line 4265-4285: The test contains duplicated assertions for
classifyOuterError; remove the repeated it blocks (the cases for "network
error", "database error", "unclassified error", and "ssrf_blocked") from
server/services/scraper.test.ts so only the dedicated
describe("classifyOuterError") block defines these mappings; locate the
duplicated it(...) tests invoking classifyOuterError and delete them to avoid
drift and maintain a single source of truth for the classification contract.
- Around line 5856-5888: The tests for extractWithBrowserless currently assert
ErrorLogger.error and ErrorLogger.warning were not called but miss
ErrorLogger.info; update the three test cases (the ones titled "does not log to
ErrorLogger (caller handles logging)", "re-throws ECONNREFUSED without logging",
and "re-throws errors without logging even when monitor name provided") to also
assert ErrorLogger.info was not called (i.e., include
expect(ErrorLogger.info).not.toHaveBeenCalled()); alternatively add a shared
afterEach or helper assertion that verifies ErrorLogger.error,
ErrorLogger.warning, and ErrorLogger.info were all not called to prevent
regressions.

In `@server/services/scraper.ts`:
- Around line 1201-1216: The SSRF error branch currently calls ErrorLogger.error
with a context object missing a logContext field; update the call in the
lastBrowserlessErr SSRF branch (where lastBrowserlessErr is checked and
ErrorLogger.error is invoked) to include logContext: "ssrf_blocked" in the
context object alongside monitorId, monitorName, url, and selector so SSRF
events can be filtered/alerted; keep the rest of the error handling (message and
passed error) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a5158432-60ff-4765-a3dd-c229f0c4c8ad

📥 Commits

Reviewing files that changed from the base of the PR and between a3ad14f and 2a8d203.

📒 Files selected for processing (5)
  • server/services/scheduler.test.ts
  • server/services/scheduler.ts
  • server/services/scraper.test.ts
  • server/services/scraper.ts
  • server/utils/dbErrors.ts

@bd73-com bd73-com merged commit 8fab4b2 into main Mar 26, 2026
2 checks passed
@github-actions github-actions bot deleted the claude/fix-scraper-scheduler-errors-smjDW branch March 26, 2026 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants