Skip to content

fix(notebooks): surface exception type on empty-message failures#581

Merged
nathantournant merged 1 commit into
mainfrom
nathan.tournant/NATHAN-51/notebooks-empty-error
May 31, 2026
Merged

fix(notebooks): surface exception type on empty-message failures#581
nathantournant merged 1 commit into
mainfrom
nathan.tournant/NATHAN-51/notebooks-empty-error

Conversation

@nathantournant
Copy link
Copy Markdown
Member

@nathantournant nathantournant commented May 27, 2026

Problem

Notebook sync failures can surface an empty error message, leaving operators with no diagnostic signal:

Sync-CLI type=notebooks (took 32.4s) failed -- error summary (1 lines):
... - ERROR - [notebooks - <id>] -

The trailing dash + space + newline is literal — the error body is empty, not truncated.

Root cause: bare aiohttp.ServerTimeoutError() and asyncio.TimeoutError() have empty __str__. They propagate through custom_client.py (which doesn't translate them) into resources_handler.py's except Exception as e block, where the handler logs str(e) directly.

Fix

New helper format_exc_for_log(exc) in datadog_sync/utils/resource_utils.py:

  • Timeout family (asyncio.TimeoutError, aiohttp.ServerTimeoutError) → timeout: <ClassName>[: <msg>]
  • Non-empty str(exc) → returned verbatim (no regression for CustomClientHTTPError, SkipResource, ResourceConnectionError, etc.)
  • Empty str(exc) → exception class name

Applied at four failure-log sites in resources_handler.py:

  • _apply_resource_cb — the exact site that emits the empty-error log line.
  • _import_resource — promotes detail from DEBUG-only to ERROR (was invisible at default verbosity).
  • _force_missing_dep_import_cb (×2) — consistency.
  • _cleanup_worker — consistency.

Tests

11 new tests in tests/unit/test_notebooks_error_formatting.py:

  • 6 helper-level unit tests covering the three branches.
  • 5 ResourcesHandler integration tests pinning the empty-error / timeout / non-empty-pass-through / HTTP-error-preserved contracts.

Verification

  • pytest tests/unit/629 passed, 0 failed.
  • ruff check + black --check — clean.

Out of scope

  • The ~30s timeout itself. This PR makes the failure observable; tuning the per-notebook timeout is a separate question. Now that the type is surfaced, an operator can decide whether the right answer is a longer timeout, batched requests, or per-notebook payload-size limits.

Follow-ups

  • Investigate the timeout root cause once this PR is in and the type is visible in logs.
  • Consider extending the same format_exc_for_log helper to other resource models if their error paths show the same shape.

@nathantournant nathantournant force-pushed the nathan.tournant/NATHAN-51/notebooks-empty-error branch from 5647ceb to 771fccc Compare May 28, 2026 08:46
When sync-cli fails on a notebook resource with a bare
aiohttp.ServerTimeoutError() or asyncio.TimeoutError() (both have
empty __str__), the failure log emitted by resources_handler reads
exactly:

  ERROR - [notebooks - <id>] -

The trailing dash + space + newline is literal — the error body is the
empty string, not truncated. Operators are left with no signal
distinguishing a timeout from a 5xx with empty body from a network
reset.

Introduce format_exc_for_log(exc) in utils/resource_utils.py:
- timeout family (asyncio.TimeoutError, aiohttp.ServerTimeoutError)
  -> 'timeout: <ClassName>[: <msg>]' so 'timeout:' is greppable.
- non-empty str(exc) -> verbatim (no regression for the common path;
  CustomClientHTTPError / SkipResource / etc. unchanged).
- empty str(exc) -> falls back to the exception class name.

Applied at all four failure-log sites in resources_handler.py:
_apply_resource_cb (the failure site this issue surfaced from),
_import_resource, _force_missing_dep_import_cb, _cleanup_worker. The
import path additionally promotes the exception detail from DEBUG
into the ERROR line — operator log harvests only see ERROR, so the
previous DEBUG-only attachment was invisible at default verbosity.

This makes the error visible; it does not change the underlying
timeout. Tuning that is a follow-up once the type is visible.
@nathantournant nathantournant force-pushed the nathan.tournant/NATHAN-51/notebooks-empty-error branch from 771fccc to 39b6911 Compare May 28, 2026 16:39
@nathantournant nathantournant marked this pull request as ready for review May 28, 2026 16:46
@nathantournant nathantournant requested a review from a team as a code owner May 28, 2026 16:46
@nathantournant nathantournant merged commit 9f274f6 into main May 31, 2026
20 checks passed
@nathantournant nathantournant deleted the nathan.tournant/NATHAN-51/notebooks-empty-error branch May 31, 2026 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants