
test: pin custom generator error boundary #684

Merged
eric-tramel merged 1 commit into scheduling-yolo from andreatgretel/test/async-scheduling-error-boundary
May 19, 2026

Conversation

@andreatgretel
Contributor

Summary

  • add async scheduler integration coverage for decorated custom-generator KeyError
  • pin that wrapped custom-generator failures drop the affected row instead of using the fatal internal-bug path

Why

The original review flagged the error-boundary policy around KeyError/TypeError exceptions. The current implementation already keeps raw ColumnGenerator internal-bug exceptions fatal and wraps decorated custom-generator user failures as CustomColumnGenerationError. This test locks that distinction in at the scheduler boundary.
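The distinction described above can be sketched roughly as follows. This is an illustration only, not the project's code: the two exception classes are local stand-ins that borrow the names mentioned in this PR, and the decorator body, `user_generator`, and `classify` are hypothetical.

```python
import functools


class DataDesignerError(Exception):
    """Local stand-in for the project's expected-error base class."""


class CustomColumnGenerationError(DataDesignerError):
    """Local stand-in for the wrapper raised around user-code failures."""


def custom_column_generator(fn):
    """Hypothetical decorator: re-raise any user failure as a wrapped error."""

    @functools.wraps(fn)
    def wrapper(row):
        try:
            return fn(row)
        except Exception as exc:
            # Keep the original exception reachable through __cause__.
            raise CustomColumnGenerationError(
                f"custom generator failed: {exc!r}"
            ) from exc

    return wrapper


@custom_column_generator
def user_generator(row):
    # Raises KeyError when "name" is absent; the decorator wraps it.
    return {**row, "greeting": "hello " + row["name"]}


def classify(exc):
    """Expected errors drop the row; anything else counts as an internal bug."""
    return "drop_row" if isinstance(exc, DataDesignerError) else "fatal"
```

Under these assumptions, a `KeyError` raised inside the decorated function is classified as `drop_row`, while the same `KeyError` raised outside the wrapper is classified as `fatal`.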

Validation

  • .venv/bin/pytest packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py -k "internal_bug_failure or custom_generator_key_error"
  • .venv/bin/ruff check --fix .
  • .venv/bin/ruff format .

@eric-tramel eric-tramel marked this pull request as ready for review May 19, 2026 21:20
@eric-tramel eric-tramel requested a review from a team as a code owner May 19, 2026 21:20
@eric-tramel eric-tramel merged commit 0d2050b into scheduling-yolo May 19, 2026
4 checks passed
@greptile-apps
Contributor

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR adds a scheduler-level integration test that pins the error-boundary policy for decorated custom-generator user failures. Specifically, it verifies that a KeyError raised inside a @custom_column_generator-wrapped function is caught and re-raised as CustomColumnGenerationError (a DataDesignerError), causing the affected row to be dropped rather than triggering the fatal-internal-bug abort path.

  • Adds test_scheduler_custom_generator_key_error_drops_row_without_fatal_abort, which builds a real CustomColumnGenerator and AsyncTaskScheduler with one record. The test asserts that the row is dropped, that the "This record will be skipped" warning is emitted, and that the "Unexpected fatal Non-retryable failure" log message is absent.
  • The new test complements the existing test_scheduler_internal_bug_failure_aborts_instead_of_dropping_row by covering the opposite side of the boundary: the decorated wrapper path versus the raw ColumnGenerator path.
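The log-presence and log-absence assertions described above follow a common pattern, sketched here with only the standard library. In a pytest suite the `caplog` fixture would normally play this role; the `ListHandler` class, the `capture` helper, the `demo.scheduler` logger name, and `drop_row_path` are all hypothetical and are not taken from the actual test.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list for later assertions."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def capture(logger_name, action):
    """Run `action` and return every message logged to `logger_name`."""
    logger = logging.getLogger(logger_name)
    handler = ListHandler()
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    try:
        action()
    finally:
        # Restore the logger so the capture does not leak into other code.
        logger.removeHandler(handler)
        logger.setLevel(previous_level)
    return handler.messages


def drop_row_path():
    # Hypothetical stand-in for the code path that skips a failed record.
    logging.getLogger("demo.scheduler").warning("This record will be skipped")
```

A test built this way checks both sides of the boundary: the skip warning must appear among the captured messages, and no fatal-abort message may appear.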

Confidence Score: 5/5

The change adds a single integration test that exercises an already-implemented code path; it introduces no production code changes and no risk.

The test correctly mirrors the existing test structure, uses the same session-scoped asyncio marker, and makes three independently meaningful assertions that together fully characterize the desired boundary.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py | Adds one new integration test; assertions correctly trace the decorated custom-generator failure path through the scheduler. No logic issues found. |

Sequence Diagram

sequenceDiagram
    participant Test
    participant AsyncTaskScheduler
    participant CustomColumnGenerator
    participant custom_py as custom.py (_generate)

    Test->>AsyncTaskScheduler: scheduler.run()
    AsyncTaskScheduler->>CustomColumnGenerator: agenerate(row_dict)
    Note over CustomColumnGenerator: sync fn → asyncio.to_thread(self.generate, data)
    CustomColumnGenerator->>custom_py: "_generate(data, is_dataframe=False)"
    custom_py->>custom_py: _invoke_generator_function(data)
    custom_py-->>custom_py: raises KeyError("missing user field")
    custom_py->>custom_py: except Exception → log WARNING "This record will be skipped"
    custom_py-->>CustomColumnGenerator: raises CustomColumnGenerationError
    CustomColumnGenerator-->>AsyncTaskScheduler: raises CustomColumnGenerationError
    Note over AsyncTaskScheduler: _is_expected_non_retryable(exc) → True (DataDesignerError)
    AsyncTaskScheduler->>AsyncTaskScheduler: "_drop_row(row_group=0, row_index=0)"
    AsyncTaskScheduler-->>Test: run() returns normally (no fatal abort)
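The drop-versus-abort decision in the diagram can be approximated with a toy loop. This is a minimal sketch under stated assumptions, not the real AsyncTaskScheduler: `run_rows`, `is_expected_non_retryable`, and `flaky` are hypothetical names, and `DataDesignerError` is a local stand-in class.

```python
import asyncio


class DataDesignerError(Exception):
    """Local stand-in for the project's expected-error base class."""


def is_expected_non_retryable(exc):
    """Treat DataDesignerError subclasses as expected, row-level failures."""
    return isinstance(exc, DataDesignerError)


async def run_rows(rows, generate):
    """Run a sync generator per row; drop rows on expected errors, abort otherwise."""
    kept, dropped = [], []
    for index, row in enumerate(rows):
        try:
            # Sync generators run in a worker thread, as the diagram notes.
            kept.append(await asyncio.to_thread(generate, row))
        except Exception as exc:
            if not is_expected_non_retryable(exc):
                raise  # internal-bug path: abort the whole run
            dropped.append(index)  # expected failure: drop only this row
    return kept, dropped


def flaky(row):
    # Hypothetical generator that fails with an expected error.
    if "name" not in row:
        raise DataDesignerError("missing user field")
    return {**row, "ok": True}
```

Calling `asyncio.run(run_rows([{"name": "a"}, {}], flaky))` keeps the first row and records index 1 as dropped, whereas a generator that raises a bare `KeyError` propagates the exception out of `run_rows`.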

Reviews (1): Last reviewed commit: "test: pin custom generator error boundar..."
