chore: async engine readiness - blockers and polish before default #553
andreatgretel wants to merge 2 commits into main
- Processor callback failures (pre-batch and post-batch) now raise DatasetGenerationError instead of silently dropping row groups
- Early shutdown and all error paths drain in-flight workers via a finally block in AsyncTaskScheduler.run()
- Pre-batch and post-batch processors that change row count in async mode raise immediately (strict_row_count guard)
- Partial completion logs a warning when actual < target records
- allow_resize=True auto-falls back to sync engine with a deprecation warning instead of raising, using a per-run _use_async flag
- Preview path mirrors the trace check from the full build path; PreviewResults exposes task_traces

Closes #462
- Prevent double-wrapping of DatasetGenerationError in scheduler callbacks
- Fix stacklevel in allow_resize DeprecationWarning to point at user code
- Update stale comment to reflect fail-fast behavior
- Rename misleading test and remove unused caplog fixture
- Add zero-warnings assertion for happy-path case
- Move warnings import to module level
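The sync fallback and the stacklevel fix described in these commits can be illustrated with a minimal sketch. The class `BuilderSketch` and its `build()` method are hypothetical stand-ins, not the real dataset builder; only the names `allow_resize`, `_use_async`, and `_resolve_async_compatibility` come from the PR text, and the warning message is invented for illustration.

```python
import warnings


class BuilderSketch:
    """Hypothetical stand-in for the dataset builder (not the real class)."""

    def __init__(self, allow_resize: bool = False) -> None:
        self.allow_resize = allow_resize
        self._use_async = True  # per-run flag, recomputed on every build()

    def _resolve_async_compatibility(self) -> bool:
        """Return True when the async engine can be used for this run."""
        if self.allow_resize:
            warnings.warn(
                "allow_resize=True is not supported by the async engine; "
                "falling back to the sync engine.",
                DeprecationWarning,
                # Frame 1 is this helper, frame 2 is build(), frame 3 is the
                # user's call site, which is where the warning should point.
                stacklevel=3,
            )
            return False
        return True

    def build(self) -> str:
        # Decide once per run and store the decision on the instance.
        self._use_async = self._resolve_async_compatibility()
        return "async" if self._use_async else "sync"
```

The right `stacklevel` depends on how many library frames sit between `warnings.warn` and user code, so the value 3 is specific to this sketch.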
Greptile Summary
This PR hardens the async engine before it becomes the default execution path: pre- and post-batch processor failures now propagate as

| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py | Core change: try/finally replaces duplicated cleanup logic; pre/post-batch failures now propagate as DatasetGenerationError. Semaphore release in the finally block is correct even on error path since the scheduler terminates immediately. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py | _resolve_async_compatibility correctly returns bool for sync fallback; _use_async instance flag properly overrides module-level constant in _run_cell_by_cell_generator; row-count guard and partial-completion warning are well-placed. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/processor_runner.py | strict_row_count=True raises DatasetProcessingError (a sibling of DatasetGenerationError, not a subclass); the scheduler's except-chain correctly wraps it into DatasetGenerationError via the generic except clause. |
| packages/data-designer-config/src/data_designer/config/preview_results.py | Clean addition of optional task_traces field; typed as list[Any] |
| packages/data-designer/src/data_designer/interface/data_designer.py | task_traces forwarded to PreviewResults using or None to normalize an empty list to None; correct and intentional. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py | New tests cover post-batch failure propagation and early-shutdown worker drain; pre-batch failure tests correctly updated from drop semantics to raise semantics. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_async_builder_integration.py | test_resolve_async_compatibility correctly validates fallback+warning on allow_resize; metadata write test is a reasonable proxy for partial-completion coverage. |
| packages/data-designer-engine/tests/engine/models/test_async_engine_switch.py | Patch-based test correctly replaced with direct instance-flag manipulation; simpler and more accurate after the _use_async refactor. |
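
The except-chain behavior noted for processor_runner.py and async_scheduler.py can be sketched as follows. Both exception classes below are local stand-ins defined only for this example (the real classes live in the engine and may have different bases), and `run_callback` is a hypothetical helper name.

```python
class DatasetGenerationError(Exception):
    """Stand-in for the engine's generation error (assumed shape)."""


class DatasetProcessingError(Exception):
    """Stand-in for the sibling error raised by the processor runner."""


def run_callback(callback):
    """Run a batch callback so every failure surfaces as DatasetGenerationError."""
    try:
        return callback()
    except DatasetGenerationError:
        # Already the right type: re-raise untouched to avoid double-wrapping.
        raise
    except Exception as exc:
        # Siblings such as DatasetProcessingError land here and are wrapped once,
        # with the original kept as __cause__ for debugging.
        raise DatasetGenerationError(f"batch callback failed: {exc}") from exc
```

Ordering matters here: the specific `except DatasetGenerationError` clause has to come before the generic one, otherwise the error would be wrapped a second time.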
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[build / build_preview] --> B{_resolve_async_compatibility}
B -->|allow_resize detected| C[DeprecationWarning + logger.warning\nreturn False]
B -->|no allow_resize| D[return True]
C --> E[_use_async = False\nSync path]
D --> F[_use_async = True\nAsync path]
F --> G[AsyncTaskScheduler.run]
G --> H[try: _main_dispatch_loop]
H --> I{pre-batch callback\n_run_seeds_complete_check}
I -->|DatasetGenerationError| J[re-raise]
I -->|other Exception| K[wrap → DatasetGenerationError\nre-raise]
J --> L[finally: cancel admission\n+ asyncio.shield cancel_workers]
K --> L
H --> M{post-batch callback\n_checkpoint_completed_row_groups}
M -->|DatasetGenerationError| N[re-raise]
M -->|other Exception| O[wrap → DatasetGenerationError\nre-raise]
N --> L
O --> L
H --> P[Normal exit]
P --> Q[finally: cancel admission\n+ drain workers]
Q --> R[reporter.log_final\n_rg_states diagnostic]
L --> S[DatasetGenerationError\npropagates to caller]
E --> T[_run_cell_by_cell_generator\nuses self._use_async]
F --> T
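
The finally-based drain in the flowchart can be approximated with plain asyncio. This is a minimal sketch under stated assumptions: `_worker`, `_cancel_workers`, and this `run` function are hypothetical and much simpler than the real AsyncTaskScheduler.run(); only the try/finally shape and the use of `asyncio.shield` are taken from the PR description.

```python
import asyncio


async def _worker(started: asyncio.Event) -> None:
    started.set()
    await asyncio.sleep(3600)  # simulates long-running in-flight work


async def _cancel_workers(workers: list[asyncio.Task]) -> None:
    for task in workers:
        task.cancel()
    # return_exceptions=True collects each CancelledError instead of raising it.
    await asyncio.gather(*workers, return_exceptions=True)


async def run(fail: bool) -> list[asyncio.Task]:
    started = asyncio.Event()
    workers = [asyncio.create_task(_worker(started))]
    try:
        await started.wait()
        if fail:
            raise RuntimeError("post-batch callback failed")
        return workers
    finally:
        # Runs on normal exit and on every error path. shield() lets the
        # drain finish even if run() itself is being cancelled.
        await asyncio.shield(_cancel_workers(workers))
```

Calling `asyncio.run(run(fail=True))` raises the original error only after the worker task has been cancelled and awaited, so no task is left pending when the loop closes.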
Reviews (1): Last reviewed commit: "fix: address review findings for async e..."
Code Review: PR #553 - chore: async engine readiness - blockers and polish before default

Summary
This PR hardens the async engine for production readiness before it becomes the default execution path. The changes fall into three categories:
The PR also adds row-count guards, partial-completion warnings,
Files changed: 8 (245 additions, 103 deletions)

Findings

High Severity
None found.

Medium Severity
1. In The original

Low Severity
2. In
3. Semaphore release skipped for remaining row groups on fail-fast In
4. The
Positive Observations
5. Clean The consolidation of admission cancellation and worker drain into a single
6. Double-wrap prevention for The second commit adds
7. Instance flag Replacing the module-level
8. Solid test coverage The new tests (

Verdict
APPROVE: This is a well-structured hardening PR with clear fail-fast semantics, proper cleanup guarantees, and good test coverage. The medium finding (error type wrapping) is a design-level note worth discussing but not blocking. The low-severity items are minor housekeeping suggestions.
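
The row-count guard and partial-completion warning mentioned in the review could look roughly like the sketch below. The functions `run_processor` and `warn_if_partial` are hypothetical names, the exception class is a local stand-in, and the messages are invented; only the `strict_row_count` flag and the "actual < target" condition come from the PR text.

```python
import logging

logger = logging.getLogger(__name__)


class DatasetProcessingError(Exception):
    """Stand-in for the processor runner's error type."""


def run_processor(rows: list[dict], processor, strict_row_count: bool = False) -> list[dict]:
    """Apply a processor and optionally reject any change in row count."""
    result = processor(rows)
    if strict_row_count and len(result) != len(rows):
        raise DatasetProcessingError(
            f"processor changed row count from {len(rows)} to {len(result)}"
        )
    return result


def warn_if_partial(actual: int, target: int) -> None:
    """Log a warning when fewer records were produced than requested."""
    if actual < target:
        logger.warning("Generated %d of %d requested records.", actual, target)
```

In this sketch the guard is opt-in, which would let a sync path keep accepting resizing processors while an async path passes `strict_row_count=True`.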
📋 Summary
Hardens the async engine for production readiness before it becomes the default execution path. Processor callback failures now propagate as DatasetGenerationError (fail-fast) instead of silently dropping row groups, allow_resize=True triggers a graceful sync fallback with DeprecationWarning, and try/finally cleanup in run() ensures workers are always drained.
🔗 Related Issue
Closes #462
🔄 Changes
🔧 Changed
- Processor callback failures now raise DatasetGenerationError instead of silently dropping row groups
- allow_resize=True triggers graceful sync fallback with DeprecationWarning instead of raising
- try/finally refactor in AsyncTaskScheduler.run() eliminates duplicated cleanup logic and ensures workers are always cancelled via asyncio.shield()
- _run_cell_by_cell_generator now branches on instance flag _use_async instead of module-level DATA_DESIGNER_ASYNC_ENGINE, so the sync fallback decision propagates correctly
✨ Added
- Pre-batch processors (strict_row_count=True) and post-batch callbacks reject unsupported row-count changes
- task_traces field on PreviewResults for async scheduler trace data
🐛 Fixed
- DatasetGenerationError from callbacks no longer gets double-wrapped by the scheduler's generic except Exception handler
- stacklevel in allow_resize DeprecationWarning now points at user code instead of library internals
🔍 Attention Areas
- async_scheduler.py: fail-fast semantics change; pre/post-batch failures now propagate instead of being swallowed
- dataset_builder.py: _resolve_async_compatibility replaces _validate_async_compatibility; row-count guards in callbacks
🧪 Testing
- make test passes

All 48 async-related tests pass. 18 smoke tests pass. Ruff check and format clean on all changed files.
✅ Checklist