Problem
Stage 1 dataset-build subprocess failures can be masked when Modal serializes custom exceptions back to the pipeline orchestrator. In the observed failure, DatasetCommandError carried a structured DatasetCommandResult, but when the exception was reconstructed from pickle, its constructor was called with the message string instead of the result object, producing AttributeError: 'str' object has no attribute 'returncode'.
That secondary serialization failure hid the actual Extended CPS command failure, so the pipeline status surface reported the AttributeError instead of the command error.
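The failure mode above can be reproduced outside Modal with plain pickle. The sketch below is illustrative only: the class bodies, the output_tail field, and the message format are assumptions, not the project's actual code. It shows that the default exception pickling re-invokes the constructor with self.args (the message string), and that one possible remedy is to override __reduce__ so the constructor receives the original result object.

```python
import pickle
from dataclasses import dataclass


@dataclass
class DatasetCommandResult:
    # Hypothetical shape: only `returncode` is confirmed by the traceback.
    returncode: int
    output_tail: str


class FragileDatasetCommandError(Exception):
    """Mimics the failing pattern: args holds only the message string."""

    def __init__(self, result):
        self.result = result
        super().__init__(f"command failed with exit code {result.returncode}")


class SafeDatasetCommandError(Exception):
    """Same constructor, but pickle is told to rebuild from the result."""

    def __init__(self, result):
        self.result = result
        super().__init__(f"command failed with exit code {result.returncode}")

    def __reduce__(self):
        # Rebuild by calling the class with the result object, not self.args.
        return (type(self), (self.result,))


result = DatasetCommandResult(returncode=1, output_tail="last lines of output")

# Default pickling calls FragileDatasetCommandError(<message string>) on load,
# so the constructor fails when it reads `.returncode` from a str.
try:
    pickle.loads(pickle.dumps(FragileDatasetCommandError(result)))
    fragile_failure = None
except AttributeError as exc:
    fragile_failure = str(exc)

# With __reduce__ overridden, the round trip preserves the structured result.
restored = pickle.loads(pickle.dumps(SafeDatasetCommandError(result)))
```

Overriding __reduce__ is only one option. Passing the result to super().__init__ so that it lands in args, or making the constructor accept either a message or a result, would also avoid the secondary AttributeError.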
Scope
- Make dataset command failures safe to serialize across Modal boundaries.
- Preserve captured Stage 1 command output tails in durable pipeline error records.
- Keep the pipeline status endpoint resilient when a latest error record cannot be read or decoded.
- Add regression tests for the serialization and status-reporting failure modes.
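As a rough illustration of the second and third scope items, the sketch below keeps a bounded tail of command output in an error record and reads that record back without letting a missing or corrupt file raise. It assumes the record is a JSON file on disk; the function names, the record fields, and the 4000-character limit are hypothetical, and the real pipeline may store error records differently.

```python
import json
from pathlib import Path


def output_tail(text: str, max_chars: int = 4000) -> str:
    # Keep only the end of the captured output, where failures usually appear.
    return text[-max_chars:]


def write_error_record(path: Path, stage: str, returncode: int, output: str) -> None:
    # Hypothetical record layout, for illustration only.
    record = {
        "stage": stage,
        "returncode": returncode,
        "output_tail": output_tail(output),
    }
    path.write_text(json.dumps(record), encoding="utf-8")


def read_latest_error(path: Path) -> dict:
    # Never raise: the status endpoint should still answer when the record
    # is missing, unreadable, or not valid JSON.
    try:
        record = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
        return {"available": False, "reason": f"{type(exc).__name__}: {exc}"}
    return {"available": True, "record": record}
```

Returning a small status dictionary instead of raising lets the caller report that the latest error is unavailable while still serving the rest of the pipeline status.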
Out of scope
- Fixing the underlying Extended CPS generation failure itself.
- Reworking Stage 1 subprocess execution into direct library calls.
- Changing the broader pipeline status dashboard architecture.