Preserve dataset-build failure details across Modal boundaries #1111

@anth-volk

Problem

Stage 1 dataset-build subprocess failures can be masked when Modal serializes custom exceptions back to the pipeline orchestrator. In the observed failure, DatasetCommandError carried a structured DatasetCommandResult, but when the exception was unpickled, its constructor was called with the message string instead of the result object, producing AttributeError: 'str' object has no attribute 'returncode'.

That secondary serialization failure hid the actual Extended CPS command failure and made the pipeline status surface less useful.
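
The failure mode can be reproduced in isolation. The sketch below is not the project's code: the fields on DatasetCommandResult, the message format, and the FragileDatasetCommandError name are assumptions for illustration, and the standard pickle module stands in for whatever serializer Modal uses. By default, an exception is rebuilt by calling its class with self.args, which here holds only the message string. Overriding __reduce__ is one possible way to rebuild it from the result object instead.

```python
import pickle
from dataclasses import dataclass


@dataclass
class DatasetCommandResult:
    # Hypothetical fields; the real class may carry more or different data.
    returncode: int
    output_tail: str


class FragileDatasetCommandError(Exception):
    """Illustrative stand-in for the failing pattern described above."""

    def __init__(self, result):
        # super().__init__ stores only the message in self.args, so
        # unpickling later calls this constructor with that string.
        super().__init__(f"command failed with exit code {result.returncode}")
        self.result = result


class DatasetCommandError(Exception):
    """One possible fix: tell pickle to rebuild from the result object."""

    def __init__(self, result):
        super().__init__(f"command failed with exit code {result.returncode}")
        self.result = result

    def __reduce__(self):
        # Reconstruct by calling the class with the original result,
        # not with the formatted message held in self.args.
        return (type(self), (self.result,))


def round_trip(exc):
    """Serialize and deserialize an exception, as a process boundary would."""
    return pickle.loads(pickle.dumps(exc))
```

With these definitions, round-tripping FragileDatasetCommandError raises the AttributeError from the report, while DatasetCommandError comes back with its result attribute intact.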

Scope

  • Make dataset command failures safe to serialize across Modal boundaries.
  • Preserve captured Stage 1 command output tails in durable pipeline error records.
  • Keep the pipeline status endpoint resilient when a latest error record cannot be read or decoded.
  • Add regression tests for the serialization and status-reporting failure modes.
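
The second and third scope items could look roughly like the sketch below. Everything here is hypothetical: the issue does not say how error records are stored, so the JSON file format, the field names, the TAIL_CHARS cap, and the function names are all assumptions. The point is only that the writer keeps a bounded tail of the command output, and the reader returns a degraded payload instead of raising when the record is missing or corrupt.

```python
import json
from pathlib import Path

TAIL_CHARS = 4000  # hypothetical cap on how much command output is kept


def tail(text, limit=TAIL_CHARS):
    """Keep only the last `limit` characters of captured output."""
    return text[-limit:]


def write_error_record(path, stage, returncode, output):
    """Persist a failure record (assumed here to be a JSON file)."""
    record = {
        "stage": stage,
        "returncode": returncode,
        "output_tail": tail(output),
    }
    Path(path).write_text(json.dumps(record), encoding="utf-8")


def read_latest_error(path):
    """Return the latest error record, or a marker if it cannot be read.

    OSError covers missing or unreadable files; ValueError covers both
    json.JSONDecodeError and UnicodeDecodeError, which subclass it.
    """
    try:
        text = Path(path).read_text(encoding="utf-8")
        return {"available": True, "record": json.loads(text)}
    except (OSError, ValueError) as exc:
        return {"available": False, "reason": f"{type(exc).__name__}: {exc}"}
```

A status handler built this way can still report the rest of the pipeline state when the error record is unreadable, and can surface the reason string alongside it.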

Out of scope

  • Fixing the underlying Extended CPS generation failure itself.
  • Reworking Stage 1 subprocess execution into direct library calls.
  • Changing the broader pipeline status dashboard architecture.
