Add provider-level progress, heartbeats, and clean interrupt handling to source loading #201

During a full `pe_us_data_rebuild_checkpoint` pipeline build after merging `origin/main` into `codex/fix-146-narrow-lazy-imports`, the build reached `02_source_loading` and then remained there for roughly 2h55m without any provider-level progress or manifest heartbeat.

Observed state:

- `01_run_profile` completed.
- `02_source_loading` started at `2026-06-03T15:49:38Z`.
- The build never reached `03_source_planning`.
- `stage_artifacts/manifests/02_source_loading.json` still showed `status: running`.
- `updatedAt` remained equal to `startedAt`.
- No completed outputs were present.
- Required outputs were still missing:
  - `observation_frame_summary`
  - `source_descriptors`
  - `source_relationships`
- After manual termination, the manifest remained in `running` state with no failure/interruption reason.

This means source loading is currently difficult to diagnose: after a long runtime, we cannot tell whether it is making expected progress, stuck on a specific provider, retrying a cache/download path, or spending time in a pathological slow path.

Recommended fix:

1. Add provider-level source-loading progress events, at least:
   - provider started
   - provider completed
   - provider failed
   - elapsed time
   - row/entity counts where available
   - cache/download paths where relevant
2. Heartbeat `02_source_loading.json` periodically and after each provider, recording the current provider and the last successful provider.
3. Persist partial per-provider summaries so reruns are diagnosable without restarting blind.
4. Handle `SIGTERM` (via a signal handler) and `KeyboardInterrupt` in the stage runtime or stage writer, and mark the active stage as failed/interrupted with a timestamp and reason instead of leaving it as `running`.
5. Add unit tests for heartbeat updates and interrupted-stage failure recording.
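Items 1 to 3 could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the pipeline's actual stage writer: `StageManifest`, `start_heartbeat`, `load_sources`, and the manifest fields `currentProvider`, `lastSuccessfulProvider`, `providers`, `elapsedS`, and `failureReason` are all hypothetical names. Only `status`, `startedAt`, and `updatedAt` come from the observed manifest. Each provider is assumed to be a callable that returns its row/entity counts and any cache/download paths as a dict.

```python
import json
import logging
import os
import tempfile
import threading
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable

log = logging.getLogger("source_loading")  # hypothetical logger name


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")


class StageManifest:
    """Hypothetical writer for stage_artifacts/manifests/<stage>.json."""

    def __init__(self, path: Path, stage: str) -> None:
        self.path = path
        self._lock = threading.Lock()
        now = utc_now()
        self.data: dict[str, Any] = {
            "stage": stage, "status": "running", "startedAt": now, "updatedAt": now,
            "currentProvider": None, "lastSuccessfulProvider": None,
            "providers": {},  # partial per-provider summaries, persisted as they change
        }
        self.write()

    def write(self, **changes: Any) -> None:
        """Apply top-level changes, bump updatedAt, and atomically rewrite the file."""
        with self._lock:
            self.data.update(changes)
            self.data["updatedAt"] = utc_now()
            self.path.parent.mkdir(parents=True, exist_ok=True)
            # Temp file + os.replace so readers never see a half-written manifest.
            fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
            with os.fdopen(fd, "w") as fh:
                json.dump(self.data, fh, indent=2)
            os.replace(tmp, self.path)

    def record_provider(self, name: str, **summary: Any) -> None:
        """Merge fields into one provider's summary and persist immediately."""
        with self._lock:
            self.data["providers"].setdefault(name, {}).update(summary)
        self.write()


def start_heartbeat(manifest: StageManifest, interval_s: float) -> threading.Event:
    """Rewrite the manifest every interval_s seconds until the returned event is set."""
    stop = threading.Event()

    def beat() -> None:
        while not stop.wait(interval_s):  # wait() returns True once stop is set
            manifest.write()

    threading.Thread(target=beat, daemon=True).start()
    return stop


def load_sources(manifest: StageManifest,
                 providers: dict[str, Callable[[], dict[str, Any]]],
                 heartbeat_s: float = 30.0) -> None:
    stop = start_heartbeat(manifest, heartbeat_s)
    try:
        for name, load in providers.items():
            started = time.monotonic()
            log.info("provider started: %s", name)
            manifest.write(currentProvider=name)
            manifest.record_provider(name, status="started", startedAt=utc_now())
            try:
                summary = load()  # e.g. {"rows": 123, "cachePath": "..."}
            except Exception as exc:
                elapsed = round(time.monotonic() - started, 3)
                log.error("provider failed: %s after %.1fs: %r", name, elapsed, exc)
                manifest.record_provider(name, status="failed", elapsedS=elapsed, error=repr(exc))
                manifest.write(status="failed", failureReason=f"provider {name} failed: {exc!r}")
                raise
            elapsed = round(time.monotonic() - started, 3)
            log.info("provider completed: %s in %.1fs %s", name, elapsed, summary)
            manifest.record_provider(name, status="completed", elapsedS=elapsed, **summary)
            manifest.write(lastSuccessfulProvider=name)
        manifest.write(status="completed", currentProvider=None)
    finally:
        stop.set()
```

Because the heartbeat thread only bumps `updatedAt`, a provider that is slow or stuck shows a fresh `updatedAt` with an unchanged `currentProvider`, while a dead process leaves `updatedAt` stale. That distinction was unavailable in the observed run, where `updatedAt` never moved past `startedAt`.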

Notably, the failure did not surface as a Python traceback or an obvious missing dependency. The first blocker was source-loading observability and clean failure recording during a long-running full build.
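For the clean-failure-recording side (item 4), one possible shape is a context manager around the stage body, sketched below under assumptions. `interruptible_stage`, `StageTerminated`, and the manifest fields `interruptedAt`, `failedAt`, and `failureReason` are hypothetical, as is the choice of `status: interrupted`. Python cannot catch `SIGTERM` as an exception directly, so the sketch installs a handler that raises one; `StageTerminated` subclasses `BaseException` (like `KeyboardInterrupt`) so provider code that catches `Exception` does not swallow it. `signal.signal()` only works in the main thread.

```python
import json
import os
import signal
from contextlib import contextmanager
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Iterator


class StageTerminated(BaseException):
    """Hypothetical exception raised by the SIGTERM handler below."""


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")


def _update_manifest(path: Path, **changes: Any) -> None:
    """Merge changes into the stage manifest and atomically rewrite it."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data.update(changes, updatedAt=_utc_now())
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2))
    os.replace(tmp, path)


@contextmanager
def interruptible_stage(manifest_path: Path) -> Iterator[None]:
    """Record interruption or failure in the manifest instead of leaving status=running."""

    def on_sigterm(signum: int, frame: Any) -> None:
        # Convert the signal into an exception so the stage unwinds through the except below.
        raise StageTerminated(f"received {signal.Signals(signum).name}")

    previous = signal.signal(signal.SIGTERM, on_sigterm)  # main thread only
    try:
        yield
    except (KeyboardInterrupt, StageTerminated) as exc:
        reason = "KeyboardInterrupt" if isinstance(exc, KeyboardInterrupt) else str(exc)
        _update_manifest(manifest_path, status="interrupted",
                         interruptedAt=_utc_now(), failureReason=reason)
        raise
    except Exception as exc:
        _update_manifest(manifest_path, status="failed",
                         failedAt=_utc_now(), failureReason=repr(exc))
        raise
    finally:
        # signal.signal() returns None if the old handler was not installed from Python.
        signal.signal(signal.SIGTERM, previous if previous is not None else signal.SIG_DFL)
```

The exception is always re-raised after the manifest is updated, so the process still exits with the interrupt or error; the only change is that `02_source_loading.json` no longer claims `running` afterwards. The same paths give the unit tests in item 5 something concrete to assert on: raise `KeyboardInterrupt` or `StageTerminated` inside the block and check the recorded `status` and `failureReason`.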
