
feat(pipeline_status): v0.36.0 — Option-D execution-picker substrate #75

Merged
cipher813 merged 1 commit into main from feat/pipeline-status-role-filter-execution-picker
May 25, 2026

Conversation

@cipher813
Owner

Summary

Lib half of the Option-D plan (Brian-approved 2026-05-25 evening) to fix page-25's "smoke runs displace the real weekly run" problem. Tonight's scanner-smoke-l1995 retry displaced the real Saturday weekly run as "most recent" on page 25, and the same class of problem will recur whenever anyone launches an ad-hoc execution. This PR adds the lib substrate; the alpha-engine-data and alpha-engine-dashboard PRs follow.

What's new

Schema

  • PipelineRun.pipeline_role: Optional[str] — populated from input.pipeline_role on the execution's input JSON. Dashboard renders it in section header so operator always knows whether the rendered execution is the canonical cadence run (weekly / daily / eod) or a smoke / recovery / operator-replay overlay.
  • New PipelineExecutionSummary for lightweight dropdown rows.

APIs

  • read_pipeline_state(arn, *, role_filter=None, execution_arn=None, search_limit=50, client=None) — three call paths:
    1. Default (no kwargs): most-recent execution per ListExecutions(maxResults=1) — backwards-compatible.
    2. role_filter={"weekly"}: walks ListExecutions pages, calling DescribeExecution on each, until finding one whose input.pipeline_role ∈ filter. Bounded by search_limit; raises SFNNoExecutions with a filter-named message on exhaustion.
    3. execution_arn=<specific arn>: fetches that execution directly (bypasses ListExecutions). For the dashboard's dropdown "click a row" path.
  • list_recent_pipeline_runs(arn, *, limit=10, role_filter=None, client=None) → list[PipelineExecutionSummary]: lightweight summaries for the operator dropdown's at-a-glance smoke-vs-weekly distinction.
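
For reviewers who want the shape of call path 2 without opening the diff, here is a minimal sketch of a role-filter walk. It is not the code in this PR: `find_execution_arn_by_role`, `NoMatchingExecution` (a stand-in for SFNNoExecutions) and `StubSFN` are hypothetical names. The only real API surface assumed is the boto3 Step Functions client methods `list_executions` and `describe_execution`, with ListExecutions returning newest-first.

```python
import json
from typing import Any, Optional


class NoMatchingExecution(Exception):
    """Hypothetical stand-in for the library's SFNNoExecutions."""


def find_execution_arn_by_role(
    client: Any, state_machine_arn: str, role_filter: set, search_limit: int = 50
) -> str:
    """Return the newest execution ARN whose input.pipeline_role is in role_filter."""
    walked = 0
    next_token: Optional[str] = None
    while walked < search_limit:
        kwargs = {
            "stateMachineArn": state_machine_arn,
            # Never fetch more rows than the remaining search budget.
            "maxResults": min(100, search_limit - walked),
        }
        if next_token:
            kwargs["nextToken"] = next_token
        page = client.list_executions(**kwargs)
        for item in page.get("executions", []):
            walked += 1
            detail = client.describe_execution(executionArn=item["executionArn"])
            try:
                payload = json.loads(detail.get("input") or "{}")
            except ValueError:
                payload = {}  # malformed input: treat as untagged and keep walking
            role = payload.get("pipeline_role") if isinstance(payload, dict) else None
            if isinstance(role, str) and role in role_filter:
                return item["executionArn"]
        next_token = page.get("nextToken")
        if not next_token:
            break  # history exhausted before the budget was
    raise NoMatchingExecution(
        f"no execution with pipeline_role in {sorted(role_filter)} "
        f"among the {walked} most recent executions"
    )


class StubSFN:
    """In-memory stand-in for a boto3 Step Functions client (illustration only)."""

    def __init__(self, inputs):
        self._inputs = inputs  # list of (execution_arn, raw_input), newest first
        self.calls = 0

    def list_executions(self, **kwargs):
        self.calls += 1
        rows = self._inputs[: kwargs["maxResults"]]
        return {"executions": [{"executionArn": arn} for arn, _ in rows]}

    def describe_execution(self, executionArn):
        self.calls += 1
        return {"executionArn": executionArn, "input": dict(self._inputs)[executionArn]}


stub = StubSFN([
    ("arn:smoke", '{"pipeline_role": "smoke"}'),
    ("arn:untagged", "{}"),  # pre-Option-D execution with no role tag
    ("arn:weekly", '{"pipeline_role": "weekly"}'),
])
match = find_execution_arn_by_role(stub, "arn:state-machine", {"weekly"})
print(match, stub.calls)
```

In this sketch the smoke and untagged executions are skipped, and the walk stops at the first weekly match, so the stub sees one ListExecutions call plus three DescribeExecution calls.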

Helpers

  • _extract_pipeline_role(describe_resp) — parses input JSON, permissive on malformed input (WARN + return None per feedback_no_silent_fails so a bad input doesn't blackhole the page).
  • Refactored _build_pipeline_run_from_execution_arn shared between the role-filter walk and explicit-arn path.
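
A rough sketch of what a permissive role extractor could look like, covering the cases named in the test plan (happy / missing / malformed JSON / array-shape / empty-string). This is an illustration, not the `_extract_pipeline_role` in this PR: the function name, the log wording, and the choice to treat an empty string as "no role" are all assumptions.

```python
import json
import logging
from typing import Any, Mapping, Optional

logger = logging.getLogger(__name__)


def extract_pipeline_role(describe_resp: Mapping[str, Any]) -> Optional[str]:
    """Best-effort read of input.pipeline_role from a DescribeExecution response."""
    raw = describe_resp.get("input")
    if not raw:
        return None  # no input recorded on this execution
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError) as exc:
        # Warn rather than raise, so one bad input cannot take the page down.
        logger.warning(
            "unparseable input on %s: %s", describe_resp.get("executionArn"), exc
        )
        return None
    if not isinstance(payload, dict):
        # Valid JSON but not an object (for example a top-level array).
        logger.warning(
            "non-object input on %s: %s",
            describe_resp.get("executionArn"),
            type(payload).__name__,
        )
        return None
    role = payload.get("pipeline_role")
    if not isinstance(role, str) or not role.strip():
        return None  # missing, wrong type, or empty string (assumed: untagged)
    return role.strip()


print(extract_pipeline_role({"input": '{"pipeline_role": "weekly"}'}))
print(extract_pipeline_role({"input": "{not json"}))
```

Returning None on every failure path keeps untagged and malformed executions indistinguishable to the caller, which is what lets a role-filter walk simply skip them.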

Cost analysis

The role-filter walk costs N+1 boto3 calls (ListExecutions + N DescribeExecution) where N is the number of executions walked. Typical cron-cadence: 1-3 executions before hitting a weekly match. Smoke-heavy windows: bounded by search_limit=50 (~50 calls worst case).

list_recent_pipeline_runs(limit=10) costs ~11 calls per page render — within Step Functions' 25-TPS DescribeExecution soft limit.
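
The "~11 calls" figure can be checked with a counting stub. The sketch below is an assumption-laden illustration rather than the library code: `ExecutionSummary`, `summarize_recent_runs` and `CountingStub` are hypothetical names, and the field set follows the "name + status + start/duration + pipeline_role" description. It relies only on the `name`, `status`, `startDate` and `stopDate` keys that ListExecutions returns per execution.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, List, Optional


@dataclass(frozen=True)
class ExecutionSummary:
    """Hypothetical lightweight row for the operator dropdown."""
    name: str
    status: str
    started_at: datetime
    duration: Optional[timedelta]
    pipeline_role: Optional[str]


def summarize_recent_runs(client: Any, state_machine_arn: str, limit: int = 10) -> List[ExecutionSummary]:
    """One ListExecutions call, then one DescribeExecution call per row."""
    page = client.list_executions(stateMachineArn=state_machine_arn, maxResults=limit)
    rows = []
    for item in page["executions"]:
        detail = client.describe_execution(executionArn=item["executionArn"])
        try:
            payload = json.loads(detail.get("input") or "{}")
        except ValueError:
            payload = {}
        role = payload.get("pipeline_role") if isinstance(payload, dict) else None
        stop = item.get("stopDate")  # absent while the execution is still running
        rows.append(ExecutionSummary(
            name=item["name"],
            status=item["status"],
            started_at=item["startDate"],
            duration=(stop - item["startDate"]) if stop else None,
            pipeline_role=role if isinstance(role, str) else None,
        ))
    return rows


class CountingStub:
    """In-memory client that counts API calls (illustration only)."""

    def __init__(self, n: int):
        t0 = datetime(2026, 5, 23, tzinfo=timezone.utc)
        self.calls = 0
        self._items = [
            {"executionArn": f"arn:exec:{i}", "name": f"run-{i}", "status": "SUCCEEDED",
             "startDate": t0, "stopDate": t0 + timedelta(minutes=5)}
            for i in range(n)
        ]

    def list_executions(self, stateMachineArn, maxResults):
        self.calls += 1
        return {"executions": self._items[:maxResults]}

    def describe_execution(self, executionArn):
        self.calls += 1
        return {"input": '{"pipeline_role": "weekly"}'}


stub = CountingStub(25)
rows = summarize_recent_runs(stub, "arn:state-machine", limit=10)
print(len(rows), stub.calls)
```

With 25 executions in history and limit=10, the stub records 1 ListExecutions call and 10 DescribeExecution calls, matching the N+1 pattern described above.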

Consumer rollout

  • alpha-engine-data (next PR): inject pipeline_role into EventBridge cron rule inputs for the 3 SFs; document the taxonomy and naming convention for ad-hoc operator launches.
  • alpha-engine-dashboard (after data merges): bump pin v0.35.1 → v0.36.0; flip pipeline_status_loader to call read_pipeline_state(arn, role_filter={canonical_role_for_this_sf}); add "View other recent executions" Streamlit disclosure backed by list_recent_pipeline_runs.
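
One way the alpha-engine-data side could tag cron-launched executions, sketched here only to make the contract concrete. It assumes the rules are managed through boto3's EventBridge `put_targets` call, which may not be how that repo provisions them. `build_sfn_target`, the target id, and the example ARNs are hypothetical. The only part taken from this PR is that the role lives at `pipeline_role` in the execution's input JSON.

```python
import json
from typing import Any, Dict, Optional


def build_sfn_target(
    target_id: str,
    state_machine_arn: str,
    invoke_role_arn: str,
    pipeline_role: str,
    base_input: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Build one EventBridge target entry whose Input carries pipeline_role."""
    payload = dict(base_input or {})  # keep any keys the rule already passes
    payload["pipeline_role"] = pipeline_role
    return {
        "Id": target_id,
        "Arn": state_machine_arn,
        "RoleArn": invoke_role_arn,
        "Input": json.dumps(payload, sort_keys=True),
    }


# Placeholder ARNs for illustration only.
target = build_sfn_target(
    target_id="weekly-pipeline",
    state_machine_arn="arn:aws:states:us-east-1:123456789012:stateMachine:example-weekly",
    invoke_role_arn="arn:aws:iam::123456789012:role/example-eventbridge-invoke",
    pipeline_role="weekly",
    base_input={"dry_run": False},
)
print(target["Input"])

# Applying it would look roughly like this (not executed here):
#   boto3.client("events").put_targets(Rule="<rule name>", Targets=[target])
```

Merging into the existing input rather than replacing it keeps any parameters the cron rule already passes to the state machine.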

Test plan

  • 16 new unit tests covering role-extractor (happy / missing / malformed JSON / array-shape / empty-string) + role_filter walk (match-on-first / match-after-skip / no-match-raises / pre-Option-D-untagged-fallthrough) + execution_arn-direct-path + list_recent_pipeline_runs (with/without role_filter / empty-history)
  • Full lib suite: 795 passed, 7 warnings in 2.43s
  • alpha-engine-data PR follows
  • alpha-engine-dashboard PR follows after data merges

🤖 Generated with Claude Code

Adds the lib half of the Option-D plan to fix page-25's "smoke runs
displace the real weekly run" problem:

- New ``pipeline_role: Optional[str]`` field on PipelineRun, populated
  from ``input.pipeline_role`` on the execution's input JSON. The
  dashboard's section header shows this so the operator always knows
  whether the rendered execution is the canonical cadence run (weekly /
  daily / eod) or a smoke / recovery / operator-replay overlay.

- ``read_pipeline_state(arn, *, role_filter=None, execution_arn=None,
  search_limit=50)`` — three call paths:
  1. Default (no kwargs): most-recent execution per
     ListExecutions(maxResults=1) — backwards-compatible.
  2. role_filter={"weekly"}: walks ListExecutions pages, calling
     DescribeExecution on each, until finding one whose
     input.pipeline_role ∈ filter set. Bounded by search_limit; raises
     SFNNoExecutions with a filter-named message on exhaustion.
  3. execution_arn=<specific arn>: fetches that execution directly
     (bypasses ListExecutions). Used by the dashboard's dropdown
     "click a row to inspect this execution" path.

- ``list_recent_pipeline_runs(arn, *, limit=10, role_filter=None)`` →
  list[PipelineExecutionSummary]: lightweight per-execution summaries
  (name + status + start/duration + pipeline_role) for the operator
  dropdown's at-a-glance smoke-vs-weekly distinction. Optional
  role_filter filters rows inside the library, before they reach the
  dashboard.

Per-execution role extraction is permissive on malformed input JSON
(WARN + return None per feedback_no_silent_fails) so a single bad
input doesn't blackhole the page.

Refactored the read body into ``_build_pipeline_run_from_execution_arn``
so both the role-filter walk and the explicit-arn path share the same
DescribeExecution + GetExecutionHistory + materialize pipeline.

Consumer rollout:
- alpha-engine-data (next PR): inject pipeline_role into EventBridge
  cron rule inputs for Saturday / Weekday / EOD SFs; document the
  taxonomy and naming convention for ad-hoc operator launches.
- alpha-engine-dashboard (after data merges): bump pin, flip
  pipeline_status_loader to call read_pipeline_state(arn,
  role_filter={canonical_role_for_this_sf}), add "View other recent
  executions" Streamlit disclosure backed by list_recent_pipeline_runs.

16 new unit tests cover the role-extractor + role_filter walk +
execution_arn path + list_recent_pipeline_runs (happy / no-match /
malformed / pre-Option-D-untagged-fallthrough).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>