feat(pipeline_status): v0.36.0 — Option-D execution-picker substrate#75
Merged
cipher813 merged 1 commit intoMay 25, 2026
Adds the lib half of the Option-D plan to fix page-25's "smoke runs
displace the real weekly run" problem:
- New ``pipeline_role: Optional[str]`` field on PipelineRun, populated
from ``input.pipeline_role`` on the execution's input JSON. The
dashboard's section header shows this so the operator always knows
whether the rendered execution is the canonical cadence run (weekly /
daily / eod) or a smoke / recovery / operator-replay overlay.
- ``read_pipeline_state(arn, *, role_filter=None, execution_arn=None,
search_limit=50)`` — three call paths:
1. Default (no kwargs): most-recent execution per
ListExecutions(maxResults=1) — backwards-compatible.
2. role_filter={"weekly"}: walks ListExecutions pages, calling
DescribeExecution on each, until finding one whose
input.pipeline_role ∈ filter set. Bounded by search_limit; raises
SFNNoExecutions with a filter-named message on exhaustion.
3. execution_arn=<specific arn>: fetches that execution directly
(bypasses ListExecutions). Used by the dashboard's dropdown
"click a row to inspect this execution" path.
- ``list_recent_pipeline_runs(arn, *, limit=10, role_filter=None)`` →
list[PipelineExecutionSummary]: lightweight per-execution summaries
(name + status + start/duration + pipeline_role) for the operator
dropdown's at-a-glance smoke-vs-weekly distinction. Optional
role_filter pre-filters server-side.
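A minimal sketch of how the three call paths above might pick an execution; it is not the library's actual code. `find_execution_arn`, `NoMatchingExecution`, and `_role_of` are hypothetical names introduced for illustration. `client` is assumed to be a boto3 Step Functions client (or a stub exposing the same `list_executions` / `describe_execution` methods).

```python
import json
from typing import Iterable, Optional


class NoMatchingExecution(LookupError):
    """Hypothetical stand-in for the library's SFNNoExecutions."""


def _role_of(describe_resp: dict) -> Optional[str]:
    """Read input.pipeline_role; None when absent or malformed."""
    try:
        payload = json.loads(describe_resp.get("input") or "{}")
    except (TypeError, ValueError):
        return None
    role = payload.get("pipeline_role") if isinstance(payload, dict) else None
    return role if isinstance(role, str) else None


def find_execution_arn(
    client,
    state_machine_arn: str,
    *,
    role_filter: Optional[Iterable[str]] = None,
    execution_arn: Optional[str] = None,
    search_limit: int = 50,
) -> str:
    # Path 3: an explicit ARN bypasses ListExecutions entirely.
    if execution_arn is not None:
        return execution_arn

    # Path 1: no filter, so take the single most recent execution.
    if role_filter is None:
        page = client.list_executions(
            stateMachineArn=state_machine_arn, maxResults=1
        )
        if not page["executions"]:
            raise NoMatchingExecution("state machine has no executions")
        return page["executions"][0]["executionArn"]

    # Path 2: walk pages newest-first, describing each execution,
    # and stop at the first one whose role is in the filter set.
    wanted = set(role_filter)
    examined = 0
    kwargs = {"stateMachineArn": state_machine_arn}
    while examined < search_limit:
        page = client.list_executions(**kwargs)
        for item in page["executions"]:
            if examined >= search_limit:
                break
            examined += 1
            detail = client.describe_execution(executionArn=item["executionArn"])
            if _role_of(detail) in wanted:
                return item["executionArn"]
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token

    raise NoMatchingExecution(
        f"no execution with pipeline_role in {sorted(wanted)} "
        f"among the {examined} most recent"
    )
```

In this sketch an untagged or malformed execution simply never matches a filter, so the walk steps past it toward older executions until `search_limit` is reached.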
Per-execution role extraction is permissive on malformed input JSON
(WARN + return None per feedback_no_silent_fails) so a single bad
input doesn't blackhole the page.
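The permissive extraction described above could look roughly like the following. This is a sketch under assumptions, not the library's `_extract_pipeline_role`: the function name, logger name, and the exact cases that warn are illustrative. The only grounded behavior is "WARN and return None on malformed input".

```python
import json
import logging
from typing import Any, Mapping, Optional

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("pipeline_status_sketch")


def extract_pipeline_role(describe_resp: Mapping[str, Any]) -> Optional[str]:
    """Return input.pipeline_role, or None if it is absent or unusable."""
    arn = describe_resp.get("executionArn")
    raw = describe_resp.get("input")
    if raw is None:
        # No input recorded at all: treat as untagged, nothing to warn about.
        return None
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError) as exc:
        # Malformed JSON: surface it in logs, but do not fail the page.
        logger.warning("execution %s: input is not valid JSON (%s)", arn, exc)
        return None
    if not isinstance(payload, dict):
        logger.warning("execution %s: input JSON is not an object", arn)
        return None
    role = payload.get("pipeline_role")
    if role is None:
        # Well-formed but untagged (for example, a pre-Option-D execution).
        return None
    if not isinstance(role, str):
        logger.warning("execution %s: pipeline_role is not a string", arn)
        return None
    return role
```

Returning None rather than raising keeps one bad execution from hiding every other row, while the warning leaves a trace for the operator.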
Refactored the read body into ``_build_pipeline_run_from_execution_arn``
so both the role-filter walk and the explicit-arn path share the same
DescribeExecution + GetExecutionHistory + materialize pipeline.
Consumer rollout:
- alpha-engine-data (next PR): inject pipeline_role into EventBridge
cron rule inputs for Saturday / Weekday / EOD SFs; document the
taxonomy and naming convention for ad-hoc operator launches.
- alpha-engine-dashboard (after data merges): bump pin, flip
pipeline_status_loader to call read_pipeline_state(arn,
role_filter={canonical_role_for_this_sf}), add "View other recent
executions" Streamlit disclosure backed by list_recent_pipeline_runs.
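For the alpha-engine-data step, one way to inject the role is through the `Input` field of each EventBridge rule target. The sketch below only builds the target dictionaries in the shape accepted by boto3's `events.put_targets`. The `CANONICAL_ROLE` mapping, the `build_target` helper, and the target `Id` format are assumptions made for illustration; the actual taxonomy is left to the follow-up PR.

```python
import json
from typing import Any, Dict, Optional

# Assumed mapping from state machine to canonical role. The source names the
# role values (weekly / daily / eod) but does not spell out this mapping.
CANONICAL_ROLE = {"saturday": "weekly", "weekday": "daily", "eod": "eod"}


def build_target(
    sf_key: str,
    state_machine_arn: str,
    invoke_role_arn: str,
    extra_input: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Build one EventBridge target whose constant input carries pipeline_role."""
    payload = dict(extra_input or {})
    payload["pipeline_role"] = CANONICAL_ROLE[sf_key]
    return {
        "Id": f"{sf_key}-pipeline",  # hypothetical target id
        "Arn": state_machine_arn,
        "RoleArn": invoke_role_arn,
        # EventBridge expects the constant input as a JSON string.
        "Input": json.dumps(payload, sort_keys=True),
    }
```

A caller would pass the result as `Targets=[build_target(...)]` to `put_targets`. Ad-hoc operator launches would set their own `pipeline_role` (for example a smoke or recovery value) in the execution input.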
16 new unit tests cover the role-extractor + role_filter walk +
execution_arn path + list_recent_pipeline_runs (happy / no-match /
malformed / pre-Option-D-untagged-fallthrough).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Lib half of the Option-D plan (Brian-approved 2026-05-25 evening) to fix page-25's "smoke runs displace the real weekly run" problem. Tonight's scanner-smoke-l1995 retry displaced the real Saturday weekly run as "most recent" on page 25; the same class of problem will recur whenever anyone launches an ad-hoc execution. This PR adds the lib substrate; the alpha-engine-data and alpha-engine-dashboard changes follow.
What's new
Schema
- PipelineRun.pipeline_role: Optional[str], populated from input.pipeline_role on the execution's input JSON. The dashboard renders it in the section header so the operator always knows whether the rendered execution is the canonical cadence run (weekly / daily / eod) or a smoke / recovery / operator-replay overlay.
- PipelineExecutionSummary for lightweight dropdown rows.
APIs
- read_pipeline_state(arn, *, role_filter=None, execution_arn=None, search_limit=50, client=None), with three call paths:
  1. Default (no kwargs): most-recent execution per ListExecutions(maxResults=1); backwards-compatible.
  2. role_filter={"weekly"}: walks ListExecutions pages, calling DescribeExecution on each execution, until it finds one whose input.pipeline_role ∈ filter. Bounded by search_limit; raises SFNNoExecutions with a filter-named message on exhaustion.
  3. execution_arn=<specific arn>: fetches that execution directly (bypasses ListExecutions). For the dashboard's dropdown "click a row" path.
- list_recent_pipeline_runs(arn, *, limit=10, role_filter=None, client=None) → list[PipelineExecutionSummary] for the operator dropdown's at-a-glance smoke-vs-weekly distinction.
Helpers
- _extract_pipeline_role(describe_resp): parses the input JSON; permissive on malformed input (WARN + return None per feedback_no_silent_fails, so a bad input doesn't blackhole the page).
- _build_pipeline_run_from_execution_arn: shared between the role-filter walk and the explicit-arn path.
Cost analysis
The role-filter walk costs N+1 boto3 calls (ListExecutions + N DescribeExecution), where N is the number of executions walked. Typical cron cadence: 1-3 executions before hitting a weekly match. Smoke-heavy windows: bounded by search_limit=50 (~50 calls worst case). list_recent_pipeline_runs(limit=10) costs ~11 calls per page render, within Step Functions' 25-TPS DescribeExecution soft limit.
Consumer rollout
- alpha-engine-data (next PR): inject pipeline_role into EventBridge cron rule inputs for the 3 SFs; document the taxonomy and naming convention for ad-hoc operator launches.
- alpha-engine-dashboard (after data merges): bump pin, flip pipeline_status_loader to call read_pipeline_state(arn, role_filter={canonical_role_for_this_sf}); add "View other recent executions" Streamlit disclosure backed by list_recent_pipeline_runs.
Test plan
- list_recent_pipeline_runs (with/without role_filter / empty-history)
- 795 passed, 7 warnings in 2.43s
🤖 Generated with Claude Code