
fix(prompts): tighten safe-outputs MCP deadline after news workflow timed out at 33 min #1960

Merged
pethers merged 6 commits into main from copilot/debug-workflow-failure-again
Apr 23, 2026

Contributor

Copilot AI commented Apr 23, 2026

Fixes the News: Interpellation Debates failure (run 24821837975).

Fix (prompt-only, runtime-imported — no .lock.yml recompile)

  • 07-commit-and-pr.md §Deadline enforcement — two-timer model (Timer A Copilot ~60 min, Timer B safeoutputs ~25–30 min); Analysis hard deadline 48 → 30 min, target 22–27 min; emergency [early-pr] recipe; non-retry rule on session not found
  • 01-bash-and-shell-safety.md §Banned expansion patterns — AWF v0.69.3 false-positive patterns with safe rewrites; while IFS= read -r f
  • 00-base-contract.md §Session keepalive requirement — synced to 22–27 min / 30 min, points at 07 as authoritative
  • Round 2 review: prefix tool names with safeoutputs___ in Timer B description (line 107) and emergency step 5 (line 130) for consistency with the rest of the prompt module

Copilot AI linked an issue Apr 23, 2026 that may be closed by this pull request
@pethers pethers marked this pull request as ready for review April 23, 2026 08:29
Copilot AI review requested due to automatic review settings April 23, 2026 08:29
Contributor

Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.

@github-actions github-actions Bot added the size-xs Extra small change (< 10 lines) label Apr 23, 2026
@github-actions
Contributor

🏷️ Automatic Labeling Summary

This PR has been automatically labeled based on the files changed and PR metadata.

Applied Labels: size-xs

Label Categories

  • 🗳️ Content: news, dashboard, visualization, intelligence
  • 💻 Technology: html-css, javascript, workflow, security
  • 📊 Data: cia-data, riksdag-data, data-pipeline, schema
  • 🌍 I18n: i18n, translation, rtl
  • 🔒 ISMS: isms, iso-27001, nist-csf, cis-controls
  • 🏗️ Infrastructure: ci-cd, deployment, performance, monitoring
  • 🔄 Quality: testing, accessibility, documentation, refactor
  • 🤖 AI: agent, skill, agentic-workflow

For more information, see .github/labeler.yml.

@github-actions
Contributor

🔍 Lighthouse Performance Audit

| Category | Score | Status |
|---|---|---|
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

… run timed out at 33 min

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/fec34cd8-9f27-4f13-895d-0505ba3ed72e

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
@github-actions github-actions Bot added documentation Documentation updates size-m Medium change (50-250 lines) labels Apr 23, 2026

Copilot AI changed the title [WIP] Fix workflow failure in News: Interpellation Debates fix(prompts): tighten safe-outputs MCP deadline after news workflow timed out at 33 min Apr 23, 2026
Copilot AI requested a review from pethers April 23, 2026 08:43
@pethers pethers requested a review from Copilot April 23, 2026 08:47

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| Chained builder assignments that progressively construct a command substitution (`a=foo; b="$a"bar; c=$($b)`) | Same staged-injection shape, just spread across multiple statements. | Construct commands as arrays, invoke via `"${cmd[@]}"`; never re-parse a string as a command. |
| `eval` on variable contents (or eval-like constructs such as `bash -c "$var"`, `source /dev/stdin <<<"$var"`) | Direct arbitrary-code execution from data. | Never required for our workflows — refuse and rewrite using arrays, `case`, or explicit branches. |
| `echo "…text $(cmd) more text…"` with other `$(…)` elsewhere in the same `command` string | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) flags any `$(…)` that lives inside a double-quoted echo/printf string alongside a second unrelated `$(…)` as "nested command substitution" even when the two are not nested. This is a false positive but the block still fires. | Split into two lines: `RESULT=$(cmd); echo "…text $RESULT more text…"`. Prefer `printf '%s\n' "$RESULT"` over echo when the value may contain backslashes. |
| Bash arrays built inline and later expanded with `"${arr[@]}"` in the same `command` string, e.g. `REQ=(README.md foo.md); for f in "${REQ[@]}"; do …; done` | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) has flagged the `(…)` + `[@]` combination as a "dangerous expansion" even though the array only contains literal filenames. Treat it as blocked and rewrite. | Write the file list to a temp file and loop over that: `printf '%s\n' README.md foo.md > /tmp/req-$$ && while read f; do …; done < /tmp/req-$$`. For small fixed lists, unroll the loop: `for f in README.md foo.md; do …; done`. |
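
The `echo` row's rewrite (capture the command substitution first, then print a plain variable) can be sketched as below. This is a minimal illustration assuming bash; `REPORT_DIR`, `COUNT`, and `MESSAGE` are hypothetical names that do not appear in the prompt files, and the snippet cannot confirm what the AWF check will or will not flag.

```shell
#!/usr/bin/env bash
# Hypothetical scenario: report how many Markdown files a scratch directory holds.
REPORT_DIR=$(mktemp -d)
touch "$REPORT_DIR/a.md" "$REPORT_DIR/b.md"

# Shape described as flagged in the table row: a command substitution embedded
# inside a double-quoted echo string, e.g.
#   echo "found $(find "$REPORT_DIR" -name '*.md' | wc -l) files"

# Rewrite: run the command on its own line, then print the stored result.
COUNT=$(find "$REPORT_DIR" -name '*.md' -type f | wc -l | tr -d ' ')
MESSAGE="found $COUNT files"
printf '%s\n' "$MESSAGE"   # printf '%s' prints the value without interpreting backslashes

rm -rf "$REPORT_DIR"
```

Using `printf '%s\n'` rather than `echo` follows the table's own advice for values that may contain backslashes.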

Copilot AI Apr 23, 2026


In the temp-file loop safe-equivalent, while read f; do …; done should use read -r (and ideally IFS=) to avoid backslash-escape interpretation and unintended trimming. Even if the current examples use simple literal filenames, this pattern is likely to be copy/pasted for paths that may contain backslashes or leading/trailing whitespace.

Suggested change
| Bash arrays built inline and later expanded with `"${arr[@]}"` in the same `command` string, e.g. `REQ=(README.md foo.md); for f in "${REQ[@]}"; do …; done` | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) has flagged the `(…)` + `[@]` combination as a "dangerous expansion" even though the array only contains literal filenames. Treat it as blocked and rewrite. | Write the file list to a temp file and loop over that: `printf '%s\n' README.md foo.md > /tmp/req-$$ && while read f; do …; done < /tmp/req-$$`. For small fixed lists, unroll the loop: `for f in README.md foo.md; do …; done`. |
| Bash arrays built inline and later expanded with `"${arr[@]}"` in the same `command` string, e.g. `REQ=(README.md foo.md); for f in "${REQ[@]}"; do …; done` | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) has flagged the `(…)` + `[@]` combination as a "dangerous expansion" even though the array only contains literal filenames. Treat it as blocked and rewrite. | Write the file list to a temp file and loop over that: `printf '%s\n' README.md foo.md > /tmp/req-$$ && while IFS= read -r f; do …; done < /tmp/req-$$`. For small fixed lists, unroll the loop: `for f in README.md foo.md; do …; done`. |
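
The difference the review comment describes can be seen with a short sketch. It assumes bash (the behaviour is the same in POSIX `sh`); the sample file names and the variables `LIST`, `plain`, and `safe` are illustrative only.

```shell
#!/usr/bin/env bash
# Two sample entries: one containing a backslash, one with leading spaces.
LIST=$(mktemp)
printf '%s\n' 'dir\name.md' '  padded.md' > "$LIST"

# Plain read: the backslash is consumed as an escape and leading whitespace is trimmed.
plain=""
while read f; do plain="${plain}[${f}]"; done < "$LIST"

# IFS= read -r: each line is kept exactly as written.
safe=""
while IFS= read -r f; do safe="${safe}[${f}]"; done < "$LIST"

printf 'read:         %s\n' "$plain"   # [dirname.md][padded.md]
printf 'IFS= read -r: %s\n' "$safe"    # [dir\name.md][  padded.md]
rm -f "$LIST"
```

With literal names such as `README.md` both loops behave identically, which is why the issue only surfaces once the pattern is reused for arbitrary paths.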

Comment thread .github/prompts/07-commit-and-pr.md Outdated

### Keeping the Safe Outputs MCP session warm

Every `safeoutputs___*` tool is terminal (each runs at most once per workflow; additional calls produce workflow errors), so as of **April 2026** there is no cheap "ping" call you can issue against the safeoutputs MCP from the agent side. **The only reliable mitigation is to reach `safeoutputs___create_pull_request` before Timer B fires.** Plan Pass 1 + gate + commit to finish well inside the 30-minute hard deadline below. If a future gh-aw release publishes a safe, non-terminal touch path (e.g. a `tools/list` endpoint on the local safeoutputs HTTP server), update this section with the concrete command and its observed effect.

Copilot AI Apr 23, 2026


The statement that every safeoutputs___* tool is "terminal" / can run at most once per workflow is inaccurate in this repo’s compiled workflows: e.g. report_incomplete has a higher default max, and missing_data/missing_tool allow multiple calls. Consider rephrasing this to the specific constraint you rely on here (that create_pull_request/noop are limited and should be reserved for the end), rather than claiming all safeoutputs tools are single-shot/terminal.

Suggested change
Every `safeoutputs___*` tool is terminal (each runs at most once per workflow; additional calls produce workflow errors), so as of **April 2026** there is no cheap "ping" call you can issue against the safeoutputs MCP from the agent side. **The only reliable mitigation is to reach `safeoutputs___create_pull_request` before Timer B fires.** Plan Pass 1 + gate + commit to finish well inside the 30-minute hard deadline below. If a future gh-aw release publishes a safe, non-terminal touch path (e.g. a `tools/list` endpoint on the local safeoutputs HTTP server), update this section with the concrete command and its observed effect.
Do **not** use safe outputs as a keepalive strategy. In this workflow, `safeoutputs___create_pull_request` is limited to a single successful end-of-run call, and `safeoutputs___noop` is likewise reserved for the final "no files produced" outcome, so neither can be safely spent to keep the Safe Outputs MCP session alive. Some other `safeoutputs___*` tools may allow more than one call in compiled workflows, but they are not a documented or reliable heartbeat path for this prompt. **The only reliable mitigation is to reach `safeoutputs___create_pull_request` before Timer B fires.** Plan Pass 1 + gate + commit to finish well inside the 30-minute hard deadline below. If a future gh-aw release publishes a safe touch path for the local safeoutputs HTTP server (for example, an explicitly supported status or `tools/list` endpoint with verified keepalive behavior), update this section with the concrete command and its observed effect.

| Run 1 — Analysis | 40–45 min after agent start | **48 min** |
| Run 2 — Articles | 20–25 min after agent start | **30 min** |
These windows are tighter than the historical 48-min figure because Timer B fires first on the 23-artifact pipeline. The 30-min hard deadline leaves ~5 minutes of margin for staging, `git commit`, and the safeoutputs round-trip before Timer B has been observed to fire, and ~25 minutes of margin before Timer A.
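
As a rough sketch of how an agent-side script might check elapsed time against these windows, assuming the Run 1 figures quoted in this PR (target 22–27 min, hard deadline 30 min): the function name `pr_window_status`, the status labels, and the `AGENT_START_EPOCH` variable are all hypothetical and are not part of the prompt files.

```shell
#!/usr/bin/env bash
# Classify elapsed minutes against the Run 1 window from this PR:
# target 22-27 min, hard deadline 30 min. Labels are illustrative only.
pr_window_status() {
  local elapsed_min=$1
  if [ "$elapsed_min" -ge 30 ]; then
    echo "past-hard-deadline"
  elif [ "$elapsed_min" -ge 27 ]; then
    echo "late"                 # past the target window, still before 30 min
  elif [ "$elapsed_min" -ge 22 ]; then
    echo "in-target-window"
  else
    echo "before-target"
  fi
}

# AGENT_START_EPOCH is an assumed variable recorded once when the agent starts.
NOW=$(date +%s)
START=${AGENT_START_EPOCH:-$NOW}
ELAPSED_MIN=$(( (NOW - START) / 60 ))
pr_window_status "$ELAPSED_MIN"
```

Each command substitution sits on its own line, in keeping with the shell-safety rewrites discussed earlier in this thread.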


Copilot AI Apr 23, 2026


This section tightens Run 1’s PR window/hard deadline to 30 minutes, but .github/prompts/00-base-contract.md still instructs creating the PR before ~45 minutes and points readers here. To avoid conflicting guidance at runtime, please update the base-contract prompt (or add an explicit note here that it supersedes the older 45-minute guidance) so the imported prompt set is internally consistent.

Suggested change
> **Authoritative override:** For PR timing and hard deadlines, this section supersedes any older guidance imported from `.github/prompts/00-base-contract.md` that suggests creating the PR at around **45 minutes**. The operative deadline for both runs is **30 minutes**, with Run 1 targeting **22–27 minutes** and Run 2 targeting **20–25 minutes**.

@pethers
Member

pethers commented Apr 23, 2026

@copilot apply changes based on the comments in this thread

…, add authoritative override + sync 00-base-contract deadline

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/79d84cd8-124b-459f-8023-55b1201e90a2

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 23, 2026

> @copilot apply changes based on the comments in this thread

Applied all three review suggestions in 2b18ef6:

  • 01-bash-and-shell-safety.md: while read f → while IFS= read -r f in the temp-file-loop rewrite
  • 07-commit-and-pr.md: rephrased the over-broad "every safeoutputs___* is terminal" claim to call out only create_pull_request + noop as single-shot; added the Authoritative override note above the PR-creation-windows table
  • 00-base-contract.md: synced the session-keepalive paragraph to the new 22–27 min / 30 min window and pointed at 07-commit-and-pr.md as the authoritative source (supersedes the old ~45-min figure)


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread .github/prompts/07-commit-and-pr.md Outdated
Comment on lines +105 to +107
> **Timer A — Copilot API session (~60 min)**: The Copilot API session is bound to the `github.token` baked in at step start. That token expires at approximately **60 minutes** and is never refreshed mid-run (gh-aw issue #24920). Every tool call and inference request fails silently after that point — the agent appears to run but makes no progress and the PR is never created. Setup steps consume ~5 minutes, so the agent has at most **~55 minutes** of usable session time.
>
> **Timer B — Safe Outputs MCP idle session (~25–30 min, observed)**: The local Safe Outputs HTTP MCP tracks a per-agent Streamable HTTP session. If the agent goes **idle toward safeoutputs** for 25+ minutes (e.g. a long Pass 1 that only uses `edit` + `bash`), the session is dropped and every subsequent `safeoutputs___*` call returns `Error POSTing to endpoint: session not found` — including the final `create_pull_request`, `noop`, and `report_incomplete`. The `sandbox.mcp.keepalive-interval: 300` setting does **not** prevent this; that knob keeps the `mcp-gateway` upstream MCPs alive, not the safeoutputs HTTP server. Observed failure: run [`24821837975`](https://github.com/Hack23/riksdagsmonitor/actions/runs/24821837975) — 23 artifacts committed at ~33 min, all three safe-output calls rejected.

Copilot AI Apr 23, 2026


In the Timer B description, the tool names are referenced without the safeoutputs___ prefix (e.g. create_pull_request, noop, report_incomplete). Elsewhere in this prompt module the required names are safeoutputs___create_pull_request / safeoutputs___noop, so this inconsistency can mislead the agent under time pressure. Please align these references to the actual safeoutputs tool names consistently.

Comment thread .github/prompts/07-commit-and-pr.md Outdated
Comment on lines +128 to +130
3. **Commit** with message prefixed `[early-pr]` to signal partial content.
4. **Call** `safeoutputs___create_pull_request` once with label `analysis-only` if Pass 2 is incomplete or articles are missing.
5. If `create_pull_request` returns `session not found`, do **not** retry — the MCP session is gone. The work is lost for this run; the commit on disk is not persisted because the safe-outputs runner never saw it. Document the incident in the next run's methodology-reflection.
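
The non-retry rule in step 5 amounts to recognising one specific error string. A minimal sketch, assuming the agent has the tool's error text available as a shell string; the helper name `session_lost` is hypothetical, and only the `session not found` message itself comes from the prompt text.

```shell
#!/usr/bin/env bash
# Return success (0) when the error text indicates the Safe Outputs MCP
# session has been dropped. Hypothetical helper, not part of gh-aw.
session_lost() {
  case "$1" in
    *"session not found"*) return 0 ;;
    *) return 1 ;;
  esac
}

ERR='Error POSTing to endpoint: session not found'
if session_lost "$ERR"; then
  # Per step 5: do not retry; note the incident for the next run instead.
  echo "session lost: do not retry"
else
  echo "different error: handle separately"
fi
```

A substring match is used because the message arrives wrapped in a longer `Error POSTing to endpoint: …` prefix.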

Copilot AI Apr 23, 2026


Step 5 refers to create_pull_request returning session not found, but the tool this prompt instructs to call is safeoutputs___create_pull_request. For clarity and to avoid a wrong tool call during the emergency path, rename this reference to safeoutputs___create_pull_request (and similarly keep the safeoutputs___ prefix wherever tool names are mentioned).

@pethers
Member

pethers commented Apr 23, 2026

@copilot apply changes based on the comments in this thread


Contributor Author

Copilot AI commented Apr 23, 2026

> @copilot apply changes based on the comments in this thread

Applied in 6176dee: prefixed safeoutputs___ on the three tool names in the Timer B description (line 107 — create_pull_request, noop, report_incomplete) and on create_pull_request in emergency step 5 (line 130). Leaving the two pre-existing unprefixed references at lines 9 and 99 untouched since they predate this PR and aren't in scope of the review threads.


@pethers pethers merged commit 0e8968e into main Apr 23, 2026
12 checks passed
@pethers pethers deleted the copilot/debug-workflow-failure-again branch April 23, 2026 09:37

Labels

  • documentation: Documentation updates
  • size-m: Medium change (50-250 lines)
  • size-xs: Extra small change (< 10 lines)


Development

Successfully merging this pull request may close these issues.

[aw] News: Interpellation Debates failed

3 participants