fix(prompts): tighten safe-outputs MCP deadline after news workflow timed out at 33 min by Copilot · Pull Request #1960 · Hack23/riksdagsmonitor

Copilot · 2026-04-23T08:28:48Z

Fixes the News: Interpellation Debates failure (run 24821837975).

Fix (prompt-only, runtime-imported — no `.lock.yml` recompile)

- `07-commit-and-pr.md` §Deadline enforcement: two-timer model (Timer A, Copilot API session, ~60 min; Timer B, safeoutputs idle session, ~25–30 min); Analysis hard deadline 48 → 30 min, target 22–27 min; emergency `[early-pr]` recipe; non-retry rule on `session not found`
- `01-bash-and-shell-safety.md` §Banned expansion patterns: AWF v0.69.3 false-positive patterns with safe rewrites; `while IFS= read -r f`
- `00-base-contract.md` §Session keepalive requirement: synced to 22–27 min / 30 min, points at `07-commit-and-pr.md` as authoritative
- Round 2 review: prefix tool names with `safeoutputs___` in the Timer B description (line 107) and emergency step 5 (line 130) for consistency with the rest of the prompt module

Copilot

Copilot wasn't able to review any files in this pull request.

github-actions · 2026-04-23T08:29:32Z

🏷️ Automatic Labeling Summary

This PR has been automatically labeled based on the files changed and PR metadata.

Applied Labels: size-xs

Label Categories

🗳️ Content: news, dashboard, visualization, intelligence
💻 Technology: html-css, javascript, workflow, security
📊 Data: cia-data, riksdag-data, data-pipeline, schema
🌍 I18n: i18n, translation, rtl
🔒 ISMS: isms, iso-27001, nist-csf, cis-controls
🏗️ Infrastructure: ci-cd, deployment, performance, monitoring
🔄 Quality: testing, accessibility, documentation, refactor
🤖 AI: agent, skill, agentic-workflow

For more information, see .github/labeler.yml.

github-actions · 2026-04-23T08:29:56Z

🔍 Lighthouse Performance Audit

| Category | Score | Status |
|---|---|---|
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Copilot · 2026-04-23T08:52:38Z

 | Chained builder assignments that progressively construct a command substitution (`a=foo; b="$a"bar; c=$($b)`) | Same staged-injection shape, just spread across multiple statements. | Construct commands as arrays, invoke via `"${cmd[@]}"`; never re-parse a string as a command. |
 | `eval` on variable contents (or eval-like constructs such as `bash -c "$var"`, `source /dev/stdin <<<"$var"`) | Direct arbitrary-code execution from data. | Never required for our workflows — refuse and rewrite using arrays, `case`, or explicit branches. |
+| `echo "…text $(cmd) more text…"` with other `$(…)` elsewhere in the same `command` string | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) flags any `$(…)` that lives inside a double-quoted echo/printf string alongside a second unrelated `$(…)` as "nested command substitution" even when the two are not nested. This is a false positive but the block still fires. | Split into two lines: `RESULT=$(cmd); echo "…text $RESULT more text…"`. Prefer `printf '%s\n' "$RESULT"` over echo when the value may contain backslashes. |
+| Bash arrays built inline and later expanded with `"${arr[@]}"` in the same `command` string, e.g. `REQ=(README.md foo.md); for f in "${REQ[@]}"; do …; done` | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) has flagged the `(…)` + `[@]` combination as a "dangerous expansion" even though the array only contains literal filenames. Treat it as blocked and rewrite. | Write the file list to a temp file and loop over that: `printf '%s\n' README.md foo.md > /tmp/req-$$ && while read f; do …; done < /tmp/req-$$`. For small fixed lists, unroll the loop: `for f in README.md foo.md; do …; done`. |
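A minimal sketch of the split-line rewrite from the first added row, assuming bash. The helper name `status_line` and the two commands it runs (`date` and `find … | wc -l`) are illustrative placeholders, not taken from the workflow. The point is that each `$(…)` runs in its own assignment and the final string only interpolates plain variables, printed with `printf '%s'` so backslashes in the values are not reinterpreted.

```shell
#!/usr/bin/env bash
# Sketch of the "split substitution" rewrite for the AWF false positive.
# status_line and the commands it runs are hypothetical placeholders.
set -euo pipefail

# Flagged shape (two unrelated $(...) inside one double-quoted echo):
#   echo "Date: $(date -u +%Y-%m-%d), files: $(ls | wc -l)"

status_line() {
  local dir=${1:-.} today file_count
  # Each command substitution gets its own statement.
  today=$(date -u +%Y-%m-%d)
  # Count regular files directly in $dir; tr drops the padding some wc builds add.
  file_count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
  # Only plain variables are interpolated; %s prints them verbatim.
  printf 'Date: %s, files: %s\n' "$today" "$file_count"
}

status_line .
```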


In the temp-file loop safe equivalent, `while read f; do …; done` should use `read -r` (and ideally `IFS=`) to avoid backslash-escape interpretation and unintended whitespace trimming. Even if the current examples use simple literal filenames, this pattern is likely to be copy-pasted for paths that may contain backslashes or leading/trailing whitespace.

Suggested change

| Bash arrays built inline and later expanded with `"${arr[@]}"` in the same `command` string, e.g. `REQ=(README.md foo.md); for f in "${REQ[@]}"; do …; done` | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) has flagged the `(…)` + `[@]` combination as a "dangerous expansion" even though the array only contains literal filenames. Treat it as blocked and rewrite. | Write the file list to a temp file and loop over that: `printf '%s\n' README.md foo.md > /tmp/req-$$ && while read f; do …; done < /tmp/req-$$`. For small fixed lists, unroll the loop: `for f in README.md foo.md; do …; done`. |

| Bash arrays built inline and later expanded with `"${arr[@]}"` in the same `command` string, e.g. `REQ=(README.md foo.md); for f in "${REQ[@]}"; do …; done` | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) has flagged the `(…)` + `[@]` combination as a "dangerous expansion" even though the array only contains literal filenames. Treat it as blocked and rewrite. | Write the file list to a temp file and loop over that: `printf '%s\n' README.md foo.md > /tmp/req-$$ && while IFS= read -r f; do …; done < /tmp/req-$$`. For small fixed lists, unroll the loop: `for f in README.md foo.md; do …; done`. |
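A sketch of the reviewer's suggested loop, assuming bash. `print_list` is a hypothetical helper that wraps each line in brackets so preserved whitespace and backslashes are visible. It uses `mktemp` instead of the predictable `/tmp/req-$$` name; that is a deliberate substitution in this sketch, not what the row specifies.

```shell
#!/usr/bin/env bash
# Sketch of the temp-file loop with IFS= and read -r (print_list is hypothetical).
set -euo pipefail

# Print each line of a list file in [brackets] so any preserved
# leading/trailing whitespace or backslashes are visible.
print_list() {
  local f
  # IFS= stops read from trimming leading/trailing whitespace;
  # -r stops it from treating backslashes as escape characters.
  while IFS= read -r f; do
    printf '[%s]\n' "$f"
  done < "$1"
}

req=$(mktemp)                 # unique temp file instead of /tmp/req-$$
trap 'rm -f "$req"' EXIT      # clean up even if a later command fails
printf '%s\n' README.md '  padded.md' 'dir\name.md' > "$req"
print_list "$req"
```

With plain `while read f`, the last two entries would come back as `padded.md` and `dirname.md`: `read` strips leading whitespace using the default `IFS` and, without `-r`, treats `\n` as an escaped `n`. For a short fixed list, the unrolled `for f in README.md foo.md; do …; done` form from the same row avoids the temp file entirely.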

Copilot · 2026-04-23T08:52:39Z

+
+### Keeping the Safe Outputs MCP session warm
+
+Every `safeoutputs___*` tool is terminal (each runs at most once per workflow; additional calls produce workflow errors), so as of **April 2026** there is no cheap "ping" call you can issue against the safeoutputs MCP from the agent side. **The only reliable mitigation is to reach `safeoutputs___create_pull_request` before Timer B fires.** Plan Pass 1 + gate + commit to finish well inside the 30-minute hard deadline below. If a future gh-aw release publishes a safe, non-terminal touch path (e.g. a `tools/list` endpoint on the local safeoutputs HTTP server), update this section with the concrete command and its observed effect.


The statement that every `safeoutputs___*` tool is "terminal" (can run at most once per workflow) is inaccurate in this repo's compiled workflows: for example, `report_incomplete` has a higher default max, and `missing_data`/`missing_tool` allow multiple calls. Consider rephrasing this to the specific constraint you rely on here (that `create_pull_request`/`noop` are limited and should be reserved for the end), rather than claiming all safeoutputs tools are single-shot or terminal.

Suggested change

Every `safeoutputs___*` tool is terminal (each runs at most once per workflow; additional calls produce workflow errors), so as of **April 2026** there is no cheap "ping" call you can issue against the safeoutputs MCP from the agent side. **The only reliable mitigation is to reach `safeoutputs___create_pull_request` before Timer B fires.** Plan Pass 1 + gate + commit to finish well inside the 30-minute hard deadline below. If a future gh-aw release publishes a safe, non-terminal touch path (e.g. a `tools/list` endpoint on the local safeoutputs HTTP server), update this section with the concrete command and its observed effect.

Do **not** use safe outputs as a keepalive strategy. In this workflow, `safeoutputs___create_pull_request` is limited to a single successful end-of-run call, and `safeoutputs___noop` is likewise reserved for the final "no files produced" outcome, so neither can be safely spent to keep the Safe Outputs MCP session alive. Some other `safeoutputs___*` tools may allow more than one call in compiled workflows, but they are not a documented or reliable heartbeat path for this prompt. **The only reliable mitigation is to reach `safeoutputs___create_pull_request` before Timer B fires.** Plan Pass 1 + gate + commit to finish well inside the 30-minute hard deadline below. If a future gh-aw release publishes a safe touch path for the local safeoutputs HTTP server (for example, an explicitly supported status or `tools/list` endpoint with verified keepalive behavior), update this section with the concrete command and its observed effect.

Copilot · 2026-04-23T08:52:39Z

-| Run 1 — Analysis | 40–45 min after agent start | **48 min** |
-| Run 2 — Articles | 20–25 min after agent start | **30 min** |
+These windows are tighter than the historical 48-min figure because Timer B fires first on the 23-artifact pipeline. The 30-min hard deadline leaves ~5 minutes of margin for staging, `git commit`, and the safeoutputs round-trip before Timer B has been observed to fire, and ~25 minutes of margin before Timer A.
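The Timer A margin in the paragraph above is simple subtraction. A sketch using the approximate figures stated in this PR (60-minute token lifetime, ~5 minutes of setup, 30-minute hard deadline). The Timer B margin is left out because it depends on where in the target window the commit actually lands.

```shell
#!/usr/bin/env bash
# Back-of-envelope check of the Timer A margin (all values in minutes, approximate).
set -euo pipefail

TIMER_A=60          # github.token lifetime from step start
SETUP=5             # setup steps before the agent starts
HARD_DEADLINE=30    # hard deadline, measured from agent start

usable=$(( TIMER_A - SETUP ))              # agent-side session budget
margin_a=$(( usable - HARD_DEADLINE ))     # slack left after the hard deadline

printf 'usable=%d margin_before_timer_a=%d\n' "$usable" "$margin_a"
```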



This section tightens Run 1's PR window and hard deadline to 30 minutes, but `.github/prompts/00-base-contract.md` still instructs creating the PR before ~45 minutes and points readers here. To avoid conflicting guidance at runtime, please update the base-contract prompt (or add an explicit note here that it supersedes the older 45-minute guidance) so the imported prompt set is internally consistent.

Suggested change

> **Authoritative override:** For PR timing and hard deadlines, this section supersedes any older guidance imported from `.github/prompts/00-base-contract.md` that suggests creating the PR at around **45 minutes**. The operative deadline for both runs is **30 minutes**, with Run 1 targeting **22–27 minutes** and Run 2 targeting **20–25 minutes**.
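The override note's windows can be expressed as a small lookup. A sketch assuming elapsed time is measured in whole minutes from agent start; `deadline_phase` and its output labels are hypothetical and not part of the prompt modules.

```shell
#!/usr/bin/env bash
# Sketch: classify elapsed agent time against the windows in the override note
# (Run 1 target 22-27 min, Run 2 target 20-25 min, hard deadline 30 min).
# deadline_phase and its labels are hypothetical.
set -euo pipefail

deadline_phase() {
  local run=$1 elapsed=$2 lo hi
  case "$run" in
    1) lo=22; hi=27 ;;
    2) lo=20; hi=25 ;;
    *) echo "unknown run: $run" >&2; return 2 ;;
  esac
  if (( elapsed >= 30 )); then
    echo "past-hard-deadline"
  elif (( elapsed > hi )); then
    echo "past-target"          # beyond the target window, before the hard deadline
  elif (( elapsed >= lo )); then
    echo "in-target-window"
  else
    echo "before-target"
  fi
}

deadline_phase 1 24   # -> in-target-window
```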

pethers · 2026-04-23T08:53:58Z

@copilot apply changes based on the comments in this thread

…, add authoritative override + sync 00-base-contract deadline Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/79d84cd8-124b-459f-8023-55b1201e90a2 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Copilot · 2026-04-23T08:59:02Z

> @copilot apply changes based on the comments in this thread

Applied all three review suggestions in 2b18ef6:

- `01-bash-and-shell-safety.md`: `while read f` → `while IFS= read -r f` in the temp-file-loop rewrite
- `07-commit-and-pr.md`: rephrased the over-broad "every `safeoutputs___*` is terminal" claim to call out only `create_pull_request` + `noop` as single-shot; added the Authoritative override note above the PR-creation-windows table
- `00-base-contract.md`: synced the session-keepalive paragraph to the new 22–27 min / 30 min window and pointed at `07-commit-and-pr.md` as the authoritative source (supersedes the old ~45-min figure)

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Copilot · 2026-04-23T09:23:46Z

+> **Timer A — Copilot API session (~60 min)**: The Copilot API session is bound to the `github.token` baked in at step start. That token expires at approximately **60 minutes** and is never refreshed mid-run (gh-aw issue #24920). Every tool call and inference request fails silently after that point — the agent appears to run but makes no progress and the PR is never created. Setup steps consume ~5 minutes, so the agent has at most **~55 minutes** of usable session time.
+>
+> **Timer B — Safe Outputs MCP idle session (~25–30 min, observed)**: The local Safe Outputs HTTP MCP tracks a per-agent Streamable HTTP session. If the agent goes **idle toward safeoutputs** for 25+ minutes (e.g. a long Pass 1 that only uses `edit` + `bash`), the session is dropped and every subsequent `safeoutputs___*` call returns `Error POSTing to endpoint: session not found` — including the final `create_pull_request`, `noop`, and `report_incomplete`. The `sandbox.mcp.keepalive-interval: 300` setting does **not** prevent this; that knob keeps the `mcp-gateway` upstream MCPs alive, not the safeoutputs HTTP server. Observed failure: run [`24821837975`](https://github.com/Hack23/riksdagsmonitor/actions/runs/24821837975) — 23 artifacts committed at ~33 min, all three safe-output calls rejected.


In the Timer B description, the tool names are referenced without the `safeoutputs___` prefix (e.g. `create_pull_request`, `noop`, `report_incomplete`). Elsewhere in this prompt module the required names are `safeoutputs___create_pull_request` / `safeoutputs___noop`, so this inconsistency can mislead the agent under time pressure. Please align these references to the actual safeoutputs tool names consistently.

Copilot · 2026-04-23T09:23:46Z

+3. **Commit** with message prefixed `[early-pr]` to signal partial content.
+4. **Call** `safeoutputs___create_pull_request` once with label `analysis-only` if Pass 2 is incomplete or articles are missing.
+5. If `create_pull_request` returns `session not found`, do **not** retry — the MCP session is gone. The work is lost for this run; the commit on disk is not persisted because the safe-outputs runner never saw it. Document the incident in the next run's methodology-reflection.
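Steps 3 and 5 reduce to two small decisions that can be sketched in bash. `early_pr_subject` and `pr_error_action` are hypothetical helpers, and the `investigate` branch for other errors is an assumption (the recipe only specifies the `session not found` case). The `safeoutputs___create_pull_request` call itself is an MCP tool call made by the agent, not a shell command, so it is not shown.

```shell
#!/usr/bin/env bash
# Sketch of the emergency-path decisions in steps 3 and 5.
# Helper names and the "investigate" label are hypothetical.
set -euo pipefail

# Step 3: prefix the commit subject so the content is visibly partial.
early_pr_subject() {
  printf '[early-pr] %s\n' "$1"
}

# Step 5: map the tool's error text to an action.
# "session not found" means the MCP session is gone: never retry.
pr_error_action() {
  case "$1" in
    *"session not found"*) echo "stop-no-retry" ;;
    "")                    echo "ok" ;;
    *)                     echo "investigate" ;;   # assumption: not covered by the recipe
  esac
}

early_pr_subject "analysis artifacts (Pass 2 incomplete)"
pr_error_action "Error POSTing to endpoint: session not found"   # -> stop-no-retry
```

A commit for step 3 could then use `git commit -m "$(early_pr_subject 'analysis only')"`.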


Step 5 refers to `create_pull_request` returning `session not found`, but the tool this prompt instructs to call is `safeoutputs___create_pull_request`. For clarity and to avoid a wrong tool call during the emergency path, rename this reference to `safeoutputs___create_pull_request` (and keep the `safeoutputs___` prefix wherever tool names are mentioned).

pethers · 2026-04-23T09:30:40Z

@copilot apply changes based on the comments in this thread

Copilot · 2026-04-23T09:33:16Z

> @copilot apply changes based on the comments in this thread

Applied in 6176dee: prefixed `safeoutputs___` on the three tool names in the Timer B description (line 107: `create_pull_request`, `noop`, `report_incomplete`) and on `create_pull_request` in emergency step 5 (line 130). Leaving the two pre-existing unprefixed references at lines 9 and 99 untouched since they predate this PR and aren't in scope of the review threads.

Initial plan

a094396

Copilot AI assigned Copilot and pethers Apr 23, 2026

Copilot started work on behalf of pethers April 23, 2026 08:28

Copilot AI linked an issue Apr 23, 2026 that may be closed by this pull request

[aw] News: Interpellation Debates failed #1957

Closed

pethers marked this pull request as ready for review April 23, 2026 08:29

Copilot AI review requested due to automatic review settings April 23, 2026 08:29

Copilot AI reviewed Apr 23, 2026

github-actions Bot added the size-xs Extra small change (< 10 lines) label Apr 23, 2026

Copilot AI mentioned this pull request Apr 23, 2026

fix(aw): front-load propositions PR call before safeoutputs idle timeout #1961

Merged

fix(prompts): tighten safe-outputs MCP deadline after interpellations…

686730f

… run timed out at 33 min Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/fec34cd8-9f27-4f13-895d-0505ba3ed72e Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

github-actions Bot added documentation Documentation updates size-m Medium change (50-250 lines) labels Apr 23, 2026

Copilot AI changed the title ~~[WIP] Fix workflow failure in News: Interpellation Debates~~ fix(prompts): tighten safe-outputs MCP deadline after news workflow timed out at 33 min Apr 23, 2026

Copilot finished work on behalf of pethers April 23, 2026 08:43

Copilot AI requested a review from pethers April 23, 2026 08:43

Merge branch 'main' into copilot/debug-workflow-failure-again

b595372

pethers requested a review from Copilot April 23, 2026 08:47

Copilot started reviewing on behalf of pethers April 23, 2026 08:48

Copilot AI reviewed Apr 23, 2026

Copilot started work on behalf of pethers April 23, 2026 08:55

Copilot finished work on behalf of pethers April 23, 2026 09:00

pethers requested a review from Copilot April 23, 2026 09:19

Copilot started reviewing on behalf of pethers April 23, 2026 09:20

Copilot AI reviewed Apr 23, 2026

Merge branch 'main' into copilot/debug-workflow-failure-again

65a1ee1

Copilot started work on behalf of pethers April 23, 2026 09:30

prefix safeoutputs___ on tool names in Timer B description and emerge…

6176dee

…ncy step 5 Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/72a864ec-2f4a-4711-af09-1f4cd8522bb6 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Copilot finished work on behalf of pethers April 23, 2026 09:33

pethers merged commit 0e8968e into main Apr 23, 2026
12 checks passed

pethers deleted the copilot/debug-workflow-failure-again branch April 23, 2026 09:37


