Conversation
🏷️ Automatic Labeling SummaryThis PR has been automatically labeled based on the files changed and PR metadata. Applied Labels: size-xs Label Categories
For more information, see |
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
… run timed out at 33 min Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/fec34cd8-9f27-4f13-895d-0505ba3ed72e Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
| | Chained builder assignments that progressively construct a command substitution (`a=foo; b="$a"bar; c=$($b)`) | Same staged-injection shape, just spread across multiple statements. | Construct commands as arrays, invoke via `"${cmd[@]}"`; never re-parse a string as a command. | | ||
| | `eval` on variable contents (or eval-like constructs such as `bash -c "$var"`, `source /dev/stdin <<<"$var"`) | Direct arbitrary-code execution from data. | Never required for our workflows — refuse and rewrite using arrays, `case`, or explicit branches. | | ||
| | `echo "…text $(cmd) more text…"` with other `$(…)` elsewhere in the same `command` string | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) flags any `$(…)` that lives inside a double-quoted echo/printf string alongside a second unrelated `$(…)` as "nested command substitution" even when the two are not nested. This is a false positive but the block still fires. | Split into two lines: `RESULT=$(cmd); echo "…text $RESULT more text…"`. Prefer `printf '%s\n' "$RESULT"` over echo when the value may contain backslashes. | | ||
| | Bash arrays built inline and later expanded with `"${arr[@]}"` in the same `command` string, e.g. `REQ=(README.md foo.md); for f in "${REQ[@]}"; do …; done` | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) has flagged the `(…)` + `[@]` combination as a "dangerous expansion" even though the array only contains literal filenames. Treat it as blocked and rewrite. | Write the file list to a temp file and loop over that: `printf '%s\n' README.md foo.md > /tmp/req-$$ && while read f; do …; done < /tmp/req-$$`. For small fixed lists, unroll the loop: `for f in README.md foo.md; do …; done`. | |
There was a problem hiding this comment.
In the temp-file loop safe-equivalent, while read f; do …; done should use read -r (and ideally IFS=) to avoid backslash-escape interpretation and unintended trimming. Even if the current examples use simple literal filenames, this pattern is likely to be copy/pasted for paths that may contain backslashes or leading/trailing whitespace.
| | Bash arrays built inline and later expanded with `"${arr[@]}"` in the same `command` string, e.g. `REQ=(README.md foo.md); for f in "${REQ[@]}"; do …; done` | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) has flagged the `(…)` + `[@]` combination as a "dangerous expansion" even though the array only contains literal filenames. Treat it as blocked and rewrite. | Write the file list to a temp file and loop over that: `printf '%s\n' README.md foo.md > /tmp/req-$$ && while read f; do …; done < /tmp/req-$$`. For small fixed lists, unroll the loop: `for f in README.md foo.md; do …; done`. | | |
| | Bash arrays built inline and later expanded with `"${arr[@]}"` in the same `command` string, e.g. `REQ=(README.md foo.md); for f in "${REQ[@]}"; do …; done` | The gh-aw AWF sandbox (observed on v0.69.3, April 2026) has flagged the `(…)` + `[@]` combination as a "dangerous expansion" even though the array only contains literal filenames. Treat it as blocked and rewrite. | Write the file list to a temp file and loop over that: `printf '%s\n' README.md foo.md > /tmp/req-$$ && while IFS= read -r f; do …; done < /tmp/req-$$`. For small fixed lists, unroll the loop: `for f in README.md foo.md; do …; done`. | |
|
|
||
| ### Keeping the Safe Outputs MCP session warm | ||
|
|
||
| Every `safeoutputs___*` tool is terminal (each runs at most once per workflow; additional calls produce workflow errors), so as of **April 2026** there is no cheap "ping" call you can issue against the safeoutputs MCP from the agent side. **The only reliable mitigation is to reach `safeoutputs___create_pull_request` before Timer B fires.** Plan Pass 1 + gate + commit to finish well inside the 30-minute hard deadline below. If a future gh-aw release publishes a safe, non-terminal touch path (e.g. a `tools/list` endpoint on the local safeoutputs HTTP server), update this section with the concrete command and its observed effect. |
There was a problem hiding this comment.
The statement that every safeoutputs___* tool is "terminal" / can run at most once per workflow is inaccurate in this repo’s compiled workflows: e.g. report_incomplete has a higher default max, and missing_data/missing_tool allow multiple calls. Consider rephrasing this to the specific constraint you rely on here (that create_pull_request/noop are limited and should be reserved for the end), rather than claiming all safeoutputs tools are single-shot/terminal.
| Every `safeoutputs___*` tool is terminal (each runs at most once per workflow; additional calls produce workflow errors), so as of **April 2026** there is no cheap "ping" call you can issue against the safeoutputs MCP from the agent side. **The only reliable mitigation is to reach `safeoutputs___create_pull_request` before Timer B fires.** Plan Pass 1 + gate + commit to finish well inside the 30-minute hard deadline below. If a future gh-aw release publishes a safe, non-terminal touch path (e.g. a `tools/list` endpoint on the local safeoutputs HTTP server), update this section with the concrete command and its observed effect. | |
| Do **not** use safe outputs as a keepalive strategy. In this workflow, `safeoutputs___create_pull_request` is limited to a single successful end-of-run call, and `safeoutputs___noop` is likewise reserved for the final "no files produced" outcome, so neither can be safely spent to keep the Safe Outputs MCP session alive. Some other `safeoutputs___*` tools may allow more than one call in compiled workflows, but they are not a documented or reliable heartbeat path for this prompt. **The only reliable mitigation is to reach `safeoutputs___create_pull_request` before Timer B fires.** Plan Pass 1 + gate + commit to finish well inside the 30-minute hard deadline below. If a future gh-aw release publishes a safe touch path for the local safeoutputs HTTP server (for example, an explicitly supported status or `tools/list` endpoint with verified keepalive behavior), update this section with the concrete command and its observed effect. |
| | Run 1 — Analysis | 40–45 min after agent start | **48 min** | | ||
| | Run 2 — Articles | 20–25 min after agent start | **30 min** | | ||
| These windows are tighter than the historical 48-min figure because Timer B fires first on the 23-artifact pipeline. The 30-min hard deadline leaves ~5 minutes of margin for staging, `git commit`, and the safeoutputs round-trip before Timer B has been observed to fire, and ~25 minutes of margin before Timer A. | ||
|
|
There was a problem hiding this comment.
This section tightens Run 1’s PR window/hard deadline to 30 minutes, but .github/prompts/00-base-contract.md still instructs creating the PR before ~45 minutes and points readers here. To avoid conflicting guidance at runtime, please update the base-contract prompt (or add an explicit note here that it supersedes the older 45-minute guidance) so the imported prompt set is internally consistent.
| > **Authoritative override:** For PR timing and hard deadlines, this section supersedes any older guidance imported from `.github/prompts/00-base-contract.md` that suggests creating the PR at around **45 minutes**. The operative deadline for both runs is **30 minutes**, with Run 1 targeting **22–27 minutes** and Run 2 targeting **20–25 minutes**. |
|
@copilot apply changes based on the comments in this thread |
…, add authoritative override + sync 00-base-contract deadline Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/79d84cd8-124b-459f-8023-55b1201e90a2 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Applied all three review suggestions in 2b18ef6:
|
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
| > **Timer A — Copilot API session (~60 min)**: The Copilot API session is bound to the `github.token` baked in at step start. That token expires at approximately **60 minutes** and is never refreshed mid-run (gh-aw issue #24920). Every tool call and inference request fails silently after that point — the agent appears to run but makes no progress and the PR is never created. Setup steps consume ~5 minutes, so the agent has at most **~55 minutes** of usable session time. | ||
| > | ||
| > **Timer B — Safe Outputs MCP idle session (~25–30 min, observed)**: The local Safe Outputs HTTP MCP tracks a per-agent Streamable HTTP session. If the agent goes **idle toward safeoutputs** for 25+ minutes (e.g. a long Pass 1 that only uses `edit` + `bash`), the session is dropped and every subsequent `safeoutputs___*` call returns `Error POSTing to endpoint: session not found` — including the final `create_pull_request`, `noop`, and `report_incomplete`. The `sandbox.mcp.keepalive-interval: 300` setting does **not** prevent this; that knob keeps the `mcp-gateway` upstream MCPs alive, not the safeoutputs HTTP server. Observed failure: run [`24821837975`](https://github.com/Hack23/riksdagsmonitor/actions/runs/24821837975) — 23 artifacts committed at ~33 min, all three safe-output calls rejected. |
There was a problem hiding this comment.
In the Timer B description, the tool names are referenced without the safeoutputs___ prefix (e.g. create_pull_request, noop, report_incomplete). Elsewhere in this prompt module the required names are safeoutputs___create_pull_request / safeoutputs___noop, so this inconsistency can mislead the agent under time pressure. Please align these references to the actual safeoutputs tool names consistently.
| 3. **Commit** with message prefixed `[early-pr]` to signal partial content. | ||
| 4. **Call** `safeoutputs___create_pull_request` once with label `analysis-only` if Pass 2 is incomplete or articles are missing. | ||
| 5. If `create_pull_request` returns `session not found`, do **not** retry — the MCP session is gone. The work is lost for this run; the commit on disk is not persisted because the safe-outputs runner never saw it. Document the incident in the next run's methodology-reflection. |
There was a problem hiding this comment.
Step 5 refers to create_pull_request returning session not found, but the tool this prompt instructs to call is safeoutputs___create_pull_request. For clarity and to avoid a wrong tool call during the emergency path, rename this reference to safeoutputs___create_pull_request (and similarly keep the safeoutputs___ prefix wherever tool names are mentioned).
|
@copilot apply changes based on the comments in this thread |
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
…ncy step 5 Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/72a864ec-2f4a-4711-af09-1f4cd8522bb6 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Applied in 6176dee: prefixed |
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
Fixes the
News: Interpellation Debatesfailure (run24821837975).Fix (prompt-only, runtime-imported — no
.lock.ymlrecompile)07-commit-and-pr.md §Deadline enforcement— two-timer model (Timer A Copilot ~60 min, Timer B safeoutputs ~25–30 min); Analysis hard deadline 48 → 30 min, target 22–27 min; emergency[early-pr]recipe; non-retry rule onsession not found01-bash-and-shell-safety.md §Banned expansion patterns— AWF v0.69.3 false-positive patterns with safe rewrites;while IFS= read -r f00-base-contract.md §Session keepalive requirement— synced to 22–27 min / 30 min, points at 07 as authoritativesafeoutputs___in Timer B description (line 107) and emergency step 5 (line 130) for consistency with the rest of the prompt module