Trim news-translate per-turn MCP/tool schema to lift the 25M token wall #2828
…n limit (run #26641603577) Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…e2e article-class assertion Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…o 13 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
🏷️ Automatic Labeling Summary
This PR has been automatically labeled based on the files changed and PR metadata.
Applied Labels: documentation, workflow, ci-cd, testing, size-l, news, agentic-workflow
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
Pull request overview
This PR reduces the news-translate workflow’s per-turn MCP/tool schema footprint so translation runs can process larger language batches without hitting the Copilot weighted-token session cap.
Changes:
- Trims unused MCP servers/tools from `news-translate` and its compiled lock file.
- Adds `max_langs` batching and scopes validator calls to the selected language batch.
- Adds regression tests for the trimmed MCP surface and updated worklist contract.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| .github/workflows/news-translate.md | Updates workflow inputs, MCP/tool surface, language batching, and prompt instructions. |
| .github/workflows/news-translate.lock.yml | Recompiled lock file reflecting the reduced MCP/tool surface and new input. |
| tests/news-translate-mcp-surface.test.ts | Adds guard tests to prevent reintroducing heavy unused MCP/tool schemas. |
| tests/news-translate-worklist-contract.test.ts | Extends worklist contract tests for max_langs and scoped validation. |
| tests/render-lib.test.ts | Loosens article wrapper assertion to allow the TOC modifier class. |
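The guard tests listed in the table can be pictured roughly as follows. This is a minimal sketch, not the contents of `tests/news-translate-mcp-surface.test.ts`: the `WorkflowSurface` shape and the `surfaceViolations` helper are hypothetical, and the real workflow front matter may nest these keys differently. The server and tool names are the ones this PR reports trimming.

```typescript
// Hypothetical, simplified view of the workflow's declared tool surface.
// The real front matter layout is an assumption here.
interface WorkflowSurface {
  tools: Record<string, unknown>;
  mcpServers: Record<string, { allowed?: string[] }>;
}

// Returns one message per heavy, unused schema that has crept back in.
function surfaceViolations(wf: WorkflowSurface): string[] {
  const problems: string[] = [];

  // Data MCP servers the translate agent never calls.
  for (const server of ["scb", "world-bank"]) {
    if (server in wf.mcpServers) problems.push(`unused MCP server declared: ${server}`);
  }

  // Tools the translate agent never calls.
  for (const tool of ["web-fetch", "agentic-workflows"]) {
    if (tool in wf.tools) problems.push(`unused tool declared: ${tool}`);
  }

  // The full github toolset carries the largest schema.
  const github = wf.tools["github"] as { toolsets?: string[] } | undefined;
  if (github?.toolsets?.includes("all")) problems.push("github toolsets includes 'all'");

  // Wildcard allow-lists expose every tool schema of a server.
  for (const [name, cfg] of Object.entries(wf.mcpServers)) {
    if (cfg.allowed?.includes("*")) problems.push(`wildcard allow-list on ${name}`);
  }
  return problems;
}
```

A regression test would then assert that the list is empty for the current workflow, so any reintroduced server or wildcard fails CI with a readable message.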
| 4. **Title post-processing**: run `npx tsx scripts/postprocess-translated-brief.ts analysis/daily/$DATE/$SUB/executive-brief_*.md` to re-apply the renderer's `cleanArticleTitle` pipeline on every translated H1. This strips locale-specific boilerplate prefixes (`Exekutiv sammanfattning — `, `Zusammenfassung — `, `执行摘要:…`) and trailing date suffixes that the Pass 1 translation may have left in place. The helper only rewrites when the cleaned title differs from the source; it never falls back to a BLUF synthesis. Verify the script reports `✓` or `✏️` for every file (no `✗`). | ||
| 5. **Final validate**: re-run `npx tsx scripts/validate-executive-brief-translations.ts --source <source>` (when available) to confirm the title post-processing in sub-step 4 did not break parity, or fall back to the structural sanity checks listed in TRANSLATION_GUIDE §Acceptance checklist. Fix any failures by re-translating only the offending language (read back only that file). Do not commit a file that fails validation. | ||
| 5. **Final validate**: re-run `npx tsx scripts/validate-executive-brief-translations.ts --source <source> --lang "$TRANSLATION_LANGS"` (when available) to confirm the title post-processing in sub-step 4 did not break parity, or fall back to the structural sanity checks listed in TRANSLATION_GUIDE §Acceptance checklist. Fix any failures by re-translating only the offending language (read back only that file). Do not commit a file that fails validation. |
| 1. **Pass 1 — translate**: Read the source `executive-brief.md` in full **once**. For every language in `TRANSLATION_LANGS`, produce `analysis/daily/$DATE/$SUB/executive-brief_<lang>.md` following the TRANSLATION_GUIDE rules — **write each file with one `edit` tool call per language** (never via `python3`, `bash` heredocs, or shell redirection — see `01-bash-and-shell-safety.md §File creation & overwrite strategy`). Apply the per-language tone register from TRANSLATION_GUIDE **at write time** (so Pass 2 never needs a full re-read just to adjust register), and for `ar`/`he` start the file with `<!-- dir: rtl -->`. Preserve every verbatim block (YAML, HTML comments except `source-sha`, `dok_id` codes, Mermaid DSL bodies, code fences, URLs, file paths, evidence-anchor canonical column values). Translate every always-translate block (prose, headings, list items, table cell text, image alt-text, BLUF, decisions, link text). **Do not** read a freshly written translation back into model context to "confirm" it — the validator in Pass 2 is the authoritative gate. | ||
| 2. **Pass 2 — validate-first & targeted refine**: This is the **primary token-efficiency control**. Run `npx tsx scripts/validate-executive-brief-translations.ts --source <source>` in the runtime shell. That validator already performs every structural-parity check at near-zero model-token cost: heading / table-row / code-fence / Mermaid-block count parity, `dok_id` and URL set equality, banned-English-phrase detection (`Executive Brief`, `Decisions`, `Confidence`, `BLUF`, …), the `<!-- dir: rtl -->` marker for `ar`/`he`, and the `<!-- source-sha: -->` trailer. **Do not read passing translations back into model context** — reading all 13 files back in full is exactly what drove run [#26633644372](https://github.com/Hack23/riksdagsmonitor/actions/runs/26633644372) to `7.6M` effective tokens and a `429 Maximum effective tokens exceeded` abort. Only read back (and re-translate / fix) the specific `executive-brief_<lang>.md` files the validator reports as failing, then re-run the validator until it is clean for the requested languages. | ||
| 2. **Pass 2 — validate-first & targeted refine**: This is the **primary token-efficiency control**. Run `npx tsx scripts/validate-executive-brief-translations.ts --source <source> --lang "$TRANSLATION_LANGS"` in the runtime shell (the `--lang` scope restricts validation to this run's language batch so deferred languages are not reported as missing). That validator already performs every structural-parity check at near-zero model-token cost: heading / table-row / code-fence / Mermaid-block count parity, `dok_id` and URL set equality, banned-English-phrase detection (`Executive Brief`, `Decisions`, `Confidence`, `BLUF`, …), the `<!-- dir: rtl -->` marker for `ar`/`he`, and the `<!-- source-sha: -->` trailer. **Do not read passing translations back into model context** — reading every target file back in full is exactly what drove run [#26633644372](https://github.com/Hack23/riksdagsmonitor/actions/runs/26633644372) to `7.6M` effective tokens and a `429 Maximum effective tokens exceeded` abort. Only read back (and re-translate / fix) the specific `executive-brief_<lang>.md` files the validator reports as failing, then re-run the validator until it is clean for the requested languages. | ||
| 3. **Append the source-revision marker**: compute `SRC_SHA=$(git log -1 --format=%H -- <source>)` in the runtime shell. Use the `edit` tool to add or replace the final `<!-- source-sha: $SRC_SHA -->` line at the end of every translation file (consistent with the file-write contract — no `>>` or `echo`). This is the drift signal future runs use to decide whether to retranslate. |
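The "add or replace" rule for the trailer in step 3 can be sketched as a pure string transformation. This is an illustration only: `upsertSourceSha` is a hypothetical helper, not a script in the repository, and it only computes the new file content. Under the file-write contract described above, the actual write would still go through the `edit` tool.

```typescript
// Matches an existing trailer line, including the empty form `<!-- source-sha: -->`.
const TRAILER_PATTERN = /^<!-- source-sha:.*-->$/;

// Hypothetical helper: returns `content` with exactly one source-sha trailer as its final line.
function upsertSourceSha(content: string, srcSha: string): string {
  const trailer = `<!-- source-sha: ${srcSha} -->`;

  // Drop trailing whitespace so the trailer always ends up last.
  const lines = content.replace(/\s+$/, "").split("\n");
  const last = lines[lines.length - 1] ?? "";

  if (TRAILER_PATTERN.test(last)) {
    lines[lines.length - 1] = trailer; // replace a stale marker
  } else {
    lines.push(trailer); // first run: append
  }
  return lines.join("\n") + "\n";
}
```

Replacing rather than appending keeps the operation idempotent: re-running a batch against the same source revision leaves the file unchanged, and a new revision never stacks a second marker.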
…N_LANGS batch
Addresses review feedback:
- source-sha trailer is now only written to files whose language is in $TRANSLATION_LANGS, preventing deferred languages from being marked as current without actual retranslation.
- Title post-processing now iterates per-language instead of using executive-brief_*.md glob, avoiding rewriting deferred-language files without subsequent validation.
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
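The per-language iteration described in this commit can be sketched as below. It is a minimal illustration with a hypothetical helper, `batchFiles`. The separator used inside `TRANSLATION_LANGS` is not stated in this PR, so the sketch assumes commas and/or whitespace. The path template is the one used in the workflow prompt.

```typescript
// Hypothetical helper: expands the current language batch into explicit file paths,
// so deferred-language files are never matched the way an `executive-brief_*.md` glob would.
// Assumption: TRANSLATION_LANGS is separated by commas and/or whitespace.
function batchFiles(translationLangs: string, date: string, sub: string): string[] {
  const langs = translationLangs.split(/[\s,]+/).filter((lang) => lang.length > 0);
  return langs.map((lang) => `analysis/daily/${date}/${sub}/executive-brief_${lang}.md`);
}
```

Both title post-processing and the source-sha trailer would then run over this explicit list only, which is what keeps a deferred language from being marked current without a retranslation.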
`news-translate` kept hitting the 25M weighted effective-token Copilot per-session cap (run #26641603577: 13 langs → 27.0M weighted, aborting before its PR shipped). The team had been defending the cap by shrinking work (`max_briefs` 3→1, `max_langs` capped at 7); this PR instead lowers the per-unit cost so each run does more.
Root cause
Every model turn re-bills the full JSON schema of every declared MCP server and tool. This workflow loaded `github: toolsets:[all]`, three data-MCP servers (`riksdag-regering`/`scb`/`world-bank` with `allowed:["*"]`), `web-fetch`, and `agentic-workflows`, yet the translate agent only ever calls `bash`, `edit`, and `safeoutputs___create_pull_request`. That dead-weight schema, re-sent each turn, accounts for the ~3.5× weighting multiplier (7.67M raw → 27.0M weighted).
Changes
- `.github/workflows/news-translate.md`: trimmed the per-turn surface:
  - Removed the `scb` and `world-bank` MCP servers.
  - Narrowed `riksdag-regering` from `allowed:["*"]` (~32 tools) to `["get_sync_status"]`.
  - Removed `github: toolsets:[all]`, `web-fetch`, and `agentic-workflows` from `tools`.
  - Raised the `max_langs` default 7→13 (input + worklist-step default/clamp) so one run can translate a full source.
- `.github/workflows/news-translate.lock.yml`: recompiled (`gh aw compile`, gh-aw v0.77.1); lock shrank ~146 lines. The agent's `mcpServers` now holds only a read-only github MCP (`context`, `repos`, `issues`, `pull_requests`, auto-injected by the compiler for the safe-outputs PR), `riksdag-regering` limited to `get_sync_status`, and `safeoutputs`.
- `tests/news-translate-mcp-surface.test.ts` (new): guards against regression: no `scb`/`world-bank` servers, no `toolsets:[all]`, github stays read-only, riksdag narrowed, `max_langs` default 13.
- `tests/news-translate-worklist-contract.test.ts`: updated the pinned `MAX_LANGS` default to 13.
Deliberately retained
`network.allowed`, `safe-outputs.allowed-domains`, the `riksdag-regering` MCP URL, and the `02`/`07` prompt imports are contract-locked by `tests/network-diagnostics.test.ts` and carry zero per-turn token cost (firewall egress / output sanitizer / system prompt; none of these is re-sent each turn). Trimming them would break CI for no token benefit, so the diet targets only the re-billed tool schemas.
Effect
The fixed per-turn overhead collapses toward the raw token count, so a full 13-language source fits comfortably under 25M. Wall-clock (Timer A/B, 60 min) and the 100-file safe-outputs cap become the governing limits — the intended state.
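The figures quoted for run #26641603577 can be checked with a short calculation. This is a back-of-the-envelope sketch, not a model of Copilot's billing: it assumes weighted cost scales linearly with raw tokens, and the helper names are illustrative.

```typescript
// Figures quoted in the PR description for run #26641603577.
const CAP_WEIGHTED = 25_000_000;
const RAW_TOKENS = 7_670_000;
const WEIGHTED_TOKENS = 27_000_000;

// Observed ratio of weighted to raw tokens.
function weightingMultiplier(raw: number, weighted: number): number {
  return weighted / raw;
}

// Largest multiplier at which `raw` tokens still fit under `cap`.
// Assumption: weighted cost scales linearly with raw tokens.
function breakEvenMultiplier(raw: number, cap: number): number {
  return cap / raw;
}

const observed = weightingMultiplier(RAW_TOKENS, WEIGHTED_TOKENS);
const breakEven = breakEvenMultiplier(RAW_TOKENS, CAP_WEIGHTED);
console.log(observed.toFixed(2), breakEven.toFixed(2)); // prints "3.52 3.26"
```

Under that linear assumption, the same 13-language workload clears the cap once the multiplier drops below roughly 3.26, so even a modest reduction in re-billed schema is enough, and a collapse toward the raw count leaves wide headroom.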