Trim news-translate per-turn MCP/tool schema to lift the 25M token wall #2828

Merged
pethers merged 5 commits into main from copilot/fix-workflow-issues on May 29, 2026

Copilot AI commented May 29, 2026

news-translate kept hitting Copilot's per-session cap of 25M weighted effective tokens (run #26641603577: 13 langs → 27.0M weighted, aborting before its PR shipped). The team had been defending the cap by shrinking the work (max_briefs 3→1, max_langs capped at 7); this PR instead lowers the per-unit cost so each run can do more.

Root cause

Every model turn re-bills the full JSON schema of every declared MCP server and tool. This workflow loaded github: toolsets:[all], three data-MCP servers (riksdag-regering, scb, and world-bank, each with allowed:["*"]), web-fetch, and agentic-workflows, yet the translate agent only ever calls bash, edit, and safeoutputs___create_pull_request. That dead-weight schema, re-sent each turn, accounts for the ~3.5× weighting multiplier (7.67M raw → 27.0M weighted).
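The multiplier can be checked with a few lines of arithmetic. The sketch below is a toy model, not Copilot's actual billing formula: it assumes each turn pays a fixed schema overhead plus a per-turn payload. The function names and the example turn and token counts are hypothetical; only the 7.67M and 27.0M figures come from the run cited above.

```typescript
// Figures reported for run #26641603577 in the PR description.
const rawTokens = 7.67e6;
const weightedTokens = 27.0e6;

// Ratio of weighted to raw tokens; the PR rounds this to ~3.5x.
function weightingMultiplier(raw: number, weighted: number): number {
  if (raw <= 0) throw new Error("raw token count must be positive");
  return weighted / raw;
}

// Toy model (assumption): every turn pays for the full declared tool schema
// again, on top of whatever the turn itself reads and writes.
function sessionTokens(
  turns: number,
  schemaTokensPerTurn: number,
  payloadTokensPerTurn: number,
): number {
  return turns * (schemaTokensPerTurn + payloadTokensPerTurn);
}

const multiplier = weightingMultiplier(rawTokens, weightedTokens);
console.log(multiplier.toFixed(2)); // 3.52

// Hypothetical numbers: 100 turns, 10k payload tokens per turn.
const heavy = sessionTokens(100, 30_000, 10_000); // 4,000,000
const trimmed = sessionTokens(100, 2_000, 10_000); // 1,200,000
console.log(heavy, trimmed);
```

Under this model the schema term scales with the number of turns, which is why trimming it helps more than trimming any one-off input.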

Changes

  • .github/workflows/news-translate.md — trimmed the per-turn surface:
    • Removed scb and world-bank MCP servers.
    • Narrowed riksdag-regering from allowed:["*"] (~32 tools) to ["get_sync_status"].
    • Removed github: toolsets:[all], web-fetch, and agentic-workflows from tools.
    • Re-baselined max_langs default 7→13 (input + worklist-step default/clamp) so one run can translate a full source.
    • Updated description, input docs, batch-size table, and deadline prose.
  • .github/workflows/news-translate.lock.yml — recompiled (gh aw compile, gh-aw v0.77.1); the lock file shrank by ~146 lines. The agent's mcpServers now holds only a read-only github MCP (context,repos,issues,pull_requests, auto-injected by the compiler for the safe-outputs PR), riksdag-regering limited to get_sync_status, and safeoutputs.
  • tests/news-translate-mcp-surface.test.ts (new) — guards against regression: no scb/world-bank servers, no toolsets:[all], github stays read-only, riksdag narrowed, max_langs default 13.
  • tests/news-translate-worklist-contract.test.ts — updated the pinned MAX_LANGS default to 13.
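The regression guard described above can be pictured roughly as follows. This is a sketch, not the contents of tests/news-translate-mcp-surface.test.ts: the `McpSurface` shape and the `surfaceViolations` helper are hypothetical simplifications of the workflow frontmatter, and the "legacy" values are illustrative.

```typescript
// Hypothetical, simplified view of the workflow's declared tool surface.
interface McpSurface {
  githubToolsets: string[];
  githubReadOnly: boolean;
  mcpServers: Record<string, { allowed: string[] }>;
  maxLangsDefault: number;
}

// Return one message per rule the surface breaks; an empty array means clean.
function surfaceViolations(surface: McpSurface): string[] {
  const problems: string[] = [];
  for (const banned of ["scb", "world-bank"]) {
    if (banned in surface.mcpServers) {
      problems.push(`MCP server "${banned}" must not be declared`);
    }
  }
  if (surface.githubToolsets.includes("all")) {
    problems.push("github toolsets must not include \"all\"");
  }
  if (!surface.githubReadOnly) {
    problems.push("github MCP must stay read-only");
  }
  const riksdag = surface.mcpServers["riksdag-regering"];
  const narrowed =
    riksdag !== undefined &&
    riksdag.allowed.length === 1 &&
    riksdag.allowed[0] === "get_sync_status";
  if (!narrowed) {
    problems.push("riksdag-regering must allow only get_sync_status");
  }
  if (surface.maxLangsDefault !== 13) {
    problems.push("max_langs default must be 13");
  }
  return problems;
}

// Illustrative inputs: the trimmed surface and a pre-trim surface.
const trimmedSurface: McpSurface = {
  githubToolsets: ["context", "repos", "issues", "pull_requests"],
  githubReadOnly: true,
  mcpServers: { "riksdag-regering": { allowed: ["get_sync_status"] } },
  maxLangsDefault: 13,
};
const legacySurface: McpSurface = {
  githubToolsets: ["all"],
  githubReadOnly: true,
  mcpServers: {
    "riksdag-regering": { allowed: ["*"] },
    scb: { allowed: ["*"] },
    "world-bank": { allowed: ["*"] },
  },
  maxLangsDefault: 7,
};

console.log(surfaceViolations(trimmedSurface).length); // 0
console.log(surfaceViolations(legacySurface).length); // 5
```

Returning a list of messages rather than a boolean keeps a failing run self-explanatory: each reintroduced server or wildcard shows up by name.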

Deliberately retained

network.allowed, safe-outputs.allowed-domains, the riksdag-regering MCP URL, and the 02/07 prompt imports are contract-locked by tests/network-diagnostics.test.ts and carry zero per-turn token cost (firewall egress / output sanitizer / system prompt — not re-sent each turn). Trimming them would break CI for no token benefit, so the diet targets only the re-billed tool schemas.

Effect

The fixed per-turn overhead collapses toward the raw token count, so a full 13-language source fits comfortably under 25M. Wall-clock (Timer A/B, 60 min) and the 100-file safe-outputs cap become the governing limits — the intended state.

Copilot AI and others added 3 commits May 29, 2026 15:09
…n limit (run #26641603577)

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…e2e article-class assertion

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…o 13

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
@github-actions added the labels documentation (Documentation updates), workflow (GitHub Actions workflows), ci-cd (CI/CD pipeline changes), testing (Test coverage), news (News articles and content generation), and agentic-workflow (Agentic workflow changes) on May 29, 2026
@github-actions added the label size-l (Large change, 250-1000 lines) on May 29, 2026
@github-actions

🏷️ Automatic Labeling Summary

This PR has been automatically labeled based on the files changed and PR metadata.

Applied Labels: documentation,workflow,ci-cd,testing,size-l,news,agentic-workflow

Label Categories

  • 🗳️ Content: news, dashboard, visualization, intelligence
  • 💻 Technology: html-css, javascript, workflow, security
  • 📊 Data: cia-data, riksdag-data, data-pipeline, schema
  • 🌍 I18n: i18n, translation, rtl
  • 🔒 ISMS: isms, iso-27001, nist-csf, cis-controls
  • 🏗️ Infrastructure: ci-cd, deployment, performance, monitoring
  • 🔄 Quality: testing, accessibility, documentation, refactor
  • 🤖 AI: agent, skill, agentic-workflow

For more information, see .github/labeler.yml.

@pethers marked this pull request as ready for review May 29, 2026 17:24
Copilot AI review requested due to automatic review settings May 29, 2026 17:24
@github-actions

🔍 Lighthouse Performance Audit

| Category | Score | Status |
| --- | --- | --- |
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

Copilot AI left a comment

Pull request overview

This PR reduces the news-translate workflow’s per-turn MCP/tool schema footprint so translation runs can process larger language batches without hitting the Copilot weighted-token session cap.

Changes:

  • Trims unused MCP servers/tools from news-translate and its compiled lock file.
  • Adds max_langs batching and scopes validator calls to the selected language batch.
  • Adds regression tests for the trimmed MCP surface and updated worklist contract.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| .github/workflows/news-translate.md | Updates workflow inputs, MCP/tool surface, language batching, and prompt instructions. |
| .github/workflows/news-translate.lock.yml | Recompiled lock file reflecting the reduced MCP/tool surface and new input. |
| tests/news-translate-mcp-surface.test.ts | Adds guard tests to prevent reintroducing heavy unused MCP/tool schemas. |
| tests/news-translate-worklist-contract.test.ts | Extends worklist contract tests for max_langs and scoped validation. |
| tests/render-lib.test.ts | Loosens article wrapper assertion to allow the TOC modifier class. |

Comment thread .github/workflows/news-translate.md Outdated
Comment on lines +557 to +558
4. **Title post-processing**: run `npx tsx scripts/postprocess-translated-brief.ts analysis/daily/$DATE/$SUB/executive-brief_*.md` to re-apply the renderer's `cleanArticleTitle` pipeline on every translated H1. This strips locale-specific boilerplate prefixes (`Exekutiv sammanfattning — `, `Zusammenfassung — `, `执行摘要:…`) and trailing date suffixes that the Pass 1 translation may have left in place. The helper only rewrites when the cleaned title differs from the source; it never falls back to a BLUF synthesis. Verify the script reports `✓` or `✏️` for every file (no `✗`).
- 5. **Final validate**: re-run `npx tsx scripts/validate-executive-brief-translations.ts --source <source>` (when available) to confirm the title post-processing in sub-step 4 did not break parity, or fall back to the structural sanity checks listed in TRANSLATION_GUIDE §Acceptance checklist. Fix any failures by re-translating only the offending language (read back only that file). Do not commit a file that fails validation.
+ 5. **Final validate**: re-run `npx tsx scripts/validate-executive-brief-translations.ts --source <source> --lang "$TRANSLATION_LANGS"` (when available) to confirm the title post-processing in sub-step 4 did not break parity, or fall back to the structural sanity checks listed in TRANSLATION_GUIDE §Acceptance checklist. Fix any failures by re-translating only the offending language (read back only that file). Do not commit a file that fails validation.
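The effect of scoping validation to the language batch can be sketched as below. This is not the code of `scripts/validate-executive-brief-translations.ts`: `validateBatch` and its helpers are hypothetical, and only two of the parity checks named in the workflow prompt (heading count and URL set equality) are modelled, in simplified form.

```typescript
// Count ATX headings ("# ", "## ", ...). Simplification: lines inside code
// fences that start with "#" are counted too.
function headingCount(markdown: string): number {
  return markdown.split("\n").filter((line) => /^#{1,6}\s/.test(line)).length;
}

// Collect the distinct http(s) URLs in a document.
function urlSet(markdown: string): Set<string> {
  return new Set<string>(markdown.match(/https?:\/\/[^\s)>\]]+/g) ?? []);
}

function sameSet(a: Set<string>, b: Set<string>): boolean {
  return a.size === b.size && Array.from(a).every((item) => b.has(item));
}

// Validate only the languages in this run's batch. Languages outside `langs`
// are ignored, so deferred translations are never reported as missing.
function validateBatch(
  source: string,
  translations: Record<string, string | undefined>,
  langs: string[],
): Record<string, string[]> {
  const report: Record<string, string[]> = {};
  for (const lang of langs) {
    const text = translations[lang];
    const problems: string[] = [];
    if (text === undefined) {
      problems.push("translation file is missing");
    } else {
      if (headingCount(text) !== headingCount(source)) {
        problems.push("heading count differs from source");
      }
      if (!sameSet(urlSet(text), urlSet(source))) {
        problems.push("URL set differs from source");
      }
    }
    report[lang] = problems;
  }
  return report;
}
```

With a batch of `["sv", "de"]`, a language such as `fr` that has no file yet does not appear in the report at all, which is the behaviour the `--lang` flag is described as providing.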
Comment thread .github/workflows/news-translate.md Outdated
1. **Pass 1 — translate**: Read the source `executive-brief.md` in full **once**. For every language in `TRANSLATION_LANGS`, produce `analysis/daily/$DATE/$SUB/executive-brief_<lang>.md` following the TRANSLATION_GUIDE rules — **write each file with one `edit` tool call per language** (never via `python3`, `bash` heredocs, or shell redirection — see `01-bash-and-shell-safety.md §File creation & overwrite strategy`). Apply the per-language tone register from TRANSLATION_GUIDE **at write time** (so Pass 2 never needs a full re-read just to adjust register), and for `ar`/`he` start the file with `<!-- dir: rtl -->`. Preserve every verbatim block (YAML, HTML comments except `source-sha`, `dok_id` codes, Mermaid DSL bodies, code fences, URLs, file paths, evidence-anchor canonical column values). Translate every always-translate block (prose, headings, list items, table cell text, image alt-text, BLUF, decisions, link text). **Do not** read a freshly written translation back into model context to "confirm" it — the validator in Pass 2 is the authoritative gate.
- 2. **Pass 2 — validate-first & targeted refine**: This is the **primary token-efficiency control**. Run `npx tsx scripts/validate-executive-brief-translations.ts --source <source>` in the runtime shell. That validator already performs every structural-parity check at near-zero model-token cost: heading / table-row / code-fence / Mermaid-block count parity, `dok_id` and URL set equality, banned-English-phrase detection (`Executive Brief`, `Decisions`, `Confidence`, `BLUF`, …), the `<!-- dir: rtl -->` marker for `ar`/`he`, and the `<!-- source-sha: -->` trailer. **Do not read passing translations back into model context** — reading all 13 files back in full is exactly what drove run [#26633644372](https://github.com/Hack23/riksdagsmonitor/actions/runs/26633644372) to `7.6M` effective tokens and a `429 Maximum effective tokens exceeded` abort. Only read back (and re-translate / fix) the specific `executive-brief_<lang>.md` files the validator reports as failing, then re-run the validator until it is clean for the requested languages.
+ 2. **Pass 2 — validate-first & targeted refine**: This is the **primary token-efficiency control**. Run `npx tsx scripts/validate-executive-brief-translations.ts --source <source> --lang "$TRANSLATION_LANGS"` in the runtime shell (the `--lang` scope restricts validation to this run's language batch so deferred languages are not reported as missing). That validator already performs every structural-parity check at near-zero model-token cost: heading / table-row / code-fence / Mermaid-block count parity, `dok_id` and URL set equality, banned-English-phrase detection (`Executive Brief`, `Decisions`, `Confidence`, `BLUF`, …), the `<!-- dir: rtl -->` marker for `ar`/`he`, and the `<!-- source-sha: -->` trailer. **Do not read passing translations back into model context** — reading every target file back in full is exactly what drove run [#26633644372](https://github.com/Hack23/riksdagsmonitor/actions/runs/26633644372) to `7.6M` effective tokens and a `429 Maximum effective tokens exceeded` abort. Only read back (and re-translate / fix) the specific `executive-brief_<lang>.md` files the validator reports as failing, then re-run the validator until it is clean for the requested languages.
3. **Append the source-revision marker**: compute `SRC_SHA=$(git log -1 --format=%H -- <source>)` in the runtime shell. Use the `edit` tool to add or replace the final `<!-- source-sha: $SRC_SHA -->` line at the end of every translation file (consistent with the file-write contract — no `>>` or `echo`). This is the drift signal future runs use to decide whether to retranslate.
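The "add or replace" rule for the source-sha trailer, and the RTL marker from Pass 1, amount to two small idempotent text transforms. The sketch below shows only that logic; in the workflow itself the agent applies these changes through the `edit` tool, and the function names here are hypothetical.

```typescript
const RTL_LANGS = new Set<string>(["ar", "he"]);
const RTL_MARKER = "<!-- dir: rtl -->";

// Prepend the RTL marker for ar/he unless the file already starts with it.
function withRtlMarker(lang: string, content: string): string {
  if (!RTL_LANGS.has(lang) || content.startsWith(RTL_MARKER)) return content;
  return `${RTL_MARKER}\n${content}`;
}

// Add the source-sha trailer as the final line, or replace an existing one,
// so that repeated runs never stack several trailers.
function upsertSourceShaTrailer(content: string, sha: string): string {
  const trailer = `<!-- source-sha: ${sha} -->`;
  const body = content.replace(/\s+$/, "");
  if (body === "") return `${trailer}\n`;
  const lines = body.split("\n");
  const last = lines[lines.length - 1] ?? "";
  if (/^<!-- source-sha:.*-->$/.test(last)) {
    lines[lines.length - 1] = trailer;
  } else {
    lines.push(trailer);
  }
  return `${lines.join("\n")}\n`;
}

console.log(upsertSourceShaTrailer("# Titel\n", "abc123"));
console.log(withRtlMarker("ar", "# عنوان\n"));
```

Because both helpers are idempotent, re-running a batch over an already processed file leaves it byte-identical unless the source SHA has changed.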
…N_LANGS batch

Addresses review feedback:
- source-sha trailer is now only written to files whose language is in
  $TRANSLATION_LANGS, preventing deferred languages from being marked as
  current without actual retranslation.
- Title post-processing now iterates per-language instead of using
  executive-brief_*.md glob, avoiding rewriting deferred-language files
  without subsequent validation.

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
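Iterating per language instead of globbing `executive-brief_*.md` can be sketched as follows. The delimiter format of `TRANSLATION_LANGS` is not stated in this thread, so the parser below assumes a comma- or whitespace-separated list; `parseLangs`, `batchBriefPaths`, and the example sub-directory name are hypothetical.

```typescript
// Split a language list on commas and/or whitespace, dropping empties and
// duplicates while preserving first-seen order. (Assumed input format.)
function parseLangs(raw: string): string[] {
  const seen = new Set<string>();
  for (const token of raw.split(/[\s,]+/)) {
    if (token !== "") seen.add(token);
  }
  return Array.from(seen);
}

// Build the explicit list of translation files for this run's batch, so that
// files for deferred languages are never touched by post-processing or the
// source-sha trailer step.
function batchBriefPaths(date: string, sub: string, rawLangs: string): string[] {
  return parseLangs(rawLangs).map(
    (lang) => `analysis/daily/${date}/${sub}/executive-brief_${lang}.md`,
  );
}

// Example with a hypothetical sub-directory name.
for (const path of batchBriefPaths("2026-05-29", "example-sub", "sv, de ar")) {
  console.log(path);
}
```

An explicit path list makes the set of rewritten files equal to the set of validated files, which is the mismatch the glob allowed.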
Copilot AI requested a review from pethers May 29, 2026 17:42

@pethers requested a review from Copilot May 29, 2026 17:50

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@pethers merged commit b356c7e into main May 29, 2026
17 checks passed
@pethers deleted the copilot/fix-workflow-issues branch May 29, 2026 18:21