Enforce English-only analysis artifacts; render non-EN via executive-brief cascade#2529
Conversation
- Add §Output language in 00-base-contract: all 23+E analysis artifacts must be English
- Add §Output language in 04-analysis-pipeline: headings/prose in English, Swedish proper nouns preserved
- Replace Step 2 in 06-article-generation: per-type workflows no longer write article.<lang>.md
- Add Check 12 in 05-analysis-gate: npx tsx scripts/check-analysis-language.ts

Localized content is confined to executive-brief_<lang>.md (news-translate) and rendered HTML (cascade).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
- Create scripts/check-analysis-language.ts: enforce English-only analysis artifacts
- Scan for Swedish function words / political vocab with 5% density threshold
- Exempt executive-brief_<lang>.md, pass1/, data-download-manifest.md
- Add npm script check:analysis-language

Implements gate Check 12 from 05-analysis-gate.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
- Add isLocalizedArticleMd() predicate matching article.<lang>.md pattern
- Return false in isFileOwnedByCategory() for all categories (forbidden)
- Include article.<lang>.md in validateFileList() ownedFiles filter

Non-English HTML is now rendered via localized executive-brief cascade. Per-type workflows must NOT write article.<lang>.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
- Remove 'Translate article.md → article.<lang>.md' row from time budget tables
- Update prose: 'aggregate → render' instead of 'aggregate → translate → render'
- Update file budget: 50 files max (was 63), drop '13 article.<lang>.md' slot
- Update description: 'via executive-brief cascade' instead of 'EN + SV + 12 translated'
- Clarify: non-English HTML rendered via localized executive-brief cascade

Applies to all 13 per-type workflows (news-propositions, news-motions, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…scade

- Per-type workflows write only English article.md
- Non-English HTML rendered via localized executive-brief cascade (mergeLocalizedWithEnglish)
- English body remains English; only executive-brief hero + SEO is localized
- article.<lang>.md is now forbidden by validate-file-ownership.ts
- news-translate owns executive-brief_<lang>.md exclusively

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Tests for scripts/check-analysis-language.ts:
- stripMarkdownCodeAndFrontmatter removes YAML, code fences, inline code
- tokenizeWords extracts lowercase words including Swedish å/ä/ö
- calculateSwedishDensity detects 100% English (density=0) vs heavily Swedish (>5%)
- findAnalysisMarkdownFiles exempts executive-brief_<lang>.md, pass1/, data-download-manifest.md
- validateAnalysisLanguage enforces 5% threshold with MIN_SWEDISH_MARKERS=5

Tests for validate-file-ownership.ts article.<lang>.md ban:
- article.sv.md is violation in both content AND translation categories
- article.md (English source) continues to pass as before
- Multiple localized article files all flagged as violations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
- Fixed multi-line @description in JSDoc (tsc interpreted as code)
- Removed unused 'sep' import from node:path
- Added beforeEach/afterEach imports from vitest

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Recompiled all 14 agentic workflow lock files with gh aw compile v0.74.3. No errors, 0 new warnings (existing warnings for IMF secrets and schedule timing unchanged).

Lock files updated to reflect:
- Removed translation phase from per-type workflows
- Updated time budgets and file count arithmetic
- Changed prose from 'aggregate → translate → render' to 'aggregate → render'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…age scan Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
🏷️ Automatic Labeling Summary
This PR has been automatically labeled based on the files changed and PR metadata.
Applied Labels: documentation, dependencies, workflow, i18n, translation, ci-cd, testing, refactor, size-xl, news, agentic-workflow
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
| // Remove YAML frontmatter (---\n...\n---) | ||
| body = body.replace(/^---\n[\s\S]*?\n---\n/m, ''); |
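For readers following the stripping step quoted above, here is a minimal sketch of how frontmatter, fenced code and inline code could be removed before counting words. It is not the PR's implementation: the function body is an assumption, and only the name `stripMarkdownCodeAndFrontmatter` comes from the test description. One deliberate difference from the quoted line is that the frontmatter pattern here omits the `m` flag, because with `m` the `^` anchor can also match a later `---` horizontal rule.

```typescript
// Hypothetical sketch; the real scripts/check-analysis-language.ts may differ.
const FENCE = '`'.repeat(3); // built at runtime to keep this example fence-safe

export function stripMarkdownCodeAndFrontmatter(markdown: string): string {
  let body = markdown;
  // YAML frontmatter: only when it opens the file (no `m` flag, so `^` is
  // anchored to the very start of the string).
  body = body.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, '');
  // Fenced code blocks, including both fence lines.
  const fencedBlock = new RegExp(
    '^' + FENCE + '[^\\n]*\\n[\\s\\S]*?^' + FENCE + '[ \\t]*$',
    'gm'
  );
  body = body.replace(fencedBlock, '');
  // Inline code spans on a single line.
  body = body.replace(/`[^`\n]*`/g, '');
  return body;
}
```

Fences are stripped before inline spans so that a fence line is never misread as a run of inline code.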
| ## Implementation | ||
|
|
||
| No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–11, plus conditional check 9b where applicable): | ||
| No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–13, plus conditional check 9b where applicable): |
| ### Step 2 — (No-op) Per-language Markdown translation is no longer performed | ||
|
|
||
| Before rendering, the agent **SHOULD** produce a per-language Markdown sibling for every supported non-English language. The translation surface is the same canonical `article.md`; the renderer picks up `article.<lang>.md` automatically when it exists, and falls back to the English source otherwise — so any missing sibling temporarily degrades that language's HTML to English content under a non-English `<html lang>`. This fallback is acceptable as a **temporary** state within a single run's time budget. The `news-translate` workflow does **not** repair `article.<lang>.md` — its mission is the executive-brief markdown pipeline (`executive-brief.md` → `executive-brief_<lang>.md`). If `article.<lang>.md` is missing, the next scheduled per-type run regenerates the whole article (including translations) from fresh analysis. | ||
| Per-type workflows do **not** produce `article.<lang>.md` for any non-English language. The agent stops after writing the canonical English `article.md` from Step 1. Non-English HTML pages are produced by `scripts/render-articles.ts` via the localized executive-brief cascade — the renderer composes the English `article.md` body with `executive-brief_<lang>.md` (when present) into a single Markdown document and emits chrome-wrapped HTML in the target language. See `scripts/render-lib/article-merge.ts` (`mergeLocalizedWithEnglish`) for the merge contract. |
| @@ -284,15 +284,15 @@ Generates deep political intelligence analysis **and** renders the HTML article | |||
| - **Aggregated markdown**: `analysis/daily/$ARTICLE_DATE/interpellations/article.md` (produced by `scripts/aggregate-analysis.ts`) | |||
| - **Per-language Markdown**: `analysis/daily/$ARTICLE_DATE/interpellations/article.<lang>.md` for `sv,da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh` (produced by the agent inside the same run) | |||
| "sync-schemas": "npx tsx scripts/sync-cia-schemas.ts", | ||
| "validate-data": "npx tsx scripts/validate-against-cia-schemas.ts", | ||
| "check-updates": "npx tsx scripts/check-cia-schema-updates.ts", | ||
| "check:analysis-language": "tsx scripts/check-analysis-language.ts", |
| // article.<lang>.md is now forbidden for ALL workflow categories. | ||
| // Per-type workflows must NOT write these files. The renderer uses | ||
| // the localized executive-brief cascade instead (mergeLocalizedWithEnglish). | ||
| if (isLocalizedArticleMd(filepath)) { | ||
| return false; | ||
| } |
|
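The category-independent reject in the snippet above can be sketched as follows. The regular expression and the `isFileOwnedByCategorySketch` helper are assumptions made for illustration; only the `isLocalizedArticleMd` name and the 13 language codes come from the PR text.

```typescript
// Hypothetical sketch of the article.<lang>.md ban; not the PR's actual code.
const NON_ENGLISH_LANGS = [
  'sv', 'da', 'no', 'fi', 'de', 'fr', 'es', 'nl', 'ar', 'he', 'ja', 'ko', 'zh',
];

const LOCALIZED_ARTICLE_RE = new RegExp(
  `(?:^|/)article\\.(?:${NON_ENGLISH_LANGS.join('|')})\\.md$`
);

export function isLocalizedArticleMd(filepath: string): boolean {
  // Normalize Windows separators so the same pattern works everywhere.
  return LOCALIZED_ARTICLE_RE.test(filepath.replace(/\\/g, '/'));
}

// Assumed shape: a category is modeled as a list of path patterns it owns.
export function isFileOwnedByCategorySketch(
  filepath: string,
  categoryPatterns: RegExp[]
): boolean {
  // Forbidden for every category, so check this before any ownership rule.
  if (isLocalizedArticleMd(filepath)) return false;
  const normalized = filepath.replace(/\\/g, '/');
  return categoryPatterns.some((pattern) => pattern.test(normalized));
}
```

Checking the ban first means no category pattern, however broad, can accidentally re-admit a localized article file.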
@copilot apply changes based on the comments in this thread |
…r ignore, default arg, ordering note Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/91d7c03c-8cda-454b-9ea2-71d9271d9493 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Applied all 6 review comments in
|
| if (entry.isDirectory()) { | ||
| // Skip pass1/ subdirectories | ||
| if (entry.name === 'pass1') continue; | ||
| walk(fullPath); |
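The directory walk quoted above is truncated, so the following is only a sketch of how the scan could collect Markdown files while honoring the exemptions named in the commit message (`pass1/`, `executive-brief_<lang>.md`, `data-download-manifest.md`). The `isExemptFile` helper and the two-letter language pattern are assumptions.

```typescript
import { readdirSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical sketch; exemption list taken from the commit message above.
const SKIPPED_DIRS = new Set(['pass1']);

export function isExemptFile(name: string): boolean {
  return (
    name === 'data-download-manifest.md' ||
    /^executive-brief_[a-z]{2}\.md$/.test(name)
  );
}

export function findAnalysisMarkdownFiles(root: string): string[] {
  const found: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const fullPath = join(dir, entry.name);
      if (entry.isDirectory()) {
        // Skip pass1/ subdirectories entirely.
        if (SKIPPED_DIRS.has(entry.name)) continue;
        walk(fullPath);
      } else if (entry.name.endsWith('.md') && !isExemptFile(entry.name)) {
        found.push(fullPath);
      }
    }
  };
  walk(root);
  return found.sort();
}
```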
| const { totalWords, swedishMarkerCount, density } = calculateSwedishDensity(filepath); | ||
|
|
||
| // Violation: density > threshold AND absolute count >= minimum | ||
| if (density > SWEDISH_DENSITY_THRESHOLD && swedishMarkerCount >= MIN_SWEDISH_MARKERS) { |
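The two-part violation test above (density over the threshold and an absolute minimum of markers) can be illustrated with a small sketch. The marker list below is a short hypothetical sample, not the PR's vocabulary, and this version takes text rather than a file path so it stays self-contained; the 5% threshold and `MIN_SWEDISH_MARKERS = 5` are taken from the test description.

```typescript
// Hypothetical sketch; the marker list is illustrative only.
const SWEDISH_DENSITY_THRESHOLD = 0.05;
const MIN_SWEDISH_MARKERS = 5;
const SWEDISH_MARKERS = new Set([
  'och', 'att', 'det', 'som', 'för', 'inte', 'är', 'med', 'av', 'på',
  'riksdagen', 'regeringen',
]);

export function tokenizeWords(text: string): string[] {
  // \p{L} covers å/ä/ö as well as ASCII letters.
  return text.toLowerCase().match(/\p{L}+/gu) ?? [];
}

export interface DensityResult {
  totalWords: number;
  swedishMarkerCount: number;
  density: number;
}

export function calculateSwedishDensityOfText(text: string): DensityResult {
  const words = tokenizeWords(text);
  const totalWords = words.length;
  const swedishMarkerCount = words.filter((w) => SWEDISH_MARKERS.has(w)).length;
  // Guard against division by zero for empty documents.
  const density = totalWords === 0 ? 0 : swedishMarkerCount / totalWords;
  return { totalWords, swedishMarkerCount, density };
}

export function isLanguageViolation(text: string): boolean {
  const { density, swedishMarkerCount } = calculateSwedishDensityOfText(text);
  // Both conditions must hold, mirroring the quoted snippet.
  return (
    density > SWEDISH_DENSITY_THRESHOLD &&
    swedishMarkerCount >= MIN_SWEDISH_MARKERS
  );
}
```

The absolute minimum keeps very short files from failing on one or two stray Swedish words, where the ratio alone would be misleading.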
| ## Implementation | ||
|
|
||
| No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–11, plus conditional check 9b where applicable): | ||
| No dedicated validator script exists yet — implement the checks as an inline bash gate. Full implementation (covers checks 1–13, plus conditional check 9b where applicable). Check 12 invokes `scripts/validate-article.ts` when `article.md` is already present (after aggregation); Check 13 invokes `scripts/check-analysis-language.ts`: |
| | 28–30 | Analysis Gate (checks 1–8) | 05 | | ||
| | 30–32 | `scripts/aggregate-analysis.ts` → `article.md` | 06 | | ||
| | 32–40 | Translate `article.md` → `article.<lang>.md` for all 13 non-English languages (sv,da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh) | 06 | | ||
| | 40–42 | `scripts/render-articles.ts --lang all` → **all 14** HTML files | 06 | |
| @@ -302,11 +302,10 @@ Generates deep political intelligence analysis **and** renders the HTML article | |||
| | 18–28 | Analysis Pass 2 (read-back + improvements on all 22 text files) | 04 | | |||
| | 28–30 | Analysis Gate (checks 1–8) | 05 | | |||
| | 30–32 | `scripts/aggregate-analysis.ts` → `article.md` | 06 | | |||
| | 28–30 | Analysis Gate (checks 1–8) | 05 | | ||
| | 30–32 | `scripts/aggregate-analysis.ts` → `article.md` | 06 | | ||
| | 32–40 | Translate `article.md` → `article.<lang>.md` for all 13 non-English languages (sv,da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh) | 06 | | ||
| | 40–42 | `scripts/render-articles.ts --lang all` → **all 14** HTML files | 06 | |
| | 34–36 | Analysis Gate (checks 1–11 + Tier-C additive + long-horizon checks) | 05 | | ||
| | 36–38 | Aggregate (`article.md`) | 06 | | ||
| | 38–40 | Translate `article.md` → `article.<lang>.md` × 13 (sv,da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh) | 06 | | ||
| | 40–42 | Render (`scripts/render-articles.ts --lang all` → all 14 HTML) | 06 | |
| | 41–42 | Render (`scripts/render-articles.ts --lang all` → all 14 HTML per anchor) | | ||
| | 42–43 | Stage + commit + ONE `safeoutputs___create_pull_request` — **HARD DEADLINE agent minute 45** | |
| All 14 language renderings are produced by the per-type workflows themselves via the per-language `article.<lang>.md` step inside `06-article-generation.md`. The standalone [`news-translate`](.github/workflows/news-translate.md) workflow **does not touch** `news/*.html` at all; its sole mission is **executive-brief Markdown translation** — see §"Executive Brief Markdown Translations" below. | ||
| **All 14 language renderings are produced by the per-type workflows themselves** via the **localized executive-brief cascade** inside the renderer. Per-type workflows write **only** the canonical English `article.md` (aggregated from all 23 analysis artifacts). The renderer then composes the English `article.md` body with `executive-brief_<lang>.md` (when present) via the cascade chain in `scripts/render-lib/article-merge.ts` (`mergeLocalizedWithEnglish`) + `scripts/render-lib/aggregator/seo/localized-brief.ts` to produce each non-English HTML file. The English body remains in English prose — **only the executive-brief hero section + SEO metadata is localized**. | ||
|
|
||
| **Per-type workflows do NOT write `article.<lang>.md`** — these files are now **forbidden** by `scripts/validate-file-ownership.ts` (category-independent reject). Historical `article.<lang>.md` files left in the repo from old runs will be flagged as violations if they appear in a future validation input (e.g. a PR touching that folder). |
| ART_MD_GATE="$ANALYSIS_DIR/article.md" | ||
| if [ -s "$ART_MD_GATE" ]; then | ||
| if command -v npx >/dev/null 2>&1; then | ||
| npx tsx scripts/validate-article.ts "$ART_MD_GATE" || FAIL=1 | ||
| else |
|
@copilot apply changes based on the comments in this thread |
…ime-budget reallocation, post-aggregate check 12, ownership wording Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/af5b8bba-de99-4bdf-886c-3ea11bfb512b Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Applied all 16 review comments in
New tests added: |
| - The **only translated artifacts** are `analysis/daily/$DATE/$SUB/executive-brief_<lang>.md` for the 13 non-English target languages. They are produced exclusively by the dedicated `news-translate` workflow and consumed at render-time via the localized-brief cascade in `scripts/render-lib/article-merge.ts` (`mergeLocalizedWithEnglish`) + `scripts/render-lib/aggregator/seo/localized-brief.ts`. Per-type workflows MUST NOT write `executive-brief_<lang>.md` and MUST NOT write `article.<lang>.md` (the latter is now forbidden — see below). | ||
| - Non-English HTML pages (`news/$DATE-$SUB-<lang>.html`) are rendered by composing the English `article.md` body with the localized executive-brief overlay; no per-language article-body translation is performed any more. |
| ### Step 2 — (No-op) Per-language Markdown translation is no longer performed | ||
|
|
||
| Translation contract: | ||
| Per-type workflows do **not** produce `article.<lang>.md` for any non-English language. The agent stops after writing the canonical English `article.md` from Step 1. Non-English HTML pages are produced by `scripts/render-articles.ts` via the localized executive-brief cascade — the renderer composes the English `article.md` body with `executive-brief_<lang>.md` (when present) into a single Markdown document and emits chrome-wrapped HTML in the target language. See `scripts/render-lib/article-merge.ts` (`mergeLocalizedWithEnglish`) for the merge contract. | ||
|
|
||
| - Translate the body prose, headings and table cells. | ||
| - **Preserve verbatim**: YAML front-matter values that are identifiers (`subfolder`, `slug`, `source_folder`, `dok_id` references, file paths, GitHub URLs), Mermaid code fences, JSON code blocks, numeric values, and Schema.org / dataflow / dataset identifiers. Update `language:` in the front-matter to the target language code. | ||
| - Keep Swedish political terminology in Swedish where it is the proper noun (party names, committee names, document type acronyms, Riksdagsmonitor brand). | ||
| - For Arabic (`ar`) and Hebrew (`he`) the chrome handles `dir="rtl"` automatically — do not add inline direction overrides. | ||
| - Keep IMF / SCB / WB / Statskontoret citation blocks intact, including `economicProvenance` JSON. | ||
| > ⚠️ **Workflow ordering**: per-type workflows render HTML during the same run that produces the English `executive-brief.md`. The dedicated `news-translate` workflow runs on a separate schedule and back-fills `executive-brief_<lang>.md` *after the fact*. On the first HTML render the cascade therefore falls through to the English brief title/description for every non-EN language (`language: <lang>` is still forced so `<html lang>` / JSON-LD `inLanguage` are correct). The newly translated briefs only appear in the localized HTML on the **next** per-type re-render of the same subfolder (e.g. the next scheduled run, a `force_generation=true` re-run, or an explicit `npm run render-articles`). This is intentional — `news-translate` is **forbidden from touching `news/*.html`** (see `validate-file-ownership.ts`) to keep the file-ownership contract free of merge conflicts. |
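The ordering note above describes a fallback: until `news-translate` back-fills a localized brief, the hero title and description come from the English brief while the language code is still forced to the target. A minimal sketch of that resolution step follows; `resolveHeroMeta` and its types are hypothetical names for illustration and do not describe the actual `localized-brief.ts` API.

```typescript
// Hypothetical sketch of the fallback described in the ordering note.
interface BriefMeta {
  title: string;
  description: string;
}

interface RenderMeta extends BriefMeta {
  language: string;   // always the target language, even on fallback
  localized: boolean; // true when any field came from the localized brief
}

export function resolveHeroMeta(
  lang: string,
  english: BriefMeta,
  localized?: Partial<BriefMeta>
): RenderMeta {
  const localTitle = localized?.title?.trim();
  const localDescription = localized?.description?.trim();
  return {
    // Empty or missing localized fields fall through to the English brief.
    title: localTitle || english.title,
    description: localDescription || english.description,
    language: lang,
    localized: Boolean(localTitle || localDescription),
  };
}
```

On a first render, `resolveHeroMeta('sv', englishBrief)` would return the English title with `language: 'sv'`; after the back-fill and a re-render, the same call with the Swedish brief returns the localized fields.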
| // Remove verbatim-source-title lines (e.g. `Source title: Proposition om …`, | ||
| // `Källa: Riksdagens protokoll …`). Match the label at the start of a line | ||
| // (after optional list markers / bold wrappers) and drop the rest of the line. | ||
| body = body.replace( | ||
| /^[ \t]*(?:[-*+][ \t]+)?\**(?:Source title|Källa|Källtitel|Title|Original title)\**[ \t]*:[^\n]*$/gim, | ||
| '' |
|
|
||
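The source-title stripping shown above can be sketched as a standalone function. This version assumes the narrowed label set mentioned later in the thread (`Source title`, `Källa`, `Källtitel`, `Original title`), so a bare `Title:` line is kept; the function name is hypothetical.

```typescript
// Hypothetical sketch; labels follow the narrowed exemption list in the thread.
const SOURCE_TITLE_LINE =
  /^[ \t]*(?:[-*+][ \t]+)?\**(?:Source title|Källa|Källtitel|Original title)\**[ \t]*:[^\n]*$/gim;

export function stripSourceTitleLines(body: string): string {
  // Matches the label at the start of a line (after an optional list marker
  // and optional bold asterisks) and drops the rest of that line.
  return body.replace(SOURCE_TITLE_LINE, '');
}
```

For example, a line such as `- **Source title:** Proposition om …` is blanked before the density count, while `Title: Budget outlook` is left in place and counted.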
| ### Check 12 ordering note | ||
|
|
||
| Check 12 (`scripts/validate-article.ts`) is the **editorial QA gate** and runs on the aggregated `article.md`. The blocking branch in §Implementation only fires when `article.md` is already on disk; the inline gate runs before aggregation, so on a first pass the article validator is **informational** (the gate logs `ℹ️ Check 12 (editorial QA): … skipped (run after aggregator)`). Workflows MUST re-invoke the gate (or call `npx tsx scripts/validate-article.ts $ANALYSIS_DIR/article.md` directly) **after** `scripts/aggregate-analysis.ts` writes `article.md` so the editorial checks (banned phrases, citation density, `economicProvenance` vintage) become blocking before staging. See `06-article-generation.md §Step 2` for the post-aggregation invocation pattern. |
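The two-phase behavior in the ordering note (informational before aggregation, blocking afterwards) can be modeled as a small decision function. This is a sketch under stated assumptions: `runCheck12` and the injected `validate` callback are hypothetical stand-ins for invoking `scripts/validate-article.ts`.

```typescript
// Hypothetical sketch of the Check 12 ordering; not the gate's actual code.
type Check12Outcome = 'skipped' | 'pass' | 'fail';

export function runCheck12(
  articleMdPresent: boolean,
  validate: () => boolean
): Check12Outcome {
  // Before aggregation there is no article.md, so the check is informational.
  if (!articleMdPresent) return 'skipped';
  // After aggregation the validator result becomes blocking.
  return validate() ? 'pass' : 'fail';
}

export function gateShouldBlock(outcome: Check12Outcome): boolean {
  return outcome === 'fail';
}
```

A workflow would call this once inside the inline gate (typically yielding `'skipped'`) and again after `scripts/aggregate-analysis.ts` has written `article.md`, when a failing validator blocks staging.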
|
@copilot apply changes based on the comments in this thread |
…nment, body-stays-English clarity, narrow Title: exemption, §Step 1b xref Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/95dadcfb-32a5-4bb0-ab1b-69ffe37e7ec2 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Applied all 4 review comments in
All 22 |
Address review thread 4304087566
- `>=` threshold + Swedish marker false positives
- `full-text/` from analysis-language scan (raw Swedish source material)
- `Source title:` / `Källa:` lines before Swedish-density count
- `article.<lang>.md` × 13" from 05-analysis-gate.md re-run note
- `TRANSLATION_GUIDE.md` wording: validator only rejects when localized article path is in changed-file input
- `fetch-depth: 0` hand-edit to all 14 safe_outputs checkouts after recompile

Address review thread 4304114192
- `03-data-download.md` improvement-mode table with the English-only cascade (drop "re-translate any non-English `article.<lang>.md`"; re-aggregate English `article.md` and re-render 14 HTML via cascade; call out forbidden-artifact rule)
- `06-article-generation.md §Step 2` that the renderer keeps the detailed article body in English and only swaps in the localized hero/SEO overlay (H1, dek, BLUF, JSON-LD `headline`/`description`, `<title>`, `<meta>` description, OG) from `executive-brief_<lang>.md`; `<html lang>` / JSON-LD `inLanguage` are still forced to the target language
- `check-analysis-language.ts` source-title exemption to explicit attribution labels (`Source title`, `Källa`, `Källtitel`, `Original title`); bare `Title:` lines are no longer exempted, with a regression test locking the behavior in
- `05-analysis-gate.md` re-run note: post-aggregation validator invocation lives in `06-article-generation.md §Step 1b — Editorial QA re-check (post-aggregation)`, not §Step 2