refactor(render-lib): bounded-context decomposition of aggregator + markdown#2066

Merged
pethers merged 6 commits into main from copilot/improve-code-quality-article-generation
Apr 27, 2026
Conversation

Contributor

Copilot AI commented Apr 27, 2026

Refactor: bounded-context decomposition for sitemap/RSS/news-index/political-intelligence generators

Latest commit — fix broken RSS link in political-intelligence pages

Reviewer flagged that political-intelligence/render/page.ts set rssHref to /rss/news.xml and /rss/news_${lang}.xml, but the RSS generator only writes rss.xml at the site root. This produced a broken <link rel="alternate" type="application/rss+xml"> header on all 14 PI pages (a pre-existing bug carried over verbatim from the 2289-LOC monolith on main).

Changed to a single rssHref: '/rss.xml' for all languages — verified the 14 regenerated PI pages now reference the actual feed file:

<link rel="alternate" type="application/rss+xml" title="Riksdagsmonitor news (English)" href="/rss.xml">

Verification

  • npx tsc --project tsconfig.scripts.json --noEmit — clean
  • ✅ PI tests (192) pass — both generate-political-intelligence.test.ts and the new leaf-module suite
  • ✅ All 14 PI HTML pages regenerated; <link rel="alternate" rss> and footer RSS link both point at /rss.xml
  • ✅ Single occurrence repo-wide — no dangling rss/news references remain
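
The link check in the list above can be approximated with a small helper that pulls every RSS alternate link out of a rendered page. This is only a sketch: `extractRssHrefs` is a hypothetical name, not a function from the PR, and the regular expressions assume double-quoted attributes as in the `<link>` tag quoted earlier.

```typescript
// Hypothetical helper: collect href values of RSS <link rel="alternate"> tags.
// Assumes double-quoted attributes, as in the sample tag from the PR description.
function extractRssHrefs(html: string): string[] {
  const hrefs: string[] = [];
  const linkTags = html.match(/<link\b[^>]*>/gi) ?? [];
  for (const tag of linkTags) {
    const isRss = /type="application\/rss\+xml"/i.test(tag);
    const isAlternate = /rel="alternate"/i.test(tag);
    const href = /href="([^"]*)"/i.exec(tag)?.[1];
    if (isRss && isAlternate && href !== undefined) hrefs.push(href);
  }
  return hrefs;
}

// Illustrative page fragment, not real generator output.
const samplePage =
  '<head><link rel="stylesheet" href="/styles.css">' +
  '<link rel="alternate" type="application/rss+xml" ' +
  'title="Riksdagsmonitor news (English)" href="/rss.xml"></head>';

console.log(extractRssHrefs(samplePage));
```

A test could then assert that every regenerated page yields exactly `/rss.xml`, which would have caught the stale `/rss/news.xml` reference.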

Final sizes (LOC) — unchanged

| Generator | Before | After (entry shim) | Bounded context total |
| --- | --- | --- | --- |
| generate-political-intelligence.ts | 2289 | 79 | 2554 across 12 files |
| generate-sitemap-html.ts | 1041 | 80 | 1128 across 7 files |
| generate-sitemap.ts | 599 | 63 | 757 across 8 files |
| generate-rss.ts | 372 | 68 | 477 across 7 files |
| generate-news-indexes.ts | (already split) | 37 | 1657 across 5 files |

Plan

  • P0 — Split generate-political-intelligence.ts into scripts/political-intelligence/
  • P1 — Split generate-sitemap-html.ts into scripts/sitemap-html/
  • P2 — Split generate-sitemap.ts into scripts/sitemap-xml/
  • P3 — Split generate-rss.ts into scripts/rss/
  • P4: generate-news-indexes.ts already split
  • All scripts continue to run under npx tsx scripts/generate-X.ts
  • Add focused unit tests for new leaf modules (103 new tests across 4 files)
  • Full test suite + npm run lint green
  • Verify byte-identical output for every generator vs main
  • Address review feedback — PI no longer imports from another generator's CLI shim
  • Fix CI type-check failure (scripts/sitemap-html/i18n.ts import path)
  • Fix broken RSS <link> in PI pages — point at actual /rss.xml
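
The plan above calls for byte-identical generator output against main. The PR does not say how that comparison was run; one possible approach, sketched here with a hypothetical helper name `firstDifference`, is to compare the two outputs byte by byte and report the first offset where they diverge.

```typescript
// Hypothetical helper: returns -1 when both buffers are byte-identical,
// otherwise the offset of the first differing byte.
function firstDifference(a: Uint8Array, b: Uint8Array): number {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return i;
  }
  // Same prefix: identical only if the lengths also match.
  return a.length === b.length ? -1 : shared;
}

// Illustrative inputs; real use would read the generated files from both branches.
const encoder = new TextEncoder();
const fromMain = encoder.encode('<loc>https://example.com/</loc>');
const fromBranch = encoder.encode('<loc>https://example.com/</loc>');
console.log(firstDifference(fromMain, fromBranch) === -1 ? 'identical' : 'differs');
```

Reporting an offset rather than a boolean makes it easier to see whether a mismatch is an expected timestamp or a real regression.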

Copilot AI and others added 2 commits April 27, 2026 18:17
@github-actions bot added the documentation (Documentation updates), testing (Test coverage), refactor (Code refactoring), and size-xl (Extra large change, > 1000 lines) labels Apr 27, 2026
@github-actions
Contributor

🏷️ Automatic Labeling Summary

This PR has been automatically labeled based on the files changed and PR metadata.

Applied Labels: documentation,testing,refactor,size-xl

Label Categories

  • 🗳️ Content: news, dashboard, visualization, intelligence
  • 💻 Technology: html-css, javascript, workflow, security
  • 📊 Data: cia-data, riksdag-data, data-pipeline, schema
  • 🌍 I18n: i18n, translation, rtl
  • 🔒 ISMS: isms, iso-27001, nist-csf, cis-controls
  • 🏗️ Infrastructure: ci-cd, deployment, performance, monitoring
  • 🔄 Quality: testing, accessibility, documentation, refactor
  • 🤖 AI: agent, skill, agentic-workflow

For more information, see .github/labeler.yml.

@github-actions
Contributor

🔍 Lighthouse Performance Audit

| Category | Score | Status |
| --- | --- | --- |
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

…d-context leaf modules

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/6302e73d-bc77-4fbf-b173-8b7a60325d74

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
@github-actions bot added the html-css (HTML/CSS changes) label Apr 27, 2026
…leaf modules + add 103 leaf tests

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/6302e73d-bc77-4fbf-b173-8b7a60325d74

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Copilot AI requested a review from pethers April 27, 2026 19:04
@pethers pethers marked this pull request as ready for review April 27, 2026 19:07
Copilot AI review requested due to automatic review settings April 27, 2026 19:07
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR refactors the generator “monoliths” into bounded-context leaf modules with thin CLI shims, adds focused unit tests for the new leaves, and preserves byte-identical outputs (timestamps aside).

Changes:

  • Split sitemap XML, sitemap HTML, RSS, political-intelligence, and render-lib markdown/aggregator logic into leaf modules with public barrels.
  • Added new unit test suites to pin leaf-module invariants and barrel parity.
  • Introduced lightweight structural validators for sitemap XML and RSS feeds.
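
To illustrate what a lightweight structural validator of this kind might look like, here is a minimal sketch. The names `validateSitemapSketch` and `ValidationResult`, and the specific rules checked, are assumptions for illustration and are not taken from the PR. Logging goes through an optional callback so the function stays quiet unless a caller such as a CLI shim asks for output.

```typescript
interface ValidationResult {
  valid: boolean;
  errors: string[];
}

// Hypothetical structural check for a sitemap string.
// The rules below are illustrative assumptions, not the PR's actual rules.
function validateSitemapSketch(
  xml: string,
  log?: (message: string) => void,
): ValidationResult {
  const errors: string[] = [];
  if (!xml.startsWith('<?xml')) errors.push('missing XML declaration');
  if (!xml.includes('<urlset')) errors.push('missing <urlset> root element');
  // "<url>" does not match "<urlset", so only entry tags are counted.
  const opened = (xml.match(/<url>/g) ?? []).length;
  const closed = (xml.match(/<\/url>/g) ?? []).length;
  if (opened !== closed) {
    errors.push(`unbalanced <url> tags: ${opened} opened, ${closed} closed`);
  }
  const result: ValidationResult = { valid: errors.length === 0, errors };
  // Silent unless the caller passes a logger.
  log?.(result.valid ? `sitemap OK (${opened} urls)` : `sitemap invalid: ${errors.join('; ')}`);
  return result;
}

const sampleSitemap =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' +
  '<url><loc>https://example.com/</loc></url></urlset>';

validateSitemapSketch(sampleSitemap);              // quiet, as a unit test would call it
validateSitemapSketch(sampleSitemap, console.log); // verbose, as a CLI might call it
```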

Reviewed changes

Copilot reviewed 74 out of 80 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/sitemap-xml-leaf-modules.test.ts | Adds unit tests for sitemap-xml leaf helpers (hreflang, url entry, validator). |
| tests/sitemap-html-leaf-modules.test.ts | Adds unit tests for sitemap-html escaping and LANGUAGE_META invariants. |
| tests/rss-leaf-modules.test.ts | Adds unit tests for RSS escape/hreflang/pubDate/article meta/validator helpers. |
| tests/render-lib-leaf-modules.test.ts | Adds extensive leaf-module tests for render-lib aggregator + markdown pipeline and barrel parity. |
| tests/political-intelligence-leaf-modules.test.ts | Adds unit tests for political-intelligence i18n/helpers and render helpers. |
| scripts/sitemap-xml/validator.ts | New sitemap.xml structural validator leaf. |
| scripts/sitemap-xml/scanners/news.ts | New recursive news scanner grouping per base slug and unified lastmod. |
| scripts/sitemap-xml/scanners/docs.ts | New docs/ HTML scanner leaf. |
| scripts/sitemap-xml/scanners/api.ts | New api/ (TypeDoc) HTML scanner leaf. |
| scripts/sitemap-xml/render/url-entry.ts | New pure <url> XML block builder with hreflang alternates. |
| scripts/sitemap-xml/render/sitemap.ts | New sitemap.xml builder composing entries and scanner outputs. |
| scripts/sitemap-xml/index.ts | Public barrel for sitemap-xml bounded context. |
| scripts/sitemap-xml/hreflang.ts | New hreflang mapping helper (no → nb). |
| scripts/sitemap-xml/git-timestamps.ts | New git timestamp preload cache with filesystem fallback. |
| scripts/sitemap-html/render/page.ts | New sitemap HTML page renderer using chrome, i18n, scanners, and inline styles. |
| scripts/sitemap-html/index.ts | Public barrel for sitemap-html bounded context. |
| scripts/sitemap-html/escape.ts | New HTML escaper leaf used by sitemap-html rendering. |
| scripts/sitemap-html/articles/scanner.ts | New recursive news scanner + HTML metadata extractor for sitemap HTML pages. |
| scripts/sitemap-html/articles/docs-sections.ts | New docs section probe for conditional sitemap HTML links. |
| scripts/rss/validator.ts | New rss.xml structural validator leaf. |
| scripts/rss/scanner.ts | New RSS news scanner (English-primary) with alternates. |
| scripts/rss/render/feed.ts | New RSS feed renderer composing items and channel metadata. |
| scripts/rss/pub-date.ts | New deterministic pubDate resolver (filename → mtime → sentinel). |
| scripts/rss/index.ts | Public barrel for RSS bounded context. |
| scripts/rss/hreflang.ts | New hreflang helper for RSS context (no → nb). |
| scripts/rss/escape.ts | New XML escaper leaf used by RSS rendering. |
| scripts/rss/article-meta.ts | New extractor for title/description/pubDate/author/category from article HTML. |
| scripts/render-lib/markdown/sanitize-schema.ts | Extracted rehype-sanitize schema and heading ID prefix constant. |
| scripts/render-lib/markdown/rehype-wrap-tables.ts | Extracted rehype plugin to wrap tables for horizontal scrolling. |
| scripts/render-lib/markdown/rehype-slug-prefixed.ts | Extracted prefixed rehype-slug variant to ensure stable IDs. |
| scripts/render-lib/markdown/pipeline.ts | Extracted unified/remark/rehype pipeline orchestrator. |
| scripts/render-lib/markdown/mermaid-preprocess.ts | Extracted pre-remark mermaid fence → <pre class="mermaid"> transform. |
| scripts/render-lib/markdown/index.ts | New markdown pipeline barrel (leaf modules + orchestrator). |
| scripts/render-lib/markdown.ts | Reduced to compat shim re-exporting the new markdown barrel. |
| scripts/render-lib/aggregator/sources-appendix.ts | Extracted appendix builder for GitHub-linked artifact list. |
| scripts/render-lib/aggregator/seo/title.ts | Extracted SEO title parsing/cleaning + BLUF fallback. |
| scripts/render-lib/aggregator/seo/description.ts | Extracted SEO description readers and sentence-aware truncation. |
| scripts/render-lib/aggregator/reader-guide.ts | Extracted reader guide table builder + anchor parity logic. |
| scripts/render-lib/aggregator/per-document.ts | Extracted documents/* expansion logic and existence probe. |
| scripts/render-lib/aggregator/order.ts | Extracted canonical aggregation order + curated titles + fallback prettifier. |
| scripts/render-lib/aggregator/index.ts | New aggregator public barrel + frozen test API assembled from leaves. |
| scripts/render-lib/aggregator/frontmatter.ts | Extracted YAML and inline-markdown escaping + frontmatter builder. |
| scripts/render-lib/aggregator/cleaning/structural.ts | Extracted structural body cleaning, heading demotion, and link rewriting. |
| scripts/render-lib/aggregator/cleaning/process-meta.ts | Extracted line-level process metadata stripper and regex constants. |
| scripts/render-lib/aggregator/cleaning/pass-two.ts | Extracted pass-2 audit section stripper and regex constant. |
| scripts/render-lib/aggregator/cleaning/admin-bylines.ts | Extracted paragraph-level admin byline stripper and regex constants. |
| scripts/render-lib/aggregator/aggregate.ts | Extracted slim top-level aggregation orchestrator (file reads + composition). |
| scripts/political-intelligence/render/style.ts | Extracted inline CSS for political-intelligence page. |
| scripts/political-intelligence/render/page.ts | Extracted political-intelligence page builder (chrome, JSON-LD, sections). |
| scripts/political-intelligence/render/grid.ts | Extracted catalog grid renderer. |
| scripts/political-intelligence/render/daily-day.ts | Extracted daily-day renderer + artifact basename/icon helpers. |
| scripts/political-intelligence/index.ts | Public barrel for political-intelligence bounded context. |
| scripts/political-intelligence/i18n/artifact-i18n.ts | Extracted artifact title i18n + generic description fallback helpers. |
| scripts/political-intelligence/daily-streams.ts | Extracted daily stream/day discovery + artifact collection. |
| scripts/political-intelligence/catalog.ts | Extracted catalog discovery + GitHub URL builder. |
| scripts/generate-rss.ts | Converted to thin CLI shim + re-export shim for RSS generator. |
| analysis/daily/2026-04-27/propositions/article.md | Re-generated artifact showing Reader Guide placement shift and timestamp update. |
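
The scripts/rss/pub-date.ts row above describes a fallback chain (filename → mtime → sentinel). A minimal sketch of such a chain follows. Everything specific in it is an assumption: the helper name `stablePubDateSketch`, the YYYY-MM-DD filename prefix, and the epoch sentinel are illustrative choices, since the PR page does not show the real implementation or the sentinel value.

```typescript
// Assumed sentinel (Unix epoch); the PR page does not state the real value.
const SENTINEL = new Date(0);

// Hypothetical resolver: file name date, then mtime, then a fixed sentinel.
function stablePubDateSketch(fileName: string, mtime?: Date): string {
  // 1. Prefer a date embedded in the file name (assumed YYYY-MM-DD prefix).
  const match = /^\d{4}-\d{2}-\d{2}/.exec(fileName);
  if (match) {
    const parsed = new Date(`${match[0]}T00:00:00Z`);
    if (!Number.isNaN(parsed.getTime())) return parsed.toUTCString();
  }
  // 2. Fall back to the modification time when the caller supplies one.
  if (mtime) return mtime.toUTCString();
  // 3. Last resort: the fixed sentinel, never the current time.
  return SENTINEL.toUTCString();
}

console.log(stablePubDateSketch('2026-04-27-propositions-en.html'));
console.log(stablePubDateSketch('untitled.html'));
```

Because no branch reads the clock, repeated runs over the same inputs give the same string, which is the property the byte-identical output goal depends on.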
Comments suppressed due to low confidence (7)

scripts/render-lib/markdown/mermaid-preprocess.ts:1

  • The markdown pipeline leaf module depends on generate-sitemap-html.js just to get escapeHtml. This creates an undesirable cross-generator dependency (and can easily introduce circular imports or inflate import cost) inside what’s intended to be a small, reusable render-lib building block. Move the escaper into a shared, dependency-light utility (e.g., under scripts/render-lib/ or scripts/shared/) or import from the scripts/sitemap-html/escape.js leaf/barrel directly (not the CLI shim), so render-lib doesn’t depend on another generator’s entrypoint.

scripts/rss/render/feed.ts:1

  • This makes generateRss() non-deterministic when there are zero articles, because it falls back to new Date().toUTCString(). The PR description stresses byte-identical output and reproducibility; this is a concrete case where output will vary run-to-run. Use a fixed sentinel (similar to stablePubDate’s sentinel) or derive a stable value from the repo (e.g., latest commit time) for the empty-feed case.

scripts/sitemap-xml/render/sitemap.ts:1

  • The module-level description claims the function is pure and only composes scanner output, but generateSitemap() in this file directly performs filesystem existence checks (e.g., RSS/dashboard variants) and calls getFileModTime() on concrete paths. Please either update the docstring to reflect the current behavior, or move these filesystem checks into scanners/orchestrator inputs so the render module matches the stated ‘pure composition’ contract.

scripts/sitemap-xml/validator.ts:1

  • The validator is used as a leaf-module unit under tests, but it logs to stdout on every call. This makes the helper impure and adds noise to test output/CI logs. Consider removing console.log from validateSitemap() (and letting the CLI shim handle logging), or accept an optional logger callback so library consumers can stay quiet by default.

scripts/sitemap-xml/validator.ts:1

  • The validator is used as a leaf-module unit under tests, but it logs to stdout on every call. This makes the helper impure and adds noise to test output/CI logs. Consider removing console.log from validateSitemap() (and letting the CLI shim handle logging), or accept an optional logger callback so library consumers can stay quiet by default.

scripts/sitemap-xml/validator.ts:1

  • The validator is used as a leaf-module unit under tests, but it logs to stdout on every call. This makes the helper impure and adds noise to test output/CI logs. Consider removing console.log from validateSitemap() (and letting the CLI shim handle logging), or accept an optional logger callback so library consumers can stay quiet by default.

scripts/sitemap-html/escape.ts:1

  • Preserving arbitrary named/numeric entities means inputs like &#x3C;script&#x3E;... will not have their & escaped and can decode into real tags when inserted into HTML, defeating the XSS protection goal stated in the docstring. If this function is used for any untrusted content, it should not preserve general entities; instead, escape & unconditionally or only allowlist a small set of safe entities (e.g., &amp;, &lt;, &gt;, &quot;, &#039;). Align the tests with the safer contract if needed.
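
The last comment recommends escaping & unconditionally. A minimal sketch of that stricter contract is shown here; `escapeHtmlStrict` is a hypothetical name and this is not the escaper shipped in scripts/sitemap-html/escape.ts.

```typescript
// Hypothetical strict escaper: replaces all five HTML-significant characters.
// "&" is handled first so the entities produced by later steps are not re-escaped.
// Pre-encoded entities in the input are therefore shown literally, not decoded.
function escapeHtmlStrict(input: string): string {
  return input
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

console.log(escapeHtmlStrict('&#x3C;script&#x3E;')); // &amp;#x3C;script&amp;#x3E;
```

The trade-off is that text which was already entity-encoded gets double-encoded, so callers would need to pass raw text rather than pre-escaped markup.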

import { fileURLToPath } from 'url';

import type { Language } from '../../types/language.js';
import { LANGUAGE_META, escapeHtml } from '../../generate-sitemap-html.js';
Copilot AI Apr 27, 2026

Political-intelligence leaf modules import LANGUAGE_META / escapeHtml from generate-sitemap-html.js (a generator entrypoint). This undermines the bounded-context split by coupling PI rendering to another generator’s CLI shim API surface. Prefer importing from the sitemap-html bounded-context barrel (scripts/sitemap-html/index.js) or extracting LANGUAGE_META + escaping into a small shared module used by both contexts.

Suggested change
- import { LANGUAGE_META, escapeHtml } from '../../generate-sitemap-html.js';
+ import { LANGUAGE_META, escapeHtml } from '../../sitemap-html/index.js';

@pethers
Member

pethers commented Apr 27, 2026

@copilot apply changes based on the comments in this thread and fix issues in TypeScript & JavaScript Testing / Unit Tests (Vitest) (pull_request)
TypeScript & JavaScript Testing / Unit Tests (Vitest) (pull_request): Failing after 45s

…sitemap-html CLI shim

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/abefd963-1d55-4fc9-87a3-af54322599d6

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 27, 2026

@copilot apply changes based on the comments in this thread and fix issues in [TypeScript & JavaScript Testing / Uni...

Both addressed in f2835a0:

  1. Reviewer thread: political-intelligence/render/{page,daily-day,grid}.ts now import LANGUAGE_META / escapeHtml from the scripts/sitemap-html/index.js bounded-context barrel instead of the generate-sitemap-html.js CLI shim, removing the cross-context coupling.
  2. Vitest CI failure: the root cause was scripts/sitemap-html/i18n.ts(18,31): TS2307: Cannot find module '../../types/language.js'. That file sits at depth 2 (unlike the other leaves at depth 3), so the relative path needed one fewer ../ segment. Fixed to ../types/language.js. Both tsc --project tsconfig.scripts.json --noEmit and tsconfig.browser.json --noEmit are now clean, and the affected tests (250) plus byte-identical PI output were verified locally.
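
The path fix in item 2 comes down to how a relative specifier resolves against the importing file's directory. A minimal sketch, assuming a placeholder repository root of /repo (not a path from the PR) and a hypothetical helper name `resolveImport`; it uses the standard URL constructor to mimic relative resolution:

```typescript
// Resolve a relative import specifier against the importing file.
// "/repo" is a placeholder root used only for illustration.
function resolveImport(importerPath: string, specifier: string): string {
  return new URL(specifier, `file://${importerPath}`).pathname;
}

const depth2Leaf = '/repo/scripts/sitemap-html/i18n.ts';
const depth3Leaf = '/repo/scripts/political-intelligence/render/page.ts';

// Two "../" segments from a depth-2 file climb out of scripts/ entirely.
console.log(resolveImport(depth2Leaf, '../../types/language.js')); // /repo/types/language.js
// One "../" segment lands in scripts/types/, matching the depth-3 leaves.
console.log(resolveImport(depth2Leaf, '../types/language.js'));    // /repo/scripts/types/language.js
console.log(resolveImport(depth3Leaf, '../../types/language.js')); // /repo/scripts/types/language.js
```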

@pethers pethers requested a review from Copilot April 27, 2026 19:33
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 74 out of 80 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (6)

scripts/render-lib/markdown/mermaid-preprocess.ts:1

  • This leaf module depends on ../../generate-sitemap-html.js (a CLI shim), reintroducing cross-context coupling and making the markdown pipeline sensitive to unrelated generator entrypoints. Move the escaping dependency into render-lib (e.g., a local minimal escaper inside this module, or a shared render-lib utility) so the markdown bounded context can load in isolation without importing another generator’s CLI.

scripts/render-lib/markdown/mermaid-preprocess.ts:1

  • This leaf module depends on ../../generate-sitemap-html.js (a CLI shim), reintroducing cross-context coupling and making the markdown pipeline sensitive to unrelated generator entrypoints. Move the escaping dependency into render-lib (e.g., a local minimal escaper inside this module, or a shared render-lib utility) so the markdown bounded context can load in isolation without importing another generator’s CLI.

scripts/rss/render/feed.ts:1

  • This breaks the stated determinism contract for RSS output when there are zero articles: lastBuildDate becomes the current time, making builds non-reproducible. Use a fixed sentinel (similar to stablePubDate’s sentinel) or derive a stable value from the repo (e.g., a git timestamp) so empty-feed renders are deterministic as well.

scripts/sitemap-xml/scanners/api.ts:1

  • Unlike the docs scanner, this scanner never sorts results before returning. That can make sitemap generation output order non-deterministic across filesystems/platforms. Sort by file (and consider sorting entries inside scanDir as well) before returning so the sitemap XML stays stable.

scripts/sitemap-xml/render/url-entry.ts:1

  • Values interpolated into XML (loc, alt.href, and potentially lastmod) are not XML-escaped or URL-encoded. If any scanned path contains characters like &, <, ", or non-URL-safe characters, the generated sitemap can become invalid XML or contain invalid URLs. Add escaping/encoding for attribute values and element text (at minimum &, <, >, ", ') and consider encodeURI/encodeURIComponent as appropriate for URL paths.

scripts/sitemap-xml/validator.ts:1

  • This validator is exported from the bounded-context barrel and currently always logs to stdout. That can add noise in unit tests and in library-style consumption. Consider injecting a logger (or a quiet option) and/or moving logging responsibilities to the CLI shim so validateSitemap remains a pure validation utility.
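
The url-entry.ts comment above asks for XML escaping of interpolated values. A minimal sketch of an escaped entry builder follows. The names `escapeXml`, `Alternate`, and `buildUrlEntrySketch`, and the exact element layout, are assumptions for illustration and do not reproduce the PR's url-entry.ts.

```typescript
// Escape the five characters that are significant in XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical shape for an hreflang alternate.
interface Alternate {
  hreflang: string;
  href: string;
}

// Hypothetical pure builder: every interpolated value passes through escapeXml.
function buildUrlEntrySketch(loc: string, lastmod: string, alternates: Alternate[]): string {
  const lines = [
    '  <url>',
    `    <loc>${escapeXml(loc)}</loc>`,
    `    <lastmod>${escapeXml(lastmod)}</lastmod>`,
    ...alternates.map(
      (a) =>
        `    <xhtml:link rel="alternate" hreflang="${escapeXml(a.hreflang)}" href="${escapeXml(a.href)}"/>`,
    ),
    '  </url>',
  ];
  return lines.join('\n');
}

console.log(
  buildUrlEntrySketch('https://example.com/news/a&b.html', '2026-04-27', [
    { hreflang: 'nb', href: 'https://example.com/news/a&b_no.html' },
  ]),
);
```

This sketch covers only XML escaping; percent-encoding of unsafe URL characters, which the comment also mentions, would be a separate step.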

Comment on lines +166 to +185
const chrome = buildChrome({
  lang,
  title: t.title,
  description: t.metaDescription,
  keywords: t.metaKeywords,
  canonicalPath: selfFile,
  hreflangAlternates,
  defaultAlternateBase: 'political-intelligence.html',
  ogType: 'website',
  section: t.title,
  publishedIso: `${latestDate}T00:00:00Z`,
  modifiedIso: buildIso,
  rssHref: lang === 'en' ? '/rss/news.xml' : `/rss/news_${lang}.xml`,
  breadcrumb: [
    { label: t.home, href: indexFile },
    { label: t.title },
  ],
  jsonLd,
  extraStyle: PI_EXTRA_STYLE,
});
Copilot AI Apr 27, 2026

The RSS link path here (/rss/news.xml and /rss/news_${lang}.xml) is inconsistent with the RSS generator in this PR, which writes rss.xml at the site root (/rss.xml). This will publish a broken RSS <link>/header reference on the political-intelligence page. Align rssHref with the actual generated feed path (e.g. /rss.xml, and only add language variants if they are truly produced elsewhere).

@pethers
Member

pethers commented Apr 27, 2026

@copilot apply changes based on the comments in this thread

…n /rss/news.xml link)

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/827708e4-f2cf-4ce5-b745-11af9f98b679

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 27, 2026

@copilot apply changes based on the comments in this thread

Fixed in 16023f5. Changed rssHref in scripts/political-intelligence/render/page.ts from /rss/news.xml (and /rss/news_${lang}.xml) to a single /rss.xml — matches the actual file written by generate-rss.ts. Verified all 14 regenerated PI pages now emit <link rel="alternate" type="application/rss+xml" href="/rss.xml"> and a footer link to the same path; PI tests (192) pass and tsc --project tsconfig.scripts.json --noEmit is clean.

@pethers pethers merged commit 33d214a into main Apr 27, 2026
16 checks passed
@pethers pethers deleted the copilot/improve-code-quality-article-generation branch April 27, 2026 19:54