Add MCP data persistence layer with collision-free sidecar design #1440
🏷️ Automatic Labeling Summary
This PR has been automatically labeled based on the files changed and PR metadata.
Applied Labels: size-xs
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
- Create analysis/data/ directory structure for persistent MCP data storage (documents/propositions, motions, committeeReports, votes, speeches, questions, interpellations, government; votes by date; events by date; mps)
- Add data-persistence.ts module with persistDownloadedData(), persistEvents(), persistMPs() for consistent ID-based filenames and provenance metadata
- Add pdf-converter.ts utility for PDF-to-text/markdown conversion using poppler-utils pdftotext
- Integrate data persistence into pre-article-analysis.ts pipeline (Step 1a)
- Update all 12 agentic workflow safe-outputs create-pull-request config with labels, draft:false, and expires:14 for analysis data PRs
- Update analysis/README.md with new data directory documentation
- Add comprehensive tests (16 new tests, all passing)
- All 108 existing pre-article-analysis tests pass
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/71e2d9ad-ee6f-4cb9-815f-7a455f4d2848
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…uplicate regex
- Extract MIN_HEADING_LENGTH/MAX_HEADING_LENGTH constants in pdf-converter.ts
- Extract ISO_DATE_RE constant in data-persistence.ts to eliminate regex duplication
- Fix test type safety by using delete instead of undefined cast
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/71e2d9ad-ee6f-4cb9-815f-7a455f4d2848
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…add MCP response storage
- Rewrite data-persistence.ts to use collision-free design:
  - Data files ({id}.json) contain ONLY raw source data (no _metadata injection)
  - Metadata lives in separate sidecar files ({id}.meta.json), which are safe to overwrite
  - Parallel workflows writing the same document produce byte-identical output
- Add persistMCPResponse() for storing ANY MCP tool call/response
- Add persistWorldBankData() for World Bank economic indicators
- Add persistSCBData() for Statistics Sweden table data
- Add analysis/data/worldbank/, scb/, mcp-responses/ directories
- Export resolveDocId() for testing
- Update tests: 25 tests covering collision-free design, MCP storage, resolveDocId
- Update analysis/data/README.md with v2 collision-free design documentation
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/bb68f8b6-e5d2-4da1-83e5-483a9f58073e
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
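The collision-free idea in this commit can be illustrated with a minimal sketch. It is not the code in data-persistence.ts: `persistWithSidecar`, the `SidecarMeta` shape, and the `doc-1` ID are hypothetical, and it assumes both writers receive the same object with the same key order. The data file depends only on the source data, so two runs produce the same bytes, while only the sidecar carries run-specific values such as the fetch timestamp.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical sidecar shape; the real module's fields may differ.
interface SidecarMeta {
  fetchedAt: string;
  mcpTool: string;
}

/**
 * Write raw source data to {id}.json and provenance to {id}.meta.json.
 * Nothing run-specific is injected into the data file.
 */
export function persistWithSidecar(dir: string, id: string, data: unknown, meta: SidecarMeta): void {
  mkdirSync(dir, { recursive: true });
  writeFileSync(join(dir, `${id}.json`), JSON.stringify(data, null, 2) + '\n', 'utf8');
  writeFileSync(join(dir, `${id}.meta.json`), JSON.stringify(meta, null, 2) + '\n', 'utf8');
}

// Two simulated workflow runs persist the same document at different times.
const root = mkdtempSync(join(tmpdir(), 'sidecar-demo-'));
const runA = join(root, 'run-a');
const runB = join(root, 'run-b');
const doc = { id: 'doc-1', title: 'Example document' };

persistWithSidecar(runA, 'doc-1', doc, { fetchedAt: '2026-01-01T00:00:00Z', mcpTool: 'search_regering' });
persistWithSidecar(runB, 'doc-1', doc, { fetchedAt: '2026-01-02T00:00:00Z', mcpTool: 'search_regering' });

// The data files match byte for byte; only the sidecars differ.
export const dataIdentical = readFileSync(join(runA, 'doc-1.json')).equals(readFileSync(join(runB, 'doc-1.json')));
export const metaIdentical = readFileSync(join(runA, 'doc-1.meta.json')).equals(readFileSync(join(runB, 'doc-1.meta.json')));
console.log({ dataIdentical, metaIdentical });
```

Because the data file never changes between runs, git sees a conflict-free no-op there, and the sidecar can simply be overwritten by whichever run finishes last.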
…ts, test isolation
- Remove duplicate directory structure/filename convention sections from README
- Clarify ISO_DATE_RE comment (format check only, not date validity)
- Add debug logging to PDF converter temp file cleanup
- Use unique test IDs with timestamps for MCP storage test isolation
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/bb68f8b6-e5d2-4da1-83e5-483a9f58073e
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…ove unused 'fallback' type, suppress cleanup noise
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/a5090ace-74b1-4e4a-9482-d54d0f285662
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/a5090ace-74b1-4e4a-9482-d54d0f285662
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
All 5 review comments addressed in
Added 2 new tests for riksmote derivation. All 4599 tests pass, CodeQL: 0 alerts.
persistMCPResponse(
  { tool: 'search_regering', params: { limit }, server: 'riksdag-regering' },
  doc,
  resolveDocId(doc, i),
);
The persisted MCP provenance for government documents doesn't reflect the actual fetch parameters. fetchGovernmentDocuments is called with dateFrom, dateTo, and limit, but the persistMCPResponse metadata only records { limit }, which makes the sidecar params misleading for audit/reproducibility. Include the same parameter set used for the fetch (at least dateFrom, dateTo, limit).
scripts/populate-analysis-data.ts
Outdated
const id = (typeof record['parti'] === 'string' && record['parti'])
  || `group-${i + 1}`;
persistMCPResponse(
  { tool: 'get_voting_group', params: { rm, groupBy: 'parti' }, server: 'riksdag-regering' },
The persisted MCP provenance for voting groups omits limit, even though fetchVotingGroup is called with { rm, groupBy: 'parti', limit }. This makes the .meta.json params inconsistent with the actual request that produced the data. Record the full param set (including limit) in the persistMCPResponse call.
Suggested change:
-   { tool: 'get_voting_group', params: { rm, groupBy: 'parti' }, server: 'riksdag-regering' },
+   { tool: 'get_voting_group', params: { rm, groupBy: 'parti', limit }, server: 'riksdag-regering' },
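One way to keep the sidecar params from drifting away from the real request is to build the parameter object once and pass that same object to both the fetch and the persistence call. The sketch below assumes simplified, hypothetical stand-ins for `fetchVotingGroup` and `persistMCPResponse` (canned data, in-memory recording); the real signatures in scripts/populate-analysis-data.ts may differ.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface MCPCall {
  tool: string;
  params: Record<string, unknown>;
  server: string;
}

type VotingGroupParams = { rm: string; groupBy: string; limit: number };

// In-memory record of what would be written to disk.
export const persisted: Array<{ call: MCPCall; id: string; data: unknown }> = [];

// Stand-in for the real fetch: returns canned rows, honouring `limit`.
async function fetchVotingGroup(params: VotingGroupParams): Promise<Array<Record<string, unknown>>> {
  const canned: Array<Record<string, unknown>> = [{ parti: 'S' }, { parti: 'M' }, { name: 'unlabelled' }];
  return canned.slice(0, params.limit);
}

// Stand-in for the real persistence call.
function persistMCPResponse(call: MCPCall, data: unknown, id: string): void {
  persisted.push({ call, id, data });
}

export async function fetchAndPersistVotingGroups(rm: string, limit: number): Promise<void> {
  // Single source of truth: the same params feed the request and the sidecar.
  const params: VotingGroupParams = { rm, groupBy: 'parti', limit };
  const groups = await fetchVotingGroup(params);
  groups.forEach((record, i) => {
    const parti = record['parti'];
    const id = typeof parti === 'string' && parti ? parti : `group-${i + 1}`;
    persistMCPResponse({ tool: 'get_voting_group', params: { ...params }, server: 'riksdag-regering' }, record, id);
  });
}
```

The same pattern would apply to the government-document fetch, where `dateFrom`, `dateTo`, and `limit` would all live in one shared object.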
@copilot apply changes based on the comments in this thread
…t for voting groups
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/e86d7c55-f815-4e02-8c16-58ce9d052207
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Both review comments addressed in
All 4599 tests pass.
const candidate = args[i + 1];
if (!/^\d{4}-\d{2}-\d{2}$/.test(candidate) || isNaN(new Date(candidate + 'T00:00:00Z').getTime())) {
  console.error(`❌ Invalid date: "${candidate}". Expected YYYY-MM-DD format.`);
  process.exit(1);
}
date = candidate;
i++;
--date validation uses isNaN(new Date(candidate + 'T00:00:00Z').getTime()), but JS Dates accept overflow values (e.g. 2026-02-30 becomes a valid March date). This means invalid YYYY-MM-DD inputs can slip through and produce an incorrect riksmöte and date window. Consider reusing the repo’s strict parse-and-roundtrip validation pattern (construct UTC Date from Y/M/D and verify components) like parseAndValidateIsoDate in scripts/pre-article-analysis.ts.
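A parse-and-roundtrip check of the kind described here could look roughly like the following. This is a sketch, not the repo's `parseAndValidateIsoDate`, whose implementation is not shown in this thread; `isStrictIsoDate` is a hypothetical name. It builds a UTC date from the numeric parts and accepts the input only if the year, month, and day come back unchanged, which rejects overflow values such as 2026-02-30.

```typescript
const ISO_DATE_RE = /^(\d{4})-(\d{2})-(\d{2})$/;

/**
 * Return true only for real calendar dates written as YYYY-MM-DD.
 * Date.UTC silently rolls overflow (e.g. Feb 30) into the next month,
 * so the components are compared after the round trip.
 */
export function isStrictIsoDate(candidate: string): boolean {
  const match = ISO_DATE_RE.exec(candidate);
  if (!match) return false;

  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);

  const parsed = new Date(Date.UTC(year, month - 1, day));
  return (
    parsed.getUTCFullYear() === year &&
    parsed.getUTCMonth() === month - 1 &&
    parsed.getUTCDate() === day
  );
}

console.log(isStrictIsoDate('2026-02-28')); // true
console.log(isStrictIsoDate('2026-02-30')); // false
```

In the `--date` handler, the combined regex and `isNaN` condition could then be replaced by `!isStrictIsoDate(candidate)`.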
 * Converted content is stored alongside the original JSON metadata with a
 * `.txt` or `.md` extension.
 *
Module header says converted content is stored alongside original JSON metadata as .txt/.md, but this module only returns converted text and never writes those output files (other than a temporary PDF for buffer conversion). Either update the documentation to reflect that callers are responsible for persistence, or implement the described .txt/.md output behavior.
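If the documentation route is taken (callers own persistence), a caller-side helper might look like the sketch below. `saveConvertedText` and the `ConversionResult` shape used here (success, text, tool, optional error) are assumptions for illustration, not the module's actual exports.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Assumed result shape; the real ConversionResult may carry more fields.
interface ConversionResult {
  success: boolean;
  text: string;
  tool: string;
  error?: string;
}

/**
 * Write converted text next to the document's JSON as {id}.txt or {id}.md.
 * Returns the written path, or null when the conversion failed.
 */
export function saveConvertedText(
  result: ConversionResult,
  dir: string,
  id: string,
  ext: 'txt' | 'md' = 'txt',
): string | null {
  if (!result.success) return null;
  mkdirSync(dir, { recursive: true });
  const outPath = join(dir, `${id}.${ext}`);
  writeFileSync(outPath, result.text, 'utf8');
  return outPath;
}

// Usage with a throwaway directory.
const demoDir = mkdtempSync(join(tmpdir(), 'pdf-text-'));
export const savedPath = saveConvertedText({ success: true, text: 'Hello', tool: 'pdftotext' }, demoDir, 'doc-1', 'md');
export const skipped = saveConvertedText({ success: false, text: '', tool: 'none', error: 'no tool' }, demoDir, 'doc-2');
export const savedText = savedPath === null ? null : readFileSync(savedPath, 'utf8');
```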
 * - **Sidecar metadata** (`{id}.meta.json`) tracks provenance (fetch
 *   timestamp, MCP tool name, riksmöte, document type) in a separate file
 *   that is safely overwritten on each run.
The module-level comment says sidecar metadata always includes riksmöte and document type for all persisted MCP data, but persistWorldBankData and persistSCBData currently write sidecars without riksmote/documentType (they only include fetchedAt/mcpTool plus indicator/country or tableId/query). Consider either (a) standardizing these sidecars to include the common fields (with riksmote: '' and documentType: 'worldbank'|'scb') or (b) updating the module docs/types to reflect that sidecar schemas differ by data source.
Suggested change:
-  * - **Sidecar metadata** (`{id}.meta.json`) tracks provenance (fetch
-  *   timestamp, MCP tool name, riksmöte, document type) in a separate file
-  *   that is safely overwritten on each run.
+  * - **Sidecar metadata** (`{id}.meta.json`) tracks provenance (at minimum
+  *   fetch timestamp and MCP tool name) in a separate file that is safely
+  *   overwritten on each run. Riksdag/Riksdag-regeringen documents also
+  *   include `riksmöte` and `documentType`, while external MCP tools
+  *   (World Bank, SCB, etc.) use tool-specific fields (e.g. indicator /
+  *   country, tableId / query) instead.
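Option (b), source-specific sidecar schemas, could also be expressed in the types rather than only in the comment. The following is a sketch under assumptions: the `source` discriminant and the interface names are hypothetical and do not appear in the PR; only the field names (`fetchedAt`, `mcpTool`, `riksmote`, `documentType`, `indicator`, `country`, `tableId`, `query`) come from the review comment.

```typescript
// Fields every sidecar carries.
interface BaseSidecar {
  fetchedAt: string;
  mcpTool: string;
}

// `source` is a hypothetical discriminant added for this sketch.
interface RiksdagSidecar extends BaseSidecar {
  source: 'riksdag';
  riksmote: string;
  documentType: string;
}

interface WorldBankSidecar extends BaseSidecar {
  source: 'worldbank';
  indicator: string;
  country: string;
}

interface ScbSidecar extends BaseSidecar {
  source: 'scb';
  tableId: string;
  query: unknown;
}

export type Sidecar = RiksdagSidecar | WorldBankSidecar | ScbSidecar;

/** Summarise a sidecar; the switch narrows to the source-specific fields. */
export function describeSidecar(meta: Sidecar): string {
  switch (meta.source) {
    case 'riksdag':
      return `${meta.documentType} (${meta.riksmote})`;
    case 'worldbank':
      return `${meta.indicator} for ${meta.country}`;
    case 'scb':
      return `SCB table ${meta.tableId}`;
  }
}
```

With a union like this, code that reads `riksmote` from a World Bank sidecar fails at compile time instead of silently producing an empty string.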
@copilot apply changes based on the comments in this thread
…ence sidecar docs
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/61d94536-7fdb-4d6f-ba20-230ad1899462
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
All 3 review comments addressed in
All 4599 tests pass.
// No system tool available
return {
  success: false,
  text: '',
  tool: 'none',
  error: 'No PDF conversion tool available. Install poppler-utils: apt-get install poppler-utils',
};
The fallback error message hard-codes Debian/Ubuntu install instructions (apt-get install poppler-utils). Since this code can run on macOS/Windows as well (and CI runners may vary), consider making the message platform-agnostic (e.g. “install Poppler/pdftotext”) or include multiple common install commands (apt/brew/choco) or a link to Poppler docs.
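A platform-aware message along these lines might be built as follows. This is only a sketch: `popplerInstallHint` and `noToolMessage` are hypothetical helpers, and the package names for Homebrew and Chocolatey are commonly used ones that should be checked against each package manager before being shipped.

```typescript
/**
 * Suggest an install command for Poppler (which provides pdftotext)
 * based on the Node.js platform string.
 */
export function popplerInstallHint(platform: string = process.platform): string {
  switch (platform) {
    case 'darwin':
      return 'brew install poppler';
    case 'win32':
      return 'choco install poppler';
    case 'linux':
      return "apt-get install poppler-utils (Debian/Ubuntu) or your distribution's Poppler package";
    default:
      return 'install Poppler with your system package manager';
  }
}

/** Build the error text for the "no conversion tool" fallback result. */
export function noToolMessage(platform: string = process.platform): string {
  return `No PDF conversion tool available (pdftotext not found). Try: ${popplerInstallHint(platform)}`;
}

console.log(noToolMessage());
```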
- os.tmpdir() instead of /tmp
- #!/usr/bin/env tsx
- -1, -2 for duplicate IDs)
- government/ from documented tree (stored in mcp-responses)
- analysis/data/documents/government/.gitkeep (government docs use mcp-responses)
- riksmote param to persistMCPResponse + derive from call.params.rm
- crypto.randomUUID() fallback in persistMCPResponse for unique IDs
- 'fallback' from ConversionResult.tool union type
- crypto.randomUUID() for temp PDF filenames + suppress console.debug on cleanup
- dateFrom, dateTo, limit in sidecar params
- limit in sidecar params
- 2026-02-30
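The temp-file handling named in this checklist (`os.tmpdir()` plus `crypto.randomUUID()`) can be sketched as below. The helper names `tempPdfPath` and `withTempPdf` are hypothetical and not taken from pdf-converter.ts; cleanup uses `rmSync` with `force: true`, so a file that is already gone produces no error or log noise.

```typescript
import { randomUUID } from 'node:crypto';
import { rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

/** Build a unique temp path in the platform's temp directory (not a hard-coded /tmp). */
export function tempPdfPath(): string {
  return join(tmpdir(), `pdf-convert-${randomUUID()}.pdf`);
}

/**
 * Write a PDF buffer to a unique temp file, run `fn` on its path,
 * and always remove the file afterwards.
 */
export function withTempPdf<T>(buffer: Uint8Array, fn: (path: string) => T): T {
  const path = tempPdfPath();
  writeFileSync(path, buffer);
  try {
    return fn(path);
  } finally {
    // force: true ignores a missing file, so cleanup stays quiet.
    rmSync(path, { force: true });
  }
}
```

Random UUIDs keep parallel conversions from writing to the same temp filename, which a timestamp-based or fixed name could not guarantee.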