Harden MCP coverage-state reporting, deferred retries, and voteringar lag diagnostics #2498
Conversation
🏷️ Automatic Labeling Summary: this PR has been automatically labeled based on the files changed and PR metadata. Applied labels: size-xs
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/db0617e3-4e82-40c9-b3ad-61ad861020d0
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Pull request overview
This PR adds explicit MCP coverage/provenance reporting to the parliamentary data pipeline so analysis artifacts can distinguish full text, metadata-only responses, not-yet-indexed documents, and empty searches.
Changes:
- Adds MCP coverage/provenance types, client diagnostics wrappers, and coverage inference helpers.
- Introduces a file-backed deferred MCP retry queue and integrates it into the download pipeline/manifest.
- Updates docs and tests for manifest diagnostics, retry queue behavior, smoke testing, and workflow coverage counts.
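The coverage states referenced throughout this PR can be sketched as a small union plus a provenance record. This is an illustrative reconstruction from the diff hunks below, not the exact contents of `scripts/types/mcp.ts`; the endpoint URL and the shape of `buildMcpProvenance` are assumptions.

```typescript
// Illustrative sketch only: these are the coverage states visible in this
// PR's hunks; the real union in scripts/types/mcp.ts may hold more detail.
type McpCoverageState =
  | 'full_text'     // complete body text retrieved
  | 'metadata_only' // response carried metadata but no usable body
  | 'not_indexed'   // document not yet indexed by the MCP backend
  | 'search_empty'  // query succeeded but matched zero rows
  | 'fetch_error';  // transient/operational failure, eligible for retry

interface McpProvenance {
  endpoint: string;
  tool: string;
  query: Record<string, unknown>;
  resultCount: number;
  coverageState: McpCoverageState;
}

// Minimal stand-in for the buildMcpProvenance helper seen in the hunks;
// the repository version may also stamp retrieval mode and timestamps.
function buildMcpProvenance(input: McpProvenance): McpProvenance {
  return { ...input };
}

const provenance = buildMcpProvenance({
  endpoint: 'https://example.invalid/mcp', // placeholder endpoint
  tool: 'get_dokument_innehall',
  query: { dok_id: 'H901X1', include_full_text: true },
  resultCount: 0,
  coverageState: 'not_indexed',
});
```

Keeping the state a closed union means manifest serialization and retry-queue decisions can exhaustively switch on it.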
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.
Summary per file:

| File | Description |
|---|---|
| `tests/network-diagnostics.test.ts` | Updates workflow count/list expectations to 14 news workflows. |
| `tests/mcp-retry-queue.test.ts` | Adds unit tests for retry queue dedupe and resolved document drain behavior. |
| `tests/mcp-client-smoke.test.ts` | Adds gated live MCP smoke test for structured provenance. |
| `tests/mcp-client-core-part2.test.ts` | Adds coverage/diagnostic tests for empty searches, voting lag, and document details. |
| `tests/data-downloader-enrichment.test.ts` | Updates mock client with MCP base URL for provenance generation. |
| `tests/auto-full-text-top-n.test.ts` | Updates mocks and adds manifest serialization assertions for coverage/queue sections. |
| `scripts/types/mcp.ts` | Defines coverage state, provenance, signal, and diagnostic interfaces. |
| `scripts/parliamentary-data/mcp-retry-queue.ts` | Adds file-backed retry queue creation, persistence, enqueue, and drain logic. |
| `scripts/parliamentary-data/full-text-threshold.ts` | Extracts shared full-text length threshold. |
| `scripts/parliamentary-data/data-downloader.ts` | Propagates MCP coverage/provenance through downloads and full-text enrichment. |
| `scripts/mcp-client/index.ts` | Exposes new diagnostic wrapper convenience functions. |
| `scripts/mcp-client/coverage.ts` | Adds coverage inference and provenance helper utilities. |
| `scripts/mcp-client/client.ts` | Adds diagnostic wrappers for document search, voting search, and document details. |
| `scripts/mcp-client.ts` | Re-exports the new MCP diagnostic APIs. |
| `scripts/download-parliamentary-data.ts` | Integrates retry draining/enqueueing and extends manifest serialization. |
| `scripts/data-transformers/types.ts` | Adds MCP coverage/provenance fields to raw document shape. |
| `data/mcp-retry-queue.json` | Adds initial empty retry queue file. |
| `analysis/templates/data-download-manifest.md` | Documents the new MCP coverage-state manifest contract. |
| `analysis/methodologies/ai-driven-analysis-guide.md` | Updates analysis workflow guidance for coverage/provenance and deferred retries. |
```typescript
const provenance = buildMcpProvenance({
  endpoint: client.baseURL,
  tool: 'get_dokument_innehall',
  query: { dok_id: dokId, include_full_text: true },
  resultCount: 0,
  coverageState: 'not_indexed',
});
outcome = {
  dokId,
  success: false,
  chars: 0,
  reason: `fetchDocumentDetails failed: ${err instanceof Error ? err.message : String(err)}`,
  coverageState: 'not_indexed',
  provenance,
```
```typescript
dokId,
true,
{
  requestedDate: typeof docRecord['datum'] === 'string' ? docRecord['datum'] as string : null,
```
```typescript
try {
  const response = await this.fetchDocumentDetails(dok_id, include_full_text);
  const coverageState = inferDocumentCoverageState(response, {
    requestedDate: options.requestedDate ?? extractDocumentDate(response),
```
```typescript
if (entry.resourceType === 'document_fulltext') {
  const result = await client.fetchDocumentDetailsWithCoverage(
    entry.resourceId,
    true,
    {
      requestedDate: (entry.params['requestedDate'] as string | undefined) ?? null,
      retrieval: 'retry_queue',
    },
  );

  diagnostics.push({
    tool: entry.tool,
    query: { ...entry.params, dok_id: entry.resourceId, include_full_text: true },
    resultCount: result.resultCount,
    coverageState: result.coverageState,
    provenance: result.provenance,
    notes: entry.reason,
  });

  if (result.coverageState === 'full_text') {
    resolved++;
    resolvedDocuments[entry.resourceId] = result.document;
    continue;
  }

  remaining.push({
    ...entry,
    attemptCount: entry.attemptCount + 1,
    coverageState: result.coverageState,
    reason: entry.reason ?? `Deferred ${entry.tool} retry still ${result.coverageState}`,
    lastAttemptAt,
  });
  retained++;
  continue;
}

const votingParams = entry.params;
if (typeof votingParams !== 'object' || votingParams === null) {
  remaining.push({
    ...entry,
    attemptCount: entry.attemptCount + 1,
    reason: 'retry queue entry has invalid voting params payload',
    lastAttemptAt,
  });
  retained++;
  continue;
}

const votingResult = await client.fetchVotingRecordsWithDiagnostics(
  votingParams as FetchVotingFilters,
);

diagnostics.push({
  tool: entry.tool,
  query: { ...(entry.params as Record<string, unknown>) },
  resultCount: votingResult.resultCount,
  coverageState: votingResult.coverageState,
  provenance: votingResult.provenance,
  notes: entry.reason,
  ...(votingResult.signal ? { signal: votingResult.signal } : {}),
});

if (votingResult.resultCount > 0) {
  resolved++;
  continue;
}

remaining.push({
  ...entry,
  attemptCount: entry.attemptCount + 1,
  coverageState: votingResult.coverageState,
  reason: votingResult.signal?.message ?? entry.reason,
  lastAttemptAt,
});
retained++;
```
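The drain logic quoted above boils down to one decision per entry. The condensed model below is illustrative (the names `decide` and `DrainAction` are not from the repository): documents count as recovered only when they reach `full_text`, while voteringar count as recovered as soon as any rows come back; everything else is re-queued with an incremented attempt count.

```typescript
// Condensed, hypothetical model of the drain decision in the hunks above.
type DrainAction =
  | { action: 'resolve' }
  | { action: 'requeue'; attemptCount: number };

function decide(
  coverageState: string,
  resultCount: number,
  entry: { resourceType: string; attemptCount: number },
): DrainAction {
  const recovered = entry.resourceType === 'document_fulltext'
    ? coverageState === 'full_text' // documents need actual body text
    : resultCount > 0;              // voteringar just need rows back
  return recovered
    ? { action: 'resolve' }
    : { action: 'requeue', attemptCount: entry.attemptCount + 1 };
}
```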
```typescript
const coverageState = inferDocumentCoverageState(
  { ...docRecord, ...details },
  {
    requestedDate: typeof docRecord['datum'] === 'string' ? docRecord['datum'] as string : null,
```
```typescript
const notes = outcome?.reason
  ?? outcome?.filePath
  ?? (doc.contentFetched
    ? (typeof doc.summary === 'string' && doc.summary.trim().length > 0 ? 'summary present' : 'metadata-only payload')
    : 'list payload only; get_dokument_innehall not attempted in this run');
return {
  dokId,
  coverageState: provenance.coverageState,
  retrieval: provenance.retrieval,
  tool: provenance.tool,
  resultCount: provenance.resultCount,
  notes,
```
```typescript
toolDiagnostics.push({
  tool: task.source,
  query,
  resultCount: 0,
  coverageState: 'search_empty',
  provenance: buildMcpProvenance({
    endpoint: client.baseURL,
    tool: task.source,
    query,
    resultCount: 0,
    coverageState: 'search_empty',
```
```typescript
const resolvedIds = new Set(Object.keys(retryDrain.resolvedDocuments));
for (const doc of allDocs) {
  const dokId = extractDokId(doc, '');
  if (!dokId || !resolvedIds.has(dokId)) continue;
  Object.assign(doc, retryDrain.resolvedDocuments[dokId]);
}
console.log(` 🔁 Deferred queue restored full text for ${resolvedIds.size} document(s)`);
```
|
@copilot apply changes based on the comments in this thread
```typescript
  resultCount,
  coverageState,
});
Object.assign(record, attachCoverageMetadata(record, provenance));
```
```typescript
if (fullTextOutcomes) {
  const docMap = new Map(allDocs.map(doc => [extractDokId(doc, ''), doc]));
  for (const outcome of fullTextOutcomes) {
    const doc = docMap.get(outcome.dokId);
    if (!doc) continue;
    if (outcome.coverageState !== 'full_text' && typeof doc.datum === 'string' && doc.datum.slice(0, 10) === date) {
      queueEntries.push(createRetryQueueEntry({
        resourceType: 'document_fulltext',
        resourceId: outcome.dokId,
        tool: 'get_dokument_innehall',
        coverageState: outcome.coverageState,
        docType,
        params: { requestedDate: doc.datum.slice(0, 10), include_full_text: true },
        reason: outcome.reason,
        requestedAt: new Date().toISOString(),
      }));
    }
  }
```
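The enqueue path above pairs with the dedupe behavior that `tests/mcp-retry-queue.test.ts` exercises. The sketch below is a hypothetical reconstruction (the interface fields follow the hunks, but `dedupeEntries` and the `entry` helper are illustrative names): enqueuing the same resource twice should leave one entry, with the later enqueue winning.

```typescript
// Hypothetical sketch of the file-backed queue entry and its dedupe key.
interface McpRetryQueueEntry {
  resourceType: 'document_fulltext' | 'voteringar';
  resourceId: string;
  tool: string;
  coverageState: string;
  params: Record<string, unknown>;
  requestedAt: string;
  attemptCount: number;
  reason?: string;
}

// Later enqueues for the same resource replace earlier ones instead of stacking.
function dedupeEntries(entries: McpRetryQueueEntry[]): McpRetryQueueEntry[] {
  const byKey = new Map<string, McpRetryQueueEntry>();
  for (const entry of entries) {
    byKey.set(`${entry.resourceType}:${entry.resourceId}`, entry);
  }
  return [...byKey.values()];
}

// Small factory for demo entries; the dok_id is a placeholder.
const entry = (id: string): McpRetryQueueEntry => ({
  resourceType: 'document_fulltext',
  resourceId: id,
  tool: 'get_dokument_innehall',
  coverageState: 'not_indexed',
  params: { include_full_text: true },
  requestedAt: new Date().toISOString(),
  attemptCount: 0,
});

const deduped = dedupeEntries([entry('H901X1'), entry('H901X1'), entry('H901X2')]);
```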
```typescript
const record = doc as Record<string, unknown>;
if (!record['mcpCoverageState']) {
  const coverageState = inferDocumentCoverageState(record, {
    requestedDate: typeof doc.datum === 'string' ? doc.datum : null,
```
```typescript
let content = selectContent(docRecord);
const runDate = new Date().toISOString().slice(0, 10);

if (content.length <= FULL_TEXT_MIN_LENGTH) {
  details = (await client.fetchDocumentDetails(dokId, true)) as Record<string, unknown>;
  const detailsWithCoverage = await client.fetchDocumentDetailsWithCoverage(
    dokId,
    true,
    {
      requestedDate: runDate,
```
```typescript
} catch (drainErr) {
  console.warn(
    `[mcp-retry-queue] Document retry failed for ${entry.resourceId}:`,
    drainErr instanceof Error ? drainErr.message : String(drainErr),
  );
  remaining.push({
    ...entry,
    attemptCount: entry.attemptCount + 1,
    reason: `Retry failed: ${drainErr instanceof Error ? drainErr.message : String(drainErr)}`,
    lastAttemptAt,
  });
  retained++;
}
```
```typescript
const notes = diag.signal?.code
  ? `${diag.signal.code}: ${diag.signal.message}`
  : (diag.notes ?? '');
lines.push(
  `| ${diag.tool} | \`${JSON.stringify(diag.query)}\` | ${diag.resultCount} | ${diag.coverageState} | ${notes} |`,
);
```
```typescript
const NOT_INDEXED_ERROR_PATTERNS = [
  'not found',
  '404',
  'not indexed',
  'no document',
  'ingen',
] as const;
```
```typescript
// Distinguish true indexing-gap errors from transient/operational failures
const isNotIndexed = ['not found', '404', 'not indexed', 'no document', 'ingen'].some(
  (p) => msgLower.includes(p),
);
const state = isNotIndexed ? 'not_indexed' : 'fetch_error' as const;
```
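The round-2 plan tightens this matching so that transport-level failures never masquerade as indexing gaps. A self-contained sketch of that precedence (the function name `classifyFetchFailure` and the exact transport sentinels are assumptions; the plan names `isDocumentNotIndexedError` as the real helper):

```typescript
// Patterns suggesting the document genuinely is not indexed upstream.
const NOT_INDEXED_ERROR_PATTERNS = ['not found', '404', 'not indexed', 'no document', 'ingen'] as const;
// Transport sentinels that always indicate a transient/operational failure.
const TRANSPORT_ERROR_PATTERNS = ['mcp server error', 'econnrefused', '502', '503'] as const;

type FailureState = 'not_indexed' | 'fetch_error';

function classifyFetchFailure(message: string): FailureState {
  const msgLower = message.toLowerCase();
  // Transport sentinels win even if the message also contains "not found",
  // so flaky infrastructure never gets recorded as an indexing gap.
  if (TRANSPORT_ERROR_PATTERNS.some((p) => msgLower.includes(p))) {
    return 'fetch_error';
  }
  return NOT_INDEXED_ERROR_PATTERNS.some((p) => msgLower.includes(p))
    ? 'not_indexed'
    : 'fetch_error';
}
```

Defaulting unknown messages to `fetch_error` keeps them eligible for the deferred retry queue instead of being written off as permanently missing.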
```typescript
const updatedQueue: MCPRetryQueueFile = {
  schema: MCP_RETRY_QUEUE_SCHEMA,
  updatedAt: now.toISOString(),
  entries: remaining,
};
saveMcpRetryQueue(updatedQueue, queuePath);
```
|
@copilot apply changes based on the comments in this thread and resolve conflicts and improve
…etry-queue isolation, escaping)

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/70723dd0-25d0-4003-80e3-19b0590ab06f
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Merged
Test runs after the merge: targeted (
|
@copilot resolve the merge conflicts in this pull request
…indexing-lag

# Conflicts:
#	scripts/download-parliamentary-data.ts
#	tests/auto-full-text-top-n.test.ts

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Resolved in
```typescript
function escapeMarkdownCell(str: string): string {
  return str
    .replace(/\r\n?/g, ' ')
    .replace(/\n/g, ' ')
    .replace(/\t/g, ' ')
    .replace(/\|/g, '\\|');
}
```
```typescript
  const detailsWithCoverage = await client.fetchDocumentDetailsWithCoverage(
    dokId,
    true,
    {
      requestedDate: runDate,
    },
  );
  details = detailsWithCoverage.document;
  content = selectContent(details);
}

const coverageState = inferDocumentCoverageState(
  { ...docRecord, ...(details ?? {}) },
```
```typescript
const coverageState = inferDocumentCoverageState(
  { ...docRecord, ...details },
  {
    requestedDate: analysisRunDate,
    fullTextRequested: true,
  },
);
docRecord['mcpCoverageState'] = coverageState;
```
```typescript
let mergedVoteCount = 0;
for (const [queryKey, items] of Object.entries(retryDrain.resolvedVoteringar)) {
  if (!Array.isArray(items) || items.length === 0) continue;
  data.votes.push(...(items as RawDocument[]));
  mergedVoteCount += items.length;
  console.log(` 🗳️ Recovered ${items.length} voteringar from deferred queue (${queryKey})`);
}
if (mergedVoteCount > 0) {
  console.log(` 🔁 Deferred queue restored ${mergedVoteCount} voteringar row(s) — appended to current-run output`);
```
```typescript
// the current download selection.
if (Array.isArray(votingResult.items)) {
  resolvedVoteringar[entry.resourceId] = votingResult.items as unknown[];
```
|
@copilot apply changes based on the comments in this thread |
Plan — round 2 review feedback

- Merge `origin/main` and resolve conflicts in `download-parliamentary-data.ts` + `auto-full-text-top-n.test.ts`
- Keep `mcpCoverageState`/`mcpProvenance`/`mcpSignals` out of the `analysis/data/` raw files via `stripInMemoryCoverageMetadata` in `data-persistence.ts`
- `analysisRunDate` plumb-through so the default enrichment path's coverage state is tied to the pipeline date (and any `not_indexed` enriched docs are still surfaced to the retry queue)
- `buildDocumentCoverageSummary` now takes `analysisRunDate` from the pipeline (`date`) rather than the host wall clock
- `fetchFullTextForTopN` accepts `{ runDate }` and threads it into `inferDocumentCoverageState`
- Record a `fetch_error` diagnostic with the exact query + error so failures appear in `## MCP Query Diagnostics`
- `escapeMarkdownCell` escapes pipes / collapses newlines / tabs across the diagnostics, coverage and full-text tables
- Tightened `404`/`not found` matching with `isDocumentNotIndexedError` — transport sentinels (`mcp server error`, `econnrefused`, `5xx`, …) force `fetch_error`; document-level not-found requires a dok_id echo
- `drainMcpRetryQueue` only writes the queue file when there was real work (entries pre-existing, processed, expired, or remaining)
- Voting retries record `resolvedVoteringar` and the downloader appends recovered rows to `data.votes` for the current run
- Updated tests (`tests/mcp-retry-queue.test.ts`, `tests/auto-full-text-top-n.test.ts`, `tests/data-persistence.test.ts`) — 87 tests passing
- Merged `origin/main`, kept our coverage/provenance features in both conflict zones
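The write-gating point for `drainMcpRetryQueue` can be illustrated with a small predicate. This is a sketch under assumed names (`DrainStats`, `shouldPersistQueue` are not the repository's API): a drain that found an empty queue and added nothing should leave `data/mcp-retry-queue.json` untouched.

```typescript
// Hypothetical stats a drain run might report.
interface DrainStats {
  preexisting: number; // entries already in the queue file before draining
  processed: number;   // entries attempted this run
  expired: number;     // entries dropped for exceeding retry limits
  remaining: number;   // entries still queued after the drain
}

function shouldPersistQueue(stats: DrainStats): boolean {
  // Only rewrite the committed queue file when there was real work;
  // a no-op drain should not churn the repository artifact.
  return stats.preexisting > 0
    || stats.processed > 0
    || stats.expired > 0
    || stats.remaining > 0;
}
```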