Add comprehensive analysis quality validation test suite #1573
Conversation
🏷️ Automatic Labeling Summary: this PR has been automatically labeled based on the files changed and PR metadata. Applied labels: size-xs.
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report. Budget Compliance: performance budgets enforced.
Creates `tests/analysis-quality-validation.test.ts` with 36 test cases across 7 categories validating analysis output quality:

- Template Structure (8 tests): metadata fields, sibling files, placeholders
- Evidence Density (6 tests): dok_id citations, document counts
- Mermaid Diagrams (5 tests): presence, color-coded styling
- Confidence Labels (5 tests): metadata values, inline labels, L×I scoring
- Banned Patterns (3 tests): boilerplate detection, unattributed claims
- Cross-Reference Integrity (3 tests): dok_id matching, document consistency
- Overall Quality (6 tests): meta-validation of test coverage

Tests detect v1 (paragraph) vs v2 (table) format and apply the appropriate validation rules. Strict-v2 checks require structured IDs (`SYN-YYYY-MM-DD-NNN`). Resolves issue requirements for an automated quality gate.

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/216511c2-9aa7-42b9-8a88-37b7df7a1536
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
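The Evidence Density category counts dok_id citations in each analysis file. A minimal sketch of such a counter, assuming a Riksdagen-style ID shape — the actual pattern in the test file may differ:

```typescript
// Hedged sketch: counts Riksdagen-style dok_id citations in a markdown body.
// The ID regex is an assumption (two uppercase letters, two digits, a committee
// code, a serial number); adjust it to the project's real dok_id format.
function countDokIdCitations(content: string): number {
  const DOK_ID = /\b[A-Z]{2}\d{2}[A-Za-z]+\d+\b/g;
  return content.match(DOK_ID)?.length ?? 0;
}
```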
…e, stricter matching

- Replace non-null assertions with optional chaining in extractDocumentsAnalyzedCount
- Add null check for readAnalysisFile in confidence label verification
- Limit "significant because" regex lookahead to 200 chars (prevents cross-paragraph false positives)
- Add minimum 4-char length check for substring matching in cross-reference integrity

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/216511c2-9aa7-42b9-8a88-37b7df7a1536
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Pull request overview
Adds a comprehensive Vitest-based quality gate that scans analysis/daily/YYYY-MM-DD/{articleType}/ directories on the real filesystem and validates analysis outputs against structural, evidence, diagram, confidence, banned-pattern, and cross-reference requirements, with format-aware rules for v1 vs strict-v2 analysis templates.
Changes:
- Introduces discovery + helper utilities to classify analysis directories (strict-v2 detection, document presence) and parse key quality signals (dok_id citations, metadata, Mermaid blocks, confidence indicators).
- Adds 36+ assertions across 7 categories to enforce template completeness, evidence density, Mermaid presence/styling, confidence labeling, banned patterns, cross-reference integrity, and meta “coverage” checks.
- Provides detailed failure messages that pinpoint the exact directory/file and unmet requirement.
```typescript
const content = readAnalysisFile(dir, 'synthesis-summary.md');
if (!content) continue;
const count = countConfidenceLabels(content);
expect(count, `${dir.date}/${dir.articleType}: inline confidence label count`).toBeGreaterThanOrEqual(1);
```
The test description/comments say inline confidence labels should have ≥2 occurrences for coverage, but the assertion currently only enforces ≥1. Please align the expectation with the stated requirement (either update the threshold to 2 or adjust the wording so the test matches the intended gate).
Suggested change:

```diff
- expect(count, `${dir.date}/${dir.articleType}: inline confidence label count`).toBeGreaterThanOrEqual(1);
+ expect(count, `${dir.date}/${dir.articleType}: inline confidence label count`).toBeGreaterThanOrEqual(2);
```
```typescript
it('should have a valid Overall Confidence value (HIGH, MEDIUM, or LOW) in synthesis files', () => {
  const failures: string[] = [];

  for (const dir of synthesisDirectories) {
    const content = readAnalysisFile(dir, 'synthesis-summary.md');
    if (!content) continue;
    // Check both v2 table and v1 paragraph formats
    const tableMatch = /\*\*(?:Overall\s+)?Confidence(?:\s+Level)?\*\*\s*\|\s*(.+)/i.exec(content);
    const paraMatch = /\*\*(?:Overall\s+)?Confidence\*\*:\s*(.+)/i.exec(content);
    const value = tableMatch?.[1]?.trim() ?? paraMatch?.[1]?.trim();
    if (value && !/\b(HIGH|MEDIUM|LOW)\b/i.test(value)) {
      failures.push(
        `${dir.date}/${dir.articleType}/synthesis-summary.md: invalid confidence value "${value}"`
      );
    }
  }
```
This test is named as if it validates that synthesis files have a valid Overall Confidence value, but the current logic only flags invalid values when a match exists. If the confidence metadata line is missing entirely (and the file still contains words like HIGH/MEDIUM/LOW elsewhere), this test will pass. Consider treating a missing match/value as a failure to ensure the metadata is actually present.
@copilot apply changes based on the comments in this thread
…etadata presence

1. Inline confidence labels: raise the threshold from ≥1 to ≥2 and scope it to strict-v2 files only (aligns the assertion with the stated requirement)
2. Overall Confidence metadata: treat missing metadata as a failure for strict-v2 files (not just invalid values), and add a colon-inside-bold regex variant (`**Confidence:**`) to handle all real-world formats

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/7042da95-9849-4539-9636-667ae7596fcf
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
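The colon-inside-bold variant mentioned in the commit can be sketched like this; the regexes are illustrative, not the exact ones committed:

```typescript
// Both markdown spellings appear in real files:
//   **Confidence**: High   (colon outside the bold)
//   **Confidence:** High   (colon inside the bold)
const COLON_OUTSIDE = /\*\*(?:Overall\s+)?Confidence\*\*:\s*(HIGH|MEDIUM|LOW)\b/i;
const COLON_INSIDE = /\*\*(?:Overall\s+)?Confidence:\*\*\s*(HIGH|MEDIUM|LOW)\b/i;

function matchConfidence(line: string): string | null {
  const m = COLON_OUTSIDE.exec(line) ?? COLON_INSIDE.exec(line);
  return m?.[1]?.toUpperCase() ?? null;
}
```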
Applied both changes from the review thread in ff72818.
```typescript
for (const date of dateDirs) {
  const datePath = path.join(ANALYSIS_BASE, date);
  const subdirs = fs.readdirSync(datePath, { withFileTypes: true })
    .filter(d => d.isDirectory());

  for (const sub of subdirs) {
    const fullPath = path.join(datePath, sub.name);
    const mdFiles = fs.readdirSync(fullPath)
      .filter(f => f.endsWith('.md'));

    if (mdFiles.length > 0) {
      const files = new Map<string, string>();
      for (const f of mdFiles) {
        files.set(f, path.join(fullPath, f));
      }

      // Detect strict v2 format from synthesis or the first available file
      const samplePath = files.get('synthesis-summary.md') ?? [...files.values()][0];
      const sampleContent = samplePath ? fs.readFileSync(samplePath, 'utf-8') : '';
      const isStrictV2 = isStrictV2Format(sampleContent);

      const docsDir = path.join(fullPath, 'documents');
      const hasDocuments = fs.existsSync(docsDir) &&
        fs.readdirSync(docsDir).some(f => f.endsWith('.json'));

      dirs.push({ fullPath, date, articleType: sub.name, files, isStrictV2, hasDocuments });
    }
  }
}
```
discoverAnalysisDirectories() only scans subdirectories under each YYYY-MM-DD folder (analysis/daily/YYYY-MM-DD/{articleType}/) and never considers Markdown files directly under the date directory itself. The repo currently has root-level daily analysis sets (e.g., analysis/daily/2026-04-01/synthesis-summary.md, risk-assessment.md, etc.), so those files are completely excluded from this validation suite. Update discovery to treat the date directory itself as an AnalysisDirectory when it contains .md files (and handle its documents/ folder), in addition to scanning its child subdirectories.
Suggested change (extract a reusable helper and scan the date directory itself):

```typescript
const addAnalysisDirectory = (fullPath: string, date: string, articleType: string): void => {
  const mdFiles = fs.readdirSync(fullPath, { withFileTypes: true })
    .filter(entry => entry.isFile() && entry.name.endsWith('.md'))
    .map(entry => entry.name);
  if (mdFiles.length === 0) return;

  const files = new Map<string, string>();
  for (const f of mdFiles) {
    files.set(f, path.join(fullPath, f));
  }

  // Detect strict v2 format from synthesis or the first available file
  const samplePath = files.get('synthesis-summary.md') ?? [...files.values()][0];
  const sampleContent = samplePath ? fs.readFileSync(samplePath, 'utf-8') : '';
  const isStrictV2 = isStrictV2Format(sampleContent);

  const docsDir = path.join(fullPath, 'documents');
  const hasDocuments = fs.existsSync(docsDir) &&
    fs.readdirSync(docsDir, { withFileTypes: true })
      .some(entry => entry.isFile() && entry.name.endsWith('.json'));

  dirs.push({ fullPath, date, articleType, files, isStrictV2, hasDocuments });
};

for (const date of dateDirs) {
  const datePath = path.join(ANALYSIS_BASE, date);
  // Support root-level daily analysis sets stored directly under YYYY-MM-DD/
  addAnalysisDirectory(datePath, date, date);
  const subdirs = fs.readdirSync(datePath, { withFileTypes: true })
    .filter(d => d.isDirectory() && d.name !== 'documents');
  for (const sub of subdirs) {
    const fullPath = path.join(datePath, sub.name);
    addAnalysisDirectory(fullPath, date, sub.name);
  }
}
```
```typescript
 * - Must have a structured analysis ID (e.g., `SYN-2026-04-04-001`, `RSK-...`)
 */
function isStrictV2Format(content: string): boolean {
  const hasTableMetadata = /\|\s*\*\*\w+.*\*\*\s*\|/.test(content);
  const hasStructuredId = /\b(?:SYN|RSK|SWT|THR|STK|SIG|CLS|XRF|DDM)-\d{4}-\d{2}-\d{2}-\d{3}\b/.test(content);
```
isStrictV2Format() requires IDs ending in -NNN (\d{3}), but the repository already contains v2-style table metadata with other structured suffixes (e.g., SYN-2026-04-02-CR01, SYN-2026-04-03-RT1018, SYN-2026-04-03-DI, SYN-2026-03-27-MOT). Those directories will be classified as non-strict and will skip most v2 checks, reducing the effectiveness of this quality gate. Either expand the accepted ID patterns to match current outputs or explicitly validate/flag these non-template IDs.
Suggested change:

```typescript
 * - Must have a structured analysis ID with a known analysis prefix, ISO date,
 *   and a repository-supported suffix (for example `SYN-2026-04-04-001`,
 *   `SYN-2026-04-02-CR01`, `SYN-2026-04-03-RT1018`, `SYN-2026-04-03-DI`)
 */
function isStrictV2Format(content: string): boolean {
  const hasTableMetadata = /\|\s*\*\*\w+.*\*\*\s*\|/.test(content);
  const hasStructuredId =
    /\b(?:SYN|RSK|SWT|THR|STK|SIG|CLS|XRF|DDM)-\d{4}-\d{2}-\d{2}-(?:\d{3}|[A-Z]{2,}(?:\d{2,})?)\b/.test(
      content,
    );
```
```typescript
function hasAnyConfidenceIndicator(content: string): boolean {
  return /\b(HIGH|MEDIUM|LOW)\b/i.test(content);
```
hasAnyConfidenceIndicator() matches any standalone HIGH/MEDIUM/LOW token anywhere in the file, which can be satisfied by unrelated content (e.g., Mermaid labels like "HIGH" or phrases like "high risk") even if the confidence metadata line is missing. This can let the 'synthesis files have confidence metadata' assertions pass without actually verifying required metadata. Prefer checking for the actual metadata key (e.g., Overall Confidence row/line) or reuse the existing Overall Confidence extractor for presence.
Suggested change:

```typescript
function extractOverallConfidence(content: string): 'HIGH' | 'MEDIUM' | 'LOW' | null {
  // V2 table format: | **Overall Confidence** | HIGH |
  const tableMatch = /\*\*Overall Confidence\*\*\s*\|\s*(HIGH|MEDIUM|LOW)\b/i.exec(content);
  if (tableMatch?.[1]) return tableMatch[1].toUpperCase() as 'HIGH' | 'MEDIUM' | 'LOW';
  // V1 paragraph format: **Overall Confidence**: HIGH
  const paraMatch = /\*\*Overall Confidence\*\*:\s*(HIGH|MEDIUM|LOW)\b/i.exec(content);
  if (paraMatch?.[1]) return paraMatch[1].toUpperCase() as 'HIGH' | 'MEDIUM' | 'LOW';
  return null;
}

function hasAnyConfidenceIndicator(content: string): boolean {
  return extractOverallConfidence(content) !== null;
```
```typescript
it('should have confidence metadata in all synthesis files', () => {
  const failures: string[] = [];

  for (const dir of synthesisDirectories) {
    const content = readAnalysisFile(dir, 'synthesis-summary.md');
    if (!content) continue;
    if (!hasAnyConfidenceIndicator(content)) {
      failures.push(
        `${dir.date}/${dir.articleType}/synthesis-summary.md: no confidence indicator found`
      );
    }
  }

  expect(failures, `Synthesis files without confidence:\n${failures.join('\n')}`).toHaveLength(0);
```
The confidence check for synthesis summaries is duplicated: 'should have confidence metadata in all synthesis files' (Template Structure) and 'should have a confidence value in all synthesis summary files' (Confidence Labels) perform the same loop and predicate. Keeping both increases runtime and maintenance cost without adding coverage; consider removing one or differentiating their intent (e.g., one for Overall Confidence metadata presence/value, one for inline label coverage).
Suggested change:

```typescript
it('should have Overall Confidence metadata in all synthesis files', () => {
  const failures: string[] = [];
  for (const dir of synthesisDirectories) {
    const content = readAnalysisFile(dir, 'synthesis-summary.md');
    if (!content) continue;
    // Template-structure check: require a dedicated confidence metadata field
    // with a non-empty value, rather than any inline confidence indicator.
    // Supports both documented formats:
    //   - v1: **Confidence**: High / **Overall Confidence**: Medium
    //   - v2: | Overall Confidence | High |
    const hasConfidenceMetadata =
      /\*\*(?:Overall\s+)?Confidence\*\*:\s*[^\n]+/i.test(content) ||
      /\|\s*(?:📊\s*)?Overall Confidence\s*\|\s*[^|\n]+\|/i.test(content);
    if (!hasConfidenceMetadata) {
      failures.push(
        `${dir.date}/${dir.articleType}/synthesis-summary.md: missing Overall Confidence metadata field/value`
      );
    }
  }
  expect(
    failures,
    `Synthesis files without Overall Confidence metadata:\n${failures.join('\n')}`
  ).toHaveLength(0);
```
```typescript
it('should discover ≥5 analysis directories', () => {
  expect(analysisDirs.length).toBeGreaterThanOrEqual(5);
});

it('should have ≥3 synthesis summary files across all dates', () => {
  expect(synthesisDirectories.length).toBeGreaterThanOrEqual(3);
});

it('should have analysis directories spanning multiple dates', () => {
  const uniqueDates = new Set(analysisDirs.map(d => d.date));
  expect(uniqueDates.size).toBeGreaterThanOrEqual(2);
```
The 'Overall Quality' meta-tests assert hard minimums like ≥5 analysis directories, ≥2 unique dates, and ≥3 synthesis files. Given the retention policy in analysis/daily/README.md (older daily artifacts may be archived/removed), these thresholds are likely to become brittle and cause unrelated CI failures. Consider making these expectations conditional (e.g., only assert non-zero) or derive expectations from a committed fixture set rather than the entire repository history.
Suggested change:

```typescript
it('should discover analysis directories', () => {
  expect(analysisDirs.length).toBeGreaterThan(0);
});

it('should have synthesis summary files across retained dates', () => {
  expect(synthesisDirectories.length).toBeGreaterThan(0);
});

it('should have analysis directories spanning retained dates when enough history exists', () => {
  const uniqueDates = new Set(analysisDirs.map(d => d.date));
  if (analysisDirs.length > 1) {
    expect(uniqueDates.size).toBeGreaterThanOrEqual(2);
    return;
  }
  expect(uniqueDates.size).toBeGreaterThan(0);
```
@copilot apply changes based on the comments in this thread
…e confidence extraction, deduplicate tests, relax thresholds

1. discoverAnalysisDirectories: scan root-level .md files under date dirs
2. isStrictV2Format: accept all real-world ID suffixes (CR01, MOT, DI, etc.)
3. Replace the loose hasAnyConfidenceIndicator with extractOverallConfidence for synthesis checks; keep the loose check for threat-analysis files
4. Deduplicate confidence checks between Template Structure and Confidence Labels
5. Relax hard-coded minimums in Overall Quality meta-tests for retention stability

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/126defd2-b7b7-4efa-9eca-847536422cb4
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…dence fallback, broaden threat analysis indicators

- Extract STRUCTURED_ID_PATTERN as a shared constant (single source of truth)
- Improve hasAnyConfidenceIndicator to require context proximity for the fallback
- Broaden the threat analysis check to accept severity/threat-level indicators
- Accept the 'Date' field variant in risk assessment metadata validation

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/126defd2-b7b7-4efa-9eca-847536422cb4
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
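The context-proximity fallback described in this commit could look roughly like the following; the window size and regex are assumptions, not the committed code:

```typescript
// Only accept a HIGH/MEDIUM/LOW token when the word "confidence" appears
// shortly before it, so Mermaid labels or phrases like "high winds" don't match.
function hasContextualConfidence(content: string): boolean {
  return /confidence[\s\S]{0,200}?\b(HIGH|MEDIUM|LOW)\b/i.test(content);
}
```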
Applied all 5 review comments across commits f194f85 and 570d045.
```typescript
/** Banned content patterns per SHARED_PROMPT_PATTERNS.md §BANNED Content Patterns */
const ANALYSIS_BANNED_PATTERNS: readonly { label: string; pattern: RegExp }[] = [
  {
    label: 'neutralText: "The political landscape remains fluid…"',
    pattern: /The political landscape remains fluid,? with both government and opposition positioning for advantage/i,
  },
  {
    label: 'debateAnalysisMarker: "No chamber debate data is available…"',
    pattern: /No chamber debate data is available for these items,? limiting our ability/i,
  },
  {
    label: 'policySignificanceGeneric: "Requires committee review and chamber debate…"',
    pattern: /Requires committee review and chamber debate/i,
  },
];
```
ANALYSIS_BANNED_PATTERNS is a partial redefinition of the repo’s canonical banned patterns (BANNED_PATTERNS used by detectBannedPatterns()), so it can drift and currently misses patterns like "Touches on {domains}" and "Analysis of N documents covering". To keep the quality gate consistent, reuse detectBannedPatterns() (or import/share the canonical pattern list) rather than duplicating a subset here.
Suggested change (delegate to the canonical detector):

```typescript
/** Use the canonical banned-pattern detector to avoid policy drift in this test. */
function detectAnalysisBannedPatterns(content: string) {
  return detectBannedPatterns(content);
}
```
```typescript
for (const dir of strictV2SynthesisDirectories) {
  const content = readAnalysisFile(dir, 'synthesis-summary.md');
  if (!content) continue;
  const confidence = extractOverallConfidence(content);
  if (!confidence) {
    failures.push(
      `${dir.date}/${dir.articleType}/synthesis-summary.md: missing or invalid Overall Confidence metadata (expected HIGH, MEDIUM, or LOW)`
    );
  }
}

expect(failures, `Strict-v2 synthesis files with missing/invalid confidence:\n${failures.join('\n')}`).toHaveLength(0);
```
The two tests for strict-v2 synthesis confidence (should have a parseable... and should have a valid...) both assert the same condition (extractOverallConfidence(content) is truthy) and will fail with very similar messages. Consider removing one or consolidating them to avoid duplicated coverage and reduce test runtime/noise.
Suggested change:

```typescript
const validConfidenceValues = new Set(['HIGH', 'MEDIUM', 'LOW']);
for (const dir of strictV2SynthesisDirectories) {
  const content = readAnalysisFile(dir, 'synthesis-summary.md');
  if (!content) continue;
  const confidence = extractOverallConfidence(content);
  const normalizedConfidence = confidence?.trim().toUpperCase();
  if (!normalizedConfidence || !validConfidenceValues.has(normalizedConfidence)) {
    failures.push(
      `${dir.date}/${dir.articleType}/synthesis-summary.md: invalid Overall Confidence metadata "${confidence ?? 'missing'}" (expected HIGH, MEDIUM, or LOW)`
    );
  }
}
expect(failures, `Strict-v2 synthesis files with invalid confidence values:\n${failures.join('\n')}`).toHaveLength(0);
```
```typescript
const addAnalysisDirectory = (fullPath: string, date: string, articleType: string): void => {
  const mdFiles = fs.readdirSync(fullPath, { withFileTypes: true })
    .filter(entry => entry.isFile() && entry.name.endsWith('.md'))
    .map(entry => entry.name);

  if (mdFiles.length === 0) return;

  const files = new Map<string, string>();
  for (const f of mdFiles) {
    files.set(f, path.join(fullPath, f));
  }

  // Detect strict v2 format from synthesis or the first available file
  const samplePath = files.get('synthesis-summary.md') ?? [...files.values()][0];
  const sampleContent = samplePath ? fs.readFileSync(samplePath, 'utf-8') : '';
  const isStrictV2 = isStrictV2Format(sampleContent);

  const docsDir = path.join(fullPath, 'documents');
  const hasDocuments = fs.existsSync(docsDir) &&
    fs.readdirSync(docsDir, { withFileTypes: true })
      .some(entry => entry.isFile() && entry.name.endsWith('.json'));

  dirs.push({ fullPath, date, articleType, files, isStrictV2, hasDocuments });
```
This suite re-reads many of the same markdown files repeatedly across different tests (multiple nested loops calling fs.readFileSync), which will scale poorly as analysis/daily/ grows and can slow CI noticeably. Consider caching file contents once during discovery (e.g. store Map<filename, {path, content}> or a separate content cache keyed by full path) and reusing that in the assertions.
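A minimal sketch of the cache this comment proposes; names like `readCached` are illustrative, not from the PR:

```typescript
import * as fs from 'fs';

// Cache file contents by path so each markdown file is read from disk once,
// no matter how many assertions reference it.
const contentCache = new Map<string, string>();

function readCached(fullPath: string): string {
  let content = contentCache.get(fullPath);
  if (content === undefined) {
    content = fs.readFileSync(fullPath, 'utf-8');
    contentCache.set(fullPath, content);
  }
  return content;
}
```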
@copilot apply changes based on the comments in this thread and fix failures in TypeScript & JavaScript Testing / Unit Tests (Vitest) (pull_request)
… tests, add content cache, fix TS error

1. Import detectBannedPatterns() from shared.ts — canonical source of truth
2. Consolidate duplicate confidence tests into a single combined assertion
3. Add a file content cache to avoid repeated readFileSync across tests
4. Remove the unused hasRequiredMetadata function (fixes the TS6133 type-check error)

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/edbbfc49-f4b2-47b8-96a0-068a11204c62
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/edbbfc49-f4b2-47b8-96a0-068a11204c62
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Applied all 3 review comments and fixed the CI failure in 53147a7 and a232153.
Automated quality gate for analysis output files — validates template structure, evidence density, Mermaid diagrams, confidence labels, banned patterns, and cross-reference integrity across all `analysis/daily/` directories.

**35 test cases across 7 categories**

- Template Structure: `[REQUIRED]`/`[OPTIONAL]` placeholders, structured IDs
- Mermaid Diagrams: `fill:#` styling
- Confidence Labels: `[HIGH]`/`[MEDIUM]`/`[LOW]`, L×I scoring, threat severity indicators
- Banned Patterns: `detectBannedPatterns()` from shared.ts, unattributed claims

**Format-aware validation**
Analysis files evolved from v1 (paragraph metadata) to v2 (table metadata + structured IDs like `SYN-2026-04-04-001`). Strict-v2 checks only apply when both patterns are detected. The `STRUCTURED_ID_PATTERN` accepts all real-world suffixes including numeric (`001`, `1212`), alphabetic (`MOT`, `DI`), and mixed (`CR01`, `RT1018`).

**Discovery**
`discoverAnalysisDirectories()` scans both root-level `.md` files directly under `YYYY-MM-DD/` directories and subdirectory-based analysis sets (`YYYY-MM-DD/{articleType}/`), ensuring full coverage of the repository's dual-level analysis layout.

**Confidence extraction**
`extractOverallConfidence()` parses Overall Confidence metadata from the v2 table format (including backtick-wrapped and bold-wrapped values), the v1 paragraph format, and colon-inside-bold variants. `hasAnyConfidenceIndicator()` uses a context-aware fallback requiring proximity of "confidence" and HIGH/MEDIUM/LOW tokens to reduce false positives. Threat analysis files additionally accept severity and threat-level indicators.

**Banned pattern detection**
Uses the canonical `detectBannedPatterns()` imported directly from shared.ts rather than a duplicated pattern list, ensuring policy changes propagate automatically without drift.

**Performance**
File contents are cached via a `_contentCache` Map keyed by full path, so each markdown file is read from disk at most once regardless of how many test assertions reference it.

**Retention-safe thresholds**
Overall Quality meta-tests use conditional assertions (non-zero, conditional ≥2) instead of hard-coded minimums, avoiding brittleness as older analysis artifacts are archived per retention policy.
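The conditional-threshold idea above can be sketched as a tiny helper (illustrative; the actual tests inline this logic):

```typescript
// With a single retained directory we cannot require multiple dates, so the
// minimum expected number of unique dates degrades gracefully with history size.
function expectedMinUniqueDates(totalDirs: number): number {
  if (totalDirs > 1) return 2;
  return totalDirs > 0 ? 1 : 0;
}
```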
Tests use real filesystem scanning (no mocks), following the pattern from `workflow-architecture.test.ts`. Every failure message identifies the specific file and requirement that failed.