Add comprehensive analysis quality validation test suite #1573
Conversation
🏷️ Automatic Labeling Summary: this PR has been automatically labeled based on the files changed and PR metadata. Applied labels: size-xs.
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report. Budget Compliance: performance budgets enforced.
Creates `tests/analysis-quality-validation.test.ts` with 36 test cases across 7 categories validating analysis output quality:

- Template Structure (8 tests): metadata fields, sibling files, placeholders
- Evidence Density (6 tests): dok_id citations, document counts
- Mermaid Diagrams (5 tests): presence, color-coded styling
- Confidence Labels (5 tests): metadata values, inline labels, L×I scoring
- Banned Patterns (3 tests): boilerplate detection, unattributed claims
- Cross-Reference Integrity (3 tests): dok_id matching, document consistency
- Overall Quality (6 tests): meta-validation of test coverage

Tests detect v1 (paragraph) vs v2 (table) format and apply the appropriate validation rules. Strict-v2 checks require structured IDs (`SYN-YYYY-MM-DD-NNN`). Resolves issue requirements for an automated quality gate.

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/216511c2-9aa7-42b9-8a88-37b7df7a1536
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
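The Evidence Density category counts dok_id citations in each analysis file. A minimal sketch of such a counter, assuming a Riksdagen-style ID shape — the actual pattern in the test file may differ:

```typescript
// Hedged sketch: counts Riksdagen-style dok_id citations in a markdown body.
// The ID regex is an assumption (two uppercase letters, two digits, a committee
// code, a serial number); adjust it to the project's real dok_id format.
function countDokIdCitations(content: string): number {
  const DOK_ID = /\b[A-Z]{2}\d{2}[A-Za-z]+\d+\b/g;
  return content.match(DOK_ID)?.length ?? 0;
}
```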
…e, stricter matching

- Replace non-null assertions with optional chaining in extractDocumentsAnalyzedCount
- Add null check for readAnalysisFile in confidence label verification
- Limit "significant because" regex lookahead to 200 chars (prevents cross-paragraph false positives)
- Add minimum 4-char length check for substring matching in cross-reference integrity

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/216511c2-9aa7-42b9-8a88-37b7df7a1536
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Pull request overview
Adds a comprehensive Vitest-based quality gate that scans analysis/daily/YYYY-MM-DD/{articleType}/ directories on the real filesystem and validates analysis outputs against structural, evidence, diagram, confidence, banned-pattern, and cross-reference requirements, with format-aware rules for v1 vs strict-v2 analysis templates.
Changes:
- Introduces discovery + helper utilities to classify analysis directories (strict-v2 detection, document presence) and parse key quality signals (dok_id citations, metadata, Mermaid blocks, confidence indicators).
- Adds 36+ assertions across 7 categories to enforce template completeness, evidence density, Mermaid presence/styling, confidence labeling, banned patterns, cross-reference integrity, and meta “coverage” checks.
- Provides detailed failure messages that pinpoint the exact directory/file and unmet requirement.
```typescript
const content = readAnalysisFile(dir, 'synthesis-summary.md');
if (!content) continue;
const count = countConfidenceLabels(content);
expect(count, `${dir.date}/${dir.articleType}: inline confidence label count`).toBeGreaterThanOrEqual(1);
```
The test description/comments say inline confidence labels should have ≥2 occurrences for coverage, but the assertion currently only enforces ≥1. Please align the expectation with the stated requirement (either update the threshold to 2 or adjust the wording so the test matches the intended gate).
Suggested change:

```diff
- expect(count, `${dir.date}/${dir.articleType}: inline confidence label count`).toBeGreaterThanOrEqual(1);
+ expect(count, `${dir.date}/${dir.articleType}: inline confidence label count`).toBeGreaterThanOrEqual(2);
```
```typescript
it('should have a valid Overall Confidence value (HIGH, MEDIUM, or LOW) in synthesis files', () => {
  const failures: string[] = [];

  for (const dir of synthesisDirectories) {
    const content = readAnalysisFile(dir, 'synthesis-summary.md');
    if (!content) continue;
    // Check both v2 table and v1 paragraph formats
    const tableMatch = /\*\*(?:Overall\s+)?Confidence(?:\s+Level)?\*\*\s*\|\s*(.+)/i.exec(content);
    const paraMatch = /\*\*(?:Overall\s+)?Confidence\*\*:\s*(.+)/i.exec(content);
    const value = tableMatch?.[1]?.trim() ?? paraMatch?.[1]?.trim();
    if (value && !/\b(HIGH|MEDIUM|LOW)\b/i.test(value)) {
      failures.push(
        `${dir.date}/${dir.articleType}/synthesis-summary.md: invalid confidence value "${value}"`
      );
    }
  }
```
This test is named as if it validates that synthesis files have a valid Overall Confidence value, but the current logic only flags invalid values when a match exists. If the confidence metadata line is missing entirely (and the file still contains words like HIGH/MEDIUM/LOW elsewhere), this test will pass. Consider treating a missing match/value as a failure to ensure the metadata is actually present.
@copilot apply changes based on the comments in this thread
…etadata presence

1. Inline confidence labels: raise the threshold from ≥1 to ≥2 and scope it to strict-v2 files only (aligns the assertion with the stated requirement)
2. Overall Confidence metadata: treat missing metadata as a failure for strict-v2 files (not just invalid values), and add a colon-inside-bold regex variant (`**Confidence:**`) to handle all real-world formats

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/7042da95-9849-4539-9636-667ae7596fcf
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
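The colon-inside-bold variant mentioned in the commit can be sketched like this; the regexes are illustrative, not the exact ones committed:

```typescript
// Both markdown spellings appear in real files:
//   **Confidence**: High   (colon outside the bold)
//   **Confidence:** High   (colon inside the bold)
const COLON_OUTSIDE = /\*\*(?:Overall\s+)?Confidence\*\*:\s*(HIGH|MEDIUM|LOW)\b/i;
const COLON_INSIDE = /\*\*(?:Overall\s+)?Confidence:\*\*\s*(HIGH|MEDIUM|LOW)\b/i;

function matchConfidence(line: string): string | null {
  const m = COLON_OUTSIDE.exec(line) ?? COLON_INSIDE.exec(line);
  return m?.[1]?.toUpperCase() ?? null;
}
```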
Applied both changes from the review thread in ff72818.
```typescript
for (const date of dateDirs) {
  const datePath = path.join(ANALYSIS_BASE, date);
  const subdirs = fs.readdirSync(datePath, { withFileTypes: true })
    .filter(d => d.isDirectory());

  for (const sub of subdirs) {
    const fullPath = path.join(datePath, sub.name);
    const mdFiles = fs.readdirSync(fullPath)
      .filter(f => f.endsWith('.md'));

    if (mdFiles.length > 0) {
      const files = new Map<string, string>();
      for (const f of mdFiles) {
        files.set(f, path.join(fullPath, f));
      }

      // Detect strict v2 format from synthesis or the first available file
      const samplePath = files.get('synthesis-summary.md') ?? [...files.values()][0];
      const sampleContent = samplePath ? fs.readFileSync(samplePath, 'utf-8') : '';
      const isStrictV2 = isStrictV2Format(sampleContent);

      const docsDir = path.join(fullPath, 'documents');
      const hasDocuments = fs.existsSync(docsDir) &&
        fs.readdirSync(docsDir).some(f => f.endsWith('.json'));

      dirs.push({ fullPath, date, articleType: sub.name, files, isStrictV2, hasDocuments });
    }
  }
}
```
discoverAnalysisDirectories() only scans subdirectories under each YYYY-MM-DD folder (analysis/daily/YYYY-MM-DD/{articleType}/) and never considers Markdown files directly under the date directory itself. The repo currently has root-level daily analysis sets (e.g., analysis/daily/2026-04-01/synthesis-summary.md, risk-assessment.md, etc.), so those files are completely excluded from this validation suite. Update discovery to treat the date directory itself as an AnalysisDirectory when it contains .md files (and handle its documents/ folder), in addition to scanning its child subdirectories.
Suggested change (extract a reusable helper and scan the date directory itself):

```typescript
const addAnalysisDirectory = (fullPath: string, date: string, articleType: string): void => {
  const mdFiles = fs.readdirSync(fullPath, { withFileTypes: true })
    .filter(entry => entry.isFile() && entry.name.endsWith('.md'))
    .map(entry => entry.name);
  if (mdFiles.length === 0) return;

  const files = new Map<string, string>();
  for (const f of mdFiles) {
    files.set(f, path.join(fullPath, f));
  }

  // Detect strict v2 format from synthesis or the first available file
  const samplePath = files.get('synthesis-summary.md') ?? [...files.values()][0];
  const sampleContent = samplePath ? fs.readFileSync(samplePath, 'utf-8') : '';
  const isStrictV2 = isStrictV2Format(sampleContent);

  const docsDir = path.join(fullPath, 'documents');
  const hasDocuments = fs.existsSync(docsDir) &&
    fs.readdirSync(docsDir, { withFileTypes: true })
      .some(entry => entry.isFile() && entry.name.endsWith('.json'));

  dirs.push({ fullPath, date, articleType, files, isStrictV2, hasDocuments });
};

for (const date of dateDirs) {
  const datePath = path.join(ANALYSIS_BASE, date);
  // Support root-level daily analysis sets stored directly under YYYY-MM-DD/
  addAnalysisDirectory(datePath, date, date);
  const subdirs = fs.readdirSync(datePath, { withFileTypes: true })
    .filter(d => d.isDirectory() && d.name !== 'documents');
  for (const sub of subdirs) {
    const fullPath = path.join(datePath, sub.name);
    addAnalysisDirectory(fullPath, date, sub.name);
  }
}
```
```typescript
 * - Must have a structured analysis ID (e.g., `SYN-2026-04-04-001`, `RSK-...`)
 */
function isStrictV2Format(content: string): boolean {
  const hasTableMetadata = /\|\s*\*\*\w+.*\*\*\s*\|/.test(content);
  const hasStructuredId = /\b(?:SYN|RSK|SWT|THR|STK|SIG|CLS|XRF|DDM)-\d{4}-\d{2}-\d{2}-\d{3}\b/.test(content);
```
isStrictV2Format() requires IDs ending in -NNN (\d{3}), but the repository already contains v2-style table metadata with other structured suffixes (e.g., SYN-2026-04-02-CR01, SYN-2026-04-03-RT1018, SYN-2026-04-03-DI, SYN-2026-03-27-MOT). Those directories will be classified as non-strict and will skip most v2 checks, reducing the effectiveness of this quality gate. Either expand the accepted ID patterns to match current outputs or explicitly validate/flag these non-template IDs.
Suggested change:

```typescript
 * - Must have a structured analysis ID with a known analysis prefix, ISO date,
 *   and a repository-supported suffix (for example `SYN-2026-04-04-001`,
 *   `SYN-2026-04-02-CR01`, `SYN-2026-04-03-RT1018`, `SYN-2026-04-03-DI`)
 */
function isStrictV2Format(content: string): boolean {
  const hasTableMetadata = /\|\s*\*\*\w+.*\*\*\s*\|/.test(content);
  const hasStructuredId =
    /\b(?:SYN|RSK|SWT|THR|STK|SIG|CLS|XRF|DDM)-\d{4}-\d{2}-\d{2}-(?:\d{3}|[A-Z]{2,}(?:\d{2,})?)\b/.test(
      content,
    );
```
```typescript
function hasAnyConfidenceIndicator(content: string): boolean {
  return /\b(HIGH|MEDIUM|LOW)\b/i.test(content);
```
hasAnyConfidenceIndicator() matches any standalone HIGH/MEDIUM/LOW token anywhere in the file, which can be satisfied by unrelated content (e.g., Mermaid labels like "HIGH" or phrases like "high risk") even if the confidence metadata line is missing. This can let the 'synthesis files have confidence metadata' assertions pass without actually verifying required metadata. Prefer checking for the actual metadata key (e.g., Overall Confidence row/line) or reuse the existing Overall Confidence extractor for presence.
Suggested change:

```typescript
function extractOverallConfidence(content: string): 'HIGH' | 'MEDIUM' | 'LOW' | null {
  // V2 table format: | **Overall Confidence** | HIGH |
  const tableMatch = /\*\*Overall Confidence\*\*\s*\|\s*(HIGH|MEDIUM|LOW)\b/i.exec(content);
  if (tableMatch?.[1]) return tableMatch[1].toUpperCase() as 'HIGH' | 'MEDIUM' | 'LOW';
  // V1 paragraph format: **Overall Confidence**: HIGH
  const paraMatch = /\*\*Overall Confidence\*\*:\s*(HIGH|MEDIUM|LOW)\b/i.exec(content);
  if (paraMatch?.[1]) return paraMatch[1].toUpperCase() as 'HIGH' | 'MEDIUM' | 'LOW';
  return null;
}

function hasAnyConfidenceIndicator(content: string): boolean {
  return extractOverallConfidence(content) !== null;
```
```typescript
it('should have confidence metadata in all synthesis files', () => {
  const failures: string[] = [];

  for (const dir of synthesisDirectories) {
    const content = readAnalysisFile(dir, 'synthesis-summary.md');
    if (!content) continue;
    if (!hasAnyConfidenceIndicator(content)) {
      failures.push(
        `${dir.date}/${dir.articleType}/synthesis-summary.md: no confidence indicator found`
      );
    }
  }

  expect(failures, `Synthesis files without confidence:\n${failures.join('\n')}`).toHaveLength(0);
```
The confidence check for synthesis summaries is duplicated: 'should have confidence metadata in all synthesis files' (Template Structure) and 'should have a confidence value in all synthesis summary files' (Confidence Labels) perform the same loop and predicate. Keeping both increases runtime and maintenance cost without adding coverage; consider removing one or differentiating their intent (e.g., one for Overall Confidence metadata presence/value, one for inline label coverage).
Suggested change:

```typescript
it('should have Overall Confidence metadata in all synthesis files', () => {
  const failures: string[] = [];
  for (const dir of synthesisDirectories) {
    const content = readAnalysisFile(dir, 'synthesis-summary.md');
    if (!content) continue;
    // Template-structure check: require a dedicated confidence metadata field
    // with a non-empty value, rather than any inline confidence indicator.
    // Supports both documented formats:
    //   - v1: **Confidence**: High / **Overall Confidence**: Medium
    //   - v2: | Overall Confidence | High |
    const hasConfidenceMetadata =
      /\*\*(?:Overall\s+)?Confidence\*\*:\s*[^\n]+/i.test(content) ||
      /\|\s*(?:📊\s*)?Overall Confidence\s*\|\s*[^|\n]+\|/i.test(content);
    if (!hasConfidenceMetadata) {
      failures.push(
        `${dir.date}/${dir.articleType}/synthesis-summary.md: missing Overall Confidence metadata field/value`
      );
    }
  }
  expect(
    failures,
    `Synthesis files without Overall Confidence metadata:\n${failures.join('\n')}`
  ).toHaveLength(0);
```
```typescript
it('should discover ≥5 analysis directories', () => {
  expect(analysisDirs.length).toBeGreaterThanOrEqual(5);
});

it('should have ≥3 synthesis summary files across all dates', () => {
  expect(synthesisDirectories.length).toBeGreaterThanOrEqual(3);
});

it('should have analysis directories spanning multiple dates', () => {
  const uniqueDates = new Set(analysisDirs.map(d => d.date));
  expect(uniqueDates.size).toBeGreaterThanOrEqual(2);
```
The 'Overall Quality' meta-tests assert hard minimums like ≥5 analysis directories, ≥2 unique dates, and ≥3 synthesis files. Given the retention policy in analysis/daily/README.md (older daily artifacts may be archived/removed), these thresholds are likely to become brittle and cause unrelated CI failures. Consider making these expectations conditional (e.g., only assert non-zero) or derive expectations from a committed fixture set rather than the entire repository history.
Suggested change:

```typescript
it('should discover analysis directories', () => {
  expect(analysisDirs.length).toBeGreaterThan(0);
});

it('should have synthesis summary files across retained dates', () => {
  expect(synthesisDirectories.length).toBeGreaterThan(0);
});

it('should have analysis directories spanning retained dates when enough history exists', () => {
  const uniqueDates = new Set(analysisDirs.map(d => d.date));
  if (analysisDirs.length > 1) {
    expect(uniqueDates.size).toBeGreaterThanOrEqual(2);
    return;
  }
  expect(uniqueDates.size).toBeGreaterThan(0);
```
@copilot apply changes based on the comments in this thread
…e confidence extraction, deduplicate tests, relax thresholds

1. discoverAnalysisDirectories: scan root-level .md files under date dirs
2. isStrictV2Format: accept all real-world ID suffixes (CR01, MOT, DI, etc.)
3. Replace the loose hasAnyConfidenceIndicator with extractOverallConfidence for synthesis checks; keep the loose check for threat-analysis files
4. Deduplicate confidence checks between Template Structure and Confidence Labels
5. Relax hard-coded minimums in Overall Quality meta-tests for retention stability

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/126defd2-b7b7-4efa-9eca-847536422cb4
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…dence fallback, broaden threat analysis indicators

- Extract STRUCTURED_ID_PATTERN as a shared constant (single source of truth)
- Improve hasAnyConfidenceIndicator to require context proximity for the fallback
- Broaden the threat analysis check to accept severity/threat-level indicators
- Accept the 'Date' field variant in risk assessment metadata validation

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/126defd2-b7b7-4efa-9eca-847536422cb4
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
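The context-proximity fallback described in this commit could look roughly like the following; the window size and regex are assumptions, not the committed code:

```typescript
// Only accept a HIGH/MEDIUM/LOW token when the word "confidence" appears
// shortly before it, so Mermaid labels or phrases like "high winds" don't match.
function hasContextualConfidence(content: string): boolean {
  return /confidence[\s\S]{0,200}?\b(HIGH|MEDIUM|LOW)\b/i.test(content);
}
```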
Applied all 5 review comments across commits f194f85 and 570d045.
```typescript
/** Banned content patterns per SHARED_PROMPT_PATTERNS.md §BANNED Content Patterns */
const ANALYSIS_BANNED_PATTERNS: readonly { label: string; pattern: RegExp }[] = [
  {
    label: 'neutralText: "The political landscape remains fluid…"',
    pattern: /The political landscape remains fluid,? with both government and opposition positioning for advantage/i,
  },
  {
    label: 'debateAnalysisMarker: "No chamber debate data is available…"',
    pattern: /No chamber debate data is available for these items,? limiting our ability/i,
  },
  {
    label: 'policySignificanceGeneric: "Requires committee review and chamber debate…"',
    pattern: /Requires committee review and chamber debate/i,
  },
];
```
ANALYSIS_BANNED_PATTERNS is a partial redefinition of the repo’s canonical banned patterns (BANNED_PATTERNS used by detectBannedPatterns()), so it can drift and currently misses patterns like "Touches on {domains}" and "Analysis of N documents covering". To keep the quality gate consistent, reuse detectBannedPatterns() (or import/share the canonical pattern list) rather than duplicating a subset here.
Suggested change (delegate to the canonical detector):

```typescript
/** Use the canonical banned-pattern detector to avoid policy drift in this test. */
function detectAnalysisBannedPatterns(content: string) {
  return detectBannedPatterns(content);
}
```
```typescript
for (const dir of strictV2SynthesisDirectories) {
  const content = readAnalysisFile(dir, 'synthesis-summary.md');
  if (!content) continue;
  const confidence = extractOverallConfidence(content);
  if (!confidence) {
    failures.push(
      `${dir.date}/${dir.articleType}/synthesis-summary.md: missing or invalid Overall Confidence metadata (expected HIGH, MEDIUM, or LOW)`
    );
  }
}

expect(failures, `Strict-v2 synthesis files with missing/invalid confidence:\n${failures.join('\n')}`).toHaveLength(0);
```
The two tests for strict-v2 synthesis confidence (should have a parseable... and should have a valid...) both assert the same condition (extractOverallConfidence(content) is truthy) and will fail with very similar messages. Consider removing one or consolidating them to avoid duplicated coverage and reduce test runtime/noise.
Suggested change:

```typescript
const validConfidenceValues = new Set(['HIGH', 'MEDIUM', 'LOW']);
for (const dir of strictV2SynthesisDirectories) {
  const content = readAnalysisFile(dir, 'synthesis-summary.md');
  if (!content) continue;
  const confidence = extractOverallConfidence(content);
  const normalizedConfidence = confidence?.trim().toUpperCase();
  if (!normalizedConfidence || !validConfidenceValues.has(normalizedConfidence)) {
    failures.push(
      `${dir.date}/${dir.articleType}/synthesis-summary.md: invalid Overall Confidence metadata "${confidence ?? 'missing'}" (expected HIGH, MEDIUM, or LOW)`
    );
  }
}
expect(failures, `Strict-v2 synthesis files with invalid confidence values:\n${failures.join('\n')}`).toHaveLength(0);
```
```typescript
const addAnalysisDirectory = (fullPath: string, date: string, articleType: string): void => {
  const mdFiles = fs.readdirSync(fullPath, { withFileTypes: true })
    .filter(entry => entry.isFile() && entry.name.endsWith('.md'))
    .map(entry => entry.name);

  if (mdFiles.length === 0) return;

  const files = new Map<string, string>();
  for (const f of mdFiles) {
    files.set(f, path.join(fullPath, f));
  }

  // Detect strict v2 format from synthesis or the first available file
  const samplePath = files.get('synthesis-summary.md') ?? [...files.values()][0];
  const sampleContent = samplePath ? fs.readFileSync(samplePath, 'utf-8') : '';
  const isStrictV2 = isStrictV2Format(sampleContent);

  const docsDir = path.join(fullPath, 'documents');
  const hasDocuments = fs.existsSync(docsDir) &&
    fs.readdirSync(docsDir, { withFileTypes: true })
      .some(entry => entry.isFile() && entry.name.endsWith('.json'));

  dirs.push({ fullPath, date, articleType, files, isStrictV2, hasDocuments });
```
This suite re-reads many of the same markdown files repeatedly across different tests (multiple nested loops calling fs.readFileSync), which will scale poorly as analysis/daily/ grows and can slow CI noticeably. Consider caching file contents once during discovery (e.g. store Map<filename, {path, content}> or a separate content cache keyed by full path) and reusing that in the assertions.
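A minimal sketch of the cache this comment proposes; names like `readCached` are illustrative, not from the PR:

```typescript
import * as fs from 'fs';

// Cache file contents by path so each markdown file is read from disk once,
// no matter how many assertions reference it.
const contentCache = new Map<string, string>();

function readCached(fullPath: string): string {
  let content = contentCache.get(fullPath);
  if (content === undefined) {
    content = fs.readFileSync(fullPath, 'utf-8');
    contentCache.set(fullPath, content);
  }
  return content;
}
```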
@copilot apply changes based on the comments in this thread and fix failures in TypeScript & JavaScript Testing / Unit Tests (Vitest) (pull_request)
… tests, add content cache, fix TS error

1. Import detectBannedPatterns() from shared.ts — canonical source of truth
2. Consolidate duplicate confidence tests into a single combined assertion
3. Add a file content cache to avoid repeated readFileSync across tests
4. Remove the unused hasRequiredMetadata function (fixes the TS6133 type-check error)

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/edbbfc49-f4b2-47b8-96a0-068a11204c62
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/edbbfc49-f4b2-47b8-96a0-068a11204c62
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Applied all 3 review comments and fixed the CI failure in 53147a7 and a232153.
Automated quality gate for analysis output files — validates template structure, evidence density, Mermaid diagrams, confidence labels, banned patterns, and cross-reference integrity across all `analysis/daily/` directories.

**35 test cases across 7 categories**

- Template Structure: `[REQUIRED]`/`[OPTIONAL]` placeholders, structured IDs
- Mermaid Diagrams: `fill:#` styling
- Confidence Labels: `[HIGH]`/`[MEDIUM]`/`[LOW]`, L×I scoring, threat severity indicators
- Banned Patterns: `detectBannedPatterns()` from shared.ts, unattributed claims

**Format-aware validation**
Analysis files evolved from v1 (paragraph metadata) to v2 (table metadata + structured IDs like `SYN-2026-04-04-001`). Strict-v2 checks only apply when both patterns are detected. The `STRUCTURED_ID_PATTERN` accepts all real-world suffixes including numeric (`001`, `1212`), alphabetic (`MOT`, `DI`), and mixed (`CR01`, `RT1018`).

**Discovery**
`discoverAnalysisDirectories()` scans both root-level `.md` files directly under `YYYY-MM-DD/` directories and subdirectory-based analysis sets (`YYYY-MM-DD/{articleType}/`), ensuring full coverage of the repository's dual-level analysis layout.

**Confidence extraction**
`extractOverallConfidence()` parses Overall Confidence metadata from the v2 table format (including backtick-wrapped and bold-wrapped values), the v1 paragraph format, and colon-inside-bold variants. `hasAnyConfidenceIndicator()` uses a context-aware fallback requiring proximity of "confidence" and HIGH/MEDIUM/LOW tokens to reduce false positives. Threat analysis files additionally accept severity and threat-level indicators.

**Banned pattern detection**
Uses the canonical `detectBannedPatterns()` imported directly from shared.ts rather than a duplicated pattern list, ensuring policy changes propagate automatically without drift.

**Performance**
File contents are cached via a `_contentCache` Map keyed by full path, so each markdown file is read from disk at most once regardless of how many test assertions reference it.

**Retention-safe thresholds**
Overall Quality meta-tests use conditional assertions (non-zero, conditional ≥2) instead of hard-coded minimums, avoiding brittleness as older analysis artifacts are archived per retention policy.
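The conditional-threshold idea above can be sketched as a tiny helper (illustrative; the actual tests inline this logic):

```typescript
// With a single retained directory we cannot require multiple dates, so the
// minimum expected number of unique dates degrades gracefully with history size.
function expectedMinUniqueDates(totalDirs: number): number {
  if (totalDirs > 1) return 2;
  return totalDirs > 0 ? 1 : 0;
}
```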
Tests use real filesystem scanning (no mocks), following the pattern from `workflow-architecture.test.ts`. Every failure message identifies the specific file and requirement that failed.