
Add comprehensive analysis quality validation test suite #1573

Merged
pethers merged 8 commits into main from copilot/add-analysis-quality-validation-tests on Apr 6, 2026
Conversation

Contributor

Copilot AI commented Apr 6, 2026

Automated quality gate for analysis output files — validates template structure, evidence density, Mermaid diagrams, confidence labels, banned patterns, and cross-reference integrity across all analysis/daily/ directories.

35 test cases across 7 categories

  • Template Structure (8): metadata fields, sibling file completeness, unfilled [REQUIRED]/[OPTIONAL] placeholders, structured IDs
  • Evidence Density (6): dok_id citation counts per file type, document count consistency
  • Mermaid Diagrams (5): presence in SWOT/risk/synthesis/cross-ref files, color-coded fill:# styling
  • Confidence Labels (4): combined metadata presence + validity check, inline [HIGH]/[MEDIUM]/[LOW], L×I scoring, threat severity indicators
  • Banned Patterns (3): canonical detectBannedPatterns() from shared.ts, unattributed claims
  • Cross-Reference Integrity (3): dok_id ↔ JSON file matching, document count consistency
  • Overall Quality (6): meta-validation of discovery coverage

Format-aware validation

Analysis files evolved from v1 (paragraph metadata) to v2 (table metadata plus structured IDs such as SYN-2026-04-04-001). Strict-v2 checks apply only when both signals, table metadata and a structured ID, are detected. The STRUCTURED_ID_PATTERN accepts all real-world suffixes, including numeric (001, 1212), alphabetic (MOT, DI), and mixed (CR01, RT1018).
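The widened ID matching can be sketched as a single regex. The exact suffix alternation in the test suite may differ; treat this as an illustrative assumption rather than the suite's literal pattern:

```typescript
// Illustrative sketch of STRUCTURED_ID_PATTERN: known analysis prefix, ISO
// date, then a numeric (001, 1212), alphabetic (MOT, DI), or mixed
// (CR01, RT1018) suffix. The real alternation in the suite may differ.
const STRUCTURED_ID_PATTERN =
  /\b(?:SYN|RSK|SWT|THR|STK|SIG|CLS|XRF|DDM)-\d{4}-\d{2}-\d{2}-(?:\d{3,4}|[A-Z]{2,}\d*)\b/;

console.log(STRUCTURED_ID_PATTERN.test('SYN-2026-04-04-001'));  // true
console.log(STRUCTURED_ID_PATTERN.test('SYN-2026-04-02-CR01')); // true
console.log(STRUCTURED_ID_PATTERN.test('QQQ-2026-04-02-001'));  // false: unknown prefix
```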

Discovery

discoverAnalysisDirectories() scans both root-level .md files directly under YYYY-MM-DD/ directories and subdirectory-based analysis sets (YYYY-MM-DD/{articleType}/), ensuring full coverage of the repository's dual-level analysis layout.
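The dual-level scan can be sketched as follows. Helper names (`hasMarkdown`, `listAnalysisSets`) are hypothetical stand-ins for the suite's internals; only the layout assumptions (date-named directories, `documents/` exclusion) come from the description above:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const DATE_DIR = /^\d{4}-\d{2}-\d{2}$/;

// True if the directory directly contains at least one .md file.
function hasMarkdown(dir: string): boolean {
  return fs.readdirSync(dir, { withFileTypes: true })
    .some(e => e.isFile() && e.name.endsWith(".md"));
}

// Collect both root-level sets (YYYY-MM-DD/*.md) and subdirectory sets
// (YYYY-MM-DD/{articleType}/*.md), skipping documents/ folders.
function listAnalysisSets(base: string): { dir: string; articleType: string }[] {
  const sets: { dir: string; articleType: string }[] = [];
  for (const date of fs.readdirSync(base).filter(d => DATE_DIR.test(d))) {
    const datePath = path.join(base, date);
    if (hasMarkdown(datePath)) sets.push({ dir: datePath, articleType: date });
    for (const entry of fs.readdirSync(datePath, { withFileTypes: true })) {
      if (entry.isDirectory() && entry.name !== "documents") {
        const sub = path.join(datePath, entry.name);
        if (hasMarkdown(sub)) sets.push({ dir: sub, articleType: entry.name });
      }
    }
  }
  return sets;
}
```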

Confidence extraction

extractOverallConfidence() parses Overall Confidence metadata from v2 table format (including backtick-wrapped and bold-wrapped values), v1 paragraph format, and colon-inside-bold variants. hasAnyConfidenceIndicator() uses context-aware fallback requiring proximity of "confidence" and HIGH/MEDIUM/LOW tokens to reduce false positives. Threat analysis files additionally accept severity and threat-level indicators.
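A minimal sketch of the three format variants described above; the regexes here are illustrative approximations, not the suite's exact patterns:

```typescript
type Confidence = "HIGH" | "MEDIUM" | "LOW";

// Tries v2 table, v1 paragraph, and colon-inside-bold variants in turn,
// tolerating backtick- or bold-wrapped values. Approximation of the real
// extractOverallConfidence() described in the PR.
function extractOverallConfidence(content: string): Confidence | null {
  const patterns = [
    /\*\*(?:Overall\s+)?Confidence(?:\s+Level)?\*\*\s*\|\s*[`*]*\s*(HIGH|MEDIUM|LOW)\b/i, // v2 table
    /\*\*(?:Overall\s+)?Confidence\*\*:\s*[`*]*\s*(HIGH|MEDIUM|LOW)\b/i,                  // v1 paragraph
    /\*\*(?:Overall\s+)?Confidence:\*\*\s*[`*]*\s*(HIGH|MEDIUM|LOW)\b/i,                  // colon inside bold
  ];
  for (const p of patterns) {
    const m = p.exec(content);
    if (m?.[1]) return m[1].toUpperCase() as Confidence;
  }
  return null;
}
```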

Banned pattern detection

Uses the canonical detectBannedPatterns() imported directly from shared.ts rather than a duplicated pattern list, ensuring policy changes propagate automatically without drift.
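The detector's shape can be sketched like this. The canonical list and signature live in shared.ts and may differ; the two patterns below are quoted from the review thread as stand-ins, not the full policy:

```typescript
// Stand-in for the canonical detector imported from shared.ts. Returns the
// labels of every banned pattern found in the content.
const BANNED_PATTERNS: readonly { label: string; pattern: RegExp }[] = [
  {
    label: "neutralText",
    pattern: /The political landscape remains fluid,? with both government and opposition positioning for advantage/i,
  },
  {
    label: "policySignificanceGeneric",
    pattern: /Requires committee review and chamber debate/i,
  },
];

function detectBannedPatterns(content: string): string[] {
  return BANNED_PATTERNS.filter(b => b.pattern.test(content)).map(b => b.label);
}
```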

Performance

File contents are cached via a _contentCache Map keyed by full path, so each markdown file is read from disk at most once regardless of how many test assertions reference it.
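The read-once cache reduces to a few lines; names mirror the description (`_contentCache`) but the `diskReads` counter is added here purely to make the behavior observable in this sketch:

```typescript
import * as fs from "node:fs";

// Cache keyed by full path: each file hits the disk at most once, no matter
// how many assertions read it.
const _contentCache = new Map<string, string>();
let diskReads = 0; // instrumentation for this example only

function readCached(fullPath: string): string {
  const hit = _contentCache.get(fullPath);
  if (hit !== undefined) return hit;
  diskReads++;
  const content = fs.readFileSync(fullPath, "utf-8");
  _contentCache.set(fullPath, content);
  return content;
}
```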

Retention-safe thresholds

Overall Quality meta-tests use conditional assertions (non-zero, conditional ≥2) instead of hard-coded minimums, avoiding brittleness as older analysis artifacts are archived per retention policy.
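The conditional-assertion idea, mirroring the relaxation suggested in the review thread, can be sketched with a hypothetical helper (the suite uses vitest `expect`; a plain throw stands in here so the sketch runs standalone):

```typescript
// Retention-safe meta-check: always require a non-zero discovery result, and
// only require a multi-date spread when more than one directory survived
// retention. Helper name is hypothetical.
function checkDateSpread(dates: string[]): void {
  const unique = new Set(dates);
  if (unique.size === 0) throw new Error("no analysis directories discovered");
  if (dates.length > 1 && unique.size < 2) {
    throw new Error("multiple directories discovered but only one unique date");
  }
}
```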

Tests use real filesystem scanning (no mocks), following the pattern from workflow-architecture.test.ts. Every failure message identifies the specific file and requirement that failed.
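The failure-message discipline follows a collect-then-assert pattern: gather every violation first, then fail once with a message naming each offending file. A sketch, with vitest's `expect(...).toHaveLength(0)` replaced by a plain throw so it runs standalone:

```typescript
// Fail once with a message that identifies every file and the unmet
// requirement, instead of stopping at the first violation.
function assertNoFailures(failures: string[], requirement: string): void {
  if (failures.length > 0) {
    throw new Error(`${requirement}:\n${failures.join("\n")}`);
  }
}
```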

Copilot AI linked an issue Apr 6, 2026 that may be closed by this pull request
8 tasks
@github-actions github-actions bot added the size-xs Extra small change (< 10 lines) label Apr 6, 2026
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

🏷️ Automatic Labeling Summary

This PR has been automatically labeled based on the files changed and PR metadata.

Applied Labels: size-xs

Label Categories

  • 🗳️ Content: news, dashboard, visualization, intelligence
  • 💻 Technology: html-css, javascript, workflow, security
  • 📊 Data: cia-data, riksdag-data, data-pipeline, schema
  • 🌍 I18n: i18n, translation, rtl
  • 🔒 ISMS: isms, iso-27001, nist-csf, cis-controls
  • 🏗️ Infrastructure: ci-cd, deployment, performance, monitoring
  • 🔄 Quality: testing, accessibility, documentation, refactor
  • 🤖 AI: agent, skill, agentic-workflow

For more information, see .github/labeler.yml.

@github-actions
Contributor

github-actions bot commented Apr 6, 2026

🔍 Lighthouse Performance Audit

| Category | Score | Status |
| --- | --- | --- |
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

Creates tests/analysis-quality-validation.test.ts with 36 test cases across
7 categories validating analysis output quality:
- Template Structure (8 tests): metadata fields, sibling files, placeholders
- Evidence Density (6 tests): dok_id citations, document counts
- Mermaid Diagrams (5 tests): presence, color-coded styling
- Confidence Labels (5 tests): metadata values, inline labels, L×I scoring
- Banned Patterns (3 tests): boilerplate detection, unattributed claims
- Cross-Reference Integrity (3 tests): dok_id matching, document consistency
- Overall Quality (6 tests): meta-validation of test coverage

Tests detect v1 (paragraph) vs v2 (table) format and apply appropriate
validation rules. Strict-v2 checks require structured IDs (SYN-YYYY-MM-DD-NNN).

Resolves issue requirements for automated quality gate.

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/216511c2-9aa7-42b9-8a88-37b7df7a1536

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
@github-actions github-actions bot added testing Test coverage size-l Large change (250-1000 lines) labels Apr 6, 2026

…e, stricter matching

- Replace non-null assertions with optional chaining in extractDocumentsAnalyzedCount
- Add null check for readAnalysisFile in confidence label verification
- Limit "significant because" regex lookahead to 200 chars (prevent cross-paragraph false positives)
- Add minimum 4-char length check for substring matching in cross-reference integrity

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/216511c2-9aa7-42b9-8a88-37b7df7a1536

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Add comprehensive analysis quality validation test suite" to "Add comprehensive analysis quality validation test suite" on Apr 6, 2026
Copilot AI requested a review from pethers April 6, 2026 17:23
@pethers pethers marked this pull request as ready for review April 6, 2026 17:56
Copilot AI review requested due to automatic review settings April 6, 2026 17:56
Contributor

Copilot AI left a comment


Pull request overview

Adds a comprehensive Vitest-based quality gate that scans analysis/daily/YYYY-MM-DD/{articleType}/ directories on the real filesystem and validates analysis outputs against structural, evidence, diagram, confidence, banned-pattern, and cross-reference requirements, with format-aware rules for v1 vs strict-v2 analysis templates.

Changes:

  • Introduces discovery + helper utilities to classify analysis directories (strict-v2 detection, document presence) and parse key quality signals (dok_id citations, metadata, Mermaid blocks, confidence indicators).
  • Adds 36+ assertions across 7 categories to enforce template completeness, evidence density, Mermaid presence/styling, confidence labeling, banned patterns, cross-reference integrity, and meta “coverage” checks.
  • Provides detailed failure messages that pinpoint the exact directory/file and unmet requirement.

```typescript
const content = readAnalysisFile(dir, 'synthesis-summary.md');
if (!content) continue;
const count = countConfidenceLabels(content);
expect(count, `${dir.date}/${dir.articleType}: inline confidence label count`).toBeGreaterThanOrEqual(1);
```

Copilot AI Apr 6, 2026


The test description/comments say inline confidence labels should have ≥2 occurrences for coverage, but the assertion currently only enforces ≥1. Please align the expectation with the stated requirement (either update the threshold to 2 or adjust the wording so the test matches the intended gate).

Suggested change
```diff
-expect(count, `${dir.date}/${dir.articleType}: inline confidence label count`).toBeGreaterThanOrEqual(1);
+expect(count, `${dir.date}/${dir.articleType}: inline confidence label count`).toBeGreaterThanOrEqual(2);
```

Comment on lines +627 to +642
```typescript
it('should have a valid Overall Confidence value (HIGH, MEDIUM, or LOW) in synthesis files', () => {
  const failures: string[] = [];

  for (const dir of synthesisDirectories) {
    const content = readAnalysisFile(dir, 'synthesis-summary.md');
    if (!content) continue;
    // Check both v2 table and v1 paragraph formats
    const tableMatch = /\*\*(?:Overall\s+)?Confidence(?:\s+Level)?\*\*\s*\|\s*(.+)/i.exec(content);
    const paraMatch = /\*\*(?:Overall\s+)?Confidence\*\*:\s*(.+)/i.exec(content);
    const value = tableMatch?.[1]?.trim() ?? paraMatch?.[1]?.trim();
    if (value && !/\b(HIGH|MEDIUM|LOW)\b/i.test(value)) {
      failures.push(
        `${dir.date}/${dir.articleType}/synthesis-summary.md: invalid confidence value "${value}"`
      );
    }
  }
```

Copilot AI Apr 6, 2026


This test is named as if it validates that synthesis files have a valid Overall Confidence value, but the current logic only flags invalid values when a match exists. If the confidence metadata line is missing entirely (and the file still contains words like HIGH/MEDIUM/LOW elsewhere), this test will pass. Consider treating a missing match/value as a failure to ensure the metadata is actually present.

@pethers
Member

pethers commented Apr 6, 2026

@copilot apply changes based on the comments in this thread

…etadata presence

1. Inline confidence labels: raise threshold from ≥1 to ≥2 and scope to
   strict-v2 files only (aligns assertion with stated requirement)
2. Overall Confidence metadata: treat missing metadata as failure for
   strict-v2 files (not just invalid values), and add colon-inside-bold
   regex variant (**Confidence:**) to handle all real-world formats

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/7042da95-9849-4539-9636-667ae7596fcf

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 6, 2026

@copilot apply changes based on the comments in this thread

Applied both changes from the review thread in ff72818:

  1. Inline confidence labels — raised threshold from ≥1 to ≥2, scoped to strict-v2 files only so older v1 files with a single label aren't false-flagged.
  2. Overall Confidence metadata presence — missing metadata is now treated as a failure for strict-v2 synthesis files (not just invalid values). Also added a **Confidence:** colon-inside-bold regex variant to handle all real-world formats (e.g. **Overall Confidence:** HIGH).


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.

Comment on lines +135 to +161
```typescript
for (const date of dateDirs) {
  const datePath = path.join(ANALYSIS_BASE, date);
  const subdirs = fs.readdirSync(datePath, { withFileTypes: true })
    .filter(d => d.isDirectory());

  for (const sub of subdirs) {
    const fullPath = path.join(datePath, sub.name);
    const mdFiles = fs.readdirSync(fullPath)
      .filter(f => f.endsWith('.md'));

    if (mdFiles.length > 0) {
      const files = new Map<string, string>();
      for (const f of mdFiles) {
        files.set(f, path.join(fullPath, f));
      }

      // Detect strict v2 format from synthesis or the first available file
      const samplePath = files.get('synthesis-summary.md') ?? [...files.values()][0];
      const sampleContent = samplePath ? fs.readFileSync(samplePath, 'utf-8') : '';
      const isStrictV2 = isStrictV2Format(sampleContent);

      const docsDir = path.join(fullPath, 'documents');
      const hasDocuments = fs.existsSync(docsDir) &&
        fs.readdirSync(docsDir).some(f => f.endsWith('.json'));

      dirs.push({ fullPath, date, articleType: sub.name, files, isStrictV2, hasDocuments });
    }
```

Copilot AI Apr 6, 2026


discoverAnalysisDirectories() only scans subdirectories under each YYYY-MM-DD folder (analysis/daily/YYYY-MM-DD/{articleType}/) and never considers Markdown files directly under the date directory itself. The repo currently has root-level daily analysis sets (e.g., analysis/daily/2026-04-01/synthesis-summary.md, risk-assessment.md, etc.), so those files are completely excluded from this validation suite. Update discovery to treat the date directory itself as an AnalysisDirectory when it contains .md files (and handle its documents/ folder), in addition to scanning its child subdirectories.

Suggested change
```diff
-for (const date of dateDirs) {
-  const datePath = path.join(ANALYSIS_BASE, date);
-  const subdirs = fs.readdirSync(datePath, { withFileTypes: true })
-    .filter(d => d.isDirectory());
-  for (const sub of subdirs) {
-    const fullPath = path.join(datePath, sub.name);
-    const mdFiles = fs.readdirSync(fullPath)
-      .filter(f => f.endsWith('.md'));
-    if (mdFiles.length > 0) {
-      const files = new Map<string, string>();
-      for (const f of mdFiles) {
-        files.set(f, path.join(fullPath, f));
-      }
-      // Detect strict v2 format from synthesis or the first available file
-      const samplePath = files.get('synthesis-summary.md') ?? [...files.values()][0];
-      const sampleContent = samplePath ? fs.readFileSync(samplePath, 'utf-8') : '';
-      const isStrictV2 = isStrictV2Format(sampleContent);
-      const docsDir = path.join(fullPath, 'documents');
-      const hasDocuments = fs.existsSync(docsDir) &&
-        fs.readdirSync(docsDir).some(f => f.endsWith('.json'));
-      dirs.push({ fullPath, date, articleType: sub.name, files, isStrictV2, hasDocuments });
-    }
+const addAnalysisDirectory = (fullPath: string, date: string, articleType: string): void => {
+  const mdFiles = fs.readdirSync(fullPath, { withFileTypes: true })
+    .filter(entry => entry.isFile() && entry.name.endsWith('.md'))
+    .map(entry => entry.name);
+  if (mdFiles.length === 0) return;
+  const files = new Map<string, string>();
+  for (const f of mdFiles) {
+    files.set(f, path.join(fullPath, f));
+  }
+  // Detect strict v2 format from synthesis or the first available file
+  const samplePath = files.get('synthesis-summary.md') ?? [...files.values()][0];
+  const sampleContent = samplePath ? fs.readFileSync(samplePath, 'utf-8') : '';
+  const isStrictV2 = isStrictV2Format(sampleContent);
+  const docsDir = path.join(fullPath, 'documents');
+  const hasDocuments = fs.existsSync(docsDir) &&
+    fs.readdirSync(docsDir, { withFileTypes: true })
+      .some(entry => entry.isFile() && entry.name.endsWith('.json'));
+  dirs.push({ fullPath, date, articleType, files, isStrictV2, hasDocuments });
+};
+for (const date of dateDirs) {
+  const datePath = path.join(ANALYSIS_BASE, date);
+  // Support root-level daily analysis sets stored directly under YYYY-MM-DD/
+  addAnalysisDirectory(datePath, date, date);
+  const subdirs = fs.readdirSync(datePath, { withFileTypes: true })
+    .filter(d => d.isDirectory() && d.name !== 'documents');
+  for (const sub of subdirs) {
+    const fullPath = path.join(datePath, sub.name);
+    addAnalysisDirectory(fullPath, date, sub.name);
```
Comment on lines +98 to +102
```typescript
 * - Must have a structured analysis ID (e.g., `SYN-2026-04-04-001`, `RSK-...`)
 */
function isStrictV2Format(content: string): boolean {
  const hasTableMetadata = /\|\s*\*\*\w+.*\*\*\s*\|/.test(content);
  const hasStructuredId = /\b(?:SYN|RSK|SWT|THR|STK|SIG|CLS|XRF|DDM)-\d{4}-\d{2}-\d{2}-\d{3}\b/.test(content);
```

Copilot AI Apr 6, 2026


isStrictV2Format() requires IDs ending in -NNN (\d{3}), but the repository already contains v2-style table metadata with other structured suffixes (e.g., SYN-2026-04-02-CR01, SYN-2026-04-03-RT1018, SYN-2026-04-03-DI, SYN-2026-03-27-MOT). Those directories will be classified as non-strict and will skip most v2 checks, reducing the effectiveness of this quality gate. Either expand the accepted ID patterns to match current outputs or explicitly validate/flag these non-template IDs.

Suggested change
```diff
- * - Must have a structured analysis ID (e.g., `SYN-2026-04-04-001`, `RSK-...`)
- */
-function isStrictV2Format(content: string): boolean {
-  const hasTableMetadata = /\|\s*\*\*\w+.*\*\*\s*\|/.test(content);
-  const hasStructuredId = /\b(?:SYN|RSK|SWT|THR|STK|SIG|CLS|XRF|DDM)-\d{4}-\d{2}-\d{2}-\d{3}\b/.test(content);
+ * - Must have a structured analysis ID with a known analysis prefix, ISO date,
+ *   and a repository-supported suffix (for example `SYN-2026-04-04-001`,
+ *   `SYN-2026-04-02-CR01`, `SYN-2026-04-03-RT1018`, `SYN-2026-04-03-DI`)
+ */
+function isStrictV2Format(content: string): boolean {
+  const hasTableMetadata = /\|\s*\*\*\w+.*\*\*\s*\|/.test(content);
+  const hasStructuredId =
+    /\b(?:SYN|RSK|SWT|THR|STK|SIG|CLS|XRF|DDM)-\d{4}-\d{2}-\d{2}-(?:\d{3}|[A-Z]{2,}(?:\d{2,})?)\b/.test(
+      content,
+    );
```

Comment on lines +218 to +219
```typescript
function hasAnyConfidenceIndicator(content: string): boolean {
  return /\b(HIGH|MEDIUM|LOW)\b/i.test(content);
```

Copilot AI Apr 6, 2026


hasAnyConfidenceIndicator() matches any standalone HIGH/MEDIUM/LOW token anywhere in the file, which can be satisfied by unrelated content (e.g., Mermaid labels like "HIGH" or phrases like "high risk") even if the confidence metadata line is missing. This can let the 'synthesis files have confidence metadata' assertions pass without actually verifying required metadata. Prefer checking for the actual metadata key (e.g., Overall Confidence row/line) or reuse the existing Overall Confidence extractor for presence.

Suggested change
```diff
-function hasAnyConfidenceIndicator(content: string): boolean {
-  return /\b(HIGH|MEDIUM|LOW)\b/i.test(content);
+function extractOverallConfidence(content: string): 'HIGH' | 'MEDIUM' | 'LOW' | null {
+  // V2 table format: | **Overall Confidence** | HIGH |
+  const tableMatch = /\*\*Overall Confidence\*\*\s*\|\s*(HIGH|MEDIUM|LOW)\b/i.exec(content);
+  if (tableMatch?.[1]) return tableMatch[1].toUpperCase() as 'HIGH' | 'MEDIUM' | 'LOW';
+  // V1 paragraph format: **Overall Confidence**: HIGH
+  const paraMatch = /\*\*Overall Confidence\*\*:\s*(HIGH|MEDIUM|LOW)\b/i.exec(content);
+  if (paraMatch?.[1]) return paraMatch[1].toUpperCase() as 'HIGH' | 'MEDIUM' | 'LOW';
+  return null;
+}
+function hasAnyConfidenceIndicator(content: string): boolean {
+  return extractOverallConfidence(content) !== null;
```

Comment on lines +261 to +274
```typescript
it('should have confidence metadata in all synthesis files', () => {
  const failures: string[] = [];

  for (const dir of synthesisDirectories) {
    const content = readAnalysisFile(dir, 'synthesis-summary.md');
    if (!content) continue;
    if (!hasAnyConfidenceIndicator(content)) {
      failures.push(
        `${dir.date}/${dir.articleType}/synthesis-summary.md: no confidence indicator found`
      );
    }
  }

  expect(failures, `Synthesis files without confidence:\n${failures.join('\n')}`).toHaveLength(0);
```

Copilot AI Apr 6, 2026


The confidence check for synthesis summaries is duplicated: 'should have confidence metadata in all synthesis files' (Template Structure) and 'should have a confidence value in all synthesis summary files' (Confidence Labels) perform the same loop and predicate. Keeping both increases runtime and maintenance cost without adding coverage; consider removing one or differentiating their intent (e.g., one for Overall Confidence metadata presence/value, one for inline label coverage).

Suggested change
```diff
-it('should have confidence metadata in all synthesis files', () => {
-  const failures: string[] = [];
-  for (const dir of synthesisDirectories) {
-    const content = readAnalysisFile(dir, 'synthesis-summary.md');
-    if (!content) continue;
-    if (!hasAnyConfidenceIndicator(content)) {
-      failures.push(
-        `${dir.date}/${dir.articleType}/synthesis-summary.md: no confidence indicator found`
-      );
-    }
-  }
-  expect(failures, `Synthesis files without confidence:\n${failures.join('\n')}`).toHaveLength(0);
+it('should have Overall Confidence metadata in all synthesis files', () => {
+  const failures: string[] = [];
+  for (const dir of synthesisDirectories) {
+    const content = readAnalysisFile(dir, 'synthesis-summary.md');
+    if (!content) continue;
+    // Template-structure check: require a dedicated confidence metadata field
+    // with a non-empty value, rather than any inline confidence indicator.
+    // Supports both documented formats:
+    // - v1: **Confidence**: High / **Overall Confidence**: Medium
+    // - v2: | Overall Confidence | High |
+    const hasConfidenceMetadata =
+      /\*\*(?:Overall\s+)?Confidence\*\*:\s*[^\n]+/i.test(content) ||
+      /\|\s*(?:📊\s*)?Overall Confidence\s*\|\s*[^|\n]+\|/i.test(content);
+    if (!hasConfidenceMetadata) {
+      failures.push(
+        `${dir.date}/${dir.articleType}/synthesis-summary.md: missing Overall Confidence metadata field/value`
+      );
+    }
+  }
+  expect(
+    failures,
+    `Synthesis files without Overall Confidence metadata:\n${failures.join('\n')}`
+  ).toHaveLength(0);
```

Comment on lines +826 to +836
```typescript
it('should discover ≥5 analysis directories', () => {
  expect(analysisDirs.length).toBeGreaterThanOrEqual(5);
});

it('should have ≥3 synthesis summary files across all dates', () => {
  expect(synthesisDirectories.length).toBeGreaterThanOrEqual(3);
});

it('should have analysis directories spanning multiple dates', () => {
  const uniqueDates = new Set(analysisDirs.map(d => d.date));
  expect(uniqueDates.size).toBeGreaterThanOrEqual(2);
```

Copilot AI Apr 6, 2026


The 'Overall Quality' meta-tests assert hard minimums like ≥5 analysis directories, ≥2 unique dates, and ≥3 synthesis files. Given the retention policy in analysis/daily/README.md (older daily artifacts may be archived/removed), these thresholds are likely to become brittle and cause unrelated CI failures. Consider making these expectations conditional (e.g., only assert non-zero) or derive expectations from a committed fixture set rather than the entire repository history.

Suggested change
```diff
-it('should discover ≥5 analysis directories', () => {
-  expect(analysisDirs.length).toBeGreaterThanOrEqual(5);
-});
-it('should have ≥3 synthesis summary files across all dates', () => {
-  expect(synthesisDirectories.length).toBeGreaterThanOrEqual(3);
-});
-it('should have analysis directories spanning multiple dates', () => {
-  const uniqueDates = new Set(analysisDirs.map(d => d.date));
-  expect(uniqueDates.size).toBeGreaterThanOrEqual(2);
+it('should discover analysis directories', () => {
+  expect(analysisDirs.length).toBeGreaterThan(0);
+});
+it('should have synthesis summary files across retained dates', () => {
+  expect(synthesisDirectories.length).toBeGreaterThan(0);
+});
+it('should have analysis directories spanning retained dates when enough history exists', () => {
+  const uniqueDates = new Set(analysisDirs.map(d => d.date));
+  if (analysisDirs.length > 1) {
+    expect(uniqueDates.size).toBeGreaterThanOrEqual(2);
+    return;
+  }
+  expect(uniqueDates.size).toBeGreaterThan(0);
```

@pethers
Member

pethers commented Apr 6, 2026

@copilot apply changes based on the comments in this thread

…e confidence extraction, deduplicate tests, relax thresholds

1. discoverAnalysisDirectories: scan root-level .md files under date dirs
2. isStrictV2Format: accept all real-world ID suffixes (CR01, MOT, DI, etc.)
3. Replace loose hasAnyConfidenceIndicator with extractOverallConfidence for
   synthesis checks; keep loose check for threat-analysis files
4. Deduplicate confidence checks between Template Structure and Confidence Labels
5. Relax hard-coded minimums in Overall Quality meta-tests for retention stability

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/126defd2-b7b7-4efa-9eca-847536422cb4

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

…dence fallback, broaden threat analysis indicators

- Extract STRUCTURED_ID_PATTERN as shared constant (single source of truth)
- Improve hasAnyConfidenceIndicator to require context proximity for fallback
- Broaden threat analysis check to accept severity/threat-level indicators
- Accept 'Date' field variant in risk assessment metadata validation

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/126defd2-b7b7-4efa-9eca-847536422cb4

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 6, 2026

@copilot apply changes based on the comments in this thread

Applied all 5 review comments across commits f194f85 and 570d045:

  1. Discovery expansion: discoverAnalysisDirectories() now scans root-level .md files under date directories using a shared addAnalysisDirectory() helper
  2. Widened ID regex: STRUCTURED_ID_PATTERN (extracted as a shared constant) accepts CR01, MOT, DI, RT1018, and PROP suffixes in addition to 3-4 digit numeric IDs
  3. Confidence extraction: extractOverallConfidence() handles backtick-wrapped (`HIGH`), bold-wrapped (**MEDIUM**), and colon-inside-bold formats; hasAnyConfidenceIndicator() requires context proximity for its fallback
  4. Deduplicated confidence checks: Template Structure verifies metadata field presence, while Confidence Labels validates the parseable value
  5. Relaxed thresholds: Overall Quality meta-tests use conditional assertions that survive archive rotation


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment on lines +64 to +78
```typescript
/** Banned content patterns per SHARED_PROMPT_PATTERNS.md §BANNED Content Patterns */
const ANALYSIS_BANNED_PATTERNS: readonly { label: string; pattern: RegExp }[] = [
  {
    label: 'neutralText: "The political landscape remains fluid…"',
    pattern: /The political landscape remains fluid,? with both government and opposition positioning for advantage/i,
  },
  {
    label: 'debateAnalysisMarker: "No chamber debate data is available…"',
    pattern: /No chamber debate data is available for these items,? limiting our ability/i,
  },
  {
    label: 'policySignificanceGeneric: "Requires committee review and chamber debate…"',
    pattern: /Requires committee review and chamber debate/i,
  },
];
```

Copilot AI Apr 6, 2026


ANALYSIS_BANNED_PATTERNS is a partial redefinition of the repo’s canonical banned patterns (BANNED_PATTERNS used by detectBannedPatterns()), so it can drift and currently misses patterns like "Touches on {domains}" and "Analysis of N documents covering". To keep the quality gate consistent, reuse detectBannedPatterns() (or import/share the canonical pattern list) rather than duplicating a subset here.

Suggested change
```diff
-/** Banned content patterns per SHARED_PROMPT_PATTERNS.md §BANNED Content Patterns */
-const ANALYSIS_BANNED_PATTERNS: readonly { label: string; pattern: RegExp }[] = [
-  {
-    label: 'neutralText: "The political landscape remains fluid…"',
-    pattern: /The political landscape remains fluid,? with both government and opposition positioning for advantage/i,
-  },
-  {
-    label: 'debateAnalysisMarker: "No chamber debate data is available…"',
-    pattern: /No chamber debate data is available for these items,? limiting our ability/i,
-  },
-  {
-    label: 'policySignificanceGeneric: "Requires committee review and chamber debate…"',
-    pattern: /Requires committee review and chamber debate/i,
-  },
-];
+/** Use the canonical banned-pattern detector to avoid policy drift in this test. */
+function detectAnalysisBannedPatterns(content: string) {
+  return detectBannedPatterns(content);
+}
```

Comment on lines +700 to +712

```typescript
for (const dir of strictV2SynthesisDirectories) {
  const content = readAnalysisFile(dir, 'synthesis-summary.md');
  if (!content) continue;
  const confidence = extractOverallConfidence(content);
  if (!confidence) {
    failures.push(
      `${dir.date}/${dir.articleType}/synthesis-summary.md: missing or invalid Overall Confidence metadata (expected HIGH, MEDIUM, or LOW)`
    );
  }
}

expect(failures, `Strict-v2 synthesis files with missing/invalid confidence:\n${failures.join('\n')}`).toHaveLength(0);
```

Copilot AI Apr 6, 2026


The two tests for strict-v2 synthesis confidence (should have a parseable... and should have a valid...) both assert the same condition (extractOverallConfidence(content) is truthy) and will fail with very similar messages. Consider removing one or consolidating them to avoid duplicated coverage and reduce test runtime/noise.

Suggested change
```diff
-for (const dir of strictV2SynthesisDirectories) {
-  const content = readAnalysisFile(dir, 'synthesis-summary.md');
-  if (!content) continue;
-  const confidence = extractOverallConfidence(content);
-  if (!confidence) {
-    failures.push(
-      `${dir.date}/${dir.articleType}/synthesis-summary.md: missing or invalid Overall Confidence metadata (expected HIGH, MEDIUM, or LOW)`
-    );
-  }
-}
-expect(failures, `Strict-v2 synthesis files with missing/invalid confidence:\n${failures.join('\n')}`).toHaveLength(0);
+const validConfidenceValues = new Set(['HIGH', 'MEDIUM', 'LOW']);
+for (const dir of strictV2SynthesisDirectories) {
+  const content = readAnalysisFile(dir, 'synthesis-summary.md');
+  if (!content) continue;
+  const confidence = extractOverallConfidence(content);
+  const normalizedConfidence = confidence?.trim().toUpperCase();
+  if (!normalizedConfidence || !validConfidenceValues.has(normalizedConfidence)) {
+    failures.push(
+      `${dir.date}/${dir.articleType}/synthesis-summary.md: invalid Overall Confidence metadata "${confidence ?? 'missing'}" (expected HIGH, MEDIUM, or LOW)`
+    );
+  }
+}
+expect(failures, `Strict-v2 synthesis files with invalid confidence values:\n${failures.join('\n')}`).toHaveLength(0);
```

Comment on lines +144 to +166
```ts
const addAnalysisDirectory = (fullPath: string, date: string, articleType: string): void => {
  const mdFiles = fs.readdirSync(fullPath, { withFileTypes: true })
    .filter(entry => entry.isFile() && entry.name.endsWith('.md'))
    .map(entry => entry.name);

  if (mdFiles.length === 0) return;

  const files = new Map<string, string>();
  for (const f of mdFiles) {
    files.set(f, path.join(fullPath, f));
  }

  // Detect strict v2 format from synthesis or the first available file
  const samplePath = files.get('synthesis-summary.md') ?? [...files.values()][0];
  const sampleContent = samplePath ? fs.readFileSync(samplePath, 'utf-8') : '';
  const isStrictV2 = isStrictV2Format(sampleContent);

  const docsDir = path.join(fullPath, 'documents');
  const hasDocuments = fs.existsSync(docsDir) &&
    fs.readdirSync(docsDir, { withFileTypes: true })
      .some(entry => entry.isFile() && entry.name.endsWith('.json'));

  dirs.push({ fullPath, date, articleType, files, isStrictV2, hasDocuments });
```

Copilot AI Apr 6, 2026


This suite re-reads many of the same markdown files repeatedly across different tests (multiple nested loops calling fs.readFileSync), which will scale poorly as analysis/daily/ grows and can slow CI noticeably. Consider caching file contents once during discovery (e.g. store Map<filename, {path, content}> or a separate content cache keyed by full path) and reusing that in the assertions.

@pethers
Member

pethers commented Apr 6, 2026

@copilot apply changes based on the comments in this thread and fix the failing check:

TypeScript & JavaScript Testing / Unit Tests (Vitest) (pull_request) — Failing after 41s

… tests, add content cache, fix TS error

1. Import detectBannedPatterns() from shared.ts — canonical source of truth
2. Consolidate duplicate confidence tests into single combined assertion
3. Add file content cache to avoid repeated readFileSync across tests
4. Remove unused hasRequiredMetadata function (fixed TS6133 type-check error)

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/edbbfc49-f4b2-47b8-96a0-068a11204c62

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

🔍 Lighthouse Performance Audit

| Category | Score | Status |
| --- | --- | --- |
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

Contributor Author

Copilot AI commented Apr 6, 2026

@copilot apply changes based on the comments in this thread and fix failures in [TypeScript & JavaSc...

Applied all 3 review comments and fixed the CI failure in 53147a7 and a232153:

  1. Banned patterns — replaced ANALYSIS_BANNED_PATTERNS with direct import of detectBannedPatterns() from shared.ts, ensuring policy changes propagate automatically.
  2. Duplicate confidence tests — consolidated into a single validConfidenceValues check that covers both presence and validity (35 tests, down from 36).
  3. Content cache — added _contentCache Map keyed by file path, used by readAnalysisFile() and new readCachedFile() helper across all test loops.
  4. CI fix — removed unused hasRequiredMetadata function that caused TS6133 error in the "TypeScript type-check (scripts)" step.
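A minimal sketch of the content cache described in item 3, assuming the helper names (`_contentCache`, `readCachedFile`) from the summary above; the real implementation in the test suite may differ:

```typescript
import * as fs from 'fs';

// Hypothetical sketch: each file is read from disk at most once per test
// run. Missing files cache as null so repeated lookups skip the retry too.
const _contentCache = new Map<string, string | null>();

function readCachedFile(fullPath: string): string | null {
  if (_contentCache.has(fullPath)) return _contentCache.get(fullPath) ?? null;
  let content: string | null = null;
  try {
    content = fs.readFileSync(fullPath, 'utf-8');
  } catch {
    content = null; // file absent or unreadable
  }
  _contentCache.set(fullPath, content);
  return content;
}
```

Since the suite's many nested loops revisit the same `synthesis-summary.md` and sibling files, caching turns O(tests × files) disk reads into O(files).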


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@pethers pethers merged commit 68a0fba into main Apr 6, 2026
21 checks passed
@pethers pethers deleted the copilot/add-analysis-quality-validation-tests branch April 6, 2026 19:44

Labels

- `size-l`: Large change (250-1000 lines)
- `size-xs`: Extra small change (< 10 lines)
- `testing`: Test coverage


Development

Successfully merging this pull request may close these issues.

Add comprehensive analysis quality validation test suite

3 participants