feat: anchor WEP/DIW/scenario probabilities to base-rate datasets and enforce full ACH KJ coverage#2494
Conversation
🏷️ Automatic Labeling SummaryThis PR has been automatically labeled based on the files changed and PR metadata. Applied Labels: size-xs Label Categories
For more information, see |
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
…CH/wildcard gate checks, and banned-phrase scanner - analysis/methodologies/calibration-ledger.md: 30+ backfilled forecasts with schema - analysis/methodologies/base-rates/: 4 JSON datasets (Lagrådet, ministerial, coalition, motions) - analysis/methodologies/admiralty-rubric.md: 22 worked examples for ≥80% inter-analyst agreement - analysis/templates/wildcards-blackswans.md: raised minimum to ≥15 wildcards with base-rate anchoring - analysis/templates/devils-advocate.md: KJ–ACH coverage map enforcing 100% coverage - .github/prompts/05-analysis-gate.md: Check 7b (ACH=100% KJ), 7c (wildcards≥15), 7d (banned-phrase scanner) - analysis/methodologies/political-style-guide.json: v1.1 with uncalibrated-probability category Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/015148ac-95dc-4b0a-9d35-c1d4db545a67 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
…ity, simplified banned-phrase scanner Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/015148ac-95dc-4b0a-9d35-c1d4db545a67 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
There was a problem hiding this comment.
Pull request overview
This PR strengthens the project’s political-intelligence tradecraft by (1) requiring probabilistic claims (WEP/DIW/scenario/coalition probabilities) to be anchored to explicit base-rate datasets and (2) tightening the analysis gate to enforce full ACH coverage of all Key Judgments (KJs), plus a higher wildcard minimum.
Changes:
- Added new calibration artefacts (calibration ledger + Admiralty rubric) and base-rate JSON datasets to anchor priors.
- Updated analysis templates (wildcards + devil’s advocate) to require base-rate anchoring and explicit KJ→ACH coverage mapping.
- Extended the analysis gate to enforce 100% KJ coverage, ≥15 wildcards, and banned “uncalibrated probability” phrasing unless accompanied by a base-rate citation.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
analysis/templates/wildcards-blackswans.md |
Raises wildcard minimum to 15 and requires base-rate anchoring for each WEP. |
analysis/templates/devils-advocate.md |
Adds mandatory KJ–ACH coverage map and base-rate requirement for ACH hypotheses. |
analysis/methodologies/political-style-guide.json |
Adds an uncalibrated-probability banned-phrase category and expands allPhrases. |
analysis/methodologies/calibration-ledger.md |
Introduces a calibration ledger schema + backfilled forecast register with priors and Brier contributions. |
analysis/methodologies/base-rates/opposition-motion-adoption-rate.json |
Adds a base-rate dataset for opposition motion outcomes by committee/riksmöte. |
analysis/methodologies/base-rates/ministerial-answer-quality.json |
Adds a base-rate dataset for ministerial answer-quality tier rates + breakdowns. |
analysis/methodologies/base-rates/coalition-formation-outcomes.json |
Adds a base-rate dataset for coalition outcomes (Nordic + DACH) and Sweden-specific rates. |
analysis/methodologies/admiralty-rubric.md |
Adds a detailed Admiralty grading rubric with decision trees and worked examples. |
.github/prompts/05-analysis-gate.md |
Wires in new checks for ACH KJ coverage, wildcard minimum, and uncalibrated-probability phrase scanning. |
| # ACH = 100 % KJ coverage: count KJs in intelligence-assessment.md and verify | ||
| # that devils-advocate.md contains a KJ–ACH coverage map with at least that many rows. | ||
| # Proxy: count "KJ-" occurrences in intelligence-assessment.md and compare to | ||
| # the number of coverage-map rows in devils-advocate.md (rows matching "KJ-[0-9]+"). | ||
| if [ -s "$ANALYSIS_DIR/intelligence-assessment.md" ]; then | ||
| KJ_COUNT=$(grep -cE 'KJ-?[0-9]+' "$ANALYSIS_DIR/intelligence-assessment.md" || true) | ||
| # Each KJ should appear at least once in devils-advocate.md (via the KJ–ACH map) | ||
| DA_KJ_COVER=$(grep -cE 'KJ-?[0-9]+' "$ANALYSIS_DIR/devils-advocate.md" || true) | ||
| # Allow DA_KJ_COVER >= KJ_COUNT (KJs may appear multiple times in DA) | ||
| [ "${DA_KJ_COVER:-0}" -ge "${KJ_COUNT:-0}" ] \ | ||
| || { echo "❌ devils-advocate.md: ACH KJ-coverage gap — intelligence-assessment.md references ${KJ_COUNT:-0} KJ(s) but devils-advocate.md references only ${DA_KJ_COVER:-0}. Every KJ must be challenged in devils-advocate.md (see analysis/methodologies/admiralty-rubric.md)."; FAIL=1; } |
| # Check 7d — banned-phrase scan for uncalibrated-probability phrases in analysis artifacts. | ||
| # "analyst judgement, not derived from data" and similar phrases are banned unless the same | ||
| # paragraph contains a base-rate citation (base-rates/ file name or a calibration-ledger reference). | ||
| UNCAL_PHRASE_RE='analyst judgement, not derived|analyst judgment, not derived|not based on polling data|not derived from quantitative model|analyst judgement, not' |
| | **wep_pct_hi** | 95 | | ||
| | **point_estimate_pct** | 87 | | ||
| | **base_rate_source** | `base-rates/coalition-formation-outcomes.json` → `nordic_coalition_quarterly_survival_rate` | | ||
| | **base_rate_prior_pct** | 84 | |
| @@ -0,0 +1,172 @@ | |||
| { | |||
| "$schema": "https://json-schema.org/draft/2020-12/schema", | |||
| "nordic_coalition_quarterly_survival_rate": 89, | ||
| "nordic_coalition_quarterly_survival_rate_note": "Proportion of Nordic governments surviving any given quarter (quarterly granularity, N=320 quarters across 4 Nordic countries 2010–2024)", | ||
| "nordic_incumbent_retention_rate_pct": 41, | ||
| "nordic_incumbent_retention_rate_note": "Proportion of Nordic elections where the incumbent government coalition is returned to power (N=18 Nordic elections 2010–2024)", | ||
| "sweden_budget_pass_first_vote_rate": 72, | ||
| "sweden_budget_pass_first_vote_rate_note": "Proportion of Swedish budgetramverk first-chamber votes passing without minority amendment (N=14 votes 2010–2024)", |
| @@ -0,0 +1,155 @@ | |||
| { | |||
| "$schema": "https://json-schema.org/draft/2020-12/schema", | |||
| "aggregate_rates": { | ||
| "overall_adoption_rate_pct": 7, | ||
| "overall_adoption_rate_note": "All opposition motions adopted in any form (full, partial, or chamber override), across all committees and all riksmöten 2010–2025", | ||
| "cross_bloc_amendment_survival_rate_pct": 8, | ||
| "cross_bloc_amendment_survival_rate_note": "Motions receiving cross-bloc amendment treatment (incorporated into government bill after negotiation); subset of overall adoption", | ||
| "government_bill_rate_per_riksmote": 58, |
| @@ -0,0 +1,190 @@ | |||
| { | |||
| "$schema": "https://json-schema.org/draft/2020-12/schema", | |||
| "tier3_substantive_no_commit_rate_pct": 32, | ||
| "tier3_plus_rate_pct": 54, | ||
| "tier4_plus_rate_pct": 11, | ||
| "tier5_full_accountability_rate_pct": 3, | ||
| "concrete_timeline_commitment_rate_pct": 22, | ||
| "scheduled_debate_completion_rate_pct": 96, | ||
| "high_salience_electoral_translation_rate_pct": 61, | ||
| "aggregate_notes": "tier1_2 + tier3 + tier4_plus = 100 %. High-salience electoral translation = proportion of interpellations rated tier 3+ AND on a domain polling in top-3 voter concerns (SVT/Demoskop) that translated to measurable polling shift within 60 days." |
| "analyst judgment, not derived from data", | ||
| "analyst judgement without base-rate", | ||
| "not based on polling data", | ||
| "analyst judgement, not", |
|
@copilot apply changes based on the comments in this thread and resolve conflicts with main |
- Merge origin/main with conflict resolution in 05-analysis-gate.md and devils-advocate.md - Replace raw KJ-grep count with unique-set comparison + ## Key Judgment Coverage Matrix heading check (comment 1) - Load uncalibrated-probability banned phrases directly from political-style-guide.json so gate and JSON cannot drift (comment 2) - Align calibration-ledger nordic_coalition_quarterly_survival_rate prior to dataset value 89 (comment 3) - Remove $schema metaschema URL from base-rate dataset instances (comments 4, 6, 8) - Standardise _pct suffix in coalition-formation-outcomes aggregate_rates (comment 5) - Clarify cross_bloc_amendment_survival_rate_pct is a parallel-track measure, not strict subset (comment 7) - Reconcile ministerial-answer-quality tier percentages: tier3_substantive_no_commit_rate_pct 32→43 so tier1_2(46)+tier3(43)+tier4_plus(11)=100 and tier3_plus(54)=tier3(43)+tier4_plus(11) (comment 9) - Remove overly broad "analyst judgement, not" banned-phrase prefix that could match legitimate sentences via literal substring matching; add "analyst judgment without base-rate" US-spelling variant (comment 10) Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/f8176389-02d4-41d5-899e-f2c160727d52 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…sis-methodology # Conflicts: # .github/prompts/05-analysis-gate.md # analysis/templates/devils-advocate.md Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Done in 1ca1a96 (review-comment fixes) and 478c583 (proper 2-parent merge with main). Review comments addressed:
|
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
$schemametaschema URLs from base-rate instance files_pctsuffix in coalition-formation-outcomes aggregate_ratesnpx vitest run tests/agentic-analysis-gate.test.ts(84 passed)npx vitest run tests/validate-article.test.ts(41 passed)