Harness artifacts contain fabricated evidence: numbers, benchmarks, and citations not in case packet #11

## Summary

Artifacts generated by the conflict harness regularly contain fabricated specifics — numbers, benchmarks, and citations invented by the model to make arguments feel more grounded than the evidence supports. The structural layer (decision frames, evidence gap lists, pass/fail gates) is reliable. The specificity layer is not.

## Evidence

**Example 1 — Fabricated citation (`retention-vs-growth-tradeoff`)**

The scenario has zero context files. Side A's final artifact cites:

> "Retention investment generates higher ROI vs. new acquisition spend (internal modeling, **ctx-1**)"

`ctx-1` does not exist. The model invented a citation to a non-existent source document. The judge did not flag it.

**Example 2 — Fabricated industry benchmarks (`multi-tenant-shared-learning`)**

> "Enterprise customers lose **0.3–1.8% of GMV** to fraud."
> "Near-parity with centralized training at sufficient tenant count (**≥20–30 active tenants**)"

Neither figure is in the case packet. The judge *did* catch this one and penalized Side A for it — but the numbers are still in the committed artifact.

**Example 3 — Fabricated cost estimate (`mobile-first-single-survey`)**

The scenario states only "6 months of engineering effort." Side A adds:

> "A 6-month engineering commitment represents **$600K–$1.2M in fully-loaded cost**"

This figure is not in the scenario. It was invented to strengthen the evidence-gate argument.

**Example 4 — Fabricated segment size (`churn-correlation-causation`)**

> "closing the 18-percentage-point churn gap on even **20% of the inactive-collaboration segment**"

The 20% figure is invented. The structural point is correct; the number is not.

## Pattern

The model generates a structurally correct argument, then populates it with invented specifics to make the argument feel grounded. The pattern is more pronounced in `executive_ambiguity` and `historical_strategy` scenarios, where sparse evidence leaves room to fill. It is less pronounced in `evidence_fragile_prd` scenarios, where the prompt contract makes fabrication less available as a rhetorical move.

## What Is and Isn't Broken

| Artifact element | Reliability |
|---|---|
| Decision frame | High |
| Evidence gap list | High |
| Pass/fail gates | High |
| Recommended next artifact | High |
| Specific numbers, thresholds, benchmarks | Low — treat as invented until verified |
| Citations to context files (ctx-N) | Low — must be verified; may point to non-existent documents |
| Industry statistics | Low — drawn from model training data, not the packet |

The harness is working correctly as a **reasoning quality evaluator**. It is not a factual accuracy engine and should not be positioned as one.

## Recommended Fixes

**1. Add fabrication detection to the judge prompt**

Instruct the judge to explicitly flag any quantitative claim not traceable to the case packet:

> "For each piece of quantitative evidence cited by either side, note whether it appears in the case packet. Flag any claim that introduces specificity not present in the packet."
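The judge instruction could be backed by a cheap deterministic pre-check whose output is handed to the judge or used to audit its flags. A minimal sketch follows, assuming the artifact and the case packet are both available as plain text; `unsupported_numbers` and the other names are hypothetical, not existing harness functions. It only catches numerals (e.g. `$600K`, `1.8%`), not numbers written as words or paraphrased claims.

```python
import re

# ctx-N citation IDs are validated separately, so strip them before scanning for numbers.
CTX_REF = re.compile(r"\bctx-\d+\b", re.IGNORECASE)
# Numerals with optional "$", decimals or thousands separators, K/M/B suffix, and "%".
# The lookbehind avoids matching digits embedded in identifiers such as "v2".
NUMBER = re.compile(r"(?<![\w.])\$?\d+(?:[.,]\d+)*[KMB]?%?")


def _normalize(token: str) -> str:
    """Reduce "$1,200" / "1.8%" style tokens to a comparable core ("1200", "1.8")."""
    return token.lstrip("$").rstrip("%").replace(",", "")


def numbers_in(text: str) -> list[str]:
    """Return raw numeric tokens found in text, ignoring ctx-N citation IDs."""
    return NUMBER.findall(CTX_REF.sub(" ", text))


def unsupported_numbers(artifact: str, packet_text: str) -> list[str]:
    """Numeric tokens in the artifact whose normalized form never appears in the packet."""
    known = {_normalize(t) for t in numbers_in(packet_text)}
    return [t for t in numbers_in(artifact) if _normalize(t) not in known]


if __name__ == "__main__":
    packet = "The team estimates 6 months of engineering effort."
    artifact = ("A 6-month engineering commitment represents "
                "$600K–$1.2M in fully-loaded cost")
    print(unsupported_numbers(artifact, packet))  # ['$600K', '$1.2M']
```

Matching is exact on the normalized token, so a packet that says "$1.2 million" would still flag `$1.2M`. Treat the output as candidates for the judge to confirm, not as verdicts.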

**2. Validate ctx-N references before generation**

The harness knows which context files exist (`case_packet.evidence`). Inject an explicit constraint into the first-pass prompt:

> "The only evidence available to you is contained in the case packet above. Do not introduce statistics, benchmarks, or citations not present in the packet."
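One way to make that constraint concrete is to also tell the model exactly which context IDs it may cite, so a zero-context scenario like `retention-vs-growth-tradeoff` says so explicitly. The sketch below assumes each `case_packet.evidence` entry has an `id` (e.g. `"ctx-1"`) and a text body; `EvidenceItem`, `CasePacket`, and `build_first_pass_prompt` are hypothetical stand-ins, not the harness's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceItem:  # hypothetical shape of one case_packet.evidence entry
    id: str          # e.g. "ctx-1"
    content: str


@dataclass
class CasePacket:    # hypothetical; only the fields this sketch needs
    scenario: str
    evidence: list[EvidenceItem] = field(default_factory=list)


GROUNDING_CONSTRAINT = (
    "The only evidence available to you is contained in the case packet above. "
    "Do not introduce statistics, benchmarks, or citations not present in the packet."
)


def build_first_pass_prompt(packet: CasePacket, task: str) -> str:
    """Assemble a first-pass prompt with the grounding constraint and the citable IDs."""
    ids = [e.id for e in packet.evidence]
    if ids:
        citable = "You may cite only these context IDs: " + ", ".join(ids) + "."
    else:
        citable = "This scenario has no context files; do not cite any ctx-N IDs."
    evidence_block = "\n\n".join(f"[{e.id}]\n{e.content}" for e in packet.evidence)
    parts = ["## Case packet", packet.scenario, evidence_block,
             GROUNDING_CONSTRAINT, citable, "## Task", task]
    # Drop empty sections (e.g. no evidence) so the prompt has no blank blocks.
    return "\n\n".join(p for p in parts if p)
```

A prompt constraint lowers the fabrication rate but cannot guarantee it, which is why the post-generation check is still needed.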

Post-generation, scan for `ctx-N` references and verify each against the actual evidence array. Flag or reject artifacts with phantom citations before they reach the judge.
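The post-generation scan could look like the following minimal sketch. It assumes the valid IDs can be collected from `case_packet.evidence` as a set of strings such as `"ctx-1"`; `check_ctx_citations` and `CitationCheck` are hypothetical names. Markdown emphasis around the ID (as in `**ctx-1**`) does not interfere with the match.

```python
import re
from dataclasses import dataclass

# Whole-token ctx-N references; the trailing \b keeps "ctx-12" from matching as "ctx-1".
CTX_CITATION = re.compile(r"\bctx-(\d+)\b", re.IGNORECASE)


@dataclass
class CitationCheck:
    cited: list[str]    # distinct ctx-N IDs cited, in first-occurrence order
    phantom: list[str]  # cited IDs with no matching evidence entry

    @property
    def ok(self) -> bool:
        return not self.phantom


def check_ctx_citations(artifact_text: str, evidence_ids: set[str]) -> CitationCheck:
    """Compare every ctx-N reference in an artifact against the packet's evidence IDs."""
    cited = list(dict.fromkeys(f"ctx-{n}" for n in CTX_CITATION.findall(artifact_text)))
    phantom = [c for c in cited if c not in evidence_ids]
    return CitationCheck(cited=cited, phantom=phantom)


if __name__ == "__main__":
    text = "Retention investment generates higher ROI (internal modeling, **ctx-1**)"
    result = check_ctx_citations(text, evidence_ids=set())  # scenario has zero context files
    print(result.ok, result.phantom)  # False ['ctx-1']
```

Whether a failed check should reject the artifact or only attach a flag for the judge is a policy choice; flagging keeps the run comparable while still surfacing the phantom citation.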

**3. Add `unsupported_claim_count` to the verdict schema**

The run schema already tracks `disagreement_rate`, `declared_adoption_rate`, and `substantive_revision_rate`. An `unsupported_claim_count` field — populated by the judge's explicit count of ungrounded claims — would make fabrication a first-class measurable output.
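A minimal sketch of the schema change, assuming the judge is asked to itemize each ungrounded claim and the count is derived from that list rather than reported as a bare integer. Deriving it keeps the number auditable, since a judge can miscount too. `UnsupportedClaim`, `Verdict`, and `run_unsupported_claim_total` are hypothetical; the real verdict's other fields are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class UnsupportedClaim:  # hypothetical: one claim the judge could not trace to the packet
    side: str            # "A" or "B"
    quote: str           # verbatim text of the claim
    reason: str          # e.g. "number not in packet", "phantom ctx-N"


@dataclass
class Verdict:           # hypothetical subset of the verdict schema
    unsupported_claims: list[UnsupportedClaim] = field(default_factory=list)

    @property
    def unsupported_claim_count(self) -> int:
        """Derived from the itemized list so the count can be audited."""
        return len(self.unsupported_claims)

    def count_for_side(self, side: str) -> int:
        return sum(1 for c in self.unsupported_claims if c.side == side)


def run_unsupported_claim_total(verdicts: list[Verdict]) -> int:
    """Run-level roll-up, alongside metrics such as disagreement_rate."""
    return sum(v.unsupported_claim_count for v in verdicts)
```

Keeping the per-side breakdown makes it possible to see whether one side's prompt is more prone to fabrication than the other's.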

## Full Write-Up

`docs/review/fabricated-evidence-findings-2026-04-16.md`
