Harness artifacts contain fabricated evidence: numbers, benchmarks, and citations not in case packet (issue #11)

@EdgeCaser

Summary

Artifacts generated by the conflict harness regularly contain fabricated specifics — numbers, benchmarks, and citations invented by the model to make arguments feel more grounded than the evidence supports. The structural layer (decision frames, evidence gap lists, pass/fail gates) is reliable. The specificity layer is not.

Evidence

Example 1 — Fabricated citation (retention-vs-growth-tradeoff)

The scenario has zero context files. Side A's final artifact cites:

"Retention investment generates higher ROI vs. new acquisition spend (internal modeling, ctx-1)"

ctx-1 does not exist. The model invented a citation to a non-existent source document. The judge did not flag it.

Example 2 — Fabricated industry benchmarks (multi-tenant-shared-learning)

"Enterprise customers lose 0.3–1.8% of GMV to fraud."
"Near-parity with centralized training at sufficient tenant count (≥20–30 active tenants)"

Neither figure is in the case packet. The judge did catch this one and penalized Side A for it — but the numbers are still in the committed artifact.

Example 3 — Fabricated cost estimate (mobile-first-single-survey)

Scenario states only "6 months of engineering effort." Side A adds:

"A 6-month engineering commitment represents $600K–$1.2M in fully-loaded cost"

Not in the scenario. Invented to strengthen the evidence-gate argument.

Example 4 — Fabricated segment size (churn-correlation-causation)

"closing the 18-percentage-point churn gap on even 20% of the inactive-collaboration segment"

The 20% figure is invented. The structural point is correct; the number is not.

Pattern

The model generates a structurally correct argument, then populates it with invented specifics to make the argument feel grounded. The pattern is more pronounced in executive_ambiguity and historical_strategy scenarios, where evidence is sparse and there is room to fill. It is less pronounced in evidence_fragile_prd scenarios, where the prompt contract makes fabrication less available as a rhetorical move.

What Is and Isn't Broken

| Artifact element | Reliability |
| --- | --- |
| Decision frame | High |
| Evidence gap list | High |
| Pass/fail gates | High |
| Recommended next artifact | High |
| Specific numbers, thresholds, benchmarks | Low: treat as invented until verified |
| Citations to context files (ctx-N) | Low: must be verified; may point to non-existent documents |
| Industry statistics | Low: drawn from model training data, not the packet |

The harness is working correctly as a reasoning quality evaluator. It is not a factual accuracy engine and should not be positioned as one.

Recommended Fixes

1. Add fabrication detection to the judge prompt

Instruct the judge to explicitly flag any quantitative claim not traceable to the case packet:

"For each piece of quantitative evidence cited by either side, note whether it appears in the case packet. Flag any claim that introduces specificity not present in the packet."
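A cheap deterministic pre-check could give the judge a shortlist to work from. The sketch below is only a heuristic under stated assumptions: the artifact and the case packet are available as plain strings, and the function names are hypothetical, not part of the harness. It lists numeric tokens that appear in the artifact but nowhere in the packet. It cannot detect numbers written as words or derived figures, so it supplements the judge instruction rather than replacing it.

```python
import re

# Matches integers, comma-grouped integers, and decimals (e.g. 6, 1,200, 1.2).
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def numeric_tokens(text):
    """Return every numeric token in text, with thousands separators removed."""
    return [m.group(0).replace(",", "") for m in NUMBER.finditer(text)]


def untraceable_numbers(artifact_text, packet_text):
    """Return numbers found in the artifact but not in the packet.

    Hypothetical helper: results are de-duplicated and kept in order of
    first appearance so they can be pasted into the judge prompt.
    """
    packet_numbers = set(numeric_tokens(packet_text))
    flagged = []
    for token in numeric_tokens(artifact_text):
        if token not in packet_numbers and token not in flagged:
            flagged.append(token)
    return flagged


# Example 3 from this issue: the scenario states only the 6-month effort.
packet = "6 months of engineering effort."
artifact = ("A 6-month engineering commitment represents "
            "$600K–$1.2M in fully-loaded cost")
print(untraceable_numbers(artifact, packet))  # → ['600', '1.2']
```

Exact token matching is deliberately strict: a figure that the packet states in a different form (for example "half" versus "50") would be flagged, which errs on the side of sending more claims to the judge for review.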

2. Constrain the generation prompt and validate ctx-N references

The harness knows which context files exist (case_packet.evidence). Inject an explicit constraint into the first-pass prompt:

"The only evidence available to you is contained in the case packet above. Do not introduce statistics, benchmarks, or citations not present in the packet."

Post-generation, scan for ctx-N references and verify each against the actual evidence array. Flag or reject artifacts with phantom citations before they reach the judge.
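The post-generation scan described above can be sketched as follows. This is a minimal sketch, not the harness's implementation. It assumes that evidence entries are identified by strings of the form `ctx-N` and that those identifiers can be collected from `case_packet.evidence` into a list. The exact shape of that array is not shown in this issue, and the function name is hypothetical.

```python
import re

# Matches citation tokens such as "ctx-1" or "ctx-12".
CTX_REF = re.compile(r"\bctx-\d+\b")


def find_phantom_citations(artifact_text, evidence_ids):
    """Return ctx-N references in the artifact that match no known evidence id.

    evidence_ids is assumed to be an iterable of strings like "ctx-1".
    Results are de-duplicated and kept in order of first appearance.
    """
    known = set(evidence_ids)
    phantoms = []
    for match in CTX_REF.finditer(artifact_text):
        ref = match.group(0)
        if ref not in known and ref not in phantoms:
            phantoms.append(ref)
    return phantoms


# Example 1 from this issue: the scenario has zero context files.
artifact = ("Retention investment generates higher ROI vs. new acquisition "
            "spend (internal modeling, ctx-1)")
print(find_phantom_citations(artifact, []))  # → ['ctx-1']
```

A non-empty result would be the signal to flag or reject the artifact before it reaches the judge.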

3. Add unsupported_claim_count to the verdict schema

The run schema already tracks disagreement_rate, declared_adoption_rate, and substantive_revision_rate. An unsupported_claim_count field — populated by the judge's explicit count of ungrounded claims — would make fabrication a first-class measurable output.
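As a rough illustration of the proposed field, the sketch below adds `unsupported_claim_count` next to the three existing metrics. The class name `RunMetrics`, the field types, and the default of zero are assumptions made for illustration only. The actual schema definition is not shown in this issue.

```python
from dataclasses import dataclass, asdict


@dataclass
class RunMetrics:
    """Hypothetical container for the run-level metrics named in this issue."""

    disagreement_rate: float
    declared_adoption_rate: float
    substantive_revision_rate: float
    # Proposed field: the judge's explicit count of ungrounded claims.
    unsupported_claim_count: int = 0

    def __post_init__(self):
        # A count can never be negative; fail loudly on malformed judge output.
        if self.unsupported_claim_count < 0:
            raise ValueError("unsupported_claim_count must be >= 0")


# Placeholder values only, not measurements from any real run.
metrics = RunMetrics(
    disagreement_rate=0.0,
    declared_adoption_rate=0.0,
    substantive_revision_rate=0.0,
    unsupported_claim_count=2,
)
print(asdict(metrics)["unsupported_claim_count"])  # → 2
```

Defaulting the field to zero would keep older runs loadable, though it would also make "not counted" indistinguishable from "no unsupported claims". An optional field may be the better choice if that distinction matters.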

Full Write-Up

docs/review/fabricated-evidence-findings-2026-04-16.md
