fix(rca): normalize confidence score scale (no more "9000%") #233

Open
WZ wants to merge 1 commit into main from fix/confidence-score-scale

WZ commented Jun 3, 2026

Problem

The synthesis LLM is inconsistent about the confidenceScore scale: some completions emit a 0–1 fraction (0.9), others a 0–100 value (90 / 95.00). The score was stored unnormalized, so every consumer that multiplies by 100 rendered nonsense for the 0–100 case:

  • investigation header: "9000%"
  • report card: "(95.00)"
  • the low-confidence styling (< 0.5) silently never fired for 0–100 scores
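
To make the failure mode concrete, here is a minimal sketch of the two patterns described above. The function names `renderNaive` and `isLowConfidenceNaive` are hypothetical stand-ins for the real display code, which is not shown in this PR description.

```typescript
// Hypothetical display helper mirroring the "multiply by 100" pattern.
function renderNaive(score: number): string {
  return `${Math.round(score * 100)}%`;
}

// Hypothetical low-confidence check that assumes a 0–1 fraction.
function isLowConfidenceNaive(score: number): boolean {
  return score < 0.5;
}

console.log(renderNaive(0.9)); // "90%"
console.log(renderNaive(90)); // "9000%"
console.log(isLowConfidenceNaive(0.3)); // true
console.log(isLowConfidenceNaive(30)); // false, so a 30/100 score is never flagged
```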

Found during browser-QA of the orchestrator; unrelated to that work, so it ships independently off main.

Fix

  • New src/lib/confidence.ts: confidenceFraction (coerces either scale to a clamped 0–1 fraction) and confidencePercent (0–100 integer). Shared between web and server.
  • Normalize at the source (synthesis step) so new reports persist a 0–1 fraction. This also keeps the runner's low-confidence gate (which assumes 0–1) correct.
  • Use the helpers defensively at every display site so reports already persisted on the wrong scale still render correctly: RcaReport header + low-confidence banner/styling, InvestigationPane metadata, ChatPane card, markdown export, Slack notifier, runner event log.
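
For readers without the diff open, the helpers could look roughly like the sketch below. This is not the code in src/lib/confidence.ts. It assumes that any value above 1 was emitted on the 0–100 scale, that missing or non-finite input yields `null`, and that the helpers take a plain number; the real signatures and null handling may differ. In a shared module these functions would be exported.

```typescript
type RawScore = number | null | undefined;

// Coerce a score on either scale to a clamped 0–1 fraction.
// Assumption: values above 1 are on the 0–100 scale; unusable input gives null.
function confidenceFraction(raw: RawScore): number | null {
  if (raw == null || !Number.isFinite(raw)) return null;
  const fraction = raw > 1 ? raw / 100 : raw;
  return Math.min(1, Math.max(0, fraction));
}

// Same score as a 0–100 integer, for display.
function confidencePercent(raw: RawScore): number | null {
  const fraction = confidenceFraction(raw);
  return fraction === null ? null : Math.round(fraction * 100);
}

// Hypothetical display-site usage: label and low-confidence check
// both go through the helpers instead of touching the raw value.
function confidenceLabel(raw: RawScore): string {
  const percent = confidencePercent(raw);
  return percent === null ? "n/a" : `${percent}%`;
}

function isLowConfidence(raw: RawScore): boolean {
  const fraction = confidenceFraction(raw);
  return fraction !== null && fraction < 0.5;
}

console.log(confidenceLabel(0.9)); // "90%"
console.log(confidenceLabel(90)); // "90%"
console.log(isLowConfidence(30)); // true
```

One consequence of this heuristic is that a raw value of exactly 1 is read as 100%, not 1%. That seems the safer reading for a model-emitted confidence, but it is a judgment call.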

Test

  • New confidence.test.ts (both scales → same %, clamps pathological 9000, handles null/NaN).
  • tsc clean, full suite 2409 passing, web build clean.
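
The cases listed for confidence.test.ts can be pictured as a small table-driven check. The sketch below does not use the project's test runner and does not import the real helper. `toPercent` is a hypothetical stand-in with assumed behavior (values above 1 treated as 0–100, unusable input mapped to `null`) so the block runs on its own.

```typescript
// Stand-in for the real helper; behavior here is an assumption.
function toPercent(raw: number | null | undefined): number | null {
  if (raw == null || !Number.isFinite(raw)) return null;
  const fraction = Math.min(1, Math.max(0, raw > 1 ? raw / 100 : raw));
  return Math.round(fraction * 100);
}

// [input, expected percent] pairs covering the cases named in the Test section.
const cases: Array<[number | null, number | null]> = [
  [0.9, 90], // fraction scale
  [90, 90], // percent scale, same result
  [95.0, 95],
  [9000, 100], // pathological value is clamped
  [null, null],
  [NaN, null],
];

const failures = cases.filter(([input, expected]) => toPercent(input) !== expected);
console.log(failures.length === 0 ? "all cases pass" : failures);
```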

The synthesis LLM is inconsistent about the confidenceScore scale — some
completions emit a 0–1 fraction (0.9), others a 0–100 value (90 / 95.00).
It was stored unnormalized, so every consumer that multiplies by 100 rendered
nonsense for the 0–100 case: the investigation header showed "9000%", the
report card "(95.00)", etc. The low-confidence styling (`< 0.5`) also silently
never fired for 0–100 scores.

Fix:
- New `src/lib/confidence.ts` — `confidenceFraction` (coerce either scale to a
  clamped 0–1) + `confidencePercent` (0–100 int). Shared web/server.
- Normalize at the source (synthesis step) so new reports persist a 0–1
  fraction — also keeps the runner's low-confidence gate (which assumes 0–1)
  correct.
- Use the helpers defensively at every display site (RcaReport header + the
  low-confidence banner/styling, InvestigationPane metadata, ChatPane card,
  markdown export, Slack notifier, runner event log) so reports already
  persisted on the wrong scale render correctly too.

Found during orchestrator browser-QA; unrelated to the orchestrator, so shipped
off main. tsc clean, full suite 2409 (+ confidence unit tests).