fix(rca): normalize confidence score scale (no more "9000%") #233
Open
WZ wants to merge 1 commit into
The synthesis LLM is inconsistent about the `confidenceScore` scale: some completions emit a 0–1 fraction (0.9), others a 0–100 value (90 / 95.00). It was stored unnormalized, so every consumer that multiplies by 100 rendered nonsense for the 0–100 case: the investigation header showed "9000%", the report card "(95.00)", etc. The low-confidence styling (`< 0.5`) also silently never fired for 0–100 scores.

Fix:
- New `src/lib/confidence.ts`: `confidenceFraction` (coerce either scale to a clamped 0–1) + `confidencePercent` (0–100 int). Shared web/server.
- Normalize at the source (synthesis step) so new reports persist a 0–1 fraction. This also keeps the runner's low-confidence gate (which assumes 0–1) correct.
- Use the helpers defensively at every display site (RcaReport header + the low-confidence banner/styling, InvestigationPane metadata, ChatPane card, markdown export, Slack notifier, runner event log) so reports already persisted on the wrong scale render correctly too.

Found during orchestrator browser-QA; unrelated to the orchestrator, so shipped off main. tsc clean, full suite 2409 (+ confidence unit tests).
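To make the failure mode concrete, here is a minimal sketch of what a pre-fix consumer might have looked like. The function names `naivePercentLabel` and `naiveIsLowConfidence` are hypothetical and do not come from the PR; they only illustrate code that assumes a 0–1 fraction and is handed a 0–100 value.

```typescript
// Hypothetical pre-fix display code: assumes the stored score is always a 0–1 fraction.
function naivePercentLabel(score: number): string {
  return `${Math.round(score * 100)}%`;
}

// Hypothetical pre-fix low-confidence check, also assuming a 0–1 fraction.
function naiveIsLowConfidence(score: number): boolean {
  return score < 0.5;
}

console.log(naivePercentLabel(0.9));    // "90%"   (fraction scale: fine)
console.log(naivePercentLabel(90));     // "9000%" (0–100 scale: nonsense)
console.log(naiveIsLowConfidence(0.3)); // true    (fraction scale: styling fires)
console.log(naiveIsLowConfidence(30));  // false   (0–100 scale: styling never fires)
```

Both symptoms come from the same root cause, which is why normalizing once at the source and again at each display site covers new and already-persisted reports.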
Problem
The synthesis LLM is inconsistent about the `confidenceScore` scale: some completions emit a 0–1 fraction (0.9), others a 0–100 value (90 / 95.00). Stored unnormalized, every consumer that does × 100 rendered nonsense for the 0–100 case:
- the investigation header showed "9000%"
- the report card showed "(95.00)"
- the low-confidence styling (`< 0.5`) silently never fired for 0–100 scores

Found during browser-QA of the orchestrator; unrelated to that work, so it ships independently off `main`.

Fix
- `src/lib/confidence.ts`: `confidenceFraction` (coerce either scale to a clamped 0–1) + `confidencePercent` (0–100 int). Shared web/server.

Test
- `confidence.test.ts` (both scales → same %, clamps pathological 9000, handles null/NaN).
- `tsc` clean, full suite 2409 passing, web build clean.
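For orientation, the two helpers could look roughly like the sketch below. This is not the code in `src/lib/confidence.ts`; it is an illustration under stated assumptions: values above 1 are treated as being on the 0–100 scale, a raw value of exactly 1 is read as 100%, and null/NaN inputs return `null` (the PR only says they are "handled", not how).

```typescript
// Sketch only. Assumption: anything greater than 1 is a 0–100 value, otherwise a 0–1 fraction.
function confidenceFraction(raw: number | null | undefined): number | null {
  // Assumption: missing or non-finite input yields null rather than a default score.
  if (raw === null || raw === undefined || !Number.isFinite(raw)) return null;
  const fraction = raw > 1 ? raw / 100 : raw;
  // Clamp so pathological inputs (e.g. 9000, or negatives) stay within 0–1.
  return Math.min(1, Math.max(0, fraction));
}

// Integer percentage in 0–100, or null when the input is unusable.
function confidencePercent(raw: number | null | undefined): number | null {
  const fraction = confidenceFraction(raw);
  return fraction === null ? null : Math.round(fraction * 100);
}

console.log(confidencePercent(0.9));  // 90
console.log(confidencePercent(90));   // 90
console.log(confidencePercent(9000)); // 100
console.log(confidencePercent(NaN));  // null
```

One limitation of this heuristic is that a 0–100 score of 1 or below (for example, 0.5 meaning 0.5%) cannot be told apart from a fraction; the sketch reads it as a fraction.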