fix: judge prompt should not penalize extra correct information#388

Merged
BYK merged 1 commit into main from fix-judge on May 19, 2026
BYK (Owner) commented May 19, 2026

Summary

The LLM-as-judge was penalizing Lore answers that included additional correct context beyond the reference answer, treating extra information as 'unverified specifics'.

Example

Route handler error handling question:

  • Reference: 'try/catch, 404/400/500, console.error'
  • Lore answer: all of the above PLUS Hono framework, inline try/catch, test writing rules
  • Judge scored it 3.0, with the reasoning: 'adds unverified specifics not in the reference'

The extra information was correct (from other preferences in the same scenario) but the judge treated it as potentially fabricated.

Fix

Added guidance to the judge system prompt:

  • Reference answer is a MINIMUM, not an exhaustive list
  • Extra correct details should not be penalized
  • Only contradictions or fabrications should lower scores
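
As a rough illustration of the three points above, the guidance could be appended to a judge system prompt along these lines. This is a minimal sketch, not the actual contents of `judge.ts`: the constant names, the function names, the base prompt wording, and the 1-to-5 scale description are all assumptions made for the example.

```typescript
// Hypothetical sketch: none of these names or strings are taken from judge.ts.

// Assumed baseline instructions that a judge prompt might already carry.
const BASE_JUDGE_PROMPT: string = [
  "You are grading an answer against a reference answer.",
  "Return a score from 1 (wrong) to 5 (fully correct) and a short reasoning.",
].join("\n");

// Guidance mirroring the three points listed in this PR.
const EXTRA_INFO_GUIDANCE: string = [
  "The reference answer is a MINIMUM, not an exhaustive list.",
  "Do not penalize extra details that are correct.",
  "Lower the score only for contradictions or fabrications.",
].join("\n");

// Combine the baseline prompt with the extra-information guidance.
function buildJudgeSystemPrompt(): string {
  return `${BASE_JUDGE_PROMPT}\n\n${EXTRA_INFO_GUIDANCE}`;
}

// Format the per-question message the judge would see (illustrative layout).
function buildJudgeUserMessage(
  question: string,
  reference: string,
  answer: string,
): string {
  return [
    `Question: ${question}`,
    `Reference answer: ${reference}`,
    `Candidate answer: ${answer}`,
  ].join("\n");
}
```

Keeping the guidance in its own constant makes it easy to toggle when comparing judge behavior with and without the change, which is one way a before/after comparison like the one under Results could be run.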

Results

PR-2 at 400K tokens (Lore only):

  • Average: 4.68 → 4.80
  • Route handler pattern: 3.0 → 4.0
  • Tests alongside: 4.6 → 5.0

Files Changed

  • packages/core/eval/judge.ts — judge system prompt

The LLM judge was penalizing answers that included additional correct
context beyond the reference answer, treating them as 'unverified
specifics'. This caused Lore to score lower when it correctly recalled
MORE than the reference answer contained.

Added guidance: reference answer is a MINIMUM, extra correct details
should not be penalized, only contradictions or fabrications.

PR-2 at 400K: 4.68 → 4.80 (route handler: 3.0 → 4.0)
BYK self-assigned this on May 19, 2026
BYK merged commit 243aa11 into main on May 19, 2026
10 checks passed
BYK deleted the fix-judge branch on May 19, 2026 11:22
This was referenced May 21, 2026