
Fix triple-shot second pass caused by merge recommendations #662

Merged
Iron-Ham merged 1 commit into main from Iron-Ham/fix-tripleshot-merge-second-pass
Feb 15, 2026



@Iron-Ham (Owner)

Summary

  • Root cause: When the judge recommends the "merge" strategy, it populates suggested_changes in the evaluation JSON. LLMs frequently emit this field as a plain string instead of []string, so json.Unmarshal fails → VerifyWork returns false → the bridge calls gate.Fail() → with defaultMaxRetries=2 the task retries and spawns a duplicate judge instance
  • Fix: Added FlexibleStringSlice type (mirrors existing FlexibleString) that accepts both string and []string JSON values. Applied to all []string fields in LLM-parsed structs: Evaluation, AttemptEvaluationItem, AdversarialReviewFile
  • Also made Evaluation.Reasoning use FlexibleString for the same resilience
  • Logged SetMaxRetries errors instead of silently discarding them with _ =
  • Consolidated redundant Team("judge") lookup in startJudge

Test plan

  • Added TestParseEvaluationFile_FlexibleFields with 4 cases: string-as-suggested_changes, array-as-suggested_changes, reasoning-as-array, strengths/weaknesses-as-string
  • All existing tripleshot, teamwire, adversarial, TUI, and bridge tests pass with -race
  • go build ./... clean
  • go vet ./... clean

When the judge recommends "merge" strategy, it populates the
suggested_changes field. LLMs frequently write this as a plain string
instead of []string, causing json.Unmarshal to fail. The parse failure
made VerifyWork return false, the bridge called gate.Fail(), and with
defaultMaxRetries=2 the task retried — spawning a duplicate judge.

Add FlexibleStringSlice type (mirrors existing FlexibleString) to
tolerate string/array mismatches in all LLM-parsed sentinel file
structs: Evaluation, AttemptEvaluationItem, AdversarialReviewFile.
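One way such tolerant types could be written, as a sketch only: the real FlexibleStringSlice and FlexibleString are not shown on this page, so the decoding order, the empty-string handling, the newline join for array-valued reasoning, the Evaluation field set and the parseEvaluation helper are all assumptions. Each UnmarshalJSON tries the expected JSON shape first and falls back to the alternative, so only values that are neither a string nor an array of strings still fail.

```go
package main

import (
	"encoding/json"
	"strings"
)

// FlexibleStringSlice accepts a JSON array of strings or a single JSON string.
// A lone non-empty string becomes a one-element slice; "" and null yield nil.
type FlexibleStringSlice []string

func (f *FlexibleStringSlice) UnmarshalJSON(data []byte) error {
	// Expected form first: an array (null also succeeds here and yields nil).
	var arr []string
	if err := json.Unmarshal(data, &arr); err == nil {
		*f = arr
		return nil
	}
	// Fallback: a single string, as LLMs often emit.
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err // neither form matched: report the decode error
	}
	if s == "" {
		*f = nil
		return nil
	}
	*f = FlexibleStringSlice{s}
	return nil
}

// FlexibleString accepts a JSON string or an array of strings.
// Joining array items with newlines is an assumed rule for this sketch.
type FlexibleString string

func (f *FlexibleString) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		*f = FlexibleString(s)
		return nil
	}
	var parts []string
	if err := json.Unmarshal(data, &parts); err != nil {
		return err
	}
	*f = FlexibleString(strings.Join(parts, "\n"))
	return nil
}

// Evaluation is a hypothetical, trimmed-down shape of the judge's output file.
type Evaluation struct {
	Reasoning        FlexibleString      `json:"reasoning"`
	Strengths        FlexibleStringSlice `json:"strengths"`
	Weaknesses       FlexibleStringSlice `json:"weaknesses"`
	SuggestedChanges FlexibleStringSlice `json:"suggested_changes"`
}

// parseEvaluation decodes raw sentinel-file bytes into an Evaluation.
func parseEvaluation(data []byte) (Evaluation, error) {
	var e Evaluation
	err := json.Unmarshal(data, &e)
	return e, err
}
```

Trying the schema-conforming shape first keeps the common case on the first branch; the fallback only runs when the model deviates from the requested format.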

Also log SetMaxRetries errors instead of silently discarding, and
consolidate the redundant Team("judge") lookup in startJudge.
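A sketch of the logging change, assuming a gate whose SetMaxRetries returns an error. retryGate, its validation rule and applyMaxRetries are hypothetical; the point is only that a failed call is reported instead of being dropped with _ =.

```go
package main

import "errors"

// retryGate is a hypothetical stand-in for the task gate; only
// SetMaxRetries matters for this sketch.
type retryGate struct {
	maxRetries int
}

// SetMaxRetries rejects negative values (an assumed validation rule).
func (g *retryGate) SetMaxRetries(n int) error {
	if n < 0 {
		return errors.New("max retries must be non-negative")
	}
	g.maxRetries = n
	return nil
}

// applyMaxRetries replaces the `_ = g.SetMaxRetries(n)` pattern: the error
// stays non-fatal, but it is reported through logf instead of vanishing.
func applyMaxRetries(g *retryGate, n int, logf func(format string, args ...any)) {
	if err := g.SetMaxRetries(n); err != nil {
		logf("SetMaxRetries(%d) failed: %v", n, err)
	}
}
```

In production code the logger could simply be log.Printf, whose signature matches the logf parameter: applyMaxRetries(g, 0, log.Printf). Injecting it keeps the helper testable without capturing global log output.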
Iron-Ham force-pushed the Iron-Ham/fix-tripleshot-merge-second-pass branch from 216dc56 to 6426b1d on February 15, 2026 21:20
Iron-Ham merged commit 8979c19 into main on Feb 15, 2026
6 checks passed
Iron-Ham deleted the Iron-Ham/fix-tripleshot-merge-second-pass branch on February 15, 2026 21:22
