fix:The failed task was wrongly recorded as a "successful experience".#1807
Conversation
Memtensor-AI
left a comment
There was a problem hiding this comment.
Needs minor changes — overall logic is solid
The core fix is correct and well-motivated: using objectiveOutcome() to authoritatively determine pass/fail prevents partial verifier results (e.g., 3/4 tests passing with reward 0) from being misclassified as positive exemplars. The test case directly reproduces the bug. Good work.
Specific findings
-
objectiveOutcome—rTask > 0may be too lenientif (rTask > 0) return "pass";
This contradicts the strict "only full credit counts" philosophy stated in the comments. If
rTaskis on a 0..1 scale, a value like 0.3 would be classified as "pass". Should this also requirerTask >= FULL_PASS_REWARD - 1e-9? Or isrTaskguaranteed to be on a {-1, 0, +1} scale? If so, document that assumption. -
verifierContainerparses arbitrary JSON strings fromrawif (typeof obj === "string") { try { obj = JSON.parse(obj); } catch { return null; } }
This is fine defensively, but note that a very large string payload could be expensive to parse here. Low risk in practice, just worth a comment if
rawcan come from user-facing feedback. -
Dead code path in
buildDraftpriority chain
After the refactor, theelse if (verifier)branch (line ~191 in the new code) now always assignspolarity = "neutral". Ifoutcomewas "pass" or "fail", we'd never reach this branch. So it only fires when outcome is "unknown" AND lexical signals are inconclusive. That's correct but subtle — a brief comment there would help future readers. -
Removed
verifierparam fromisPositiveSignal/isNegativeSignal
Callers no longer pass it, which is fine. But confirm no other call sites exist outside this file (the test file doesn't call them directly, so likely safe). -
Test coverage suggestion: Consider adding a case where
reward = 1andpassed < totalto verify which field wins. Currentlypassed/totalis checked first, so{reward: 1, passed: 3, total: 4}would be "fail" — is that intentional? It seems like a realistic conflict from a buggy verifier payload.
Nits
- The comment
// covers {-1,+1} and 0..1has an unclosed paren on the constant declaration line. timeout|time limit exceededadded toisNegativeSignalis good, buttime limit exceededcontains spaces so\bword boundaries work correctly here — just confirming that's intentional (it is, since the regex tests the fulllowerstring).
Overall a well-scoped fix with a clear test. The rTask > 0 semantics (point 1) is the only thing I'd want clarified before merging.
Automated PR from mem-agent-0520-niu to mem-agent-0520.