fix: The failed task was wrongly recorded as a "successful experience". #1807

Merged
whipser030 merged 1 commit into mem-agent-0520 from mem-agent-0520-niu
May 26, 2026

Conversation

@whipser030
Collaborator

Automated PR from mem-agent-0520-niu to mem-agent-0520.

@whipser030 whipser030 merged commit 9233370 into mem-agent-0520 May 26, 2026
@Memtensor-AI Memtensor-AI added the plugin Plugin/adapter/bridge layer (apps/ directory) | 插件/适配层 label May 26, 2026
Collaborator

@Memtensor-AI Memtensor-AI left a comment

Good PR fixing a real bug where partial passes were incorrectly recorded as successes.

Needs minor changes — overall logic is solid

The core fix is correct and well-motivated: using objectiveOutcome() to authoritatively determine pass/fail prevents partial verifier results (e.g., 3/4 tests passing with reward 0) from being misclassified as positive exemplars. The test case directly reproduces the bug. Good work.

Specific findings

  1. objectiveOutcome: rTask > 0 may be too lenient

    if (rTask > 0) return "pass";

    This contradicts the strict "only full credit counts" philosophy stated in the comments. If rTask is on a 0..1 scale, a value like 0.3 would be classified as "pass". Should this also require rTask >= FULL_PASS_REWARD - 1e-9? Or is rTask guaranteed to be on a {-1, 0, +1} scale? If so, document that assumption.

  2. verifierContainer parses arbitrary JSON strings from raw

    if (typeof obj === "string") {
      try { obj = JSON.parse(obj); } catch { return null; }
    }

    This is fine defensively, but note that a very large string payload could be expensive to parse here. Low risk in practice, just worth a comment if raw can come from user-facing feedback.

  3. Dead code path in buildDraft priority chain
    After the refactor, the else if (verifier) branch (line ~191 in the new code) now always assigns polarity = "neutral". If outcome was "pass" or "fail", we'd never reach this branch. So it only fires when outcome is "unknown" AND lexical signals are inconclusive. That's correct but subtle — a brief comment there would help future readers.

  4. Removed verifier param from isPositiveSignal/isNegativeSignal
    Callers no longer pass it, which is fine. But confirm no other call sites exist outside this file (the test file doesn't call them directly, so likely safe).

  5. Test coverage suggestion: Consider adding a case where reward = 1 and passed < total to verify which field wins. Currently passed/total is checked first, so {reward: 1, passed: 3, total: 4} would be "fail" — is that intentional? It seems like a realistic conflict from a buggy verifier payload.
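To make points 1, 2 and 5 concrete, here is a minimal sketch of a strict variant. It is not the PR's code: only objectiveOutcome, rTask, FULL_PASS_REWARD, reward, passed and total come from the review, while parseVerifier, VerifierPayload, MAX_RAW_CHARS, the value of FULL_PASS_REWARD, and the treatment of negative rTask as "fail" are assumptions made for illustration.

```typescript
// Hypothetical sketch, not the PR's code. Only objectiveOutcome, rTask,
// FULL_PASS_REWARD, reward, passed and total come from the review.
type Outcome = "pass" | "fail" | "unknown";

interface VerifierPayload {
  reward?: number;
  passed?: number;
  total?: number;
}

const FULL_PASS_REWARD = 1; // assumed: full credit is +1 on both scales
const EPS = 1e-9; // tolerance from the review's suggested comparison
const MAX_RAW_CHARS = 64 * 1024; // assumed cap to avoid parsing huge strings

// Accepts an object or a JSON string; returns null for anything unusable.
function parseVerifier(raw: unknown): VerifierPayload | null {
  let obj: unknown = raw;
  if (typeof obj === "string") {
    if (obj.length > MAX_RAW_CHARS) return null;
    try {
      obj = JSON.parse(obj);
    } catch {
      return null;
    }
  }
  if (obj === null || typeof obj !== "object" || Array.isArray(obj)) return null;
  return obj as VerifierPayload;
}

function objectiveOutcome(verifier: VerifierPayload | null, rTask?: number): Outcome {
  if (verifier) {
    const { passed, total, reward } = verifier;
    // passed/total is checked first, so it wins over a conflicting reward.
    if (typeof passed === "number" && typeof total === "number" && total > 0) {
      return passed >= total ? "pass" : "fail";
    }
    if (typeof reward === "number") {
      return reward >= FULL_PASS_REWARD - EPS ? "pass" : "fail";
    }
  }
  if (typeof rTask === "number") {
    // Strict variant: partial credit such as 0.3 is not a pass.
    if (rTask >= FULL_PASS_REWARD - EPS) return "pass";
    if (rTask < 0) return "fail"; // assumed: negative means explicit failure
  }
  return "unknown";
}
```

Under these assumptions, objectiveOutcome(parseVerifier('{"reward":1,"passed":3,"total":4}')) returns "fail", which is the conflict case from point 5, and objectiveOutcome(null, 0.3) returns "unknown" rather than "pass". If rTask is in fact guaranteed to be on a {-1, 0, +1} scale, the simpler rTask > 0 check is equivalent and only needs a comment.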

Nits

  • The comment // covers {-1,+1} and 0..1 has an unclosed paren on the constant declaration line.
  • timeout|time limit exceeded added to isNegativeSignal is good. time limit exceeded contains spaces, but the \b word boundaries still work correctly here because they only anchor the start and end of the phrase. Just confirming that's intentional (it is, since the regex tests the full lowercased string).
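The word-boundary behaviour in the second nit can be checked with a small sketch. The full pattern in isNegativeSignal is not shown in this review, so the regex below contains only the two alternatives mentioned above, and looksLikeTimeout is a hypothetical helper name.

```typescript
// Hypothetical helper: only the alternation "timeout|time limit exceeded"
// comes from the review; the surrounding pattern is an assumption.
const TIMEOUT_PATTERN = /\b(?:timeout|time limit exceeded)\b/;

function looksLikeTimeout(text: string): boolean {
  // The review notes the regex is tested against the lowercased string.
  return TIMEOUT_PATTERN.test(text.toLowerCase());
}

// \b anchors only the ends of each alternative, so the inner spaces of
// "time limit exceeded" are matched literally.
const matchesPhrase = looksLikeTimeout("Run aborted: Time Limit Exceeded on case 3");
// "runtime" has no word boundary before "time", so this does not match.
const matchesRuntime = looksLikeTimeout("runtime limit exceeded");
```

Here matchesPhrase is true and matchesRuntime is false, which is the behaviour the nit describes.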

Overall a well-scoped fix with a clear test. The rTask > 0 semantics (point 1) is the only thing I'd want clarified before merging.

Labels

plugin Plugin/adapter/bridge layer (apps/ directory) | 插件/适配层

2 participants