chore(session): raise arbitration threshold 0.65 → 0.8 by hijzy · Pull Request #1577 · MemTensor/MemOS

hijzy · 2026-04-29T03:00:33Z

Summary

Tighten the trust gate for the LLM's `new_task` verdict in `apps/memos-local-plugin/core/session/relation-classifier.ts` by bumping `ARBITRATION_THRESHOLD` from `0.65` to `0.8`.
Real-world traces show the primary relation classifier often returns `new_task` at 0.65–0.75 confidence for messages that are actually sub-tasks of the same project (e.g. "so how do I configure the database?" right after "configure Nginx"). The old 0.65 cut-off let those slip through without a second-pass review, falsely splitting one logical task into two.
Routing more borderline `new_task` predictions through the follow_up-biased arbitration prompt costs one extra LLM call on those turns, but reduces false topic splits. Each false split prematurely closes the episode and leaves a stale `lastEpisodeBySession` entry that the L2/L3/skill chain has to recover from later.

Behaviour change

Only one knob changes; nothing about the public API, schema, or storage layout.

| Path | Before | After |
|---|---|---|
| `core/session/relation-classifier.ts::ARBITRATION_THRESHOLD` | `0.65` | `0.8` |

Decision flow (unchanged structure, only the cut-off moves):

- LLM returns `new_task` with confidence `< ARBITRATION_THRESHOLD` → run the second-pass arbitration prompt (biased toward `follow_up`).
- Otherwise → take the LLM verdict at face value.
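The flow above can be sketched in TypeScript roughly as follows. This is a minimal illustration under stated assumptions, not the code in `relation-classifier.ts`. The `Verdict` shape, the injected `Arbitrator` callback standing in for the second-pass prompt, the name `resolveRelation`, and how an override reports its confidence are all hypothetical. Only the `new_task`/`follow_up` labels, the strict `<` comparison, the 0.8 value, and the `arbitration_override` signal come from this PR.

```typescript
// Hypothetical sketch of the arbitration gate; not the actual code in
// relation-classifier.ts. Relation, Verdict, Arbitrator and resolveRelation
// are illustrative names.

type Relation = "new_task" | "follow_up";

interface Verdict {
  relation: Relation;
  confidence: number; // 0..1, as reported by the primary LLM classifier
  signals: string[]; // e.g. ["llm"], mirroring the relation.classified log
}

// Second-pass arbitration prompt (biased toward follow_up), injected as a
// function so the sketch has no LLM dependency.
type Arbitrator = (message: string) => Promise<Relation>;

// Value after this PR; it was 0.65 before.
const ARBITRATION_THRESHOLD = 0.8;

async function resolveRelation(
  message: string,
  primary: Verdict,
  arbitrate: Arbitrator,
  threshold: number = ARBITRATION_THRESHOLD,
): Promise<Verdict> {
  // Only new_task verdicts strictly below the cut-off get a second look.
  if (primary.relation !== "new_task" || primary.confidence >= threshold) {
    return primary; // take the LLM verdict at face value
  }
  const second = await arbitrate(message);
  if (second === primary.relation) {
    return primary; // assumption: agreement leaves the verdict unchanged
  }
  // Assumption: the override keeps the primary confidence and records why.
  return {
    relation: second,
    confidence: primary.confidence,
    signals: primary.signals.concat("arbitration_override"),
  };
}

// Usage: a 0.7 new_task was accepted as-is under 0.65 but is now arbitrated.
const biasedToFollowUp: Arbitrator = async () => "follow_up";
resolveRelation(
  "so how do I configure the database?",
  { relation: "new_task", confidence: 0.7, signals: ["llm"] },
  biasedToFollowUp,
).then((v) => console.log(v.relation, v.signals.join(",")));
// prints: follow_up llm,arbitration_override
```

Injecting the arbitrator keeps the gate testable without a live model, and passing the threshold as a parameter lets the old and new cut-offs be compared on the same input.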

Test plan

- `npx vitest run tests/unit/session/relation-classifier.test.ts`: 16 tests passed. The existing arbitration test uses a 0.5 confidence input, which is below both the old and new thresholds, so it still exercises the same path.
- Smoke check on a real session where consecutive sub-task messages were previously being mis-split: confirm the `relation.classified` log now shows `signals: ["llm", "arbitration_override"]` for those turns and that the episode stays open.
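Because the existing 0.5 input sits below both cut-offs, it cannot tell the old threshold from the new one. A boundary check along the following lines would pin the change itself. It is a hypothetical sketch with plain `throw`-based checks rather than the repo's vitest suite, and `needsArbitration` is an assumed stand-in for the comparison inside `relation-classifier.ts`, following the documented rule (`new_task` and confidence strictly below the threshold).

```typescript
// Hypothetical predicate mirroring the documented rule: arbitrate only when the
// primary verdict is new_task AND confidence is strictly below the threshold.
function needsArbitration(relation: string, confidence: number, threshold: number): boolean {
  return relation === "new_task" && confidence < threshold;
}

const OLD_THRESHOLD = 0.65;
const NEW_THRESHOLD = 0.8;

// [relation, confidence, arbitrated before?, arbitrated after?]
const cases: Array<[string, number, boolean, boolean]> = [
  ["new_task", 0.5, true, true], // existing test input: same path either way
  ["new_task", 0.65, false, true], // strict <, so 0.65 used to pass straight through
  ["new_task", 0.75, false, true], // top of the band seen in real traces
  ["new_task", 0.8, false, false], // exactly at the new threshold: face value
  ["new_task", 0.9, false, false],
  ["follow_up", 0.5, false, false], // follow_up is never arbitrated
];

for (const [relation, confidence, before, after] of cases) {
  const gotBefore = needsArbitration(relation, confidence, OLD_THRESHOLD);
  const gotAfter = needsArbitration(relation, confidence, NEW_THRESHOLD);
  if (gotBefore !== before || gotAfter !== after) {
    throw new Error(`unexpected routing for ${relation}@${confidence}`);
  }
}
console.log("boundary cases ok");
```

Only the `[0.65, 0.8)` band changes routing, which is exactly the range the traces flagged.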

Tighten the trust gate for the LLM's `new_task` verdict in `relation-classifier.ts`. Real-world traces show the primary classifier often returns `new_task` at 0.65–0.75 confidence for messages that are actually sub-tasks of the same project (e.g. "now configure the DB" right after "set up nginx"). The old 0.65 cut-off let those slip through without a second-pass review, splitting one logical task into two. Raising the threshold to 0.8 routes more borderline `new_task` predictions through the follow_up-biased arbitration prompt. The trade-off is one extra LLM call on borderline turns; the upside is fewer false topic boundaries (each false split currently costs a premature episode close and a stale `lastEpisodeBySession` entry, which the L2/L3/skill chain has to recover from later). No schema/API change. The existing arbitration unit test still passes (it already used a 0.5 confidence input, well below either threshold).

hijzy merged commit ac276cb into MemTensor:mem-agent-0424 Apr 29, 2026

hijzy deleted the chore/raise-arbitration-threshold branch May 8, 2026 03:44

hijzy merged 1 commit into MemTensor:mem-agent-0424 from hijzy:chore/raise-arbitration-threshold
