chore(session): raise arbitration threshold 0.65 → 0.8 #1577
Merged
hijzy merged 1 commit on Apr 29, 2026
Tighten the trust gate for the LLM's `new_task` verdict in `relation-classifier.ts`.

Real-world traces show the primary classifier often returns `new_task` at 0.65–0.75 confidence for messages that are actually sub-tasks of the same project (e.g. "now configure the DB" right after "set up nginx"). The old 0.65 cut-off let those slip through without a second-pass review, splitting one logical task into two.

Pulling the threshold up to 0.8 routes more borderline `new_task` predictions through the bias-toward-`follow_up` arbitration prompt. The trade-off is one extra LLM call on borderline turns; the upside is fewer false topic boundaries (each false split currently costs us a premature episode close plus a stale `lastEpisodeBySession` entry, which the L2/L3/skill chain has to recover from later).

No schema/API change. Existing arbitration unit test still passes (it was already using a 0.5 confidence input, well below either threshold).
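The gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in `relation-classifier.ts`: the `Relation` and `Verdict` types, the `needsArbitration` helper, and the strict `<` comparison are assumptions made for the example; only `ARBITRATION_THRESHOLD`, `new_task`, and `follow_up` come from this PR.

```typescript
// Hypothetical sketch of the trust gate. Names other than
// ARBITRATION_THRESHOLD, "new_task" and "follow_up" are illustrative.
type Relation = "new_task" | "follow_up";

interface Verdict {
  relation: Relation;
  confidence: number; // primary classifier's confidence, 0..1
}

const ARBITRATION_THRESHOLD = 0.8; // was 0.65 before this change

// A `new_task` verdict below the threshold is not trusted as-is and is
// sent to the second-pass arbitration prompt (biased toward follow_up).
function needsArbitration(
  v: Verdict,
  threshold: number = ARBITRATION_THRESHOLD,
): boolean {
  return v.relation === "new_task" && v.confidence < threshold;
}

// A 0.7-confidence new_task skipped arbitration at 0.65 but is reviewed at 0.8.
const borderline: Verdict = { relation: "new_task", confidence: 0.7 };
console.log(needsArbitration(borderline, 0.65)); // false
console.log(needsArbitration(borderline)); // true
```

Under this reading, only `new_task` verdicts with confidence in the range 0.65 up to (but not including) 0.8 change behaviour; everything else is routed as before.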
Summary
- Tighten the trust gate for the LLM's `new_task` verdict in `apps/memos-local-plugin/core/session/relation-classifier.ts`: bump `ARBITRATION_THRESHOLD` from `0.65` to `0.8`.
- Real-world traces show the primary classifier often returns `new_task` at 0.65–0.75 confidence for messages that are actually sub-tasks of the same project (e.g. "so how do I configure the database" right after "configure Nginx"). The old 0.65 cut-off let those slip through without a second-pass review, falsely splitting one logical task into two.
- Routing more borderline `new_task` predictions through the bias-toward-`follow_up` arbitration prompt costs one extra LLM call on those turns, but reduces false topic splits (each false split prematurely closes the episode and leaves a stale `lastEpisodeBySession` entry that the L2/L3/skill chain has to recover from later).

Behaviour change
Only one knob changes; nothing about the public API, schema, or storage layout.
`core/session/relation-classifier.ts::ARBITRATION_THRESHOLD`: `0.65` → `0.8`

Decision flow (unchanged structure, only the cut-off moves):
- `new_task` with `confidence < ARBITRATION_THRESHOLD` → run the second-pass arbitration prompt (biased toward `follow_up`).

Test plan
- `npx vitest run tests/unit/session/relation-classifier.test.ts`: 16 tests passed (the existing arbitration test uses a 0.5 confidence input, which is below both the old and new thresholds, so it still exercises the same path).
- The `relation.classified` log now shows `signals: ["llm", "arbitration_override"]` for those turns and the episode stays open.
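To see why the existing 0.5-confidence test is unaffected, one can compare routing under both cut-offs. The sketch below is illustrative only and is not part of the test suite: `routedToArbitration` and `changesRouting` are hypothetical helpers, and the strict `confidence < threshold` comparison is assumed from the decision flow.

```typescript
// Hypothetical helpers for reasoning about test coverage; not project code.
const OLD_THRESHOLD = 0.65;
const NEW_THRESHOLD = 0.8;

// True when a new_task verdict at this confidence goes to arbitration.
function routedToArbitration(confidence: number, threshold: number): boolean {
  return confidence < threshold;
}

// True when the threshold bump changes the path taken for this confidence.
function changesRouting(confidence: number): boolean {
  return (
    routedToArbitration(confidence, OLD_THRESHOLD) !==
    routedToArbitration(confidence, NEW_THRESHOLD)
  );
}

for (const c of [0.5, 0.7, 0.9]) {
  console.log(`${c}: routing changed = ${changesRouting(c)}`);
}
// 0.5: routing changed = false
// 0.7: routing changed = true
// 0.9: routing changed = false
```

A 0.5 input is arbitrated under both values, so it cannot detect the bump; an input between the two thresholds (such as 0.7) is what would distinguish old from new behaviour.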