fix: depth-evaluator no longer misclassifies sincere reflection as EVASION #23
Merged
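The first sincere answer from a real pilot participant (doubt about the meaning of their work) was classified as EVASION, and the question was stopped immediately.

Four fixes converged on by Gemini + Codex dual review:
- Narrow the EVASION definition: emotions about the topic such as doubt/confusion/fatigue/sadness do not count as evasion; only an explicit refusal to answer or a change of subject does
- Add 3 few-shot examples to the prompt
- Low-confidence EVASION (confidence < 0.85) is automatically downgraded to THIN + follow-up
- The first EVASION now takes a gentle bridge (acknowledge the difficulty, give space); only consecutive EVASIONs pause

Constraint: pilot hotfix; do not expand the ontology (no new TurnKind)
Rejected: changing only the prompt definition | Gemini+Codex agreed it is fragile; needs few-shot + runtime guard
Directive: misclassification costs are asymmetric: interrupting a sincere answer >> asking one more time; hence the 0.85 threshold is conservative and the first EVASION does not pause
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>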
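The low-confidence downgrade described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in depth_evaluator.py: the `Verdict` dataclass, the `needs_follow_up` field, the shape of the `TurnKind` enum, and the function name are all assumptions made for the example. Only the kind names and the 0.85 threshold come from the PR.

```python
from dataclasses import dataclass, replace
from enum import Enum


class TurnKind(Enum):
    # Only the kinds named in this PR; the real enum may differ.
    SUFFICIENT = "sufficient"
    THIN = "thin"
    META = "meta"
    EVASION = "evasion"


@dataclass(frozen=True)
class Verdict:
    # Hypothetical container for one classifier result.
    kind: TurnKind
    confidence: float            # classifier confidence in [0, 1]
    needs_follow_up: bool = False


EVASION_MIN_CONFIDENCE = 0.85    # threshold stated in the PR


def downgrade_low_confidence_evasion(verdict: Verdict) -> Verdict:
    """Turn an EVASION below the threshold into THIN with a follow-up."""
    if (verdict.kind is TurnKind.EVASION
            and verdict.confidence < EVASION_MIN_CONFIDENCE):
        return replace(verdict, kind=TurnKind.THIN, needs_follow_up=True)
    # High-confidence EVASION and every other kind pass through unchanged.
    return verdict
```

Because the comparison is strict, a verdict at exactly 0.85 stays EVASION in this sketch; whether the real guard treats the boundary the same way is not stated in the PR.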
This was referenced May 18, 2026
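Summary
Pilot hotfix. The first sincere answer from a real pilot participant ("I'm a bit confused; sometimes I wonder whether my work has any value?", i.e. doubt about the meaning of their work) was misclassified by depth-evaluator as EVASION, and the bot stopped the question immediately. This is exactly the kind of answer a personality interview most needs to draw out, yet it was treated as avoidance.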
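Root cause
Two problems compounded:
An EVASION verdict went straight to _pause_current_question() with zero tolerance (META still has a buffer of 2 bridges). A classifier will inevitably misclassify sometimes; the runtime should not let a single misclassification stop the question.
Fix (4 items converged on by Gemini + Codex dual review, no ontology expansion)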
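- depth_evaluator.py: narrow the EVASION definition. Emotions about the topic such as doubt/confusion/fatigue/sadness do not count as evasion; only an explicit refusal to answer or a change of subject does.
- depth_evaluator.py: add 3 few-shot examples to the prompt (including the pilot sentence → SUFFICIENT).
- depth_evaluator.py: low-confidence EVASION (confidence < 0.85) is automatically downgraded to THIN + follow-up.
- bot.py: the first EVASION now takes a gentle bridge (_gentle_evasion_bridge: acknowledge the difficulty, give space); only consecutive EVASIONs pause.

Misclassification costs are asymmetric: interrupting a sincere answer >> asking one more time. Hence the conservative 0.85 threshold, and no pause on the first EVASION.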
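The bridge-then-pause behaviour on the bot.py side could look something like the sketch below. It is an illustration under stated assumptions, not the actual `_handle_non_answer` implementation: `QuestionState`, `handle_evasion`, and `handle_answer` are hypothetical names, the string return values stand in for calls to `_gentle_evasion_bridge` and `_pause_current_question()` (whose signatures are not shown in this PR), and resetting the streak on any other turn is one possible reading of "consecutive".

```python
from dataclasses import dataclass


@dataclass
class QuestionState:
    # Hypothetical per-question state; the real bot may track this differently.
    consecutive_evasions: int = 0
    paused: bool = False


def handle_evasion(state: QuestionState) -> str:
    """Decide what to do with a high-confidence EVASION turn.

    Returns "bridge" for the first EVASION in a streak and "pause" for
    any EVASION that directly follows another one.
    """
    state.consecutive_evasions += 1
    if state.consecutive_evasions == 1:
        # First EVASION: acknowledge the difficulty and give space.
        return "bridge"
    # Consecutive EVASION: stop the current question.
    state.paused = True
    return "pause"


def handle_answer(state: QuestionState) -> None:
    """Assumption: any non-EVASION turn breaks the streak."""
    state.consecutive_evasions = 0
```

With this shape, a single misclassified turn costs one extra gentle prompt rather than the whole question, which matches the asymmetric-cost argument above.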
Verification
New tests: low-confidence EVASION → THIN, high-confidence EVASION preserved; _handle_non_answer bridges on the first EVASION and pauses on the second.
Review chain
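The bug was routed to Gemini (multi-option analysis) + Codex (engineering robustness) → the two converged strongly → Codex implemented → Claude verified independently (all 268 green + line-by-line diff review).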
Note
Pilot hotfix: after merging, this must be deployed to the VPS (vm.2ch.tw, git pull + restart) so the real-participant pilot can continue.
🤖 Generated with Claude Code