[task] Add CLARIFY protocol path eval #39
Open
Labels
area: tests (Automated tests and test infrastructure.)
kind: maintenance (Repository maintenance or infrastructure work.)
status: ready-for-work (Triaged and ready for a contributor to claim.)
Checks
Area
Tests
Goal
The only existing eval (`test_eval_harness_single_hop`) exercises the `DONE` path. There is no eval verifying that a real model correctly identifies a genuinely ambiguous query and emits `CLARIFY` rather than hallucinating forward with `REMEMBER` or `DONE`.

Run `anchor.run` with queries that are genuinely underspecified (no entity names, no clear intent) and assert:
- `result.stop_reason == "ask"`
- `result.kind == "ask"`
- `result.content` contains a question directed at the user
- the model does not emit `DONE` or `REMEMBER` for these inputs

Use 2-3 distinct underspecified queries to reduce flakiness risk from any single prompt.
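The assertions listed above can be sketched as a small helper, shown here against a stand-in object so it runs without a model. `FakeResult` and `check_clarify_result` are hypothetical names, not part of the codebase, and the `"?"` check is only a rough heuristic for "contains a question directed at the user".

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for whatever anchor.run returns."""
    stop_reason: str
    kind: str
    content: str


def check_clarify_result(result) -> list:
    """Return the failed expectations; an empty list means the ask path was taken."""
    failures = []
    if result.stop_reason != "ask":
        failures.append(f"stop_reason was {result.stop_reason!r}, expected 'ask'")
    if result.kind != "ask":
        failures.append(f"kind was {result.kind!r}, expected 'ask'")
    # Heuristic only: a clarifying question should contain a question mark.
    if "?" not in (result.content or ""):
        failures.append("content does not look like a question")
    return failures
```

Returning a list of failures rather than asserting immediately makes it easier to see every unmet expectation in a single flaky run.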
The model may currently fail this test. That would mean the prompt needs fine-tuning, which is outside the scope of this ticket.
Definition of done
- Add `tests/evals/test_eval_clarify.py`
- Use the `ai_fn`, `light_ai_fn`, and `embed_fn` fixtures from `conftest.py`
- Mark the tests with `@pytest.mark.eval`
- Run `uv run pytest -m eval tests/evals` locally with Ollama running