Type: good first issue · no MLX needed to validate
Abstention traps are utterances that look like tool requests but should NOT trigger a call (e.g. "Remind me why I started this diet" contains "remind" but is not a reminder request). They are how we measure false-triggering.
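To make "false-triggering" concrete, here is a minimal sketch that assumes a deliberately naive keyword router. It is not the project's model. The router, its `create_reminder` tool name, and the `false_trigger_rate` helper are all hypothetical. It fires on the word "remind", so the diet example above counts as a false trigger:

```python
from typing import Callable, List, Optional


def naive_router(utterance: str) -> Optional[str]:
    """Hypothetical keyword router: return a tool name to call, or None to abstain."""
    if "remind" in utterance.lower():
        return "create_reminder"  # hypothetical tool name, for illustration only
    return None


def false_trigger_rate(
    traps: List[str], router: Callable[[str], Optional[str]]
) -> float:
    """Fraction of abstention traps on which the router called a tool anyway."""
    if not traps:
        return 0.0
    triggered = sum(1 for u in traps if router(u) is not None)
    return triggered / len(traps)


traps = [
    "Remind me why I started this diet",         # keyword present, but not a request
    "What is the weather like on Mars in general?",  # no keyword, router abstains
]
print(false_trigger_rate(traps, naive_router))  # 0.5
```

A good trap set drives this rate up for routers that match on surface keywords, which is exactly the failure these cases are meant to expose.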
What to do
- Add cases to `cases/handwritten/dev.jsonl` and `cases/handwritten/test.jsonl` (keep dev/test slot-disjoint).
- Each case expects abstention (no tool call). Follow the format of the existing handwritten cases.
- Regenerate with `python scripts/gen_cases.py`, then run `python scripts/validate_cases.py` and `pytest tests/test_cases_valid.py`.
Acceptance
- New traps pass validation, category distribution stays within the `caselint` targets, and dev/test share no slot values.
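Before running the official validators, you may want a quick local check that dev and test stay slot-disjoint. The sketch below assumes a hypothetical case schema in which each JSONL line has a `"slots"` object mapping slot names to values. Check the existing handwritten cases for the real field names. The helper names are also hypothetical, and comparing values case-insensitively is a design choice of this sketch:

```python
import json
from typing import Iterable, Set


def slot_values(lines: Iterable[str]) -> Set[str]:
    """Collect every slot value from JSONL lines (assumed "slots" field, hypothetical)."""
    values: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        case = json.loads(line)
        for value in case.get("slots", {}).values():
            values.add(str(value).lower())  # case-insensitive comparison
    return values


def shared_slot_values(dev_lines: Iterable[str], test_lines: Iterable[str]) -> Set[str]:
    """Slot values that appear in both splits; should be empty."""
    return slot_values(dev_lines) & slot_values(test_lines)


dev = ['{"utterance": "Remind me why I started this diet", "slots": {"topic": "diet"}}', ""]
test = ['{"utterance": "Remind me what a sourdough starter is", "slots": {"topic": "sourdough"}}']
print(shared_slot_values(dev, test))  # set()
```

File objects iterate line by line, so the same helpers work on the real splits, e.g. `with open("cases/handwritten/dev.jsonl") as f: slot_values(f)`.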