Type: help wanted
Toolhound benchmarks existing zero-training tool-calling fixes; PA-Tool is integrated. Add another (e.g. TSCG, or a constrained-decoding wrapper once the seam lands).
What to do
- Implement the Method.prepare(repo, tools, *, gen) -> MethodResult(tools, canonicalize) seam in src/toolprobe/methods/.
- A method may only transform how the tool catalog is PRESENTED; the scorer stays frozen (adapt-then-canonicalize).
- No mlx import in methods/ (the hygiene test enforces this).
- Report gains on held-out cases/test.jsonl with non-overlapping CIs; do not select on test.
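
As a starting point, here is a minimal sketch of what a method behind this seam might look like. Only the names Method, prepare, MethodResult, tools, and canonicalize come from this issue; everything else is an assumption for illustration: the dict shapes for tools and calls, treating repo and gen as opaque values, and the toy PrefixRename method itself. It is not the project's actual interface. The sketch changes only how tool names are presented and returns a canonicalize function that maps calls back, so the frozen scorer still sees canonical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Protocol

# Assumed shapes, not taken from the toolprobe codebase:
Tool = dict[str, Any]  # e.g. {"name": "search", "description": "..."}
Call = dict[str, Any]  # e.g. {"name": "search", "arguments": {...}}


@dataclass(frozen=True)
class MethodResult:
    tools: list[Tool]                     # catalog as presented to the model
    canonicalize: Callable[[Call], Call]  # maps a model call back to canonical form


class Method(Protocol):
    def prepare(self, repo: Any, tools: list[Tool], *, gen: Any) -> MethodResult: ...


class PrefixRename:
    """Hypothetical toy method: present every tool under a prefixed alias."""

    def __init__(self, prefix: str = "fn_") -> None:
        self.prefix = prefix

    def prepare(self, repo: Any, tools: list[Tool], *, gen: Any) -> MethodResult:
        # repo and gen are unused here; a real method might consult them.
        alias_to_name = {self.prefix + t["name"]: t["name"] for t in tools}
        # Copy each tool with the alias; the input catalog is left untouched.
        presented = [{**t, "name": self.prefix + t["name"]} for t in tools]

        def canonicalize(call: Call) -> Call:
            # Restore the original name; unknown names pass through unchanged.
            return {**call, "name": alias_to_name.get(call["name"], call["name"])}

        return MethodResult(tools=presented, canonicalize=canonicalize)
```

A unit test for a method like this would check both halves: that the presented catalog differs as intended, and that canonicalize inverts the transform for every presented tool.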
Acceptance
toolprobe run --method <name> emits a comparison row; unit tests cover the transform + canonicalization.