feat: add local model evaluation harness #71
Review: solid eval harness, one safety concern
Must fix
Auto-approving
const safetyGate = new SafetyGate({
  auto_approve: registry.list().map(tool => tool.name),
})
This auto-approves every registered tool including
Suggestions (non-blocking)
What looks good
Addressed in
Changes made:
Verification:
Follow-up looks good. All three items addressed cleanly:
One minor nit: the comment on the safety gate says "read-only tools" but the list includes
Tests pass (7/7), build clean. LGTM, approve.
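For readers following the thread, here is a minimal sketch of one way the must-fix item and the "read-only tools" nit could both be handled: auto-approve from an explicit allowlist, and fail loudly if an allowlisted tool is not read-only. Everything in it is an assumption for illustration. The `Tool` shape, the `readOnly` flag, the tool names, and `resolveAutoApprove` do not come from this PR, and the real `SafetyGate` and registry APIs may look different.

```typescript
// Hypothetical shapes: the PR does not show the real registry or SafetyGate types.
interface Tool {
  name: string;
  readOnly: boolean; // assumed flag; the real registry may not expose this
}

// Assumed allowlist; these tool names are illustrative only.
const AUTO_APPROVE_ALLOWLIST: readonly string[] = ["read_file", "list_dir"];

// Return the allowlisted names that are registered and flagged read-only.
// Throw if an allowlisted tool is registered but not read-only, so the
// "read-only tools" comment and the list cannot silently drift apart.
function resolveAutoApprove(tools: Tool[], allowlist: readonly string[]): string[] {
  const byName = new Map(tools.map((tool): [string, Tool] => [tool.name, tool]));
  const approved: string[] = [];
  for (const name of allowlist) {
    const tool = byName.get(name);
    if (tool === undefined) continue; // not registered: nothing to approve
    if (!tool.readOnly) {
      throw new Error(`"${name}" is allowlisted for auto-approval but is not read-only`);
    }
    approved.push(name);
  }
  return approved;
}

// Stand-in for registry.list(); the entries are made up for the example.
const registered: Tool[] = [
  { name: "read_file", readOnly: true },
  { name: "list_dir", readOnly: true },
  { name: "write_file", readOnly: false },
];

const autoApprove = resolveAutoApprove(registered, AUTO_APPROVE_ALLOWLIST);
console.log(autoApprove);
```

The resulting array would be passed as `auto_approve` in place of mapping over every registered tool. Throwing on a mismatch is one design choice; logging and skipping the tool would be a softer alternative.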
Summary
locode eval-local-models for repeatable tool-calling evaluation across local Ollama models
llama3.1:8b
llama3.1:8b as the default and position gemma4:9b as the recommended upgrade to evaluate
Testing
npm test -- --run src/cli/eval-local-models.test.ts src/config/loader.test.ts src/cli/setup.test.ts
npm run build
Notes
locode eval-local-models locally after pulling the desired Ollama models