Problem
Small Harness has a useful agent eval path (small-harness --eval ..., /eval agent ..., and /play ...), but eval fixtures are currently registered in Rust source via the built-in fixture list. That makes custom or domain-specific eval packs awkward: adding one requires changing and rebuilding Small Harness itself instead of dropping a fixture into a project or sharing a small fixture directory.
This limits use cases like:
- comparing local models against the same project-specific tasks
- shipping reusable eval packs alongside a repo
- testing domain-specific agent behavior without forking Small Harness
- using /play-style sandboxes for non-built-in tasks
Proposal
Add support for external AgentEvalFixture definitions loaded from a file or project directory while keeping the current built-ins unchanged.
A minimal first version could support:
small-harness --eval ./evals/fixtures/fix-readme-badge.json --model qwen2.5:7b
and optionally later:
/eval agent ./evals/fixtures/fix-readme-badge.json
/eval agent project
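One way `--eval` could tell the two argument forms apart, without changing what existing invocations mean, is to check the built-in list first and only then treat the argument as a file path. This is a hypothetical sketch. `EvalTarget` and `classify_eval_arg` are illustrative names, not existing Small Harness APIs. The `.json`/path-separator heuristic is an assumption.

```rust
use std::path::{Path, PathBuf};

/// What an `--eval` argument refers to (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum EvalTarget {
    /// An ID from the built-in fixture list.
    BuiltIn(String),
    /// A fixture definition file on disk.
    ExternalFile(PathBuf),
}

/// Classify an `--eval` argument. Built-in IDs are matched first, so existing
/// invocations keep their meaning even if a file with the same name exists.
fn classify_eval_arg(arg: &str, builtin_ids: &[&str]) -> Result<EvalTarget, String> {
    if builtin_ids.contains(&arg) {
        return Ok(EvalTarget::BuiltIn(arg.to_string()));
    }
    let path = Path::new(arg);
    // Heuristic (an assumption): treat anything ending in `.json` or containing a
    // path separator as a fixture file rather than a mistyped built-in ID.
    let looks_like_path = path.extension().and_then(|ext| ext.to_str()) == Some("json")
        || arg.contains('/')
        || arg.contains('\\');
    if !looks_like_path {
        return Err(format!("unknown built-in fixture id: {arg}"));
    }
    if path.is_file() {
        Ok(EvalTarget::ExternalFile(path.to_path_buf()))
    } else {
        Err(format!("fixture file not found: {arg}"))
    }
}
```

Checking built-in IDs first keeps `--eval <builtin-id>` working unchanged. Separating "unknown built-in id" from "fixture file not found" gives the distinct error messages the design constraints ask for.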
The suggested fixture shape could mirror the existing serialized Rust structs:
{
  "id": "fix-readme-badge",
  "prompt": "Update the README version badge to match Cargo.toml.",
  "workspace": "workspaces/fix-readme-badge",
  "checks": [
    { "type": "fileContains", "path": "README.md", "needle": "version-1.0.4" },
    { "type": "testsPass" }
  ]
}
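Each check in this shape carries a `type` tag, so an unknown check can be rejected with a clear message before anything runs. The following is a minimal stdlib-only sketch of that mapping onto the five existing check types. `CheckKind`, `UnknownCheck`, and `from_tag` are hypothetical names. If the existing structs use serde, a real loader would more likely get the same effect from an internally tagged enum than from hand-written string matching.

```rust
use std::fmt;

/// The existing check types, keyed by their JSON `type` tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CheckKind {
    TestsPass,
    FileContains,
    GitClean,
    ToolUsed,
    AssistantMentions,
}

/// Error for a `type` tag that is not one of the supported check kinds.
#[derive(Debug, PartialEq)]
struct UnknownCheck {
    tag: String,
}

impl fmt::Display for UnknownCheck {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // List the accepted tags so the fixture author can fix the typo directly.
        write!(
            f,
            "unknown check type {:?}; expected one of: {}",
            self.tag,
            CheckKind::TAGS.join(", ")
        )
    }
}

impl CheckKind {
    /// Tags accepted in fixture files.
    const TAGS: [&'static str; 5] =
        ["testsPass", "fileContains", "gitClean", "toolUsed", "assistantMentions"];

    /// Map a JSON `type` tag to a known check kind, rejecting anything else.
    fn from_tag(tag: &str) -> Result<Self, UnknownCheck> {
        match tag {
            "testsPass" => Ok(CheckKind::TestsPass),
            "fileContains" => Ok(CheckKind::FileContains),
            "gitClean" => Ok(CheckKind::GitClean),
            "toolUsed" => Ok(CheckKind::ToolUsed),
            "assistantMentions" => Ok(CheckKind::AssistantMentions),
            other => Err(UnknownCheck { tag: other.to_string() }),
        }
    }
}
```

Because only these five tags are accepted, a fixture cannot smuggle in a command-running check, which keeps external fixtures data-only.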
Design constraints
- Preserve all existing built-in fixtures and fixture IDs.
- Keep external fixtures data-only at first; no arbitrary commands in fixture files.
- Resolve fixture workspaces relative to the fixture file or fixture-pack root.
- Reject workspace paths that escape the fixture pack/root.
- Reuse existing check types initially (testsPass, fileContains, gitClean, toolUsed, assistantMentions).
- Produce clear errors for unknown fixture paths, malformed JSON, unknown checks, or missing workspaces.
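The workspace-path constraints can be enforced lexically before any copying happens. The approach is to resolve `workspace` against a root (the fixture file's directory or the fixture-pack root), refuse absolute paths, and refuse any `..` that climbs above the root. The sketch below does this with a hypothetical `resolve_workspace` helper. It does not follow symlinks, so a symlink inside the pack that points outside it would need a separate check, for example canonicalizing both paths once they are known to exist.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve a fixture's `workspace` field relative to `root` (the fixture
/// file's directory or the fixture-pack root). Absolute paths and `..`
/// components that would climb above `root` are rejected. The check is
/// purely lexical: it does not touch the filesystem or follow symlinks.
fn resolve_workspace(root: &Path, workspace: &str) -> Result<PathBuf, String> {
    let mut relative = PathBuf::new();
    for component in Path::new(workspace).components() {
        match component {
            Component::Normal(part) => relative.push(part),
            Component::CurDir => {}
            Component::ParentDir => {
                // `pop` returns false when nothing is left to pop, meaning
                // this `..` would step outside the root.
                if !relative.pop() {
                    return Err(format!("workspace {workspace:?} escapes the fixture root"));
                }
            }
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("workspace {workspace:?} must be a relative path"));
            }
        }
    }
    Ok(root.join(relative))
}
```

A missing workspace can then be reported separately by calling `is_dir()` on the resolved path. This keeps "escapes the root" and "does not exist" as distinct error messages.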
Acceptance criteria
- small-harness --eval <builtin-id> continues to work unchanged.
- small-harness --eval <path-to-fixture.json> runs an external fixture.
- External fixture workspace copy behavior matches built-ins.
- External fixture loading has unit tests for happy path, malformed fixture, missing workspace, and path traversal rejection.
- README/Quickstart document a small external fixture example.
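Built-in fixtures may copy their workspace into a fresh directory before each run. That is an assumption about the current behavior, not something this issue confirms. If so, external fixtures could share the same routine once the workspace path has been resolved. The sketch below is a minimal recursive copy using only `std::fs`. `copy_workspace` is a hypothetical name. Skipping symlinks is a choice made for this sketch and should be aligned with whatever the built-ins actually do.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copy the workspace tree at `src` into `dst`, creating
/// directories as needed. Regular files and directories are copied;
/// symlinks and other special files are skipped in this sketch.
fn copy_workspace(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        // `DirEntry::file_type` does not follow symlinks, so a symlink is
        // neither `is_dir()` nor `is_file()` here and falls through.
        let file_type = entry.file_type()?;
        let target = dst.join(entry.file_name());
        if file_type.is_dir() {
            copy_workspace(&entry.path(), &target)?;
        } else if file_type.is_file() {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}
```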
This would make Small Harness more useful as a general local-agent benchmark harness without forcing every eval scenario into the main repository.