Why
tilth/validators.py currently hard-codes the mechanical floor as ruff + pytest — Python-specific tools running against a Python-specific workspace. The harness itself isn't language-coupled (Brain/Hands/Session is generic; tools are generic), but the floor is.
This shows up in two places:
- The v1 worker–evaluator dialogue sketch (proposals/v1-worker-evaluator-dialogue.md) leans on "the mechanical floor stays the anchor — the evaluator's prose-judgment sits on top of it, never replaces it." That principle is sound, but as written it implicitly bakes in Python. Non-Python workspaces have nowhere to plug their own static + test commands.
- The v2 custom mechanical checks direction (Tilth's analog of OpenAI's custom-lints-with-remediation pattern) wants a clean place to add new validators. Today there's no abstraction to extend: every new check requires editing validators.py and run_all() directly.
What pluggable should mean
A per-workspace validator config, discoverable from the workspace itself (likely a section in AGENTS.md, or a sibling like .tilth/validators.yaml), that names a list of (name, command, success_predicate) entries. validators.py becomes a runner over that list; the Python demo's config happens to be {ruff: "ruff check .", pytest: "pytest -q"}.
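To make the "runner over a list" idea concrete, here is a minimal sketch of what validators.py could reduce to. It is not the current implementation: the Validator and ValidatorResult classes, the exit_zero default predicate, and the signature of run_all() are all assumptions made for illustration. Only the (name, command, success_predicate) triple and the two demo commands come from the description above.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence


def exit_zero(proc: subprocess.CompletedProcess) -> bool:
    """Default success predicate (assumed): the command exited with status 0."""
    return proc.returncode == 0


@dataclass(frozen=True)
class Validator:
    # Field names mirror the (name, command, success_predicate) triple.
    name: str
    command: Sequence[str]
    success_predicate: Callable[[subprocess.CompletedProcess], bool] = exit_zero


@dataclass(frozen=True)
class ValidatorResult:
    name: str
    passed: bool
    output: str


def run_all(validators: Sequence[Validator], cwd: str = ".") -> list[ValidatorResult]:
    """Run every configured validator and collect results without stopping early."""
    results = []
    for v in validators:
        try:
            proc = subprocess.run(
                list(v.command), cwd=cwd, capture_output=True, text=True
            )
        except FileNotFoundError as exc:
            # A missing executable (or working directory) counts as a failed
            # check rather than crashing the harness.
            results.append(ValidatorResult(v.name, False, str(exc)))
            continue
        results.append(
            ValidatorResult(v.name, v.success_predicate(proc), proc.stdout + proc.stderr)
        )
    return results


# The Python demo's config, expressed as data (not executed here).
PYTHON_DEMO = [
    Validator("ruff", ["ruff", "check", "."]),
    Validator("pytest", ["pytest", "-q"]),
]
```

The predicate is a separate field because exit status alone is not always the right signal. For example, pytest exits with status 5 when it collects no tests, and a workspace might reasonably choose to treat that as a pass.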
Open design questions worth working through in this issue:
- Where does the config live: an AGENTS.md section, a sibling file, or a pyproject.toml-style equivalent?
- How does per-task test filtering (test_t<NNN>_*.py glob today) generalize across languages?
- Does the seeder need to know about the validator config to author tests that the floor will actually run?
- Does the prep-feature interview need a step that asks the user about their workspace's validators, or do we detect from the project shape?
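One possible shape for the per-task filtering question, offered as a starting point rather than a decision: let each validator command carry a placeholder that the runner fills in with the task number. The {task} placeholder, the render_command() helper, and both example templates are hypothetical. Zero-padding to three digits is an assumption read off the <NNN> in today's glob.

```python
from typing import Sequence


def render_command(template: Sequence[str], task_id: int) -> list[str]:
    """Substitute the hypothetical {task} placeholder with a zero-padded task number."""
    task = f"{task_id:03d}"
    return [arg.replace("{task}", task) for arg in template]


# Illustrative templates only; neither is taken from the Tilth codebase.
PYTEST_TASK = ["pytest", "-q", "-k", "test_t{task}_"]
GO_TASK = ["go", "test", "-run", "TestT{task}", "./..."]
```

With this approach the harness only knows about a task number. How that number maps onto test selection is left to each workspace's config, and the seeder would still need to follow the same naming convention for the filter to match anything.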
Scope
This issue covers the design + implementation of the pluggable interface. It's a v1 prerequisite if Tilth is to support non-Python workspaces — and even for Python-only use, it cleans up a load-bearing coupling.
Related
- proposals/v1-worker-evaluator-dialogue.md: the v1 sketch that depends on this
- tilth/validators.py: current hard-coded surface