
validators.py: make the mechanical floor language-pluggable (today: hard-coded ruff + pytest) #20

@samkeen

Why

tilth/validators.py currently hard-codes the mechanical floor as ruff + pytest: Python-specific tools running against a Python-specific workspace. The harness itself isn't language-coupled (Brain/Hands/Session is generic, and so are the tools), but the floor is.

This shows up in two places:

  1. The v1 worker–evaluator dialogue sketch (proposals/v1-worker-evaluator-dialogue.md) leans on "the mechanical floor stays the anchor — the evaluator's prose-judgment sits on top of it, never replaces it." That principle is sound, but as written it implicitly bakes in Python. Non-Python workspaces have nowhere to plug in their own static-analysis and test commands.
  2. The v2 custom mechanical checks direction (Tilth's analog of OpenAI's custom-lints-with-remediation pattern) wants a clean place to add new validators. Today there's no abstraction to extend: every new check means editing validators.py and run_all() directly.

What pluggable should mean

A per-workspace validator config, discoverable from the workspace itself (likely a section in AGENTS.md, or a sibling file like .tilth/validators.yaml), that declares a list of (name, command, success_predicate) entries. validators.py becomes a runner over that list; the Python demo's config happens to be {ruff: "ruff check .", pytest: "pytest -q"}.
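A minimal sketch of what "a runner over that list" could look like, assuming each command is a plain argv-style string run from the workspace root and that the default success_predicate is "exit status 0". The names ValidatorSpec, ValidatorResult, exit_code_zero, and PYTHON_DEMO are hypothetical, and this run_all illustrates the proposed shape only; it is not the current function in validators.py.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import Callable

# Hypothetical: a predicate inspects the finished process and decides pass/fail.
SuccessPredicate = Callable[[subprocess.CompletedProcess], bool]


def exit_code_zero(proc: subprocess.CompletedProcess) -> bool:
    """Assumed default predicate: the check passes when the command exits 0."""
    return proc.returncode == 0


@dataclass
class ValidatorSpec:
    """One (name, command, success_predicate) entry from the workspace config."""
    name: str
    command: str
    success_predicate: SuccessPredicate = exit_code_zero


@dataclass
class ValidatorResult:
    name: str
    passed: bool
    output: str


def run_all(specs: list[ValidatorSpec], workspace: str) -> list[ValidatorResult]:
    """Run every configured validator in order and collect one result per entry."""
    results: list[ValidatorResult] = []
    for spec in specs:
        # No shell: the command string is split into argv, so pipes and globs are not expanded.
        argv = shlex.split(spec.command)
        try:
            proc = subprocess.run(argv, cwd=workspace, capture_output=True, text=True)
        except FileNotFoundError:
            # The executable (or the workspace directory) does not exist: record a failure
            # instead of letting one missing tool abort the whole floor.
            results.append(ValidatorResult(spec.name, False, f"could not start {argv[0]!r}"))
            continue
        passed = spec.success_predicate(proc)
        results.append(ValidatorResult(spec.name, passed, proc.stdout + proc.stderr))
    return results


# The Python demo's floor expressed as data rather than code.
PYTHON_DEMO = [
    ValidatorSpec("ruff", "ruff check ."),
    ValidatorSpec("pytest", "pytest -q"),
]
```

With this shape, a check that needs more than an exit code (for example, one that reads a tool's summary line) supplies its own predicate instead of adding a branch to run_all(). Splitting commands without a shell keeps quoting predictable, at the cost of not supporting pipes or globs inside a command; the config format would need to decide whether that restriction is acceptable.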

Open design questions worth working through in this issue:

  • Where does the config live — AGENTS.md section, sibling file, or pyproject.toml-style equivalent?
  • How does per-task test filtering (test_t<NNN>_*.py glob today) generalize across languages?
  • Does the seeder need to know about the validator config to author tests that the floor will actually run?
  • Does the prep-feature interview need a step that asks the user about their workspace's validators, or do we detect from the project shape?
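One way to explore the first and last questions together is to prefer an explicit per-workspace file and fall back to detecting the project shape from marker files at the workspace root. This sketch assumes a hypothetical .tilth/validators.json (JSON only to stay within the standard library; the .tilth/validators.yaml floated above would need a YAML parser such as PyYAML). The DEFAULTS_BY_MARKER table and discover_validators are illustrative, not a proposed default set. Entries carry only a name and a command, so they would pair with an exit-status-0 default predicate.

```python
import json
from pathlib import Path

# Hypothetical fallback defaults, keyed by a marker file at the workspace root.
# Checked in insertion order; the first marker found wins.
DEFAULTS_BY_MARKER: dict[str, dict[str, str]] = {
    "pyproject.toml": {"ruff": "ruff check .", "pytest": "pytest -q"},
    "Cargo.toml": {"clippy": "cargo clippy", "test": "cargo test"},
    "go.mod": {"vet": "go vet ./...", "test": "go test ./..."},
    "package.json": {"test": "npm test"},
}


def discover_validators(workspace: Path) -> dict[str, str]:
    """Return {name: command} for a workspace.

    An explicit config file always wins; otherwise detect from project shape;
    otherwise return an empty floor.
    """
    config = workspace / ".tilth" / "validators.json"  # hypothetical location
    if config.is_file():
        data = json.loads(config.read_text())
        # Assumed file shape: {"validators": [{"name": ..., "command": ...}, ...]}
        return {entry["name"]: entry["command"] for entry in data["validators"]}
    for marker, defaults in DEFAULTS_BY_MARKER.items():
        if (workspace / marker).exists():
            return dict(defaults)  # copy so callers can't mutate the shared table
    return {}
```

Detection returns the first marker that matches, so a repo containing both pyproject.toml and package.json gets only the Python defaults. That kind of ambiguity is one argument for having the prep-feature interview confirm detected validators rather than trusting detection alone.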

Scope

This issue covers the design + implementation of the pluggable interface. It's a v1 prerequisite if Tilth is to support non-Python workspaces, and even for Python-only use it cleans up a load-bearing coupling.

Related

Labels: enhancement