An opinionated harness for AI/Codex-driven development.
This repository combines:
- project rule templates (`AGENTS.md`, `docs/*.md`)
- phase/step execution workflow (`phases/`)
- guardrails for TDD and commits (`.codex/hooks`, `.codex/rules`)
- a step executor with retries, context carry-over, and automatic status tracking (`scripts/execute.py`)
The goal is to make agent-driven work more predictable: less drift, less hidden context, stronger test discipline, and clearer step-by-step execution.
Korean version: `README.ko.md`
When AI agents work directly in a repository, a few problems show up quickly:
- they guess requirements that were never written down
- they lose architectural context between steps
- they modify implementation code before tests exist
- they claim completion without a repeatable execution flow
This harness addresses these problems by making the project documents, execution plan, and verification rules part of the workflow itself.
- `AGENTS.md` defines the working rules for the target project
- `docs/PRD.md`, `docs/ARCHITECTURE.md`, `docs/ADR.md`, and `docs/UI_GUIDE.md` are templates that downstream projects fill in
These files are not meant to be complete inside this framework repository. They are inputs that a real project adopts and customizes.
In practice, the content does not have to be written entirely by hand. A common flow is:
- a human sets the direction and constraints
- an LLM drafts `AGENTS.md` and `docs/*.md`
- the human reviews and corrects the draft
The harness expects a phases/ directory with:
- `phases/index.json`
- `phases/<phase-name>/index.json`
- `phases/<phase-name>/stepN.md`
Each step is an isolated execution unit with explicit instructions, acceptance criteria, verification steps, and forbidden actions.
The phase/step spec itself lives in .agents/skills/harness-workflow/SKILL.md. That file defines:
- the expected `phases/` file layout
- required fields such as `project`, `phase`, `step`, `name`, and `status`
- allowed status values: `pending`, `completed`, `error`, `blocked`
- step authoring rules such as kebab-case step names and required sections in `stepN.md`
scripts/execute.py then acts as the runtime contract that assumes this structure exists.
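As a rough illustration of what "this structure" means in practice, the sketch below checks a phase index against the fields and status values listed above. It is not the validation performed by `scripts/execute.py`; the function name `validate_phase_index`, the handling of the `steps` key, and the exact kebab-case pattern are assumptions made for this example.

```python
import re

# Status values listed in this README for the harness-workflow spec.
ALLOWED_STATUSES = {"pending", "completed", "error", "blocked"}
REQUIRED_STEP_FIELDS = ("step", "name", "status")
# Assumed kebab-case pattern: lowercase words or digits joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_phase_index(index: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problems found."""
    problems = []
    for field in ("project", "phase", "steps"):
        if field not in index:
            problems.append(f"missing top-level field: {field}")

    for i, step in enumerate(index.get("steps", [])):
        for field in REQUIRED_STEP_FIELDS:
            if field not in step:
                problems.append(f"steps[{i}] missing field: {field}")
        status = step.get("status")
        if status is not None and status not in ALLOWED_STATUSES:
            problems.append(f"steps[{i}] has unknown status: {status!r}")
        name = step.get("name")
        if name is not None and not KEBAB_CASE.match(name):
            problems.append(f"steps[{i}] name is not kebab-case: {name!r}")
    return problems
```

Running a check like this before execution surfaces malformed metadata early instead of partway through a phase.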
scripts/execute.py:
- loads project guardrails from `AGENTS.md` and `docs/*.md`
- carries forward summaries from completed steps
- runs the current step through Codex
- retries up to 3 times on failure
- records `completed`, `error`, or `blocked` status
- creates code and metadata commits separately
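The control flow in the list above can be sketched roughly as follows. This is a simplified model, not the code in `scripts/execute.py`: `run_step` is a hypothetical callable standing in for the Codex invocation, the `summary` and `note` keys are invented for the example, and the sketch assumes that "up to 3 times" means three attempts in total and that a `blocked` result is not retried.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # assumption: three attempts in total per step

# run_step(step, summaries) -> (outcome, text), where outcome is
# "completed", "error", or "blocked" (hypothetical interface).
StepRunner = Callable[[dict, list], tuple]


def run_phase(steps: list, run_step: StepRunner) -> list:
    """Run steps in order, carrying summaries forward and recording status in place."""
    summaries = [s["summary"] for s in steps
                 if s["status"] == "completed" and "summary" in s]

    for step in steps:
        if step["status"] == "completed":
            continue  # already done in an earlier run

        outcome, text = "error", ""
        for _attempt in range(MAX_ATTEMPTS):
            # Each attempt sees the summaries of all previously completed steps.
            outcome, text = run_step(step, list(summaries))
            if outcome != "error":
                break  # "completed" or "blocked": no further retries

        step["status"] = outcome
        if outcome == "completed":
            step["summary"] = text
            summaries.append(text)
        else:
            step["note"] = text
            break  # stop the phase at the first step that did not complete
    return steps
```

Keeping the runner injectable makes the retry and status logic testable without calling an agent at all.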
- `.codex/hooks/tdd-guard.sh` blocks implementation edits when no matching test file exists
- `.codex/hooks/commit-check.sh` blocks `git commit` unless the project-specific checks pass
- `.codex/rules/default.rules` forbids obviously destructive commands such as `git push --force` and `git reset --hard`
```
.
├── .agents/skills/   # Harness-specific operating instructions
├── .codex/hooks/     # TDD and commit guardrails
├── .codex/rules/     # Command restrictions for Codex sessions
├── docs/             # Project document templates for downstream repos
├── scripts/          # Executor and tests
├── AGENTS.md         # Project working-rules template
├── EXAMPLES.md       # Good/bad examples for agent behavior
└── README*.md        # Framework documentation
```
Before using this harness in a real project, you need:
- Python 3
- Git
- Codex CLI available on your shell path
- a repository where shell hooks and JSON/Markdown-based planning are acceptable
To get the same level of confidence this repository has in its own code, you should also be able to run `pytest`.
Adapt these files for your target project:
- `AGENTS.md`
- `docs/PRD.md`
- `docs/ARCHITECTURE.md`
- `docs/ADR.md`
- `docs/UI_GUIDE.md`
Fill them in with real project constraints instead of placeholders.
You can do this in two ways:
- write them directly yourself
- have an LLM draft them first, then review and tighten them
The harness needs the files and their content, but it does not require that every line be authored manually by a human.
The expected phase/step format is defined in .agents/skills/harness-workflow/SKILL.md, and the executor relies on that structure at runtime.
Create a top-level phase index:
```json
{
  "phases": [
    {
      "dir": "0-mvp",
      "status": "pending"
    }
  ]
}
```

Create a phase directory such as `phases/0-mvp/` and add its index:
```json
{
  "project": "MyProject",
  "phase": "mvp",
  "steps": [
    { "step": 0, "name": "project-setup", "status": "pending" },
    { "step": 1, "name": "core-types", "status": "pending" },
    { "step": 2, "name": "api-layer", "status": "pending" }
  ]
}
```

For each step, create `phases/<phase-name>/stepN.md`.
Each step file should include at least:
- files to read
- the task
- acceptance criteria
- verification steps
- forbidden actions
Like the project docs, these step files can be drafted by an LLM. What matters is that they follow the harness-workflow spec and are reviewed before execution.
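Part of that review can be automated. The sketch below reports which expected sections a step file lacks. The heading names are an assumption taken from the example step document in this README; the authoritative list lives in `.agents/skills/harness-workflow/SKILL.md`, and `missing_sections` is a hypothetical helper, not part of the harness.

```python
import re

# Assumed section headings, mirroring the example step document in this README.
REQUIRED_SECTIONS = (
    "Read first",
    "Task",
    "Acceptance Criteria",
    "Verification",
    "Forbidden",
)


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required sections that have no matching level-2 heading."""
    # Collect the text of every "## ..." heading, compared case-insensitively.
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##\s+(.+?)\s*$", markdown_text, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]
```

An empty result only shows that the sections are present; whether their content is adequate still needs a human reviewer.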
```bash
python3 scripts/execute.py 0-mvp
python3 scripts/execute.py 0-mvp --push
```

The executor will create or switch to a branch named `feat-<phase>`, run steps in order, retry failed steps, and update status fields in the phase index files.
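Because status is stored in plain JSON, progress can be inspected without running the executor. The following read-only sketch assumes the phase index layout shown earlier in this README; `phase_progress` and `next_open_step` are hypothetical helpers, not functions provided by the harness.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Optional


def load_phase_index(repo_root: Path, phase_dir: str) -> dict:
    """Read phases/<phase_dir>/index.json below repo_root."""
    index_path = repo_root / "phases" / phase_dir / "index.json"
    return json.loads(index_path.read_text(encoding="utf-8"))


def phase_progress(repo_root: Path, phase_dir: str) -> dict:
    """Count steps per status value, e.g. {"completed": 1, "pending": 2}."""
    index = load_phase_index(repo_root, phase_dir)
    return dict(Counter(step["status"] for step in index["steps"]))


def next_open_step(repo_root: Path, phase_dir: str) -> Optional[dict]:
    """Return the first step that is not completed, or None if all are done."""
    index = load_phase_index(repo_root, phase_dir)
    for step in index["steps"]:
        if step["status"] != "completed":
            return step
    return None
```

For example, `phase_progress(Path("."), "0-mvp")` summarizes the phase used in the commands above when called from the repository root.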
This is the kind of shape the harness expects from a step document:
```markdown
# Step 0: project setup

## Read first
- AGENTS.md
- docs/PRD.md
- docs/ARCHITECTURE.md

## Task
Set up the initial project skeleton.

## Acceptance Criteria
- The required directories exist
- The verification command exits successfully

## Verification
1. Run the acceptance-criteria command
2. Re-check AGENTS.md / ARCHITECTURE.md / ADR.md constraints
3. Update phases/0-mvp/index.json with the result

## Forbidden
- Do not add extra features
- Do not commit manually
```

When an implementation file is edited through Codex tools, the TDD hook checks whether a matching test file already exists. If not, it denies the edit and tells the agent to write the test first.
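The check described above can be approximated as follows. The real hook is a shell script (`.codex/hooks/tdd-guard.sh`) and its matching rule is not documented here, so the convention in this sketch is an assumption: for `src/core.py` it looks for `src/test_core.py` or `tests/test_core.py`. Both function names are hypothetical.

```python
from pathlib import Path


def candidate_test_paths(impl_path: Path) -> list[Path]:
    """Assumed convention: test_<name> next to the file, or under tests/."""
    test_name = f"test_{impl_path.stem}{impl_path.suffix}"
    return [impl_path.with_name(test_name), Path("tests") / test_name]


def edit_allowed(impl_path: Path, repo_root: Path) -> bool:
    """Allow an edit only if the file is a test or a matching test already exists.

    impl_path is expected to be relative to repo_root.
    """
    if impl_path.name.startswith("test_"):
        return True  # editing a test file is always allowed
    return any(
        (repo_root / candidate).exists()
        for candidate in candidate_test_paths(impl_path)
    )
```

A guard like this forces test-first ordering: the agent cannot touch `src/core.py` until some test file for it is on disk.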
When a git commit command is attempted through the guarded Bash tool, the commit hook runs project-level checks.
Current repository logic includes:
- Node.js projects: `npm run lint`, `npm run build`, `npm run test`
- Python projects: `ruff check .`, `pytest` when available
- Go projects: `go vet ./...`, `go build ./...`, `go test ./...`
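One way to picture this selection logic is a mapping from project type to commands. The sketch below is an approximation, not the contents of `commit-check.sh`: detecting project types through the marker files `package.json`, `pyproject.toml`, and `go.mod` is an assumption, as is applying "when available" to both Python tools.

```python
import shutil
from pathlib import Path
from typing import Callable

# marker file (assumed) -> (commands from this README, skip when tool is missing)
PROJECT_CHECKS = {
    "package.json": (
        [["npm", "run", "lint"], ["npm", "run", "build"], ["npm", "run", "test"]],
        False,
    ),
    "pyproject.toml": ([["ruff", "check", "."], ["pytest"]], True),
    "go.mod": (
        [["go", "vet", "./..."], ["go", "build", "./..."], ["go", "test", "./..."]],
        False,
    ),
}


def tool_on_path(tool: str) -> bool:
    return shutil.which(tool) is not None


def planned_checks(
    repo_root: Path, is_available: Callable[[str], bool] = tool_on_path
) -> list:
    """List the check commands that apply to the project at repo_root."""
    planned = []
    for marker, (commands, optional) in PROJECT_CHECKS.items():
        if not (repo_root / marker).exists():
            continue
        for command in commands:
            if optional and not is_available(command[0]):
                continue  # optional tool not installed: skip rather than fail
            planned.append(command)
    return planned
```

A commit hook built on this would run each planned command in turn and block the commit on the first non-zero exit code.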
The default Codex rules block destructive commands such as:
- `git push --force`
- `git reset --hard`
- recursive force deletion patterns
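To make the intent of these rules concrete, here is a small matcher for the three categories above. It does not reflect the syntax or the exact coverage of `.codex/rules/default.rules`; `is_destructive` is a hypothetical function, and treating `rm` with both a recursive and a force flag as the "recursive force deletion pattern" is an assumption.

```python
import shlex


def is_destructive(command: str) -> bool:
    """Return True for the destructive command shapes named in this README."""
    tokens = shlex.split(command)
    if not tokens:
        return False

    if tokens[0] == "git" and len(tokens) > 1:
        subcommand, args = tokens[1], tokens[2:]
        if subcommand == "push" and "--force" in args:
            return True
        if subcommand == "reset" and "--hard" in args:
            return True

    if tokens[0] == "rm":
        args = tokens[1:]
        # Merge short flags such as "-rf" or "-r -f" into one string of letters.
        short = "".join(
            a.lstrip("-") for a in args if a.startswith("-") and not a.startswith("--")
        )
        recursive = "r" in short or "R" in short or "--recursive" in args
        force = "f" in short or "--force" in args
        return recursive and force

    return False
```

For instance, `is_destructive("git reset --hard HEAD~1")` is true, while `is_destructive("git reset --soft HEAD~1")` is false.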
This framework repository currently validates its own executor and hooks with:
```bash
pytest
```

The existing test suite covers:
- executor behavior
- phase/index handling
- retry and status transitions
- TDD guard behavior
- commit-check behavior
Good fit:
- you want AI-assisted development with explicit guardrails
- you prefer work to be broken into small, reviewable steps
- you want project rules and architectural context injected into agent runs
- you want stronger TDD and commit discipline
Poor fit:
- you want a loose, ad-hoc workflow
- you do not want step files or phase indexes
- you do not want hooks blocking edits or commits
- your team is not willing to maintain project templates and execution metadata
This repository provides the harness pieces, not a full project bootstrapper.
That means a downstream project still needs to:
- provide real project rules and context in the templates
- produce `phases/` metadata that matches the harness-workflow spec
- provide actual step instructions for the target codebase
An LLM can draft these artifacts, but the project still has to supply them.
The framework gives you the structure and enforcement layer; the project supplies and approves the real content.