Codex Harness Framework

An opinionated harness for AI/Codex-driven development.

This repository combines:

project rule templates (AGENTS.md, docs/*.md)
phase/step execution workflow (phases/)
guardrails for TDD and commits (.codex/hooks, .codex/rules)
a step executor with retries, context carry-over, and automatic status tracking (scripts/execute.py)

The goal is to make agent-driven work more predictable: less drift, less hidden context, stronger test discipline, and clearer step-by-step execution.

Korean version: README.ko.md

What problem this solves

When AI agents work directly in a repository, a few problems show up quickly:

they guess requirements that were never written down
they lose architectural context between steps
they modify implementation code before tests exist
they claim completion without a repeatable execution flow

This harness addresses these problems by making the project documents, execution plan, and verification rules part of the workflow itself.

How the harness works

1. Project guidance lives in templates

AGENTS.md defines the working rules for the target project
docs/PRD.md, docs/ARCHITECTURE.md, docs/ADR.md, docs/UI_GUIDE.md are templates that downstream projects fill in

These files are not meant to be complete inside this framework repository. They are inputs that a real project adopts and customizes.

In practice, the content does not have to be written entirely by hand. A common flow is:

a human sets the direction and constraints
an LLM drafts AGENTS.md and docs/*.md
the human reviews and corrects the draft

2. Work is broken into phases and steps

The harness expects a phases/ directory with:

phases/index.json
phases/<phase-name>/index.json
phases/<phase-name>/stepN.md

Each step is an isolated execution unit with explicit instructions, acceptance criteria, verification steps, and forbidden actions.

The phase/step spec itself lives in .agents/skills/harness-workflow/SKILL.md. That file defines:

the expected phases/ file layout
required fields such as project, phase, step, name, and status
allowed status values: pending, completed, error, blocked
step authoring rules such as kebab-case step names and required sections in stepN.md

scripts/execute.py is the runtime side of that contract: it assumes this structure exists and does not work without it.
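
To make the field and status rules above concrete, here is a minimal Python sketch of a phase-index check. The function name validate_phase_index is hypothetical and is not part of scripts/execute.py; the top-level "steps" key is taken from the example index in the Quick start section. SKILL.md remains the authoritative spec.

```python
import json

# Status values listed in this README; SKILL.md is the authoritative source.
ALLOWED_STATUSES = {"pending", "completed", "error", "blocked"}


def validate_phase_index(text):
    """Return a list of problems found in a phase index.json document.

    Hypothetical helper for illustration only; not part of the harness.
    """
    problems = []
    data = json.loads(text)

    # Top-level fields, as shown in the Quick start example index.
    for key in ("project", "phase", "steps"):
        if key not in data:
            problems.append(f"missing top-level field: {key}")

    # Per-step fields and status values.
    for i, step in enumerate(data.get("steps", [])):
        for key in ("step", "name", "status"):
            if key not in step:
                problems.append(f"steps[{i}] missing field: {key}")
        status = step.get("status")
        if status is not None and status not in ALLOWED_STATUSES:
            problems.append(f"steps[{i}] has unknown status: {status}")
    return problems
```

An empty list means the sketch found nothing wrong; it does not check step naming or the contents of stepN.md.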

3. The executor runs steps in order

scripts/execute.py:

loads project guardrails from AGENTS.md and docs/*.md
carries forward summaries from completed steps
runs the current step through Codex
retries up to 3 times on failure
records completed, error, or blocked status
creates code and metadata commits separately
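
The retry and status-recording behavior in the list above can be pictured with a short Python sketch. This is an illustration under assumptions, not the code in scripts/execute.py: run_with_retries and the run_step callable are hypothetical names, "up to 3 times" is read here as three attempts in total, and the blocked status is left out because this README does not say how it is decided.

```python
def run_with_retries(run_step, max_attempts=3):
    """Call run_step until it succeeds or the attempts run out.

    run_step is a hypothetical callable: it receives the previous error
    message (or None on the first attempt) and returns (ok, message).
    Returns a (status, attempts_used) tuple.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        ok, message = run_step(last_error)
        if ok:
            return "completed", attempt
        # Keep the failure message so the next attempt can see it.
        last_error = message
    return "error", max_attempts
```

Passing the previous error into the next attempt is one simple way to give a retry more context than the first run had.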

4. Guardrails enforce discipline

.codex/hooks/tdd-guard.sh blocks implementation edits when no matching test file exists
.codex/hooks/commit-check.sh blocks git commit unless the project-specific checks pass
.codex/rules/default.rules forbids obviously destructive commands such as git push --force and git reset --hard

Repository structure

.
├── .agents/skills/         # Harness-specific operating instructions
├── .codex/hooks/           # TDD and commit guardrails
├── .codex/rules/           # Command restrictions for Codex sessions
├── docs/                   # Project document templates for downstream repos
├── scripts/                # Executor and tests
├── AGENTS.md               # Project working-rules template
├── EXAMPLES.md             # Good/bad examples for agent behavior
└── README*.md              # Framework documentation

Prerequisites

Before using this harness in a real project, you need:

Python 3
Git
Codex CLI available on your shell path
a repository where shell hooks and JSON/Markdown-based planning are acceptable

To run the same checks this repository uses on its own executor and hooks, you also need pytest.

Quick start

1. Start from the templates

Adapt these files for your target project:

AGENTS.md
docs/PRD.md
docs/ARCHITECTURE.md
docs/ADR.md
docs/UI_GUIDE.md

Fill them in with real project constraints instead of placeholders.

You can do this in two ways:

write them directly yourself
have an LLM draft them first, then review and tighten them

The harness needs the files and their content, but it does not require that every line be authored manually by a human.

2. Define the execution phases

The expected phase/step format is defined in .agents/skills/harness-workflow/SKILL.md, and the executor relies on that structure at runtime.

Create a top-level phase index:

{
  "phases": [
    {
      "dir": "0-mvp",
      "status": "pending"
    }
  ]
}

Create a phase directory such as phases/0-mvp/ and add its index:

{
  "project": "MyProject",
  "phase": "mvp",
  "steps": [
    { "step": 0, "name": "project-setup", "status": "pending" },
    { "step": 1, "name": "core-types", "status": "pending" },
    { "step": 2, "name": "api-layer", "status": "pending" }
  ]
}
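
One way to read an index like the one above is to pick the lowest-numbered step that is still pending. The sketch below assumes that selection rule for illustration; next_pending_step is a hypothetical name, and the real executor may choose steps differently.

```python
import json


def next_pending_step(index_text):
    """Return the name of the first pending step, or None if none remain.

    Hypothetical helper; assumes the index shape shown in this README.
    """
    data = json.loads(index_text)
    # Walk the steps in numeric order rather than file order.
    for step in sorted(data["steps"], key=lambda s: s["step"]):
        if step["status"] == "pending":
            return step["name"]
    return None
```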

3. Write step files

For each step, create phases/<phase-name>/stepN.md.

Each step file should include at least:

files to read
the task
acceptance criteria
verification steps
forbidden actions

Like the project docs, these step files can be drafted by an LLM. What matters is that they follow the harness-workflow spec and are reviewed before execution.

4. Run the executor

python3 scripts/execute.py 0-mvp
python3 scripts/execute.py 0-mvp --push

The executor will create or switch to a branch named feat-<phase>, run steps in order, retry failed steps, and update status fields in the phase index files.
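
As a rough picture of what "update status fields" means for a phase index, the following Python sketch rewrites one step's status. The mark_step function is a hypothetical illustration, not the executor's own code, and it assumes the index shape shown earlier in this Quick start.

```python
import json


def mark_step(index_text, step_number, status):
    """Return new index JSON text with one step's status replaced.

    Hypothetical helper for illustration only.
    """
    data = json.loads(index_text)
    for step in data["steps"]:
        if step["step"] == step_number:
            step["status"] = status
            break
    else:
        # The loop finished without finding the requested step.
        raise KeyError(f"no step numbered {step_number}")
    return json.dumps(data, indent=2)
```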

Minimal step example

This is the kind of shape the harness expects from a step document:

# Step 0: project setup

## Read first
- AGENTS.md
- docs/PRD.md
- docs/ARCHITECTURE.md

## Task
Set up the initial project skeleton.

## Acceptance Criteria
- The required directories exist
- The verification command exits successfully

## Verification
1. Run the acceptance-criteria command
2. Re-check AGENTS.md / ARCHITECTURE.md / ADR.md constraints
3. Update phases/0-mvp/index.json with the result

## Forbidden
- Do not add extra features
- Do not commit manually
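
Before running a phase, it can help to check that each step file has the sections shown in the example above. The Python sketch below is one way to do that; missing_sections is a hypothetical helper, and the heading names are taken from this example rather than from SKILL.md, which may require different or additional sections.

```python
import re

# Section headings used in the example step document above (an assumption,
# not the authoritative list from SKILL.md).
REQUIRED_SECTIONS = [
    "Read first",
    "Task",
    "Acceptance Criteria",
    "Verification",
    "Forbidden",
]


def missing_sections(markdown_text):
    """Return the required section names that have no '## ' heading."""
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^##\s+(.+)$", markdown_text, flags=re.MULTILINE)
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]
```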

Guardrails

TDD guard

When an implementation file is edited through Codex tools, the TDD hook checks whether a matching test file already exists. If not, it denies the edit and tells the agent to write the test first.
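
The idea behind the guard can be sketched in Python. This is not a port of .codex/hooks/tdd-guard.sh: the naming convention used here (foo.py is matched by test_foo.py, either beside it or under tests/) is an assumption made for the example, and both function names are hypothetical.

```python
from pathlib import PurePosixPath


def candidate_test_paths(impl_path):
    """List test paths that would count as a match for impl_path.

    Assumed convention for this sketch: test_<name> next to the file,
    or test_<name> under a top-level tests/ directory.
    """
    p = PurePosixPath(impl_path)
    name = f"test_{p.stem}{p.suffix}"
    return [str(p.with_name(name)), str(PurePosixPath("tests") / name)]


def allow_edit(impl_path, existing_files):
    """Return (allowed, reason) for an attempted implementation edit."""
    if any(c in existing_files for c in candidate_test_paths(impl_path)):
        return True, "matching test found"
    return False, "write the test first"
```

Taking the set of existing files as an argument keeps the decision logic separate from the filesystem, which makes it easy to test.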

Commit check

When a git commit command is attempted through the guarded Bash tool, the commit hook runs project-level checks.

Current repository logic includes:

Node.js projects: npm run lint, npm run build, npm run test
Python projects: ruff check ., pytest when available
Go projects: go vet ./..., go build ./..., go test ./...
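
A simple way to think about this is a mapping from project type to check commands. In the Python sketch below, the commands come from the list above, but the marker files used to detect each project type (package.json, pyproject.toml, go.mod) are assumptions, as is the checks_for name; the actual hook may detect project types differently.

```python
# Marker file -> commands to run. The marker files are an assumption for
# this sketch; the commands are the ones listed in this README.
CHECKS = {
    "package.json": ["npm run lint", "npm run build", "npm run test"],
    # The README says pytest runs "when available"; this sketch does not
    # model that availability check.
    "pyproject.toml": ["ruff check .", "pytest"],
    "go.mod": ["go vet ./...", "go build ./...", "go test ./..."],
}


def checks_for(filenames):
    """Return the check commands for the project types found in filenames."""
    commands = []
    for marker, cmds in CHECKS.items():
        if marker in filenames:
            commands.extend(cmds)
    return commands
```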

Command restrictions

The default Codex rules block destructive commands such as:

git push --force
git reset --hard
recursive force deletion patterns
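
To illustrate the kind of matching involved, here is a small Python sketch that flags the commands listed above. It does not reproduce the syntax or the exact patterns of .codex/rules/default.rules; the regular expressions and the is_blocked name are assumptions chosen for the example.

```python
import re

# Illustrative patterns only; the real rules file may be stricter or looser.
BLOCKED_PATTERNS = [
    re.compile(r"\bgit\s+push\b.*\s--force\b"),
    re.compile(r"\bgit\s+reset\b.*\s--hard\b"),
    # rm with both r and f in a single short-option group, e.g. -rf or -fr.
    re.compile(r"\brm\s+-[a-zA-Z]*(rf|fr)[a-zA-Z]*\b"),
]


def is_blocked(command):
    """Return True if the command matches any blocked pattern."""
    return any(p.search(command) for p in BLOCKED_PATTERNS)
```

Pattern matching of this kind is easy to bypass with unusual quoting or option ordering, so it is best treated as a safety net rather than a security boundary.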

Development and tests

This framework repository currently validates its own executor and hooks with:

pytest

The existing test suite covers:

executor behavior
phase/index handling
retry and status transitions
TDD guard behavior
commit-check behavior

When to use this harness

Good fit:

you want AI-assisted development with explicit guardrails
you prefer work to be broken into small, reviewable steps
you want project rules and architectural context injected into agent runs
you want stronger TDD and commit discipline

Poor fit:

you want a loose, ad-hoc workflow
you do not want step files or phase indexes
you do not want hooks blocking edits or commits
your team is not willing to maintain project templates and execution metadata

Important caveat

This repository provides the harness pieces, not a full project bootstrapper.

That means a downstream project still needs to:

provide real project rules and context in the templates
produce phases/ metadata that matches the harness-workflow spec
provide actual step instructions for the target codebase

An LLM can draft those artifacts, but they must exist in the project before the harness has anything to run.

The framework gives you the structure and enforcement layer; the project supplies and approves the real content.
