MARKZZAM/codex-harness-framework

Codex Harness Framework

An opinionated harness for AI/Codex-driven development.

This repository combines:

  • project rule templates (AGENTS.md, docs/*.md)
  • phase/step execution workflow (phases/)
  • guardrails for TDD and commits (.codex/hooks, .codex/rules)
  • a step executor with retries, context carry-over, and automatic status tracking (scripts/execute.py)

The goal is to make agent-driven work more predictable: less drift, less hidden context, stronger test discipline, and clearer step-by-step execution.

Korean version: README.ko.md

What problem this solves

When AI agents work directly in a repository, a few problems show up quickly:

  • they guess requirements that were never written down
  • they lose architectural context between steps
  • they modify implementation code before tests exist
  • they claim completion without a repeatable execution flow

The harness addresses these problems by making the project documents, the execution plan, and the verification rules part of the workflow itself.

How the harness works

1. Project guidance lives in templates

  • AGENTS.md defines the working rules for the target project
  • docs/PRD.md, docs/ARCHITECTURE.md, docs/ADR.md, docs/UI_GUIDE.md are templates that downstream projects fill in

These files are not meant to be complete inside this framework repository. They are inputs that a real project adopts and customizes.

In practice, the content does not have to be written entirely by hand. A common flow is:

  • a human sets the direction and constraints
  • an LLM drafts AGENTS.md and docs/*.md
  • the human reviews and corrects the draft

2. Work is broken into phases and steps

The harness expects a phases/ directory with:

  • phases/index.json
  • phases/<phase-name>/index.json
  • phases/<phase-name>/stepN.md

Each step is an isolated execution unit with explicit instructions, acceptance criteria, verification steps, and forbidden actions.

The phase/step spec itself lives in .agents/skills/harness-workflow/SKILL.md. That file defines:

  • the expected phases/ file layout
  • required fields such as project, phase, step, name, and status
  • allowed status values: pending, completed, error, blocked
  • step authoring rules such as kebab-case step names and required sections in stepN.md

scripts/execute.py is the runtime side of that contract: it assumes this structure exists and does not work without it.
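The field and status rules listed above can be checked mechanically before a run. The following is a minimal sketch, not code from this repository: `validate_phase_index` is a hypothetical helper, the kebab-case regular expression is one reasonable reading of the naming rule, and treating `steps` as a required top-level key is an assumption based on the index example in the Quick start. The authoritative rules are the ones in SKILL.md.

```python
import re

# Status values listed in this README.
ALLOWED_STATUSES = {"pending", "completed", "error", "blocked"}
# Assumed interpretation of "kebab-case": lowercase words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_phase_index(index: dict) -> list[str]:
    """Return a list of problems found in a phase index payload (empty if none)."""
    problems = []
    for field in ("project", "phase", "steps"):
        if field not in index:
            problems.append(f"missing top-level field: {field}")
    for position, step in enumerate(index.get("steps", [])):
        for field in ("step", "name", "status"):
            if field not in step:
                problems.append(f"steps[{position}] missing field: {field}")
        name = step.get("name")
        if name is not None and not KEBAB_CASE.match(str(name)):
            problems.append(f"steps[{position}] name is not kebab-case: {name}")
        status = step.get("status")
        if status is not None and status not in ALLOWED_STATUSES:
            problems.append(f"steps[{position}] has unknown status: {status}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer fix a drafted index in a single pass.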

3. The executor runs steps in order

scripts/execute.py:

  • loads project guardrails from AGENTS.md and docs/*.md
  • carries forward summaries from completed steps
  • runs the current step through Codex
  • retries up to 3 times on failure
  • records completed, error, or blocked status
  • creates code and metadata commits separately
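The retry behavior in the list above can be sketched as a small loop. This is an illustration under stated assumptions, not the code in scripts/execute.py: `run_step` is a hypothetical callable standing in for the Codex invocation, "up to 3 times" is read here as three attempts in total, and a `blocked` result is assumed to stop retrying immediately.

```python
from typing import Callable


def run_with_retries(run_step: Callable[[int], str], max_attempts: int = 3) -> tuple[str, int]:
    """Call run_step(attempt) until it stops reporting "error" or attempts run out.

    run_step is a hypothetical stand-in that returns "completed", "error",
    or "blocked". Returns the final status and the number of attempts used.
    """
    status = "error"
    attempts = 0
    for attempt in range(1, max_attempts + 1):
        attempts = attempt
        status = run_step(attempt)
        if status != "error":
            # "completed" and "blocked" both end the loop; only "error" is retried.
            break
    return status, attempts
```

The returned status is what would be written back to the phase index, which keeps the retry policy separate from status tracking.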

4. Guardrails enforce discipline

  • .codex/hooks/tdd-guard.sh blocks implementation edits when no matching test file exists
  • .codex/hooks/commit-check.sh blocks git commit unless the project-specific checks pass
  • .codex/rules/default.rules forbids obviously destructive commands such as git push --force and git reset --hard

Repository structure

.
├── .agents/skills/         # Harness-specific operating instructions
├── .codex/hooks/           # TDD and commit guardrails
├── .codex/rules/           # Command restrictions for Codex sessions
├── docs/                   # Project document templates for downstream repos
├── scripts/                # Executor and tests
├── AGENTS.md               # Project working-rules template
├── EXAMPLES.md             # Good/bad examples for agent behavior
└── README*.md              # Framework documentation

Prerequisites

Before using this harness in a real project, you need:

  • Python 3
  • Git
  • Codex CLI available on your shell path
  • a repository where shell hooks and JSON/Markdown-based planning are acceptable

To run the same checks this repository runs on itself, you also need pytest.

Quick start

1. Start from the templates

Adapt these files for your target project:

  • AGENTS.md
  • docs/PRD.md
  • docs/ARCHITECTURE.md
  • docs/ADR.md
  • docs/UI_GUIDE.md

Fill them in with real project constraints instead of placeholders.

You can do this in two ways:

  • write them directly yourself
  • have an LLM draft them first, then review and tighten them

The harness needs the files and their content, but it does not require that every line be authored manually by a human.

2. Define the execution phases

The expected phase/step format is defined in .agents/skills/harness-workflow/SKILL.md, and the executor relies on that structure at runtime.

Create a top-level phase index:

{
  "phases": [
    {
      "dir": "0-mvp",
      "status": "pending"
    }
  ]
}

Create a phase directory such as phases/0-mvp/ and add its index:

{
  "project": "MyProject",
  "phase": "mvp",
  "steps": [
    { "step": 0, "name": "project-setup", "status": "pending" },
    { "step": 1, "name": "core-types", "status": "pending" },
    { "step": 2, "name": "api-layer", "status": "pending" }
  ]
}

3. Write step files

For each step, create phases/<phase-name>/stepN.md.

Each step file should include at least:

  • files to read
  • the task
  • acceptance criteria
  • verification steps
  • forbidden actions

Like the project docs, these step files can be drafted by an LLM. What matters is that they follow the harness-workflow spec and are reviewed before execution.
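A quick pre-execution check can catch drafted step files that are missing a section. The sketch below is hypothetical and not part of the harness: it assumes the required sections are exactly the `##` headings used in this README's minimal step example, while the authoritative list lives in SKILL.md.

```python
# Assumed section titles, taken from the minimal step example in this README.
REQUIRED_SECTIONS = ("Read first", "Task", "Acceptance Criteria", "Verification", "Forbidden")


def missing_sections(step_markdown: str) -> list[str]:
    """Return required section titles that have no matching '## ' heading."""
    present = {
        line[3:].strip().lower()
        for line in step_markdown.splitlines()
        if line.startswith("## ")
    }
    # Compare case-insensitively so "## acceptance criteria" still counts.
    return [title for title in REQUIRED_SECTIONS if title.lower() not in present]
```

An empty result means every expected heading is present. It says nothing about whether the content under each heading is adequate, so human review is still needed.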

4. Run the executor

python3 scripts/execute.py 0-mvp
python3 scripts/execute.py 0-mvp --push

The executor will create or switch to a branch named feat-<phase>, run steps in order, retry failed steps, and update status fields in the phase index files.
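Step ordering and status updates can be pictured with two small helpers. Both function names are hypothetical and the real executor may behave differently. In particular, this sketch only selects `pending` steps and does not model what happens after an `error` or `blocked` step.

```python
import json
from pathlib import Path
from typing import Optional


def next_pending_step(index: dict) -> Optional[dict]:
    """Return the lowest-numbered step whose status is "pending", or None."""
    pending = [step for step in index["steps"] if step["status"] == "pending"]
    return min(pending, key=lambda step: step["step"]) if pending else None


def record_status(index_path: Path, step_number: int, status: str) -> None:
    """Rewrite a phase index file with a new status for one step."""
    index = json.loads(index_path.read_text(encoding="utf-8"))
    for step in index["steps"]:
        if step["step"] == step_number:
            step["status"] = status
    index_path.write_text(json.dumps(index, indent=2) + "\n", encoding="utf-8")
```

Keeping the status in the index file rather than in memory means an interrupted run can be resumed by reading the file again.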

Minimal step example

This is the kind of shape the harness expects from a step document:

# Step 0: project setup

## Read first
- AGENTS.md
- docs/PRD.md
- docs/ARCHITECTURE.md

## Task
Set up the initial project skeleton.

## Acceptance Criteria
- The required directories exist
- The verification command exits successfully

## Verification
1. Run the acceptance-criteria command
2. Re-check AGENTS.md / ARCHITECTURE.md / ADR.md constraints
3. Update phases/0-mvp/index.json with the result

## Forbidden
- Do not add extra features
- Do not commit manually

Guardrails

TDD guard

When an implementation file is edited through Codex tools, the TDD hook checks whether a matching test file already exists. If not, it denies the edit and tells the agent to write the test first.
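The decision the hook makes can be sketched in Python, although the actual hook is a shell script. The rule for what counts as a "matching" test file is an assumption made for illustration (`tests/test_<name>` or a sibling `test_<name>`), and the function names are hypothetical.

```python
from pathlib import Path


def candidate_test_paths(source: Path) -> list[Path]:
    """Hypothetical naming rules: tests/test_<name> or a sibling test_<name>."""
    name = f"test_{source.stem}{source.suffix}"
    return [Path("tests") / name, source.parent / name]


def check_edit(source: Path, repo_root: Path) -> tuple[bool, str]:
    """Allow an edit to `source` (relative to repo_root) only if a test file exists."""
    if source.name.startswith("test_"):
        return True, "test files are always editable"
    for candidate in candidate_test_paths(source):
        if (repo_root / candidate).is_file():
            return True, f"found {candidate.as_posix()}"
    # Deny with a message that tells the agent what to do next.
    return False, f"write a test for {source.as_posix()} first"
```

Returning a reason alongside the decision mirrors the behavior described above, where the denial tells the agent to write the test first.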

Commit check

When a git commit command is attempted through the guarded Bash tool, the commit hook runs project-level checks.

The hook currently runs these checks, depending on the project type:

  • Node.js projects: npm run lint, npm run build, npm run test
  • Python projects: ruff check ., pytest when available
  • Go projects: go vet ./..., go build ./..., go test ./...
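One way to picture the selection of checks is a function that maps marker files to the commands listed above. How the real hook detects a project type is not stated in this README, so the marker files used here (`package.json`, `pyproject.toml`, `go.mod`) are assumptions, as is the helper name `checks_for`.

```python
from pathlib import Path


def checks_for(repo_root: Path, pytest_available: bool = True) -> list[list[str]]:
    """Return the check commands to run, based on assumed marker files."""
    commands: list[list[str]] = []
    if (repo_root / "package.json").is_file():  # assumed Node.js marker
        commands += [["npm", "run", "lint"], ["npm", "run", "build"], ["npm", "run", "test"]]
    if (repo_root / "pyproject.toml").is_file():  # assumed Python marker
        commands.append(["ruff", "check", "."])
        if pytest_available:
            commands.append(["pytest"])
    if (repo_root / "go.mod").is_file():  # assumed Go marker
        commands += [["go", "vet", "./..."], ["go", "build", "./..."], ["go", "test", "./..."]]
    return commands
```

Each command is kept as an argument list so it can be passed to `subprocess.run` without shell quoting. A commit would be allowed only if every command exits with status 0.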

Command restrictions

The default Codex rules block destructive commands such as:

  • git push --force
  • git reset --hard
  • recursive force deletion patterns
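To illustrate what blocking these patterns involves, the sketch below classifies a single command line. It is not the format or logic of .codex/rules/default.rules, which this README does not show. `is_destructive` is a hypothetical helper that inspects only one simple command and ignores chained commands, global git options, and aliases.

```python
import shlex


def is_destructive(command: str) -> bool:
    """Return True for the destructive patterns named in this README."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    if tokens[0] == "git" and len(tokens) > 1:
        subcommand, flags = tokens[1], set(tokens[2:])
        if subcommand == "push" and flags & {"--force", "-f"}:
            return True
        if subcommand == "reset" and "--hard" in flags:
            return True
    if tokens[0] == "rm":
        # Collect single-letter flags so that "-rf", "-fr", and "-r -f" all match.
        short_flags = "".join(
            t[1:] for t in tokens[1:] if t.startswith("-") and not t.startswith("--")
        )
        long_flags = {t for t in tokens[1:] if t.startswith("--")}
        recursive = "r" in short_flags or "R" in short_flags or "--recursive" in long_flags
        force = "f" in short_flags or "--force" in long_flags
        return recursive and force
    return False
```

Tokenizing with `shlex.split` instead of substring matching avoids false positives such as a commit message that merely mentions `--force`.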

Development and tests

This framework repository currently validates its own executor and hooks with:

pytest

The existing test suite covers:

  • executor behavior
  • phase/index handling
  • retry and status transitions
  • TDD guard behavior
  • commit-check behavior

When to use this harness

Good fit:

  • you want AI-assisted development with explicit guardrails
  • you prefer work to be broken into small, reviewable steps
  • you want project rules and architectural context injected into agent runs
  • you want stronger TDD and commit discipline

Poor fit:

  • you want a loose, ad-hoc workflow
  • you do not want step files or phase indexes
  • you do not want hooks blocking edits or commits
  • your team is not willing to maintain project templates and execution metadata

Important caveat

This repository provides the harness pieces, not a full project bootstrapper.

That means a downstream project still needs to:

  • provide real project rules and context in the templates
  • produce phases/ metadata that matches the harness-workflow spec
  • provide actual step instructions for the target codebase

An LLM can draft those artifacts, but they still have to exist in the project before the executor can run.

The framework gives you the structure and enforcement layer; the project supplies and approves the real content.
