ClawCode

A framework you hand to your clawbot that allows it to spec, audit, implement, and ship any feature via orchestrating Codex CLI instances. Autonomously. No Hassle.

Quickstart · How it works · Why it exists · Full framework doc

Zero setup. Point your OpenClaw bot at this repo and tell it:

"Read docs/framework.md. Set up the implementation pipeline for my project at [your-repo-path]."

Your clawbot reads the framework, creates the directory structure, generates the config, and is ready to run. No install. No SDK. No dependencies.

  You write a working doc
         ↓
  ┌─────────────────────────────────────────────────────────┐
  │  ClawCode Pipeline (your clawbot runs this)             │
  │                                                         │
  │  Spec Gen → Spec Audit Loop → Implement → Code Audit   │
  │     ↑            ↕                            ↕         │
  │  codebase    fix ↔ re-audit              fix ↔ re-audit │
  └─────────────────────────────────────────────────────────┘
         ↓
  Feature branch ready for review

🚀 Quickstart

1.  Point your OpenClaw bot to this repo
2.  Tell it: "Read docs/framework.md. Set up the implementation pipeline for my project."
3.  Your clawbot sets up the pipeline automatically (directories, config, prompts, schemas)
4.  Write a working doc describing the feature you want built
5.  Tell it: "Run the pipeline on this working doc: [path]"
6.  Go do something else
7.  Come back to a feature branch ready for review

Recommended stack

Role	Tool	Details
Orchestrator	OpenClaw	Your clawbot — manages phases, carries context, escalates
Coding agent	Codex CLI	`codex exec` / `exec resume` — does the actual code reading & writing
Model	Codex 3.6 Extra High	The model ClawCode was built and tested with

ClawCode was built with OpenClaw + Codex CLI running Codex 3.6 Extra High. This is the intended and tested configuration. The pipeline relies on codex exec for autonomous execution, --output-schema for structured audit verdicts, and exec resume for session continuity across iterations. You can adapt it to other agents, but this is the path of least resistance.

🧠 The bigger picture

The way we build software with AI has gone through phases:

Phase 1: Autocomplete. AI suggests the next line. You accept or reject. It's a typing accelerator.

Phase 2: Copilot. AI writes functions, fills in blocks, answers questions inline. You're still driving — it's riding shotgun.

Phase 3: Autonomous agents. AI reads your codebase, writes code, runs commands. You describe what you want and it goes and builds it.

We're solidly in Phase 3 now. The models are capable enough. But there's a gap between "AI can write code" and "AI can ship a production feature" — and it's not about model intelligence. It's about process:

How do you capture intent precisely enough that the AI doesn't reinterpret it?
How do you make sure the spec actually covers edge cases and matches your data contracts before a single line is written?
How do you audit the implementation without the audit loop spiraling into new tangential issues every round?
How do you recover when the AI's context resets mid-implementation?

ClawCode is my answer. Structured prompts at every step. Machine-parseable audits with convergence guarantees. Disk-based state that survives anything. The AI does the work — the framework makes sure the work is right.

💥 Why I built this

I was shipping features on a complex codebase with AI agents and kept hitting the same walls:

Spec drift — you ask for one thing, the AI builds three things and "improves" your architecture along the way.

The 80% problem — implementation looks done until you notice the stubs, the TODOs, the empty error handlers, the missing edge cases.

Audit death spirals — you ask the AI to review its own work. It finds issues. You fix them. It finds different issues. You fix those. It finds new issues. The loop never converges. You've been going in circles for an hour.

Context amnesia — the AI forgets what it was doing mid-implementation. Starts over. Makes different decisions the second time. Nothing is consistent.

Zero paper trail — nothing is tracked. No artifacts are saved. When something goes wrong you're debugging blind.

I realized the fix wasn't a better model — it was a better process. Structured prompts and a framework around each step of implementation. ClawCode is that framework, standardized and open-sourced.

⚙️ How it works

Your clawbot = the project manager

Your clawbot (OpenClaw) orchestrates the pipeline. It doesn't write code. It manages phases, launches Codex CLI sessions, parses structured verdicts, extracts issue ledgers, carries context between steps, and escalates when things go wrong. Separate Codex sessions do the actual code reading and writing. This separation prevents the orchestrator from hallucinating code and gives the coding agent full sandbox access.

What your clawbot sets up

When you point your clawbot at docs/framework.md, it reads Section 4 (Step-by-Step Setup Checklist) and executes it automatically:

Verifies prerequisites — Git, jq, Codex CLI installed and authenticated
Creates the pipeline directory — prompts/, schemas/, state/, logs/, inbox/
Generates config.yaml — fills in your repo path, build commands, branch strategy, model
Copies prompt and schema files into place
Generates PIPELINE-SPEC.md — the operational runbook with your actual paths filled in
Wires into its own memory — adds pipeline awareness to AGENTS.md and heartbeat recovery to HEARTBEAT.md

After setup, your clawbot knows how to run the pipeline on any working doc you give it. It also knows how to resume a stalled pipeline after a context reset (via the heartbeat recovery system).

The pipeline

  Working Doc (you write this)
          |
    [1] Setup ──────────── branch created, state.json initialized
          |
    [2] Spec Gen ───────── Codex reads codebase + working doc → implementation spec
          |
    [3] Spec Audit Loop ── Codex audits spec → verdict → fix → re-audit (up to N rounds)
          |
    [4] Implementation ─── Codex implements the spec → actual code
          |
    [4.5] Build Verify ─── build + lint → fix if broken
          |
    [5] Code Audit Loop ── Codex audits code → verdict → fix → re-audit (up to N rounds)
          |
    [6] Finalize ───────── commit, push, summary → you review and merge

Phase 1-2: Spec generation. Codex reads your codebase + working doc and produces an implementation spec. The spec prompt structurally constrains every common LLM failure mode — drift, incompleteness, pattern invention, missing error handling.

Phase 3: Spec audit loop. A separate Codex session audits the spec against the working doc and actual codebase. Produces a structured JSON verdict: IMPLEMENTABLE, NOT_IMPLEMENTABLE_YET, or ESCALATE. If not implementable, your clawbot extracts the issue ledger, feeds it to the spec writer for targeted fixes, then re-audits. Up to N iterations, converging via the ledger system.

Phase 4-4.5: Implementation + build verification. Codex implements the spec end-to-end. Build and lint run automatically. If they fail, the agent gets the errors and fixes them.

Phase 5: Code audit loop. Another separate session audits the actual implementation against the spec. Same convergence system — structured verdicts, issue ledgers, targeted fixes, re-audits. Catches spec deviations, bugs, missing edge cases, security issues.

Phase 6: Finalize. Commit, push, generate summary report, notify you. You review the branch and merge.

🔑 Issue ledger convergence

This is the core innovation — the thing that makes the audit loops actually work.

Without ledgers, auditors find new tangential issues every iteration and the loop never converges (death spiral). With ledgers:

After each audit, issues are extracted into a plain-text ledger with stable IDs
The ledger is fed to the fix step — fix exactly these issues, nothing more
The ledger is fed to the re-audit step — verify these specific issues, don't expand scope
Re-audit marks each as resolved, unresolved, or new
new only counts if the fix introduced it — no scope creep, no tangents

The loop converges because both sides are constrained to the same issue set.

💾 Recovery

Everything is on disk. state.json tracks phase, thread IDs, iteration counts, verdicts, and a resume_hint — a one-sentence orientation for the next session. If your clawbot's context resets at any point (compaction, sleep, crash), the next session reads the state file and picks up exactly where it left off. Wire it into your clawbot's heartbeat and recovery is automatic.

📦 What's in this repo

clawcode/
├── README.md                          # You're here
├── docs/
│   └── framework.md                   # THE FILE — your clawbot reads this to bootstrap everything
│                                      #   Complete, self-contained, 2000+ lines
│                                      #   Setup checklist, phase commands, prompts, schemas — all in one
├── prompts/
│   ├── prompt0-working-doc.md         # Interactive working doc creation helper
│   ├── prompt1-spec-writer.md         # Spec generation prompt
│   ├── prompt2-spec-auditor.md        # Spec audit prompt
│   ├── prompt3-post-audit.md          # Implementation audit prompt
│   └── implementation-primer.md       # Implementation constraints & safeguards
├── schemas/
│   ├── spec-audit-verdict.json        # Structured output: spec verdicts
│   └── impl-audit-verdict.json        # Structured output: code verdicts
├── templates/
│   ├── config.yaml                    # Pipeline configuration template
│   └── PIPELINE-SPEC.md              # Operational runbook template
└── examples/
    └── working-doc-example.md         # Example working doc input

Note: The prompt files, schemas, and templates are pre-extracted for reference. But docs/framework.md contains everything — your clawbot only needs that one file to set up and run the entire pipeline.

🔀 Adapt it to your stack

The pipeline is language-agnostic and project-agnostic. What you customize:

What	Where
Build/lint/test commands	`templates/config.yaml`
Base branch, branch prefix	`templates/config.yaml`
Sandbox restrictions, env quirks	`prompts/implementation-primer.md`
Coding agent CLI commands	Shell commands in `docs/framework.md` Section 7

Works with Node, Python, Rust, Go, Java — anything with a build step. The core architecture is universal:

Phase-gate pipeline (spec → audit → implement → audit → ship)
Issue ledger convergence (the death spiral fix)
Structured verdict schemas with severity gating
Disk-based state management
Heartbeat recovery

🩹 Lessons from production

Battle-tested on a real codebase. Save yourself the debugging time.

exec resume doesn't support -o or --output-schema. Include schemas inline and tell the agent to write files via shell commands. This will bite you first.
Without issue ledgers and clear pass in prompts, audits death-spiral. The auditor finds new tangential issues every round instead of verifying fixes. The ledger system and a clear rubic fixes this. Biggest lesson learned.
Always demand the complete updated spec on revisions. If you say "fix issues," the AI returns a patch. Explicitly demand the full end-to-end spec every time.
Codify your environment constraints. If port binding is blocked or certain commands fail in your sandbox, put it in implementation-primer.md. Otherwise the coding agent discovers it the hard way — wasting an entire iteration.
State is king. If it's not in state.json, it didn't happen. Your future self (after a context reset) depends entirely on what's on disk.
First run will have issues. That's expected. Update your primer and runbook after each run. It gets better fast.

Origin

Built by @OlypsisAli while shipping features on Thesis, Feb 2026.

Born out of the gap between "AI can write code" and "AI can ship production features." ClawCode is the bridge.

License

MIT — use it, fork it, adapt it, ship with it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ClawCode

🚀 Quickstart

Recommended stack

🧠 The bigger picture

💥 Why I built this

⚙️ How it works

Your clawbot = the project manager

What your clawbot sets up

The pipeline

🔑 Issue ledger convergence

💾 Recovery

📦 What's in this repo

🔀 Adapt it to your stack

🩹 Lessons from production

Origin

License

About

Uh oh!

Releases

Packages

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
docs		docs
examples		examples
prompts		prompts
schemas		schemas
templates		templates
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

License

OlypsisAli/ClawCode

Folders and files

Latest commit

History

Repository files navigation

ClawCode

🚀 Quickstart

Recommended stack

🧠 The bigger picture

💥 Why I built this

⚙️ How it works

Your clawbot = the project manager

What your clawbot sets up

The pipeline

🔑 Issue ledger convergence

💾 Recovery

📦 What's in this repo

🔀 Adapt it to your stack

🩹 Lessons from production

Origin

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages