Skip to content

harness implementation plan

Aryan Iyappan edited this page Apr 28, 2026 · 2 revisions

title: Harness Implementation Plan category: projects tags: [harness, ultimate-pi, implementation, architecture] status: developing created: 2026-04-28 updated: 2026-04-28 sources:


Harness Implementation Plan

Overview

The agentic harness is an 8-layer mandatory pipeline for the ultimate-pi project. Every task flows through all 8 layers. Build order: schemas → memory → spec hardening → planner → execution+grounding → QA → critics → observability → Archon workflow.

For the full concept, see agentic-harness.

Build Phases

Phase What Files
0 Foundation lib/harness-schemas.ts, lib/harness-config.ts, .pi/harness.example.json
1 Memory (L6) Install claude-obsidian skills, scaffold vault Mode B, extensions/harness-knowledge-base.ts
2 Spec Hardening (L1) lib/harness-spec.ts, extensions/harness-spec.ts
3 Planning (L2) lib/harness-planner.ts, extensions/harness-planner.ts
4 Execution+Grounding (L3) lib/harness-executor.ts, extensions/harness-executor.ts
5 Automated QA (L4) lib/harness-qa.ts, extensions/harness-qa.ts
6 Critics (L5) lib/harness-critics.ts, extensions/harness-critics.ts
7 Observability (L6) lib/harness-observability.ts, extensions/harness-observability.ts
8 Archon Integration (L7) .archon/workflows/*.yaml, .archon/commands/harness.md
9 Package integration Update package.json, README, PLAN.md

Key Architecture Decisions

  • ADR-008 (Black-Box QA): Tests from spec only, never from implementation — see adversarial-verification
  • ADR-009 (claude-obsidian Mode B): Replaces Vectra + embeddings with LLM-native search — see persistent-memory

Shared Schemas

All inter-layer data contracts in lib/harness-schemas.ts:

  • HardenedSpec — Layer 1 output (success criteria, anti-criteria, ambiguities, scope)
  • ExecutionPlan / PlanNode — Layer 2 output (DAG with dependencies, risk, verification)
  • GroundingCheckpoint — Layer 3 checkpoints (spec version, drift, state hash)
  • SubtaskResult — Layer 3 output
  • TestSuite / QAResult — Layer 4 output (spec-only test cases, coverage, verdict)
  • CriticReview / CriticFailure — Layer 5 output (verdict, failures, severity)
  • ObservabilitySpec — Layer 6 output (metrics, health checks, alerts)
  • WikiEntryType — Layer 7+8 wiki categories
  • AgentCapability / TaskRouting — Layer 7 orchestration

Config Schema

lib/harness-config.ts — all tunable parameters. No enable/disable for layers. The harness is always on when installed.

Default config in .pi/harness.json.

Token Budget

Layer Tokens/subtask
Spec hardening ~2,000
Planning + review ~5,000
Grounding checkpoints ~500
Automated QA (test gen + run) ~3,500
Critics (2 focus areas) ~4,000
Observability ~1,500
Memory writes ~500 (standard) / ~1,500 (deep)
Total per subtask ~17,500

Typical 5-subtask plan: ~83,500 tokens overhead + coding tokens.

Risk Surface

Major risks and mitigations:

  • AI hallucinates ambiguitymax_ambiguity_retries cap; human force-approve via approve-spec
  • Invalid plan DAG → Automated cycle detection + validation
  • QA tests don't compile → Fallback to schema-only validation if runner fails
  • QA test writer leaks implementation → Architectural enforcement: prompt never includes implementation; spec_only is immutable
  • Critic false positivesconditional_pass doesn't block; max_attack_rounds caps rework
  • Memory grows unboundedmax_entries config; wiki-lint detects orphans; failures never evicted
  • Archon runtime dependency → Harness normalizes terminal states; compensation/resume in own state machine
  • claude-obsidian is external → Pin skill versions; skills are ~50KB text, vendorable
  • LLM-native search cost → hot.md short-circuits most queries at ~500 tokens; deep queries optional

Verification Criteria

Each phase complete when:

  1. TypeScript compiles (npm run check:ts)
  2. Extension loads in pi.dev
  3. Unit tests pass for core logic
  4. Integration test: minimal task through the layer in isolation
  5. Phase 5: generated tests pass; spec-only constraint verified
  6. Phase 8: archon run harness-pipeline completes end-to-end
  7. Workflow status round-trips through Archon JSON

Clone this wiki locally