
adversarial verification

Aryan Iyappan edited this page Apr 28, 2026 · 2 revisions

title: Adversarial Verification
category: concepts
tags: [harness, critics, testing, layer-4, quality]
status: developing
created: 2026-04-28
updated: 2026-04-28
sources:
  • "harness-implementation-plan"
related:
  • "agentic-harness"
  • "spec-hardening"
  • "agentic-harness"
layer: "Layer_4"
summary: Layer 4 runs adversarial critic agents against every subtask. Critics attack, not review. Multiple critics with different focus areas (correctness, security, performance, spec_compliance). A single critical failure blocks the subtask.
provenance:
  extracted: 0.85
  inferred: 0.15
  ambiguous: 0.0

Adversarial Verification

Origin Principle

Human code review is partly social and partly quality control. For agents, only quality control matters, and it should be much stronger. The critic agent's explicit role is to find failures. Multiple independent critics with different focus areas are better than one. This layer is mandatory: no code ships without passing adversarial review.

Critic Focus Areas

| Focus | Attack Vector | On by default |
|---|---|---|
| Correctness | Logic errors, off-by-one, unhandled edges, null derefs, type mismatches | Yes |
| Security | Injection, auth bypass, data exposure, missing input validation | No |
| Performance | N+1 queries, unbounded loops, memory leaks, blocking async | No |
| Spec compliance | Missing functionality, anti-criteria violations, scope creep | Yes |

Security and performance critics are added for risk-sensitive changes (configurable via focus_areas).
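As a minimal sketch of how `focus_areas` might select critics: the helper name `resolveFocusAreas` is hypothetical, and two behaviors are assumptions the source does not state (an empty or missing list falls back to the defaults, and unknown names are rejected).

```typescript
// The four focus areas named in the table above.
const ALL_FOCUS_AREAS = ["correctness", "security", "performance", "spec_compliance"] as const;
type FocusArea = (typeof ALL_FOCUS_AREAS)[number];

// Per the table, correctness and spec compliance are on by default.
const DEFAULT_FOCUS_AREAS: FocusArea[] = ["correctness", "spec_compliance"];

// Hypothetical helper: validate configured names, drop duplicates,
// and fall back to the defaults when nothing is configured (assumption).
function resolveFocusAreas(configured?: string[]): FocusArea[] {
  if (configured === undefined || configured.length === 0) {
    return [...DEFAULT_FOCUS_AREAS];
  }
  const resolved: FocusArea[] = [];
  for (const area of configured) {
    const known = ALL_FOCUS_AREAS.find((candidate) => candidate === area);
    if (known === undefined) {
      // Assumption: fail loudly rather than silently skip a misspelled critic.
      throw new Error(`Unknown focus area: ${area}`);
    }
    if (!resolved.includes(known)) {
      resolved.push(known);
    }
  }
  return resolved;
}
```

A risk-sensitive change would then pass something like `["correctness", "spec_compliance", "security"]` to add the security critic on top of the defaults.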

Verdict Semantics

| Verdict | Meaning | Action |
|---|---|---|
| pass | No failures found | Proceed to automated-observability |
| fail | At least one critical/major failure | Rework subtask |
| conditional_pass | Only minor issues | Proceed with logged caveats |

Failure Severity

| Severity | Blocks? | Description |
|---|---|---|
| critical | Yes | Incorrect behavior, data loss, security breach |
| major | Yes | Significant defect, must fix before proceeding |
| minor | No | Style/cosmetic or unlikely edge case |
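The verdict and severity rules above can be sketched as two small functions. The names `deriveVerdict` and `combineVerdicts` are hypothetical, and combining several critics by taking the worst verdict is an assumption, chosen because a single blocking failure is meant to block the subtask.

```typescript
type Severity = "critical" | "major" | "minor";
type Verdict = "pass" | "fail" | "conditional_pass";

interface Failure {
  severity: Severity;
  description: string;
}

// critical and major block the subtask; minor does not.
function isBlocking(severity: Severity): boolean {
  return severity === "critical" || severity === "major";
}

// One critic's verdict, derived from the failures it reported.
function deriveVerdict(failures: Failure[]): Verdict {
  if (failures.length === 0) {
    return "pass";
  }
  return failures.some((failure) => isBlocking(failure.severity))
    ? "fail"
    : "conditional_pass";
}

// Assumption: across critics, the worst individual verdict wins.
function combineVerdicts(verdicts: Verdict[]): Verdict {
  if (verdicts.includes("fail")) {
    return "fail";
  }
  if (verdicts.includes("conditional_pass")) {
    return "conditional_pass";
  }
  return "pass";
}
```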

Retry Logic

A failed subtask is reworked and then re-reviewed, for up to max_attack_rounds rounds (default: 2). If the rounds are exhausted without a passing verdict, the plan is blocked and escalated to a human.
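The retry loop might look like the sketch below. `runAttackRounds` and its callback parameters are hypothetical, and the code assumes that `max_attack_rounds` counts review rounds including the first one; the source does not say whether the first review is counted.

```typescript
type Verdict = "pass" | "fail" | "conditional_pass";

type Outcome =
  | { status: "verified"; rounds: number; verdict: Verdict }
  | { status: "escalated"; rounds: number };

// review(round) runs the critics and returns the combined verdict;
// rework(round) asks the worker agent to fix the reported failures.
function runAttackRounds(
  review: (round: number) => Verdict,
  rework: (round: number) => void,
  maxAttackRounds = 2,
): Outcome {
  for (let round = 1; round <= maxAttackRounds; round++) {
    const verdict = review(round);
    if (verdict !== "fail") {
      // pass and conditional_pass both let the subtask proceed.
      return { status: "verified", rounds: round, verdict };
    }
    if (round < maxAttackRounds) {
      // Only rework when another review round remains.
      rework(round);
    }
  }
  // Rounds exhausted: the plan is blocked and handed to a human.
  return { status: "escalated", rounds: maxAttackRounds };
}
```

The callbacks are synchronous only to keep the sketch short; a real harness would most likely await the critic and worker agents.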

AI Prompt Pattern

All critics use adversarial prompts: "Your ONLY job is to find failures. Do NOT suggest improvements. Do NOT be constructive. ATTACK."

Each critic receives the diff, the task description, and the relevant criteria. Its output is structured JSON containing a verdict and a list of failures, each with a severity, description, evidence, and remediation.
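One way to make the structured output concrete is a typed report plus a strict parser. The field names follow the list above, but their exact JSON spelling, the interface names, and the `parseCriticReport` helper are assumptions for illustration.

```typescript
type Severity = "critical" | "major" | "minor";
type Verdict = "pass" | "fail" | "conditional_pass";

interface CriticFailure {
  severity: Severity;
  description: string;
  evidence: string;
  remediation: string;
}

interface CriticReport {
  verdict: Verdict;
  failures: CriticFailure[];
}

const VERDICTS: readonly string[] = ["pass", "fail", "conditional_pass"];
const SEVERITIES: readonly string[] = ["critical", "major", "minor"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parse a critic's raw JSON output and reject anything off-schema,
// so a malformed response is never mistaken for a pass.
function parseCriticReport(raw: string): CriticReport {
  const data: unknown = JSON.parse(raw);
  if (!isRecord(data)) {
    throw new Error("report must be a JSON object");
  }
  const { verdict, failures } = data;
  if (typeof verdict !== "string" || !VERDICTS.includes(verdict)) {
    throw new Error("invalid verdict");
  }
  if (!Array.isArray(failures)) {
    throw new Error("failures must be an array");
  }
  const parsed = failures.map((item: unknown, index: number): CriticFailure => {
    if (!isRecord(item)) {
      throw new Error(`failure ${index} must be an object`);
    }
    const { severity, description, evidence, remediation } = item;
    if (typeof severity !== "string" || !SEVERITIES.includes(severity)) {
      throw new Error(`failure ${index} has an invalid severity`);
    }
    if (
      typeof description !== "string" ||
      typeof evidence !== "string" ||
      typeof remediation !== "string"
    ) {
      throw new Error(`failure ${index} is missing a text field`);
    }
    return { severity: severity as Severity, description, evidence, remediation };
  });
  return { verdict: verdict as Verdict, failures: parsed };
}
```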

Extension Interface

| Type | Name | Description |
|---|---|---|
| Event consumed | subtask_completed | Begin critic review |
| Event emitted | subtask_verified | All critics pass |
| Event emitted | subtask_failed | Critical/major failure found |
| Tool | run-critics | Run all/specified critics |
| Tool | run-critic-focus | Run single focus-area critic |
| Command | /harness-critic-status | Recent reviews, pass/fail counts |
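The event flow in this table could be wired roughly as follows. This is a sketch only: `handleSubtaskCompleted`, `eventForResults`, and the payload shape are hypothetical, and treating `conditional_pass` as verified is an assumption based on that verdict's "proceed" action.

```typescript
type Verdict = "pass" | "fail" | "conditional_pass";
type EmittedEvent = "subtask_verified" | "subtask_failed";

interface CriticResult {
  focus: string;
  verdict: Verdict;
}

interface ReviewPayload {
  subtaskId: string;
  results: CriticResult[];
}

// Any failing critic fails the subtask; otherwise it is verified.
function eventForResults(results: CriticResult[]): EmittedEvent {
  if (results.length === 0) {
    // The layer is mandatory, so "no critics ran" must not count as verified.
    throw new Error("no critic results to evaluate");
  }
  return results.some((result) => result.verdict === "fail")
    ? "subtask_failed"
    : "subtask_verified";
}

// Handler for subtask_completed: run the critics, then emit the outcome.
function handleSubtaskCompleted(
  subtaskId: string,
  runCritics: (subtaskId: string) => CriticResult[],
  emit: (eventName: EmittedEvent, payload: ReviewPayload) => void,
): void {
  const results = runCritics(subtaskId);
  emit(eventForResults(results), { subtaskId, results });
}
```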

Config

{
  "critics": {
    "focus_areas": ["correctness", "spec_compliance"],
    "max_attack_rounds": 2
  }
}
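A loader for this config might merge user settings over the documented defaults. The `loadCriticsConfig` name is hypothetical, and requiring `max_attack_rounds` to be a positive integer is an assumption rather than something the source specifies.

```typescript
interface CriticsConfig {
  focus_areas: string[];
  max_attack_rounds: number;
}

// Defaults as shown in the config above.
const DEFAULT_CRITICS_CONFIG: CriticsConfig = {
  focus_areas: ["correctness", "spec_compliance"],
  max_attack_rounds: 2,
};

// Parse the harness config JSON and fill in any missing critic settings.
function loadCriticsConfig(rawJson: string): CriticsConfig {
  const parsed = JSON.parse(rawJson) as { critics?: Partial<CriticsConfig> } | null;
  const merged: CriticsConfig = {
    ...DEFAULT_CRITICS_CONFIG,
    ...(parsed?.critics ?? {}),
  };
  // Assumption: at least one review round is required, since the layer is mandatory.
  if (!Number.isInteger(merged.max_attack_rounds) || merged.max_attack_rounds < 1) {
    throw new Error("max_attack_rounds must be a positive integer");
  }
  return merged;
}

// Example: enable the security critic for a risk-sensitive change.
const riskSensitive = loadCriticsConfig(
  '{"critics": {"focus_areas": ["correctness", "spec_compliance", "security"]}}',
);
console.log(riskSensitive.focus_areas.length, riskSensitive.max_attack_rounds);
```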

Files

  • lib/harness-critics.ts — CriticAgent class, adversarial prompt templates
  • extensions/harness-critics.ts — Extension with review routing
