Merged
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -9,7 +9,7 @@
{
"name": "autodev",
"description": "Autonomous development workflow skills for coding agents",
- "version": "6.1.5",
+ "version": "6.2.0",
"source": "./",
"author": {
"name": "Jon Langevin",
2 changes: 1 addition & 1 deletion .claude-plugin/plugin.json
@@ -1,7 +1,7 @@
{
"name": "autodev",
"description": "Autonomous development workflow skills for coding agents: design, review, planning, execution, monitoring, and retrospectives",
- "version": "6.1.5",
+ "version": "6.2.0",
"author": {
"name": "Jon Langevin",
"email": "jon@gocodealone.com"
2 changes: 1 addition & 1 deletion .cursor-plugin/plugin.json
@@ -2,7 +2,7 @@
"name": "autodev",
"displayName": "Autonomous Dev Kit",
"description": "Autonomous development workflow skills for coding agents",
- "version": "6.1.5",
+ "version": "6.2.0",
"author": {
"name": "Jon Langevin",
"email": "jon@gocodealone.com"
1 change: 1 addition & 0 deletions README.md
@@ -196,6 +196,7 @@ adversarial review challenges it explicitly.

**Testing**
- **test-driven-development** - RED-GREEN-REFACTOR cycle (includes testing anti-patterns reference)
- **demonstration-fidelity** - A demo/example/showcase must execute the real artifact — no reimplementation, hard-coded output, or different-language fake

**Debugging**
- **systematic-debugging** - 4-phase root cause process (includes root-cause-tracing, defense-in-depth, condition-based-waiting techniques)
9 changes: 9 additions & 0 deletions RELEASE-NOTES.md
@@ -1,5 +1,14 @@
# Autonomous Dev Kit Release Notes

## v6.2.0 — 2026-05-29

New skill **demonstration-fidelity** + an advisory write-time hook, closing a verification-theater gap: an agent writes real code, then "demonstrates" it with a demo that never executes the real artifact — reimplementing the logic, hard-coding the output, or rewriting it in another language. The demo proves nothing yet is presented as proof.

- **`skills/demonstration-fidelity/SKILL.md`** (host-neutral, load-bearing on every harness): a demonstration MUST execute the real artifact and show output produced by that run. Forbids reimplementation, hard-coded output, stubbing the artifact-under-demonstration, and detached prototypes — regardless of language. Allows substituting a *dependency* at a real interface seam **with disclosure**. Establishes "fidelity, not language sameness" (a real cross-language client crossing a real interface is valid), a 3-question fidelity test, a fake-vs-faithful example, and a rationalization table seeded from RED-baseline transcripts.
- **`hooks/pretool-demo-fidelity-guard`** (advisory, NEVER blocks; Claude + Codex + Cursor via `hooks.json`): on a Write/Edit to a demo-like path, injects a fidelity reminder pointing at the skill. Heuristic is anchored to path *segments* (`demos`/`examples`) + basename prefixes (`demo*`/`example*`/`showcase*`/`quickstart*`) with segment/suffix exclusions (`test`/`spec`/`testdata`/`fixtures`/`vendor` segments, `*_test.*`/`*.spec.*` basenames) — so `example_test.go`/`testdata/` are skipped while `examples/latest-feature-demo.py` still fires. Session dedup keyed on `basename(transcript_path)`; fails **open** (fires) on state I/O failure; honors `SUPERPOWERS_HOOKS_DISABLE=1`.
- **Pipeline wiring:** new `runtime-launch-validation` "Demonstration / example / showcase" change-class row (carving out artifact-stub-forbidden vs. disclosed-dependency-seam-allowed so it does not contradict RLV's "no stub on either end"); a `verification-before-completion` `demo/example works` claim-matrix row; a `finishing-a-development-branch` Step 1b demo note; `using-autodev` cross-cutting listing; README + `tests/cross-llm-coverage.md` rows.
- **Tests:** 22 `tests/hook-contracts.sh` assertions for the new guard (fires/silent/excluded/dedup/fail-open/disable-env/malformed-stdin/never-blocks). Skill is host-neutral (`skill-content-grep.sh`) and cross-refs resolve (`skill-cross-refs.sh`).
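
The path heuristic in the guard bullet above can be sketched in Python as a reading aid. This is a minimal illustration, not the shipped hook (which is a bash script): the function name `is_demo_like` and the constant names are hypothetical, while the segment and glob lists are copied from `hooks/pretool-demo-fidelity-guard`.

```python
import fnmatch

# Lists mirrored from the bash hook; the names below are illustrative only.
EXCLUDED_SEGMENTS = {
    "test", "tests", "spec", "specs", "testdata",
    "fixtures", "vendor", "node_modules", ".git",
}
EXCLUDED_BASENAME_GLOBS = ("*_test.*", "*.test.*", "*.spec.*", "*_spec.*")
TRIGGER_SEGMENTS = {"demos", "examples"}
TRIGGER_PREFIXES = ("demo", "example", "showcase", "quickstart")


def is_demo_like(path: str) -> bool:
    """Return True when a write to `path` would trigger the advisory reminder."""
    lowered = path.lower()            # match case-insensitively
    segments = lowered.split("/")     # exact path segments, never substrings
    base = segments[-1]

    # Exclusions win over triggers.
    if any(seg in EXCLUDED_SEGMENTS for seg in segments):
        return False
    if any(fnmatch.fnmatchcase(base, glob) for glob in EXCLUDED_BASENAME_GLOBS):
        return False

    # Trigger on an exact demos/examples segment, else on a basename prefix.
    if any(seg in TRIGGER_SEGMENTS for seg in segments):
        return True
    return base.startswith(TRIGGER_PREFIXES)


if __name__ == "__main__":
    for candidate in (
        "pkg/example_test.go",
        "testdata/example.json",
        "examples/latest-feature-demo.py",
    ):
        print(candidate, is_demo_like(candidate))
```

Because matching is done on whole segments, a directory such as `contest/` or a file such as `latest-feature-demo.py` is not caught by the `test` exclusion.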

## v6.1.5 — 2026-05-28

SessionStart time-based dedup as defense in depth.
363 changes: 363 additions & 0 deletions docs/plans/2026-05-29-demonstration-fidelity-design.md

Large diffs are not rendered by default.

288 changes: 288 additions & 0 deletions docs/plans/2026-05-29-demonstration-fidelity.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/plans/2026-05-29-demonstration-fidelity.md.scope-lock
@@ -0,0 +1 @@
661d9faf234fdc5e4c8e2de72c6e4db95a0af91c49c47334dd5580ee079cc00f
10 changes: 10 additions & 0 deletions hooks/hooks.json
@@ -32,6 +32,16 @@
"timeout": 10
}
]
},
{
"matcher": "Write|Edit|MultiEdit",
"hooks": [
{
"type": "command",
"command": "\"${CLAUDE_PLUGIN_ROOT}/hooks/run-hook.cmd\" pretool-demo-fidelity-guard",
"timeout": 10
}
]
}
],
"PostToolUse": [
129 changes: 129 additions & 0 deletions hooks/pretool-demo-fidelity-guard
@@ -0,0 +1,129 @@
#!/usr/bin/env bash
# hooks/pretool-demo-fidelity-guard
# PreToolUse hook (advisory, NEVER blocks): when an agent is about to write a
# demonstration/example artifact, remind it that a demo MUST execute the real
# artifact — no reimplementation, no hard-coded output, no stubbing the thing
# being demonstrated. Substituting a *dependency* at a real interface seam is
# allowed only if disclosed. See skills/demonstration-fidelity/SKILL.md.
#
# This is a best-effort, filename-detectable nudge only. The load-bearing
# defense is the demonstration-fidelity skill (which covers inline / README /
# normally-named / cross-language demos) plus the verification-before-completion
# "demo/example works" claim-matrix row. This hook never blocks and never reads
# file contents.
#
# Detection is anchored to path SEGMENTS plus basename globs (prefix globs to
# trigger, suffix globs to exclude), never bare substrings, so demos named
# `latest`/`contest`/`attestation`/`inspector`/`spectrum` are not wrongly
# excluded while `example_test.go`/`testdata/`/`vendor/` are.
#
# Global opt-out: set SUPERPOWERS_HOOKS_DISABLE=1

set -euo pipefail

[ "${SUPERPOWERS_HOOKS_DISABLE:-}" = "1" ] && exit 0

# Require stdin (PreToolUse always sends a JSON payload).
[ -t 0 ] && exit 0
command -v jq >/dev/null 2>&1 || exit 0

hook_input=$(cat || true)
[ -z "$hook_input" ] && exit 0

tool_name=$(printf '%s' "$hook_input" | jq -r '.tool_name // empty' 2>/dev/null || true)
case "$tool_name" in
  Write|Edit|MultiEdit) ;;
  *) exit 0 ;;
esac

file_path=$(printf '%s' "$hook_input" | jq -r '.tool_input.file_path // empty' 2>/dev/null || true)
[ -z "$file_path" ] && exit 0

# Lowercase so Examples/, Demo*, Showcase match case-insensitively.
lc_path=$(printf '%s' "$file_path" | tr '[:upper:]' '[:lower:]')
base=${lc_path##*/}

# Split into path segments.
IFS='/' read -r -a segs <<< "$lc_path" || true

# ── Exclusion (segment-exact + basename suffix globs; never bare substrings) ─
# "${segs[@]:-}" form so an (unreachable) empty array can't trip `set -u` under
# bash 3.2 (macOS system bash, which run-hook.cmd may exec).
for seg in "${segs[@]:-}"; do
  case "$seg" in
    test|tests|spec|specs|testdata|fixtures|vendor|node_modules|.git) exit 0 ;;
  esac
done
case "$base" in
  *_test.*|*.test.*|*.spec.*|*_spec.*) exit 0 ;;
esac

# ── Trigger (segment-exact demos/examples, or basename prefix) ───────────────
fire=0
for seg in "${segs[@]:-}"; do
  case "$seg" in
    demos|examples) fire=1; break ;;
  esac
done
if [ "$fire" -eq 0 ]; then
  case "$base" in
    demo*|example*|showcase*|quickstart*) fire=1 ;;
  esac
fi
[ "$fire" -eq 0 ] && exit 0

# ── Session-scoped dedup ─────────────────────────────────────────────────────
# PreToolUse payloads carry transcript_path, NOT session_id (cf.
# hooks/pre-tool-scope-guard); derive the session key the same way.
transcript_path=$(printf '%s' "$hook_input" | jq -r '.transcript_path // empty' 2>/dev/null || true)
session_key=""
[ -n "$transcript_path" ] && session_key=$(basename "$transcript_path" 2>/dev/null || echo "")

cwd_dir=$(printf '%s' "$hook_input" | jq -r '.cwd // empty' 2>/dev/null || true)
[ -z "$cwd_dir" ] && cwd_dir="${PWD}"

dedup_key=""
if command -v sha256sum >/dev/null 2>&1; then
  dedup_key=$(printf '%s' "${session_key}:${file_path}" | sha256sum 2>/dev/null | cut -d' ' -f1 || true)
elif command -v shasum >/dev/null 2>&1; then
  dedup_key=$(printf '%s' "${session_key}:${file_path}" | shasum -a 256 2>/dev/null | cut -d' ' -f1 || true)
fi

state_dir="${cwd_dir}/.claude/autodev-state"
state_file="${state_dir}/demo-fidelity-seen"

if [ -n "$dedup_key" ]; then
  # Already nudged this session for this path → stay silent.
  # (grep used as an `if` condition; errexit does not trip on conditions, so it
  # is NOT wrapped in `|| true`; wrapping it would force the condition true.)
  if [ -f "$state_file" ] && grep -qxF "$dedup_key" "$state_file" 2>/dev/null; then
    exit 0
  fi
  # Record. Fail-OPEN: any state I/O failure must fall through to EMIT, never
  # suppress. Guarded so `set -euo pipefail` cannot fail-CLOSED on an unwritable
  # state dir, and so a failed `>>` redirection cannot leak to stderr (the
  # group redirect applies before the inner append is attempted).
  if mkdir -p "$state_dir" 2>/dev/null; then
    { printf '%s\n' "$dedup_key" >> "$state_file"; } 2>/dev/null || true
  fi
fi

reminder=$(cat <<'REMINDER'
<IMPORTANT>
You appear to be writing a demonstration/example artifact. A demo MUST execute the
real artifact and show its actual output. Do NOT reimplement the logic, hard-code
the output, or stub the thing being demonstrated. Substituting a *dependency* at a
real interface seam is allowed only if disclosed. See autodev:demonstration-fidelity.
</IMPORTANT>
REMINDER
)

emit_additional_context() {
  local event_name="$1"
  local context="$2"
  jq -n --arg event "$event_name" --arg context "$context" \
    '{hookSpecificOutput:{hookEventName:$event,additionalContext:$context}}'
}

emit_additional_context "PreToolUse" "$reminder"
exit 0
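
Review note: the session dedup and fail-open behaviour of the script above can be restated as a short Python sketch. It is an illustration under assumptions, not part of the hook: `dedup_key` and `should_emit` are hypothetical names, and only the key format (`session_key:file_path` hashed with SHA-256) and the state-file layout are taken from the script.

```python
import hashlib
import os
import tempfile


def dedup_key(session_key: str, file_path: str) -> str:
    """SHA-256 of 'session_key:file_path', matching the hook's key format."""
    return hashlib.sha256(f"{session_key}:{file_path}".encode("utf-8")).hexdigest()


def should_emit(state_file: str, session_key: str, file_path: str) -> bool:
    """Return False only when this session was already nudged for this path."""
    key = dedup_key(session_key, file_path)

    # Already recorded: stay silent.
    try:
        with open(state_file, encoding="utf-8") as fh:
            if key in (line.rstrip("\n") for line in fh):
                return False
    except OSError:
        pass  # missing or unreadable state file: fall through and emit

    # Record the key. Any I/O failure is swallowed so the reminder still fires.
    try:
        os.makedirs(os.path.dirname(state_file) or ".", exist_ok=True)
        with open(state_file, "a", encoding="utf-8") as fh:
            fh.write(key + "\n")
    except OSError:
        pass  # fail open: never suppress because state could not be written

    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = os.path.join(tmp, ".claude", "autodev-state", "demo-fidelity-seen")
        print(should_emit(state, "abc.jsonl", "examples/demo.py"))  # True: first nudge
        print(should_emit(state, "abc.jsonl", "examples/demo.py"))  # False: deduplicated
```

The design choice worth noting is the direction of failure: an unwritable state directory produces repeated reminders rather than a missing one.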
92 changes: 92 additions & 0 deletions skills/demonstration-fidelity/SKILL.md
@@ -0,0 +1,92 @@
---
name: demonstration-fidelity
description: Use when creating a demo, example, quickstart, showcase, sample, or any artifact meant to prove an implementation works — before writing it. Triggers when about to "show it working", build a proof-of-concept, or generate sample output, especially under time pressure or when the real code is awkward to run. Catches fake demos that reimplement the logic, hard-code the output, or rewrite it in another language instead of executing the real artifact.
---

> Condensed format: load `autodev:condensed-pipeline-writing` to expand shorthand.

# Demonstration Fidelity

## Iron Law

**A demonstration must execute the real artifact, and the output it shows must be produced by that execution.**

A demo, example, quickstart, showcase, screenshot, or "here's it working" proof is a *claim that the code works*. If it doesn't run the real code, the claim is fabricated — however convincing the output looks. This operationalizes `autodev:verification-before-completion` for demo artifacts and is a sibling of `autodev:runtime-launch-validation`.

## Forbidden — regardless of language

- **Reimplementation** — re-coding the artifact's logic in the demo instead of calling it.
- **Hard-coded output** — hand-authoring "expected" output and presenting it as produced output.
- **Stubbing the artifact-under-demonstration** — wiring the demo to a fake *in place of the thing you are demonstrating*.
- **Detached prototype** — a parallel throwaway instead of the shipped entry point.

These prove nothing. They are fake code.

## Allowed — with disclosure

Substituting a **dependency** at a **real interface seam** (data store, external service, clock) so the demo runs locally — **provided** the artifact's own code path runs unchanged (you stubbed a *dependency*, not the artifact) **and** you state it plainly ("data source is an in-memory fixture; the handler is the real one"). This is the `autodev:runtime-launch-validation` posture (ephemeral DB row; Fall-back section). Disclosed seam-substitution is honest; faking the artifact is not.
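
A minimal Python sketch of a disclosed seam substitution follows. All names here (`UserStore`, `greet_handler`, `InMemoryUserStore`, `run_demo`) are hypothetical. In an actual demo the handler would be imported from the shipped package; it is defined inline only so that the sketch is self-contained.

```python
from typing import Dict, Optional, Protocol


class UserStore(Protocol):
    """The interface seam: the handler only ever talks to this."""

    def get_name(self, user_id: int) -> Optional[str]: ...


def greet_handler(store: UserStore, user_id: int) -> str:
    """Stand-in for the artifact under demonstration (would be imported)."""
    name = store.get_name(user_id)
    return f"Hello, {name}!" if name is not None else "Unknown user"


class InMemoryUserStore:
    """Substituted dependency: satisfies UserStore without a database."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self._rows = dict(rows)

    def get_name(self, user_id: int) -> Optional[str]:
        return self._rows.get(user_id)


def run_demo() -> str:
    store = InMemoryUserStore({1: "Ada"})
    output = greet_handler(store, 1)  # the handler's own code path runs unchanged
    disclosure = "NOTE: data source is an in-memory fixture; the handler is the real one."
    return f"{disclosure}\n{output}"


if __name__ == "__main__":
    print(run_demo())
```

The substitution sits behind the `UserStore` interface, the handler is called rather than copied, and the disclosure travels with the output.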

## Fidelity, not language sameness

Cross-language is **not** the crime. A real client in another language crossing a **real interface** into the running artifact — e.g. a Python client making real HTTP calls to a running Go service — is valid, as long as that crossing is exercised (no stub on either end of *that* boundary). The question is always **"did the real code run to produce this output?"** — never "same language?".

## The 3-question fidelity test

1. **Execution:** does the demo call/import/invoke the real artifact — not a copy of it?
2. **Provenance:** was every value shown produced by that run and captured — not typed by you?
3. **Seams:** if you substituted anything, was it a *dependency* (not the artifact), and did you disclose it?

Any "no" → the demo is fake. Fix it before presenting.
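
The three questions can also be written down as a small checklist, sketched here in Python. This is only an illustration of the rule that any "no" fails the demo; the type and function names (`Substitution`, `DemoReview`, `failed_questions`) are hypothetical and not part of the skill.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Substitution:
    target: str          # what was replaced, e.g. "database"
    is_dependency: bool  # False means the artifact itself was replaced
    disclosed: bool      # stated plainly alongside the demo


@dataclass(frozen=True)
class DemoReview:
    executes_real_artifact: bool    # question 1: Execution
    output_captured_from_run: bool  # question 2: Provenance
    substitutions: List[Substitution] = field(default_factory=list)


def failed_questions(review: DemoReview) -> List[str]:
    """Return the names of the questions answered 'no'; empty means faithful."""
    failures = []
    if not review.executes_real_artifact:
        failures.append("Execution")
    if not review.output_captured_from_run:
        failures.append("Provenance")
    # Question 3: every substitution must be a disclosed dependency.
    if any(not (s.is_dependency and s.disclosed) for s in review.substitutions):
        failures.append("Seams")
    return failures


if __name__ == "__main__":
    review = DemoReview(True, True, [Substitution("database", True, False)])
    print(failed_questions(review))  # ['Seams']: the substitution was not disclosed
```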

## Example — fake vs. faithful

Artifact: Go `text.Dedupe(s string) string`.

**Fake** (different language, hard-coded — proves nothing):

```python
# demo.py — DO NOT DO THIS
print("BEFORE:\n a\n a\n b")
print("AFTER:\n a\n b") # hand-typed; Dedupe never ran
```

**Faithful** (runs the real function, prints its real return value):

```go
// demo/main.go
package main

import (
	"fmt"

	"example.com/app/text"
)

func main() {
	in := "a\n a\n b"
	fmt.Printf("AFTER:\n%s\n", text.Dedupe(in)) // real output, captured by running it
}
```

If the module tooling is awkward, sidestep the *tooling* (throwaway module, ephemeral dependency) — never sidestep *execution*.

## Rationalizations — STOP

| Excuse | Reality |
|---|---|
| "Build/DB tooling is finicky — I'll just print the expected output." | Sidestep the tooling, not the execution. A throwaway module / in-memory dependency runs the real code; printed literals run nothing. |
| "A hard-coded demo looks identical on screen." | Looking identical is the trap. The value of a demo is that the real code produced it. |
| "Quicker to rewrite it in Python/bash for the demo." | Fine only if that script actually calls/crosses into the real artifact. A script printing literals is fake in any language. |
| "The real thing needs a DB/service I can't stand up." | Substitute the *dependency* at a real seam and disclose it; run the real artifact. Never fake the artifact. |
| "It's just for the meeting / illustrative." | A demo presented as proof is a claim — `autodev:verification-before-completion` applies. |
| "I simplified the logic for clarity." | A simplified reimplementation is a different program. Demo the real one. |

## Red flags

- The demo imports nothing from the module under demonstration.
- You typed or pasted the "output" instead of capturing a run.
- The demo is in another language and never crosses a real interface into the artifact.
- "Simulated" / "for demonstration purposes" / "pretend" appears in the demo.
- You have not actually run it and watched the output.

## See also

- `autodev:verification-before-completion` — evidence before any "works/done" claim (its claim matrix has a `demo/example works` row).
- `autodev:runtime-launch-validation` — launch the built artifact; its "Demonstration / example / showcase" change-class row points here.
- `autodev:scope-lock` — "there is no demo mode" for *partial scope* (distinct from fidelity).
2 changes: 2 additions & 0 deletions skills/finishing-a-development-branch/SKILL.md
@@ -112,6 +112,8 @@ If NOT triggered (pure logic refactor, doc-only, test-only): skip this step.

**The launch transcript is required in the PR body when this step triggers.** Without it, the PR is not ready for merge — even if all unit tests pass.

**Demonstration artifacts:** if the change ships any demo/example/showcase/quickstart artifact (in this diff or the PR body), `autodev:demonstration-fidelity` applies — confirm the demo executes the real artifact (no reimplementation, hard-coded output, or different-language fake) before merge.

### Step 1c: Version-Skew Audit (conditional)

**Trigger:** the diff updates a non-dev-only version pin (any "version: vX.Y.Z", "image: foo:vX.Y.Z", or `<package>@vX.Y.Z`) — excludes dev-only tooling pins (linters, formatters) where skew is generally benign.
4 changes: 4 additions & 0 deletions skills/runtime-launch-validation/SKILL.md
@@ -45,6 +45,9 @@ Triggered NOT by:
| Library / SDK | Import into a tiny consumer program, exercise the new public surface | Output, behavior matches docs |
| Plugin / extension | Load it into the host application, exercise a representative call | Host doesn't crash on load; representative call returns |
| Interface boundary change (new method, field, event type, or hook — see `agents/boundary-classes.md` for the canonical boundary-class list) | Launch both sides/participants as applicable; exercise a real interaction across the boundary — not a mock or stub on either end | The receiving side correctly processes the new data/method/event/hook; no fallback silently swallows the new path; failure-signature scrape clean on all participating sides |
| Demonstration / example / showcase artifact (anything built to show a change working) | The real artifact, invoked through its real entry point; capture output from that run | Output is produced by the real code path, not literals; the artifact-under-demonstration is NOT stubbed; any substituted *dependency* sits behind a real interface seam and is disclosed. See `autodev:demonstration-fidelity`. |

When a demonstration *also* exercises a new boundary, both this row and the "Interface boundary change" row apply: stub neither the artifact nor the boundary under test — only a disclosed *dependency* behind the artifact may be substituted.

## Failure-signature scrape

@@ -95,6 +98,7 @@ The constraint is not an excuse to skip; it's a request for help.
## See also

- `skills/verification-before-completion/SKILL.md` — general evidence-before-assertion principle
- `autodev:demonstration-fidelity` — demo/example/showcase artifacts must execute the real artifact (the "Demonstration" change-class row above)
- `skills/finishing-a-development-branch/SKILL.md` — Step 1b invokes this skill
- `skills/writing-plans/SKILL.md` — related planning guidance for per-change-class verification
- `agents/boundary-classes.md` — canonical definition of interface boundary classes (producer→consumer, caller→callee, sender→handler, plugin→host)
2 changes: 1 addition & 1 deletion skills/using-autodev/SKILL.md
@@ -83,7 +83,7 @@ When multiple skills could apply, use this order:
3. **Pipeline skills auto-chain** — these invoke each other automatically in the autonomous pipeline:
brainstorming → adversarial-design-review (design phase) → writing-plans → adversarial-design-review (plan phase) → alignment-check → **scope-lock** → subagent-driven-development → finishing-a-development-branch → pr-monitoring → post-merge-retrospective

Cross-cutting skills invoked from within the pipeline when conditions trigger: `project-design-guidance` (before designs/plans and during retros when durable guidance changes); `recording-decisions` (when designs/plans make non-trivial trade-offs, including user-approved manifest amendments); `scope-lock` (re-checked at every per-task checkpoint and before PR creation); `condensed-pipeline-writing` (for dense internal design/review/plan artifacts).
Cross-cutting skills invoked from within the pipeline when conditions trigger: `project-design-guidance` (before designs/plans and during retros when durable guidance changes); `recording-decisions` (when designs/plans make non-trivial trade-offs, including user-approved manifest amendments); `scope-lock` (re-checked at every per-task checkpoint and before PR creation); `condensed-pipeline-writing` (for dense internal design/review/plan artifacts); `demonstration-fidelity` (before writing any demo/example/showcase/proof artifact — it must execute the real code, not fake it).

"Let's build X" → brainstorming first, then the pipeline runs autonomously after design approval.
"Fix this bug" → debugging first, then domain-specific skills.
1 change: 1 addition & 0 deletions skills/verification-before-completion/SKILL.md
@@ -37,6 +37,7 @@ Skip step = unverified claim.
| agent completed | inspect diff + verify | agent report |
| requirements met | checklist vs plan/design | tests alone |
| lint clean (Go-repo PR) | `golangci-lint run` exit 0 | tests green alone |
| demo/example works | the real artifact executed via the demo produced the shown output (see `autodev:demonstration-fidelity`) | hand-written/hard-coded output, a reimplementation, a different-language fake |

## Red Flags
