Nightly 2026-04-26 — 6 productive cycles, +3 goals, fitness 79.8 → 92.7 #147

Merged
boshu2 merged 7 commits into main from nightly/2026-04-26
Apr 26, 2026

boshu2 commented Apr 26, 2026

Autonomous nightly improvement run. 6 productive cycles, fitness score 79.8 → 92.7, three goals flipped from fail → pass, no auto-reverts.

Fitness delta

| Goal | Weight | Baseline | Final | Δ |
|---|---|---|---|---|
| go-cli-tests | 8 | pass | pass | = |
| flywheel-compounding | 8 | fail | fail | = |
| go-cli-builds | 8 | pass | pass | = |
| flywheel-proof | 7 | pass | pass | = |
| wiring-closure | 7 | pass | pass | = |
| security-gate | 6 | pass | pass | = |
| go-complexity-ceiling | 6 | fail | pass | + |
| hook-preflight | 6 | pass | pass | = |
| skill-frontmatter | 6 | pass | pass | = |
| flywheel-lifecycle | 6 | pass | pass | = |
| manifest-versions-match | 5 | pass | pass | = |
| goals-validate | 5 | pass | pass | = |
| go-vet-clean | 5 | pass | pass | = |
| contract-compatibility | 5 | pass | pass | = |
| install-smoke | 5 | pass | pass | = |
| codex-parity-drift | 5 | pass | pass | = |
| compile-freshness | 4 | fail | pass | + |
| compile-no-oscillation | 4 | fail | pass | + |
| competitive-freshness | 3 | pass | pass | = |

Score: 79.81651376 → 92.66055046
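
The two scores are consistent with a simple weighted pass rate. The sketch below assumes the score is the passing weight divided by the total weight, times 100; the PR does not state the formula, so treat `score` and the `goal` type as illustrative names rather than the `ao goals measure` implementation.

```go
package main

import "fmt"

// goal mirrors one row of the fitness table above.
type goal struct {
	name            string
	weight          int
	baseline, final bool // true = pass
}

// Rows copied from the table in this PR description.
var goals = []goal{
	{"go-cli-tests", 8, true, true},
	{"flywheel-compounding", 8, false, false},
	{"go-cli-builds", 8, true, true},
	{"flywheel-proof", 7, true, true},
	{"wiring-closure", 7, true, true},
	{"security-gate", 6, true, true},
	{"go-complexity-ceiling", 6, false, true},
	{"hook-preflight", 6, true, true},
	{"skill-frontmatter", 6, true, true},
	{"flywheel-lifecycle", 6, true, true},
	{"manifest-versions-match", 5, true, true},
	{"goals-validate", 5, true, true},
	{"go-vet-clean", 5, true, true},
	{"contract-compatibility", 5, true, true},
	{"install-smoke", 5, true, true},
	{"codex-parity-drift", 5, true, true},
	{"compile-freshness", 4, false, true},
	{"compile-no-oscillation", 4, false, true},
	{"competitive-freshness", 3, true, true},
}

// score returns the passing weight as a percentage of the total weight.
// Assumption: this is how the reported score is derived; the PR does not say.
func score(passed func(goal) bool) float64 {
	total, ok := 0, 0
	for _, g := range goals {
		total += g.weight
		if passed(g) {
			ok += g.weight
		}
	}
	if total == 0 {
		return 0
	}
	return 100 * float64(ok) / float64(total)
}

func main() {
	fmt.Printf("baseline %.8f\n", score(func(g goal) bool { return g.baseline }))
	fmt.Printf("final    %.8f\n", score(func(g goal) bool { return g.final }))
}
```

Under that assumption the weights sum to 109, with 87 passing at baseline and 101 at the end, which reproduces the 79.81651376 and 92.66055046 reported above.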

Per-cycle summary

| # | Type | Target | Commit | Fitness before | Fitness after |
|---|---|---|---|---|---|
| 1 | productive | go-complexity-ceiling (w6) — split extractFromSubdir 25→14 | f5ba96cd | 79.8 | 87.2 |
| 2 | productive | flywheel-compounding (w8) diagnostic — surface σρδ + root cause | 0d02baba | 92.7* | 92.7 |
| 3 | productive | New gate: runtime queue lint for next-work.jsonl schema enums | b62fcf12 | 92.7 | 92.7 |
| 4 | productive | Refactor applyConfidenceDecayMarkdown 17→5 (CC head-room) | 85ed30f8 | 92.7 | 92.7 |
| 5 | productive | Cover flywheel compare/close-loop + allowlist function-level surface | 061bf4a2 | 92.7 | 92.7 |
| 6 | productive | L1 tests for cycle-4 helpers (parse/recent/end) | 384311b9 | 92.7 | 92.7 |

*Compile-freshness/oscillation flipped to pass between cycle 1 and 2 because Dream's defrag-preview step wrote .agents/overnight/latest/defrag/latest.json, which the gates' fallback path consumes. Both gates depend on a runtime artifact that is gitignored, so the fix does not propagate via PR — the gates remain dependent on either ao defrag or ao overnight start having run locally first. Logged as a corpus-state nuance, not a code defect.

Heaviest goal investigation: flywheel-compounding (w=8)

This goal stayed fail through the run. Documented finding (in .agents/nightly/2026-04-26/audit.md):

  • σ=0.75, ρ=0, σρ=0, δ≈0.014–0.037 across cycles → escape threshold ≈ δ/100 never reached because ρ stays at 0.
  • Root cause: all 60 entries in .agents/ao/citations.jsonl are type retrieved (score 0.5, below the 0.7 high-confidence threshold). Zero applied or reference citations exist anywhere in the corpus.
  • Tried ao feedback-loop --reward 0.85: updated 13 utility values but did NOT reclassify citation_type, so ρ stayed 0.
  • Tried ao flywheel close-loop: skipped all 15 retrieved-only citations because they have no artifact evidence.
  • Conclusion: corpus-state issue, not a code defect. Genuine sessions must record ao lookup --cite applied|reference (or programmatic high-confidence citations) during productive work. No single-cycle code change moves ρ above 0 without gaming the metric.
  • Cycle 2 partial fix: replaced the bare jq -e ... .escape_velocity_compounding gate with scripts/check-flywheel-compounding.sh so future failures show σ ρ σρ δ threshold plus the dominant root cause instead of just false. Operators now see actionable signal.
  • Quarantine recommendation: tag flywheel-compounding as a long-cycle goal in GOALS.md so /evolve does not waste cycles on it until applied/reference citations start landing.
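
To make the "ρ stays at 0" argument concrete, here is a minimal sketch of the gate condition as the audit describes it. It assumes the check is a strict comparison of σ·ρ against δ/100; the function name and the exact comparison are assumptions, not the code behind `escape_velocity_compounding`.

```go
package main

import "fmt"

// compounding reports whether the product of sigma and rho clears the
// escape threshold. Assumptions: threshold = delta / 100 and the
// comparison is strict. Neither is confirmed by the PR text.
func compounding(sigma, rho, delta float64) (product, threshold float64, ok bool) {
	product = sigma * rho
	threshold = delta / 100
	return product, threshold, product > threshold
}

func main() {
	// Values reported in the audit: sigma=0.75, rho=0, delta in 0.014..0.037.
	for _, delta := range []float64{0.014, 0.037} {
		p, t, ok := compounding(0.75, 0, delta)
		fmt.Printf("σρ=%.4f threshold=%.5f compounding=%v\n", p, t, ok)
	}
}
```

With ρ=0 the product is 0 for any σ, so no positive δ can be cleared. That is why rewarding utility values alone did not move the goal.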

Findings opened / closed / deferred

  • Opened (digest-only):
    • dream-corpus-stale — Dream's rank-1 morning packet ("Write AgentOps philosophy doc from validated flywheel thesis") is already done (docs/philosophy.md exists, last_reviewed 2026-04-12). Surfaced as meta-finding.
    • dream-corpus-stale-rank3 — Dream's rank-3 packet ("Backfill next-work queue rows to schema v1.3 and add drift validation") cited evidence (severity=moderate, source=post-mortem rows in queue) that is not present in current .agents/rpi/next-work.jsonl (66 batches, 225 items, 0 schema violations). Cycle 3 still added the preventive validator since it is real future protection.
  • Closed via implementation:
    • dream-2463d728 (next-work schema runtime validator) — closed by cycle 3.
    • extractFromSubdir CC=25 (council-finding) — closed by cycle 1.
  • Deferred / surveyed-and-stale:
    • Fix go-test-precommit.sh to use stdin JSON pattern — already correct (INPUT=$(cat) + jq), no change needed.
    • Fix validate-learning-coherence.sh frontmatter-only file — already passes (14 files checked, 0 failures).
    • Implement sections.include allowlist semantics — already implemented in applyContextFilter.
    • Add intel_scope and section-name enum validation — already implemented (validateIntelScope).
    • Document RPI_RUN_ID env var contract — already documented in docs/ENV-VARS.md.
    • Update SKILL-TIERS.md diagram terminology — already updated to "council + knowledge".
    • Add no-jq fallback tests to lib-hook-helpers.bats — already covered (lines 996, 1034 of that test file).
    • Extract buildHandoffContext field-rendering helpers — already refactored.

The high rate of stale next-work items (8+ items found stale this run) is itself a corpus-quality signal: producers (post-mortems, dreams) emit faster than consumers retire, so the queue grows with phantom work. Worth a future Dream curator pass.

Auto-reverts

None. No goal with weight ≥ 3 regressed during the run.

Quarantined goals

  • flywheel-compounding (w=8) — recommend quarantine pending applied/reference citation infrastructure (multi-session corpus growth).

Dream meta-findings

  • dream-corpus-stale (rank 1): philosophy doc already exists.
  • dream-corpus-stale-rank3 (rank 3): cited drift evidence absent from current queue. Validator still landed — preventive value, no current violations.

bd / tracker degradation notes

  • bd CLI unavailable: command -v bd returns nothing, no scripts/install-bd.sh present in repo, .beads/ directory absent. Logged in baseline notes; cycles selected from heaviest-failing-goal + next-work queue + generator scans instead.

Scope-discipline notes

  • Worktree-disposition gate fails on a nightly branch (expects main) — known false positive, ignored per spec. No silencing flag exists; not worth introducing one in this run.
  • Tag push to nightly/2026-04-26 403'd as expected; falling back to branch ref. The nightly/2026-04-26 branch on origin is the anchor for tomorrow's audit.
  • Embedded hooks/skills sync verified via pre-push-gate.sh --fast (ok).

Commits

  • f5ba96cd refactor(harvest): split extractFromSubdir to satisfy cc≤20 ceiling
  • 0d02baba gate(flywheel-compounding): surface σρδ + root cause on fail
  • b62fcf12 feat(gate): runtime queue lint for next-work.jsonl schema enums
  • 85ed30f8 refactor(inject): split applyConfidenceDecayMarkdown to drop CC 17→5
  • 061bf4a2 test(cli): cover flywheel compare/close-loop and allowlist function-level surface
  • 384311b9 test(inject): pin extracted decay helpers (parse/recent/end)

(4341d419 is a parallel-session next-work refresh that landed on the branch before cycle 1 — not part of this session's productive count.)

Validation

  • cd cli && go run ./cmd/ao autodev validate --file ../PROGRAM.md --json → valid:true
  • cd cli && go test ./cmd/ao ./internal/autodev → ok
  • bash skills/heal-skill/scripts/heal.sh --strict → All clean. No findings.
  • scripts/pre-push-gate.sh --fast → only failure is the known nightly-branch false positive (worktree disposition expects main); all 32 actual checks pass or skip.
  • Final ao goals measure --json: PASS=18, FAIL=1, SCORE=92.66.

https://claude.ai/code


Generated by Claude Code

claude added 7 commits April 26, 2026 02:16
Dream 6-iteration run added 2 new finding IDs to the queue and re-ranked
existing ones. Morning packets propose:
  - Write AgentOps philosophy doc from validated flywheel thesis
  - Audit context injection latency — are we lazy-loading everywhere we should
  - Backfill next-work queue rows to schema v1.3 and add drift validation

https://claude.ai/code/session_01MkdcbdMtkrHRVNUJdMQiRi

Move the WalkDir closure body into helpers (handleWalkErr,
classifyWalkDir, isArtifactFile, readArtifactFile) so the per-file
processing pipeline reads top-down without nested branches.

Closes the 25→14 complexity gap that go-complexity-ceiling has been
flagging on extract.go since the OpenRoot/TOCTOU close.

Move the inline `bash -c '...jq -e .escape_velocity_compounding'` gate
into scripts/check-flywheel-compounding.sh so failing runs print the
σ/ρ/σρ/δ structure plus the dominant root cause (typically ρ=0 because
sessions only record `retrieved` citations) instead of a bare `false`
from `jq -e`.

Behavior:
  PASS  prints σ=… ρ=… σρ=… δ=… (compounding) and exits 0
  FAIL  prints σ=… ρ=… σρ=… δ=… threshold=… — <hint> and exits 1

When ρ=0 the hint specifically calls out applied/reference citations as
the missing input, which is the operator-actionable fix.

Add scripts/check-next-work-schema-rows.sh + bats tests that validate
each row in .agents/rpi/next-work.jsonl against the v1.3 enum sets
(type, severity, source, claim_status). Complements
validate-next-work-contract-parity.sh, which only checks doc/runtime
parity, by catching legacy or hand-edited rows that drift from the
schema.

Closes the dream-2463d728 packet ("Backfill next-work queue rows to
schema v1.3 and add drift validation"). Current queue is clean — gate
is preventive rather than corrective.

10 bats cases cover: missing-file pass, clean-row pass, drift on each
enum field (severity/type/source/claim_status), legacy flat rows,
malformed JSON, empty lines.

Extract three helpers from applyConfidenceDecayMarkdown:
  parseConfidenceField        — confidence parse with default
  mostRecentInteraction       — pick latest of last_decay_at/last_reward_at
  writeDecayedFrontmatter     — read-modify-write of frontmatter (with
                                 findFrontmatterEnd helper)

Pulls the function under the gocyclo warning threshold (15) so the
go-complexity-ceiling gate keeps its head room as new fields land.

…evel surface

Two slices to bring the cmdao-surface-parity gate green:

1. Add executeCommand-driven cobra tests for `flywheel compare` and
   `flywheel close-loop`, and extend the parent-expectations map in
   TestCobraCommandTreeRegistration so future drift is caught.

2. Allowlist five overnight-curator subcommands and two beads
   subcommands that already have function-level _test.go coverage but
   whose tests don't include the literal "ao <cmd>" string pattern the
   parity script greps for. Each entry cites the test file and a
   specific covered function/test name.

Result: parity gate goes from 8 uncovered commands → all 159 covered
or allowlisted.

Add focused L1 tests for the helpers introduced in the
applyConfidenceDecayMarkdown CC 17→5 refactor:

- parseConfidenceField: defaults on missing/zero/negative/unparseable
- mostRecentInteraction: picks latest, ignores bad timestamps, returns
  zero on absence
- findFrontmatterEnd: returns first closing `---`, -1 when missing
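
The semantics listed above can be sketched as follows. These are illustrative reimplementations only: the signatures, the default confidence of 0.5, the RFC 3339 timestamp format, and the line-slice input are all assumptions, since the PR names the helpers but does not show them.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// defaultConfidence is a hypothetical default; the real value is not in the PR.
const defaultConfidence = 0.5

// parseConfidenceField falls back to the default when the value is missing,
// zero, negative, or unparseable. !(v > 0) also rejects NaN.
func parseConfidenceField(raw string) float64 {
	v, err := strconv.ParseFloat(strings.TrimSpace(raw), 64)
	if err != nil || !(v > 0) {
		return defaultConfidence
	}
	return v
}

// mostRecentInteraction returns the latest parseable timestamp, skipping bad
// ones, and the zero time when none parse. RFC 3339 input is an assumption.
func mostRecentInteraction(stamps ...string) time.Time {
	var latest time.Time
	for _, s := range stamps {
		t, err := time.Parse(time.RFC3339, s)
		if err != nil {
			continue
		}
		if t.After(latest) {
			latest = t
		}
	}
	return latest
}

// findFrontmatterEnd returns the index of the first closing "---" line,
// or -1 when there is none. Line 0 is assumed to be the opening delimiter.
func findFrontmatterEnd(lines []string) int {
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(parseConfidenceField("abc"), parseConfidenceField("0.8"))
	fmt.Println(mostRecentInteraction("bad", "2026-04-20T00:00:00Z").IsZero())
	fmt.Println(findFrontmatterEnd([]string{"---", "confidence: 0.8", "---", "body"}))
}
```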

Locks the helpers' semantics so future edits don't silently change
behavior the integration test (TestApplyConfidenceDecay_*) doesn't
notice.

boshu2 merged commit 800eea8 into main Apr 26, 2026
32 checks passed

boshu2 pushed a commit that referenced this pull request Apr 26, 2026
hooks/write-time-quality.sh ran every Edit/Write but had zero test
coverage. A regression in any branch — Go fmt.Println in non-main, Python
bare-except / eval / missing-return-type-hint, shell missing
set -euo pipefail, the IS_TEST exemptions, the kill switch, the JSON
envelope shape — would silently degrade quality signal.

Add a 16-case bats fixture covering:
  - tool-name filter (only Edit/Write trigger)
  - missing or non-existent files are silent
  - unsupported extension is silent
  - AGENTOPS_HOOKS_DISABLED kill switch short-circuits
  - Go: fmt.Println warns in non-main packages, silent in main and *_test.go
  - Python: bare except warns; eval warns outside tests, silent in test_*.py;
    missing return-type-hint on def-without-arrow warns
  - Shell: missing 'set -euo pipefail' warns; presence suppresses warning
  - JSON envelope (stdout-only) parses and includes hookEventName, file,
    language, warning_count, warnings array

Each scenario uses a per-test temp file so cases don't bleed state. Pure
test addition; no production code changed.

NOTE: post-commit fitness measurement showed flywheel-proof transiently
fail due to a 503 on sum.golang.org (DNS cache overflow downloading the
go1.26.0 toolchain) — same network-flake mode PR #147 and #150 documented
on the same gate. Re-measure passes (score 92.66). Not caused by this
cycle (only test files touched).

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

boshu2 pushed a commit that referenced this pull request Apr 26, 2026
hooks/standards-injector.sh maps .js → "javascript" and reads
skills/standards/references/javascript.md, but the file did not exist —
so every .js Edit/Write silently dropped the standards-context inject.
The hook's "fail-open on missing file" guard hid the gap.

Add references/javascript.md (Tier 1 baseline: ESM, prettier+eslint,
const/let, async/await, eqeqeq, common pitfalls, security defaults)
and link it in skills/standards/SKILL.md (table row + linked-references
list — required by skills/heal-skill --strict and the cmd/ao
TestSkillContract_ReferencesLinkedInSKILLMD test).

Sync the embedded copy via `cd cli && make sync-hooks` so the runtime
manifest matches the source. Add a 12-case bats fixture for
standards-injector.sh covering all six languages (go, ts, tsx, sh, js,
yaml/yml), the extensionless / missing / unsupported / kill-switch
silent paths, and exact-body-match assertions against the on-disk
references files.

Verified:
  - hooks/standards-injector.sh on /x.js now returns 2111-byte body
    matching the new file
  - cd cli && go test -race ./cmd/ao -run TestSkillContract — pass
  - bash skills/heal-skill/scripts/heal.sh --strict — All clean
  - cd cli && make sync-hooks idempotent

NOTE: post-commit measurement shows flywheel-proof failing — same
network-environmental issue as cycle 8 (sum.golang.org 503 / DNS cache
overflow when the proof-run script downloads the go1.26.0 toolchain
into a fresh HOME). System Go is 1.24.7 but go.mod requires 1.26.0,
so GOTOOLCHAIN=local fallback also fails. Not caused by this cycle —
the proof-run path does not touch standards or hooks. Same pattern
PR #147 and #150 documented and shipped through.

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

boshu2 added a commit that referenced this pull request Apr 26, 2026
…-driven goals (corpus-state isolated) (#152)

* gate(flywheel-compounding): split σ=0/ρ=0 dormant hint from ρ=0-only

The flywheel-compounding gate had one branched hint (ρ=0 → "use --cite
applied|reference"), but ρ=0 covers two distinct corpus states:

- σ=0 AND ρ=0 — no citations of ANY kind in the measurement window;
  the corpus is dormant. The fix is "run any ao lookup", not "switch
  --cite kind". The high-confidence hint is misleading here.
- σ>0 AND ρ=0 — citation activity exists but only as retrieved-only
  hits; the existing hint applies.

Add the σ=0 ρ=0 → dormant branch and a 6-case bats fixture pinning the
PASS path, the three hint branches (σ=0 ρ=0 dormant, ρ=0-only, generic),
and the ao-failure path. Operators now see the right remediation per failure
mode without inferring it from the σρδ numbers.
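
The branch order described above can be sketched like this. The gate itself is a shell script, so this Go version only illustrates the decision order; the `kind` labels and hint wording are hypothetical, and only `ao lookup --cite applied|reference` comes from the PR text.

```go
package main

import "fmt"

// remediation picks the failure kind and operator hint for one measurement.
// compounding is the gate's own verdict, taken as an input here.
func remediation(sigma, rho float64, compounding bool) (kind, hint string) {
	switch {
	case compounding:
		return "pass", ""
	case sigma == 0 && rho == 0:
		// Dormant corpus: changing the --cite kind would not help.
		return "dormant", "no citations in the window; run any ao lookup"
	case rho == 0:
		// Activity exists, but only retrieved-only hits.
		return "rho-zero", "record ao lookup --cite applied|reference"
	default:
		return "generic", "σρ is below the threshold"
	}
}

func main() {
	for _, m := range []struct{ sigma, rho float64 }{{0, 0}, {0.75, 0}, {0.2, 0.0001}} {
		kind, hint := remediation(m.sigma, m.rho, false)
		fmt.Printf("σ=%v ρ=%v -> %s: %s\n", m.sigma, m.rho, kind, hint)
	}
}
```

The dormant case is tested before the ρ=0 case because it is the more specific condition; reversing the order would make the dormant branch unreachable.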

This is a heavy-goal observability improvement, not a metric flip — the
goal stays fail until corpus citations land over multiple sessions.

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

* refactor(codex_runtime): split detectLifecycleRuntimeProfile (CC 20→<14)

detectLifecycleRuntimeProfileWithOptions sat at the cli/ CC ceiling (20).
Any future case-arm tweak (e.g., a new runtime kind, or a new sub-state in
the existing four) would have pushed it past the gate's threshold.

Refactor: bundle the per-runtime config paths into a small struct
(lifecycleManifestPaths) shared by four per-runtime helpers
(populateCodexProfile / populateClaudeProfile / populateOpenCodeProfile /
populateUnknownProfile). The detector body shrinks to a switch over the
four helpers; each helper is straight-line and testable in isolation.

Behavior unchanged — verified via:
  - go test -race ./cmd/ao -run "Lifecycle|Codex|Runtime"
  - ./bin/ao codex status --json (live invocation, same JSON shape and
    same "Detected Codex runtime without native hook support" reason)
  - go-complexity-ceiling gate: cli/ <20, cli/internal/ <18

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

* test(hooks): pin research-loop-detector behavior across 14 cases

The PostToolUse research-spiral detector at hooks/research-loop-detector.sh
had zero test coverage. A bad edit to the threshold ladder, the
read-only-bash classification, the kill-switch short-circuits, or the JSON
nudge formatting would ship silently.

Add a bats fixture covering:
  - counter increment on Read/Grep/Glob/WebSearch/WebFetch
  - WARN/STRONG/STOP threshold transitions at 8/12/15 with the exact
    nudge text for each band
  - reset on Edit/Write/NotebookEdit
  - read-only Bash (grep/rg/cat/...) increments; execution Bash resets
  - AGENTOPS_HOOKS_DISABLED and AGENTOPS_RESEARCH_LOOP_DISABLED kill
    switches both short-circuit before any state mutation
  - threshold env-var overrides (AGENTOPS_RESEARCH_WARN_THRESHOLD)
  - STOP precedence over STRONG/WARN when all three are tied at 1
  - emitted JSON parses round-trip via jq -e

Run against the live hook in a tmpdir mock-repo to keep tests
hermetic. All 14 scenarios PASS. Pure-test addition: no production code
touched, no fitness regression.
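
A minimal sketch of the counter and threshold ladder the fixture pins, written in Go for illustration (the hook is a shell script). It assumes the thresholds are inclusive and leaves out the Bash classification, whose rules are not spelled out here; `next`, `bandFor`, and `thresholds` are hypothetical names.

```go
package main

import "fmt"

// thresholds holds the three band boundaries; 8/12/15 are the values above.
type thresholds struct{ warn, strong, stop int }

var defaults = thresholds{warn: 8, strong: 12, stop: 15}

// next returns the research counter after one tool call.
// Bash is left out: the hook treats read-only and executing Bash
// differently, and that classification is not reproduced here.
func next(count int, tool string) int {
	switch tool {
	case "Read", "Grep", "Glob", "WebSearch", "WebFetch":
		return count + 1
	case "Edit", "Write", "NotebookEdit":
		return 0
	}
	return count
}

// bandFor maps a counter to a nudge band. STOP is checked first so it wins
// when thresholds tie. Assumption: a threshold is reached at count >= value.
func bandFor(count int, t thresholds) string {
	switch {
	case count >= t.stop:
		return "STOP"
	case count >= t.strong:
		return "STRONG"
	case count >= t.warn:
		return "WARN"
	}
	return ""
}

func main() {
	count := 0
	for i := 0; i < 15; i++ {
		count = next(count, "Read")
		if b := bandFor(count, defaults); b != "" {
			fmt.Printf("call %d: %s\n", count, b)
		}
	}
	count = next(count, "Edit")
	fmt.Printf("after Edit: count=%d band=%q\n", count, bandFor(count, defaults))
}
```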

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

* refactor(notebook): split runNotebookUpdate (CC 19→11) for headroom

runNotebookUpdate sat at CC=19 — close to the cli/ ceiling of 20 — and
mixed three concerns: memory-file resolution, entry resolution, and the
update pipeline itself. A single new branch (e.g., a third entry source)
would have failed the gate.

Extract two helpers:
  - resolveNotebookMemoryFile(cwd) (string, bool)
  - resolveNotebookEntry(cwd) *pendingEntry

Each is straight-line and individually testable; the main function now
reads as a four-step pipeline (memory-file → entry → cursor-skip →
parse/render/write).

Behavior preserved — `ao notebook update --quiet` exit 0, no output, no
state mutation when no MEMORY.md / no session entry. All cmd/ao tests
pass; CC drops to 11 (well clear of the 20 ceiling).

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

* test(beads): pin five 0%-coverage helpers behind 19 cases

Five small pure helpers in cli/cmd/ao/beads.go and beads_audit_cluster.go
had 0% line coverage:
  - beadMinInt — drives matches[:min(3, len)] citation clipping
  - beadTruncate — wraps the bd parse-error message
  - representativeIsEpic — picks epic vs leaf rendering for cluster output
  - firstNNonEmptyLines — derives the cluster summary excerpt
  - sortedMapKeys — supplies deterministic JSON ordering

A regression in any of them would corrupt user-visible output silently
(wrong message text, garbled cluster summary, non-deterministic JSON
ordering breaking diffs) rather than panicking. None had a test pinning
behavior.

Add 19 cases covering: smaller-of-two and equal-args boundaries (incl.
negatives and zeros), under/at/over the truncation limit (incl. n=0
on non-empty), epic-found / leaf-found / representative-missing /
empty-cluster branches of representativeIsEpic, whitespace-handling
and trim semantics of firstNNonEmptyLines, deterministic key order of
sortedMapKeys regardless of bool values.

All cases assert exact expected values (per .claude/rules/go.md). No
production code touched; fitness unchanged at 92.66.

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

* refactor(contradict): split runContradict (CC 19→5) into 5 helpers

runContradict bundled five concerns at CC=19, close to the cli/ ceiling
of 20: directory existence checks, file collection, entry parsing,
pair-comparison loop, and dual-format output. A new file source or a
new output format would have failed the gate.

Extract:
  - collectContradictFiles: globs *.jsonl + *.md from learnings/patterns
  - parseContradictEntries: reads + tokenizes, drops empty/zero-word files
  - compareContradictPairs: O(n²) jaccard ≥ 0.4 + detectContradiction
  - relPathOrAbs: Rel-with-fallback path helper (lifted from inline blocks)
  - emitContradictResult: JSON-or-human writer

Behavior preserved — verified via:
  - go test ./cmd/ao -run Contradict
  - ./bin/ao contradict (human output identical: 20 files, 190 pairs)
  - ./bin/ao contradict --output json (same {"total_files":20,...} shape)

CC drops: runContradict 19→5; new helpers all ≤6. Headroom for future
file-source additions.
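
For readers unfamiliar with the pairwise step, the sketch below shows a Jaccard comparison over word sets with the 0.4 cutoff mentioned above. The tokenization (lowercased, whitespace-split) and all function names are assumptions, and the contradiction check that follows a match in the real code is not reproduced.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet lowercases and splits on whitespace. This tokenization is an
// assumption, not necessarily what parseContradictEntries does.
func wordSet(text string) map[string]bool {
	set := make(map[string]bool)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		set[w] = true
	}
	return set
}

// jaccard returns |A∩B| / |A∪B|, or 0 when both sets are empty.
func jaccard(a, b map[string]bool) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 0
	}
	shared := 0
	for w := range a {
		if b[w] {
			shared++
		}
	}
	return float64(shared) / float64(len(a)+len(b)-shared)
}

// similarPairs compares every unordered pair once and keeps those at or
// above minScore. It also returns how many comparisons were made.
func similarPairs(sets []map[string]bool, minScore float64) (pairs [][2]int, compared int) {
	for i := 0; i < len(sets); i++ {
		for j := i + 1; j < len(sets); j++ {
			compared++
			if jaccard(sets[i], sets[j]) >= minScore {
				pairs = append(pairs, [2]int{i, j})
			}
		}
	}
	return pairs, compared
}

func main() {
	docs := []string{
		"always run tests before push",
		"never run tests before push",
		"prefer small focused commits",
	}
	sets := make([]map[string]bool, len(docs))
	for i, d := range docs {
		sets[i] = wordSet(d)
	}
	pairs, compared := similarPairs(sets, 0.4)
	fmt.Println("candidate pairs:", pairs, "comparisons:", compared)
}
```

Comparing each unordered pair once gives n(n-1)/2 comparisons, which for 20 files is the 190 pairs quoted in the verification output.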

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

* refactor(rpi_serve): split serveRPIState (CC 19→5) into 4 helpers

serveRPIState mixed five HTTP-handler concerns at CC=19 — close to the
cli/ ceiling: query-param parsing/validation, run-id resolution against
the registry, fallback phased-state.json read, per-phase result
gathering, and the active-runs listing. A new state source or response
key would have failed the gate.

Extract:
  - parseServeStateRunID: Validate run-id, write 400 on path traversal
  - resolveStateForRunID: Look up the run via resolveServeRun, write to
    resp on success, return the resolved root
  - loadFallbackPhasedState: Read .agents/rpi/phased-state.json directly
    only if the resolver did not already populate phased_state
  - loadPhaseResults: Gather phase-{1,2,3}-result.json into a phase_N map

Behavior preserved — verified via:
  - go test ./cmd/ao -run TestServeRPIState (existing handler test)
  - go test ./cmd/ao (full package, 30s, all pass)
  - go vet clean

CC drops: serveRPIState 19→below-5 (not in --threshold 5 listing); each
new helper ≤6.

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

* test(hooks): pin write-time-quality across 16 per-language scenarios

hooks/write-time-quality.sh ran every Edit/Write but had zero test
coverage. A regression in any branch — Go fmt.Println in non-main, Python
bare-except / eval / missing-return-type-hint, shell missing
set -euo pipefail, the IS_TEST exemptions, the kill switch, the JSON
envelope shape — would silently degrade quality signal.

Add a 16-case bats fixture covering:
  - tool-name filter (only Edit/Write trigger)
  - missing or non-existent files are silent
  - unsupported extension is silent
  - AGENTOPS_HOOKS_DISABLED kill switch short-circuits
  - Go: fmt.Println warns in non-main packages, silent in main and *_test.go
  - Python: bare except warns; eval warns outside tests, silent in test_*.py;
    missing return-type-hint on def-without-arrow warns
  - Shell: missing 'set -euo pipefail' warns; presence suppresses warning
  - JSON envelope (stdout-only) parses and includes hookEventName, file,
    language, warning_count, warnings array

Each scenario uses a per-test temp file so cases don't bleed state. Pure
test addition; no production code changed.

NOTE: post-commit fitness measurement showed flywheel-proof transiently
fail due to a 503 on sum.golang.org (DNS cache overflow downloading the
go1.26.0 toolchain) — same network-flake mode PR #147 and #150 documented
on the same gate. Re-measure passes (score 92.66). Not caused by this
cycle (only test files touched).

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

* fix(standards): add javascript.md so .js Edit/Write injects standards

hooks/standards-injector.sh maps .js → "javascript" and reads
skills/standards/references/javascript.md, but the file did not exist —
so every .js Edit/Write silently dropped the standards-context inject.
The hook's "fail-open on missing file" guard hid the gap.

Add references/javascript.md (Tier 1 baseline: ESM, prettier+eslint,
const/let, async/await, eqeqeq, common pitfalls, security defaults)
and link it in skills/standards/SKILL.md (table row + linked-references
list — required by skills/heal-skill --strict and the cmd/ao
TestSkillContract_ReferencesLinkedInSKILLMD test).

Sync the embedded copy via `cd cli && make sync-hooks` so the runtime
manifest matches the source. Add a 12-case bats fixture for
standards-injector.sh covering all six languages (go, ts, tsx, sh, js,
yaml/yml), the extensionless / missing / unsupported / kill-switch
silent paths, and exact-body-match assertions against the on-disk
references files.

Verified:
  - hooks/standards-injector.sh on /x.js now returns 2111-byte body
    matching the new file
  - cd cli && go test -race ./cmd/ao -run TestSkillContract — pass
  - bash skills/heal-skill/scripts/heal.sh --strict — All clean
  - cd cli && make sync-hooks idempotent

NOTE: post-commit measurement shows flywheel-proof failing — same
network-environmental issue as cycle 8 (sum.golang.org 503 / DNS cache
overflow when the proof-run script downloads the go1.26.0 toolchain
into a fresh HOME). System Go is 1.24.7 but go.mod requires 1.26.0,
so GOTOOLCHAIN=local fallback also fails. Not caused by this cycle —
the proof-run path does not touch standards or hooks. Same pattern
PR #147 and #150 documented and shipped through.

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

* fix(proof-run): reuse cli/bin/ao when present so 503s on sum.golang.org
don't fail flywheel-proof

tests/e2e/proof-run.sh always rebuilt ao in a fresh \$HOME, so each
gate invocation re-downloaded the go1.26.0 toolchain via
sum.golang.org. When the sum DB returns 503 ("DNS cache overflow")
the entire flywheel-proof gate (w=7) fails — even though the local
cli/bin/ao is fresh and behavior is testable.

Three changes:
  - PROOF_AO_BIN=/path env override: caller can pin a pre-built binary
  - Auto-detect \$REPO_ROOT/cli/bin/ao when present (and the override
    is unset) — covers the common case where `make build` ran first
  - PROOF_FORCE_BUILD=1 escape hatch: opt back into build-from-source
    when the goal IS to verify the toolchain path

`require_cmd go` now only fires on the build path, so machines without
go installed can still run the proof against a shipped binary.
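
The selection logic can be pictured as a small decision function. This is a Go illustration of a shell script, and it assumes `PROOF_FORCE_BUILD=1` takes precedence over `PROOF_AO_BIN`, which the commit message does not state; `chooseBinary` and `plan` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// plan says whether to build ao from source or reuse an existing binary.
type plan struct {
	build  bool
	binary string
}

// chooseBinary applies the three rules in one place.
// Assumption: PROOF_FORCE_BUILD=1 wins over PROOF_AO_BIN.
func chooseBinary(env func(string) string, repoRoot string, exists func(string) bool) plan {
	if env("PROOF_FORCE_BUILD") == "1" {
		return plan{build: true} // caller wants the toolchain path exercised
	}
	if p := env("PROOF_AO_BIN"); p != "" {
		return plan{binary: p} // explicit override
	}
	if p := filepath.Join(repoRoot, "cli", "bin", "ao"); exists(p) {
		return plan{binary: p} // auto-detected local build
	}
	return plan{build: true} // nothing to reuse; only this path needs go
}

func main() {
	exists := func(p string) bool {
		info, err := os.Stat(p)
		return err == nil && !info.IsDir()
	}
	// "." stands in for the script's $REPO_ROOT in this demo.
	p := chooseBinary(os.Getenv, ".", exists)
	fmt.Printf("build=%v binary=%q\n", p.build, p.binary)
}
```

Taking the environment lookup and the existence check as parameters keeps the decision testable without touching the real filesystem.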

Verified:
  - bash tests/e2e/proof-run.sh — auto-detects cli/bin/ao, all 20
    flywheel checks PASS in ~6s (was failing in 90s before)
  - PROOF_FORCE_BUILD=1 — still attempts go build (so the toolchain-
    path regression test still exists)
  - PROOF_AO_BIN=/path/to/ao — copies binary, skips build

flywheel-proof flips fail→pass after this cycle. This is a code-driven
flip (the script is the gate's only build path), not a runtime artifact.

https://claude.ai/code/session_01TVzMVJ8FXdctstCrzTcM7T

---------

Co-authored-by: Claude <noreply@anthropic.com>

boshu2 deleted the nightly/2026-04-26 branch April 27, 2026 01:17