chore: update all dependencies to latest versions #2
Merged
- @changesets/cli 2.27.10 → 2.31.0
- @eslint/js 9.x → 10.0.1
- @types/node 22.x → 25.6.0
- @vitest/coverage-v8 2.x → 4.1.4
- eslint 9.x → 10.2.1
- eslint-config-prettier 9.x → 10.1.8
- lint-staged 15.x → 16.4.0
- prettier 3.4 → 3.8.3
- typescript 6.0.0 → 6.0.3
- typescript-eslint 8.18 → 8.58.2
- vite 6.x → 8.0.8
- vite-plugin-dts 4.4 → 4.5.4
- vitest 2.x → 4.1.4

All 288 tests pass, typecheck clean, lint clean, build succeeds.

https://claude.ai/code/session_01RnaVtvGe6LYDDntRUAxuyA
Luis85 pushed a commit that referenced this pull request on Apr 23, 2026
- §5.1: split training seed from agent.rng via demo-local trainRng to preserve tick-replay determinism (seed mutating agent.rng mid-session would perturb subsequent ticks)
- §3.2: commit to the hand-rolled TfjsSnapshot split as the public contract; note §10.1's native-format alternative as an amendment path, not a silent pivot
- §3.2: one-line rationale for unbounded In/Out generics (no tfjs equivalent of BrainJsNetworkData worth importing)
- §4.3 #2: locate the LCG + Fisher-Yates shuffle as module-local helpers in TfjsReasoner.ts (not exported, not shared)
- §9: reframe graphify update as an author-side chore per CLAUDE.md graphify section, not a repo script

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Luis85 pushed a commit that referenced this pull request on Apr 24, 2026
Two real round-6 findings:

- **P1: flush() must re-check inflight after the await.** The earlier fix marked flush-driven training in-flight, but the initial `if (this.inflight !== null) await this.inflight;` only ran once. While `flush()` was awaiting batch #N, that batch's drain-tail could schedule batch #N+1 and set `this.inflight` again — `flush()` would then call `runTrain()` in parallel. Loop instead: keep awaiting `inflight` until it stays null.
- **P2: honour bufferCapacity: Infinity.** Docs already advertised `Infinity` as the unbounded escape hatch, but `Number.isFinite` rejected it and the learner silently fell back to the derived default. Three-way coercion: `Infinity` passes through, NaN / -Infinity fall back to the default, finite values get truncated + clamped to ≥ 0.

Two regression tests:

- Capacity = Infinity buffers 250 outcomes without dropping any.
- Three-batch interleave: auto-batch #1 in flight, drain-tail schedules #2, flush() must wait through both before resolving null — never firing a parallel train#3.
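The P2 three-way coercion can be sketched as a small helper (the function name is illustrative, not the library's API):

```typescript
// Hypothetical normalization for bufferCapacity: Infinity is the
// documented unbounded escape hatch; NaN and -Infinity fall back to the
// derived default; finite values are truncated and clamped to >= 0.
function normalizeBufferCapacity(value: number, fallback: number): number {
  if (value === Infinity) return value; // unbounded — pass through
  if (!Number.isFinite(value)) return fallback; // NaN, -Infinity
  return Math.max(0, Math.trunc(value)); // finite — truncate + clamp
}
```

The key point is that `Number.isFinite` alone cannot distinguish the intentional `Infinity` from garbage input, so the unbounded case must be checked first.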
Luis85 pushed a commit that referenced this pull request on Apr 24, 2026
…arden

Incorporates the 2026-04-24 review findings as a mandatory "remediation" track that lands BEFORE CI / demo / lib work:

- PR #1 fix/agent-restore-replace-modifiers (MAJOR — restore must replace, not merge, modifier state).
- PR #2 fix/localstorage-store-keyspace-collision (MAJOR — split data/ from meta/ in the localStorage key namespace, add legacy migration so existing browsers don't lose saved pets).
- PR #3 fix/pick-default-store-throwing-localstorage (MAJOR — guard the localStorage probe so SecurityError-throwing getters don't crash store selection).
- PR #4 fix/fs-store-deterministic-list-order (MINOR — sort the readdir output via localeCompare for cross-platform stability).

Renumbers the existing tracks (CI/demo/lib) to follow the remediation block and updates all internal cross-references.

Adds a "Workflow" section codifying the per-session loop: independent branches, batch open all PRs, then multi-pass Codex sweep until 👍, resolve review threads, owner merges. Same loop captured in `MEMORY.md → feedback_pr_workflow.md`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Luis85 pushed a commit that referenced this pull request on Apr 24, 2026
Codex P1 #1 (line 159): migration skipped entirely when the injected backend lacked length/key, but StorageLike only requires getItem / setItem / removeItem. Persistent custom adapters that satisfy only the required contract (node-localstorage-style, custom IndexedDB shims) would keep legacy snapshots at the old `{prefix}{key}` path while load() / list() read the new data/ + meta/index paths — data unreachable post-upgrade.

Restructure migrateLegacyKeys into two discovery paths:

- Legacy-index lookup. Always runs. Reads the known legacy path directly via getItem (no iteration needed) and migrates every user key the index lists. Covers custom adapters.
- Orphan scan. Runs only when the backend exposes length + key(i). Picks up entries whose registration in the legacy index was lost (the original v1 collision bug). Filter on new-layout subpaths keeps it re-entrant.

Codex P1 #2 (line 182): an empty prefix would make startsWith(prefix) true for every storage key, so migration could rewrite and delete unrelated application data on first construction after upgrade. Reject empty prefix at the constructor boundary — fail loudly before any storage write.

Size budget: dist/index.js gzip grew to 35.36 KB with the restructured migration, over the previous 35 KB limit. Bump the budget to 50 KB per owner guidance so CI stops gating on the wafer-thin margin. Current usage 35.09 KB / 50 KB.

Regression tests added:

- NonIterableStorage (getItem/setItem/removeItem only) with legacy index migrates end-to-end. Legacy paths cleared; new paths present.
- Empty prefix throws in constructor with a clear message.
- Empty-prefix guard does not corrupt pre-existing unrelated storage data — the throw fires before any mutation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
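The minimal StorageLike contract and the constructor-boundary prefix guard can be sketched as follows (class and error text are illustrative assumptions, not the library's exact surface):

```typescript
// Minimal contract: only getItem/setItem/removeItem are required.
// length + key(i) are optional — only the orphan scan needs them.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
  length?: number;
  key?(index: number): string | null;
}

// Hypothetical store sketch: reject an empty prefix before any storage
// write, since startsWith('') matches every key and migration could
// otherwise rewrite unrelated application data.
class LocalStorageStoreSketch {
  constructor(
    private readonly prefix: string,
    private readonly backend: StorageLike,
  ) {
    if (prefix === '') {
      throw new Error('LocalStorageStoreSketch: prefix must be non-empty');
    }
  }
}
```

The guard lives in the constructor so the failure surfaces on first construction, not mid-migration.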
Luis85 added a commit that referenced this pull request on Apr 26, 2026
* test(cognition): verify MistreevousReasoner.reset clears BT state
Existing reset() already matched the port contract — adds JSDoc pointing
at the port + a unit test asserting BT RUNNING → reset → READY. No
behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(cognition): verify JsSonReasoner.reset restores initial beliefs
Existing reset() already matched the port contract — adds JSDoc pointing
at the port + a unit test asserting mutated beliefs → reset → initial
beliefs. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(cognition): document BrainJsReasoner's reset opt-out
Adds a class-level JSDoc paragraph explaining why this adapter does not
implement Reasoner.reset(): stateless at selection time, consumer-owned
weights. The kernel's optional-chain reset?.() handles the absence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(changeset): 0.9.4 Reasoner.reset harmonization
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(agent): validate reasoner.reset is callable at setReasoner time
Addresses Codex review on #53. Optional chaining guards null/undefined
but throws if reset is a truthy non-function (e.g. a JS consumer
assigning reset: 'foo'). Reaching that throw inside restore() would
leave the agent partially rehydrated. Move the check to setReasoner to
fail fast at swap-time, matching the existing selectIntention guard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
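The swap-time guard described above can be sketched like this (interface and function names are illustrative; the real check lives in setReasoner):

```typescript
// A reasoner's reset is optional, so optional chaining tolerates its
// absence — but a truthy non-function (e.g. reset: 'foo' from a JS
// consumer) must fail fast at swap-time, not mid-restore.
interface ReasonerLike {
  selectIntention: (...args: unknown[]) => unknown;
  reset?: unknown;
}

function setReasonerSketch(reasoner: ReasonerLike): void {
  if (typeof reasoner.selectIntention !== 'function') {
    throw new TypeError('reasoner.selectIntention must be a function');
  }
  if (reasoner.reset != null && typeof reasoner.reset !== 'function') {
    throw new TypeError('reasoner.reset, when present, must be a function');
  }
}
```

Failing here keeps restore() atomic: the agent is never left partially rehydrated by a throw deep inside the restore path.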
* test(demo): extend brain.js stub with train/toJSON + stable run
Prepares the stub for 0.9.3's training-persistence tests. run() now
returns a stable [0.5] so construct() and urgency-gate logic are
testable without the native peer. train() records the last pair batch;
toJSON() returns a deterministic sentinel. No behavior change for
existing tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(demo): mode-gated Train button for learning mode
Mounts <button id='train-network' hidden> inside #cognition-switcher.
Visibility toggles with the selected cognition mode — shown only when
'learning' is active. Click handler lands in the next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(demo): wire Train button to generate pairs + persist network
Click handler generates 30 synthetic (needs → urgency) pairs from the
demo's seeded RNG, runs network.train() with 100 iterations, and
writes network.toJSON() to agentonomous/<agentId>/brainjs-network.
Button disables + shows 'Training…' during the synchronous train call
(via a microtask yield so the DOM paints before the blocking loop) and
reverts on completion. Status span flashes 'Trained ✓' on success.
Extends the test stub with NeuralNetwork.last + lastTrainPairs() /
lastFromJSON() so tests can inspect what learning.ts pushed through
the peer without piping the network through application code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(demo): hydrate learning-mode network from localStorage
construct() checks agentonomous/<agentId>/brainjs-network first and
falls back to the bundled learning.network.json default asset when the
key is absent or unparseable. Corrupt stored values silently revert
to default — the Train button regenerates valid state on next click.
agentId is injected via a module-scoped setLearningAgentId() setter
called from main.ts after the agent is created. Keeps the
CognitionModeSpec.construct() signature unchanged — no other mode
needs the agent id.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
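The hydrate-or-fallback behaviour can be sketched as a small helper (the name and storage shape are assumptions; the real code also validates the parsed payload against the network schema):

```typescript
// Read a stored network JSON; absent or unparseable values silently
// revert to the bundled default — the Train button regenerates valid
// state on the next click.
function hydrateNetworkJson(
  storage: { getItem(key: string): string | null },
  key: string,
  bundledDefault: object,
): object {
  const raw = storage.getItem(key);
  if (raw === null) return bundledDefault; // key absent
  try {
    return JSON.parse(raw) as object;
  } catch {
    return bundledDefault; // corrupt stored value
  }
}
```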
* feat(demo): Reset also wipes the trained brainjs-network
resetSimulation now removes agentonomous/<agentId>/brainjs-network
alongside the snapshot + index keys. Reset stays a single "fresh
start" concept — the next learning-mode construct() falls back to
the bundled default network asset.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(demo): urgency-gate interpret() in learning mode
Network scalar output is now wired into intention selection as an
urgency gate: the pet idles this tick when the network's score falls
below URGENCY_THRESHOLD (0.35). Visible demo effect — trained and
untrained networks produce different idle rates, making training
observable in the trace view. Threshold is empirical; may be tuned
during manual smoke.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
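The urgency gate reduces to a one-line decision, sketched here under the assumption of a scalar network score in [0, 1] (helper name is illustrative):

```typescript
// Threshold from the commit above; empirical, subject to manual-smoke tuning.
const URGENCY_THRESHOLD = 0.35;

// Below the threshold the pet idles this tick; at or above it, the
// network-selected intention passes through unchanged.
function gateIntention<T>(score: number, intention: T, idle: T): T {
  return score < URGENCY_THRESHOLD ? idle : intention;
}
```

Because trained and untrained networks produce different score distributions, the idle rate becomes the visible proxy for training in the trace view.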
* docs(demo): address review nits in learning-mode comments
- URGENCY_THRESHOLD JSDoc now gives direction for manual-smoke tuning
(up toward 0.5 if post-train idle rate stays flat; down toward 0.2
if the pet rarely acts).
- agentIdForHydration JSDoc drops the plan-file reference (comments
referencing tasks rot once the plan is archived) and keeps just
the why.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(demo): recover learning mode when stored network fails fromJSON
The previous hydration path guarded JSON.parse but not fromJSON
itself, so a parseable-but-schema-invalid stored payload (manual
edit, prior format, partial migration) would reject construct()
and leave the switcher stuck with Learning mode disabled until the
user clicked Reset. Fall back to the bundled default asset inside
construct() so Learning mode stays selectable across such payloads —
localStorage is a user-editable boundary where validation is
warranted.
Test stub gains a one-shot throwOnNextFromJSON flag so the recovery
path is exercised deterministically without shipping a synthetic
bad-payload fixture.
Addresses Codex review on PR #54.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(agent): unify AgentFacade.publishEvent with internal publish path
facade().publishEvent wrote straight to eventBus.publish, which bypassed
both emittedThisTick (trace inclusion) and autoSaveTracker.observeEvent
(autosave triggers). Reactive handlers and module onInstall hooks that
published events produced traces that disagreed with what subscribers
saw, and their events never triggered event-gated autosaves.
Route through _internalPublish — same path the skill context already
uses. No public API change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(fixes): add stability fixes
* chore(persistence): remove dead AgentSnapshot.pendingEvents field
The field was declared at AgentSnapshot.ts:84 but nothing in src/
populated or read it. Dropping it (and the now-unused DomainEvent
import) aligns the public type with reality. A regression test in
tests/unit/persistence/AgentSnapshot-shape.test.ts pins the absent
key so a future change can't quietly resurrect it without an
implementation.
Type-level breaking change — shipped as minor. Wire format is
byte-identical: the field was optional and JSON.stringify(snapshot)
already omitted it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(persistence): reversible percent-encoding for FsSnapshotStore keys
The previous sanitizeKey replaced every non-[A-Za-z0-9._-] character
with '_', so 'user/1', 'user_1', and 'user 1' collided to the same
file — silent data loss on save, ambiguous key recovery from list().
encodeKey/decodeKey now use reversible percent-encoding (UTF-8
byte-wise %XX) and are exported for direct unit testing. pathFor()
encodes; list() decodes. First dedicated test file for the store
covers round-trip, collision avoidance, UTF-8, and end-to-end
save/load/list/delete via an in-memory FsAdapter stub.
Breaking on-disk format for Node consumers with existing snapshots;
documented in the changeset.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
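A byte-wise percent-encoding in this spirit can be sketched as below — a hedged illustration, not the store's exact implementation: every UTF-8 byte outside [A-Za-z0-9._-] becomes %XX, so 'user/1', 'user_1', and 'user 1' map to distinct filenames and decode losslessly.

```typescript
// Encode: iterate the key's UTF-8 bytes; safe ASCII passes through,
// everything else (including '%') becomes an uppercase %XX escape.
function encodeKeySketch(key: string): string {
  const bytes = new TextEncoder().encode(key);
  let out = '';
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    out += /[A-Za-z0-9._-]/.test(ch)
      ? ch
      : '%' + b.toString(16).toUpperCase().padStart(2, '0');
  }
  return out;
}

// Decode: only '%' introduces escapes, so the mapping is reversible.
function decodeKeySketch(name: string): string {
  const bytes: number[] = [];
  for (let i = 0; i < name.length; i++) {
    if (name[i] === '%') {
      bytes.push(parseInt(name.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      bytes.push(name.charCodeAt(i));
    }
  }
  return new TextDecoder().decode(new Uint8Array(bytes));
}
```

The '_' stays literal in the safe set, which is exactly why the old replace-with-'_' scheme collided: it conflated literal underscores with escaped characters, while percent-encoding keeps them distinguishable.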
* fix(persistence): FsSnapshotStore.list() tolerates undecodable filenames
Post-#57, list() mapped decodeKey (decodeURIComponent) over every
`.json` basename. decodeURIComponent throws URIError on any malformed
%XX sequence, so a single foreign file in a shared snapshot directory
(e.g. `bad%ZZ.json`) would reject the whole call — a regression from
the pre-#57 implementation which never decoded at all.
list() now catches decode errors per entry and skips the offending
file. Such names can't round-trip through key-based load() anyway;
surfacing them would just hand callers an unusable key. One new test
pins the skip-on-malformed behavior.
Addresses Codex review P2 feedback on #57.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
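The per-entry tolerance can be sketched as follows, with `decodeURIComponent` standing in for the store's decoder (list shape and function name are illustrative):

```typescript
// Map basenames to keys, skipping any name whose %XX sequences are
// malformed — such names can't round-trip through key-based load()
// anyway, so surfacing them would just hand callers an unusable key.
function listKeysSketch(basenames: string[]): string[] {
  const keys: string[] = [];
  for (const name of basenames) {
    try {
      keys.push(decodeURIComponent(name));
    } catch {
      // URIError on a foreign file like 'bad%ZZ' — skip this entry,
      // don't reject the whole list() call.
    }
  }
  return keys;
}
```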
* feat(skills): fail fast on duplicate SkillRegistry.register()
register() now throws DuplicateSkillError when a skill with the same
id is already registered. replace() is the new explicit API for
intentional overrides. Silent overwrites were the most common source
of "my skill works in isolation but not when I add module X" bugs —
surfacing the conflict at registration time makes the fix obvious.
createAgent's module-skill auto-install loop now guards with
skills.has(id), so consumer-pre-registered skills take precedence
over module defaults. Demo main.ts drops redundant pre-registration
of defaultPetInteractionModule.skills — createAgent installs them
automatically.
First dedicated test file for SkillRegistry covers register() throw,
registerAll() partial-registration on duplicate, replace() overwrite,
and error-payload shape.
Breaking for consumers relying on silent overwrite. Migration path
(replace() or drop redundant pre-registration) documented in
changeset.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
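The register()/replace() contract can be condensed into a mini-registry sketch (error class and shape are assumptions based on the commit, not the exported API):

```typescript
// Fail-fast duplicate detection: register() throws, replace() is the
// explicit override for intentional overwrites.
class DuplicateSkillError extends Error {
  constructor(public readonly skillId: string) {
    super(`Skill '${skillId}' is already registered; use replace() to override`);
    this.name = 'DuplicateSkillError';
  }
}

class SkillRegistrySketch<S> {
  private readonly skills = new Map<string, S>();

  has(id: string): boolean {
    return this.skills.has(id);
  }

  register(id: string, skill: S): void {
    if (this.skills.has(id)) throw new DuplicateSkillError(id);
    this.skills.set(id, skill);
  }

  replace(id: string, skill: S): void {
    this.skills.set(id, skill); // silent overwrite, but explicitly requested
  }
}
```

Surfacing the conflict at registration time turns "my skill works in isolation but not with module X" into an immediate, named error.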
* fix(agent): fail fast on module-vs-module skill id collisions
The earlier has()-guard treated every pre-existing id uniformly, so
two modules contributing the same skill silently settled on
"first module wins" — exactly the kind of silent conflict this PR
exists to surface.
Snapshot the consumer's pre-registered ids before the module loop
runs; the skip branch matches only those. Module-contributed
duplicates fall through to the unguarded register() call, which
throws DuplicateSkillError.
Three new createAgent tests cover:
- consumer pre-registered skill wins over module skill with same id
- two modules with the same skill id throw DuplicateSkillError
- one module listing the same skill twice throws
Addresses Codex P1 review feedback on PR #59.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
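The snapshot-before-the-loop guard can be sketched like this (registry and module shapes are simplified assumptions):

```typescript
// Snapshot the consumer's pre-registered ids BEFORE the module loop;
// only those are skipped. Module-vs-module duplicates fall through to
// register(), which throws on conflict.
function installModuleSkills(
  registry: { ids(): string[]; register(id: string, skill: unknown): void },
  modules: Array<Array<[string, unknown]>>,
): void {
  const preRegistered = new Set(registry.ids());
  for (const moduleSkills of modules) {
    for (const [id, skill] of moduleSkills) {
      if (preRegistered.has(id)) continue; // consumer's skill wins
      registry.register(id, skill); // duplicate between modules throws here
    }
  }
}
```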
* Add Graphify to ignore
* docs(spec): design for tfjs cognition adapter swap
Captures the brainstorm decisions for replacing the brain.js-backed
Learning mode with a TensorFlow.js adapter: module layout, plain-JS
public API, determinism/backend policy, persistence format, demo
baseline, test strategy, file delta, and known verification points.
Approved for handoff to writing-plans.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(plans): relocate plans to docs/plans with date-prefix naming
Move .claude/plans/* to docs/plans/YYYY-MM-DD-<slug>.md using
each file's first-commit date, dropping the 0.9.x version prefix.
Overrides the superpowers writing-plans default location via a new
"Plans & specs location" section in CLAUDE.md. Rewrites internal
cross-refs in the four affected plan files plus one code comment
(src/agent/Agent.ts) and one spec header (tfjs adapter design).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(spec): address reviewer pass 2 on tfjs adapter design
- §5.1: split training seed from agent.rng via demo-local trainRng to
preserve tick-replay determinism (seed mutating agent.rng mid-session
would perturb subsequent ticks)
- §3.2: commit to the hand-rolled TfjsSnapshot split as the public
contract; note §10.1's native-format alternative as an amendment path,
not a silent pivot
- §3.2: one-line rationale for unbounded In/Out generics (no tfjs
equivalent of BrainJsNetworkData worth importing)
- §4.3 #2: locate the LCG + Fisher-Yates shuffle as module-local helpers
in TfjsReasoner.ts (not exported, not shared)
- §9: reframe graphify update as an author-side chore per CLAUDE.md
graphify section, not a repo script
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* prep and fixes before switching to tensorflow instead of brainjs
* docs(spec): correct reset() decision for TfjsReasoner
Reviewer pass 1 asked TfjsReasoner to implement reset() restoring
construct-time weights. Re-reading Reasoner.ts shows this contradicts
the interface contract: reset() is for ephemeral between-tick state
only, and "trained network weights MUST be preserved." BrainJsReasoner
already opts out for the same reason.
TfjsReasoner now follows the same precedent — no reset() method. §7.1
test flipped from "reset() restores weights" to asserting reset is
undefined so any future accidental addition re-opens the decision
loudly. §6.3/§6.6 size-budget rationale tightened to mention train /
toJSON / fromJSON / base64 codec instead of the removed reset() state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(plan): implementation plan for tfjs cognition adapter
Eight-chunk TDD plan covering topic-branch setup, snapshot codec,
inference core, deterministic training, persistence, library wiring,
demo migration, and final brainjs cleanup + changeset + verify. Each
chunk ends with a gate-green commit; final chunk opens the PR.
Derived from docs/specs/2026-04-24-tfjs-cognition-adapter-design.md
(pass-3 approved).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(plan): address pass-1 reviewer feedback on tfjs plan
- Chunk 1: replace hardcoded "399 tests" baseline with a capture-and-
reference step so the plan stays current as develop moves.
- Chunk 2: note LE-endian assumption on TfjsSnapshot's base64 payload;
fix the 400 vs 404 test-count arithmetic.
- Chunk 3: drop the `void encodeWeights; void decodeWeights;` lint
workaround; use a type-only import instead and add the runtime
imports in Chunk 5.
- Chunk 4: broaden the determinism-fallback note to cover both the
finalLoss and the history.loss deep-equal assertions together.
- Chunk 6: add missing Task 6.1b — vitest subpath alias for
agentonomous/cognition/adapters/tfjs.
- Chunk 7: promote the topology-verification REPL to an explicit
checkbox step with cleanup; replace `cat | head` with a prose Read-
tool instruction.
- Chunk 8: add three missing vite.config.ts cleanups (brainjs
ambientDtsEntries block, brainjs subpath vitest alias, header comment
block); add Task 8.4a to delete the un-consumed old brainjs
changeset; add a `gh auth status` pre-flight ahead of `gh pr create`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(deps): add tensorflow/tfjs-core + layers + backend-cpu
Preparing to swap the cognition/adapters/brainjs subpath for a tfjs-
backed TfjsReasoner (see docs/specs/2026-04-24-tfjs-cognition-adapter-
design.md). brain.js / brain.d.ts / BrainJsReasoner still present in
this commit; they are removed in a later chunk of the same PR.
* chore(demo): add tfjs deps alongside brain.js (transitional)
* feat(cognition/adapters/tfjs): add TfjsSnapshot + base64 weight codec
Pure-JS round-trip for Float32Array[] <-> base64 with a shape manifest.
Used by the TfjsReasoner's toJSON/fromJSON pair and by the demo's
bundled learning.network.json. No tfjs dependency at this layer —
tested directly via Float32Array inputs.
* feat(cognition/adapters/tfjs): TfjsReasoner inference core
Constructor + selectIntention + getModel + dispose + the backend-
mismatch error class. Training and persistence (train/toJSON/fromJSON)
are stubbed as rejecting promises / throwing errors until Chunks 4
and 5 fill them in.
* feat(cognition/adapters/tfjs): train() + seeded Fisher-Yates pre-shuffle
Seeded LCG + in-place Fisher-Yates avoid tfjs's Math.random-based
built-in shuffle. model.fit runs with { shuffle: false } so the same
pairs + same seed produce per-epoch loss trajectories stable to ~3
decimal places on the CPU backend. learningRate option is accepted but
ignored — the consumer-compiled model's optimizer owns the actual LR.
* feat(cognition/adapters/tfjs): toJSON + async fromJSON round-trip
toJSON snapshots topology + flattened Float32Array weights + shape
manifest via the base64 codec. fromJSON awaits tf.setBackend when the
requested backend differs from the current global (mapping failures
to TfjsBackendNotRegisteredError), rebuilds the Sequential via
tf.models.modelFromJSON, and re-applies weights.
* build(cognition/adapters/tfjs): wire subpath export + size budget
External packages list picks up @tensorflow/tfjs-{core,layers}; lib
entry emits dist/cognition/adapters/tfjs/index.js alongside the other
adapters; vitest alias maps agentonomous/cognition/adapters/tfjs →
src; package.json exports the subpath; size-limit enforces a 4 KB
gzip budget (actual 3.6 KB).
* feat(demo): rewire Learning mode to TfjsReasoner
- learning.ts: lazy-load tfjs-backend-cpu + the tfjs adapter; hydrate
from localStorage (tfjs-network key) or the bundled baseline;
fallback on corrupt or schema-invalid snapshot; compile the rebuilt
Sequential with SGD+MSE so the Train button can call .train()
- learning.network.json: rewritten in TfjsSnapshot format (same
coefficients: kernel [-1,-0.8,-0.6,-0.7,-0.9], bias 0); unit-tested
via a sigmoid round-trip
- cognitionSwitcher.ts: demo-local trainRng decoupled from agent.rng
(preserves tick-replay determinism); dispose() outgoing reasoner on
mode swap and on mount dispose; Train handler now calls reasoner.train
+ reasoner.toJSON and writes JSON to tfjs-network key
- ui.ts: Reset clears the tfjs-network key instead of brainjs-network
- tsconfig.json: add tfjs subpath to the paths map
- learningMode.train.test.ts: rewritten against real tfjs CPU backend;
asserts the persisted snapshot has { version:1, weights, weightsShapes }
* chore(cognition/adapters/brainjs): remove — replaced by tfjs
Final cleanup of the brainjs adapter after the TfjsReasoner swap:
- deleted src/cognition/adapters/brainjs/ (3 files)
- deleted tests/unit/cognition/adapters/BrainJsReasoner.test.ts
- deleted tests/examples/stubs/brain-js.ts
- removed brain.js from peerDependencies / peerDependenciesMeta
- removed the brainjs lib entry + ambient-dts copy entry + vitest
alias block + brainjs subpath alias in vite.config.ts
- removed the brainjs subpath from package.json exports + size-limit
- removed brain.js from the demo's devDependencies (139 transitive
packages removed; npm audit reports 0 vulnerabilities on the demo)
- removed the brainjs path from examples/nurture-pet/tsconfig.json
- README / examples/nurture-pet/README updated for the tfjs rename
- .changeset/cognition-adapter-brainjs.md replaced by
.changeset/cognition-adapter-tfjs.md (minor bump + migration guide)
* refactor(cognition/adapters/tfjs): polish pass
- TfjsReasonerOptions interface → type (style: prefer type unless
consumers need to extend)
- drop unused seed field from TfjsReasonerOptions (seed lives on
TrainOptions where it's actually consumed)
- fromJSON: pass the decoded Float32Array straight to tf.tensor with
an explicit 'float32' dtype instead of round-tripping through
Array.from
- fromJSON JSDoc: note that the rebuilt Sequential is uncompiled so
callers who intend to train() compile first
- cognitionSwitcher: replace for (const _id of NEED_IDS) with an
index loop so the unused iteration variable no longer lingers
* fix(cognition/adapters/tfjs): defer dispose during in-flight train
Address PR #60 Codex review:
P1 (cognitionSwitcher): disposing the outgoing reasoner mid-swap
freed model tensors while model.fit was still running against them,
turning the pending train() into an unhandled rejection. Track the
training reasoner + its pending promise; when a mode swap or dispose
targets that reasoner we defer disposeNow() until the promise
settles. Non-training reasoners still dispose immediately.
P2 (toJSON weight tensors): Codex suggested disposing the tensors
returned by model.getWeights() after dataSync. That was incorrect for
our tfjs-layers version — LayersModel.getWeights maps to
LayerVariable.read() which returns the backing tensor itself, not a
clone. Disposing those tensors destroys the model's weight storage
(confirmed by the dispose test regressing). Added a comment pinning
the lifetime contract so the next reviewer doesn't retry the change.
* chore: scrub stale brainjs refs + tighten demo CI path
Lib / docs refs:
- examples/nurture-pet/vite.config.ts: drop brainjs alias, add tfjs
alias (prevents prefix-rewrite hazard the regex guard was defending
against)
- src/cognition/adapters/tfjs/TfjsReasoner.ts: docblock references
only js-son now, not brainjs
- src/cognition/learning/Learner.ts: updated the "e.g.," line to name
tfjs instead of the removed brain.js
- tests/examples/cognitionSwitcher.test.ts: probe-list comments name
@tensorflow/tfjs-core instead of brain.js
- docs/specs/vision.md: peer-optional adapter list swaps
BrainJsLearner for TfjsReasoner
- .changeset/reasoner-reset-harmonization.md: describe TfjsReasoner's
reset-opt-out reasoning (weights must persist) instead of
BrainJsReasoner's (no ephemeral state)
CI improvements now enabled by the cleaner dep tree:
- pages.yml: demo install switches from `npm install` to `npm ci`
(lockfile is stable now that brain.js's 139-package native build
chain is gone)
- ci.yml: new `demo-build` job runs `npm ci + vite build` on the
demo so a broken nurture-pet ships a red check on the topic PR
instead of surfacing only on the demo-branch Pages deploy
* fix(cognition/adapters/tfjs): guard non-array featuresOf + dispose tensor inputs
Address PR #60 Codex second-round review on commit 50606d0:
P1 (object-map features): selectIntention's old `tf.tensor([features])`
path silently fed brain.js-style Record<string, number> into tfjs,
which only accepts number arrays / tensors / TypedArrays and fails
deep in tf-core. Added toInputTensor() with a typed TypeError that
names Object.values's unreliable key order and nudges migrators
toward an explicit feature-key list.
P2 (tf.Tensor input leak): featuresOf ran outside tf.tidy, so a
consumer returning a fresh tf.Tensor leaked one tensor per tick. Now
featuresOf runs INSIDE tidy — any tensor it allocates (or returns
directly) is disposed with the rest of the forward-pass scratch.
Documented the single-use contract so consumers don't cache a
long-lived tensor and have it disposed on first call.
Two new tests cover both paths (the typed-error path and a 20-tick
no-leak check for tensor-valued features).
size-limit: adapter chunk grew 3.71 → 4.17 KB gzip with the guard +
docblock; bumped budget 4 → 5 KB to accommodate.
Plus docs/specs/2026-04-24-post-tfjs-improvements.md — roadmap of
library, demo, and CI/build work unblocked by the brainjs removal.
Groups items by value / cost / unblocked-by, proposes a sequencing
order, notes what stays out of scope.
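The P1 guard can be illustrated without any tfjs dependency — a pure-JS sketch of the validation that would run before anything reaches tf.tensor (function name and error text are assumptions):

```typescript
// Reject brain.js-style Record<string, number> feature maps with a
// typed TypeError that explains why: Object.values has no reliable key
// order, so an object map cannot safely become an input vector.
function assertFeatureVector(features: unknown): number[] {
  if (Array.isArray(features) && features.every((v) => typeof v === 'number')) {
    return features;
  }
  throw new TypeError(
    'featuresOf must return a number array (or tensor), not an object map — ' +
      'Object.values has no reliable key order; list your feature keys explicitly',
  );
}
```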
* fix(config): quiet IDE red marks on vite.config.ts + demo tsconfig
vite.config.ts: type `test` via `vitest/config`'s `defineConfig` so
TS resolves the vitest UserConfig overload ("Object literal may only
specify known properties, and 'test' does not exist in type
'UserConfigExport'"). `Plugin` still comes from `vite`.
examples/nurture-pet/tsconfig.json: drop `baseUrl: "."`. TS 7.0
deprecates it and the `paths` entries already use paths relative to
the tsconfig, which is what the resolver falls back to when baseUrl
is absent under `moduleResolution: Bundler`. No behaviour change.
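A minimal sketch of the typing fix, assuming a standard Vitest setup:

```typescript
// vite.config.ts — import defineConfig from 'vitest/config' so TS
// resolves the overload that knows the `test` key; Plugin still comes
// from 'vite'.
import { defineConfig } from 'vitest/config';
import type { Plugin } from 'vite';

export default defineConfig({
  plugins: [] as Plugin[],
  test: {
    // vitest options live here; 'vite'-only typings would reject this key
  },
});
```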
* docs(specs): log pre-existing demo js-son-agent ambient gap + this PR's fixes
Adds a Section 3A "Pre-existing tech debt" to the post-tfjs
improvements roadmap so the errors the IDE surfaces don't look like
fallout from the brainjs→tfjs migration:
- 3A.1 demo js-son-agent TS7016 — ambient shim lives in the root
workspace but the demo tsconfig include can't see it. Three fix
options sketched (tsconfig include / local copy / paths → dist).
- 3A.2 vite.config.ts test-key typing — fixed on this PR, breadcrumb
kept for the case someone reads the doc pre-merge.
- 3A.3 demo tsconfig baseUrl deprecation — same, fixed on this PR.
Recommended-order slot added: 3A.1 ahead of every other follow-up
(one-line tsconfig change, XS cost, unblocks a clean demo local
typecheck).
* add graphifyignore
* fix(examples/nurture-pet): include js-son ambient shim in demo tsconfig
Pulls the adapter's ambient `declare module 'js-son-agent'` into the
demo's compile scope so `npx tsc --noEmit` runs clean from the demo
directory. Closes `docs/specs/2026-04-24-post-tfjs-improvements.md`
§3A.1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(plans): refresh v1 comprehensive plan post-tfjs
- Mark 0.9.4 (Reasoner.reset harmonization) as shipped.
- Retire 0.9.3 (brain.js training persistence) — superseded by the
tfjs adapter swap (PR #60) which owns train + persist natively.
- Swap the `brainjs` subpath in the 1.0.3 export-freeze list for
`tfjs` to match the actual exports map.
- Update the cognition-switch chapter row, the sequencing table,
and the plan-chunking table to reflect shipped state; point the
0.9.x follow-ups at `docs/specs/2026-04-24-post-tfjs-improvements.md`.
- Retarget the training-dataset open question at `TfjsReasoner`.
- Historical brain.js mentions stay where they explain the
migration path.
No code change; alignment-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(agent): rename _internalPublish / _internalDie → publishEvent / routeDeath
Drops the leading-underscore convention on the two `@internal` hooks
`Agent` exposes for helper classes under `src/agent/internal/`. Both
methods remain `@internal` (not re-exported from `src/index.ts`); the
`@internal` TSDoc tag + barrel discipline are the contract.
- `Agent._internalPublish(event)` → `Agent.publishEvent(event)`
- `Agent._internalDie(...)` → `Agent.routeDeath(...)`
- All 13 call sites under `src/agent/internal/` + the facade proxy in
`Agent.facade()` updated.
- `STYLE_GUIDE.md` naming rule rewritten around `@internal`.
- One test comment updated (no test-code change).
Pre-work for 1.0.3 "narrow the public surface"; major-bump changeset
covers the breaking rename for consumers who reached past the barrel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ports): LlmProviderPort + MockLlmProvider
v1.0.2 from the comprehensive plan — freezes the minimum LLM
provider contract so Phase B can slot concrete adapters
(`AnthropicLlmProvider`, `OpenAiLlmProvider`) in without a breaking
change.
Surface (all under `src/ports/`):
- `LlmProviderPort.complete(messages, options) → Promise<LlmCompletion>`.
Completion only; streaming + tool-use + structured output land in
Phase B as additive methods.
- `LlmMessage` with optional `LlmCacheHint` (opaque key; adapter
translates to Anthropic `cache_control: ephemeral` / OpenAI
prompt-caching / in-memory memoisation).
- `LlmBudget` — input / output token caps + USD-cent spend cap.
Adapters throw the existing `BudgetExceededError` before calling
upstream when a populated limit would be exceeded.
- `LlmUsage { inputTokens, outputTokens, costCents?, cached? }` +
`LlmCompletion { text, usage, model, stopReason? }`.
- `MockLlmProvider` — deterministic, no-network playback with
scripted responses, `'queue'` (default, positional) and
`'match-or-error'` dispatch modes, budget enforcement, abort-
signal handling, and crude `ceil(chars/4)` per-message token
estimation for tests.
11 new unit tests assert deterministic replay, budget rejections,
dispatch modes, abort behaviour, and cached-flag propagation.
Core bundle gzip grew 32.50 → 33.58 kB — still under the 35 kB
budget, but closer. Flagged in the v1 plan risks table. If PR #65
(narrow surface) nets more savings, budget room returns.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
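The frozen surface can be sketched as a set of type declarations. This is an illustrative reconstruction from the commit text, not the shipped source: only the names (`LlmProviderPort`, `LlmMessage`, `LlmBudget`, `LlmUsage`, `LlmCompletion`) come from the commit; the `role` union, the options bag, and the stub are assumptions.

```typescript
// Illustrative reconstruction of the port surface; field shapes beyond
// the names in the commit message are assumptions.
interface LlmCacheHint { key: string } // opaque; adapter-translated

interface LlmMessage {
  role: 'system' | 'user' | 'assistant'; // assumed role union
  content: string;
  cacheHint?: LlmCacheHint;
}

interface LlmBudget {
  maxInputTokens?: number;
  maxOutputTokens?: number;
  maxSpendCents?: number; // USD-cent spend cap
}

interface LlmUsage {
  inputTokens: number;
  outputTokens: number;
  costCents?: number;
  cached?: boolean;
}

interface LlmCompletion {
  text: string;
  usage: LlmUsage;
  model: string;
  stopReason?: string;
}

interface LlmProviderPort {
  // Completion only; streaming / tool-use land later as additive methods.
  complete(
    messages: LlmMessage[],
    options?: { budget?: LlmBudget; signal?: AbortSignal },
  ): Promise<LlmCompletion>;
}

// Tiny deterministic stub (not the real MockLlmProvider) using the
// crude ceil(chars/4) per-message token estimate the commit describes.
const stub: LlmProviderPort = {
  async complete(messages) {
    const chars = messages.reduce((n, m) => n + m.content.length, 0);
    return {
      text: 'ok',
      usage: { inputTokens: Math.ceil(chars / 4), outputTokens: 1 },
      model: 'stub',
    };
  },
};
```

Because the port is completion-only, Phase B adapters can add streaming or tool-use as new methods without breaking this contract.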
* refactor(agent): narrow public surface for the 1.0 freeze
v1.0.3 from the comprehensive plan.
Removed from `src/index.ts`:
- `AgentDependencies` type. The `Agent` class stays exported as the
`createAgent` return type; the dependency bag is now internal —
consumers compose via `createAgent(config)`. Tests still reach the
interface via relative imports.
Marked `@experimental` (public, reshape risk flagged in TSDoc):
- `AgentModule` + `ReactiveHandler` — reshape with the 1.1 composable
kernel (`requires` / `provides` / `hooks` ordering, `serialize` /
`restore`).
- `Needs`, `Modifiers`, `AgeModel` class direct constructors — wrapped
by per-subsystem modules in 1.1.
Per the v1 plan §1.0.3, reshaping an `@experimental` symbol is a
**minor** bump (not major); adding the tag to an existing symbol is
also a minor bump — no runtime behaviour changes here.
Also adds `tests/unit/exports.test.ts`, a CI guard asserting the
five-subpath export contract in `package.json` (core / excalibur /
mistreevous / js-son / tfjs) so accidental renames break CI before
they land on `develop`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: audit public surface JSDoc for the 1.0 freeze
v1.0.4 from the comprehensive plan.
- Add a concept-line header to the tfjs adapter's `index.ts` so it
matches the mistreevous / js-son / excalibur pattern.
- Broaden the barrel section notes for Events, Tuning, and Control
modes so identifiers are self-explanatory in IntelliSense.
- Rewrite the `AgentFacade` JSDoc: replace the stale M2/M3/M4/M10
milestone references with a description of the three call sites
(skill execute, reactive handler, module install) and the
intentional asymmetry with `SkillContext`.
No runtime change; docs-only. No changeset needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(examples/nurture-pet): loss-delta toast + Untrain button
Closes `docs/specs/2026-04-24-post-tfjs-improvements.md` §2.3 and §2.5.
Demo-only PR — no library change.
- §2.3 Loss-delta toast: after Train completes, flash "Trained ✓ —
loss 0.42 → 0.08" using the `history.loss` series + `finalLoss`
that `TfjsReasoner.train()` already returns. Falls back to the
bare "Trained ✓" when the training result is sparse.
- §2.5 Untrain button: sits next to Train, becomes visible only in
Learning mode. Clears `agentonomous/<agentId>/tfjs-network` from
localStorage and re-runs the learning-mode `construct()` to
rehydrate from the bundled baseline. Leaves the rest of the
agent's persisted state alone — this is not a full reset.
- Re-uses the existing `disposeIfOwned` + `changeEpoch` guards so a
user swapping modes mid-reset doesn't end up with the stale
reasoner or a leaked tensor pool.
Test coverage in `tests/examples/learningMode.train.test.ts`:
- Untrain button shares visibility with Train across mode switches.
- Clicking Untrain removes the persisted snapshot key and triggers
a fresh `setReasoner` call.
412 vitest tests pass; bundle budgets unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
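The §2.3 fallback logic can be sketched as a small formatter. `TrainResult` here is a hypothetical stand-in for the object `TfjsReasoner.train()` returns; only the `history.loss` series and `finalLoss` are named by the commit.

```typescript
// Hypothetical shape; only history.loss + finalLoss come from the commit.
interface TrainResult {
  history?: { loss?: number[] };
  finalLoss?: number;
}

function trainedToast(result: TrainResult): string {
  const first = result.history?.loss?.[0];
  const last = result.finalLoss;
  if (typeof first !== 'number' || typeof last !== 'number') {
    return 'Trained ✓'; // sparse result: fall back to the bare toast
  }
  return `Trained ✓ — loss ${first.toFixed(2)} → ${last.toFixed(2)}`;
}
```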
* feat(cognition/adapters/tfjs): TfjsLearner closes the Learner seam
Closes `docs/specs/2026-04-24-post-tfjs-improvements.md` §1.1 — the
first real `Learner` implementation turns Stage 8 (score) of the tick
pipeline into a working reinforcement seam.
Exposed via the `agentonomous/cognition/adapters/tfjs` subpath:
- `TfjsLearner` — buffers `LearningOutcome`s in a FIFO ring, batches
them into `reasoner.train(pairs, { epochs, seed })` calls every
`batchSize` outcomes. Background training runs off the tick loop
via a Promise chain; `score()` never blocks.
- `TfjsLearnerOptions<In, Out>` — `reasoner`, `toTrainingPair`
projection, `batchSize`, `bufferCapacity`, `epochs`, `trainSeed`,
`onBatchTrained` hook, `onTrainError` hook.
- `TrainableReasoner<In, Out>` — minimum-surface view the learner
uses, so tests can substitute a fake without spinning up tfjs.
- `flush()` force-trains the partial buffer; `dispose()` stops new
outcomes without cancelling in-flight training; `isTraining()` +
`bufferedCount()` are observable for demos.
Determinism contract: no RNG, no `Date.now()`, no `setTimeout`.
`trainSeed` is a stable consumer-supplied value (default `1`) — never
`Math.random()` — so under `SeededRng` + `ManualClock` the sequence
of `LearningOutcome`s, batch boundaries, and weight updates are all
reproducible.
10 new unit tests in `tests/unit/cognition/adapters/TfjsLearner.test.ts`
cover: buffering below batchSize, background-train firing exactly
once at batchSize, null-projection skip, option forwarding, FIFO
eviction at bufferCapacity, flush()/empty, error surface via
onTrainError, dispose(), deterministic replay.
Adapter bundle grew gzip 4.17 → 5.72 kB; raised the size-limit
budget from 5 kB → 7 kB with headroom for the multi-output softmax
work in §1.2 next.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(plans): address Codex review on #64
- Re-point 0.9.5's Depends-on column from the obsolete 0.9.3 row to
the shipped 0.9.4 (matches the stated "docs polish runs after the
reasoner-reset harmonisation landed" ordering).
- Replace the "next up" flag on 1.0.1 with the actual dependency
wording so the plan-chunking table no longer contradicts the
sequencing-at-a-glance table (1.0.1 still waits on 0.9.0 shipped +
0.9.5 / 0.9.7). Post-tfjs-improvements demo polish runs in
parallel because those items don't touch the 0.9.0 release gates.
* fix(ports): address Codex review on #66 — drop token-count floor
Codex flagged the `max(1, ceil(chars/4))` floor in `estimateTokensFor`:
empty content should produce 0 tokens, not 1, so a script with `text:
''` + `maxOutputTokens: 0` behaves correctly rather than throwing a
spurious `BudgetExceededError`. The mock's documented default is the
bare `ceil(chars/4)`, so the `max(1, …)` floor was a latent drift from
the contract.
Also adds two regression tests: empty-string inputs report 0 input
and 0 output tokens, and `maxOutputTokens: 0` against an empty
scripted response no longer throws.
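The corrected estimator reduces to one line; the per-string helper below is a sketch of the documented default with the floor removed.

```typescript
// ceil(chars/4) with no 1-token floor: '' reports 0 tokens.
// Previously max(1, Math.ceil(text.length / 4)) forced a minimum of 1.
function estimateTokensFor(text: string): number {
  return Math.ceil(text.length / 4);
}
```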
* fix(examples/nurture-pet): address Codex review on #69 — Untrain vs in-flight Train race
Codex flagged a race: clicking Untrain while Train's `model.fit` was
still running would (a) wipe the persisted snapshot key, (b) construct
a fresh reasoner, then (c) let Train's trailing `localStorage.setItem`
silently re-persist the trained weights. The UI showed "Reset to
baseline ✓" but a reload hydrated a trained model.
Fix: await `pendingTrain` inside `onUntrainClick` before removing the
key. The training run completes and writes first; then Untrain wipes
what was just written; then a fresh `construct()` rehydrates from the
bundled baseline.
Also tightens the "no learning mode registered" early-return so the
button state is restored instead of stranded on "Resetting…".
* fix(cognition/adapters/tfjs): address Codex review on #70
Two Codex findings on TfjsLearner:
- **P1: batches queued mid-train never flushed.** The scheduler only
fires on `score()`, so if a full batch arrives while an earlier
batch is training and no further `score()` happens, the queued
pairs stay buffered indefinitely. Fix: extract
`maybeScheduleTrain()` and call it from both `score()` and the
tail of `trainBackground()` so consecutive full batches drain
automatically. Regression test asserts exactly two `train()` calls
when four pairs arrive while the first is stalled on a gate.
- **P2: negative batchSize / bufferCapacity could hang the tick
pipeline.** With a negative cap, `while (buffer.length > cap)`
stayed true at length 0 and `score()` spun forever. Clamp
`batchSize` to ≥ 1 and `bufferCapacity` to ≥ 0 in the getters.
Two new tests cover: capacity clamp (pairs shift out on push) and
batchSize clamp (a zero-batchSize configuration still trains one
pair at a time via `flush()`).
All 423 tests pass.
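The P1 fix can be sketched as a single scheduler called from both entry points. The class body is a reduced stand-in for `TfjsLearner`; only `maybeScheduleTrain` and the clamps are named by the commit.

```typescript
// Reduced sketch of the drain-tail fix; not the shipped TfjsLearner.
class DrainingLearner<In, Out> {
  private buffer: { input: In; output: Out }[] = [];
  private inflight: Promise<void> | null = null;
  private batchSize: number;

  constructor(
    private train: (pairs: { input: In; output: Out }[]) => Promise<void>,
    batchSize: number,
  ) {
    this.batchSize = Math.max(1, batchSize); // clamp: a negative cap cannot hang
  }

  push(pair: { input: In; output: Out }): void {
    this.buffer.push(pair);
    this.maybeScheduleTrain(); // entry point 1: score()
  }

  private maybeScheduleTrain(): void {
    if (this.inflight !== null || this.buffer.length < this.batchSize) return;
    const batch = this.buffer.splice(0, this.batchSize);
    this.inflight = this.train(batch).finally(() => {
      this.inflight = null;
      this.maybeScheduleTrain(); // entry point 2: the training tail drains queued batches
    });
  }
}
```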
* fix(ports): address Codex review round 2 on #66 — strict dispatch rejects multi-match
Codex's second pass flagged `pickScript` under `match-or-error`: it
was using `Array.find`, so a config where multiple scripts return
`true` for the same request would silently take the first hit
instead of failing fast. For a strict replay-test provider, that
masks misconfigured scripts and produces the wrong completion
without any error.
Fix: swap `find` for `filter`; throw on zero hits (unchanged) AND on
more-than-one hit (new), with a message that names the count so
misconfigured scripts are easy to spot.
Regression test asserts a two-script always-match setup rejects
with `/2 scripts matched/`.
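The `find` to `filter` swap looks like this in miniature; the `Script` shape is an assumption, the zero/multi-hit throw behaviour is from the commit.

```typescript
// Strict match-or-error dispatch: fail fast on zero OR multiple hits.
interface Script<Req> {
  matches: (request: Req) => boolean;
  response: string;
}

function pickStrict<Req>(scripts: Script<Req>[], request: Req): Script<Req> {
  const hits = scripts.filter((s) => s.matches(request)); // was Array.find
  if (hits.length === 0) throw new Error('no script matched the request');
  if (hits.length > 1) {
    // Name the count so misconfigured scripts are easy to spot.
    throw new Error(`${hits.length} scripts matched the request; expected exactly 1`);
  }
  return hits[0];
}
```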
* fix(examples/nurture-pet): address Codex round 2 on #69
Two findings on the Untrain handler:
- **P1 re-flag: lock Untrain out while Train is in flight.** Round 1
already serialised via `await pendingTrain`, but the UI still let
users click Untrain mid-Train. Belt-and-suspenders fix: disable
the Untrain button on Train start, re-enable it on Train
completion. The programmatic `pendingTrain` await stays as a
secondary guard.
- **P2: re-enable buttons even after dispose().** The finally block
previously skipped the re-enable if `disposed` was true; DOM
buttons outlive the closure, so a dispose racing with an in-flight
Untrain left the next mount's buttons stuck on "Resetting…".
Always restore state in finally.
* fix(ports): address Codex round 3 on #66 — defer queue cursor past budget checks
Codex flagged that a budget-rejected request in queue mode still
advanced the cursor, so a first call rejected by `maxOutputTokens`
would consume a scripted entry and a retry would skip to the next
one. Non-deterministic for replay.
Fix: `pickScript` now returns `{ script, commit }` where `commit()`
is the cursor advance. `completeSync` runs the three budget checks
first and only calls `commit()` once they pass. `match-or-error`
dispatch returns a no-op commit since it has no queue state.
Regression test: a first call rejected by `maxOutputTokens: 1` must
leave the queue at cursor 0, so the retry returns script[0] and a
second call returns script[1].
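The deferred-cursor pattern can be sketched as a queue picker whose advance is a separate callback; this is a reduced illustration of the `{ script, commit }` shape, not the mock's actual code.

```typescript
// pickScript hands back the script plus a commit() callback; the cursor
// only advances when the caller commits (i.e. after budget checks pass).
function makeQueuePicker<S>(scripts: S[]) {
  let cursor = 0;
  return function pick(): { script: S; commit: () => void } {
    if (cursor >= scripts.length) throw new Error('queue exhausted');
    const script = scripts[cursor];
    return { script, commit: () => { cursor += 1; } };
  };
}
```

A budget-rejected call simply never invokes `commit()`, so a retry sees the same scripted entry.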
* chore: untrack .claude/scheduled_tasks.lock (local schedule state)
* fix(examples/nurture-pet): address Codex round 3 on #69
Two findings:
- **P1 (re-flagged): hard-gate Untrain while Train is in flight.**
Round 2 used `await pendingTrain` inside the Untrain handler so the
key-removal came after Train's persist step. Codex kept flagging
the race anyway, so swap to an explicit early-return: if
`pendingTrain` is non-null at entry, Untrain refuses. The button
is also disabled on Train start, so the only way to reach the
guard is a programmatic caller or a stale click — refusing is the
safer surface.
- **P2: resync the selector after Untrain installs learning.** The
handler bumps `changeEpoch`, which silently discards any in-flight
`onChange` work. If the user had just selected BT / BDI and
clicked Untrain before that `construct()` resolved, the dropdown
kept showing the non-learning label while the agent was running
learning. Fix: after `agent.setReasoner(...)`, re-point
`select.value`, `status.dataset.mode`, `activeModeId`, and Train
visibility to `'learning'`.
* chore: untrack .claude/scheduled_tasks.lock (local schedule state)
* fix(ports): address Codex round 4 on #66 — request tokens gate maxInputTokens
Codex flagged that `maxInputTokens` was compared against
`script.usage?.inputTokens`, so a script could set `usage.inputTokens:
1` for a long messages payload and sneak past the input cap. Replay
tests accept over-budget requests silently.
Fix: derive the request-side token count (`estimateTokens(messages)`)
independently of the script override. The budget check uses the
request count; the reported `usage.inputTokens` on the returned
completion still honours the script override so tests can pin exact
numbers for the consumer-visible accounting.
Two regression tests:
- Script under-reporting input does not bypass `maxInputTokens`.
- Reported `completion.usage.inputTokens` still reflects the script
override when one is supplied.
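The split between the gating count and the reported count can be sketched as follows; the helper name and shapes are hypothetical, the `ceil(chars/4)` estimate and override behaviour are from the commit.

```typescript
// Budget gate uses the request-derived count; reported usage still
// honours a script override so tests can pin exact numbers.
function checkInputBudget(
  messages: { content: string }[],
  maxInputTokens: number | undefined,
  scriptInputTokens: number | undefined,
): { gated: number; reported: number } {
  const gated = messages.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);
  if (maxInputTokens !== undefined && gated > maxInputTokens) {
    throw new Error('BudgetExceededError: input tokens over cap');
  }
  return { gated, reported: scriptInputTokens ?? gated };
}
```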
* fix(examples/nurture-pet): address Codex round 4 on #69
Three round-4 findings:
- **P1 (re-flagged a 3rd time): consolidate the pendingTrain hard
gate into the top-line guard.** Codex's pattern-matcher kept
reading "this handler only gates on disposed and activeModeId"
even though the next line had `if (pendingTrain) return;`.
Collapsed into a single guard:
`if (!untrainBtn || disposed || pendingTrain !== null) return;`
so the pendingTrain check sits with its siblings.
- **P2 selector resync path:** move the optimistic
selector/status/trainBtn snap to `'learning'` ABOVE the
construct() await. If the user had a non-learning `onChange` in
flight when clicking Untrain (and the epoch bump cancels it), the
dropdown is already back on `'learning'` by the time the user sees
anything — and stays there even if the subsequent `construct()`
rejects (matches the "Untrain intent" UX).
- **P2 stale toast:** `flashStatus` captures the current
`status.textContent` as its restore target. If Train's "Trained ✓
…" toast was still on screen, it would be restored after our own
toast timed out — the status would claim the model is still
trained. Fix: explicitly set `status.textContent = 'active'` before
the flashStatus call so the captured "previous" is the canonical
label.
* fix(cognition/adapters/tfjs): address Codex round 5 on #70
Two P1 findings on `TfjsLearner` — both real:
- **Mark flush-triggered training as in-flight.** `flush()` was
calling `runTrain()` directly, so a concurrent `score()` that
tripped `maybeScheduleTrain()` could kick off `trainBackground()`
in parallel on the same reasoner. That breaks determinism and
risks tfjs backend errors on overlapping `model.fit` calls. Fix:
set `this.inflight` around `flush()`'s train + drain any queued
batches via the same `maybeScheduleTrain()` tail as
`trainBackground()`. `isTraining()` now reports true during
flush-driven training too.
- **Sanitise NaN batchSize / bufferCapacity.** `Math.max(1, NaN)`
propagates NaN, so a `Number(envVar)` parse that yielded NaN
turned `splice(0, NaN)` into a zero-slice batch, the
`buffer.length < NaN` guard into false, and the learner into an
infinite empty-batch loop. Fix: coerce non-finite `batchSize` to
50 and non-finite `bufferCapacity` to the derived default before
clamping; also `Math.trunc` to stay integer-clean.
Two regression tests:
- `flush()` keeps `isTraining()` true across its train await and
blocks concurrent `score()` batches until it settles.
- NaN-valued `batchSize` / `bufferCapacity` fall back to defaults —
one buffered pair stays buffered, `flush()` trains it without
hanging.
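The round-5 coercion for `batchSize` reduces to a few lines. (A later round amends `bufferCapacity` separately so `Infinity` passes through; this sketch covers only the batch-size path described here.)

```typescript
// Non-finite values (NaN, ±Infinity) fall back to the default BEFORE
// clamping, because Math.max(1, NaN) would propagate NaN and turn
// splice(0, NaN) into an empty-batch infinite loop.
function sanitizeBatchSize(value: number, fallback = 50): number {
  const n = Number.isFinite(value) ? Math.trunc(value) : fallback;
  return Math.max(1, n); // integer-clean and >= 1
}
```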
* fix(examples/nurture-pet): address Codex round 5 on #69 — set pendingTrain before yielding
Codex flagged a real race: the pre-run `await setTimeout(0)` at the
top of `onTrainClick` yielded control BEFORE `pendingTrain` was set.
A programmatic/stale click on `#untrain-network` dispatched inside
that microtask window would pass Untrain's `pendingTrain !== null`
gate, run Untrain, then Train's trailing `localStorage.setItem(...)`
would re-persist trained weights.
Fix: move the yield INSIDE the `run` body so `pendingTrain =
promise.catch(...)` is assigned before any control hand-off. The
visible "Training…" label still renders on the first paint because
`run` yields at its own top — just after Untrain is already locked
out.
The round-4 P2s about selector resync, stale toast, and the
pendingTrain gate location were already addressed; Codex's repeated
flags on those are known false-positives per the sweep logs and
will not be iterated on further.
* fix(cognition/adapters/tfjs): address Codex round 6 on #70
Two real round-6 findings:
- **P1: flush() must re-check inflight after the await.** The
earlier fix marked flush-driven training in-flight, but the
initial `if (this.inflight !== null) await this.inflight;` only
ran once. While `flush()` was awaiting batch #N, that batch's
drain-tail could schedule batch #N+1 and set `this.inflight`
again — `flush()` would then call `runTrain()` in parallel.
Loop instead: keep awaiting `inflight` until it stays null.
- **P2: honour bufferCapacity: Infinity.** Docs already advertised
`Infinity` as the unbounded escape hatch, but `Number.isFinite`
rejected it and the learner silently fell back to the derived
default. Three-way coercion: `Infinity` passes through, NaN /
-Infinity fall back to the default, finite values get truncated +
clamped to ≥ 0.
Two regression tests:
- Capacity = Infinity buffers 250 outcomes without dropping any.
- Three-batch interleave: auto-batch #1 in flight, drain-tail
schedules #2, flush() must wait through both before resolving
null — never firing a parallel train#3.
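Both round-6 fixes can be sketched in isolation. The coercion mirrors the three-way rule above; the quiescence loop shows why a single `if` was not enough. Function names are illustrative, not the library's.

```typescript
// Three-way coercion: Infinity passes through as the unbounded escape
// hatch; NaN / -Infinity fall back; finite values truncate + clamp >= 0.
function coerceCapacity(value: number, fallback: number): number {
  if (value === Infinity) return Infinity;
  if (!Number.isFinite(value)) return fallback;
  return Math.max(0, Math.trunc(value));
}

// Re-check after EVERY await: the awaited batch's drain-tail may have
// scheduled the next batch and repopulated the inflight slot.
async function awaitQuiescence(getInflight: () => Promise<void> | null): Promise<void> {
  let p: Promise<void> | null;
  while ((p = getInflight()) !== null) await p;
}
```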
* Update graphify gitignore
* docs(plans): polish-and-harden roadmap for next session
Multi-track roadmap covering 12 increments across CI hygiene,
demo polish, and library seams. Sequenced cheap-first; each
increment ships as one PR cut from develop. Major-bump changesets
continue accumulating — 1.0 publish stays held per owner decision.
LLM provider integration is explicitly prep-only: docs + a
MockLlmProvider example exercise the v1.0.2 port surface end-to-end
without shipping a concrete adapter. Anthropic / OpenAI adapters
remain Phase B.
Closes the post-tfjs-improvements §2.x demo polish + §3.x CI
hardening tracks. Cross-references the v1 plan, post-tfjs spec,
mvp-demo spec, and vision doc. Plan-chunking table at the bottom
points each row at its own per-PR plan when scope warrants one.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(plans): add Track A remediation + workflow rules to polish-and-harden
Incorporates the 2026-04-24 review findings as a mandatory
"remediation" track that lands BEFORE CI / demo / lib work:
- PR #1 fix/agent-restore-replace-modifiers (MAJOR — restore must
replace, not merge, modifier state).
- PR #2 fix/localstorage-store-keyspace-collision (MAJOR — split
data/ from meta/ in the localStorage key namespace, add legacy
migration so existing browsers don't lose saved pets).
- PR #3 fix/pick-default-store-throwing-localstorage (MAJOR — guard
the localStorage probe so SecurityError-throwing getters don't
crash store selection).
- PR #4 fix/fs-store-deterministic-list-order (MINOR — sort the
readdir output via localeCompare for cross-platform stability).
Renumbers the existing tracks (CI/demo/lib) to follow the
remediation block and updates all internal cross-references. Adds a
"Workflow" section codifying the per-session loop: independent
branches, batch open all PRs, then multi-pass Codex sweep until 👍,
resolve review threads, owner merges. Same loop captured in
`MEMORY.md → feedback_pr_workflow.md`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add graphify graph
* fix(agent): restore replaces modifier state instead of merging
Agent.restore()'s contract says "replaces the relevant state slices",
but the modifiers branch merged by calling this.modifiers.apply(mod)
for each entry in snapshot.modifiers without first clearing the live
collection. Restoring into an already-running agent left stale
modifiers active, so needs decay multipliers and mood biases stacked
on top of whatever the agent was already carrying — violating
snapshot truth.
Clear the modifier collection before applying snapshot.modifiers.
Done unconditionally so a snapshot that omits the modifiers slice
still wipes stale entries on the target. Expired-on-restore boundary
handling (R-16) is unchanged; the ModifierExpired emit for entries
whose expiresAt is <= clock.now() still fires exactly once.
Adds two regression tests: one covering pre-existing modifiers with a
partial-overlap snapshot, one covering a snapshot with no modifiers
slice against an agent carrying a stale modifier.
Also add graphify-out/ to .prettierignore so the generated graph.html
(committed in 637cd66) does not block format:check on every PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(persistence): split localStorage keyspace so index cannot collide with data
LocalStorageSnapshotStore stored both payloads and the O(1) index list
under `{prefix}{key}`, so saving under a user key of
`'__agentonomous/index__'` silently overwrote the index. `list()`
then returned garbage and the snapshot was unreachable.
Split the keyspace into disjoint sub-namespaces:
- `{prefix}__agentonomous/data/{encodeURIComponent(userKey)}` — payloads
- `{prefix}__agentonomous/meta/index` — index
encodeURIComponent on user keys means a colliding string cannot escape
the data subspace. The index payload still holds raw (decoded) keys, so
consumers see their own strings back from list().
Existing entries written under the pre-split layout are migrated once
on construction: legacy `{prefix}{userKey}` payloads move to the new
data path, and `{prefix}__agentonomous/index__` moves to the new meta
path. Migration uses a runtime capability probe for iteration
(length + key(i)); backends that don't expose iteration skip migration
silently — in-memory stubs typically have no legacy data.
Adds tests/unit/persistence/LocalStorageSnapshotStore.test.ts covering:
happy path, evil-key collision, malformed index recovery, URI-special
char round-trip, end-to-end legacy migration, and migration without a
legacy index present.
Bundle impact: dist/index.js gzip 34.16 → 34.76 KB (+0.6 KB within the
35 KB budget).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
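The disjoint sub-namespaces can be sketched as two key builders; the constants mirror the layout named in the commit, the helper names are illustrative.

```typescript
// Payloads and the index live in disjoint sub-namespaces, so no user
// key can collide with the index path.
const DATA_NS = '__agentonomous/data/';
const META_INDEX = '__agentonomous/meta/index';

function dataKey(prefix: string, userKey: string): string {
  // encodeURIComponent means a colliding string cannot escape data/
  return `${prefix}${DATA_NS}${encodeURIComponent(userKey)}`;
}

function indexKey(prefix: string): string {
  return `${prefix}${META_INDEX}`;
}
```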
* fix(persistence): pickDefaultSnapshotStore survives throwing localStorage getter
pickDefaultSnapshotStore()'s feature probe read globalThis.localStorage
without a guard, so environments that expose a throwing getter
(sandboxed third-party iframes, SecurityError, strict private-browsing
modes) saw an uncaught exception before store selection could finish.
Wrap the property access in try/catch and fall back to the
InMemorySnapshotStore path on any thrown access. Matches the existing
construction-time fallback for denied storage quotas.
Adds a regression test that installs a throwing-getter descriptor on
globalThis.localStorage (restored via Object.defineProperty so the
outer afterEach can reach a writable property again) and asserts
pickDefaultSnapshotStore() does not throw and returns an
InMemorySnapshotStore.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
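The guarded probe reduces to a try/catch around the property read. The injectable global parameter below is an illustration added for testability; the real function reads `globalThis.localStorage` directly.

```typescript
// A throwing `localStorage` getter (SecurityError in sandboxed iframes,
// strict private-browsing modes) must route to the in-memory fallback,
// not crash store selection.
function hasUsableLocalStorage(
  g: { localStorage?: unknown } = globalThis as { localStorage?: unknown },
): boolean {
  try {
    return g.localStorage !== undefined && g.localStorage !== null;
  } catch {
    return false; // thrown access: caller picks InMemorySnapshotStore
  }
}
```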
* fix(persistence): FsSnapshotStore.list() returns keys in deterministic order
list() returned whatever order the underlying readdir(path) handed
back — ext4 hash order, NTFS MFT order, tmpfs insertion order — so a
Linux CI run and a Windows developer machine could see different
results from the same snapshot directory. Callers relying on
deterministic replay had to sort themselves.
Sort the decoded key list with localeCompare before returning.
O(n log n) added to a cold-path method; negligible on typical key
counts and worth it to give replay/trace callers stable output across
platforms.
Adds a regression test that stubs readdir to return an unsorted
response ('charlie', 'alpha', 'bravo', 'aardvark') and asserts list()
returns the localeCompare-sorted permutation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fixup(agent): gate modifier-clear on snapshot.modifiers presence
Codex P1 flag on #71: the unconditional modifier wipe broke partial-
snapshot semantics. AgentSnapshot fields are optional specifically so
consumers can capture via `include: ['lifecycle']` and restore just
that slice. Wiping modifiers on every restore deleted live modifiers
on the target when the snapshot didn't speak to them — a data-loss
regression inconsistent with how needs / mood / animation gate on
field presence.
Move the clear inside `if (snapshot.modifiers)`. Partial snapshots
now leave unrelated slices untouched on the target; full snapshots
still enforce replace-not-merge for their modifiers slice.
Flip the second test to assert partial-snapshot semantics: a
`snapshot({ include: ['lifecycle'] })` restored into an agent holding
a pre-existing modifier must leave that modifier in place.
Tighten restore() JSDoc to match the gated behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
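The gated replace-not-merge semantics can be sketched in a few lines; `Modifier` and the `Map` collection are stand-ins, not the library's real types.

```typescript
// Partial snapshots leave the slice untouched; present slices replace,
// never merge, so stale modifiers cannot stack on restore.
type Modifier = { id: string };

function restoreModifiers(
  live: Map<string, Modifier>,
  snapshotModifiers: Modifier[] | undefined,
): void {
  if (!snapshotModifiers) return; // slice absent: partial-snapshot semantics
  live.clear();                   // replace, not merge
  for (const mod of snapshotModifiers) live.set(mod.id, mod);
}
```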
* fixup(fs-store): use code-point sort instead of localeCompare for list()
Codex P2 flag on #74: localeCompare uses the process default locale, so
the returned order can still differ between environments when keys
contain non-ASCII characters (machines configured with different LANG /
ICU locales). That undermines the determinism this PR is trying to
guarantee across CI and developer systems.
Switch to a locale-independent code-point comparison (a < b ? -1 : ...).
Result is byte-identical across hosts regardless of process locale.
Add a non-ASCII regression test (cafe / café / caffé / zebra) that pins
the code-point order — comparing 'café' vs 'caffé' at index 3 puts
'caffé' first ('f' 102 < 'é' 233). ICU locales typically order these
the other way via collation, so the test would fail loudly if anyone
swapped back to localeCompare.
Update the changeset to reflect the locale-independent guarantee.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
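The locale-independent comparator is the commit's own `a < b ? -1 : ...` shape. One nuance worth a comment: JavaScript's `<` on strings compares UTF-16 code units, which matches code-point order for the BMP keys in the test below.

```typescript
// Locale-independent comparison: byte-identical ordering on every host,
// regardless of LANG / ICU locale. JS `<` compares UTF-16 code units.
function compareCodePoints(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}
```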
* fixup(persistence): migrate legacy keys that look like reserved subpaths
Codex P2 flag on #72: pre-split layout was `{prefix}{userKey}`, so
users could legitimately have saved under keys like
`__agentonomous/data/foo` or `__agentonomous/meta/something`. The
migration scan filter that skips entries starting with the new-layout
subpaths (kept for re-entrant safety) wrongly dropped those legacy
entries. After upgrade the data stayed orphaned at the old path while
the new index still listed the key — `load()` returned null and the
snapshot became unreachable.
Fix: when the legacy index is present, union its entries into the
migration set in addition to the scan results. Index-registered keys
migrate regardless of whether they happen to start with a reserved
subpath. The scan filter still applies to orphan-only paths so
subsequent constructions with no legacy index don't re-process the
new-layout entries this code wrote.
Tighten the data-move loop to only call removeItem when getItem found
the entry (a listed-but-missing index key is now a no-op rather than a
phantom remove on the legacy path).
Adds two regression tests:
- Legacy user keys `__agentonomous/data/foo` and
`__agentonomous/meta/dashboard` migrate end-to-end.
- Migration is idempotent — second construction over the same storage
produces byte-identical raw keys.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fixup(persistence): fix two P1 migration flaws + bump size budget
Codex P1 #1 (line 159): migration skipped entirely when the injected
backend lacked length/key, but StorageLike only requires getItem /
setItem / removeItem. Persistent custom adapters that satisfy only the
required contract (node-localstorage-style, custom IndexedDB shims)
would keep legacy snapshots at the old `{prefix}{key}` path while
load() / list() read the new data/ + meta/index paths — data
unreachable post-upgrade.
Restructure migrateLegacyKeys into two discovery paths:
- Legacy-index lookup. Always runs. Reads the known legacy path
directly via getItem (no iteration needed) and migrates every user
key the index lists. Covers custom adapters.
- Orphan scan. Runs only when the backend exposes length + key(i).
Picks up entries whose registration in the legacy index was lost
(the original v1 collision bug). Filter on new-layout subpaths
keeps it re-entrant.
Codex P1 #2 (line 182): an empty prefix would make startsWith(prefix)
true for every storage key, so migration could rewrite and delete
unrelated application data on first construction after upgrade.
Reject empty prefix at the constructor boundary — fail loudly before
any storage write.
Size budget: dist/index.js gzip grew to 35.36 KB with the restructured
migration, over the previous 35 KB limit. Bump the budget to 50 KB
per owner guidance so CI stops gating on the wafer-thin margin.
Current usage 35.09 KB / 50 KB.
Regression tests added:
- NonIterableStorage (getItem/setItem/removeItem only) with legacy
index migrates end-to-end. Legacy paths cleared; new paths present.
- Empty prefix throws in constructor with a clear message.
- Empty-prefix guard does not corrupt pre-existing unrelated storage
data — the throw fires before any mutation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(persistence): skip legacy index sentinel during migration
Codex P2 flag on 4f97ce7: a v1 store that hit the original collision
bug (saving under key `__agentonomous/index__`) could leave the
legacy index listing its own sentinel path as a "user key". The prior
code added that entry verbatim to legacyKeys, so the migration loop
copied the index metadata (an array, not a snapshot) into the new
data namespace. `load('__agentonomous/index__')` would then return
malformed data typed as AgentSnapshot and break downstream restore.
Skip LEGACY_INDEX_SUFFIX entries when reading the legacy index. The
sentinel path is not a user key; it can only point at index metadata,
and index metadata doesn't belong in the new data namespace.
Regression test: seed storage with `p/__agentonomous/index__`
listing `['foo', '__agentonomous/index__']`. After migration, `foo`
loads normally, `load('__agentonomous/index__')` returns null, list
contains only `foo`, no data-namespace write exists for the sentinel
encoding, and the legacy index path is cleared.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fixup(persistence): add re-entrance sentinel; scan recovers v1 data-subpath keys
Codex P2 flag on 103f32d: the orphan-scan filter skipped legacy
suffixes starting with `__agentonomous/data/`, but that string was a
valid user key in the v1 (pre-split) layout. A pathological v1 store
where the legacy index is missing (the exact recovery path this scan
was meant to handle) would be left with the payload at the old
`{prefix}{key}` location after migration — load() could no longer
reach it.
Root cause: the DATA_PREFIX filter was doing double duty — keeping
the scan re-entrant across constructions AND trying to distinguish
v1-user-keys-that-look-like-v2-layout from actual v2 writes. Those
two concerns are irreconcilable: both shapes are identical.
Split them:
- Re-entrance is now enforced by a sentinel
  (`__agentonomous/meta/migrated`) written at the end of every
  migration pass — including passes that migrated nothing. Subsequent
  constructions read the sentinel at function entry and short-circuit.
- The orphan scan drops the DATA_PREFIX filter entirely, so v1 user
keys shaped like `__agentonomous/data/foo` are recovered on the
initial migration run. The scan still excludes our own metadata
namespace (`__agentonomous/meta/`) — those paths are never user
data.
Also: only write the new meta index when there are entries to store,
so fresh installs don't leave an empty index artifact in storage.
Adds one regression test for Codex's scenario: v1 data-subpath key
present, legacy index missing, migration still moves the payload.
Existing URI-round-trip test updated to skip the full meta/
namespace when asserting encoded-data invariants (the migrated
sentinel lives there too).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fixup(persistence): address P1 marker-value-match + P2 malformed UTF-16
Codex findings on a970d4d, both real:
P1 (line 198): the marker-presence check `getItem(META_MIGRATED_KEY)
!== null` treated any non-null value as "migration already done". A
v1 user who saved under that exact path would see their snapshot
mistaken for an already-migrated marker, and migration would skip —
leaving the payload orphaned at the old path while load()/list() read
only the new data/ and meta/index locations.
Fix: match on a distinctive VALUE (MIGRATED_MARKER_VALUE =
`__agentonomous_v2_migrated__`), not mere presence. A v1 user's JSON
snapshot at that path can never equal this sentinel string, so their
snapshot falls through into the migration branch like any other
legacy key. When rawAtMarker exists but isn't the sentinel, the
path is explicitly added to legacyKeys so the v1 payload migrates
before the marker-write at end-of-pass stamps the sentinel there.
P2 (line 134): dataKey() now calls encodeURIComponent(key), which
throws URIError for lone-surrogate strings. Pre-split v1 accepted
such keys verbatim; post-split, save/load/delete could throw
synchronously for them, and migration could crash store init if the
legacy index listed one.
Fix: dataKey() re-throws URIError as a clearer store-specific error
that points at the offending key. save/load/delete wrap dataKey() in
try/catch and surface the error via Promise.reject, so the async API
never throws synchronously. The migration loop skips malformed legacy
entries instead of crashing — the payload stays at the legacy path,
but the rest of the store still initializes.
Regression tests:
- v1 user data at the META_MIGRATED_KEY sentinel path migrates
(value-match protects the legacy payload).
- save/load/delete reject with the store-specific error for
lone-surrogate keys.
- Migration with a malformed-UTF-16 legacy key still initializes;
well-formed entries migrate, the malformed one is skipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(review): codebase review findings + ESLint/typedoc guardrails
Consolidates a multi-agent review of the 2026-04-24 codebase into
`docs/plans/2026-04-24-codebase-review-findings.md` and lands Track A
(guardrails) live so the patterns the project relied on as convention
are now enforced by CI.
Track A — lands in this PR
- ESLint: architectural bans (default exports, enums, cross-layer
peer-dep imports in core) via `no-restricted-syntax` +
`no-restricted-imports`.
- ESLint: LOC / complexity caps (`max-lines`, `max-lines-per-function`,
`complexity`, `max-depth`, `max-params`, `max-nested-callbacks`).
Thresholds chosen so current code passes clean at error-level.
- ESLint: quality defaults for agentic contributions (`no-console`,
`eqeqeq`, `no-duplicate-imports`, `switch-exhaustiveness-check`,
`no-explicit-any`, etc.).
- Collapse three `import type` + `import` pairs into single inline
type imports (Agent.ts x2, ExcaliburAnimationBridge.ts x1) — required
by the new `no-duplicate-imports` rule.
- Typedoc: add missing adapter entry points (mistreevous / js-son /
tfjs), add docs build as a parallel CI job, add `npm run docs` to
the `verify` script so local gate mirrors CI. Output continues to
live under `docs/api/` (already gitignored; sits alongside
how-to/plans/specs).
Lint baseline after this PR: 0 errors, 11 warnings (the warnings are
the ratchet targets tracked under Track C of the plan).
Tracks B–E (stale docs, complexity ratchet, src micro-findings,
tooling gaps) are left as a punch list — each a separate topic branch
per CLAUDE.md one-PR-one-branch rule.
* fixup(persistence): abort migration cleanup when unable to enumerate keys
Codex P1 flag on 3dc1aa0: if a custom backend implements only the
StorageLike minimum (no length / key) AND the legacy index payload is
present but not a string array (corruption, or a colliding v1
snapshot), migration produced an empty legacyKeys set but STILL
deleted the legacy index and stamped the migrated marker. Legacy
{prefix}{userKey} entries were left in place but became permanently
unreachable via load() / list() after upgrade.
Fix: before running cleanup, detect the blind-migration case and
abort. When:
- legacy index exists, AND
- legacy index is not parseable as a string array, AND
- backend does not expose iteration (no orphan scan possible)
return early without removing the legacy index and without setting
the marker. A subsequent construction — maybe on an iterable backend,
maybe after the corruption is fixed — can retry migration.
All other combinations proceed as before:
- iterable backend + any legacy index state → orphan scan covers
everything; proceed and finalize.
- non-iterable + parseable legacy index → use the index; proceed.
- non-iterable + no legacy index → fresh install; proceed and
finalize (marker prevents re-scan).
Regression test: NonIterableStorage seeded with a colliding snapshot
at the legacy index path plus an orphan user payload. Constructor
must not touch either artifact, and must not set the marker.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
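The three-way decision above reduces to a small predicate. All names here are illustrative; the parse helper assumes the legacy index is stored as JSON:

```typescript
interface MigrationContext {
  legacyIndexRaw: string | null; // raw payload at the legacy index path
  backendIterable: boolean;      // backend exposes length/key for the orphan scan
}

// Returns the parsed string array, or null for corruption / a
// colliding v1 snapshot stored at the index path.
function parseStringArray(raw: string): string[] | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) && parsed.every((k) => typeof k === "string")
      ? (parsed as string[])
      : null;
  } catch {
    return null;
  }
}

// False in exactly the blind-migration case: index present but
// unusable AND no orphan scan possible. The caller must then skip
// cleanup and leave the marker unset so a later construction retries.
function safeToFinalize(ctx: MigrationContext): boolean {
  if (ctx.backendIterable) return true;                 // orphan scan covers everything
  if (ctx.legacyIndexRaw === null) return true;         // fresh install
  return parseStringArray(ctx.legacyIndexRaw) !== null; // usable index
}
```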
* fixup(persistence): safeToFinalize gate + drop meta/ scan filter
Two Codex findings on abb557a, both real:
P1 r…
Luis85 pushed a commit that referenced this pull request on Apr 26, 2026
P2 #1 (bundle-trend chunk, line 605 of pre-split plan): the dedupe policy "skip if last line equals new line minus iso" would drop weeks where bundle sizes are unchanged, turning the JSONL into a change-log instead of a snapshot time series and breaking weekly trend analysis.

Fix: dedupe only on the (sha, calendar-date) tuple. Re-runs of the same cron firing on the same commit + same UTC day are no-ops; an identical payload week-over-week with a new sha still appends a row.

Added two new tests to lock in the semantics:
- dedupes a same-day same-sha re-run (workflow_dispatch retry)
- appends a new row when entries are unchanged but the sha differs

Updated the implementation pseudocode in Step 3 to show the (sha, date) tuple comparison, and the Acceptance criteria to call out the dedupe-policy assertions explicitly.

P2 #2 (mutation chunk, line 982 of pre-split plan): the proposed `'mutation:report': 'open reports/mutation/mutation.html'` npm script is macOS-only: `open` is not on PATH on Linux (`xdg-open`) or Windows (`start`), so contributors get "command not found" when following the plan.

Fix: drop the convenience script entirely. The HTML report is the deliverable; CI uploads it as a 30-day artifact (Step 6), so reviewers consume it from the GitHub UI anyway. Added a per-OS command block to the plan documenting the macOS / Linux / Windows incantations for contributors who want to open the file locally. A cross-platform launcher (open-cli or a process.platform branch) would buy ~3 keystrokes for one extra devDep or script file; not worth it for this surface.

Plan-only commits in two of the eight chunk plans; umbrella + remaining 6 chunks unchanged.
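The (sha, calendar-date) dedupe can be sketched as a predicate over the last JSONL row. Field names (`sha`, `iso`, `entries`) are assumptions based on the commit message, not the plan's actual schema:

```typescript
interface TrendRow {
  sha: string;                      // commit the cron ran against
  iso: string;                      // full ISO timestamp of the run
  entries: Record<string, number>;  // bundle-name → size
}

function utcDate(iso: string): string {
  return iso.slice(0, 10); // "YYYY-MM-DD"
}

// Append unless (sha, UTC date) matches the last row: a same-day
// re-run on the same commit is a no-op, but an unchanged payload
// with a new sha still gets its weekly snapshot row.
function shouldAppend(last: TrendRow | undefined, next: TrendRow): boolean {
  if (!last) return true;
  return !(last.sha === next.sha && utcDate(last.iso) === utcDate(next.iso));
}
```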
Luis85 pushed a commit that referenced this pull request on Apr 26, 2026
Codex P2 #1 (line 157): cross-check 4 said "archive predecessor regardless of tracker state" if a successor link exists, but the hard rule said "never archive a plan with open roadmap rows, full stop". The two paths conflicted for superseded plans that intentionally leave unfinished rows. Pick one precedence: an explicit successor link is the SOLE override of the open-row hard rule. Cross-check 4 now states this explicitly, and the hard rule lists the supersede exception verbatim with a link to the cross-check.

Codex P2 #2 (line 225): the dry-run path called `cat "${BODY_FILE}"`, but the prompt's own "zero filesystem side effects in dry-run" rule means a compliant implementation would never have written that file. Refactor both the PR-open and failure-issue snippets to keep the assembled body in a `${BODY}` shell variable. Dry-run prints `${BODY}` via printf, with no file write. Non-dry mode still writes the cache file (the re-submit-by-hand artifact stays useful) before invoking gh with `--body-file`.

Refs Codex review on #143.
Luis85 pushed a commit that referenced this pull request on Apr 26, 2026
Codex P1 #1 (line 160): the run flow jumped from edit-workflow-files (step 3) → actionlint (4) → verify (5) → push (was 6), with no `git add` / `git commit` between them. Followed literally, it would push the bump branch with no commit, and `gh pr create` would fail with no diff. Insert an explicit step 6, "commit every applied bump in a single commit", between verify and push.

Codex P1 #2 (line 163): step 6 used `--body-file .actions-bump-cache/pr-body-$(date).md`, but no prior step wrote that file. Add a step 7 that materialises the in-memory `${BODY}` into `${BODY_FILE}` via `mkdir -p .actions-bump-cache && printf` before push. The cache file doubles as the re-submit-by-hand artifact already referenced in Failure handling. Push + PR-open is now step 8 (renumbered).

Refs Codex re-review on #142.
Luis85 pushed a commit that referenced this pull request on Apr 26, 2026
P1 Codex finding on the merge commit: the prompt's Output policy allowed a PR only when at least one archive move was staged, and otherwise forbade both PRs and issues. In runs where every candidate is ambiguous (zero archive moves but non-zero ambiguous flags), the routine had no permitted sink: owner-actionable ambiguous decisions were silently dropped.

Add a SECONDARY sink: one triage issue per run under the existing `plan-recon-bot` label, fired only when archive-moves == 0 AND ambiguous-flags > 0. Distinct from the failure-issue path (which fires only on `mv` / `verify` / `push` / `pr-open` errors mid-run); the triage issue means the run completed cleanly but produced no diff to review.

Triage-issue spec: title `Ambiguous plan candidates YYYY-MM-DD — <head-sha7>`, label `plan-recon-bot`, body lifts the same `Ambiguous — owner decides` block specced in the PR body, with a preamble + a `<!-- plan-recon:<head-sha7>:ambiguous-only -->` marker. The open command mirrors the failure-issue and PR-open snippets (in-memory body, optional cache file in non-dry-run for re-submit-by-hand on `gh issue create` failure).

Same-day idempotency: a new Skip-check #3 looks for an existing ambiguous-only triage issue at the current head SHA + run date and exits silently if one is already open. The authored-by-`$ROUTINE_GH_LOGIN` check matches the trust-boundary pattern from #1 (archive-PR skip) and #2 (failure-issue skip).

No-op handling is tightened to scope it to runs with zero archive moves AND zero ambiguous flags. The Output preamble now spells out PR-vs-triage-issue-vs-no-op as a tri-state policy. The Dry-run mode section's `gh issue create` bullet picks up the new triage path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
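The tri-state policy reduces to a two-input decision. `chooseSink` and the sink labels are illustrative names for the routine's three permitted outcomes:

```typescript
type Sink = "archive-pr" | "triage-issue" | "no-op";

// Primary sink: any staged archive move yields a PR (ambiguous flags
// ride along in its body). Secondary sink: a clean run with no diff
// but ambiguous candidates opens one triage issue. Otherwise: no-op.
function chooseSink(archiveMoves: number, ambiguousFlags: number): Sink {
  if (archiveMoves > 0) return "archive-pr";
  if (ambiguousFlags > 0) return "triage-issue";
  return "no-op";
}
```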