feat(demo,tfjs): N-way softmax learning mode (row 17) #94
Replaces the scalar-urgency Learning-mode policy with a 7-way softmax over the active-care skills (feed / clean / play / rest / pet / medicate / scold).

- The bundled `learning.network.json` baseline is re-authored at the new `[5, 16, 7]` topology via a deterministic seed script.
- `interpret` picks argmax with an idle-threshold floor; the heuristic candidate fallback retires.
- Library: JSDoc example on `TfjsReasoner` showing the N-way pattern (no source change). Minor changeset queued.
- Closes row 17 of `docs/plans/2026-04-25-comprehensive-polish-and-harden.md`. Open question 2 resolved: expression skills stay heuristic-reactive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
size-limit report 📦
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e1080b3cb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```ts
// All-zero label: pulls the success-column probability down without
// implying any other skill was the right call.
return { features, label: new Array<number>(SOFTMAX_DIM).fill(0) };
```
Avoid zero-vector labels with categorical crossentropy
When `details.failed === true`, this returns an all-zero target vector, but with `categoricalCrossentropy` the loss is -Σ y_i log p_i, so y=0 everywhere yields zero loss and zero gradient. In practice, failed outcomes are ignored by training, so the model never learns to decrease confidence for unsuccessful skills. This regresses learning-signal quality for any run that includes failure samples; negatives need a valid target distribution or objective.
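The zero-gradient claim is easy to verify numerically. A minimal sketch in plain TypeScript (not the project's training code; all names here are illustrative):

```typescript
// Why an all-zero target is a silent no-op under categorical crossentropy.
// loss = -Σ y_i · log(p_i): every term with y_i = 0 contributes nothing.
function categoricalCrossentropy(target: number[], probs: number[]): number {
  return target.reduce((sum, y, i) => sum - y * Math.log(probs[i]), 0);
}

const probs = [0.7, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03]; // model output

const oneHot = [1, 0, 0, 0, 0, 0, 0]; // valid target: loss = -log(0.7)
const allZero = [0, 0, 0, 0, 0, 0, 0]; // failure label: loss 0, gradient 0

console.log(categoricalCrossentropy(oneHot, probs).toFixed(3)); // "0.357"
console.log(categoricalCrossentropy(allZero, probs)); // 0
```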
Fixed in 9a43185 — now returns null for failed outcomes (skips the buffer slot) instead of emitting an all-zero target. The test `labels SkillFailed …` was renamed to `skips SkillFailed …` and updated to match. Will revisit when row 18 lifts negative signal into a richer reward field.
```ts
for (let i = 1; i < SOFTMAX_DIM; i++) {
  const v = output[i] ?? 0;
```
Validate softmax width before interpreting model output
This logic silently pads missing output columns with 0 (output[i] ?? 0) instead of verifying the model actually emits 7 logits. Persisted pre-row-17 snapshots are still loadable with a single sigmoid output, so after upgrade they can be misread as a 7-way softmax and frequently map to feed whenever the old scalar exceeds the idle threshold. Add an explicit output-dimension/snapshot-compatibility check so incompatible saved models are rejected and rebuilt from the new baseline.
Fixed in 9a43185 — learningMode.construct() now validates r.getModel().outputs[0].shape against SOFTMAX_DIM (7) and throws on mismatch, so the existing try/catch falls back to the bundled baseline. interpretSoftmax also gains a defensive length guard that idles instead of ?? 0-padding. New test rebuilds a [5, 4, 1] sigmoid snapshot inline, persists it, and asserts the switcher attaches a 7-output reasoner.
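The interpret shape described here can be sketched as follows. This is an illustrative reconstruction, not the demo's actual source; `IDLE_THRESHOLD` and the skill ordering are assumptions:

```typescript
// Argmax over a 7-wide softmax with an idle-threshold floor and a
// defensive length guard (idle instead of `?? 0`-padding).
const SKILLS = ["feed", "clean", "play", "rest", "pet", "medicate", "scold"] as const;
const SOFTMAX_DIM = SKILLS.length;
const IDLE_THRESHOLD = 0.3; // assumed value, not the demo's constant

function interpretSoftmax(output: number[]): string | null {
  // Wrong-width output means an incompatible model: idle rather than
  // padding missing columns, which would bias argmax toward column 0.
  if (output.length !== SOFTMAX_DIM) return null;
  let best = 0;
  for (let i = 1; i < SOFTMAX_DIM; i++) {
    if (output[i] > output[best]) best = i;
  }
  // Idle floor: a near-flat distribution should not force an action.
  return output[best] >= IDLE_THRESHOLD ? SKILLS[best] : null;
}
```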
…apshots

Addresses Codex P1 findings on PR #94:

1. `projectLearningOutcome` no longer returns an all-zero target vector for failed outcomes. Under `categoricalCrossentropy` (`-Σ y_i log p_i`), y=0 everywhere yields zero loss and zero gradient — failures would silently no-op a buffer slot instead of training the model. Skip them at the projection layer; revisit when row 18 lifts negative signal into a proper reward field.
2. `learningMode.construct()` now validates the rebuilt model's output dimension against `SOFTMAX_DIM` before compiling. A persisted pre-row-17 snapshot (single sigmoid output) was structurally compatible with `TfjsReasoner.fromJSON`, but its scalar urgency would silently be treated as the `feed` column of a 7-way softmax — producing a "trained" pet that always feeds whenever the old scalar exceeded the idle threshold. Throw on dim mismatch so the existing try/catch falls back to the bundled baseline.
3. `interpretSoftmax` defensive width check: idle (return null) on a wrong-length output rather than `?? 0`-padding, which biased argmax toward column 0.

New test: `construct() rebuilds from baseline when a pre-row-17 single-output snapshot is persisted` — builds a [5, 4, 1] sigmoid model inline, persists its toJSON snapshot, mounts the demo, and asserts the reasoner the switcher attaches has the expected 7-dim output (i.e. the guard fired and the baseline was rebuilt).

Updated test: `labels SkillFailed outcomes ...` → `skips SkillFailed outcomes` to match the new projection behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9a43185789
```ts
if (lastDim !== SOFTMAX_DIM) {
  throw new Error(
    `learning: persisted snapshot has output dim ${String(lastDim)}, expected ${SOFTMAX_DIM} — rebuilding from bundled baseline.`,
```
Dispose rejected reasoner before falling back to baseline
When a persisted snapshot has the wrong output width (for example, an old 1-logit model), hydrate() constructs a TfjsReasoner and then throws on the dimension check, but the newly built model is never disposed before the catch path rebuilds from networkJson. Re-entering Learning mode with an incompatible persisted snapshot repeatedly will leak tfjs model/tensor resources each time. Dispose r before throwing (or validate in a try/finally) so the fallback path does not accumulate leaked models.
Fixed in 07ff914 - rebuilt reasoner is disposed before the dim-mismatch throw fires, so the catch path no longer leaks a tfjs model per incompatible-snapshot retry.
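The pattern behind that fix can be sketched with a stub reasoner (a hypothetical class, not the library's `TfjsReasoner` API): the rejected instance is freed before the throw hands control to the catch-and-rebuild path.

```typescript
// Track instances so a leak would be observable; in tfjs the equivalent
// symptom is unreleased model/weight tensors per retry.
const created: StubReasoner[] = [];

class StubReasoner {
  disposed = false;
  constructor(public outputDim: number) {
    created.push(this);
  }
  dispose(): void {
    this.disposed = true;
  }
}

const SOFTMAX_DIM = 7;

function hydrate(snapshotOutputDim: number): StubReasoner {
  const r = new StubReasoner(snapshotOutputDim);
  if (r.outputDim !== SOFTMAX_DIM) {
    // Dispose BEFORE throwing so the caller's catch path (rebuild from
    // the bundled baseline) does not accumulate one leaked model per retry.
    r.dispose();
    throw new Error(`expected output dim ${SOFTMAX_DIM}, got ${snapshotOutputDim}`);
  }
  return r;
}
```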
```ts
units: 16,
activation: 'sigmoid',
inputShape: [5],
kernelInitializer: 'glorotNormal',
```
Seed dense initializers in learning-network generator
The script documents deterministic output from a fixed SEED, but both Dense layers use unseeded glorotNormal initializers here, so initial weights still depend on ambient randomness. That makes repeated runs of seed:learning-network produce different baseline snapshots even with fixed data generation and shuffle: false, which undermines the reproducibility the script and plan notes promise. Pass an explicit seed into each initializer (or equivalent tfjs seeding) to make regeneration stable.
Fixed in 07ff914 - both Dense layers now receive explicit seeds via initializers.glorotNormal seed factory (SEED and SEED+1 so the two weight matrices don't init identically). Regenerated learning.network.json with the seeded init - reproducible byte-for-byte across runs.
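The determinism argument generalizes: an unseeded initializer draws from ambient randomness, while an explicitly seeded one replays the same stream. A minimal sketch with a hand-rolled LCG standing in for the initializer's RNG (the constants and layer sizes are illustrative; the actual fix uses tfjs's `initializers.glorotNormal({ seed })`):

```typescript
// A tiny linear congruential generator: same seed -> same weight stream.
function makeLcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0; // classic LCG constants
    return state / 2 ** 32;
  };
}

const SEED = 0x5eed17; // illustrative, echoing the script's seed style

// Distinct seeds per layer so the two weight matrices differ at init...
const layer1Weights = Array.from({ length: 4 }, makeLcg(SEED));
const layer2Weights = Array.from({ length: 4 }, makeLcg(SEED + 1));
// ...but re-running with the same SEED reproduces layer 1 exactly.
const layer1Rerun = Array.from({ length: 4 }, makeLcg(SEED));
```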
Addresses Codex P2 findings on PR #94:

1. `learningMode.construct()` now `r.dispose()`s the rebuilt-but-incompatible reasoner before throwing the dim-mismatch error. The catch path then constructs a fresh baseline. Without disposal, every re-entry into Learning mode with an old persisted snapshot would leak one tfjs model + its weight tensors.
2. `scripts/seed-learning-network.ts` passes explicit `seed` values to each Dense layer's `glorotNormal` initializer (via `initializers.glorotNormal({ seed })`). The script's docstring promised determinism under a fixed `SEED`, but the unseeded default initializers were drawing from `Math.random()` — repeated runs produced different baselines. Each layer gets a distinct seed (`SEED`, `SEED + 1`) so the two layers don't share the same weight matrix at init.
3. `learning.network.json` regenerated under the seeded init. Initial loss 2.026 → final loss 0.906 across 50 epochs on the 200-pair hand-crafted dataset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 07ff91442d
```ts
for (let i = 0; i < NEED_IDS.length; i++) {
  const level = agent.rng.next();
  features.push(level);
  if (level < min) min = level;
  features.push(agent.rng.next());
}
const urgency = 1 - min;
return { features, label: [urgency] };
return { features, label: featuresToOneHotLabel(features) };
```
Balance synthetic labels for all seven softmax classes
The Train button now builds pairs from uniformly random feature vectors and immediately labels them with featuresToOneHotLabel, but that mapping makes some classes vanishingly rare (e.g., pet requires all 5 needs > 0.7, so it appears with probability about 0.3^5 ≈ 0.24%, i.e. usually zero samples in a 30-pair run). In practice this means repeated training heavily reinforces only a subset of classes and can wash out underrepresented outputs, which conflicts with the intended 7-way softmax policy seeded in scripts/seed-learning-network.ts via explicit archetypes.
Fixed in 9f63926 - replaced uniform-random sampling with stratified round-robin. New `generateArchetypeFeatures(rng, skillIdx)` emits a feature vector inside the requested class's region; round-robin over 7 classes for 30 pairs gives ~4-5 samples per class. `featuresToOneHotLabel` retired (the generator emits class-correct features by construction). Range bands exclude sibling-archetype boundaries so `feed` can't drift into `pet` or `scold`.
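The stratified round-robin idea can be sketched like this (an illustrative stand-in: `generateArchetypeFeatures` here is a placeholder, not the demo's region-constrained generator):

```typescript
const CLASS_COUNT = 7; // feed / clean / play / rest / pet / medicate / scold
const PAIR_COUNT = 30;

// Placeholder: the real version emits features inside the requested
// class's archetype region so the label is correct by construction.
function generateArchetypeFeatures(rng: () => number, skillIdx: number): number[] {
  return Array.from({ length: 5 }, rng);
}

function buildBatch(rng: () => number): { features: number[]; label: number[] }[] {
  const pairs: { features: number[]; label: number[] }[] = [];
  for (let i = 0; i < PAIR_COUNT; i++) {
    const skillIdx = i % CLASS_COUNT; // round-robin: ~4-5 samples per class
    const label = new Array<number>(CLASS_COUNT).fill(0);
    label[skillIdx] = 1; // one-hot, no post-hoc featuresToOneHotLabel step
    pairs.push({ features: generateArchetypeFeatures(rng, skillIdx), label });
  }
  return pairs;
}
```

Contrast with uniform sampling, where a class gated on min(needs) > 0.7 appears with probability 0.3^5 ≈ 0.24% and usually gets zero samples in a 30-pair batch.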
Addresses Codex P2 finding on PR #94 (cognitionSwitcher.ts:428).

The previous Train-button data generator drew uniform-random feature vectors and let `featuresToOneHotLabel` derive the class. That left under-represented classes drastically rare in a 30-pair batch:

- pet: min(needs) > 0.7 → 0.3^5 ≈ 0.24% sample probability
- scold: hap > 0.8 ∧ en < 0.4 → ~8% sample probability

In practice every Train click reinforced the maintenance classes and washed out `pet` / `scold`, conflicting with the bundled baseline's explicit archetype seeding from `scripts/seed-learning-network.ts`.

Replaced with stratified round-robin sampling — each pair `i` targets class `i % 7` and `generateArchetypeFeatures(rng, skillIdx)` builds a feature vector inside that class's archetype region (e.g. `pet`'s every need ∈ [0.75, 1.0], `scold`'s `happiness ∈ [0.85, 1.0] ∧ energy ∈ [0, 0.35]`). With 30 pairs / 7 classes ≈ 4–5 samples per class, every softmax column now sees roughly equal training pressure.

`featuresToOneHotLabel` retired — no longer needed since the generator emits class-correct features by construction. Range bands deliberately exclude sibling-archetype boundaries so a `feed` sample can't drift into `pet` or `scold` territory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f6392660e
```ts
const levels = agent.getState().needs;
const features = [
  levels.hunger ?? 0,
```
Record pre-skill needs when building one-hot labels
projectLearningOutcome now builds one-hot action targets from the skill that just succeeded, but it reads features from agent.getState().needs after that skill has already mutated needs. For example, feed increases hunger, so this stores samples equivalent to “high hunger → feed”, which directly conflicts with the seeded/train-button policy shape (“low hunger → feed”) and will push online learning updates in the wrong direction over time. Capture the needs snapshot before skill execution (or pass pre-action features through the outcome) so labels match the state the policy actually decided from.
Fixed in fa657ec - CognitionPipeline.invokeSkillAction now snapshots agent.needs at the top of the function, before any branch fires. The snapshot lands on outcome.details.preNeeds for both success and failure paths. Demo projection prefers details.preNeeds and falls back to the live snapshot only defensively. Learner.ts JSDoc + 2 new tests cover the contract.
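The snapshot-before-mutation contract reads roughly like this (stub types; `invokeSkillAction` and `preNeeds` follow the thread's naming, but the agent shape is simplified):

```typescript
type Needs = Record<string, number>;

interface Outcome {
  skillId: string;
  details: { preNeeds: Needs };
}

function invokeSkillAction(
  needs: Needs,
  skillId: string,
  effect: (needs: Needs) => void,
): Outcome {
  // Snapshot BEFORE the skill mutates needs, so a training pair labels
  // the state the policy actually decided from, not the post-effect one.
  const preNeeds: Needs = { ...needs };
  effect(needs); // e.g. feed raises the hunger level
  return { skillId, details: { preNeeds } };
}

const needs: Needs = { hunger: 0.2 };
const outcome = invokeSkillAction(needs, "feed", (n) => {
  n.hunger += 0.5;
});
// outcome.details.preNeeds.hunger ≈ 0.2 (pre-effect); needs.hunger ≈ 0.7
```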
Addresses Codex P1 finding on PR #94 (learning.ts:261).

`projectLearningOutcome` was reading features from `agent.getState().needs` at score-time — *after* the skill had already mutated needs via its `SkillContext`. For `feed`, hunger goes low → high, so the resulting training pair "high hunger → feed" inverts the policy direction the network is supposed to learn.

Library-side fix:
- `CognitionPipeline.invokeSkillAction` snapshots `agent.needs.list()` into `Record<needId, level>` BEFORE any branch (capability gate, not-registered, throw, err return, success — every Stage-8 score call gets the same snapshot).
- The snapshot lands on `LearningOutcome.details.preNeeds`. Omitted when the agent has no `Needs` subsystem.
- `Learner.ts` JSDoc updated to document the new field + when to use it instead of the live `agent.getState().needs` snapshot.

Demo-side fix:
- `projectLearningOutcome` now reads `details.preNeeds` first and falls back to the live snapshot only as a defensive default. JSDoc rewritten around the inversion-risk story.

Tests:
- New: `captures pre-skill needs in outcome.details.preNeeds for the success branch` — wires a `Needs` subsystem with hunger=0.2, runs a skill that satisfies hunger by +0.5, asserts `preNeeds.hunger ≈ 0.2` (the pre-effect value).
- New: `omits preNeeds when the agent has no Needs subsystem` — baseline agent without needs sees `details.preNeeds === undefined`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codex review |
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
…tput-softmax

# Conflicts:
#	docs/plans/2026-04-25-comprehensive-polish-and-harden.md
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 412824f841
```ts
const outShape = r.getModel().outputs[0]?.shape;
const lastDim = outShape && outShape.length > 0 ? outShape[outShape.length - 1] : null;
if (lastDim !== SOFTMAX_DIM) {
```
Validate persisted model input width before accepting snapshot
This compatibility guard only checks the output dimension, so a persisted snapshot with 7 outputs but a different input width will pass here and then fail at runtime when featuresFromNeeds provides a 5-element vector to model.predict. In that case Learning mode can repeatedly throw on intention selection instead of falling back to the bundled baseline. Validate the rebuilt model's input shape (expected last input dim 5) alongside SOFTMAX_DIM so incompatible snapshots are rejected early.
Fixed in 862dcd8. Added FEATURE_DIM=5 constant; hydrate guard now validates both input layer last-dim and output layer last-dim, throwing the merged-shape error so the catch path falls back to the bundled baseline before any runtime predict call.
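A sketch of the combined guard (the `FEATURE_DIM` / `SOFTMAX_DIM` values match the thread; the shape type is simplified to what a layers model exposes, with `null` for the batch dimension):

```typescript
const FEATURE_DIM = 5;
const SOFTMAX_DIM = 7;

type Shape = (number | null)[];

function assertSnapshotCompatible(inputShape: Shape, outputShape: Shape): void {
  const inDim = inputShape[inputShape.length - 1];
  const outDim = outputShape[outputShape.length - 1];
  if (inDim !== FEATURE_DIM || outDim !== SOFTMAX_DIM) {
    // Reject at hydrate time so the catch path rebuilds from the bundled
    // baseline, instead of throwing on every predict() during intention
    // selection at runtime.
    throw new Error(
      `learning: snapshot shape ${String(inDim)} -> ${String(outDim)}, ` +
        `expected ${FEATURE_DIM} -> ${SOFTMAX_DIM}`,
    );
  }
}
```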
@codex review |
Hydration guard previously checked output dim only. A snapshot with the right SOFTMAX_DIM but wrong input width would pass `fromJSON` and then throw at runtime when `featuresFromNeeds` first handed it a 5-element vector — Learning mode would error on every intention selection instead of falling back to the bundled baseline. Reject on either input or output mismatch so the catch path takes over. Codex P2 finding on 412824f.
@codex review |
Codex Review: Didn't find any major issues. Another round soon, please!
## Summary
Doc-audit pass over `docs/plans` + `docs/specs`. Three things land
together:
- **`docs/archive/{plans,specs}/`** — new home for plans whose roadmap
rows have all shipped (or whose goals were folded into a successor)
and specs whose design is now reflected in code. Includes a
`README.md` explaining the policy; `CLAUDE.md` documents the
convention.
- **`git mv` 23 plans + 3 specs into the archive.** The active live
set is now the comprehensive polish-and-harden plan plus three
specs (post-tfjs improvements, mvp-demo, vision), each with a
refreshed status banner.
- **Refresh the live comprehensive plan** against current `develop`:
- PR column updated for rows 16/19/20/3/4/22 (now shipped via
PRs #91 / #98 / #104 / #110 / #113 / #111).
- New "Post-roadmap follow-ups" section covers PRs #92 → #125
(review-bot infra, tracker findings, demo + tfjs hotfixes,
tooling).
- Stale prose-baked counts dropped (size budgets now reference
`package.json#size-limit` only).
- Coverage-thresholds section gains a pointer to the sticky PR
comment shipped in PR #124.
## Other doc fixes
- `README.md`: drop the unverifiable "Phase A milestones (M0–M15) are
all green" claim — the milestones don't exist as documented IDs
anywhere; replace with a pointer to the live polish plan.
- `vision.md`: refresh cadence note (was pinned to 2026-04-19 + "next
review at 1.0").
- `2026-04-24-post-tfjs-improvements.md`: mark recommended-order items
that have shipped (PRs #61, #76, #77, #83, #84, #91, #94, #96,
#104, #113), link the active roadmap as the heir.
- `mvp-demo.md`: status banner explaining where active polish work is
now tracked.
## Mechanical
- Update inline cross-refs in `CLAUDE.md`, `eslint.config.js`,
`src/agent/{Agent,AgentModule}.ts`, `tests/unit/exports.test.ts`,
and `docs/daily-reviews/2026-04-25.md` to point at the new
`docs/archive/` paths so links keep resolving.
No code change beyond comment-path updates.
## Test plan
- [x] `npm run verify` green (`format:check` + `lint` + `typecheck` +
`test` + `build` + `docs`). 523 tests pass; the 2 lint warnings
are pre-existing (`CognitionPipeline.invokeSkillAction` complexity
+ `scoreFailure` param count) and on the ratchet menu.
- [x] `git ls-files docs/archive/` shows the moved files; renames are
preserved (`git log --follow` works for any moved file).
- [ ] Codex review: clean, no blockers.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Luis Mendez <hallo@luis-mendez.de>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes row 17 of `docs/plans/2026-04-25-comprehensive-polish-and-harden.md`.

- Replaces the demo's scalar-urgency Learning policy with a 7-way softmax over the active-care skills.
- `[5, 16, 7]` topology baseline (seeded via `scripts/seed-learning-network.ts`); `interpret` picks `argmax` with an idle-threshold floor; one-hot labels in the Train-button synthetic dataset; loss switched to `categoricalCrossentropy`.
- JSDoc example on `TfjsReasoner` showing the N-way softmax pattern. No code change. Minor bump.
- Expression skills (`meow` / `sad` / `sleepy`) stay in the heuristic-reactive layer; the softmax is over the 7 active-care skills only.

Test plan
- `npm run verify` green.
- `interpretSoftmax` picks `argmax`, respects the idle-threshold floor, and emits each of the 7 skill ids when its column dominates.
- `learningMode.train.test.ts` updated for the 7-vector output shape; `weightsShapes` topology contract pinned.
- `TfjsReasoner` bundled-baseline test rewritten to assert softmax invariants (length 7, all in [0, 1], sums to ~1) instead of the old scalar sigmoid output.
- No `dist`/size-limit regressions (tfjs adapter 6.47 KB / 7 KB; main 37.41 KB / 50 KB).

Notes for review
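The softmax invariants named in the test plan can be checked with a few lines (a local helper, not the library's code):

```typescript
// Numerically stable softmax; its output should satisfy the invariants
// the rewritten bundled-baseline test asserts.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits); // subtract max to avoid exp overflow
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const out = softmax([1.2, -0.3, 0.5, 0.0, 2.1, -1.0, 0.7]);
// Invariants: length 7, every entry in [0, 1], entries sum to ~1.
```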
- The `scripts/seed-learning-network.ts` one-shot is deterministic (literal LCG seed `0x5eed_17`) but is not wired into `verify` — the npm alias is `npm run seed:learning-network`, run on demand to refresh the bundled baseline.
- The `meanSquaredError` -> `categoricalCrossentropy` swap in `learning.ts`'s `model.compile` is the right loss for a softmax distribution.
- `learning.ts` `construct()`.
- `eslint.config.js` + `tsconfig.json` learned about the new `scripts/` directory (treated like `tests/` and `examples/` — globals + console + non-null assertions allowed).
- Expression outcomes (`express` skills) are skipped at `projectLearningOutcome` so they don't pollute the active-care baseline.

🤖 Generated with Claude Code