feat(demo): grow learning-mode feature vector to 13 dims (row 18) #96
Extends `featuresFromNeeds` from the 5 need levels to 13 dims:

- 5 need levels (existing)
- 4 mood one-hot dims (`MOOD_KEYS = ['happy', 'sad', 'sleepy', 'playful']`) — off-roster moods collapse to all-zero on this section.
- 1 active-modifier count, normalized via `min(count, 5) / 5`.
- 3 recent-event counts (`SkillCompleted` / `SkillFailed` / `NeedCritical`) over a 30-tick sliding window, each normalized via the same cap.

Mood + recent-event state lives in module-scoped variables populated by the standard event bus: `setLearningAgentId(id)` was renamed to `setLearningAgent(agent)` (pre-1.0 clean break, no compat shim). The new function subscribes to `MoodChanged`, `AgentTicked`, `SkillCompleted`, `SkillFailed`, and `NeedCritical`, and tears down via `null` invocation.

`projectLearningOutcome` now produces 13-dim training pairs at outcome time. Mood + modifier count come from `agent.getState()`; event counts come from the same module-scoped window.

`cognitionSwitcher.ts`'s synthetic Train-button generator appends 8 uniform-`[0, 1]` noise dims to each archetype sample so the network learns to ignore them under the synthetic regime — real predictive weight on those dims comes from the in-game `TfjsLearner` reinforcement loop.

`scripts/seed-learning-network.ts` now builds a `[13, 16, 7]` Sequential and pads training pairs with the matching uniform-noise tail; the bundled `learning.network.json` was regenerated. The hydration guard's existing input + output dim check (PR #94) lets old 5-input snapshots fail fast and fall back to the new baseline.

Library impact: none — `featuresOf` is consumer-supplied. No changeset. Plan rows 17 + 18 marked shipped in the same diff.
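The 13-dim layout above can be sketched end to end. This is an illustrative reconstruction, not the demo's actual code: `buildFeatures`, `RecentEvent`, and the parameter shapes are invented for the example; only `MOOD_KEYS` and the `min(count, 5) / 5` cap come from the PR.

```typescript
const MOOD_KEYS = ['happy', 'sad', 'sleepy', 'playful'] as const;
const COUNT_NORM_CAP = 5;

type RecentEvent = { tick: number; kind: 'completed' | 'failed' | 'critical' };

function buildFeatures(
  needLevels: number[],             // 5 dims, already in [0, 1]
  moodCategory: string | undefined, // current mood category, if any
  modifierCount: number,            // active modifiers on the agent
  recentEvents: RecentEvent[],      // pruned to the 30-tick window elsewhere
): number[] {
  // 4 mood one-hot dims; off-roster moods leave all four at zero.
  const moodOneHot = MOOD_KEYS.map((key) => (moodCategory === key ? 1 : 0));
  // 1 dim: active-modifier count, capped then normalized.
  const modDim = Math.min(modifierCount, COUNT_NORM_CAP) / COUNT_NORM_CAP;
  // 3 dims: per-kind event counts over the window, same cap/normalization.
  const countOf = (kind: RecentEvent['kind']) =>
    Math.min(recentEvents.filter((e) => e.kind === kind).length, COUNT_NORM_CAP) /
    COUNT_NORM_CAP;
  return [
    ...needLevels,
    ...moodOneHot,
    modDim,
    countOf('completed'),
    countOf('failed'),
    countOf('critical'),
  ];
}
```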
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 116b20b86c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```ts
const state = agent.getState();
const moodCategory = state.mood?.category;
const moodOneHot = MOOD_KEYS.map((key) => (moodCategory === key ? 1 : 0));
const modCount = Math.min(state.modifiers.length, COUNT_NORM_CAP) / COUNT_NORM_CAP;
let completed = 0;
```
Use pre-skill modifier count when projecting training features
`projectLearningOutcome` labels a sample for the action chosen from pre-skill state, but this path reads `state.modifiers.length` after the skill has already executed. In this codebase several default skills mutate modifiers (for example `FeedSkill` adds `well-fed`, `CleanSkill` removes `dirty`, and `ScoldSkill` adds/removes modifiers), so training inputs can encode post-action modifier state while inference (`featuresFromNeeds`) uses pre-action `ctx.modifiers`. That mismatch leaks action effects into inputs and can train a policy that does not match runtime feature semantics.
Fixed in cfdc8c4. `featuresFromNeeds` now snapshots the 8-dim rich suffix into `lastRichFeatureSuffix` at Stage 4 (pre-skill); `projectLearningOutcome` reuses that snapshot at Stage 8 instead of recomputing from `agent.getState()` (which would encode the `FeedSkill`/`CleanSkill`/`ScoldSkill` modifier mutations). The fallback path recomputes from current state only when no `featuresFromNeeds` call has run yet — test harnesses scoring outcomes without driving the reasoner first.
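The snapshot-and-reuse shape of that fix can be sketched as follows. Illustrative only: `projectFeatureSuffix` and the parameter shapes are invented, and the thread later replaces this cache in bbbe6a8.

```typescript
// Module-scoped cache: set at featurize time (pre-skill), read at outcome time.
let lastRichFeatureSuffix: number[] | null = null;

function featuresFromNeeds(needDims: number[], richSuffix: number[]): number[] {
  lastRichFeatureSuffix = richSuffix; // Stage 4 snapshot, before the skill runs
  return [...needDims, ...richSuffix];
}

function projectFeatureSuffix(recomputeFromCurrentState: () => number[]): number[] {
  // Stage 8: prefer the pre-skill snapshot; recompute only when no
  // featuresFromNeeds call has run yet (e.g. a test harness scoring
  // outcomes without driving the reasoner first).
  return lastRichFeatureSuffix ?? recomputeFromCurrentState();
}
```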
```ts
case 'SkillCompleted':
  recentEvents.push({ tick: currentTick, kind: 'completed' });
  return;
```
De-duplicate SkillCompleted events in recent-event features
This listener counts every `SkillCompleted` event, but a successful feed emits two such events in this repo (one from `FeedSkill.execute`, one from `CognitionPipeline.invokeSkillAction`). As a result, `recent:SkillCompleted` increments twice for feed and once for most other skills, so the new feature no longer represents true completion frequency and can bias both prediction and learning toward event-emitter quirks.
Fixed in cfdc8c4. Added a per-tick dedupe `Set` keyed by `${kind}:${skillId | needId}`, cleared on `AgentTicked`. Default skills that emit `SkillCompleted` from `execute()` (`FeedSkill` etc.) plus the kernel's emission now count once per `skillId` per tick. The same dedupe applies to `SkillFailed` (`skillId`) and `NeedCritical` (`needId`).
Two Codex findings on PR #96 (commit 116b20b):

**P1** — `projectLearningOutcome` was reading `state.modifiers.length` post-skill, but several default skills mutate modifiers during `execute()` (`FeedSkill` adds `well-fed`, `CleanSkill` removes `dirty`, `ScoldSkill` adds/removes). Training inputs would encode action effects while inference (`featuresFromNeeds`) used pre-action state. Fix: snapshot the 8-dim rich-feature suffix during `featuresFromNeeds` at Stage 4 (pre-skill) into `lastRichFeatureSuffix`, then reuse it at Stage 8 outcome projection. Fall back to a from-state recompute only when no `featuresFromNeeds` call has run yet (test harnesses).

**P2** — `recent:SkillCompleted` etc. counted every emission, but the default `FeedSkill` (and others) emit `SkillCompleted` from inside `execute()` AND the kernel emits a second one from `CognitionPipeline.invokeSkillAction`. The doubled count for affected skills biased the feature toward emitter quirks. Fix: per-tick dedupe set keyed by `${kind}:${skillId | needId}`, cleared on every `AgentTicked` event.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cfdc8c48bb
```ts
// arrived before any `featuresFromNeeds` call (e.g. the consumer's
// first invocation of `learner.score` happens without a tick
// boundary, like the demo's test harness).
const suffix = lastRichFeatureSuffix ?? recomputeRichFeatureSuffixFromAgent(agent);
```
Snapshot rich suffix per scored outcome
`projectLearningOutcome` always reuses `lastRichFeatureSuffix` when it is non-null, but that suffix is only refreshed in `featuresFromNeeds` (Stage 4). Player interactions are executed from the reactive handler path in Stage 1 (`InteractionRequested` → `invokeSkill`) before Stage 4 runs for that tick, so those outcomes are paired with current `preNeeds` and a stale mood/modifier/event suffix from a previous tick. This creates feature/label misalignment for interaction-driven training samples and can push the learner toward incorrect correlations.
Fixed in bbbe6a8. Took a different approach than the Stage-4 cache: added a small library snapshot (`details.preModifierCount`) in `CognitionPipeline.invokeSkillAction`, and dropped `lastRichFeatureSuffix` entirely. `projectLearningOutcome` now recomputes the suffix at outcome time — mood from current state (`MoodModel` reconciles at Stage 2.7, before skill execution), modifier count from `details.preModifierCount`, event counts from the current `recentEvents` window minus 1 for the matching kind to compensate for self-emission. Player-click paths get the same pre-skill snapshot the reasoner-driven path gets, since the kernel snapshots before invoking the skill regardless of caller. Minor-bump changeset added.
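The two pre-skill recoveries described here can be sketched in isolation. Only the `details.preModifierCount` field name comes from the PR; the helper names and shapes are invented for this example.

```typescript
const CAP = 5; // same cap/normalization as the other count dims

function eventCountDim(windowCount: number, isSelfKind: boolean): number {
  // The outcome's own emission already landed in the window (the
  // per-tick dedupe guarantees exactly +1), so subtract it back out
  // to recover the pre-skill count for the matching kind.
  const preSkill = isSelfKind ? Math.max(windowCount - 1, 0) : windowCount;
  return Math.min(preSkill, CAP) / CAP;
}

function modifierDim(
  preModifierCount: number | undefined, // kernel snapshot on details
  currentCount: number,                 // fallback for consumer-emitted outcomes
): number {
  return Math.min(preModifierCount ?? currentCount, CAP) / CAP;
}
```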
```ts
const key = `completed:${e.skillId ?? '?'}`;
if (processedThisTick.has(key)) return;
processedThisTick.add(key);
```
Don't collapse distinct same-skill events in one tick
The per-tick dedupe key uses only `completed:${skillId}` (with the same pattern for failed/critical), so multiple legitimate events of the same kind and id in one tick are treated as duplicates. This happens when users queue repeated interactions quickly or when multiple matching actions run in the same tick, causing `recentEvents` to undercount true frequency and weakening the new recent-event feature dimensions.
Kept the per-tick `${kind}:${skillId | needId}` dedupe — the actual case it addresses (`FeedSkill` etc. emit `SkillCompleted` from `execute()` AND the kernel emits a second one) is real and consistent across default skills, while the legitimate distinct-twice-per-tick case is hypothetical at the demo's pacing. Reasoned through in the bbbe6a8 commit message + JSDoc; the new -1 self-emission compensation in `projectLearningOutcome` relies on this single-increment-per-tick invariant. If a consumer skill genuinely emits multiple distinct `SkillCompleted` events for the same `skillId` in one tick, the recent-event count under-represents by 1 per pair.
Two Codex findings on PR #96 commit cfdc8c4:

**P1** — `lastRichFeatureSuffix` was only refreshed in `featuresFromNeeds` (Stage 4). Player-click paths skip the reasoner, so outcomes scored from button clicks reused a stale suffix from a prior tick.

Library: `CognitionPipeline.invokeSkillAction` now snapshots `agent.modifiers.list().length` alongside `preNeeds` and includes it on every `LearningOutcome.details` payload as `preModifierCount`. Mirrors the existing `preNeeds` mechanism. Minor bump (additive field).

Demo: dropped the `lastRichFeatureSuffix` cache. `projectLearningOutcome` now recomputes the suffix at outcome time:

- Mood from `agent.getState().mood?.category` — `MoodModel` reconciles at Stage 2.7 (before Stage 7 skill execution), so this is still pre-skill state for any single-skill invocation.
- Modifier count from `details.preModifierCount` (kernel-supplied); falls back to current state for consumer-emitted outcomes.
- Event counts from the current `recentEvents` window with `completed` decremented by 1 to compensate for the outcome's own emission (dedupe ensures only +1 was added).

**P2** — The earlier dedupe key collapsed all same-skill events per tick. Kept the dedupe but acknowledged in JSDoc that legitimate distinct same-skill emissions in one tick (rare at the demo's pacing) collapse to one increment. The trade-off: the actual kernel+skill double-emit is consistent for default skills, while legitimate twice-per-tick emission is hypothetical. The -1 self-emission compensation in `projectLearningOutcome` relies on this single-increment-per-tick invariant.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bbbe6a8846
```ts
const key = `completed:${e.skillId ?? '?'}`;
if (processedThisTick.has(key)) return;
processedThisTick.add(key);
```
Preserve distinct same-skill events in one tick
This dedupe key collapses all `SkillCompleted` events for the same `skillId` within a tick, not just the duplicate kernel+skill emission from a single invocation. In this repo, `Agent.dispatchReactiveHandlers` processes every queued `InteractionRequested` event in order (`src/agent/Agent.ts`), so two quick identical interactions before one tick can legitimately invoke the same skill twice; the second completion is dropped here. That undercounts `recentEvents` features and skews `projectLearningOutcome`'s pre-skill event-count compensation for later invocations in the same tick.
Fixed in 77e0979 by changing the counting source. `SkillCompleted`/`SkillFailed` are no longer subscribed via `agent.subscribe` — they're counted via the learner's own outcome stream (`buildLearningLearner` now wraps `toTrainingPair` to project first, then call `recordOutcomeForFeatureWindow`). Kernel invocations produce exactly one outcome each → one count per invocation. `dispatchReactiveHandlers`'s back-to-back same-skill case now counts both correctly; the kernel+skill `SkillCompleted` double-emit no longer double-counts (only the kernel's score call reaches the learner). Dropped `processedThisTick` + the -1 self-emission compensation. `NeedCritical` stays on the subscription path (single emitter).
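The project-then-record ordering can be sketched generically. Names and shapes are invented; the real wrapper is `buildLearningLearner`'s wrapping of `toTrainingPair`.

```typescript
type Outcome = { kind: 'completed' | 'failed' };

// Wraps a projection function so each outcome is projected against the
// counts as they stood BEFORE this outcome, then recorded into the window.
function wrapToTrainingPair(
  project: (o: Outcome, counts: Record<string, number>) => number[],
  counts: Record<string, number>,
) {
  return (outcome: Outcome): number[] => {
    const pair = project(outcome, { ...counts });          // pre-increment view
    counts[outcome.kind] = (counts[outcome.kind] ?? 0) + 1; // then record
    return pair;
  };
}
```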
Codex P2 round 3 on PR #96 commit bbbe6a8: the per-tick `${kind}:${skillId}` dedupe collapses legitimate distinct same-skill events. `Agent.dispatchReactiveHandlers` can process two queued `InteractionRequested` events in one tick, legitimately invoking the same skill twice; the second completion was dropped.

Switched the `SkillCompleted`/`SkillFailed` counting source from `agent.subscribe('SkillCompleted' | 'SkillFailed')` to the learner's own outcome stream. `buildLearningLearner` now wraps `toTrainingPair` to:

1. Project the outcome (sees pre-this-outcome counts).
2. THEN call `recordOutcomeForFeatureWindow` to push the matching kind into `recentEvents`.

Each kernel invocation produces exactly one outcome → exactly one push. Kernel + skill double-emits don't double-count (only the kernel's score call reaches the learner). Reactive-handler back-to-back invocations DO count separately. Both Codex P2 round 1 (double-emit) and round 3 (legitimate distinct same-skill in one tick) are addressed.

Dropped the `processedThisTick` Set + the -1 self-emission compensation in `projectLearningOutcome` (no longer needed since projection runs before the increment). `NeedCritical` stays on the subscription path — `NeedsTicker` is the only emitter.
@codex review
Codex Review: Didn't find any major issues. Hooray!
## Summary
Doc-audit pass over `docs/plans` + `docs/specs`. Three things land
together:
- **`docs/archive/{plans,specs}/`** — new home for plans whose roadmap
rows have all shipped (or whose goals were folded into a successor)
and specs whose design is now reflected in code. Includes a
`README.md` explaining the policy; `CLAUDE.md` documents the
convention.
- **`git mv` 23 plans + 3 specs into the archive.** The active live
set is now the comprehensive polish-and-harden plan plus three
specs (post-tfjs improvements, mvp-demo, vision), each with a
refreshed status banner.
- **Refresh the live comprehensive plan** against current `develop`:
- PR column updated for rows 16/19/20/3/4/22 (now shipped via
PRs #91 / #98 / #104 / #110 / #113 / #111).
- New "Post-roadmap follow-ups" section covers PRs #92 → #125
(review-bot infra, tracker findings, demo + tfjs hotfixes,
tooling).
- Stale prose-baked counts dropped (size budgets now reference
`package.json#size-limit` only).
- Coverage-thresholds section gains a pointer to the sticky PR
comment shipped in PR #124.
## Other doc fixes
- `README.md`: drop the unverifiable "Phase A milestones (M0–M15) are
all green" claim — the milestones don't exist as documented IDs
anywhere; replace with a pointer to the live polish plan.
- `vision.md`: refresh cadence note (was pinned to 2026-04-19 + "next
review at 1.0").
- `2026-04-24-post-tfjs-improvements.md`: mark recommended-order items
that have shipped (PRs #61, #76, #77, #83, #84, #91, #94, #96,
#104, #113), link the active roadmap as the heir.
- `mvp-demo.md`: status banner explaining where active polish work is
now tracked.
## Mechanical
- Update inline cross-refs in `CLAUDE.md`, `eslint.config.js`,
`src/agent/{Agent,AgentModule}.ts`, `tests/unit/exports.test.ts`,
and `docs/daily-reviews/2026-04-25.md` to point at the new
`docs/archive/` paths so links keep resolving.
No code change beyond comment-path updates.
## Test plan
- [x] `npm run verify` green (`format:check` + `lint` + `typecheck` +
`test` + `build` + `docs`). 523 tests pass; the 2 lint warnings
are pre-existing (`CognitionPipeline.invokeSkillAction` complexity
+ `scoreFailure` param count) and on the ratchet menu.
- [x] `git ls-files docs/archive/` shows the moved files; renames are
preserved (`git log --follow` works for any moved file).
- [ ] Codex review: clean, no blockers.
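As a sanity check, the rename-following behavior relied on above can be reproduced in a throwaway repo. The archive path mirrors this PR's layout; the file name is invented.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
mkdir -p docs/plans docs/archive/plans
echo plan > docs/plans/example-plan.md
git add .
git -c user.email=ci@example.com -c user.name=ci commit -q -m "add plan"
# git mv records the rename so history stays walkable
git mv docs/plans/example-plan.md docs/archive/plans/example-plan.md
git -c user.email=ci@example.com -c user.name=ci commit -q -m "archive plan"
# --follow walks through the rename back to the original commit
git log --follow --format=%s -- docs/archive/plans/example-plan.md
```

Without `--follow`, the log on the new path would stop at the "archive plan" commit.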
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Luis Mendez <hallo@luis-mendez.de>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

- Extends `featuresFromNeeds` from the 5 need levels to 13 dims: 5 need levels + 4 mood one-hot dims (`MOOD_KEYS = ['happy', 'sad', 'sleepy', 'playful']`) + 1 normalized active-modifier count + 3 normalized recent-event counts (`SkillCompleted` / `SkillFailed` / `NeedCritical`) over a 30-tick window.
- `setLearningAgentId(id)` → `setLearningAgent(agent)` (pre-1.0 clean break, no compat shim). The new function subscribes to the standard event bus to populate module-scoped mood + recent-event state without widening the adapter's `helpers` shape.
- `projectLearningOutcome` now produces 13-dim training pairs at outcome time; `cognitionSwitcher.ts`'s synthetic Train-button generator + `scripts/seed-learning-network.ts` both pad with uniform-`[0, 1]` noise so labels stay conditioned on the 5 need dims.
- `examples/nurture-pet/src/cognition/learning.network.json` regenerated at the new `[13, 16, 7]` topology. Hydration's existing input + output dim guard (PR #94, "feat(demo,tfjs): N-way softmax learning mode (row 17)") lets old 5-input snapshots fail fast and fall back to the new baseline.

## Test plan

- `npm run verify` green (format / lint / typecheck / 491 tests / build / docs)
- `weightsShapes` topology test updated to `[13, 16] / [16] / [16, 7] / [7]` (`tests/examples/learningMode.train.test.ts`, `tests/unit/cognition/adapters/TfjsReasoner.test.ts`)
- Updated test input `[0.05, 0.6, 0.6, 0.6, 0.6, 0, 0, 0, 0, 0, 0, 0, 0]` still asserts softmax invariants (7-vector, ∈ [0, 1], sums to 1)
- `feedback_docs_alongside_pr`

## Notes for review

- `featuresOf` is consumer-supplied; no changeset.
- Off-roster moods (`content`, `angry`, `bored`, `sick`, `scared`) collapse to all-zero on that section. This is intentional — the network treats those as "uninformative" rather than mistakenly grouping them with one of the 4 picked categories.
- The synthetic generator pads uniform-`[0, 1]` noise on the 8 trailing dims because the labels don't condition on them. This trains the network to ignore those dims under the synthetic regime; predictive weight on mood / modifier / event counts comes only from the in-game `TfjsLearner` reinforcement loop fed by real outcomes.
- `projectLearningOutcome`'s mood / modifier-count / event-count dims are read at outcome time (post-skill). The need dims still come from `details.preNeeds` to preserve the row-17 invariant that need-level features reflect pre-skill state. Mood + modifier counts are stable across a single skill invocation; the event-count window includes the outcome's own emission, but that bias is uniform across every training pair so the net gradient direction is unaffected.

🤖 Generated with Claude Code
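For reference, the softmax invariants asserted in the test plan (7-vector, values in [0, 1], sum to 1) can be sketched standalone. This is not the actual tfjs test code; the helper names are invented.

```typescript
function softmax(logits: number[]): number[] {
  const m = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - m)); // shift by max for stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function assertSoftmaxInvariants(out: number[]): void {
  if (out.length !== 7) throw new Error('expected a 7-vector');
  if (!out.every((p) => p >= 0 && p <= 1)) throw new Error('probs must be in [0, 1]');
  const sum = out.reduce((a, b) => a + b, 0);
  if (Math.abs(sum - 1) > 1e-6) throw new Error('probs must sum to 1');
}
```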