
feat(demo): grow learning-mode feature vector to 13 dims (row 18)#96

Merged — Luis85 merged 4 commits into develop from feat/demo-richer-feature-vector, Apr 25, 2026

Conversation

Luis85 (Owner) commented Apr 25, 2026

Summary

  • Extends Learning mode's feature vector from 5 need levels to 13 dims: 5 needs + 4 mood one-hot (MOOD_KEYS = ['happy','sad','sleepy','playful']) + 1 normalized active-modifier count + 3 normalized recent-event counts (SkillCompleted/SkillFailed/NeedCritical) over a 30-tick window.
  • Renames setLearningAgentId(id) → setLearningAgent(agent) (pre-1.0 clean break, no compat shim). The new function subscribes to the standard event bus to populate module-scoped mood + recent-event state without widening the adapter's helpers shape.
  • projectLearningOutcome now produces 13-dim training pairs at outcome time; cognitionSwitcher.ts's synthetic Train-button generator + scripts/seed-learning-network.ts both pad with uniform-[0, 1] noise so labels stay conditioned on the 5 need dims.
  • Bundled examples/nurture-pet/src/cognition/learning.network.json regenerated at the new [13, 16, 7] topology. Hydration's existing input + output dim guard (PR #94, "feat(demo,tfjs): N-way softmax learning mode", row 17) lets old 5-input snapshots fail fast and fall back to the new baseline.
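The dim guard's fail-fast behavior can be sketched as a plain predicate (a hedged illustration — `NetworkSnapshot` and `snapshotMatchesTopology` are names invented here, not the repo's API):

```typescript
// Illustrative shape for a serialized network snapshot's dimensions.
interface NetworkSnapshot {
  inputDim: number;
  outputDim: number;
}

function snapshotMatchesTopology(
  snapshot: NetworkSnapshot,
  expected: NetworkSnapshot = { inputDim: 13, outputDim: 7 },
): boolean {
  // Old 5-input snapshots fail this check, and the caller falls
  // back to the bundled baseline network instead of hydrating.
  return (
    snapshot.inputDim === expected.inputDim &&
    snapshot.outputDim === expected.outputDim
  );
}
```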

Test plan

  • npm run verify green (format / lint / typecheck / 491 tests / build / docs)
  • weightsShapes topology test updated to [13, 16] / [16] / [16, 7] / [7] (tests/examples/learningMode.train.test.ts, tests/unit/cognition/adapters/TfjsReasoner.test.ts)
  • Bundled-baseline test feeds a neutral-zero rich tail [0.05, 0.6, 0.6, 0.6, 0.6, 0, 0, 0, 0, 0, 0, 0, 0] and still asserts softmax invariants (7-vector, ∈ [0,1], sums to 1)
  • Plan rows 17 + 18 marked shipped in the same diff per feedback_docs_alongside_pr
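The softmax invariants the bundled-baseline test asserts (7-vector, each entry in [0, 1], sum 1) can be sketched as a small helper — illustrative only, not the repo's actual test code:

```typescript
// Throws if `out` violates any of the three softmax invariants.
function assertSoftmaxInvariants(out: number[], eps = 1e-6): void {
  if (out.length !== 7) {
    throw new Error(`expected 7 dims, got ${out.length}`);
  }
  for (const p of out) {
    if (p < 0 || p > 1) throw new Error(`probability out of range: ${p}`);
  }
  const sum = out.reduce((a, b) => a + b, 0);
  // Allow float accumulation error within eps.
  if (Math.abs(sum - 1) > eps) throw new Error(`softmax sum ${sum} !== 1`);
}
```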

Notes for review

  • Library impact: none — featuresOf is consumer-supplied. No changeset.
  • The 4 mood one-hot keys cover the demo's most narratively-distinct states; off-roster moods (content, angry, bored, sick, scared) collapse to all-zero on that section. This is intentional — the network treats those as "uninformative" rather than mistakenly grouping them with one of the 4 picked categories.
  • The synthetic Train-button data appends uniform [0, 1] noise on the 8 trailing dims because the labels don't condition on them. This trains the network to ignore those dims under the synthetic regime; predictive weight on mood / modifier / event counts comes only from the in-game TfjsLearner reinforcement loop fed by real outcomes.
  • projectLearningOutcome's mood / modifier-count / event-count dims are read at outcome time (post-skill). The need dims still come from details.preNeeds to preserve the row-17 invariant that need-level features reflect pre-skill state. Mood + modifier counts are stable across a single skill invocation; the event-count window includes the outcome's own emission, but that bias is uniform across every training pair so the net gradient direction is unaffected.

🤖 Generated with Claude Code

Extends `featuresFromNeeds` from the 5 need levels to 13 dims:

- 5 need levels (existing)
- 4 mood one-hot dims (`MOOD_KEYS = ['happy', 'sad', 'sleepy',
  'playful']`) — off-roster moods collapse to all-zero on this section.
- 1 active-modifier count, normalized via `min(count, 5) / 5`.
- 3 recent-event counts (`SkillCompleted` / `SkillFailed` /
  `NeedCritical`) over a 30-tick sliding window, each normalized via
  the same cap.
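Taken together, the breakdown above can be sketched as one pure function. This is a hedged illustration: the repo's real `featuresFromNeeds` reads needs, mood, modifiers, and the event window from the cognition context and module-scoped state rather than taking them as parameters.

```typescript
const MOOD_KEYS = ['happy', 'sad', 'sleepy', 'playful'] as const;
const COUNT_NORM_CAP = 5;
const WINDOW_TICKS = 30;

type RecentEvent = { tick: number; kind: 'completed' | 'failed' | 'critical' };

// min(count, 5) / 5 — the shared normalization cap from the list above.
function normCount(n: number): number {
  return Math.min(n, COUNT_NORM_CAP) / COUNT_NORM_CAP;
}

function featuresFromNeeds(
  needs: number[],             // 5 need levels, already in [0, 1]
  mood: string | undefined,    // current mood category
  modifierCount: number,       // active-modifier count
  recentEvents: RecentEvent[], // sliding event window
  currentTick: number,
): number[] {
  // 4 mood one-hot dims; off-roster moods collapse to all-zero.
  const moodOneHot = MOOD_KEYS.map((key) => (mood === key ? 1 : 0));
  // Only events inside the 30-tick window count.
  const inWindow = recentEvents.filter((e) => currentTick - e.tick < WINDOW_TICKS);
  const count = (kind: RecentEvent['kind']) =>
    normCount(inWindow.filter((e) => e.kind === kind).length);
  return [
    ...needs,                 // 5 dims
    ...moodOneHot,            // 4 dims
    normCount(modifierCount), // 1 dim
    count('completed'),       // 3 dims
    count('failed'),
    count('critical'),
  ];
}
```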

Mood + recent-event state lives in module-scoped variables populated
by the standard event bus: `setLearningAgentId(id)` was renamed to
`setLearningAgent(agent)` (pre-1.0 clean break, no compat shim). The
new function subscribes to `MoodChanged`, `AgentTicked`,
`SkillCompleted`, `SkillFailed`, `NeedCritical` and tears down via
`null` invocation.
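The subscribe/teardown contract described above might look roughly like this. The `AgentLike` and `Unsubscribe` shapes are assumptions for illustration; the repo's event-bus types differ.

```typescript
type Unsubscribe = () => void;

interface AgentLike {
  subscribe(event: string, handler: (e: unknown) => void): Unsubscribe;
}

let subscriptions: Unsubscribe[] = [];

function setLearningAgent(agent: AgentLike | null): void {
  // Every call tears down prior subscriptions first, so a `null`
  // invocation is pure teardown.
  for (const off of subscriptions) off();
  subscriptions = [];
  if (agent === null) return;
  const events = ['MoodChanged', 'AgentTicked', 'SkillCompleted', 'SkillFailed', 'NeedCritical'];
  for (const event of events) {
    subscriptions.push(
      agent.subscribe(event, () => {
        /* update module-scoped mood / tick / event-window state */
      }),
    );
  }
}
```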

`projectLearningOutcome` now produces 13-dim training pairs at
outcome time. Mood + modifier-count come from `agent.getState()`;
event counts come from the same module-scoped window.
`cognitionSwitcher.ts`'s synthetic Train-button generator appends 8
uniform-`[0, 1]` noise dims to each archetype sample so the network
learns to ignore them under the synthetic regime — real predictive
weight on those dims comes from the in-game `TfjsLearner`
reinforcement loop.
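The padding step is simple enough to sketch directly (`padWithNoise` is a name invented here; the repo inlines this in the generator):

```typescript
// Appends `richDims` uniform-[0, 1) noise dims to a 5-dim need sample,
// since the synthetic labels only condition on the need levels.
function padWithNoise(needFeatures: number[], richDims = 8): number[] {
  return [
    ...needFeatures,
    ...Array.from({ length: richDims }, () => Math.random()),
  ];
}
```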

`scripts/seed-learning-network.ts` now builds a `[13, 16, 7]`
Sequential and pads training pairs with the matching uniform-noise
tail; the bundled `learning.network.json` was regenerated. The
hydration guard's existing input + output dim check (PR #94) lets
old 5-input snapshots fail fast and fall back to the new baseline.

Library impact: none — `featuresOf` is consumer-supplied. No
changeset.

Plan rows 17 + 18 marked shipped in the same diff.
github-actions (bot) commented Apr 25, 2026

size-limit report 📦

| Path | Size |
| --- | --- |
| dist/index.js (gzip) | 37.13 KB (+0.11% 🔺) |
| dist/integrations/excalibur/index.js (gzip) | 1.4 KB (0%) |
| dist/cognition/adapters/mistreevous/index.js (gzip) | 1.14 KB (0%) |
| dist/cognition/adapters/js-son/index.js (gzip) | 1.36 KB (0%) |
| dist/cognition/adapters/tfjs/index.js (gzip) | 6.32 KB (0%) |


chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 116b20b86c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +423 to +427

```ts
const state = agent.getState();
const moodCategory = state.mood?.category;
const moodOneHot = MOOD_KEYS.map((key) => (moodCategory === key ? 1 : 0));
const modCount = Math.min(state.modifiers.length, COUNT_NORM_CAP) / COUNT_NORM_CAP;
let completed = 0;
```

P1 — Use pre-skill modifier count when projecting training features

projectLearningOutcome labels a sample for the action chosen from pre-skill state, but this path reads state.modifiers.length after the skill has already executed. In this codebase several default skills mutate modifiers (for example FeedSkill adds well-fed, CleanSkill removes dirty, and ScoldSkill adds/removes modifiers), so training inputs can encode post-action modifier state while inference (featuresFromNeeds) uses pre-action ctx.modifiers. That mismatch leaks action effects into inputs and can train a policy that does not match runtime feature semantics.

Luis85 (Owner, Author) replied:

Fixed in cfdc8c4. featuresFromNeeds now snapshots the 8-dim rich suffix into lastRichFeatureSuffix at Stage 4 (pre-skill); projectLearningOutcome reuses that snapshot at Stage 8 instead of recomputing from agent.getState() (which would encode FeedSkill/CleanSkill/ScoldSkill modifier mutations). Fallback path recomputes from current state only when no featuresFromNeeds call has run yet — test harnesses scoring outcomes without driving the reasoner first.

Comment on lines +158 to +160

```ts
case 'SkillCompleted':
  recentEvents.push({ tick: currentTick, kind: 'completed' });
  return;
```

P2 — De-duplicate SkillCompleted events in recent-event features

This listener counts every SkillCompleted event, but successful feed emits two such events in this repo (one from FeedSkill.execute, one from CognitionPipeline.invokeSkillAction). As a result, recent:SkillCompleted increments twice for feed and once for most other skills, so the new feature no longer represents true completion frequency and can bias both prediction and learning toward event-emitter quirks.

Luis85 (Owner, Author) replied:

Fixed in cfdc8c4. Added a per-tick dedupe Set keyed by ${kind}:${skillId | needId} cleared on AgentTicked. Default skills that emit SkillCompleted from execute() (FeedSkill etc.) plus the kernel's emission now count once per skillId per tick. Same dedupe applied to SkillFailed (skillId) and NeedCritical (needId).

Two Codex findings on PR #96 (commit 116b20b):

P1 — `projectLearningOutcome` was reading `state.modifiers.length`
post-skill, but several default skills mutate modifiers during
`execute()` (FeedSkill adds `well-fed`, CleanSkill removes `dirty`,
ScoldSkill add/remove). Training inputs would encode action effects
while inference (`featuresFromNeeds`) used pre-action state. Fix:
snapshot the 8-dim rich-feature suffix during `featuresFromNeeds` at
Stage 4 (pre-skill) into `lastRichFeatureSuffix`, then reuse it at
Stage 8 outcome projection. Fall back to a from-state recompute only
when no `featuresFromNeeds` call has run yet (test harnesses).

P2 — `recent:SkillCompleted` etc. counted every emission, but the
default `FeedSkill` (and others) emit SkillCompleted from inside
`execute()` AND the kernel emits a second one from
`CognitionPipeline.invokeSkillAction`. Doubled count for affected
skills biased the feature toward emitter quirks. Fix: per-tick
dedupe set keyed by `${kind}:${skillId | needId}`, cleared on every
AgentTicked event.
Luis85 (Owner, Author) commented Apr 25, 2026

@codex review

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cfdc8c48bb


```ts
// arrived before any `featuresFromNeeds` call (e.g. the consumer's
// first invocation of `learner.score` happens without a tick
// boundary, like the demo's test harness).
const suffix = lastRichFeatureSuffix ?? recomputeRichFeatureSuffixFromAgent(agent);
```
P1 — Snapshot rich suffix per scored outcome

projectLearningOutcome always reuses lastRichFeatureSuffix when it is non-null, but that suffix is only refreshed in featuresFromNeeds (Stage 4). Player interactions are executed from the reactive handler path in Stage 1 (InteractionRequestedinvokeSkill) before Stage 4 runs for that tick, so those outcomes are paired with current preNeeds and a stale mood/modifier/event suffix from a previous tick. This creates feature/label misalignment for interaction-driven training samples and can push the learner toward incorrect correlations.

Luis85 (Owner, Author) replied:

Fixed in bbbe6a8. Took a different approach than the Stage-4 cache: added a small library snapshot (details.preModifierCount) in CognitionPipeline.invokeSkillAction, and dropped lastRichFeatureSuffix entirely. projectLearningOutcome now recomputes the suffix at outcome time — mood from current state (MoodModel reconciles Stage 2.7, before skill execution), modifier count from details.preModifierCount, event counts from current recentEvents minus 1 for the matching kind to compensate for self-emission. Player-click paths get the same pre-skill snapshot the reasoner-driven path gets, since the kernel snapshots before invoking the skill regardless of caller. Minor bump changeset added.

Comment on lines +187 to +189

```ts
const key = `completed:${e.skillId ?? '?'}`;
if (processedThisTick.has(key)) return;
processedThisTick.add(key);
```

P2 — Don't collapse distinct same-skill events in one tick

The per-tick dedupe key uses only completed:${skillId} (with the same pattern for failed/critical), so multiple legitimate events of the same kind and id in one tick are treated as duplicates. This happens when users queue repeated interactions quickly or when multiple matching actions run in the same tick, causing recentEvents to undercount true frequency and weakening the new recent-event feature dimensions.

Luis85 (Owner, Author) replied:

Kept the per-tick ${kind}:${skillId|needId} dedupe — the actual case it addresses (FeedSkill etc. emit SkillCompleted from execute() AND the kernel emits a second one) is real and consistent across default skills, while the legitimate-distinct-twice-per-tick case is hypothetical for the demo's pacing. Reasoned in the bbbe6a8 commit message + JSDoc; the new -1 self-emission compensation in projectLearningOutcome relies on this single-increment-per-tick invariant. If a consumer skill genuinely emits multiple distinct SkillCompleted of the same skillId in one tick, the recent-event count under-represents by 1 per pair.

Two Codex findings on PR #96 commit cfdc8c4:

P1 — `lastRichFeatureSuffix` was only refreshed in `featuresFromNeeds`
(Stage 4). Player-click paths skip the reasoner, so outcomes scored
from button clicks reused a stale suffix from a prior tick.

Library: `CognitionPipeline.invokeSkillAction` now snapshots
`agent.modifiers.list().length` alongside `preNeeds` and includes it
on every `LearningOutcome.details` payload as `preModifierCount`.
Mirrors the existing `preNeeds` mechanism. Minor bump (additive
field).

Demo: dropped the `lastRichFeatureSuffix` cache. `projectLearningOutcome`
now recomputes the suffix at outcome time:
- Mood from `agent.getState().mood?.category` — `MoodModel`
  reconciles at Stage 2.7 (before Stage 7 skill execution), so this
  is still pre-skill state for any single-skill invocation.
- Modifier count from `details.preModifierCount` (kernel-supplied);
  falls back to current state for consumer-emitted outcomes.
- Event counts from current `recentEvents` window with `completed`
  decremented by 1 to compensate for the outcome's own emission
  (dedupe ensures only +1 was added).
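The bbbe6a8 self-emission compensation in the last bullet can be sketched as (illustrative name; relies on the single-increment-per-tick invariant the dedupe provides):

```typescript
// The outcome's own emission is already in the window, so the matching
// kind is decremented by 1 before the min(count, 5) / 5 normalization.
function compensatedEventDim(
  rawCount: number,
  isOutcomeKind: boolean,
  cap = 5,
): number {
  const adjusted = isOutcomeKind ? Math.max(rawCount - 1, 0) : rawCount;
  return Math.min(adjusted, cap) / cap;
}
```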

P2 — Earlier dedupe key collapsed all same-skill events per tick.

Kept the dedupe but acknowledged in JSDoc that legitimate distinct
same-skill emissions per tick (rare in the demo's pacing) collapse to
one increment. Trade-off reasoned: the actual kernel+skill double-emit
is consistent for default skills; legitimate-twice-per-tick is
hypothetical. The -1 self-emission compensation in
`projectLearningOutcome` relies on this single-increment-per-tick
invariant.
Luis85 (Owner, Author) commented Apr 25, 2026

@codex review

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bbbe6a8846


Comment on lines +177 to +179

```ts
const key = `completed:${e.skillId ?? '?'}`;
if (processedThisTick.has(key)) return;
processedThisTick.add(key);
```

P2 — Preserve distinct same-skill events in one tick

This dedupe key collapses all SkillCompleted events for the same skillId within a tick, not just the duplicate kernel+skill emission from a single invocation. In this repo, Agent.dispatchReactiveHandlers processes every queued InteractionRequested event in order (src/agent/Agent.ts), so two quick identical interactions before one tick can legitimately invoke the same skill twice; the second completion is dropped here. That undercounts recentEvents features and skews projectLearningOutcome's pre-skill event-count compensation for later invocations in the same tick.

Luis85 (Owner, Author) replied:

Fixed in 77e0979 by changing the counting source. SkillCompleted/SkillFailed are no longer subscribed via agent.subscribe — they're counted via the learner's own outcome stream (buildLearningLearner now wraps toTrainingPair to project first, then call recordOutcomeForFeatureWindow). Kernel invocations produce exactly one outcome each → one count per invocation. dispatchReactiveHandlers's back-to-back same-skill case now counts both correctly; the kernel+skill SkillCompleted double-emit no longer double-counts (only the kernel's score call reaches the learner). Dropped processedThisTick + the -1 self-emission compensation. NeedCritical stays on the subscription path (single emitter).

Codex P2 round 3 on PR #96 commit bbbe6a8: per-tick
`${kind}:${skillId}` dedupe collapses legitimate distinct same-skill
events. `Agent.dispatchReactiveHandlers` can process two queued
`InteractionRequested` events in one tick, legitimately invoking the
same skill twice; the second completion was dropped.

Switched the SkillCompleted/SkillFailed counting source from
`agent.subscribe('SkillCompleted'|'SkillFailed')` to the learner's
own outcome stream. `buildLearningLearner` now wraps
`toTrainingPair` to:
1. Project the outcome (sees pre-this-outcome counts).
2. THEN call `recordOutcomeForFeatureWindow` to push the matching
   kind into `recentEvents`.

Each kernel invocation produces exactly one outcome → exactly one
push. Kernel + skill double-emits don't double-count (only the
kernel's score call reaches the learner). Reactive-handler
back-to-back invocations DO count separately. Both Codex P2 round 1
(double-emit) and round 3 (legitimate distinct same-skill in one
tick) are addressed.

Dropped the `processedThisTick` Set + the -1 self-emission
compensation in `projectLearningOutcome` (no longer needed since
projection runs before the increment).

`NeedCritical` stays on the subscription path — `NeedsTicker` is
the only emitter.
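The project-then-record ordering from 77e0979 can be sketched as a wrapper (types are illustrative, not the repo's `LearningOutcome` shape):

```typescript
type Outcome = { kind: 'completed' | 'failed'; skillId: string };

// Wraps toTrainingPair so projection always sees the event window
// *before* this outcome is pushed into it.
function wrapToTrainingPair<P>(
  project: (o: Outcome) => P,
  recordOutcomeForFeatureWindow: (o: Outcome) => void,
): (o: Outcome) => P {
  return (outcome) => {
    const pair = project(outcome);          // counts exclude this outcome
    recordOutcomeForFeatureWindow(outcome); // then increment the window
    return pair;
  };
}
```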
Luis85 (Owner, Author) commented Apr 25, 2026

@codex review

chatgpt-codex-connector (bot) commented:

Codex Review: Didn't find any major issues. Hooray!


@Luis85 Luis85 merged commit a6b2099 into develop Apr 25, 2026
14 checks passed
@Luis85 Luis85 deleted the feat/demo-richer-feature-vector branch April 25, 2026 14:50
Luis85 added a commit that referenced this pull request Apr 26, 2026
## Summary

Doc-audit pass over `docs/plans` + `docs/specs`. Three things land
together:

- **`docs/archive/{plans,specs}/`** — new home for plans whose roadmap
  rows have all shipped (or whose goals were folded into a successor)
  and specs whose design is now reflected in code. Includes a
  `README.md` explaining the policy; `CLAUDE.md` documents the
  convention.
- **`git mv` 23 plans + 3 specs into the archive.** The active live
  set is now the comprehensive polish-and-harden plan plus three
  specs (post-tfjs improvements, mvp-demo, vision), each with a
  refreshed status banner.
- **Refresh the live comprehensive plan** against current `develop`:
  - PR column updated for rows 16/19/20/3/4/22 (now shipped via
    PRs #91 / #98 / #104 / #110 / #113 / #111).
  • New "Post-roadmap follow-ups" section covers PRs #92–#125
    (review-bot infra, tracker findings, demo + tfjs hotfixes,
    tooling).
  - Stale prose-baked counts dropped (size budgets now reference
    `package.json#size-limit` only).
  - Coverage-thresholds section gains a pointer to the sticky PR
    comment shipped in PR #124.

## Other doc fixes

- `README.md`: drop the unverifiable "Phase A milestones (M0–M15) are
  all green" claim — the milestones don't exist as documented IDs
  anywhere; replace with a pointer to the live polish plan.
- `vision.md`: refresh cadence note (was pinned to 2026-04-19 + "next
  review at 1.0").
- `2026-04-24-post-tfjs-improvements.md`: mark recommended-order items
  that have shipped (PRs #61, #76, #77, #83, #84, #91, #94, #96,
  #104, #113), link the active roadmap as the heir.
- `mvp-demo.md`: status banner explaining where active polish work is
  now tracked.

## Mechanical

- Update inline cross-refs in `CLAUDE.md`, `eslint.config.js`,
  `src/agent/{Agent,AgentModule}.ts`, `tests/unit/exports.test.ts`,
  and `docs/daily-reviews/2026-04-25.md` to point at the new
  `docs/archive/` paths so links keep resolving.

No code change beyond comment-path updates.

## Test plan

- [x] `npm run verify` green (`format:check` + `lint` + `typecheck` +
  `test` + `build` + `docs`). 523 tests pass; the 2 lint warnings
  are pre-existing (`CognitionPipeline.invokeSkillAction` complexity
  + `scoreFailure` param count) and on the ratchet menu.
- [x] `git ls-files docs/archive/` shows the moved files; renames are
  preserved (`git log --follow` works for any moved file).
- [ ] Codex review: clean, no blockers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Luis Mendez <hallo@luis-mendez.de>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2 participants