Skip to content

Follow-up: integration tests for evolution surfaces (heartbeat/memory/skills) once wired #27

@theognis1002

Description

@theognis1002

Context

Tracking issue for deferred testing work from #26.

The evolution rework adds archetype-driven personalization to three unprompted surfaces:

  • Heartbeat seed — dominant archetype (1.3×-clear) drives proactive check-in topic selection; sub-threshold falls back to neutral "general productivity" content
  • Memory consolidation — nightly/weekly tasks weight archetype-relevant content when dominant is 1.3×-clear
  • Skill loading priority — archetype-relevant skills get a tiebreaker priority boost under token budget pressure, but never override task-relevant skills

In #26 these are implemented as wiring changes. The pure-logic deep modules (scorer, xp_curve, plan_emission, signature) are unit-tested in that PRD's scope.

Integration tests for the three surfaces are intentionally skipped in #26 because the wires are still physically being connected.

What this issue tracks

Once #26 ships and the three surfaces are wired up end-to-end, write integration tests that exercise the full path:

  1. Heartbeat seed test — given an Ops-dominant user (with archetype clearly above 1.3× threshold), assert the next heartbeat picks an Ops-flavored topic. Given a hybrid user (no archetype clears 1.3×), assert the heartbeat falls back to neutral content. End-to-end through the heartbeat scheduler.
  2. Memory consolidation test — given a session with mixed Ops + Marketing observations and an Ops-dominant user, assert the consolidated long-term memory entries preferentially preserve the Ops content. Given a hybrid user, assert consolidation falls back to recency-only weighting.
  3. Skill loading priority test — given an Ops-dominant user under token-budget pressure, assert Docker/K8s skills load before lower-priority skills. Given a task-relevant signal pointing to a different domain (e.g., user asks about marketing), assert task-relevant skills still load regardless of archetype.

Test quality bar

Per CLAUDE.md:

  • Real code paths, not mock orchestration
  • Each test must have a real failure mode (not tautological)
  • Cover the dominance-clear path AND the sub-threshold fallback path for each surface

Dependency

Blocked by #26 landing. Reopen / promote to active work once that PR merges and the heartbeat / memory consolidation / skill loader changes are in place.

🤖 Generated with Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions