Follow-up: integration tests for evolution surfaces (heartbeat/memory/skills) once wired

## Context

Tracking issue for deferred testing work from #26.

The evolution rework adds archetype-driven personalization to three unprompted surfaces:

- **Heartbeat seed** — dominant archetype (1.3×-clear) drives proactive check-in topic selection; sub-threshold falls back to neutral "general productivity" content
- **Memory consolidation** — nightly/weekly tasks weight archetype-relevant content when dominant is 1.3×-clear
- **Skill loading priority** — archetype-relevant skills get a tiebreaker priority boost under token budget pressure, but never override task-relevant skills

In #26 these are implemented as wiring changes. The pure-logic deep modules (`scorer`, `xp_curve`, `plan_emission`, `signature`) are unit-tested in that PRD's scope.

**Integration tests for the three surfaces are intentionally skipped** in #26 because the wires are still physically being connected.

## What this issue tracks

Once #26 ships and the three surfaces are wired up end-to-end, write integration tests that exercise the full path:

1. **Heartbeat seed test** — given an Ops-dominant user (with archetype clearly above 1.3× threshold), assert the next heartbeat picks an Ops-flavored topic. Given a hybrid user (no archetype clears 1.3×), assert the heartbeat falls back to neutral content. End-to-end through the heartbeat scheduler.
2. **Memory consolidation test** — given a session with mixed Ops + Marketing observations and an Ops-dominant user, assert the consolidated long-term memory entries preferentially preserve the Ops content. Given a hybrid user, assert consolidation falls back to recency-only weighting.
3. **Skill loading priority test** — given an Ops-dominant user under token-budget pressure, assert Docker/K8s skills load before lower-priority skills. Given a *task-relevant* signal pointing to a different domain (e.g., user asks about marketing), assert task-relevant skills still load regardless of archetype.

## Test quality bar

Per `CLAUDE.md`:
- Real code paths, not mock orchestration
- Each test must have a real failure mode (not tautological)
- Cover the dominance-clear path AND the sub-threshold fallback path for each surface

## Dependency

Blocked by #26 landing. Reopen / promote to active work once that PR merges and the heartbeat / memory consolidation / skill loader changes are in place.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Follow-up: integration tests for evolution surfaces (heartbeat/memory/skills) once wired #27

Context

What this issue tracks

Test quality bar

Dependency

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Follow-up: integration tests for evolution surfaces (heartbeat/memory/skills) once wired #27

Description

Context

What this issue tracks

Test quality bar

Dependency

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions