feat: correlate pipeline LLM calls to episodes in Logs viewer #1618

Merged
MatthewZhuang merged 1 commit into mem-agent-0424 from mem-agent-0424-zgm on May 5, 2026
Conversation

@MatthewZhuang
Collaborator

Summary

Wires episodeId through every LLM call and api_log writer in the memory pipeline so that system_model_status audit rows can be aggregated under the triggering episode in the Logs viewer's chain view. Also includes a couple of frontend ergonomics fixes for the same view.

Before this change, ~50% of system_model_status rows showed up as orphan "solo" cards in the chain view — the LLM call sites for L2 induction, L3 abstraction, skill crystallize/evolve, retrieval LLM filter, relation classifier, and intent classifier didn't have an episode in scope at the moment they ran, so the audit row had no correlation key. After this change those rows are grouped with the rest of their episode's pipeline activity.

Backend changes

episodeId propagation into LLM calls

  • session/intent-classifier.ts + session/manager.ts: pre-allocate the episode id before intentClassifier.classify runs, so the resulting session.intent.classify LLM call carries the id. IntentClassifier.classify(text, options?) gains an optional second parameter; existing single-arg callers (including all unit tests) keep working unchanged.
  • session/relation-classifier.ts + session/types.ts + pipeline/orchestrator.ts: extend RelationInput with prevEpisodeId? and pass it from all three orchestrator call sites (open episode, recovered open, closed-prev). relation-classify and relation-arbitrate LLM calls now stamp the previous episode id (semantically: "should we terminate prev?").
  • memory/l2/induce.ts + memory/l2/l2.ts: InduceInput gets triggerEpisodeId?. runL2 forwards input.episodeId so each L2 induction LLM call is tied to the trace's source episode.
  • memory/l3/abstract.ts + memory/l3/l3.ts: AbstractInput gets episodeId?. runL3 derives the trigger episode (most recent contributing episode in the cluster) and threads it into the L3 abstraction LLM call.
  • skill/crystallize.ts + skill/skill.ts + skill/types.ts: CrystallizeInput/SkillCrystallizationDraft get episodeId?. runCrystallize derives the trigger from policy.sourceEpisodeIds (last-then-first) and passes it through to the LLM call.
  • retrieval/llm-filter.ts + retrieval/retrieve.ts: FilterInput gets episodeId?. retrieve already extracts ctx.episodeId; we now forward it (with phase: "retrieve") to the filter LLM call.
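The optional-parameter pattern used across these call sites can be sketched roughly as follows. This is a minimal illustration, not the real pipeline code: the ClassifyOptions/AuditRow shapes and the auditLog array are hypothetical stand-ins for the actual types and api_log writer.

```typescript
// Sketch: thread an optional episodeId into a classifier so its audit
// row gains a correlation key, without breaking single-arg callers.
interface ClassifyOptions {
  episodeId?: string; // pre-allocated by the session manager before classify runs
}

interface AuditRow {
  op: string;
  episodeId?: string;
}

const auditLog: AuditRow[] = [];

function classify(text: string, options?: ClassifyOptions): string {
  // The LLM call would happen here; we only record the audit row.
  auditLog.push({ op: "session.intent.classify", episodeId: options?.episodeId });
  return text.length > 20 ? "long" : "short";
}

// Existing single-arg caller keeps working unchanged:
classify("hello");
// New call site passes the pre-allocated id:
classify("hello again", { episodeId: "ep-123" });
```

The key property is backward compatibility: callers that never heard of episodeId compile and run as before, while instrumented call sites get correlated audit rows.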

api_log writers stamp episodeId for skill/world events

In pipeline/memory-core.ts:

  • New helpers episodeFromPolicy / episodeFromSkill / episodeFromWorldModel resolve an episode id from the relevant entity (using the latest sourceEpisodeIds).
  • skill_generate / skill_evolve writers stamp the resolved episode id into input_json.episodeId.
  • world_model_generate / world_model_evolve writers do the same, with a fallback to l3TriggerEpisodeId (a small piece of state set when L2 fires policy.induced / policy.updated) for the L3 created/updated/failed paths.
  • policy_generate / policy_evolve writers also stamp the L2 event's episode id.
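An episodeFromPolicy-style helper as described above can be sketched like this, assuming sourceEpisodeIds is an ordered list with the latest contributing episode last. The Policy shape here is illustrative only.

```typescript
// Sketch: resolve an episode id from an entity's sourceEpisodeIds,
// preferring the latest contributing episode.
interface Policy {
  sourceEpisodeIds?: string[];
}

function episodeFromPolicy(policy: Policy): string | undefined {
  const ids = policy.sourceEpisodeIds;
  if (!ids || ids.length === 0) return undefined;
  return ids[ids.length - 1]; // latest contributing episode
}
```

Returning undefined (rather than throwing) when no source episodes exist lets the writer fall back to other state, such as the l3TriggerEpisodeId mentioned above.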

Frontend changes (web/src/views/LogsView.tsx)

  • Embedding heartbeat aggregation: role=embedding system_model_status events fire on every embed call across capture/L2/L3/retrieval and aren't tied to any single episode. They're now grouped into a single synthetic "infrastructure heartbeat" (基础设施心跳) chain (infra:embedding), rendered as a compact card with ok/fallback/error counts, the last provider/model, and the latest error if any. Expanding the card still shows the full per-event timeline.
  • OP-based summary headers: replaced the [摘要模型] / [技能进化模型] ("summary model" / "skill evolution model") role labels in the chain row summaries with the concrete operation name from out.op (e.g. [skill.crystallize], [l3.abstraction.v1], [session.relation.classify]). Doubled phase prefixes from the backend op naming convention are collapsed for readability (l2.l2.induction.v2 → l2.induction.v2, retrieval.retrieval.filter.v3 → retrieval.filter.v3). roleLabel is preserved for the existing detail-panel subtitle, so no copy is broken there.
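The doubled-prefix collapse can be sketched as a small pure function. collapseDoubledPrefix is a hypothetical name; the actual implementation in LogsView.tsx may differ.

```typescript
// Sketch: collapse a doubled leading phase segment in an op name,
// e.g. "l2.l2.induction.v2" -> "l2.induction.v2". Op names without a
// doubled first segment pass through unchanged.
function collapseDoubledPrefix(op: string): string {
  const [first, second, ...rest] = op.split(".");
  if (first !== undefined && first === second) {
    return [first, ...rest].join(".");
  }
  return op;
}
```

Only the first segment pair is considered, so legitimate repeats deeper in the name would be left alone.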

Test plan

  • ruff check adapters/hermes tests/python — passes
  • ruff format --check adapters/hermes tests/python — 5 files already formatted
  • tsc -p tsconfig.json --noEmit — clean
  • tests/unit/session/* — 80/80 passing (intent-classifier 10, session-manager 11, episode-manager 9, relation-classifier 17, heuristics 28, events 5)
  • tests/unit/memory/l2, tests/unit/retrieval, tests/unit/session — 188/189 passing; the single failure is in memory/l2/gain.test.ts (pre-existing floating-point precision drift, reproduces with this change reverted; tracked separately)
  • Deployed via npm pack + install.sh + forkpty hermes restart; viewer bridge.status=connected; Memory provider 'memtensor' registered; new chain view live with embedding heartbeat card and OP-based summaries

Made with Cursor

Stamp episodeId into LLM calls and api_log writers across the pipeline
so `system_model_status` rows are grouped under the triggering episode
in the new chain view, instead of showing up as orphan solo cards.

Backend (episodeId propagation):
- pre-allocate episode id BEFORE intent classifier so its LLM call
  carries the id (session/manager.ts, session/intent-classifier.ts)
- pass prevEpisodeId from orchestrator's three relation-classify call
  sites into the LLM call (session/types.ts, relation-classifier.ts)
- forward triggerEpisodeId to L2 induceDraft and L3 abstractDraft
- derive episodeId from policy/skill/world-model ids in skill_* and
  world_model_* api_log writers (pipeline/memory-core.ts)
- add episodeId to retrieval LLM filter (uses ctx.episodeId from
  retrieve.ts)

Logs viewer (LogsView.tsx):
- group role=embedding system_model_status into a single
  "infrastructure heartbeat" card with ok/fallback/error counts
- replace meaningless [role] tag in event summaries (e.g. "[summary]")
  with the concrete operation name (e.g. "[skill.crystallize]")
- collapse doubled phase prefixes for readability
  (l2.l2.induction.v2 -> l2.induction.v2)

Co-authored-by: Cursor <cursoragent@cursor.com>
@MatthewZhuang MatthewZhuang merged commit 12f9cff into mem-agent-0424 May 5, 2026