feat: correlate pipeline LLM calls to episodes in Logs viewer (#1618)
Merged
Stamp episodeId into LLM calls and api_log writers across the pipeline so `system_model_status` rows are grouped under the triggering episode in the new chain view, instead of showing up as orphan solo cards.

Backend (episodeId propagation):
- pre-allocate the episode id BEFORE the intent classifier so its LLM call carries the id (session/manager.ts, session/intent-classifier.ts)
- pass prevEpisodeId from the orchestrator's three relation-classify call sites into the LLM call (session/types.ts, relation-classifier.ts)
- forward triggerEpisodeId to L2 induceDraft and L3 abstractDraft
- derive episodeId from policy/skill/world-model ids in the skill_* and world_model_* api_log writers (pipeline/memory-core.ts)
- add episodeId to the retrieval LLM filter (uses ctx.episodeId from retrieve.ts)

Logs viewer (LogsView.tsx):
- group role=embedding system_model_status events into a single "infrastructure heartbeat" card with ok/fallback/error counts
- replace the meaningless [role] tag in event summaries (e.g. "[summary]") with the concrete operation name (e.g. "[skill.crystallize]")
- collapse doubled phase prefixes for readability (l2.l2.induction.v2 -> l2.induction.v2)

Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Wires `episodeId` through every LLM call and `api_log` writer in the memory pipeline so that `system_model_status` audit rows can be aggregated under the triggering episode in the Logs viewer's chain view, plus a couple of frontend ergonomics fixes for the same view.

Before this change, ~50% of `system_model_status` rows showed up as orphan "solo" cards in the chain view: the LLM call sites for L2 induction, L3 abstraction, skill crystallize/evolve, the retrieval LLM filter, the relation classifier, and the intent classifier had no episode in scope at the moment they ran, so the audit row had no correlation key. After this change those rows are grouped with the rest of their episode's pipeline activity.
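The chain view's grouping rule is simple: rows that share an `episodeId` form one chain, and rows without one fall back to standalone cards. A minimal sketch of that bucketing, with a hypothetical row shape (the real `api_log` schema differs):

```typescript
// Hypothetical row shape for illustration; not the real api_log schema.
interface ApiLogRow {
  id: string;
  op: string;
  episodeId?: string;
}

// Group rows under their episode; rows without an episodeId become
// standalone "solo" entries, which the chain view renders as orphan cards.
function groupByEpisode(rows: ApiLogRow[]): Map<string, ApiLogRow[]> {
  const groups = new Map<string, ApiLogRow[]>();
  for (const row of rows) {
    const key = row.episodeId ?? `solo:${row.id}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(row);
    groups.set(key, bucket);
  }
  return groups;
}
```

Every call site that fails to stamp `episodeId` therefore produces one more orphan card, which is exactly what this PR eliminates.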
Backend changes

episodeId propagation into LLM calls

- `session/intent-classifier.ts` + `session/manager.ts`: pre-allocate the episode id before `intentClassifier.classify` runs, so the resulting `session.intent.classify` LLM call carries the id. `IntentClassifier.classify(text, options?)` gains an optional second parameter; existing single-arg callers (including all unit tests) keep working unchanged.
- `session/relation-classifier.ts` + `session/types.ts` + `pipeline/orchestrator.ts`: extend `RelationInput` with `prevEpisodeId?` and pass it from all three orchestrator call sites (open episode, recovered open, closed-prev). The `relation-classify` and `relation-arbitrate` LLM calls now stamp the previous episode id (semantically: "should we terminate prev?").
- `memory/l2/induce.ts` + `memory/l2/l2.ts`: `InduceInput` gains `triggerEpisodeId?`; `runL2` forwards `input.episodeId` so each L2 induction LLM call is tied to the trace's source episode.
- `memory/l3/abstract.ts` + `memory/l3/l3.ts`: `AbstractInput` gains `episodeId?`; `runL3` derives the trigger episode (the most recent contributing episode in the cluster) and threads it into the L3 abstraction LLM call.
- `skill/crystallize.ts` + `skill/skill.ts` + `skill/types.ts`: `CrystallizeInput`/`SkillCrystallizationDraft` gain `episodeId?`; `runCrystallize` derives the trigger from `policy.sourceEpisodeIds` (last-then-first) and passes it through to the LLM call.
- `retrieval/llm-filter.ts` + `retrieval/retrieve.ts`: `FilterInput` gains `episodeId?`; `retrieve` already extracts `ctx.episodeId`, and we now forward it (with `phase: "retrieve"`) to the filter LLM call.
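The backward-compatible signature change can be illustrated with a sketch. The class and method names mirror `IntentClassifier.classify`, but the body is a stand-in (the real implementation makes an LLM call):

```typescript
interface ClassifyOptions {
  // Correlation key stamped into the call's audit row.
  episodeId?: string;
}

interface ClassifyResult {
  intent: string;
  episodeId?: string;
}

class IntentClassifier {
  // An optional second parameter keeps existing single-arg callers
  // (including unit tests) source-compatible.
  classify(text: string, options?: ClassifyOptions): ClassifyResult {
    // Stand-in for the real LLM call; the point is only that the
    // pre-allocated episode id rides along with the call.
    const intent = text.trim().endsWith("?") ? "question" : "statement";
    return { intent, episodeId: options?.episodeId };
  }
}
```

Because the parameter is optional, no call site has to change until it actually has an episode id to pass.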
api_log writers stamp episodeId for skill/world events

In `pipeline/memory-core.ts`:

- `episodeFromPolicy`/`episodeFromSkill`/`episodeFromWorldModel` resolve an episode id from the relevant entity (using the latest `sourceEpisodeIds`).
- The `skill_generate`/`skill_evolve` writers stamp the resolved episode id into `input_json.episodeId`.
- The `world_model_generate`/`world_model_evolve` writers do the same, with a fallback to `l3TriggerEpisodeId` (a small piece of state set when L2 fires `policy.induced`/`policy.updated`) for the L3 created/updated/failed paths.
- The `policy_generate`/`policy_evolve` writers also stamp the L2 event's episode id.
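The "last-then-first" resolution rule used by these helpers can be sketched as follows; the entity shape is illustrative, not the real policy type:

```typescript
interface PolicyLike {
  // Episodes that contributed to this policy, oldest first.
  sourceEpisodeIds?: string[];
}

// Resolve a correlation episode id from an entity's contributing
// episodes: prefer the most recent (last) id and fall back to the
// first, returning undefined when there is nothing to correlate.
function episodeFromPolicy(policy: PolicyLike): string | undefined {
  const ids = policy.sourceEpisodeIds ?? [];
  if (ids.length === 0) return undefined;
  return ids[ids.length - 1] ?? ids[0];
}
```

Returning `undefined` rather than a sentinel keeps the writer's behavior unchanged for entities with no contributing episodes.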
Frontend changes (`web/src/views/LogsView.tsx`)

- `role=embedding` `system_model_status` events fire on every embed call across capture/L2/L3/retrieval and aren't tied to any single episode. They're now grouped into a single synthetic "infrastructure heartbeat" (基础设施心跳) chain (`infra:embedding`) rendered as a compact card with ok/fallback/error counts, the last provider/model, and the latest error if any. Expanding the card still gives you the full per-event timeline.
- Role labels in the chain row summaries such as `[摘要模型]` ("summary model") and `[技能进化模型]` ("skill evolution model") are replaced with the concrete operation name from `out.op` (e.g. `[skill.crystallize]`, `[l3.abstraction.v1]`, `[session.relation.classify]`). Doubled phase prefixes from the backend op naming convention are collapsed for readability (`l2.l2.induction.v2` → `l2.induction.v2`, `retrieval.retrieval.filter.v3` → `retrieval.filter.v3`). `roleLabel` is preserved for the existing detail-panel subtitle so no copy is broken there.
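The prefix collapse is a small string transform; a sketch consistent with the examples above (the function name is ours, not necessarily the one in `LogsView.tsx`):

```typescript
// Collapse a doubled leading phase segment in an op name, e.g.
// "l2.l2.induction.v2" -> "l2.induction.v2". Ops without the
// duplication pass through unchanged.
function collapseDoubledPhase(op: string): string {
  const parts = op.split(".");
  if (parts.length >= 2 && parts[0] === parts[1]) {
    return parts.slice(1).join(".");
  }
  return op;
}
```

Only the leading segment is checked, so legitimate repeats deeper in the name would survive.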
Test plan

- `ruff check adapters/hermes tests/python`: passes
- `ruff format --check adapters/hermes tests/python`: 5 files already formatted
- `tsc -p tsconfig.json --noEmit`: clean
- `tests/unit/session/*`: 80/80 passing (intent-classifier 10, session-manager 11, episode-manager 9, relation-classifier 17, heuristics 28, events 5)
- `tests/unit/memory/l2`, `tests/unit/retrieval`, `tests/unit/session`: 188/189 passing; the single failure is in `memory/l2/gain.test.ts` (pre-existing floating-point precision drift, reproduces with this change reverted; tracked separately)
- `npm pack` + `install.sh` + forkpty hermes restart; viewer `bridge.status=connected`; `Memory provider 'memtensor' registered`; new chain view live with the embedding heartbeat card and op-based summaries

Made with Cursor