Reward pipeline skips abandoned episodes — 98% of closed episodes never scored #1782

@chiefmojo

Problem

The reward scoring pipeline scores only ~7 episodes during bootstrap and then goes permanently idle, leaving the vast majority of traces unscored. On a 43 MB database with 3,600 traces, only 45 (1.3%) had r_human scores before the fix.

Root Cause

Three interacting bugs in memory-core.ts:

1. episodeRewardIsDirty() excludes abandoned episodes (line ~1274)

The dirty-check condition only matches episodes with closeReason === "finalized" or recoveryReason === "missed_session_end". But 219 of 224 closed episodes had closeReason: "abandoned", so they silently failed the dirty check on every bootstrap scan.

2. No polling fallback after bootstrap

autoRescoreDirtyClosedEpisodes() is called once during init() and never again. After bootstrap completes, the daemon bridge sits permanently idle with no mechanism to retry missed episodes.

3. Two-bridge isolation (contributing factor)

The viewer daemon bridge (--daemon) and JSON-RPC bridge (--no-viewer) are separate processes with separate in-memory event buses. New captures flow through the JSON-RPC bridge's pipeline; the daemon's reward subscriber never sees those events. The daemon only activates its own pipeline during init().
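The isolation can be illustrated with a minimal sketch. It models the two bridges as two independent bus instances inside one process, whereas the real bridges are separate processes. The `InMemoryBus` class, the `episode.closed` event name, and the `ep_example` id are all hypothetical and are not taken from memory-core.ts.

```typescript
// Hypothetical minimal event bus; the real bus implementation is not shown in this issue.
type Handler = (payload: string) => void;

class InMemoryBus {
  private handlers = new Map<string, Handler[]>();

  on(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  emit(event: string, payload: string): void {
    for (const handler of this.handlers.get(event) ?? []) {
      handler(payload);
    }
  }
}

// One bus per bridge, standing in for one bus per process.
const daemonBus = new InMemoryBus();
const rpcBus = new InMemoryBus();

// The daemon's reward subscriber listens only on the daemon's own bus.
const seenByDaemon: string[] = [];
daemonBus.on("episode.closed", (episodeId) => {
  seenByDaemon.push(episodeId);
});

// A new capture is announced on the JSON-RPC bridge's bus.
rpcBus.emit("episode.closed", "ep_example");

// The daemon's subscriber was never invoked, so nothing was recorded.
console.log(seenByDaemon.length);
```

Because nothing crosses between the two buses, the daemon needs some other trigger, such as a periodic scan of the database, to notice episodes it did not observe itself.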

Why exactly 7 episodes got scored

Those 7 were open episodes with traces at init-time, processed by recoverOpenEpisodesAsSessionEnd() as "stale." All 219 closed abandoned episodes were silently excluded.

Fix (two changes, both in memory-core.ts)

1. Add closeReason === "abandoned" to episodeRewardIsDirty()

- (meta.closeReason === "finalized" || meta.recoveryReason === "missed_session_end")
+ (meta.closeReason === "finalized" ||
+   meta.closeReason === "abandoned" ||
+   meta.recoveryReason === "missed_session_end")
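To make the effect of the one-line change concrete, here is a small sketch of the close-reason clause before and after the fix. The `EpisodeMeta` shape and the function names `qualifiesBefore` and `qualifiesAfter` are assumptions for illustration. The real episodeRewardIsDirty() presumably checks more than this clause, and its full body is not shown in this issue.

```typescript
// Hypothetical metadata shape; only the two fields referenced in the diff are modelled.
interface EpisodeMeta {
  closeReason?: string;
  recoveryReason?: string;
}

// Close-reason clause before the fix: abandoned episodes never qualify.
function qualifiesBefore(meta: EpisodeMeta): boolean {
  return (
    meta.closeReason === "finalized" ||
    meta.recoveryReason === "missed_session_end"
  );
}

// Close-reason clause after the fix: abandoned episodes qualify too.
function qualifiesAfter(meta: EpisodeMeta): boolean {
  return (
    meta.closeReason === "finalized" ||
    meta.closeReason === "abandoned" ||
    meta.recoveryReason === "missed_session_end"
  );
}

const abandoned: EpisodeMeta = { closeReason: "abandoned" };
console.log(qualifiesBefore(abandoned)); // false
console.log(qualifiesAfter(abandoned)); // true
```

Episodes that already qualified (finalized, or recovered with missed_session_end) behave the same under both versions, so the change only widens the set.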

2. Add periodic rescore timer in init()

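// Re-run the dirty-episode scan every 10 minutes so that episodes missed
// at bootstrap are eventually scored.
const rescoreInterval = setInterval(() => {
  // Fire and forget: a failed pass is logged and retried on the next tick.
  void autoRescoreDirtyClosedEpisodes().catch((err) => {
    log.debug("periodic_rescore.error", {
      err: err instanceof Error ? err.message : String(err),
    });
  });
}, 10 * 60 * 1000);
// Where the timer handle supports unref() (Node.js), do not let it keep the process alive.
(rescoreInterval as unknown as { unref?: () => void }).unref?.();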

This covers episodes that miss the startup scan: those closed after init, and those whose reward run failed and needs a retry. The 10-minute interval is safe because autoRescoreDirtyClosedEpisodes has its own 30-second dedup guard.
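For readers unfamiliar with the pattern, a time-window dedup guard of the kind mentioned above can be sketched as follows. This is an illustrative assumption, not the guard that autoRescoreDirtyClosedEpisodes actually uses. `makeDedupGuard` and the injectable `now` clock are hypothetical names.

```typescript
// Hypothetical helper: returns a function that answers "may a run start now?".
// A run is allowed only if at least `windowMs` has elapsed since the last allowed run.
function makeDedupGuard(
  windowMs: number,
  now: () => number = Date.now,
): () => boolean {
  let lastRun = Number.NEGATIVE_INFINITY;
  return () => {
    const t = now();
    if (t - lastRun < windowMs) {
      return false; // too soon after the previous run: skip
    }
    lastRun = t;
    return true;
  };
}

// Demonstration with a fake clock so the behaviour is deterministic.
let fakeNow = 0;
const shouldRun = makeDedupGuard(30_000, () => fakeNow);

const first = shouldRun(); // no earlier run, so allowed
fakeNow = 10_000;
const second = shouldRun(); // 10 s after the first run, so skipped
fakeNow = 600_000;
const third = shouldRun(); // 10 min after the first run, so allowed

console.log(first, second, third);
```

With a guard like this, a 10-minute timer tick that lands shortly after another trigger is simply skipped, while ticks spaced 10 minutes apart always pass.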

Results After Fix

Before (3,600 traces):

  • r_human scored: 45 (1.3%)
  • Large episodes completely skipped: ep_0f9jsh492n40 (1,367 traces), ep_x95apvvw7gdx (444 traces), ep_kt2ds1afhssq (222 traces)

Within 3 minutes of restart:

  • r_human: 45 -> 257 (212 freshly scored)
  • ep_kt2ds1afhssq (222 traces): scored with rHuman=1.0 in 1.1 seconds
  • Estimated time to clear the full backlog: 60-90 minutes

Environment

  • Version: v2.0.5
  • DB: 43 MB, WAL mode
  • LLM: qwen/qwen3-235b-a22b-2507 via OpenRouter
  • Config: lightweightMemory.enabled: false
