
feat(memos-local-plugin): one-round-one-card UI + language-aware knowledge + L2/L3 boundary prompts#1516

Merged
hijzy merged 1 commit into MemTensor:main from hijzy:feat/memos-local-plugin-turn-grouping
Apr 22, 2026
Conversation

@hijzy (Collaborator) commented Apr 22, 2026

Summary

This PR continues the v2 Reflect2Evolve plugin work merged in #1515 with three orthogonal improvements that landed together because they share the same trace fixtures and tests:

  1. UI: one user turn = one memory card — frontend collapses sub-step rows by (episodeId, turnId). Algorithm layer (V/α backprop, L2 induction, Tier-2 retrieval, Decision Repair) keeps step-level granularity per V7 §0.1.
  2. Knowledge generation in user's language — every L1/L2/L3/Skill/reflection generation site now detects the dominant language of its evidence and emits a languageSteeringLine so a Chinese user no longer gets half-English memos.
  3. L2 / L3 prompts with a hard boundary against drift — L2_INDUCTION_PROMPT and L3_ABSTRACTION_PROMPT bumped v1 → v2 with explicit "what NOT to write" guards plus same-fact-two-framings examples to keep procedural ↔ declarative knowledge cleanly separated.

Plus two infrastructure fixes the v2 plugin needed to actually run on better-sqlite3 ≥ v11 (defensive-mode block on sqlite_master) and a documentation alignment doc explaining the sub-step / turn / task and experience / environment-cognition / skill mental model, so future contributors stop conflating UI, storage, and algorithm granularities.

What changed

traces.turn_id + per-turn UI grouping

  • New migration 013-trace-turn-id.sql: adds turn_id INTEGER + idx_traces_episode_turn index.
  • step-extractor.ts stamps every sub-step from the same user message with the user turn's ts as meta.turnId; capture.ts::pickTurnId threads it into traces.turn_id.
  • MemoriesView.tsx introduces MemoryGroup aggregation + <StepList> drawer so a 5-tool turn renders as one card with five collapsible step blocks (each carrying its own V / α / reflection / toolCalls), instead of five sibling cards. Bulk select / delete / share / export operate at card level.
  • DB rows from before this migration get NULL turn_id and fall back to per-row rendering.
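A minimal sketch of that grouping rule, including the NULL fallback for pre-migration rows. The `TraceRow` / `MemoryGroup` shapes and field names here are illustrative, not the plugin's actual types:

```typescript
// Collapse step-level trace rows into one group per (episodeId, turnId).
// Rows with a NULL turn_id (written before migration 013) fall back to
// per-row rendering: each becomes its own singleton group.
interface TraceRow {
  id: string;
  episodeId: string;
  turnId: number | null; // NULL for pre-migration rows
  v: number;
}

interface MemoryGroup {
  key: string;
  ids: string[];   // member trace ids, used by card-level bulk actions
  meanV: number;   // aggregate V displayed on the card
}

function groupByTurn(rows: TraceRow[]): MemoryGroup[] {
  const groups = new Map<string, TraceRow[]>();
  for (const row of rows) {
    const key = row.turnId === null
      ? `${row.episodeId}:row:${row.id}`      // singleton fallback
      : `${row.episodeId}:turn:${row.turnId}`; // one card per turn
    const bucket = groups.get(key) ?? [];
    bucket.push(row);
    groups.set(key, bucket);
  }
  return [...groups.entries()].map(([key, members]) => ({
    key,
    ids: members.map((m) => m.id),
    meanV: members.reduce((s, m) => s + m.v, 0) / members.length,
  }));
}
```

Because `ids` holds the full member set, a card-level delete or export can operate on all member rows at once, matching the "a card never half-disappears" behavior described above.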

Language-aware knowledge generation

  • core/llm/prompts/index.ts: new detectDominantLanguage(samples, {minSignal}) — counts CJK ideographs vs ASCII letters, returns "zh" | "en" | "auto". Allocation-free, runs on every gen call.
  • All five gen sites inject languageSteeringLine:
    • capture/alpha-scorer.ts — reflection-quality reason
    • capture/batch-scorer.ts — per-step batch reflections
    • memory/l2/induce.ts — L2 policy fields
    • memory/l3/abstract.ts — L3 (ℰ, ℐ, C) bullets
    • skill/crystallize.ts — skill body + scope
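A character-counting detector in this spirit might look like the following sketch. The CJK range, the default threshold, and the exact minSignal semantics are assumptions for illustration, not the shipped implementation:

```typescript
// Dominant-language detection by counting CJK ideographs vs ASCII
// letters across the evidence samples. Returns "auto" when there is
// too little signal either way, leaving the choice to the model.
type Lang = "zh" | "en" | "auto";

function detectDominantLanguage(
  samples: string[],
  { minSignal = 10 }: { minSignal?: number } = {},
): Lang {
  let cjk = 0;
  let ascii = 0;
  for (const s of samples) {
    for (const ch of s) {
      const cp = ch.codePointAt(0)!;
      if (cp >= 0x4e00 && cp <= 0x9fff) cjk++; // CJK Unified Ideographs block
      else if (/[A-Za-z]/.test(ch)) ascii++;   // ASCII letters only
    }
  }
  if (cjk + ascii < minSignal) return "auto"; // not enough evidence
  return cjk >= ascii ? "zh" : "en";
}
```

The result would then key a `languageSteeringLine` such as "Answer in Chinese." prepended to each generation prompt.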

L2 / L3 boundary prompts (v1 → v2)

  • L2_INDUCTION_PROMPT: new "Boundaries — what NOT to write" section explicitly rejects environment topology / declarative behavioural rules / generic taboos. Includes same-fact-two-framings example (procedural vs declarative for the same underlying truth).
  • L3_ABSTRACTION_PROMPT: bans imperative verbs (do / should / use / install / run) under any of ℰ/ℐ/C. All three example sets rewritten as pure declarative ("loading a glibc-linked binary wheel inside Alpine raises a dynamic-link error" instead of "if pip fails, install dev libs and retry").
  • Test mock keys updated v1 → v2; historical inducedBy audit strings intentionally left at v1 (they record the prompt version a row was generated under, not a call-time match key).

Retrieval injector heading hierarchy

  • # User's conversation history (from memory system) is H1; ## Memories / ## Skills / ## Environment Knowledge are H2 — restores the visual outline the LLM consumes.
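That outline can be sketched as a small builder; this is a hypothetical helper showing the intended H1/H2 shape, not the injector's real code:

```typescript
// Build the injected context block: a single H1 wrapper with each
// memory section demoted to H2, so the LLM sees one clean outline.
function buildInjectedBlock(
  sections: { title: string; body: string }[],
): string {
  const lines = ["# User's conversation history (from memory system)"];
  for (const s of sections) {
    lines.push("", `## ${s.title}`, s.body); // H2, never a second H1
  }
  return lines.join("\n");
}
```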

Migration runner: better-sqlite3 ≥ v11 compatibility

  • runMigrations now flips db.raw.unsafeMode(true) at the outer boundary if any pending migration uses PRAGMA writable_schema (resets in finally). Migration 012 (status unification) needs this to swap CHECK constraints in-place; defensive mode otherwise blocked it at runtime.
  • Migration 012 SQL uses single-quote literals with doubled inner quotes (was double-quoted, which strict mode treats as identifiers).
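A minimal sketch of that guard, assuming a thin wrapper interface over better-sqlite3 (the PR's runner calls `db.raw.unsafeMode`; the `Db` interface here is a stand-in, not the plugin's real type):

```typescript
// Enable unsafe mode only when a pending migration needs
// PRAGMA writable_schema, and always restore defensive mode in finally.
interface Db {
  unsafeMode(on: boolean): void;
  exec(sql: string): void;
}

function runMigrations(db: Db, pending: string[]): void {
  const needsUnsafe = pending.some((sql) => /writable_schema/i.test(sql));
  if (needsUnsafe) db.unsafeMode(true); // lifts the sqlite_master write block
  try {
    // Migration SQL should use single-quote literals with doubled inner
    // quotes ('it''s'), since strict mode treats "double quotes" as
    // identifiers rather than strings.
    for (const sql of pending) db.exec(sql);
  } finally {
    if (needsUnsafe) db.unsafeMode(false); // restore defensive mode
  }
}
```

Because the flag only flips when a shipped migration actually uses the pragma, normal migrations (like 013's plain `ALTER TABLE`) never leave defensive mode.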

Documentation

  • New docs/GRANULARITY-AND-MEMORY-LAYERS.md (~365 lines, zh-CN) — the foundational mental-model doc that should be read before any other algorithm doc:
    • the three interaction granularities (sub-step / turn / task) and how each maps to a code layer
    • scoring granularity (per-step α/V, per-task R_human; a "turn" carries no independent score)
    • retrieval granularity (skill / single step / sub-task sequence / environment cognition; no "per-turn" recall, plus the three-layer discriminator)
    • a decision table for "structural uncertainty" vs "operational doubt"
    • how experience, environment cognition, and skills relate to one another
    • §6 "experience vs environment-cognition boundary trimming": 7 reasons against merging, a comparison of three compromise options, a 7-dimension quick-reference discriminator, a same-fact multi-framing contrast table, and counterexamples
  • docs/Reflect2Skill_算法设计核心.md gains a reading-order note at the top.
  • docs/README.md index updated to match.

Algorithm alignment

Per V7 §0.1, the L1 trace is the minimum learning unit and stays step-level — one tool call → one trace, one final reply → one trace. The "one round = one memory" view is purely a frontend display concern using turn_id as a stable group key. Reflection-weighted backprop, cross-task L2 association, error-signature retrieval, and Decision Repair all continue to operate per-step. Documented end-to-end in the new GRANULARITY doc §6.

Test plan

  • npx vitest run tests/unit/capture/step-extractor.test.ts — turnId stamped on every sub-step, multi-tool turn shares one turnId (11/11 pass)
  • npx vitest run tests/unit/memory/l2/ tests/unit/memory/l3/ tests/unit/llm/prompts.test.ts — prompt v2 mock keys + L2/L3 induction (74/74 pass)
  • npx vitest run tests/unit/storage/ — migration 013 applies cleanly (106/106 pass)
  • npx vitest run tests/unit/ — full unit sweep: 802/806 pass; 4 failures are pre-existing on main (mock LLM behavior in reward integration + an outdated capture.lite.done event-list assertion), unchanged by this PR
  • Local install via bash install.sh --version ./memtensor-memos-local-plugin-2.0.0-beta.1.tgz: gateway + viewer come up clean, traces.turn_id column present, migration 013 logged as applied
  • Manual end-to-end: ran a 3-tool query in OpenClaw, verified the memory page shows ONE card with a 工具 · 4 步 ("tool · 4 steps") chip, and the drawer expands into 4 collapsible step sections with per-step V/α/thinking/tool I-O

Notes

  • No backward compat for the schema change is required — fresh installs run all 13 migrations on first open. Existing local DBs auto-pick up 013 the next time the gateway opens them.
  • Only apps/memos-local-plugin/ is touched. No changes to other packages.

…ledge + L2/L3 boundary prompts

UI: one user turn = one memory card
- New `traces.turn_id INTEGER` column (migration 013) stamped by
  `step-extractor` with the user turn's ts; every sub-step of the same
  user message shares the same turnId.
- `MemoryGroup` aggregation in `web/src/views/MemoriesView.tsx` collapses
  rows by (episodeId, turnId): one card per turn, role pill chosen by
  group-level rule (any tool → "tool"), aggregate V/α displayed as the
  member-row mean.
- Drawer rewritten as `<StepList>`: every member step renders as a
  collapsible <details> block with its own ts / V / α / agentThinking /
  toolCalls / reflection. First step expanded, rest collapsed so a
  10-tool turn doesn't drown the user.
- Bulk actions (select / delete / share / export) operate on whole
  cards: card checkbox toggles the full set of member ids; delete /
  share / export bulk over `g.ids` so a card never half-disappears.
- Algorithm layer untouched — every L1 trace stays step-level so V/α
  reflection-weighted backprop, L2 incremental association, Tier-2
  error-signature retrieval, and Decision Repair keep their per-step
  granularity (V7 §0.1).

Per-tool reasoning capture (carryover, see PR MemTensor#1515)
- ToolCallDTO carries `value` / `reflection` / `thinkingBefore` so the
  drawer's per-step section can show the per-tool intermediate
  thinking and any LLM-assigned per-tool score without a schema change.
- StepCandidate.meta.turnId / subStep / subStepIdx / subStepTotal
  threaded through capture.ts → traces.turn_id; `pickTurnId` falls
  back to the trace's own ts so old fixtures still produce singleton
  groups instead of crashing.
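A minimal sketch of that fallback; the `StepCandidate` shape is illustrative:

```typescript
// Prefer the turnId stamped by step-extractor; fall back to the trace's
// own ts so old fixtures (no meta.turnId) form singleton groups.
interface StepCandidate {
  ts: number;
  meta?: { turnId?: number };
}

function pickTurnId(step: StepCandidate): number {
  return step.meta?.turnId ?? step.ts;
}
```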

Knowledge generation in user's language
- `core/llm/prompts/index.ts` adds `detectDominantLanguage(samples,
  {minSignal})` — counts CJK ideographs + ASCII letters and returns
  "zh" / "en" / "auto" (allocation-free, runs on every gen call).
- All five knowledge-generation sites now emit a `languageSteeringLine`
  system message keyed off their evidence:
    * core/capture/alpha-scorer.ts          ← reflection-quality reason
    * core/capture/batch-scorer.ts          ← per-step batch reflections
    * core/memory/l2/induce.ts              ← L2 policy fields
    * core/memory/l3/abstract.ts            ← L3 (ℰ, ℐ, C) bullets
    * core/skill/crystallize.ts             ← skill body + scope
- Effect: a Chinese-speaking user no longer gets a half-English skill
  card, and an English-speaking user no longer gets a Chinese-mixed
  reflection.

L2 / L3 prompts: hard boundary against drift
- `L2_INDUCTION_PROMPT` v1 → v2: explicit "what NOT to write" guard
  rejects environment topology, declarative behavioural rules, and
  generic taboos. New same-fact-two-framings example shows how to
  re-fold an env fact into a state-level trigger or step-level caveat.
- `L3_ABSTRACTION_PROMPT` v1 → v2: bans imperative verbs (do/should/use/
  install/run) under any of ℰ/ℐ/C; reworked all three example sets to
  pure declarative ("loading a glibc-linked binary wheel inside Alpine
  raises a dynamic-link error" instead of "if pip fails, install dev
  libs and retry"). Same-fact contrast example included.
- Test mock keys updated v1 → v2 in induce.test.ts /
  l2.integration.test.ts / openclaw-full-chain.test.ts /
  v7-full-chain.e2e.test.ts. Historical `inducedBy` audit strings
  intentionally left at v1 — they're metadata recording the prompt
  version a row was generated under, not call-time keys.

Retrieval injector: heading hierarchy
- `# User's conversation history (from memory system)` is now H1, with
  `## Memories` / `## Skills` / `## Environment Knowledge` as H2 so the
  injected block has a clean outline in the LLM's context (previously
  the inner sections used H1 too, breaking the visual hierarchy).

Migration runner: SQLite defensive mode
- better-sqlite3 ≥ v11 enables `SQLITE_DBCONFIG_DEFENSIVE` which blocks
  writes to `sqlite_master` even with `PRAGMA writable_schema=ON`.
  Migration 012 (status unification) needs that pragma to swap CHECK
  constraints in-place. `runMigrations` now flips `db.raw.unsafeMode`
  on at the outer boundary if any pending migration uses
  `writable_schema`, then off again in `finally`. Migrations are
  shipped with the plugin (never user input) so this is safe.
- Migration 012 SQL itself rewritten to use single-quote string
  literals with doubled inner quotes (instead of double quotes that
  better-sqlite3 strict mode treats as identifiers).

Documentation
- New `docs/GRANULARITY-AND-MEMORY-LAYERS.md` — mental-model alignment
  doc explaining: how the three granularities (sub-step / turn / task)
  relate; scoring granularity (per-step α/V, per-task R_human; a "turn"
  carries no independent score); retrieval granularity (skill / single
  step / sub-task sequence / environment cognition; no "per-turn"
  recall); the generation chain (sub-step → experience → environment
  cognition → skill); and a §6 "experience vs environment-cognition
  boundary trimming" section answering the "should they be merged"
  question: 7 reasons against merging, a comparison of three
  compromise options, and a same-fact multi-framing contrast table.
- `docs/Reflect2Skill_算法设计核心.md` gains a reading-order note at
  the top, pointing newcomers to the granularity-alignment doc above
  first.
- `docs/README.md` index updated to match, with
  GRANULARITY-AND-MEMORY-LAYERS bolded.

Tests
- `tests/unit/capture/step-extractor.test.ts`: turnId stability
  assertions across sub-steps; multi-tool turn shares one turnId.
- All other test fixtures' LLM mock keys synchronized with new prompt
  versions; non-mock `inducedBy` audit fields kept at v1 by design.
@hijzy hijzy merged commit cddc252 into MemTensor:main Apr 22, 2026
16 checks passed