
feat(memos-local-plugin): one-round-one-card UI + language-aware knowledge + L2/L3 boundary prompts#1516

Merged
hijzy merged 1 commit into MemTensor:main from hijzy:feat/memos-local-plugin-turn-grouping
Apr 22, 2026
Conversation

@hijzy (Collaborator) commented Apr 22, 2026

Summary

This PR continues the v2 Reflect2Evolve plugin work merged in #1515 with three orthogonal improvements that landed together because they share the same trace fixtures and tests:

  1. UI: one user turn = one memory card — frontend collapses sub-step rows by (episodeId, turnId). Algorithm layer (V/α backprop, L2 induction, Tier-2 retrieval, Decision Repair) keeps step-level granularity per V7 §0.1.
  2. Knowledge generation in user's language — every L1/L2/L3/Skill/reflection generation site now detects the dominant language of its evidence and emits a languageSteeringLine so a Chinese user no longer gets half-English memos.
  3. L2 / L3 prompts with a hard boundary against drift — L2_INDUCTION_PROMPT and L3_ABSTRACTION_PROMPT bumped v1 → v2 with explicit "what NOT to write" guards plus same-fact-two-framings examples to keep procedural ↔ declarative knowledge cleanly separated.

Plus two infrastructure fixes the v2 plugin needed to actually run on better-sqlite3 ≥ v11 (defensive-mode block on sqlite_master) and a documentation alignment doc explaining the sub-step / turn / task and experience / environment-cognition / skill mental model, so future contributors stop conflating UI, storage, and algorithm granularities.

What changed

traces.turn_id + per-turn UI grouping

  • New migration 013-trace-turn-id.sql: adds turn_id INTEGER + idx_traces_episode_turn index.
  • step-extractor.ts stamps every sub-step from the same user message with the user turn's ts as meta.turnId; capture.ts::pickTurnId threads it into traces.turn_id.
  • MemoriesView.tsx introduces MemoryGroup aggregation + <StepList> drawer so a 5-tool turn renders as one card with five collapsible step blocks (each carrying its own V / α / reflection / toolCalls), instead of five sibling cards. Bulk select / delete / share / export operate at card level.
  • DB rows from before this migration get NULL turn_id and fall back to per-row rendering.
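A minimal sketch of that grouping rule, including the NULL fallback for pre-migration rows. The `TraceRow` / `MemoryGroup` shapes and field names here are illustrative, not the plugin's actual types:

```typescript
// Collapse step-level trace rows into one group per (episodeId, turnId).
// Rows with a NULL turn_id (written before migration 013) fall back to
// per-row rendering: each becomes its own singleton group.
interface TraceRow {
  id: string;
  episodeId: string;
  turnId: number | null; // NULL for pre-migration rows
  v: number;
}

interface MemoryGroup {
  key: string;
  ids: string[];   // member trace ids, used by card-level bulk actions
  meanV: number;   // aggregate V displayed on the card
}

function groupByTurn(rows: TraceRow[]): MemoryGroup[] {
  const groups = new Map<string, TraceRow[]>();
  for (const row of rows) {
    const key = row.turnId === null
      ? `${row.episodeId}:row:${row.id}`      // singleton fallback
      : `${row.episodeId}:turn:${row.turnId}`; // one card per turn
    const bucket = groups.get(key) ?? [];
    bucket.push(row);
    groups.set(key, bucket);
  }
  return [...groups.entries()].map(([key, members]) => ({
    key,
    ids: members.map((m) => m.id),
    meanV: members.reduce((s, m) => s + m.v, 0) / members.length,
  }));
}
```

Because `ids` holds the full member set, a card-level delete or export can operate on all member rows at once, matching the "a card never half-disappears" behavior described above.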

Language-aware knowledge generation

  • core/llm/prompts/index.ts: new detectDominantLanguage(samples, {minSignal}) — counts CJK ideographs vs ASCII letters, returns "zh" | "en" | "auto". Allocation-free, runs on every gen call.
  • All five gen sites inject languageSteeringLine:
    • capture/alpha-scorer.ts — reflection-quality reason
    • capture/batch-scorer.ts — per-step batch reflections
    • memory/l2/induce.ts — L2 policy fields
    • memory/l3/abstract.ts — L3 (ℰ, ℐ, C) bullets
    • skill/crystallize.ts — skill body + scope
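A character-counting detector in this spirit might look like the following sketch. The CJK range, the default threshold, and the exact minSignal semantics are assumptions for illustration, not the shipped implementation:

```typescript
// Dominant-language detection by counting CJK ideographs vs ASCII
// letters across the evidence samples. Returns "auto" when there is
// too little signal either way, leaving the choice to the model.
type Lang = "zh" | "en" | "auto";

function detectDominantLanguage(
  samples: string[],
  { minSignal = 10 }: { minSignal?: number } = {},
): Lang {
  let cjk = 0;
  let ascii = 0;
  for (const s of samples) {
    for (const ch of s) {
      const cp = ch.codePointAt(0)!;
      if (cp >= 0x4e00 && cp <= 0x9fff) cjk++; // CJK Unified Ideographs block
      else if (/[A-Za-z]/.test(ch)) ascii++;   // ASCII letters only
    }
  }
  if (cjk + ascii < minSignal) return "auto"; // not enough evidence
  return cjk >= ascii ? "zh" : "en";
}
```

The result would then key a `languageSteeringLine` such as "Answer in Chinese." prepended to each generation prompt.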

L2 / L3 boundary prompts (v1 → v2)

  • L2_INDUCTION_PROMPT: new "Boundaries — what NOT to write" section explicitly rejects environment topology / declarative behavioural rules / generic taboos. Includes same-fact-two-framings example (procedural vs declarative for the same underlying truth).
  • L3_ABSTRACTION_PROMPT: bans imperative verbs (do / should / use / install / run) under any of ℰ/ℐ/C. All three example sets rewritten as pure declarative ("loading a glibc-linked binary wheel inside Alpine raises a dynamic-link error" instead of "if pip fails, install dev libs and retry").
  • Test mock keys updated v1 → v2; historical inducedBy audit strings intentionally left at v1 (they record the prompt version a row was generated under, not a call-time match key).

Retrieval injector heading hierarchy

  • # User's conversation history (from memory system) is H1; ## Memories / ## Skills / ## Environment Knowledge are H2 — restores the visual outline the LLM consumes.
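That outline can be sketched as a small builder; this is a hypothetical helper showing the intended H1/H2 shape, not the injector's real code:

```typescript
// Build the injected context block: a single H1 wrapper with each
// memory section demoted to H2, so the LLM sees one clean outline.
function buildInjectedBlock(
  sections: { title: string; body: string }[],
): string {
  const lines = ["# User's conversation history (from memory system)"];
  for (const s of sections) {
    lines.push("", `## ${s.title}`, s.body); // H2, never a second H1
  }
  return lines.join("\n");
}
```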

Migration runner: better-sqlite3 ≥ v11 compatibility

  • runMigrations now flips db.raw.unsafeMode(true) at the outer boundary if any pending migration uses PRAGMA writable_schema (resets in finally). Migration 012 (status unification) needs this to swap CHECK constraints in-place; defensive mode otherwise blocked it at runtime.
  • Migration 012 SQL uses single-quote literals with doubled inner quotes (was double-quoted, which strict mode treats as identifiers).
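A minimal sketch of that guard, assuming a thin wrapper interface over better-sqlite3 (the PR's runner calls `db.raw.unsafeMode`; the `Db` interface here is a stand-in, not the plugin's real type):

```typescript
// Enable unsafe mode only when a pending migration needs
// PRAGMA writable_schema, and always restore defensive mode in finally.
interface Db {
  unsafeMode(on: boolean): void;
  exec(sql: string): void;
}

function runMigrations(db: Db, pending: string[]): void {
  const needsUnsafe = pending.some((sql) => /writable_schema/i.test(sql));
  if (needsUnsafe) db.unsafeMode(true); // lifts the sqlite_master write block
  try {
    // Migration SQL should use single-quote literals with doubled inner
    // quotes ('it''s'), since strict mode treats "double quotes" as
    // identifiers rather than strings.
    for (const sql of pending) db.exec(sql);
  } finally {
    if (needsUnsafe) db.unsafeMode(false); // restore defensive mode
  }
}
```

Because the flag only flips when a shipped migration actually uses the pragma, normal migrations (like 013's plain `ALTER TABLE`) never leave defensive mode.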

Documentation

  • New docs/GRANULARITY-AND-MEMORY-LAYERS.md (~365 lines, zh-CN) — the foundational mental-model doc that should be read before any other algorithm doc:
    • the three interaction granularities (sub-step / turn / task) and how each maps to a code layer
    • scoring granularity (per-step α/V, per-task R_human; a "turn" carries no independent score)
    • retrieval granularity (skill / single step / sub-task sequence / environment cognition; no "per-turn" recall, plus the three-layer discriminator)
    • a decision table for "structural uncertainty" vs "operational doubt"
    • how experience, environment cognition, and skills relate to one another
    • §6 "experience vs environment-cognition boundary trimming": 7 reasons against merging, a comparison of three compromise options, a 7-dimension quick-reference discriminator, a same-fact multi-framing contrast table, and counterexamples
  • docs/Reflect2Skill_算法设计核心.md gains a reading-order note at the top.
  • docs/README.md index updated to match.

Algorithm alignment

Per V7 §0.1, the L1 trace is the minimum learning unit and stays step-level — one tool call → one trace, one final reply → one trace. The "one round = one memory" view is purely a frontend display concern using turn_id as a stable group key. Reflection-weighted backprop, cross-task L2 association, error-signature retrieval, and Decision Repair all continue to operate per-step. Documented end-to-end in the new GRANULARITY doc §6.

Test plan

  • npx vitest run tests/unit/capture/step-extractor.test.ts — turnId stamped on every sub-step, multi-tool turn shares one turnId (11/11 pass)
  • npx vitest run tests/unit/memory/l2/ tests/unit/memory/l3/ tests/unit/llm/prompts.test.ts — prompt v2 mock keys + L2/L3 induction (74/74 pass)
  • npx vitest run tests/unit/storage/ — migration 013 applies cleanly (106/106 pass)
  • npx vitest run tests/unit/ — full unit sweep: 802/806 pass; 4 failures are pre-existing on main (mock LLM behavior in reward integration + an outdated capture.lite.done event-list assertion), unchanged by this PR
  • Local install via bash install.sh --version ./memtensor-memos-local-plugin-2.0.0-beta.1.tgz: gateway + viewer come up clean, traces.turn_id column present, migration 013 logged as applied
  • Manual end-to-end: ran a 3-tool query in OpenClaw, verified the memory page shows ONE card with a 工具 · 4 步 ("tool · 4 steps") chip, and the drawer expands into 4 collapsible step sections with per-step V/α/thinking/tool I-O

Notes

  • No backward compat for the schema change is required — fresh installs run all 13 migrations on first open. Existing local DBs auto-pick up 013 the next time the gateway opens them.
  • Only apps/memos-local-plugin/ is touched. No changes to other packages.

…ledge + L2/L3 boundary prompts

UI: one user turn = one memory card
- New `traces.turn_id INTEGER` column (migration 013) stamped by
  `step-extractor` with the user turn's ts; every sub-step of the same
  user message shares the same turnId.
- `MemoryGroup` aggregation in `web/src/views/MemoriesView.tsx` collapses
  rows by (episodeId, turnId): one card per turn, role pill chosen by
  group-level rule (any tool → "tool"), aggregate V/α displayed as the
  member-row mean.
- Drawer rewritten as `<StepList>`: every member step renders as a
  collapsible <details> block with its own ts / V / α / agentThinking /
  toolCalls / reflection. First step expanded, rest collapsed so a
  10-tool turn doesn't drown the user.
- Bulk actions (select / delete / share / export) operate on whole
  cards: card checkbox toggles the full set of member ids; delete /
  share / export bulk over `g.ids` so a card never half-disappears.
- Algorithm layer untouched — every L1 trace stays step-level so V/α
  reflection-weighted backprop, L2 incremental association, Tier-2
  error-signature retrieval, and Decision Repair keep their per-step
  granularity (V7 §0.1).

Per-tool reasoning capture (carryover, see PR MemTensor#1515)
- ToolCallDTO carries `value` / `reflection` / `thinkingBefore` so the
  drawer's per-step section can show the per-tool intermediate
  thinking and any LLM-assigned per-tool score without a schema change.
- StepCandidate.meta.turnId / subStep / subStepIdx / subStepTotal
  threaded through capture.ts → traces.turn_id; `pickTurnId` falls
  back to the trace's own ts so old fixtures still produce singleton
  groups instead of crashing.
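A minimal sketch of that fallback; the `StepCandidate` shape is illustrative:

```typescript
// Prefer the turnId stamped by step-extractor; fall back to the trace's
// own ts so old fixtures (no meta.turnId) form singleton groups.
interface StepCandidate {
  ts: number;
  meta?: { turnId?: number };
}

function pickTurnId(step: StepCandidate): number {
  return step.meta?.turnId ?? step.ts;
}
```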

Knowledge generation in user's language
- `core/llm/prompts/index.ts` adds `detectDominantLanguage(samples,
  {minSignal})` — counts CJK ideographs + ASCII letters and returns
  "zh" / "en" / "auto" (allocation-free, runs on every gen call).
- All five knowledge-generation sites now emit a `languageSteeringLine`
  system message keyed off their evidence:
    * core/capture/alpha-scorer.ts          ← reflection-quality reason
    * core/capture/batch-scorer.ts          ← per-step batch reflections
    * core/memory/l2/induce.ts              ← L2 policy fields
    * core/memory/l3/abstract.ts            ← L3 (ℰ, ℐ, C) bullets
    * core/skill/crystallize.ts             ← skill body + scope
- Effect: a Chinese-speaking user no longer gets a half-English skill
  card, and an English-speaking user no longer gets a Chinese-mixed
  reflection.

L2 / L3 prompts: hard boundary against drift
- `L2_INDUCTION_PROMPT` v1 → v2: explicit "what NOT to write" guard
  rejects environment topology, declarative behavioural rules, and
  generic taboos. New same-fact-two-framings example shows how to
  re-fold an env fact into a state-level trigger or step-level caveat.
- `L3_ABSTRACTION_PROMPT` v1 → v2: bans imperative verbs (do/should/use/
  install/run) under any of ℰ/ℐ/C; reworked all three example sets to
  pure declarative ("loading a glibc-linked binary wheel inside Alpine
  raises a dynamic-link error" instead of "if pip fails, install dev
  libs and retry"). Same-fact contrast example included.
- Test mock keys updated v1 → v2 in induce.test.ts /
  l2.integration.test.ts / openclaw-full-chain.test.ts /
  v7-full-chain.e2e.test.ts. Historical `inducedBy` audit strings
  intentionally left at v1 — they're metadata recording the prompt
  version a row was generated under, not call-time keys.

Retrieval injector: heading hierarchy
- `# User's conversation history (from memory system)` is now H1, with
  `## Memories` / `## Skills` / `## Environment Knowledge` as H2 so the
  injected block has a clean outline in the LLM's context (previously
  the inner sections used H1 too, breaking the visual hierarchy).

Migration runner: SQLite defensive mode
- better-sqlite3 ≥ v11 enables `SQLITE_DBCONFIG_DEFENSIVE` which blocks
  writes to `sqlite_master` even with `PRAGMA writable_schema=ON`.
  Migration 012 (status unification) needs that pragma to swap CHECK
  constraints in-place. `runMigrations` now flips `db.raw.unsafeMode`
  on at the outer boundary if any pending migration uses
  `writable_schema`, then off again in `finally`. Migrations are
  shipped with the plugin (never user input) so this is safe.
- Migration 012 SQL itself rewritten to use single-quote string
  literals with doubled inner quotes (instead of double quotes that
  better-sqlite3 strict mode treats as identifiers).

Documentation
- New `docs/GRANULARITY-AND-MEMORY-LAYERS.md` — mental-model alignment
  doc explaining: how the three granularities (sub-step / turn / task)
  relate; scoring granularity (per-step α/V, per-task R_human; a "turn"
  carries no independent score); retrieval granularity (skill / single
  step / sub-task sequence / environment cognition; no "per-turn"
  recall); the generation chain (sub-step → experience → environment
  cognition → skill); and a §6 "experience vs environment-cognition
  boundary trimming" section answering the "should they be merged"
  question: 7 reasons against merging, a comparison of three
  compromise options, and a same-fact multi-framing contrast table.
- `docs/Reflect2Skill_算法设计核心.md` gains a reading-order note at
  the top, pointing newcomers to the granularity-alignment doc above
  first.
- `docs/README.md` index updated to match, with
  GRANULARITY-AND-MEMORY-LAYERS bolded.

Tests
- `tests/unit/capture/step-extractor.test.ts`: turnId stability
  assertions across sub-steps; multi-tool turn shares one turnId.
- All other test fixtures' LLM mock keys synchronized with new prompt
  versions; non-mock `inducedBy` audit fields kept at v1 by design.
@hijzy hijzy merged commit cddc252 into MemTensor:main Apr 22, 2026
16 checks passed