bug: embedding maintenance stats misleading after bulk import — skipped traces still counted as missing vectors #1746

@576272658

Bug Description

After importing ~30k historical messages, the Embedding Maintenance stats show an extremely misleading "missing" vector count (~620k), even though the import process correctly skipped most content and only added ~7,000 effective memories.

Observed numbers:

| Metric | Count |
| --- | --- |
| Traces in DB | 312,185 |
| Effective memories (viewer) | ~7,000 |
| Traces with vectors | ~2,100 |
| "Missing" vectors | ~620,000 |

The ~620k figure is (312,185 - 2,100) × 2 = 620,170: every trace without vectors is counted twice, once for vec_summary and once for vec_action. But ~227,000 of those traces are short content (tool calls, status messages, "let me check..." type text, with user_text < 50 chars AND agent_text < 100 chars) that:

  1. The import process correctly skipped (reported as "skipped" in import stats)
  2. Were never queued into the embedding_retry_queue
  3. Will never be processed by the embedding pipeline
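
To make the arithmetic concrete, here is a small sketch that reproduces the reported number from the figures above and shows what the count would look like with the short traces excluded. The second figure rests on an assumption that is not confirmed anywhere in this issue: that none of the ~227,000 short traces are among the ~2,100 traces that already have vectors. It is an upper bound, not a claim about how many traces really need embedding.

```javascript
// Figures reported in this issue (approximate values, used as-is).
const tracesInDb = 312185;
const tracesWithVectors = 2100;
const shortSkippedTraces = 227000;
const vectorsPerTrace = 2; // vec_summary + vec_action

// How the stats appear to arrive at ~620k.
const reportedMissing = (tracesInDb - tracesWithVectors) * vectorsPerTrace;

// Assumption: no short trace is among the traces that already have vectors.
const missingExcludingShort =
  (tracesInDb - tracesWithVectors - shortSkippedTraces) * vectorsPerTrace;

console.log(reportedMissing);       // 620170
console.log(missingExcludingShort); // 166170
```

Even under that assumption, more than two thirds of the reported "missing" vectors belong to traces that will never be embedded.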

Root Cause

computeEmbeddingMaintenanceStats() in memory-core.js counts all traces with vec_summary IS NULL as "missing", regardless of whether those traces actually need embedding. The import process stores skipped/short traces in the traces table but doesn't embed them — and correctly so. But the stats don't distinguish between "intentionally skipped" and "needs embedding."
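
The difference between the two counting strategies can be sketched as follows. This is not the actual code in memory-core.js, which I have not reproduced here; it is an in-memory illustration. The trace shape (`userText`, `agentText`, `vecSummary`) and all function names are hypothetical, and the thresholds are the ones quoted above (user_text < 50 chars AND agent_text < 100 chars).

```javascript
// Hypothetical trace shape: { userText, agentText, vecSummary }.
// Thresholds taken from the issue text; the real import code may differ.
const USER_TEXT_MIN = 50;
const AGENT_TEXT_MIN = 100;

// A trace is "short" when BOTH sides are below their threshold.
function isShortTrace(trace) {
  return (
    (trace.userText ?? "").length < USER_TEXT_MIN &&
    (trace.agentText ?? "").length < AGENT_TEXT_MIN
  );
}

// Behaviour described in the report: every trace without a vector is "missing".
function countMissingNaive(traces) {
  return traces.filter((t) => t.vecSummary == null).length;
}

// Alternative: only count traces that would actually be embedded.
function countMissingNeedingEmbedding(traces) {
  return traces.filter((t) => t.vecSummary == null && !isShortTrace(t)).length;
}

const sampleTraces = [
  { userText: "ok", agentText: "let me check...", vecSummary: null },
  { userText: "x".repeat(80), agentText: "y".repeat(200), vecSummary: null },
  { userText: "x".repeat(80), agentText: "y".repeat(200), vecSummary: [0.1, 0.2] },
];

console.log(countMissingNaive(sampleTraces));            // 2
console.log(countMissingNeedingEmbedding(sampleTraces)); // 1
```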

How to Reproduce

  1. Import a large historical dataset (e.g. 30k+ messages from a chat export)
  2. Observe import stats: ~7,000 added, ~25,000+ skipped
  3. Check Settings → Embedding Maintenance
  4. See "missing" count of ~620,000 (far exceeding actual memory count)

Suggested Fix

Any of the following would help:

  1. Don't count traces that were intentionally skipped during import — either mark them with a flag (e.g. share_scope = 'skipped') or exclude traces below a content length threshold from the stats.
  2. Show a breakdown in the stats: "X traces skipped/short, Y traces pending embedding" so users understand the gap.
  3. The "Repair missing" button should also skip short/empty traces instead of attempting to embed all 300k+ rows.
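
As a rough illustration of how options 1 to 3 could fit together, the sketch below assumes the import marks skipped traces with a flag and derives both the breakdown and the repair candidates from it. Everything here is hypothetical: the `shareScope === "skipped"` marker simply mirrors the example in option 1, `"private"` is a placeholder for any other value, and the function names do not come from the plugin.

```javascript
// Hypothetical trace shape: { id, shareScope, vecSummary, vecAction }.
// "skipped" mirrors the example flag from option 1; it is not an existing value.
const SKIPPED = "skipped";

// Option 2: a breakdown instead of a single "missing" number.
function embeddingBreakdown(traces) {
  const breakdown = { embedded: 0, skipped: 0, pending: 0 };
  for (const t of traces) {
    if (t.vecSummary != null && t.vecAction != null) {
      breakdown.embedded += 1;
    } else if (t.shareScope === SKIPPED) {
      breakdown.skipped += 1; // intentionally left without vectors
    } else {
      breakdown.pending += 1; // still needs at least one vector
    }
  }
  return breakdown;
}

// Option 3: "Repair missing" only touches traces that are not marked skipped.
function repairCandidates(traces) {
  return traces.filter(
    (t) =>
      t.shareScope !== SKIPPED &&
      (t.vecSummary == null || t.vecAction == null)
  );
}

const exampleTraces = [
  { id: 1, shareScope: "skipped", vecSummary: null, vecAction: null },
  { id: 2, shareScope: "private", vecSummary: null, vecAction: null },
  { id: 3, shareScope: "private", vecSummary: [0.1], vecAction: null },
  { id: 4, shareScope: "private", vecSummary: [0.1], vecAction: [0.2] },
];

console.log(embeddingBreakdown(exampleTraces));
// { embedded: 1, skipped: 1, pending: 2 }
console.log(repairCandidates(exampleTraces).map((t) => t.id));
// [ 2, 3 ]
```

A flag set at import time avoids re-deriving the length thresholds in the stats code, at the cost of needing a backfill for traces imported before the flag existed.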

Environment

  • memos-local-plugin version: 2.0.4
  • OpenClaw version: 2026.5.12
  • OS: macOS (Apple M4)
  • Embedding model: qwen/qwen3-embedding-8b (via OpenRouter)

Labels

  • ai-done: AI task completed successfully
  • bug: Something isn't working
  • plugin: Plugin/adapter/bridge layer (apps/ directory)
