fix(memory): prevent infinite loop in chunk_text when overlap rewinds start #3027

Merged: bug-ops merged 1 commit into main from fix/chunk-text-infinite-loop on Apr 15, 2026

@bug-ops (Owner) commented Apr 15, 2026

Summary

  • chunk_text in zeph-memory had an infinite loop triggered by messages where rfind("\n\n") found a match very early in the 1600-byte sliding window
  • When the match was at offset < 320 bytes, end advanced by fewer bytes than CHUNK_OVERLAP_CHARS; the overlap subtraction produced a new_start behind the current start
  • The old safeguard (if start >= end) did not fire because new_start < start < end, so start regressed and the loop ran forever
  • Fix: start = if new_start > start { new_start } else { end } — always guarantees forward progress
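
The loop described in the bullets above can be sketched roughly as follows. This is a hedged reconstruction, not the zeph-memory source: the function name `chunk_text_sketch`, the helpers `floor_boundary` and `ceil_boundary`, and the constants (a 1600-byte window and a 320-byte overlap, inferred from the figures quoted in this description) are all assumptions made for illustration. Only the final `start = if new_start > start { new_start } else { end }` line is taken directly from the PR.

```rust
// Assumed values, inferred from the PR text ("1600-byte sliding window",
// "offset < 320 bytes"); the real constants may differ.
const WINDOW_BYTES: usize = 1600;
const CHUNK_OVERLAP_CHARS: usize = 320;

/// Hypothetical helper: largest char boundary <= i (clamped to s.len()).
fn floor_boundary(s: &str, mut i: usize) -> usize {
    if i >= s.len() {
        return s.len();
    }
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Hypothetical helper: smallest char boundary >= i (clamped to s.len()).
fn ceil_boundary(s: &str, mut i: usize) -> usize {
    if i >= s.len() {
        return s.len();
    }
    while !s.is_char_boundary(i) {
        i += 1;
    }
    i
}

/// Illustrative sketch of an overlapping chunker with the forward-progress fix.
fn chunk_text_sketch(text: &str) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let window_end = floor_boundary(text, start + WINDOW_BYTES);
        let end = if window_end < text.len() {
            // Prefer to cut just after the last paragraph break in the window.
            match text[start..window_end].rfind("\n\n") {
                Some(pos) => start + pos + 2,
                None => window_end,
            }
        } else {
            window_end
        };
        chunks.push(&text[start..end]);
        if end >= text.len() {
            break;
        }
        // Rewind by the overlap, but never to or behind the current start.
        let new_start = ceil_boundary(text, end.saturating_sub(CHUNK_OVERLAP_CHARS));
        start = if new_start > start { new_start } else { end };
    }
    chunks
}
```

For an input such as `"ab\n\n"` followed by a long run without paragraph breaks, the first window cuts at byte 4, so the rewind target is 0. With the guard, `start` jumps to `end` instead of staying at 0, and the first chunk is just `"ab\n\n"`.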

Root Cause Evidence

macOS sample profiler on the live zeph process (CPU: 1126%, RAM: ~1 GB) showed all 12 tokio worker threads stuck in:

embed_and_store_regular_bg → chunk_text → rfind → StrSearcher::next_match_back

SQLite confirmed 12 unembedded messages, including an 85 KB tool_result (line-numbered Rust source). The embed backfill spawned background tasks for all of them at startup; all entered the infinite loop simultaneously, saturating the thread pool and blocking log flushing — explaining both the CPU/RAM spike and the frozen TUI status ("Connecting tools...").

Test plan

  • cargo nextest run -p zeph-memory --lib — 1114 tests pass
  • cargo +nightly fmt --check — clean
  • cargo clippy -p zeph-memory -- -D warnings — clean
  • Live session: start agent with existing DB containing unembedded messages, confirm startup completes, CPU stays normal, TUI advances past "Connecting tools..."

… start

When rfind("\n\n") found a match near the beginning of the sliding window,
`end` advanced by fewer bytes than CHUNK_OVERLAP_CHARS. The subsequent
`end.saturating_sub(CHUNK_OVERLAP_CHARS)` produced a value less than the
current `start`, and `ceil_char_boundary` returned a position behind it.
The old safeguard (`if start >= end`) did not fire because new_start <
start < end, so `start` regressed and the loop ran forever.

Fix: guarantee forward progress by taking `end` when `new_start <= start`.
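
To make the arithmetic concrete, the two update rules can be isolated as pure functions on byte offsets. This is an illustrative sketch only: `next_start_old` is a guess at the shape of the pre-fix logic based on the description above, the function names are hypothetical, the overlap value of 320 is inferred from the PR text, and char-boundary rounding is omitted for brevity.

```rust
// Assumed overlap, inferred from the "offset < 320 bytes" figure in the PR.
const CHUNK_OVERLAP_CHARS: usize = 320;

/// Reconstructed pre-fix rule (an assumption): rewind by the overlap and only
/// guard against reaching `end`, never against falling behind `start`.
fn next_start_old(_start: usize, end: usize) -> usize {
    let new_start = end.saturating_sub(CHUNK_OVERLAP_CHARS);
    if new_start >= end { end } else { new_start }
}

/// Post-fix rule from the PR: accept the rewind only if it moves forward.
fn next_start_fixed(start: usize, end: usize) -> usize {
    let new_start = end.saturating_sub(CHUNK_OVERLAP_CHARS);
    if new_start > start { new_start } else { end }
}
```

Take `start = 1284` and a paragraph break that puts `end` at 1292, only 8 bytes in. The rewind target is 1292 - 320 = 972, which is behind `start`; the old rule accepts it and the loop regresses, while the fixed rule falls back to `end`.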

This was the root cause of the pegged CPU and ~1 GB RAM consumption at startup: the
embed backfill spawned background tasks for unembedded messages (including
an 85 KB tool_result), all of which entered the infinite loop simultaneously,
saturating all tokio worker threads and blocking log flushing.
@github-actions (bot) added the memory, rust, bug, and size/XS labels Apr 15, 2026
@bug-ops enabled auto-merge (squash) April 15, 2026 00:47
@bug-ops merged commit aa67ed2 into main Apr 15, 2026
30 checks passed
@bug-ops deleted the fix/chunk-text-infinite-loop branch April 15, 2026 00:52