Release 1.11.0 · Saiki77/smart-related-notes

v1.11.0: whole-note structure-aware chunking (foundational embedding overhaul)

Notes are no longer truncated to 1500 chars and chunked by blind sentence windows.
The whole note is now split at logical boundaries into coherent idea-chunks, so each
note is represented by an overall embedding plus a set of section-level chunks that
other notes and queries can align with section by section.

splitIntoSections(): parses the raw note into sections at ATX headings (code-fence +
frontmatter aware), carrying a heading breadcrumb. Headings are the primary idea
boundary, paragraphs the secondary one. A window never crosses a section or
paragraph boundary.
splitToBudget(): hard char guard (MAX_CHUNK_CHARS=480 ≈ 120 tokens) that splits any
oversized window at sentence boundaries, then at whitespace, so the model never
silently truncates a chunk (EN + DE). TARGET_WORDS 60->80.
Whole-note coverage: removed embedCharLimit truncation entirely; chunk-count cap
raised 16->48 (adaptive tiers raised); when a note exceeds the cap, every section's
first window is still kept.
Heading context (new setting, default on): the first chunk of each section embeds
with a "Note > H1 > H2:" breadcrumb prefix (embed input only; raw text kept for
snippets). This is the LLM-free contextual-retrieval trick, scoped to the first chunk
of each section to avoid embedding collapse. The window is clamped so prefix+window
stays within the token budget.
INDEX_VERSION 4->5: one-time full re-embed on update. meanVector + biMax are
unchanged (they simply receive better inputs). Both build() and embedFile() embed via
a shared chunkEmbedInput() helper so the full and incremental paths can't diverge.
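
The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function names and MAX_CHUNK_CHARS are borrowed from these notes, but the Section type, every signature, and every function body are assumptions. In particular, the sketch clamps by characters rather than tokens, toggles fences naively, and only treats ". ", "! ", "? " and a plain space as split points.

```typescript
// Hypothetical shape for one section; not taken from the plugin source.
interface Section {
  breadcrumb: string[]; // enclosing heading titles, outermost first
  text: string;         // raw body text under the innermost heading
}

const MAX_CHUNK_CHARS = 480; // value quoted in the release notes

// Split a note at ATX headings, skipping frontmatter and ignoring '#' lines in fences.
function splitIntoSections(raw: string): Section[] {
  const lines = raw.split(/\r?\n/);
  let start = 0;
  // Skip a leading YAML frontmatter block delimited by '---' lines.
  if (lines.length > 0 && lines[0].trim() === "---") {
    for (let j = 1; j < lines.length; j++) {
      if (lines[j].trim() === "---") { start = j + 1; break; }
    }
  }
  const sections: Section[] = [];
  const stack: { level: number; title: string }[] = [];
  let body: string[] = [];
  let inFence = false;
  const flush = (): void => {
    const text = body.join("\n").trim();
    if (text.length > 0) {
      sections.push({ breadcrumb: stack.map((h) => h.title), text });
    }
    body = [];
  };
  for (let i = start; i < lines.length; i++) {
    const line = lines[i];
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence; // naive fence toggle
    const m = inFence ? null : /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (m) {
      flush(); // close the previous section under its own breadcrumb
      const level = m[1].length;
      // Pop headings at the same or a deeper level before pushing the new one.
      while (stack.length > 0 && stack[stack.length - 1].level >= level) stack.pop();
      stack.push({ level, title: m[2] });
    } else {
      body.push(line);
    }
  }
  flush();
  return sections;
}

// Split text into pieces of at most maxChars: sentence end first, then space, then hard cut.
function splitToBudget(text: string, maxChars: number = MAX_CHUNK_CHARS): string[] {
  const out: string[] = [];
  let rest = text.trim();
  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);
    let cut = Math.max(
      head.lastIndexOf(". "), head.lastIndexOf("! "), head.lastIndexOf("? ")
    ) + 1; // keep the punctuation mark with the left-hand piece
    if (cut <= 0) cut = head.lastIndexOf(" ");
    if (cut <= 0) cut = maxChars; // no boundary found: hard cut
    out.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) out.push(rest);
  return out;
}

// Build the embed input: only a section's first chunk gets the breadcrumb prefix,
// and the window is clamped so prefix + window stays within the char budget.
function chunkEmbedInput(
  noteTitle: string, section: Section, window: string, isFirst: boolean
): string {
  if (!isFirst) return window;
  const prefix = [noteTitle, ...section.breadcrumb].join(" > ") + ": ";
  const room = Math.max(0, MAX_CHUNK_CHARS - prefix.length);
  return prefix + window.slice(0, room);
}
```

Because the prefix goes only into the embed input, the raw window can still be stored unchanged for snippets. Routing every caller through the single chunkEmbedInput() function is what would keep a full rebuild and an incremental update from producing different vectors for the same chunk.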

Verified by a research-backed design pass + an adversarial review (fixed a high-sev
incremental-path regression and a prefix-budget truncation before shipping).
