Tag Index Engine #279

@ElioNeto

Build a tag indexing system that extracts #tags from note content and maintains a searchable index backed by the LSM engine's prefix scan.

Storage Design

Use the "tag" column family, with every key under the tag: prefix so that lookups are prefix scans:

cf "tag":
  tag:{tagname}  -> JSON array of note paths that contain this tag
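
A minimal sketch of the key layout above, assuming keys are plain UTF-8 strings. The helper names `tag_key` and `tag_from_key` are hypothetical and not part of the existing engine API; encoding the JSON array value is left out.

```rust
/// Key prefix shared by every entry in the tag index (taken from the layout above).
const TAG_PREFIX: &str = "tag:";

/// Build the storage key for a tag, e.g. "project/alpha" becomes "tag:project/alpha".
/// Hypothetical helper, for illustration only.
fn tag_key(tag: &str) -> String {
    format!("{}{}", TAG_PREFIX, tag)
}

/// Recover the tag name from a storage key.
/// Returns None for keys that do not belong to the tag index.
fn tag_from_key(key: &str) -> Option<&str> {
    key.strip_prefix(TAG_PREFIX)
}
```

Keeping the prefix in one constant means the write path and the prefix scan cannot drift apart.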

Components

TagIndex::index_tags(note_path, tags: Vec<String>)

  • Diff old vs new tags for the note
  • Add note path to tag:{tag} for new tags
  • Remove note path from tag:{tag} for removed tags
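
The diff step could look roughly like this. It is a sketch only: `TagDiff` and `diff_tags` are hypothetical names, and the old tag list is assumed to come from wherever the note's previous tags are stored.

```rust
use std::collections::BTreeSet;

/// Tags to add to and remove from the index for a single note.
#[derive(Debug, PartialEq)]
struct TagDiff {
    added: Vec<String>,
    removed: Vec<String>,
}

/// Compare the previously indexed tags with the freshly parsed ones.
/// Both output lists are sorted and free of duplicates.
fn diff_tags(old: &[String], new: &[String]) -> TagDiff {
    let old: BTreeSet<&String> = old.iter().collect();
    let new: BTreeSet<&String> = new.iter().collect();
    TagDiff {
        // Present now, absent before: the note path must be added to these tags.
        added: new.difference(&old).map(|t| t.to_string()).collect(),
        // Present before, absent now: the note path must be removed from these tags.
        removed: old.difference(&new).map(|t| t.to_string()).collect(),
    }
}
```

Tags that appear in both lists produce no index writes, which keeps re-saving an unchanged note cheap.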

TagIndex::remove_note_tags(note_path)

  • Get all tags for the note from metadata
  • Remove note path from each tag:{tag}
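
As an illustration of the removal path, here is a sketch against an in-memory stand-in for the column family. `MemTagIndex` is hypothetical, the note's tags are passed in rather than read from metadata, and dropping a tag entry once its path set is empty is an assumption the issue does not state.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// In-memory stand-in for the "tag" column family: key -> set of note paths.
#[derive(Default)]
struct MemTagIndex {
    entries: BTreeMap<String, BTreeSet<String>>,
}

impl MemTagIndex {
    /// Record that `note_path` contains `tag`.
    fn add(&mut self, tag: &str, note_path: &str) {
        self.entries
            .entry(format!("tag:{}", tag))
            .or_default()
            .insert(note_path.to_string());
    }

    /// Remove `note_path` from every tag in `tags`.
    /// Assumption: an entry whose path set becomes empty is deleted.
    fn remove_note_tags(&mut self, note_path: &str, tags: &[String]) {
        for tag in tags {
            let key = format!("tag:{}", tag);
            let now_empty = match self.entries.get_mut(&key) {
                Some(paths) => {
                    paths.remove(note_path);
                    paths.is_empty()
                }
                None => false, // tag was never indexed; nothing to do
            };
            if now_empty {
                self.entries.remove(&key);
            }
        }
    }
}
```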

TagIndex::search_by_tag(tag, cursor, limit) -> (Vec<String>, Option<String>)

  • Use Engine::search_prefix("tag:{tag}") with cursor pagination
  • Return note paths sorted lexicographically
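
Cursor pagination over the sorted note paths might be sketched as below. The real `Engine::search_prefix` signature is not shown in this issue, so the sketch paginates over an in-memory sorted set instead; `page_after` is a hypothetical helper, and the cursor is assumed to be the last path returned on the previous page.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Return up to `limit` paths that sort strictly after `cursor`,
/// plus the cursor to pass in for the next page (None when exhausted).
fn page_after(
    paths: &BTreeSet<String>,
    cursor: Option<&str>,
    limit: usize,
) -> (Vec<String>, Option<String>) {
    let lower = match cursor {
        Some(c) => Bound::Excluded(c),
        None => Bound::Unbounded,
    };
    let mut iter = paths.range::<str, _>((lower, Bound::Unbounded));
    let page: Vec<String> = iter.by_ref().take(limit).cloned().collect();
    // Only hand out a cursor if at least one more path remains.
    let next = if iter.next().is_some() {
        page.last().cloned()
    } else {
        None
    };
    (page, next)
}
```

Using the last returned path as the cursor keeps pages stable when notes are added or removed between requests, since the next page simply resumes after that path.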

TagIndex::list_all_tags() -> Vec<String>

  • Scan tag: prefix, collect unique tag names
  • Support pagination via cursor
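
A possible shape for the paginated tag listing, again over an in-memory sorted map standing in for the engine. `list_tags_page` is a hypothetical name, values are left as opaque strings, and the cursor is assumed to be the last tag name returned.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// List up to `limit` tag names after `cursor` by scanning keys under "tag:".
/// Returns the page and the cursor for the next page (None when exhausted).
fn list_tags_page(
    store: &BTreeMap<String, String>,
    cursor: Option<&str>,
    limit: usize,
) -> (Vec<String>, Option<String>) {
    let start_key = cursor.map(|c| format!("tag:{}", c));
    let lower = match &start_key {
        Some(k) => Bound::Excluded(k.as_str()),
        None => Bound::Included("tag:"),
    };
    // Keys are sorted, so the scan can stop at the first key without the prefix.
    let mut iter = store
        .range::<str, _>((lower, Bound::Unbounded))
        .map_while(|(k, _)| k.strip_prefix("tag:"));
    let page: Vec<String> = iter.by_ref().take(limit).map(String::from).collect();
    let next = if iter.next().is_some() {
        page.last().cloned()
    } else {
        None
    };
    (page, next)
}
```

Because each tag has exactly one key in this layout, the scan yields unique names without a separate deduplication pass.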

Tag Parsing Rules (from markdown)

  • #tag: must start at a word boundary
  • #tag/subtag: nested tags (store the full path)
  • #tag with spaces: not valid (a tag ends at the first character outside the allowed set)
  • #tag#: the trailing hash is not part of the tag
  • Ignore tags inside code blocks, inline code, and HTML comments
  • Max tag length: 100 chars
  • Allowed chars: [a-zA-Z0-9_/-]
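
The rules above can be sketched as a small line-based scanner. Several points are assumptions rather than requirements from this issue: a "word boundary" is taken to mean start of line or a preceding character that is not alphanumeric, not a tag character, and not '#'; tags longer than 100 characters are skipped rather than truncated; inline code is assumed not to span lines; and duplicates are dropped. All function names are hypothetical.

```rust
/// Longest tag accepted, per the rules above.
const MAX_TAG_LEN: usize = 100;
/// Three backticks, written with escapes so this snippet can sit inside a fence.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Allowed tag characters: [a-zA-Z0-9_/-].
fn is_tag_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || matches!(c, '_' | '/' | '-')
}

/// Assumed word boundary: start of line, or a previous character that is
/// neither alphanumeric, a tag character, nor another '#'.
fn is_boundary(prev: Option<char>) -> bool {
    prev.map_or(true, |p| !(p.is_alphanumeric() || is_tag_char(p) || p == '#'))
}

/// Scan one line that is outside fenced code, appending unique tags to `tags`.
/// `in_comment` carries HTML-comment state across lines.
fn scan_line(line: &str, in_comment: &mut bool, tags: &mut Vec<String>) {
    let mut in_code = false; // inside an inline code span
    let mut prev: Option<char> = None;
    let mut i = 0; // byte offset, always on a char boundary
    while i < line.len() {
        let rest = &line[i..];
        if *in_comment {
            match rest.find("-->") {
                Some(end) => {
                    *in_comment = false;
                    i += end + 3;
                    prev = Some('>');
                    continue;
                }
                None => return, // comment continues on the next line
            }
        }
        let c = match rest.chars().next() {
            Some(c) => c,
            None => break,
        };
        if c == '\u{60}' {
            in_code = !in_code;
        } else if !in_code && rest.starts_with("<!--") {
            *in_comment = true;
            i += 4;
            continue;
        } else if !in_code && c == '#' && is_boundary(prev) {
            let body: String = rest[1..].chars().take_while(|ch| is_tag_char(*ch)).collect();
            if !body.is_empty() {
                if body.len() <= MAX_TAG_LEN && !tags.contains(&body) {
                    tags.push(body.clone());
                }
                i += 1 + body.len(); // body is ASCII, so bytes == chars
                prev = body.chars().last();
                continue;
            }
        }
        prev = Some(c);
        i += c.len_utf8();
    }
}

/// Extract unique tags from markdown, in order of first appearance.
fn extract_tags(markdown: &str) -> Vec<String> {
    let mut tags = Vec::new();
    let mut in_fence = false;
    let mut in_comment = false;
    for line in markdown.lines() {
        if !in_comment && line.trim_start().starts_with(FENCE) {
            in_fence = !in_fence; // opening or closing fence line
            continue;
        }
        if !in_fence {
            scan_line(line, &mut in_comment, &mut tags);
        }
    }
    tags
}
```

Under these assumptions a markdown heading such as "# Title" yields no tag, because the character after '#' is a space, and "foo#bar" yields none because the '#' does not sit at a boundary.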

Acceptance Criteria

  • Tags extracted correctly from markdown content
  • Tags inside code blocks are ignored
  • Tag index is updated atomically on note write
  • search_by_tag returns correct notes with pagination
  • Nested tags (tag/subtag) work as expected
  • Performance: 10K tags indexed in < 50ms
  • Unit tests for all operations

Parent Epic

#275

Labels: feat, notes (Note storage and indexing), obsidian (Obsidian-like note-taking features)
