fix: Return matched chunk text in search results #601
The `Entity.entity_type` column stores the frontmatter `type` value
(note, spec, schema, person) but its name collides with `SearchItemType`
(entity/observation/relation). This caused real bugs where
`search_by_metadata({"entity_type": "spec"})` would fail because the
metadata filter looked in the wrong JSON column.
Changes:
- Alembic migration renames column + index on entity table, updates
search_index JSON metadata (both SQLite and Postgres)
- ORM model, Pydantic schemas, services, repositories, API routers,
MCP tools/clients, CLI commands, and schema inference engine all
updated to use `note_type`
- `SearchQuery.types` renamed to `SearchQuery.note_types` for clarity
- Type alias `EntityType` renamed to `NoteType`
- ~52 test files updated
Unchanged: `SearchItemType` enum, `entity_types` params that filter by
entity/observation/relation, frontmatter YAML `type:` key,
`entity_metadata` column.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
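As a rough illustration of the rename described in the commit above, the sketch below applies the same idea to a plain SQLite database using only the standard library. It is not the Alembic migration from this PR: the function name, the index names (`ix_entity_entity_type`, `ix_entity_note_type`), and the `metadata` text column on `search_index` are assumptions made for the example.

```python
import json
import sqlite3


def rename_entity_type_to_note_type(conn: sqlite3.Connection) -> None:
    """Rename entity.entity_type to note_type and rewrite search_index JSON.

    Hypothetical sketch: the table layout and index names are assumed,
    not taken from the real migration.
    """
    # SQLite 3.25+ supports RENAME COLUMN directly.
    conn.execute("ALTER TABLE entity RENAME COLUMN entity_type TO note_type")

    # SQLite cannot rename an index, so drop it and recreate it under the new name.
    conn.execute("DROP INDEX IF EXISTS ix_entity_entity_type")
    conn.execute("CREATE INDEX ix_entity_note_type ON entity (note_type)")

    # Rewrite the JSON key in every search_index row that still uses the old name.
    rows = conn.execute("SELECT id, metadata FROM search_index").fetchall()
    for row_id, raw in rows:
        meta = json.loads(raw) if raw else {}
        if "entity_type" in meta:
            meta["note_type"] = meta.pop("entity_type")
            conn.execute(
                "UPDATE search_index SET metadata = ? WHERE id = ?",
                (json.dumps(meta), row_id),
            )
    conn.commit()
```

After a rename like this, a metadata filter such as `{"note_type": "spec"}` and the ORM column refer to the same name, which is the collision the commit sets out to remove.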
Thread the actual matching chunk text from vector search through to the API response as a new `matched_chunk` field.

Previously, search results always returned the beginning of the note (truncated to 250 chars) regardless of which section matched the query. The vector search pipeline already matches against specific chunks, but the chunk text was discarded: only chunk_key and distance were returned.

This fix adds c.chunk_text to the vector query SELECT (both SQLite and Postgres), tracks the best-matching chunk per search_index row, and surfaces it as `matched_chunk` on the SearchResult schema.

- SearchIndexRow: new `matched_chunk_text` field
- Vector queries: SELECT c.chunk_text alongside existing columns
- _search_vector_only: track best_chunk_by_si_id, set on ranked results
- SearchResult schema: new optional `matched_chunk` field
- Hybrid merge: vector rows overwrite FTS rows, preserving chunk text
- FTS-only results: matched_chunk remains None (no vector match)

Closes #599.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
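The "track the best-matching chunk per search_index row" step in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from `_search_vector_only`: the `RankedRow` class, the `best_chunks` function, and the dict keys `si_id`, `distance`, and `chunk_text` are hypothetical names, and it assumes a smaller distance means a closer match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RankedRow:
    """Hypothetical stand-in for a ranked search_index row."""

    si_id: int
    distance: float
    matched_chunk_text: Optional[str] = None


def best_chunks(vector_rows: list[dict]) -> list[RankedRow]:
    """Keep the closest chunk per search_index row.

    Each input dict is assumed to carry `si_id`, `distance`, and `chunk_text`,
    mirroring a vector query that also selects the chunk text.
    """
    best: dict[int, RankedRow] = {}
    for row in vector_rows:
        si_id = row["si_id"]
        distance = row["distance"]
        current = best.get(si_id)
        # Assumption: smaller distance is a closer match, so replace only on improvement.
        if current is None or distance < current.distance:
            best[si_id] = RankedRow(si_id, distance, row.get("chunk_text"))
    # Rank rows by the distance of their best chunk, closest first.
    return sorted(best.values(), key=lambda r: r.distance)
```

With several chunks from the same note in the input, only the closest one survives, so the text attached to each ranked row is the section that actually matched instead of the opening of the note.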
Claude finished @phernandez's task in 3m 0s

PR Review: Return matched chunk text in search results
Reviewing PR #601 against the team checklist...

Summary
This PR cleanly threads the matched chunk text from vector search through to the API response. The implementation is minimal, backwards-compatible, and well-tested. A few observations below.

Code Quality & Standards
Testing & Documentation
Basic Memory Architecture

Specific Code Observations
1. Empty string vs.
   `chunk_text = row.get("chunk_text", "")`
   If
2. When
3. Alembic migration file included in diff
   The migration

Security & Performance
Compatibility

CI Status
The "Pull Request Title" check failed due to the semantic PR title linter (

Verdict
LGTM. The implementation is clean, minimal, and well-tested. The only actionable item is the
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
Claude finished @phernandez's task in 1m 18s

PR Review: Return matched chunk text in search results
Reviewing PR #601 against the team checklist...

Summary
This PR cleanly threads the matched chunk text from vector search through to the API response. The implementation is minimal, well-scoped, and backwards-compatible. A few observations below.

Code Quality & Standards
Testing & Documentation
Minor observation:
Basic Memory Architecture

Specific Code Observations
1. Empty string default for missing chunk text (
   `chunk_text = row.get("chunk_text", "")`
   If
2. After the threshold filter removes entries from
3. Alembic migration file included in diff

Security & Performance
Compatibility

CI Status
The "Pull Request Title" check is failing because the title needs a conventional commit prefix. Consider renaming the PR title to

Verdict
LGTM. The implementation is clean, minimal, and well-tested. The only actionable item is the
Summary
- New `matched_chunk` field carrying the matching chunk text from vector search
- FTS-only results: `matched_chunk: null`

Changes
- `SearchIndexRow`: new `matched_chunk_text` field
- Vector queries: add `c.chunk_text` to SELECT
- `_search_vector_only`: track `best_chunk_by_si_id`, set on ranked results
- `SearchResult` schema: new optional `matched_chunk` field

Test plan
- `test_matched_chunk_text_populated_on_vector_results` (unit)
- `test_search_result_includes_matched_chunk` (API router)
- `test_search_result_omits_matched_chunk_when_none` (API router)

Closes #599.
🤖 Generated with Claude Code
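
The two API router tests in the test plan suggest that `matched_chunk` appears in a result only when a vector match supplied it. The sketch below shows one way such behavior could look, using a plain dataclass instead of the project's real schema. `SearchResultSketch` and `to_response` are hypothetical names, and dropping `None` fields during serialization is an assumption drawn from the test name, not something confirmed by the PR.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class SearchResultSketch:
    """Hypothetical, trimmed-down stand-in for the SearchResult schema."""

    title: str
    content: str
    matched_chunk: Optional[str] = None  # stays None for FTS-only results


def to_response(result: SearchResultSketch) -> dict[str, Any]:
    """Serialize a result, dropping fields whose value is None (assumed behavior)."""
    return {key: value for key, value in asdict(result).items() if value is not None}
```

Because the field is optional and defaults to `None`, existing clients that never read `matched_chunk` see the same payload as before, which is what makes the change backwards-compatible.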