Conversation
- Fix Bug #1: Add file_path to SymbolState for proper file indexing - Fix Bug #3: Make delete() return whether symbol existed - Fix Bug #7: Auto-rebuild symbols_by_file after deserialization - Fix Bug #11: Implement lang filter in SearchHints::matches() - Fix Bug #12: Add serde(default) to deleted and moves fields Refactoring: - Extract SearchHints to src/search.rs (single responsibility) - Add lang_from_extension() public helper - Apply guard clause pattern in matches() SEM-44
SEM-44 Core Overlay Data Structures
OverviewCreate foundational data structures for layered state management. Layered State ModelTDD RequirementsWrite tests BEFORE implementation for:
Base Tests to Expand#[test] fn test_overlay_create_empty() { }
#[test] fn test_overlay_upsert_symbol() { }
#[test] fn test_overlay_delete_symbol() { }
#[test] fn test_overlay_file_move_tracking() { }
#[test] fn test_overlay_serialize_deserialize() { }
#[test] fn test_layer_kind_ordering() { }
#[test] fn test_symbol_state_transitions() { }Deliverables
Additional Scope (from architecture discussion)
Acceptance Criteria
|
There was a problem hiding this comment.
Pull request overview
This pull request introduces a layered overlay system for incremental semantic indexing (SEM-44) and adds search filtering capabilities. The changes improve type safety through trait derives and enhance serialization robustness with default attributes.
Key Changes:
- Added
searchmodule withSearchHintsfor filtering files by extension, directory, pattern, and language, including a comprehensive language mapping function - Added
overlaymodule implementing a 4-layer state management system (Base, Branch, Working, AI) with symbol tracking, file move resolution, and content-addressable hashing - Enhanced schema structs with
PartialEqandEqderives for better comparison support and#[serde(default)]attributes for graceful deserialization
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/search.rs | New module providing SearchHints struct for filtering queries with extension, directory, file pattern, and language filters, plus lang_from_extension helper covering 20+ languages |
| src/overlay.rs | New module implementing SEM-44 layered indexing with LayerKind, SymbolState, Overlay, and LayeredIndex for managing symbol state across Git layers with file move tracking and conflict detection |
| src/schema.rs | Added PartialEq and Eq derives to 7 structs (SymbolInfo, Prop, Argument, StateChange, ControlFlowChange, Location, Call) and #[serde(default)] to vector/option fields for robust deserialization |
| src/lib.rs | Registered new overlay and search modules and re-exported their public types for external use |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub fn upsert(&mut self, hash: String, state: SymbolState) -> Option<SymbolState> { | ||
| // Update file index if this is an active symbol with a file path | ||
| if let Some(file_path) = state.file_path() { | ||
| self.symbols_by_file | ||
| .entry(file_path.clone()) | ||
| .or_default() | ||
| .push(hash.clone()); | ||
| } | ||
|
|
||
| // Update deleted set | ||
| if state.is_deleted() { | ||
| self.deleted.insert(hash.clone()); | ||
| } else { | ||
| self.deleted.remove(&hash); | ||
| } | ||
|
|
||
| self.meta.touch(); | ||
| self.update_counts(); | ||
| self.symbols.insert(hash, state) | ||
| } |
There was a problem hiding this comment.
The upsert method doesn't clean up old file path entries from symbols_by_file when updating an existing symbol. If a symbol is updated with a different file path (or changed from having a file path to not having one), the old file path entry remains in symbols_by_file, causing stale references.
To fix this, retrieve the old state before inserting the new one, and if it had a file path, remove the hash from that file path's entry in symbols_by_file.
src/search.rs
Outdated
| /// assert!(hints.matches("src/lib.rs")); | ||
| /// assert!(!hints.matches("tests/test.py")); | ||
| /// ``` | ||
| #[derive(Debug, Clone, Default, Serialize, Deserialize)] |
There was a problem hiding this comment.
[nitpick] For consistency with the type safety improvements in this PR (adding PartialEq and Eq to schema structs), consider adding these derives to SearchHints as well. This would improve testability and allow SearchHints instances to be compared, which could be useful for caching or deduplication purposes.
| #[derive(Debug, Clone, Default, Serialize, Deserialize)] | |
| #[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq)] |
- Add test_upsert_cleans_up_stale_file_path_on_move: verifies old file path index is cleaned up when symbol moves to new file - Add test_upsert_does_not_duplicate_on_same_file: verifies updating symbol at same path doesn't create duplicate index entries The upsert() fix was already implemented in the previous commit. These tests ensure the behavior is covered by the test suite.
This pull request introduces new search filtering capabilities and improves type safety for schema-related structs. The main change is the addition of the
searchmodule, which provides theSearchHintsstruct and related functions for filtering files and symbols by extension, directory, file pattern, and programming language. Additionally, several schema structs now derivePartialEqandEqfor better comparison support, and their serialization is made more robust by defaulting empty fields.Search functionality improvements:
searchmodule (src/search.rs), which provides theSearchHintsstruct for filtering queries based on extension, directory, file pattern, and language, including utility functions and tests.SearchHintsandlang_from_extensioninsrc/lib.rsfor external use, making search features available to consumers of the crate.searchmodule insrc/lib.rsfor inclusion in the library.Schema and type safety improvements:
PartialEqandEqderives to schema structs such asSymbolInfo,Prop,Argument,StateChange,ControlFlowChange,Location, andCallinsrc/schema.rs, improving comparison and testability. [1] [2] [3] [4] [5] [6] [7]SymbolInfoby adding#[serde(default)]to vector and option fields, ensuring empty fields are handled gracefully during (de)serialization.