Skip to content

Comments

Feature/sem 44#1

Merged
Kadajett merged 4 commits intomainfrom
feature/sem-44
Dec 5, 2025
Merged

Feature/sem 44#1
Kadajett merged 4 commits intomainfrom
feature/sem-44

Conversation

@Kadajett
Copy link
Contributor

@Kadajett Kadajett commented Dec 4, 2025

This pull request introduces new search filtering capabilities and improves type safety for schema-related structs. The main change is the addition of the search module, which provides the SearchHints struct and related functions for filtering files and symbols by extension, directory, file pattern, and programming language. Additionally, several schema structs now derive PartialEq and Eq for better comparison support, and their serialization is made more robust by defaulting empty fields.

Search functionality improvements:

  • Added the new search module (src/search.rs), which provides the SearchHints struct for filtering queries based on extension, directory, file pattern, and language, including utility functions and tests.
  • Re-exported SearchHints and lang_from_extension in src/lib.rs for external use, making search features available to consumers of the crate.
  • Registered the new search module in src/lib.rs for inclusion in the library.

Schema and type safety improvements:

  • Added PartialEq and Eq derives to schema structs such as SymbolInfo, Prop, Argument, StateChange, ControlFlowChange, Location, and Call in src/schema.rs, improving comparison and testability. [1] [2] [3] [4] [5] [6] [7]
  • Improved serialization of fields in SymbolInfo by adding #[serde(default)] to vector and option fields, ensuring empty fields are handled gracefully during (de)serialization.

- Fix Bug #1: Add file_path to SymbolState for proper file indexing
- Fix Bug #3: Make delete() return whether symbol existed
- Fix Bug #7: Auto-rebuild symbols_by_file after deserialization
- Fix Bug #11: Implement lang filter in SearchHints::matches()
- Fix Bug #12: Add serde(default) to deleted and moves fields

Refactoring:
- Extract SearchHints to src/search.rs (single responsibility)
- Add lang_from_extension() public helper
- Apply guard clause pattern in matches()

SEM-44
@linear
Copy link

linear bot commented Dec 4, 2025

SEM-44 Core Overlay Data Structures

Overview

Create foundational data structures for layered state management.

Layered State Model

Layer 0: BASE (main/master)     - Persistent, full sharded index
Layer 1: BRANCH                 - Commits since diverging from base
Layer 2: WORKING                - Uncommitted changes (staged + unstaged)
Layer 3: AI PROPOSED            - In-memory changes not yet on disk

TDD Requirements

Write tests BEFORE implementation for:

  • Overlay creation and serialization
  • Symbol upsert/delete operations
  • File move path resolution

Base Tests to Expand

#[test] fn test_overlay_create_empty() { }
#[test] fn test_overlay_upsert_symbol() { }
#[test] fn test_overlay_delete_symbol() { }
#[test] fn test_overlay_file_move_tracking() { }
#[test] fn test_overlay_serialize_deserialize() { }
#[test] fn test_layer_kind_ordering() { }
#[test] fn test_symbol_state_transitions() { }

Deliverables

  • src/overlay.rs with:
    • Overlay struct for per-layer symbol storage
    • LayeredIndex for managing the full layer stack
    • SymbolState enum (Active, Deleted, Modified)
    • LayerKind enum (Base, Branch, Working, AI)
    • LayerMeta for tracking indexed SHA, merge-base, timestamps
    • FileMove struct for path resolution through renames
  • Unit tests for all overlay operations

Additional Scope (from architecture discussion)

  • Content-addressable symbol hashes (hash by content, not location)
  • Search hints data structures for ext:rs, dir:src, file:*.test.ts, lang:python

Acceptance Criteria

  • Can create/serialize/deserialize overlays
  • Symbol upsert/delete operations work correctly
  • File move tracking resolves paths through all layers
  • LayerKind ordering is correct (AI > Working > Branch > Base)

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces a layered overlay system for incremental semantic indexing (SEM-44) and adds search filtering capabilities. The changes improve type safety through trait derives and enhance serialization robustness with default attributes.

Key Changes:

  • Added search module with SearchHints for filtering files by extension, directory, pattern, and language, including a comprehensive language mapping function
  • Added overlay module implementing a 4-layer state management system (Base, Branch, Working, AI) with symbol tracking, file move resolution, and content-addressable hashing
  • Enhanced schema structs with PartialEq and Eq derives for better comparison support and #[serde(default)] attributes for graceful deserialization

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/search.rs New module providing SearchHints struct for filtering queries with extension, directory, file pattern, and language filters, plus lang_from_extension helper covering 20+ languages
src/overlay.rs New module implementing SEM-44 layered indexing with LayerKind, SymbolState, Overlay, and LayeredIndex for managing symbol state across Git layers with file move tracking and conflict detection
src/schema.rs Added PartialEq and Eq derives to 7 structs (SymbolInfo, Prop, Argument, StateChange, ControlFlowChange, Location, Call) and #[serde(default)] to vector/option fields for robust deserialization
src/lib.rs Registered new overlay and search modules and re-exported their public types for external use

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 392 to 411
pub fn upsert(&mut self, hash: String, state: SymbolState) -> Option<SymbolState> {
// Update file index if this is an active symbol with a file path
if let Some(file_path) = state.file_path() {
self.symbols_by_file
.entry(file_path.clone())
.or_default()
.push(hash.clone());
}

// Update deleted set
if state.is_deleted() {
self.deleted.insert(hash.clone());
} else {
self.deleted.remove(&hash);
}

self.meta.touch();
self.update_counts();
self.symbols.insert(hash, state)
}
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upsert method doesn't clean up old file path entries from symbols_by_file when updating an existing symbol. If a symbol is updated with a different file path (or changed from having a file path to not having one), the old file path entry remains in symbols_by_file, causing stale references.

To fix this, retrieve the old state before inserting the new one, and if it had a file path, remove the hash from that file path's entry in symbols_by_file.

Copilot uses AI. Check for mistakes.
src/search.rs Outdated
/// assert!(hints.matches("src/lib.rs"));
/// assert!(!hints.matches("tests/test.py"));
/// ```
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] For consistency with the type safety improvements in this PR (adding PartialEq and Eq to schema structs), consider adding these derives to SearchHints as well. This would improve testability and allow SearchHints instances to be compared, which could be useful for caching or deduplication purposes.

Suggested change
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq)]

Copilot uses AI. Check for mistakes.
- Add test_upsert_cleans_up_stale_file_path_on_move: verifies old file
  path index is cleaned up when symbol moves to new file
- Add test_upsert_does_not_duplicate_on_same_file: verifies updating
  symbol at same path doesn't create duplicate index entries

The upsert() fix was already implemented in the previous commit. These
tests ensure the behavior is covered by the test suite.
@Kadajett Kadajett merged commit e1bb445 into main Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant