Sem 47 sha based drift detection #7

Merged
Kadajett merged 4 commits into main from sem-47-sha-based-drift-detection
Dec 5, 2025

@Kadajett Kadajett commented Dec 5, 2025

  • Implements SHA-based drift detection for the layered index system, replacing time-based staleness checks

    • Adds DriftStatus struct to track indexed vs current SHA, changed files, and drift percentage
    • Adds UpdateStrategy enum with threshold-based strategy selection (Fresh, Incremental, Rebase, FullRebuild)
    • Detects merge-base changes for branch layer rebase scenarios
    • Working layer checks uncommitted changes via git status

    Test plan

    • 13 tests passing (7 TDD required + 6 additional)
    • test_same_sha_reports_fresh - Old project with same SHA reports fresh
    • test_different_sha_reports_stale - Different SHA correctly detected
    • test_strategy_incremental_under_10_files - <10 files → Incremental
    • test_strategy_rebase_under_30_percent - <30% → Rebase
    • test_strategy_full_rebuild_over_30_percent - ≥30% → FullRebuild
    • test_merge_base_change_detected - Merge-base movement detected
    • test_working_layer_checks_uncommitted - Working layer detects uncommitted changes

    Closes SEM-47

Implements drift detection based on git SHA comparison rather than
time-based staleness. Tracks HEAD commit, merge-base changes, and
calculates update strategies based on drift magnitude.

Key features:
- DriftStatus struct with SHA tracking and drift percentage
- UpdateStrategy enum: Fresh, Incremental, Rebase, FullRebuild
- DriftDetector with check_drift() for all layer types
- Merge-base detection for rebase scenarios
- Threshold-based strategy selection (<10 files, <30%, ≥30%)

Includes 13 tests covering TDD requirements:
- Fresh index detection
- Incremental updates for small changes
- Large drift detection with FullRebuild strategy
- Merge-base change detection
- Uncommitted working changes

Part of Phase 2.5 layered index architecture.

linear bot commented Dec 5, 2025

SEM-47 SHA-Based Drift Detection

Overview

Implement drift detection using git SHA comparison instead of timestamps.

Key Insight

Time-based staleness is meaningless:

  • Old project with same SHA = FRESH (nothing changed)
  • 5-minute break with half app rewritten = STALE (everything changed)

Drift Magnitude → Strategy

| Drift | Strategy |
|---|---|
| 0 files | No action |
| < 10 files | Incremental update |
| < 30% of repo | Rebase overlay |
| ≥ 30% of repo | Full rebuild |
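
The thresholds can be read as a first-match rule chain. The following is a minimal sketch under stated assumptions, not the code merged in this PR. `select_strategy` and `fake_paths` are hypothetical helpers. The sketch assumes the rules are tested in the order listed, so 5 changed files in a 10-file repo still count as Incremental. It also assumes the 30% check can be done in integer arithmetic.

```rust
use std::path::PathBuf;

/// Variant names follow the issue text; everything else here is illustrative.
#[derive(Debug, PartialEq)]
pub enum UpdateStrategy {
    Fresh,
    Incremental(Vec<PathBuf>),
    Rebase,
    FullRebuild,
}

/// Hypothetical helper: picks a strategy from the changed-file list and the
/// number of tracked files, testing the rules top to bottom (an assumption).
pub fn select_strategy(changed_files: Vec<PathBuf>, total_repo_files: usize) -> UpdateStrategy {
    let changed = changed_files.len();
    if changed == 0 {
        // Nothing changed on disk, so no update is needed.
        UpdateStrategy::Fresh
    } else if changed < 10 {
        UpdateStrategy::Incremental(changed_files)
    } else if changed * 100 < total_repo_files * 30 {
        // Strictly under 30% of the repo, compared without floating point.
        UpdateStrategy::Rebase
    } else {
        UpdateStrategy::FullRebuild
    }
}

/// Hypothetical convenience: builds `n` placeholder paths for experiments.
pub fn fake_paths(n: usize) -> Vec<PathBuf> {
    (0..n)
        .map(|i| PathBuf::from(format!("src/file_{}.rs", i)))
        .collect()
}
```

With 100 tracked files, 9 changed files select `Incremental`, 10 to 29 select `Rebase`, and 30 or more select `FullRebuild`.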

TDD Requirements

#[test] fn test_same_sha_reports_fresh() { }
#[test] fn test_different_sha_reports_stale() { }
#[test] fn test_strategy_incremental_under_10_files() { }
#[test] fn test_strategy_rebase_under_30_percent() { }
#[test] fn test_strategy_full_rebuild_over_30_percent() { }
#[test] fn test_merge_base_change_detected() { }
#[test] fn test_working_layer_checks_uncommitted() { }

Deliverables

  • DriftStatus struct:
    • is_stale: bool
    • indexed_sha: String
    • current_sha: String
    • changed_files: Vec<PathBuf>
    • drift_percentage: f64
  • UpdateStrategy enum: Fresh, Incremental(Vec<PathBuf>), Rebase, FullRebuild
  • check_drift(layer: LayerKind) -> DriftStatus
  • Strategy selection based on drift magnitude
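
As a rough illustration of the `DriftStatus` deliverable, the sketch below uses the field names and types from the list above. The constructor `from_shas` is hypothetical, and expressing `drift_percentage` on a 0 to 100 scale is an assumption. Note that `is_stale` depends only on the two SHAs, never on how old the index is.

```rust
use std::path::PathBuf;

/// Fields as listed in the deliverables.
#[derive(Debug, Clone, PartialEq)]
pub struct DriftStatus {
    pub is_stale: bool,
    pub indexed_sha: String,
    pub current_sha: String,
    pub changed_files: Vec<PathBuf>,
    pub drift_percentage: f64,
}

impl DriftStatus {
    /// Hypothetical constructor: compares SHAs and derives the drift
    /// percentage (0 to 100, an assumed scale) from the changed-file count.
    pub fn from_shas(
        indexed_sha: &str,
        current_sha: &str,
        changed_files: Vec<PathBuf>,
        total_repo_files: usize,
    ) -> Self {
        let drift_percentage = if total_repo_files == 0 {
            // Avoid dividing by zero for an empty repository.
            0.0
        } else {
            changed_files.len() as f64 / total_repo_files as f64 * 100.0
        };
        DriftStatus {
            is_stale: indexed_sha != current_sha,
            indexed_sha: indexed_sha.to_string(),
            current_sha: current_sha.to_string(),
            changed_files,
            drift_percentage,
        }
    }
}
```

An index built long ago but still at the same SHA yields `is_stale == false`. A different SHA with 3 of 12 files changed yields `is_stale == true` and a drift of 25%.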

Acceptance Criteria

  • Old project with same SHA reports as fresh
  • Detects changed files via git diff
  • Correctly identifies when merge-base moved (branch layer)
  • Working layer checks uncommitted changes via git status
  • Strategy selection matches drift magnitude thresholds
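
The working-layer criterion could be met along the following lines. This is a sketch, assuming the check shells out to `git status --porcelain` and parses its v1 output; `parse_porcelain` and `uncommitted_changes` are hypothetical names, and the PR's actual approach may differ. Quoted paths (those with special characters) are not unescaped here.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Extracts paths from `git status --porcelain` (v1) output.
/// Each entry is two status characters, a space, then the path.
/// Renames appear as `old -> new`; the new path is kept.
pub fn parse_porcelain(output: &str) -> Vec<PathBuf> {
    output
        .lines()
        .filter_map(|line| line.get(3..)) // skip the "XY " status prefix
        .filter(|rest| !rest.is_empty())
        .map(|rest| match rest.rsplit_once(" -> ") {
            Some((_old, new)) => PathBuf::from(new),
            None => PathBuf::from(rest),
        })
        .collect()
}

/// Hypothetical helper: runs git inside `repo` and returns the paths with
/// uncommitted changes. An empty list means the working tree is clean.
pub fn uncommitted_changes(repo: &Path) -> io::Result<Vec<PathBuf>> {
    let out = Command::new("git")
        .args(["status", "--porcelain"])
        .current_dir(repo)
        .output()?;
    if !out.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "git status failed"));
    }
    Ok(parse_porcelain(&String::from_utf8_lossy(&out.stdout)))
}
```

Keeping the parser separate from the process call lets the parsing be tested without a git repository on disk.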

Copilot AI left a comment

Pull request overview

This PR implements SHA-based drift detection for the layered index system, replacing time-based staleness checks. The implementation correctly recognizes that drift should be measured by content changes (SHA differences) rather than time elapsed.

Key changes:

  • Adds DriftStatus struct to track indexed vs current SHA, changed files, drift percentage, and merge-base information for branch layers
  • Introduces UpdateStrategy enum with threshold-based strategy selection (Fresh, Incremental, Rebase, FullRebuild) based on the magnitude of changes
  • Implements layer-specific drift detection for Base, Branch, Working, and AI layers
  • Includes comprehensive test coverage (13 tests) covering all TDD requirements and edge cases

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/drift.rs | New module implementing SHA-based drift detection with DriftStatus, UpdateStrategy, DriftDetector, and helper functions. Includes comprehensive tests covering all layer types and update strategies. |
| src/lib.rs | Adds re-exports for drift detection types (DriftDetector, DriftStatus, UpdateStrategy, count_tracked_files) and removes unused SemanticSummary import from benchmark module. |
| src/benchmark.rs | Auto-formatter changes breaking long format! strings across multiple lines and removing unused SemanticSummary import. No functional changes. |


// Calculate thresholds
let thirty_percent = (total_repo_files as f64 * 0.30).ceil() as usize;

Copilot AI Dec 5, 2025

Edge case handling could be clearer: When is_stale is true but changed_count == 0, returning Fresh seems contradictory. This can theoretically occur if git reports a SHA change but no file changes (e.g., commits with only metadata changes). Consider adding a comment explaining this edge case:

// Edge case: SHA changed but no files changed (e.g., empty commit, metadata-only change)
// In this case, no actual update is needed despite the SHA difference
if changed_count == 0 {
    UpdateStrategy::Fresh

Kadajett and others added 2 commits December 5, 2025 09:21
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…mment

- Fix compression ratio to use total-based calculation instead of per-file average
- Rename avg_compression/avg_token_savings to total_compression/total_token_savings
- Add comment explaining edge case when SHA changes but no files changed
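
To illustrate why a total-based ratio can differ from a per-file average, here is a small sketch. Everything in it is an assumption: `FileTokens` is a hypothetical type, "compression" is taken to mean the fraction of tokens saved, and the two functions only borrow the names from the commit message rather than reproduce the benchmark code.

```rust
/// Hypothetical per-file measurement: token counts before and after.
#[derive(Debug, Clone, Copy)]
pub struct FileTokens {
    pub original: u64,
    pub compressed: u64,
}

/// Mean of each file's own savings fraction. A tiny file weighs as much
/// as a huge one, which can skew the result.
pub fn avg_compression(files: &[FileTokens]) -> f64 {
    let ratios: Vec<f64> = files
        .iter()
        .filter(|f| f.original > 0) // skip empty files to avoid dividing by zero
        .map(|f| 1.0 - f.compressed as f64 / f.original as f64)
        .collect();
    if ratios.is_empty() {
        0.0
    } else {
        ratios.iter().sum::<f64>() / ratios.len() as f64
    }
}

/// Savings fraction over the summed token counts, so each file contributes
/// in proportion to its size.
pub fn total_compression(files: &[FileTokens]) -> f64 {
    let original: u64 = files.iter().map(|f| f.original).sum();
    let compressed: u64 = files.iter().map(|f| f.compressed).sum();
    if original == 0 {
        0.0
    } else {
        1.0 - compressed as f64 / original as f64
    }
}
```

For one file going from 1000 to 100 tokens and another from 10 to 9, the per-file average is about 0.50, while the total-based figure is about 0.89 because the large file dominates the sums.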
@Kadajett Kadajett merged commit bc64a78 into main Dec 5, 2025