fix: add fingerprint and commit-hash dedup #133

Merged
github-actions[bot] merged 1 commit into master from fix/commit-hash-dedup on Mar 5, 2026

@jordanpartridge (Contributor) commented Mar 5, 2026

Summary

  • Adds fingerprint tag dedup: entries with fingerprint:{hash} tags are matched against existing entries via Qdrant scroll filter before vector similarity check
  • Adds title+commit hash dedup: prevents duplicate CI test snapshots (same title + same commit = same event)
  • Stores commit field in Qdrant payload for future dedup lookups
  • 4 new unit tests covering rejection and pass-through cases
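The fingerprint layer described above can be sketched as follows. This is a minimal Python illustration, not the PHP code from this PR: the function names, the `tags` payload key, and the choice to take the first matching tag are all assumptions. The request body uses the `filter`, `limit`, and `with_payload` fields of Qdrant's scroll endpoint.

```python
from typing import Optional

FINGERPRINT_PREFIX = "fingerprint:"


def extract_fingerprint(tags: list[str]) -> Optional[str]:
    """Return the hash from the first `fingerprint:{hash}` tag, or None."""
    for tag in tags:
        # Ignore a bare "fingerprint:" tag with no hash after the prefix.
        if tag.startswith(FINGERPRINT_PREFIX) and len(tag) > len(FINGERPRINT_PREFIX):
            return tag[len(FINGERPRINT_PREFIX):]
    return None


def fingerprint_scroll_body(fingerprint: str) -> dict:
    """Build a scroll request body that matches entries carrying the same tag.

    Assumes tags are stored as an array of keywords under the payload key
    "tags" (hypothetical key name); a `match.value` condition on an array
    field matches when any element equals the value.
    """
    return {
        "filter": {
            "must": [
                {"key": "tags", "match": {"value": FINGERPRINT_PREFIX + fingerprint}}
            ]
        },
        "limit": 1,  # one hit is enough to call the entry a duplicate
        "with_payload": True,
    }
```

A caller would skip this layer entirely when `extract_fingerprint` returns `None`, which corresponds to the pass-through case in the test plan.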

Context

The default project has 4,549 entries, many of them identical CI test snapshots. This change prevents future flooding by catching duplicates at two new layers before the existing vector similarity check.

Test plan

  • Fingerprint tag match throws DuplicateEntryException
  • Title+commit hash match throws DuplicateEntryException
  • Entries with fingerprint but no match proceed normally
  • Commit field is stored in payload
  • All 61 QdrantService tests pass

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced duplicate detection using fingerprint-based matching and title+commit hash tracking.
    • Commit metadata is now stored with entries to support more intelligent deduplication checks.
  • Tests

    • Added comprehensive unit tests covering duplicate detection scenarios and edge cases.

…oding

Adds two new dedup checks before vector similarity in QdrantService::upsert():
- Fingerprint tag matching: entries with `fingerprint:{hash}` tags are checked
  against existing entries via Qdrant scroll filter
- Title+commit hash matching: same title + same commit hash = same CI event,
  preventing duplicate snapshots from test runs
- Stores `commit` field in payload for future dedup checks

Includes 4 new unit tests covering both rejection and pass-through cases.
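As a rough illustration of the title+commit layer, the sketch below builds a scroll filter that requires both fields to match and shows the `commit` field being written to the payload. It is written in Python for brevity and is not the PR's PHP code. The `title`, `content`, and `tags` payload keys and both function names are assumptions; only the `commit` field is named in the PR. Exact matching on `title` also assumes it is stored as a keyword rather than a full-text field.

```python
def title_commit_scroll_body(title: str, commit: str) -> dict:
    """Build a scroll request body matching entries with the same title AND commit."""
    return {
        "filter": {
            "must": [
                {"key": "title", "match": {"value": title}},
                {"key": "commit", "match": {"value": commit}},
            ]
        },
        "limit": 1,
        "with_payload": True,
    }


def build_payload(entry: dict) -> dict:
    """Assemble the stored payload, adding `commit` only when the entry has one."""
    payload = {
        "title": entry["title"],
        "content": entry["content"],
        "tags": entry.get("tags", []),
    }
    if entry.get("commit"):
        # Stored so later upserts can run the title+commit lookup against it.
        payload["commit"] = entry["commit"]
    return payload
```

Entries without a commit hash never reach the title+commit lookup, so manually authored entries that happen to share a title are not rejected by this layer.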

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai bot commented Mar 5, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e828edce-3a0a-4333-b1d9-54271867c1cb

📥 Commits

Reviewing files that changed from the base of the PR and between 29b132f and 7205a97.

📒 Files selected for processing (2)
  • app/Services/QdrantService.php
  • tests/Unit/Services/QdrantServiceTest.php

Walkthrough

Implements enhanced duplicate detection in QdrantService with fingerprint-based and title+commit-based dedup mechanisms alongside existing content-hash dedup. Adds three private helper methods to support the new dedup logic and stores the commit field in payload during upsert operations. Includes comprehensive unit tests covering all new dedup scenarios.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Service Implementation: `app/Services/QdrantService.php` | Introduces two new dedup mechanisms: fingerprint extraction from tags and title+commit hash matching. Adds helper methods (`extractFingerprint`, `findByFingerprint`, `findByTitleAndCommit`) to query for duplicates before content-hash dedup. Stores commit field in payload during upsert. |
| Unit Tests: `tests/Unit/Services/QdrantServiceTest.php` | Comprehensive test coverage for fingerprint-based duplicate detection, title+commit-based duplicate detection, proper upsert flow when no duplicates match, commit field storage in payload, and integration with embedding generation and Qdrant operations. |

Sequence Diagram

sequenceDiagram
    actor Caller
    participant QdrantService
    participant Database as Database/Qdrant
    
    Caller->>QdrantService: upsert(entry, project, checkDuplicates=true)
    
    alt checkDuplicates enabled
        QdrantService->>QdrantService: extractFingerprint(tags)
        alt fingerprint exists
            QdrantService->>Database: findByFingerprint(fingerprint, project)
            Database-->>QdrantService: match found
            QdrantService->>Caller: throw DuplicateEntryException ❌
        end
        
        alt commit hash provided
            QdrantService->>Database: findByTitleAndCommit(title, commit, project)
            Database-->>QdrantService: match found
            QdrantService->>Caller: throw DuplicateEntryException ❌
        end
    end
    
    QdrantService->>QdrantService: computeContentHash(entry)
    QdrantService->>Database: upsertPoints(points, collection)
    Database-->>QdrantService: success
    QdrantService->>Caller: return true ✓
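The ordering in the diagram above can be sketched with an in-memory stand-in for Qdrant. This is a hypothetical Python illustration, not the PHP service: `InMemoryStore`, `DuplicateEntryError` (standing in for `DuplicateEntryException`), and the entry field names are assumptions, and the pre-existing similarity check is left as a comment.

```python
class DuplicateEntryError(Exception):
    """Stand-in for the DuplicateEntryException named in the PR."""


class InMemoryStore:
    """Toy replacement for the Qdrant-backed service; keeps entries in a list."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def upsert(self, entry: dict, check_duplicates: bool = True) -> bool:
        if check_duplicates:
            # Layer 1: fingerprint tag match.
            fp_tag = next(
                (t for t in entry.get("tags", []) if t.startswith("fingerprint:")),
                None,
            )
            if fp_tag is not None and any(
                fp_tag in existing.get("tags", []) for existing in self.entries
            ):
                raise DuplicateEntryError("fingerprint match: " + fp_tag)

            # Layer 2: same title + same commit hash.
            commit = entry.get("commit")
            if commit and any(
                existing.get("title") == entry.get("title")
                and existing.get("commit") == commit
                for existing in self.entries
            ):
                raise DuplicateEntryError("title+commit match: " + commit)

            # The pre-existing duplicate check would run here (omitted).

        self.entries.append(dict(entry))
        return True
```

Running the cheap exact-match lookups first means a duplicate CI snapshot is rejected before any later, more expensive check is reached.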

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Fingerprints and commits aligned,
Duplicate entries left behind!
Three dedup checks keep data clean,
Qdrant's finest storage scene! ✨
No duplicates shall ever thrive,
Quality control alive! 🎯

@github-actions bot commented Mar 5, 2026

📊 Coverage Report

| Metric | Coverage | Threshold | Status |
| --- | --- | --- | --- |
| Lines | 99.4% | 95% | |

Files Below Threshold

| File | Coverage | Uncovered Lines |
| --- | --- | --- |
| app/Enums/ObservationType.php | 0% | None |
| app/Exceptions/Qdrant/QdrantException.php | 0% | None |
| app/Integrations/Qdrant/Requests/ScrollPoints.php | 0% | None |
| app/Services/AgentHealthService.php | 0% | None |
| app/Commands/Concerns/ResolvesProject.php | 80% | 26 |
| app/Commands/KnowledgeSearchCommand.php | 90.3% | 73, 74, 76, 77, 78... (+2 more) |
| app/Services/OllamaService.php | 92% | 76, 152, 159, 167, 171... (+1 more) |
| app/Services/ProjectDetectorService.php | 92% | 67, 81 |

🏆 Synapse Sentinel Gate

@github-actions bot commented Mar 5, 2026

🏆 Sentinel Certified

Tests & Coverage: 0 tests passed
Security Audit: No security vulnerabilities found
Pest Syntax: All test files use describe/it syntax


Add this badge to your README:

[![Sentinel Certified](https://img.shields.io/github/actions/workflow/status/conduit-ui/knowledge/gate.yml?label=Sentinel%20Certified&style=flat-square)](https://github.com/conduit-ui/knowledge/actions/workflows/gate.yml)

@github-actions github-actions bot merged commit 326f84d into master Mar 5, 2026
1 of 2 checks passed
@github-actions github-actions bot deleted the fix/commit-hash-dedup branch March 5, 2026 04:31