fix(sidecar): preserve duplicate blob paths #21
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough

The PR records multiple paths per git blob in sidecar snapshots and populates all matching snapshot entries when blobs are read; it adds a test verifying FSCK with duplicate blobs and introduces Makefile/mage install targets.

Changes: Blob Deduplication Support
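The "multiple paths per git blob" shape described in the walkthrough can be sketched as follows. This is a minimal illustration, not the PR's actual code; `blobPaths` and the entry layout are hypothetical names.

```go
package main

import "fmt"

// blobPaths groups every path by its git blob ID, so files with identical
// content (and therefore the same blob) all survive in the snapshot.
// Keying as map[string]string instead (one path per blob) was the bug:
// the later path would overwrite the earlier one.
func blobPaths(entries [][2]string) map[string][]string {
	paths := map[string][]string{}
	for _, e := range entries {
		blob, path := e[0], e[1]
		paths[blob] = append(paths[blob], path)
	}
	return paths
}

func main() {
	entries := [][2]string{
		{"blob1", "a/SPEC.md"},
		{"blob1", "b/SPEC.md"}, // identical content -> same git blob
		{"blob2", "c/SPEC.md"},
	}
	fmt.Println(blobPaths(entries)["blob1"]) // [a/SPEC.md b/SPEC.md]
}
```

With the multi-valued map, a blob read can fan out its contents to every path that shares that blob, which is what lets FSCK hydrate all matching snapshot entries.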
🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/sidecar/reconcile.go (1)
491-497: ⚡ Quick win

Deduplicate blob IDs before `git cat-file --batch` to avoid repeated payload reads. Line 491 appends every object ID (including duplicates). With many identical files, this can multiply batch I/O and memory for no extra value. Request each object once, then fan out by `objectPaths`.

♻️ Proposed refactor

```diff
 	files := map[string]reconcile.SnapshotFile{}
 	objects := make([]string, 0)
+	seenObjects := map[string]struct{}{}
 	objectPaths := map[string][]string{}
@@
 		object := fields[2]
 		files[rel] = reconcile.SnapshotFile{Path: rel, Size: size, Blob: object}
-		objects = append(objects, object)
+		if _, seen := seenObjects[object]; !seen {
+			objects = append(objects, object)
+			seenObjects[object] = struct{}{}
+		}
 		objectPaths[object] = append(objectPaths[object], rel)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/sidecar/reconcile.go` around lines 491 - 497, The code currently appends every object ID (including duplicates) to the slice named objects before calling s.catFileBatch, causing repeated reads; change the logic to deduplicate object IDs (e.g., build a unique set/map of object IDs) so s.catFileBatch is invoked only with one entry per unique object, then use the existing objectPaths map to fan out the returned contents to all paths associated with each object; ensure you still populate objectPaths[object] with rel for every file and call s.catFileBatch(ctx, sidecarDir, uniqueObjects).
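The deduplication the reviewer suggests can be sketched as a standalone helper. The inline `seenObjects` diff above does the same thing; the helper name and slice-based API here are illustrative, not the PR's actual code.

```go
package main

import "fmt"

// dedupePreservingOrder returns the unique object IDs in first-seen order,
// so a batch reader such as `git cat-file --batch` is asked for each blob
// exactly once, while objectPaths still fans results out to every path.
func dedupePreservingOrder(ids []string) []string {
	seen := make(map[string]struct{}, len(ids))
	unique := make([]string, 0, len(ids))
	for _, id := range ids {
		if _, ok := seen[id]; ok {
			continue // already queued for the batch read
		}
		seen[id] = struct{}{}
		unique = append(unique, id)
	}
	return unique
}

func main() {
	ids := []string{"aaa", "bbb", "aaa", "ccc", "bbb"}
	fmt.Println(dedupePreservingOrder(ids)) // [aaa bbb ccc]
}
```

Preserving first-seen order keeps the batch request deterministic, which makes the change easy to verify against the previous behavior.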
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/sidecar/service_test.go`:
- Around line 222-264: The test function
TestServiceFSCKHandlesDuplicateSidecarBlobs must be converted to use the
repository's subtest pattern: wrap the existing test body in a t.Run("Should
...") closure (e.g., t.Run("Should handle duplicate sidecar blobs", func(t
*testing.T) { ... })) and move all current setup and assertions (setGitIdentity,
newMainRepo, newBareRepo, cfg/bootstrapRepo, writes, service.Sync, service.FSCK,
journal reads and assertions) inside that closure so the top-level
TestServiceFSCKHandlesDuplicateSidecarBlobs becomes a thin wrapper that calls
the subtest; keep the same assertions and references to sidecar.New,
service.Sync, service.FSCK, and state.HydrationJournal.
---
Nitpick comments:
In `@internal/sidecar/reconcile.go`:
- Around line 491-497: The code currently appends every object ID (including
duplicates) to the slice named objects before calling s.catFileBatch, causing
repeated reads; change the logic to deduplicate object IDs (e.g., build a unique
set/map of object IDs) so s.catFileBatch is invoked only with one entry per
unique object, then use the existing objectPaths map to fan out the returned
contents to all paths associated with each object; ensure you still populate
objectPaths[object] with rel for every file and call s.catFileBatch(ctx,
sidecarDir, uniqueObjects).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 514ee61d-8b5f-4745-a82e-e941eb5ed0b5
📒 Files selected for processing (2)
internal/sidecar/reconcile.go
internal/sidecar/service_test.go
Summary
Preserves sidecar metadata for `SPEC.md` files with identical content.

Root Cause

`sidecarSnapshot` keyed sidecar object paths as `map[string]string`, so when multiple managed files had identical content and Git stored them as the same blob object, the later path overwrote the earlier one. FSCK then hydrated metadata for only one path and could report duplicate-content specs incorrectly.

Validation

`make verify` passed locally: 0 issues and 193 tests passing.

Summary by CodeRabbit
Bug Fixes
Tests
Chores