backup: Phase 0b M4-2b impl - S3 collision-rename reversal#928
Conversation
Per design doc PR #913 §'Per-bucket keymap loading'. Adds the read-side helper that inverts the S3 decoder's KEYMAP.jsonl writes, plus the per- record invariant validator. No call sites wired up yet — commit B integrates this into the encodeBucketObjects walk. New (encode_s3_collision.go): - loadBucketKeymap: opens <bucketDir>/KEYMAP.jsonl via the read-side pipeline (root.Lstat -> refuseHardLink -> root.Open). MUST NOT use OpenSidecarFile - that helper is write-side on every platform (O_WRONLY|O_CREATE|O_TRUNC on Windows/fallback, O_WRONLY|O_CREATE + Truncate(0) on Unix) and would erase the file before reading. Codex P2 v913 v3. - readKeymapStrict: shared iterator over KeymapReader. Diverges from LoadKeymap (last-wins) — duplicate Encoded means corrupt dump, fail closed. Claude v913 v2. - validateKeymapRecord: per-record invariants — Kind in closed set {S3LeafData, MetaCollision, SHAFallback}; KindS3LeafData encoded must end in S3LeafDataSuffix; KindMetaCollision original must end in S3MetaSuffixReserved. - validateKeymapReservedRoot: rejects original keys whose first slash-split path component is _orphans or _incomplete_uploads. Whole _ namespace is NOT reserved - legit user keys like _foo / _foo/bar / nested/_orphans/x pass. Codex P2 v913 v1. - verifyKeymapTargetsExist: orphan-record detector (Encoded with no on-disk body). Runs once after load so operator sees the keymap inconsistency directly rather than as a later missing-sidecar error. O(N) Lstats per bucket. - resolveObjectKeyFromRel: lookup helper. Slash-form conversion + keymap lookup; nil keymap or miss returns slash-form rel as-is (no-collision M4-2a path). Tests (encode_s3_collision_test.go): - TestLoadBucketKeymap_HappyPath - three kinds round-trip into map. - TestLoadBucketKeymap_DuplicateEncoded - fail-closed regression for the LoadKeymap-divergence contract (claude v913 v2). - TestLoadBucketKeymap_MalformedJSON - wraps ErrInvalidKeymapRecord. - TestLoadBucketKeymap_MissingFile - (nil, nil) defensive return. - TestValidateKeymapRecord_KindLeafSuffix - 3-case table. - TestValidateKeymapRecord_KindMetaSuffix - good + bad. - TestValidateKeymapRecord_UnknownKind - forward-compat guard (claude v913 v2). - TestValidateKeymapReservedRoot - 9-case table covering boundary cases: _orphans (reject), _foo (accept), _orphansFoo (accept), nested/_orphans (accept), etc. Codex P2 v913 v1. - TestVerifyKeymapTargetsExist - present + orphan subtests. - TestResolveObjectKeyFromRel - 4-case lookup contract. Caller audit: pure additions; no existing call site changed in this commit. The loader / validator are unreachable from the encoder graph until commit B wires them into encodeBucketObjects. go test + golangci-lint clean on ./internal/backup/.
Per design doc PR #913 §'Object-walk integration'. Wires the loader from commit A into encodeBucketObjects and threads the loaded keymap through walkObjects -> walkObjectEntry -> encodeObject. For each on-disk object whose bucket-relative path appears in the keymap, the encoder now emits records under the recovered original S3 key rather than the renamed on-disk path. Sentinel removal: - ErrS3EncodeUnsupportedCollision removed - the encoder now reverses the collision renames it used to fail closed on. All wrap sites internal to this package were inside encodeBucketObjects (the now- replaced fail-closed branch) plus the test deleted below. grep confirms no external callers under /Users/bootjp/src/elastickv2. Integration points: - encodeBucketObjects: when isKeymapCollisionTracker reports true, loadBucketKeymap + verifyKeymapTargetsExist run before the walk. Both errors wrap to ErrS3EncodeInvalidBucket so the CLI's ErrEncodeAdapterData mark routes them to exit-2 unchanged. When the tracker is absent the keymap stays nil and the walk runs the M4-2a path byte-identically. - walkObjects / walkObjectEntry / walkObjectSubdir / encodeObject all gain a 'keymap map[string]KeymapRecord' parameter (nil-safe). encodeObject swaps filepath.ToSlash(objRel) for resolveObjectKeyFromRel(keymap, objRel). - walkObjectEntry: explicit skip for the top-level 'KEYMAP.jsonl' filename when keymap is non-nil. Keeps the single disambiguation-decision contract in isKeymapCollisionTracker - the walker never needs to re-derive whether KEYMAP.jsonl is a tracker (loadBucketKeymap already consumed it). Comment update: - encode_snapshot.go:655 docstring no longer cites ErrS3EncodeUnsupportedCollision (now removed). Substituted ErrS3EncodeReservedPrefixCollision + ErrS3EncodeInvalidBucket which still exist and play the same exit-2-routing example role. Test removal: - TestS3EncodeRejectsKeymapCollisionTracker deleted. The fixture (KEYMAP.jsonl with no sidecar) is exactly what M4-2b now reverses; commit C will add a positive round-trip test on the same shape. Tests preserved unchanged: - TestS3EncodeKeymapObjectRoundTrip - legitimate user 'KEYMAP.jsonl' with sidecar still round-trips (proves isKeymapCollisionTracker's sidecar disambiguation gates correctly; codex P2 v913 v2). - TestS3EncodeLeafDataObjectRoundTrip - a real object whose key ends in .elastickv-leaf-data + sidecar + NO keymap entry continues to take the legacy no-collision path. Caller audit (semantic changes): - walkObjects / walkObjectEntry / walkObjectSubdir / encodeObject - all unexported; single production call chain from encodeBucketObjects. All updated in this commit. - ErrS3EncodeUnsupportedCollision: grep on /Users/bootjp/src/elastickv2 finds zero references after this commit. go test + golangci-lint clean on ./internal/backup/.
Per design doc PR #913 §'Test plan'. Adds the 8 design-doc tests covering the end-to-end encoder behavior: - TestS3EncodeRoundTripsLeafDataCollision - both <key>.elastickv-leaf-data (renamed shorter key) and <key>/sub (longer key) round-trip; emitted !s3|obj|head|<orig> manifest records are under the recovered original S3 keys with no .elastickv-leaf-data suffix leak. - TestS3EncodeRoundTripsMetaCollision - the --rename-collisions=true variant for a user object whose key naturally ends in .elastickv-meta.json is reversed via KindMetaCollision. - TestS3EncodeRejectsKeymapTargetingReservedSubtree - 2-case sub-test matrix on _orphans/foo and _incomplete_uploads/x; integration-level pin for the design's reserved-subtree gate. - TestS3EncodeAcceptsKeymapWithUserUnderscoreKey - codex P2 v913 v1 narrow-reserved-rule: a single underscore prefix (_foo) is a legit user key and round-trips. - TestS3EncodeRejectsOrphanKeymapEntry - integration-level fail-closed for a keymap entry whose Encoded has no on-disk body. - TestS3EncodeRejectsDuplicateKeymapEntry - integration-level fail-closed for two records sharing an Encoded value. - TestS3EncodeRejectsMalformedKeymapJSON - integration-level fail-closed for an unparseable line. - TestS3EncodeMissingKeymapIsValidNoCollisionDump - regression that the M4-2a happy path is unchanged when no KEYMAP.jsonl is present. The integration tests assert at the !s3|obj|head| record-byte level via assertS3ManifestEmittedUnder rather than re-running the decoder. A full decoder round-trip would re-apply the renames (or reject the .elastickv-meta.json-suffixed key without --rename-collisions=true), obscuring whether the encoder recovered the original key. Helper shape adjustments to satisfy unparam: - assertS3ManifestEmittedUnder bakes 'b' as the bucket via the new collisionTestBucket package constant. - writeS3KeymapTracker drops its bucket parameter; uses the same shared constant. Existing tests preserved unchanged (the legitimacy gate from claude v913 v2 codex P2 still applies): - TestS3EncodeKeymapObjectRoundTrip - legitimate user object named KEYMAP.jsonl (with sidecar) still round-trips. - TestS3EncodeLeafDataObjectRoundTrip - a legit object key ending in .elastickv-leaf-data (with sidecar, no keymap entry) keeps the M4-2a no-collision path. go test + golangci-lint clean on ./internal/backup/.
|
Warning Review limit reached
More reviews will be available in 50 minutes and 22 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements the M4-2b helper to reverse S3 decoder collision-rename writes by parsing and strictly validating the KEYMAP.jsonl file. It replaces the previous behavior of failing closed upon encountering a collision tracker with a robust recovery mechanism that maps renamed on-disk files back to their original S3 keys. The review feedback suggests two key improvements to validation: ensuring that rec.Encoded does not contain path traversal segments like . or .. to prevent out-of-bucket file probing, and validating that the decoded original key is non-empty to fail closed earlier with a clear error.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func validateKeymapRecord(rec KeymapRecord) error { | ||
| if _, ok := keymapAllowedKinds[rec.Kind]; !ok { | ||
| return errors.Wrapf(ErrInvalidKeymapRecord, | ||
| "unknown kind %q for encoded segment %q", rec.Kind, rec.Encoded) | ||
| } | ||
| originalBytes, err := rec.Original() |
There was a problem hiding this comment.
To prevent path traversal or out-of-bucket file probing, validate that rec.Encoded is a clean, relative path that does not contain any ., .., or empty segments.
func validateKeymapRecord(rec KeymapRecord) error {
if _, ok := keymapAllowedKinds[rec.Kind]; !ok {
return errors.Wrapf(ErrInvalidKeymapRecord,
"unknown kind %q for encoded segment %q", rec.Kind, rec.Encoded)
}
for _, seg := range strings.Split(rec.Encoded, "/") {
if seg == "." || seg == ".." || seg == "" {
return errors.Wrapf(ErrInvalidKeymapRecord, "encoded path %q contains invalid segment %q", rec.Encoded, seg)
}
}
originalBytes, err := rec.Original()| originalBytes, err := rec.Original() | ||
| if err != nil { | ||
| return err | ||
| } | ||
| original := string(originalBytes) | ||
| if err := validateKeymapReservedRoot(rec, original); err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
S3 keys cannot be empty. Validating that the decoded original key is non-empty during keymap loading allows the encoder to fail closed earlier with a clear error message, rather than waiting until the object body assembly phase.
originalBytes, err := rec.Original()
if err != nil {
return err
}
original := string(originalBytes)
if original == "" {
return errors.Wrap(ErrInvalidKeymapRecord, "empty original key")
}
if err := validateKeymapReservedRoot(rec, original); err != nil {
return err
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92b14f56c5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if tracker { | ||
| return errors.Wrapf(ErrS3EncodeUnsupportedCollision, | ||
| "%s: collision-rename KEYMAP.jsonl present", bucketDir) | ||
| keymap, err = e.loadBucketKeymap(root, bucketDir) |
There was a problem hiding this comment.
Require a regular sidecar before skipping keymap load
When a bucket has a collision tracker and also contains a valid user object under the prefix KEYMAP.jsonl.elastickv-meta.json/, isKeymapCollisionTracker treats that directory as the companion sidecar for a literal KEYMAP.jsonl object, so tracker is false and this new keymap-loading branch is skipped. The walk then tries to encode KEYMAP.jsonl as an object and fails on the directory sidecar, rejecting an otherwise valid collision dump; the sidecar disambiguator needs to require a regular file, not just any entry, before bypassing the tracker path.
Useful? React with 👍 / 👎.
… regular-file sidecar (gemini medium x2 + codex P2) Three findings on PR #928: 1. Gemini medium (encode_s3_collision.go:136): path-traversal guard. rec.Encoded values containing '..', '.', or empty segments would let a hand-crafted keymap escape the bucket sub-root when filepath.Join joins Encoded into the Lstat path. New validateKeymapEncodedPath rejects any non-clean relative path. Pinned by TestValidateKeymapRecord_PathTraversal (6 boundary cases: leading slash, trailing slash, double slash, contains '..' / '.' / middle '..'). 2. Gemini medium (encode_s3_collision.go:143): non-empty original. S3 object keys cannot be empty. Reject the keymap record at load time so the operator sees the problem directly rather than as a downstream object-body assembly error. Pinned by TestValidateKeymapRecord_EmptyOriginal. 3. Codex P2 (encode_s3_objects.go:94): regular-file sidecar. isKeymapCollisionTracker used rootEntryExists to look for a companion sidecar at <bucket>/KEYMAP.jsonl.elastickv-meta.json, but rootEntryExists returns true for any Lstat-able entry including a directory. A user key prefix 'KEYMAP.jsonl.elastickv-meta.json/...' creates a directory at the sidecar path, so the tracker was mis-classified as a user object and keymap loading was skipped. Fix: require the sidecar to be a regular file. Only a real file disambiguates a legitimate user object; any other mode means the tracker classification stands. Pinned by TestS3EncodeTrackerWithDirectorySidecar. Caller audit: - validateKeymapRecord: split into validateKeymapEncodedPath + validateKeymapKindSuffix to keep cyclop under 10. Only callers are loadBucketKeymap (production) + test sites. - rootEntryExists: had only one caller (isKeymapCollisionTracker) which now inlines the Lstat + IsRegular check; the helper was removed. - isKeymapCollisionTracker: same single call site (encodeBucketObjects). Behavior tightened only for the dir-at- sidecar-path edge case; happy path unchanged. go test + golangci-lint clean on ./internal/backup/.
|
@claude review Addresses all 3 review findings on this PR:
Caller audit: |
|
Claude finished @bootjp's task in 3m 4s —— View job PR #928 Review — M4-2b S3 collision-rename reversalTasks:
All three review findings are correctly addressed. Details below. Finding 1: Path traversal guard —
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f033e469c5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // which is the slash-form S3 key + optional rename | ||
| // suffix). Convert to filepath form before Lstat. | ||
| rel := filepath.Join(bucketDir, filepath.FromSlash(encoded)) | ||
| _, err := root.Lstat(rel) |
There was a problem hiding this comment.
Require keymap targets to be object bodies
When a corrupt KEYMAP entry names an existing directory (for example a prefix created by another object) instead of the renamed object body, this Lstat succeeds and the verifier treats the record as valid. The object walk then recurses into that directory and never calls resolveObjectKeyFromRel for the encoded path, so Encode can finish while silently ignoring the keymap entry; fail closed here by requiring the target to be a regular file/body, not just any existing filesystem entry.
Useful? React with 👍 / 👎.
Summary
Phase 0b M4-2b implementation — S3 collision-rename reversal. The design
doc landed via PR #913 (merged 2026-06-03). This PR carries the
implementation that the doc described, in 3 commits:
c4e87440): keymap loader + per-record invariant validator +orphan detector. New file
encode_s3_collision.goplus 9 unit tests.2de1c94b): integration — dropErrS3EncodeUnsupportedCollision,thread the loaded keymap through
walkObjects→walkObjectEntry→encodeObject. Update the now-stale sentinel reference inencode_snapshot.go:655.92b14f56): 8 end-to-end tests covering the design doc'stest plan table, asserting at the
!s3|obj|head|<original-key>record-byte level via
DecodeLiveEntries.Three kinds reversed
KindKindS3LeafData<key>.elastickv-leaf-data<key>KindMetaCollision*.elastickv-meta.jsonuser key → renamed*.elastickv-meta.jsonkeyKindSHAFallback<sha>__<truncated>Fail-closed invariants (design §"Error contract")
ErrInvalidKeymapRecordEncoded(diverges fromLoadKeymaplast-wins)ErrInvalidKeymapRecord(claude v913 v2)KindS3LeafDataencoded missing.elastickv-leaf-dataErrInvalidKeymapRecordKindMetaCollisionoriginal missing.elastickv-meta.jsonErrInvalidKeymapRecordKind(closed-set forward-compat)ErrInvalidKeymapRecord(claude v913 v2)_orphans/or_incomplete_uploads/(narrow rule)ErrInvalidKeymapRecord(codex P2 v913 v1)ErrInvalidKeymapRecordEvery gate wraps to
ErrS3EncodeInvalidBucketatencodeBucketObjects,which
runAdapterEncodersmarks withErrEncodeAdapterDatafor CLIexit-2 routing.
Read-side safety (codex P2 v913 v3)
loadBucketKeymapopensKEYMAP.jsonlviaroot.Lstat→refuseHardLink→
root.Open. NOTOpenSidecarFile, which is write-side on everyplatform and would truncate the file before reading.
Legitimacy gate preserved (codex P2 v913 v2)
isKeymapCollisionTrackercontinues to distinguish the tracker file froma legitimate user object literally named
KEYMAP.jsonl. Whentracker == false(sidecar present),loadBucketKeymapis not invoked — the filetakes the normal object-body path.
TestS3EncodeKeymapObjectRoundTrip(preserved unchanged from M4-2a) pins this.
5-lens self-review
Orphan detection runs at load time; fail-closed prevents partial
restore.
O(1) lookup per walked body. Matches the Redis
loadKeymapprecedent.decoder's input. Errors fail-closed via existing sentinels routed to
exit-2.
regressions. All gate predicates pinned by ≥1 regression test.
Test plan
go test ./internal/backup/passesgolangci-lint run --config=.golangci.yaml ./internal/backup/0 issues