Skip to content

backup: #35 - validate name/dir consistency in S3 + DDB reverse encoders#912

Merged
bootjp merged 2 commits into
mainfrom
backup/m4-m5-namedir-consistency
Jun 3, 2026
Merged

backup: #35 - validate name/dir consistency in S3 + DDB reverse encoders#912
bootjp merged 2 commits into
mainfrom
backup/m4-m5-namedir-consistency

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Jun 3, 2026

Summary

  • S3 (encodeBucket) and DDB (encodeTable) now check that the on-disk subdirectory matches EncodeSegment([]byte(name)) from _bucket.json / _schema.json before emitting any records.
  • Closes task use raft.TCPTransport #35 (the M4 + M5 follow-up the original session-summary tracked).

Background

The reverse encoders use two identifiers for the same logical entity:

  • On-disk subdir: s3/<bucketDir>/, dynamodb/<tableDir>/ — decoder spells as EncodeSegment([]byte(name)).
  • Embedded name in _bucket.json / _schema.json — encoder uses for MVCC keys (s3keys.BucketMetaKey(pub.Name), base64'd tableName).

Until now, both sides were trusted independently. A tampered or hand-rebuilt dump that renamed the directory without touching the JSON (or vice versa) would silently emit records keyed by the JSON name while pulling object / item bytes from a mismatched filesystem path — a name/dir consistency violation that splits a single restore artifact across two identities.

Fix

After the existing empty-name guard, both encoders verify EncodeSegment([]byte(name)) == <dir>. Mismatch fails closed with the existing per-adapter sentinel (ErrS3EncodeInvalidBucket / ErrDDBEncodeInvalidSchema), and the error message names both the JSON value and the encoded form so the operator can correct the dump.

Caller audit (CLAUDE.md semantic-change rule)

  • encodeBucket (S3): success path gains one new fail-closed check; sole production caller is S3RecordEncoder.Encode.
  • encodeTable (DDB): success path gains one new fail-closed check; sole production caller is DynamoDBEncoder.Encode.
  • Both checks reuse existing per-adapter sentinels, so runAdapterEncoders' errors.Mark(ErrEncodeAdapterData) wrap still routes failures through the CLI's exit-2 classification — no new sentinel needed in classifyEncodeError.

Self-review of code changes (5 lenses)

  1. Data loss — Stricter validation; no commit/apply path touched. No data-loss surface.
  2. Concurrency / distributed failures — Encoder is single-process / single-goroutine on the input tree. No new concurrency surface.
  3. Performance — One extra EncodeSegment call and string compare per bucket / per table. Negligible.
  4. Data consistency — This IS the data-consistency improvement. The check enforces the round-trip invariant <dir> == EncodeSegment(name) for every published record.
  5. Test coverage — Two new tests (one per adapter) pin the rejection. Existing positive tests use EncodeSegment for the dir spelling, so they remain green; existing rejection tests trip earlier guards, also green.

Test plan

  • go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/...
  • golangci-lint --config=.golangci.yaml run
  • Bot review cycle (claude / codex / CodeRabbit)

The S3 and DDB reverse encoders use two different identifiers for
the same logical entity:

  - The on-disk subdirectory name (`s3/<bucketDir>/`,
    `dynamodb/<tableDir>/`), which the decoder spells as
    `EncodeSegment([]byte(name))`.
  - The name embedded inside `_bucket.json` / `_schema.json`, which
    the encoder uses to build MVCC keys
    (`s3keys.BucketMetaKey(pub.Name)`,
    `base64.RawURLEncoding.EncodeToString([]byte(tableName))`).

Until now, the encoder trusted both sides independently. A tampered
or hand-rebuilt dump where the operator renamed the directory
without touching the JSON (or vice versa) would silently emit
records keyed by the JSON name while pulling object/item bytes
from a mismatched filesystem path — a name/dir consistency
violation that splits a single restore artifact across two
different identities.

## Fix

`encodeBucket` (S3) and `encodeTable` (DDB) now check, immediately
after the empty-name / empty-key guards, that
`EncodeSegment([]byte(name)) == <dir>`. Mismatch fails closed with
the existing per-adapter sentinel (`ErrS3EncodeInvalidBucket` /
`ErrDDBEncodeInvalidSchema`), with an error message that names
both the JSON value and the encoded form so an operator can
correct the dump.

## Pinned by

- `TestS3EncodeRejectsNameDirMismatch`: dir `wrong-dir` +
  `_bucket.json` with `name=real` → `ErrS3EncodeInvalidBucket`.
- `TestDDBEncodeRejectsNameDirMismatch`: dir `wrong-dir` +
  `_schema.json` with `table_name=real` → `ErrDDBEncodeInvalidSchema`.

All existing positive tests use `EncodeSegment([]byte(name))` for
the dir name, so they continue to pass. Existing rejection tests
trip earlier guards (empty name, format_version, schema-shape)
before the new consistency check fires, so they also remain green.

## Caller audit per CLAUDE.md semantic-change rule

- `encodeBucket` (S3): success path gains one new fail-closed
  check between the empty-name guard and the meta-record emission.
  Sole production caller is `S3RecordEncoder.Encode`.
- `encodeTable` (DDB): success path gains one new fail-closed
  check between the empty-name guard and the schema-marshal step.
  Sole production caller is `DynamoDBEncoder.Encode`.

Both new checks use the existing per-adapter sentinels, so the
encoder's existing `errors.Mark(ErrEncodeAdapterData)` wrap at
`runAdapterEncoders` still routes the failure through the CLI's
exit-2 (data-correctness) classification — no new sentinel needed
in `classifyEncodeError`.

Tests + lint green.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 20 minutes and 43 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 429563b6-1c4f-419f-913d-6862616c9c41

📥 Commits

Reviewing files that changed from the base of the PR and between 6ffdbb2 and 12d7dfd.

📒 Files selected for processing (4)
  • internal/backup/encode_dynamodb.go
  • internal/backup/encode_dynamodb_test.go
  • internal/backup/encode_s3.go
  • internal/backup/encode_s3_test.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch backup/m4-m5-namedir-consistency

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces name/directory consistency checks for both DynamoDB and S3 encoders to ensure that the on-disk directory name matches the encoded table or bucket name, preventing silent consistency violations in tampered or hand-rebuilt dumps. Unit tests have been added to verify that the encoders fail closed on mismatches. All review comments have been filtered out, and I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 3, 2026

@claude review

This is the implementation PR for follow-up task #35 (name/dir consistency check). Adds a fail-closed validation in both encodeBucket (encode_s3.go) and encodeTable (encode_dynamodb.go) that verifies EncodeSegment(pub.Name) == bucketDir (and the DDB equivalent), so a hand-edited dump where the on-disk dir name doesn't match what the encoded name would produce fails closed via ErrS3EncodeInvalidBucket / ErrDDBEncodeInvalidSchema.

Tests pinned:

  • TestS3EncodeRejectsNameDirMismatch
  • TestDDBEncodeRejectsNameDirMismatch

Gemini posted "no feedback" earlier; no other bot reviews yet.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 3, 2026

Claude finished @bootjp's task in 2m 58s —— View job


Code Review

  • Gather context and understand the request
  • Read changed files
  • Review encode_s3.go changes
  • Review encode_dynamodb.go changes
  • Review test files
  • Post findings

The implementation is correct and the security intent is sound. One pre-existing test is silently broken by the check ordering, and there's a minor redundancy in the error messages.

Finding 1 — TestDDBEncodeRejectsEmptyHashKey no longer exercises its guard (test correctness bug)

encodeTable in encode_dynamodb.go now orders checks as:

  1. empty tableName (line 119)
  2. name/dir consistency (line 130) ← new
  3. empty HashKey (line 138) ← was the target of the test

TestDDBEncodeRejectsEmptyHashKey uses dir="tbl" with table_name:"x" in the JSON. Since EncodeSegment("x") = "x" ≠ "tbl", the new name/dir guard fires first and returns ErrDDBEncodeInvalidSchema — same sentinel, so the test still passes, but it is now testing the name/dir check, not the empty-hash-key check it purports to pin.

The other DDB rejection tests with mismatched dir/table_name pairs are unaffected because they fail earlier — inside readSchema (format-version or rejectDuplicateSchemaNames) before encodeTable ever reaches either check.

Fix — pick one of these:

// Option A: keep table_name="x", correct the dir to match
writeDDBSchema(t, in, EncodeSegment([]byte("x")),
    []byte(`{"format_version":1,"table_name":"x","primary_key":{"hash_key":{"name":""}}}`))

// Option B: keep dir="tbl", align table_name to match
writeDDBSchema(t, in, "tbl",
    []byte(`{"format_version":1,"table_name":"tbl","primary_key":{"hash_key":{"name":""}}}`))

The S3 test suite is clean — all S3 rejection tests either use writeS3Bucket (which sets the dir to EncodeSegment(name)) or fail inside readBucketMeta before the name/dir check is reached.

Fix this →

Finding 2 — Error message repeats bucketDir/tableDir twice (cosmetic nit)

encode_s3.go:139–141 and encode_dynamodb.go:131–133 both produce messages of the form:

wrong-dir/_bucket.json: bucket name "real" encodes to dir "real", not "wrong-dir"

The first %s and the trailing not %q both expand to the same variable (bucketDir / tableDir). The message is understandable, but "encodes to want, not bucketDir" where bucketDir is already the path prefix is redundant. An alternative:

wrong-dir/_bucket.json: bucket name "real" encodes to "real", got dir "wrong-dir"

This is cosmetic-only — not a correctness issue.

Implementation assessment

The core logic is correct:

  • Placement after the empty-name guard and before any record emission is the right fail-fast order.
  • Reusing the existing per-adapter sentinels (ErrS3EncodeInvalidBucket / ErrDDBEncodeInvalidSchema) means classifyEncodeError's errors.Mark(ErrEncodeAdapterData) wrapping continues to route failures through exit-2 without any new sentinel needed — this was explicitly audited in the PR description and the code bears it out.
  • The S3 tests are comprehensive and correctly structured.

One actionable fix needed: update TestDDBEncodeRejectsEmptyHashKey to use a matching dir/table_name pair so it exercises the guard it claims to pin.

Claude's #912 review caught that this test no longer exercises its
target guard after the name/dir consistency check landed:

- Test used dir="tbl" with table_name="x"
- New name/dir check (line 130 of encodeTable) fires first since
  EncodeSegment("x")="x" != "tbl", returning ErrDDBEncodeInvalidSchema
- Empty-hash-key guard at line 138 is never reached
- Same sentinel means the test silently passed while pinning the
  WRONG behavior (a name/dir mismatch test, not the empty-hash-key
  test it documents)

Fix: align dir to EncodeSegment("x") so the name/dir check passes
and the empty-hash-key guard at line 138 is actually exercised.
Added a comment explaining the fixture's gate ordering so a future
guard insertion can't silently break this test again.

No production code change — encoder gates are unchanged; only the
test fixture is realigned.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 3, 2026

@claude review

Fixes the claude review finding on the prior revision:

Finding 1 (TestDDBEncodeRejectsEmptyHashKey silent regression): The test used dir="tbl" with table_name="x". After the new name/dir consistency check at encodeTable L130 (EncodeSegment("x")="x" != "tbl"), the name/dir guard fired first with the same ErrDDBEncodeInvalidSchema sentinel, so the test silently exercised the wrong guard. Now uses dir=EncodeSegment("x") so the gate ordering reaches the empty-hash-key check at L138. Added a comment documenting the gate ordering so a future inserted guard can't silently break this test again.

Finding 2 (cosmetic redundancy): Left as-is per claude's note that it's cosmetic-only.

No production code change — encoder gates are unchanged; only the test fixture is realigned.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 3, 2026

Claude finished @bootjp's task in 1m 38s —— View job


Code Review

  • Gather context and understand the request
  • Read changed files
  • Verify fix for TestDDBEncodeRejectsEmptyHashKey
  • Check remaining dir="tbl" tests are unaffected
  • Confirm S3 tests are clean

Finding 1 (TestDDBEncodeRejectsEmptyHashKey) — Fixed correctly.

The revised fixture at encode_dynamodb_test.go:248 now uses writeDDBSchema(t, in, EncodeSegment([]byte("x")), ...) with table_name:"x". Since EncodeSegment("x") == "x", the name/dir consistency check at encodeTable:130 passes, and the empty-hash-key guard at line 138 fires as intended. The gate-ordering comment at lines 240–244 is a solid precaution.

Remaining dir="tbl" / table_name="x" tests — all safe.

All four tests that still use a mismatched dir="tbl" with table_name="x" trip their guards inside readSchema before encodeTable ever reaches the name/dir check:

  • TestDDBEncodeRejectsUnknownSchemaFormatVersion — fails at readSchema:189 (format_version=99 ≠ 1).
  • TestDDBEncodeRejectsNonRegularSchema — fails at the linfo.Mode().IsRegular() check.
  • TestDDBEncodeRejectsDuplicateAttributeDefinition — fails at rejectDuplicateSchemaNames (duplicate "id").
  • TestDDBEncodeRejectsDuplicateGSIName — fails at rejectDuplicateSchemaNames (duplicate "by-region").

Each test is exercising exactly the guard it names.

S3 tests — unaffected and clean. writeS3Bucket always spells the dir as EncodeSegment(name), so all S3 positive tests pass the name/dir check automatically. S3 rejection tests trip their guards before reaching the check or use an explicitly mismatched dir (the new TestS3EncodeRejectsNameDirMismatch test).


The fix is minimal, correct, and addresses the regression exactly as described. No further issues found. LGTM.

@bootjp bootjp merged commit 1d086f9 into main Jun 3, 2026
9 checks passed
@bootjp bootjp deleted the backup/m4-m5-namedir-consistency branch June 3, 2026 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant