backup: DynamoDB encoder for tables and items (Phase 0a)#716
bootjp wants to merge 25 commits into `feat/backup-phase0a-sqs` from `feat/backup-phase0a-dynamodb`
Conversation
Builds on PR #714. Adds the DynamoDB encoder for the Phase 0 logical-backup decoder.

Snapshot prefixes handled:
- `!ddb|meta|table|<base64url(table)>` → `dynamodb/<table>/_schema.json` (`DynamoTableSchema` proto → DescribeTable-shaped JSON; cluster-internal fields such as `key_encoding_version` and `migrating_from_generation` are stripped from the public projection because they are not user-visible).
- `!ddb|item|<base64url(table)>|<gen>|<rest>` → per-item JSON files under `dynamodb/<table>/items/`. Hash-only tables emit `items/<pk>.json`; composite-key tables emit `items/<pk>/<sk>.json`.
- `!ddb|gsi|...` ignored (derivable from items + schema; replaying GSI rows on restore would conflict with the destination's own index maintenance).
- `!ddb|meta|gen|...` ignored (operational counter, not user state).

Implementation choices:
- Lex order (`'i' < 'm'`) means items arrive before the table schema. The encoder buffers per encoded-table segment and emits at Finalize once the schema is known, parallel to the SQS encoder's strategy.
- Table-segment parsing is unambiguous: the base64url alphabet contains no `|`, so the first `|` after the prefix is the table/gen separator. No heuristic boundary detection is needed.
- Item filename derivation reads the hash- and range-key NAMES from the schema, then looks them up in the item's attributes map. A missing required-key attribute on an item is a structural error (it could never have been GetItem-able) and surfaces as `ErrDDBInvalidItem` on Finalize.
- B-attribute (binary) primary keys take `EncodeBinarySegment` so they cannot collide with hex-shaped string keys; this matches the design's "binary keys take `b64.<base64url>`" rule.
- All 10 documented attribute kinds (S, N, B, BOOL, NULL, SS, NS, BS, L, M) are translated to their AWS-DynamoDB-JSON shapes. An empty oneof (a malformed proto value) surfaces as `NULL=true` so the dump remains deserialisable.
- `attributeValueToPublic` is split into scalar / set / composite helpers so the cyclomatic complexity stays under the package cap.
- Bundle mode (`--dynamodb-bundle-mode jsonl`) is left as a stub: `WithBundleJSONL(true)` makes Finalize return a clear "not implemented in this PR" error so the master pipeline can surface the capability gap. The per-item layout is the documented default.

Tests cover hash-only and composite-key round-trips, binary primary-key rendering, orphan items without a schema, value-magic rejection on schema and item paths, missing-key-attribute rejection at Finalize, GSI rows ignored, all 10 attribute kinds round-tripping through JSON, empty oneof → NULL, the bundle-mode stub, and malformed table-segment key rejection.
Code Review
This pull request introduces a DynamoDB encoder designed to export table schemas and items into a structured filesystem layout for logical backups. The implementation handles the conversion of internal protobuf representations into AWS-compatible JSON formats, supporting various DynamoDB attribute types and secondary indexes. Feedback suggests adopting a fail-fast error handling strategy during the finalization phase when flushing tables and refining the serialization of set attributes to ensure consistent JSON output for empty or nil collections.
```go
var firstErr error
for _, st := range d.tables {
	if st.schema == nil {
		d.emitWarn("ddb_orphan_items",
			"encoded_table", st.encoded,
			"buffered_items", len(st.items))
		continue
	}
	if err := d.flushTable(st); err != nil && firstErr == nil {
		firstErr = err
	}
}
return firstErr
```
The current error handling in Finalize correctly implements a lenient path for incomplete entries (orphan items) by emitting a warning instead of failing or dropping them. This aligns with the repository rule to avoid silently dropping entries during serialization to ensure data preservation on disk. For actual flush errors, failing fast is appropriate.
Suggested change (fail fast instead of deferring the first error):

```diff
-var firstErr error
 for _, st := range d.tables {
 	if st.schema == nil {
 		d.emitWarn("ddb_orphan_items",
 			"encoded_table", st.encoded,
 			"buffered_items", len(st.items))
 		continue
 	}
-	if err := d.flushTable(st); err != nil && firstErr == nil {
-		firstErr = err
+	if err := d.flushTable(st); err != nil {
+		return err
 	}
 }
-return firstErr
+return nil
```
References
- Avoid silently dropping entries during serialization or normalization. If placeholder or incomplete entries are valid in some contexts, provide a specific, more lenient handling path for them to ensure they are preserved on disk and avoid state inconsistencies.
```go
func setAttributeValueToPublic(av *pb.DynamoAttributeValue) map[string]any {
	switch v := av.GetValue().(type) {
	case *pb.DynamoAttributeValue_Ss:
		return map[string]any{"SS": append([]string(nil), v.Ss.GetValues()...)}
	case *pb.DynamoAttributeValue_Ns:
		return map[string]any{"NS": append([]string(nil), v.Ns.GetValues()...)}
	case *pb.DynamoAttributeValue_Bs:
		return map[string]any{"BS": append([][]byte(nil), v.Bs.GetValues()...)}
	}
	return nil
}
```
The current implementation of setAttributeValueToPublic can produce null in the JSON output for nil set slices. To avoid state inconsistencies during normalization and ensure entries are correctly preserved, handle nil set slices to produce an empty JSON array [] instead of null.
Suggested change (build non-nil slices so empty sets marshal as `[]`):

```go
func setAttributeValueToPublic(av *pb.DynamoAttributeValue) map[string]any {
	switch v := av.GetValue().(type) {
	case *pb.DynamoAttributeValue_Ss:
		return map[string]any{"SS": append(make([]string, 0, len(v.Ss.GetValues())), v.Ss.GetValues()...)}
	case *pb.DynamoAttributeValue_Ns:
		return map[string]any{"NS": append(make([]string, 0, len(v.Ns.GetValues())), v.Ns.GetValues()...)}
	case *pb.DynamoAttributeValue_Bs:
		return map[string]any{"BS": append(make([][]byte, 0, len(v.Bs.GetValues())), v.Bs.GetValues()...)}
	}
	return nil
}
```
References
- Avoid silently dropping entries during serialization or normalization to ensure they are preserved on disk and avoid state inconsistencies.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63bfe8db9b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```go
enc := rest[:idx]
if _, err := base64.RawURLEncoding.DecodeString(enc); err != nil {
	return "", errors.Wrap(ErrDDBMalformedKey, err.Error())
}
return enc, nil
```
Preserve item generation when grouping DynamoDB rows
Item keys encode !ddb|item|<table>|<generation>|..., but this parser drops everything after the first separator and the encoder buffers rows only by table name. In real clusters, old-generation rows can coexist temporarily (e.g., delete/recreate before async cleanup finishes), and those stale rows will be emitted under the current schema; this can silently restore deleted data or make Finalize fail when primary-key names changed across generations. The backup path should retain generation and filter rows against the active schema generation (and any explicitly supported migration generation) instead of merging all generations.
Three issues, all valid.

**Codex P1 #237 — preserve item generation when grouping rows.** Item keys carry `!ddb|item|<table>|<generation>|...` but the prior parser dropped the generation and Finalize merged all generations under the active schema. In real clusters, in-flight delete/recreate cleanup leaves stale-gen rows visible in the snapshot for a window; those rows would silently restore as items under the new schema (potentially resurrecting deleted data, or failing Finalize when PK names changed across generations). Fix:
- `parseDDBItemKey` now returns `(encodedTable, generation)`.
- `ddbTableState.items` → `itemsByGen map[uint64][]*pb.DynamoItem`.
- `flushTable` filters by `st.schema.GetGeneration()` and emits a `ddb_stale_generation_items` warning carrying the count and active gen so the operator can correlate the orphan window.

**Gemini #182 — fail fast on flush errors.** Finalize previously deferred the first error and continued through the remaining tables. Real flush errors (out of disk, bad permissions) should surface immediately, not be misattributed to a later table whose flush also fails. Orphan-table warnings still continue (those are informational, not errors).

**Gemini #442 — empty set serializes as `[]`, not `null`.** `setAttributeValueToPublic` now uses make+append so nil/empty SS/NS/BS become `{"SS":[]}` rather than `{"SS":null}`. Downstream tools that distinguish "present-but-empty" from "missing" no longer see a misleading `null`.

Tests: `TestDDB_StaleGenerationItemsExcludedAndWarned`, `TestDDB_EmptyStringSetSerializesAsEmptyArrayNotNull`, `TestDDB_ParseItemKeyExtractsGeneration`. Existing fixtures updated to set `Generation=1` on schemas (matching the item key gen).
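The generation-filtering flush can be sketched as follows. These are hypothetical stand-in shapes (`tableState`, `flushActive`), not the PR's exact types — the point is that only rows whose generation matches the schema's active generation are emitted, and everything else is counted into the stale warning:

```go
package main

import "fmt"

type item struct{ pk string }

// tableState mirrors the commit message's itemsByGen buffering: rows are
// grouped by the generation parsed out of the item key.
type tableState struct {
	itemsByGen map[uint64][]item
	schemaGen  uint64 // schema.GetGeneration()
}

// flushActive returns the rows belonging to the active generation plus the
// count of stale rows excluded (surfaced via a ddb_stale_generation_items
// warning in the real encoder).
func flushActive(st tableState) (active []item, stale int) {
	for gen, rows := range st.itemsByGen {
		if gen == st.schemaGen {
			active = append(active, rows...)
			continue
		}
		stale += len(rows)
	}
	return active, stale
}

func main() {
	st := tableState{
		schemaGen: 7,
		itemsByGen: map[uint64][]item{
			7: {{pk: "a"}, {pk: "b"}},
			5: {{pk: "deleted-before-recreate"}}, // stale debris
		},
	}
	active, stale := flushActive(st)
	fmt.Println(len(active), stale) // 2 1
}
```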
3 issues addressed:
- Codex P1 #237 — preserve item generation when grouping rows
- Gemini MEDIUM #182 — fail fast on flush errors
- Gemini MEDIUM #442 — empty set → `[]`

@claude review
Claude finished @bootjp's task in 3m 30s. Review:
| Finding | Severity |
|---|---|
| Codex P1: generation filtering | ✅ Correctly fixed |
| Gemini #182: fail-fast flush errors | ✅ Correctly fixed |
| Gemini #442: empty set → `[]` | ✅ Correctly fixed |
| `MigratingFromGeneration` excluded without distinct warning | Low — conservative but silent |
| Non-deterministic `_schema.json` ordering | Low — readability / diffability |
| Single-stale-gen test only | Low — coverage gap |
The core correctness fixes are solid. The two "Low" findings would improve operator experience but don't affect backup safety. Happy to address any of them if you want them in this PR.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fdc9b36b6b
```go
		"stale_count", stale,
		"hint", "stale-gen rows are excluded from the dump; restore would otherwise emit them under the new schema")
}
active := st.itemsByGen[activeGen]
```
Include migration source generation when emitting DynamoDB items
flushTable now emits only st.itemsByGen[activeGen], but DynamoDB tables can be in a live migration state where schema.generation is new and schema.migrating_from_generation still contains readable data; the read path explicitly falls back to that source generation during migration (see readLogicalItemAt in adapter/dynamodb.go). In that state, this filter silently drops unmigrated rows from backups, so restoring the dump loses logically visible items if a snapshot is taken mid-migration. Fresh evidence in this revision is the hard filter at this line with no migrating_from_generation inclusion path.
**Codex P1 #227 (round 2).** During a generation migration, `schema.Generation` is the new gen and `schema.MigratingFromGeneration` carries the source. The live read path (`adapter/dynamodb.go` `readLogicalItemAt`) falls back to the source for items not yet copied. The previous `flushTable` filtered to only `schema.GetGeneration()` and silently dropped unmigrated rows from the dump.

`flushTable` now emits items from BOTH generations. Order matters: the source is emitted FIRST, then the active gen, so when both generations carry the same primary key the active-gen content is the one that survives `writeFileAtomic`'s tmp+rename — matching the live code's "new gen wins on conflict" rule.

The stale-generation warning was generalised to take an emit-set (`totalStaleItemsExcluding`); rows in any other generation are counted into the warning so the operator still sees the orphan window.

Tests: `TestDDB_MigrationSourceGenerationItemsAreEmitted`, `TestDDB_NewGenerationWinsOverMigrationSourceForSameKey`.
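The source-first emit order can be sketched with a last-write-wins map standing in for `writeFileAtomic`'s rename-over behaviour. `emitOrder` and its shapes are illustrative, not the PR's API:

```go
package main

import "fmt"

// emitOrder sketches the round-2 rule: during a live migration the source
// generation is written FIRST, then the active generation, so when both
// carry the same primary key the later (active) write survives the atomic
// tmp+rename — matching the live "new gen wins on conflict" rule.
func emitOrder(activeGen, migratingFrom uint64, itemsByGen map[uint64][]string) map[string]string {
	written := map[string]string{} // filename -> generation label that owns it
	emit := func(gen uint64, label string) {
		for _, pk := range itemsByGen[gen] {
			written[pk] = label // a later write replaces an earlier one, like rename-over
		}
	}
	if migratingFrom != 0 {
		emit(migratingFrom, "source")
	}
	emit(activeGen, "active")
	return written
}

func main() {
	out := emitOrder(7, 6, map[uint64][]string{
		6: {"a", "b"}, // unmigrated rows, still logically visible
		7: {"b", "c"}, // active rows; "b" exists in both generations
	})
	fmt.Println(out["a"], out["b"], out["c"]) // source active active
}
```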
Codex P1 #227 addressed. The stale-generation warning was generalised (`totalStaleItemsExcluding`). @claude review
Claude finished @bootjp's task in 4m 21s. Review:
| Finding | Severity |
|---|---|
| Codex P1 #227: include migration-source-gen items | ✅ Correctly fixed |
| Emit order matches live `readLogicalItemAt` preference | ✅ Verified |
| `parseDDBItemKey` correct for ordered encoding | ✅ Verified |
| Same-PK collision resolves to active gen via tmp+rename | ✅ Verified |
| Three-generation scenario (stale + source + active) untested | Low |
| Non-deterministic `_schema.json` output (carried from prior round) | Low |
Core correctness is solid. The two Low findings are improvements for operator experience/testability, not correctness gaps.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce5b7dac74
```go
rawName, err := base64.RawURLEncoding.DecodeString(encoded)
if err != nil {
```
Reject empty table segment in DynamoDB meta keys
HandleTableMeta accepts !ddb|meta|table| as valid because base64.RawURLEncoding.DecodeString("") succeeds, so a truncated/malformed key is treated as table name "" instead of returning ErrDDBMalformedKey. In corrupted snapshots this can route schema output into the wrong scope and mask key corruption that Phase 0 is supposed to surface loudly.
```go
genEnd := strings.IndexByte(afterTable, '|')
if genEnd <= 0 {
	return "", 0, errors.Wrapf(ErrDDBMalformedKey,
		"item key missing gen/rest separator: %q", key)
}
```
Reject item keys with no payload after generation
parseDDBItemKey only checks that a | exists after the generation, but it does not require bytes after that separator, so keys like !ddb|item|<table>|7| are accepted. This lets truncated item keys bypass malformed-key detection and be emitted using only value-side attributes, which can hide snapshot corruption instead of failing fast.
Two Codex P2 strict-validation follow-ups.

**#117 — empty table-meta segment.** `HandleTableMeta` accepted `!ddb|meta|table|` (no encoded segment) because `base64.RawURLEncoding.DecodeString("")` returns empty bytes without error, so the schema would route under the empty table name. Now rejected with `ErrDDBMalformedKey` before the JSON decode. Test: `TestDDB_RejectsTableMetaKeyWithEmptySegment`.

**#303 — truncated item key.** `parseDDBItemKey` accepted `!ddb|item|<table>|7|` (gen separator present, no primary-key payload). The gen-end check was `genEnd > 0`, which a trailing `|` satisfies. Added a follow-up check that `genEnd+1 != len(afterTable)` so a payload-less key surfaces as `ErrDDBMalformedKey` rather than emitting under value-side attributes only. Test: `TestDDB_RejectsItemKeyWithEmptyPrimaryKeyPayload`.
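Both guards reduce to small checks on the raw key text. The sketch below uses hypothetical helper names (`checkTableMetaSegment`, `checkItemPayload`), not the PR's exact functions, but the conditions mirror the two fixes:

```go
package main

import (
	"fmt"
	"strings"
)

// checkTableMetaSegment guards the #117 case: DecodeString("") succeeds in
// Go's encoding/base64, so an empty encoded segment must be rejected before
// decoding, or the schema routes under the empty table name.
func checkTableMetaSegment(encoded string) error {
	if encoded == "" {
		return fmt.Errorf("malformed key: empty table segment")
	}
	return nil
}

// checkItemPayload guards the #303 case: a gen separator with nothing after
// it ("7|") must not pass as a valid item key.
func checkItemPayload(afterTable string) error {
	genEnd := strings.IndexByte(afterTable, '|')
	if genEnd <= 0 {
		return fmt.Errorf("malformed key: missing gen/rest separator")
	}
	if genEnd+1 == len(afterTable) { // trailing '|' with no payload
		return fmt.Errorf("malformed key: empty primary-key payload")
	}
	return nil
}

func main() {
	fmt.Println(checkTableMetaSegment(""))     // rejected
	fmt.Println(checkItemPayload("7|"))        // rejected
	fmt.Println(checkItemPayload("7|pkbytes")) // accepted: <nil>
}
```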
2 Codex P2 strict-validation follow-ups addressed:
- #117 — empty table-meta segment
- #303 — truncated item key

@claude review
Claude finished @bootjp's task in 2m 51s. Review:
| Finding | Severity |
|---|---|
| Codex P2 #117: empty table-meta segment rejected | ✅ Correctly fixed |
| Codex P2 #303: empty item-key payload rejected | ✅ Correctly fixed |
| Both guards trace correctly for normal + truncated inputs | ✅ Verified |
| `tableEnd <= 0` + new `genEnd+1` guards cover all truncation points | ✅ Verified |
| Three-generation scenario (stale + source + active) untested | Low — carried from round 2 |
| Non-deterministic `_schema.json` output | Low — carried from round 2 |
This PR is ready to merge from a correctness standpoint. The two Low items are quality-of-life improvements (testability and backup diff stability) that can land in a follow-up.
💡 Codex Review — flagged `elastickv/internal/backup/sqs.go` lines 358–360 and lines 189–190 in 4ae70be.
💡 Codex Review — flagged `elastickv/internal/backup/sqs.go` lines 185–189 in 52dbd32.
💡 Codex Review — flagged `elastickv/internal/backup/sqs.go` lines 189–190 in 9076ef9.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d67baf8be5
```go
case *pb.DynamoAttributeValue_N:
	return EncodeSegment([]byte(v.N)), nil
case *pb.DynamoAttributeValue_B:
```
Canonicalize numeric key segments before naming item files
The filename for numeric primary-key attributes is derived from the raw literal (EncodeSegment([]byte(v.N))), but DynamoDB key equality is numeric (e.g., "1" and "1.0" are the same logical key). In migration mode this breaks the intended “active generation wins” behavior: if source and active rows use equivalent numeric literals with different text forms, they are written to different paths and both survive in the backup, so restore can replay duplicate/stale logical rows. This appears when a table has N hash/range keys and mixed numeric string representations across generations.
```go
for name, gsi := range s.GetGlobalSecondaryIndexes() {
	g := publicGSI{
```
Stabilize schema array order before writing _schema.json
schemaToPublic builds global_secondary_indexes and attribute_definitions by ranging over Go maps, which have non-deterministic iteration order. That means identical snapshots can produce different _schema.json byte output across runs, undermining deterministic backup artifacts and creating noisy diffs even when data is unchanged. Sort keys before appending these arrays.
**Round 4.** Codex round 9 raised two issues on commit d67baf8:

1. **P1:** `ddbKeyAttrToSegment` was emitting `EncodeSegment([]byte(v.N))` for numeric primary-key attributes, but DynamoDB N equality is numeric — `"1"` and `"1.0"` (or `"100"` and `"1e2"`, or `"0.5"` and `"5e-1"`) name the same logical item. In migration mode, where source and active rows used different decimal text for the same value, both rows survived at distinct paths and restore replayed duplicates. The fix mirrors the live adapter's `canonicalNumberString` (adapter/dynamodb.go:7651), which uses `big.Rat` — the same canonical form keeps backup filenames in lockstep with the live equality check.
2. **P2:** `schemaToPublic` ranged over Go maps for both `global_secondary_indexes` and `attribute_definitions`, so identical snapshots produced different `_schema.json` byte output across runs. Sort by name before append.

Tests:
- `TestDDB_CanonicalNumberKeySegment`: equivalence pairs (`"1"`/`"1.0"`, `"100"`/`"1e2"`, `"-0"`/`"0"`, `"0.5"`/`"5e-1"`) collapse to the same key segment.
- `TestDDB_SchemaJSONIsDeterministic`: 32 calls to `schemaToPublic` on the same schema produce identical `attribute_definitions` and GSI orders, both matching the documented sort-by-name.
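A hedged sketch of `big.Rat`-based canonicalisation: the PR's `canonicalDDBNumber` may use a different output format, but `Rat.RatString` is one consistent canonical form that collapses every equivalent literal to the same string:

```go
package main

import (
	"fmt"
	"math/big"
)

// canonicalNumber maps equivalent DynamoDB N literals to one canonical
// string so filename collisions happen exactly when numeric equality holds.
// big.Rat.SetString accepts decimal and exponent forms ("1.0", "1e2", "5e-1").
func canonicalNumber(literal string) (string, error) {
	r, ok := new(big.Rat).SetString(literal)
	if !ok {
		return "", fmt.Errorf("not a number: %q", literal)
	}
	// RatString prints "a" for integers and "a/b" otherwise; either way it
	// is unique per rational value, which is what the filename needs.
	return r.RatString(), nil
}

func main() {
	for _, pair := range [][2]string{{"1", "1.0"}, {"100", "1e2"}, {"0.5", "5e-1"}} {
		a, _ := canonicalNumber(pair[0])
		b, _ := canonicalNumber(pair[1])
		fmt.Println(a == b) // true for each equivalence pair
	}
}
```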
@claude review — Round 4 fixes for Codex round 9 on commit d67baf8.
Claude finished @bootjp's task in 4m 28s. Review:
| Finding | Severity |
|---|---|
| Codex P1 (round 9): N key canonicalization via `big.Rat` | ✅ Correctly fixed |
| Codex P2 (round 9): non-deterministic `_schema.json` output | ✅ Correctly fixed |
| `canonicalDDBNumber` matches live `canonicalNumberString` byte-for-byte | ✅ Verified |
| Rational form `"1/2"` for non-integers percent-encoded safely by `EncodeSegment` | ✅ Verified safe |
| N value in item body still uses original string (not canonical) | ✅ Correct design |
| Three-generation stale scenario (stale + source + active) untested | Low — carried from round 3 |
| No end-to-end N-key migration-conflict test for P1 fix | Low — unit test present; S-key integration test present |
Core correctness is solid. The two Low findings are additional test coverage improvements; the unit-level proof for N canonicalization is already in TestDDB_CanonicalNumberKeySegment.
💡 Codex Review — flagged `elastickv/internal/backup/sqs.go` lines 251–252 and lines 280–289 in 81ff799.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 34ea5bad26
```go
if err != nil {
	return err
}
dir := filepath.Join(itemsDir, hashFilename)
```
Reject dot segments before joining composite-key paths
writeDDBItem uses filepath.Join(itemsDir, hashFilename) for composite keys, but hashFilename comes from EncodeSegment, which can legally return "." or ".." for string/number key values. In that case Go path-cleaning escapes or collapses the items/ directory, so an item like hash=".." and range="_schema" is written as <table>/_schema.json and can overwrite the schema file (or other table-level files). This affects any backup containing such key values and can silently corrupt output.
**Codex P1 round 12 (commit 34ea5ba).** `writeDDBItem` joined `itemsDir` with the encoded hash/range key filename. `EncodeSegment` treats `.` and `..` as RFC 3986-unreserved and preserves them, so a DynamoDB S/N key value of `"."` or `".."` flows through as a bare dot-segment filename. `filepath.Join` then either collapses `<itemsDir>/.` back to `itemsDir` or resolves `<itemsDir>/..` to the parent — letting an item like hash=`".."` range=`"_schema"` overwrite the table-level `_schema.json`.

`refuseDotSegmentFilename` rejects sole-dot encoded segments for both hash and range keys; it surfaces `ErrDDBInvalidItem` so existing callers branching on `errors.Is` keep their semantics.

Test: `TestDDB_DotSegmentKeyRejected` covers all four shapes (hash=`"."` / hash=`".."` / range=`"."` / range=`".."`).
@claude review — Round 5 fix for Codex P1 round 12 on commit 34ea5ba. Added `refuseDotSegmentFilename`.
Claude finished @bootjp's task in 3m 37s. Review:
| Case | Schema type | hash | range | guard fired |
|---|---|---|---|---|
| `hash_dot` | hash-only | `"."` | — | hash guard ✓ |
| `hash_dotdot` | hash-only | `".."` | — | hash guard ✓ |
| `range_dot` | composite | `"ok"` | `"."` | range guard ✓ |
| `range_dotdot` | composite | `"ok"` | `".."` | range guard ✓ |
All four assert errors.Is(err, ErrDDBInvalidItem). ✓
Carried from prior rounds — still unaddressed (both Low):

1. Three-generation scenario untested. `totalStaleItemsExcluding` correctly handles multiple non-included gens, but no test exercises `stale_count` with more than one stale generation (e.g., schema gen=7, migrationSource=6, stale debris at gen=5).
2. No end-to-end N-key migration conflict test. `TestDDB_CanonicalNumberKeySegment` unit-tests the canonicalization itself, and `TestDDB_NewGenerationWinsOverMigrationSourceForSameKey` uses S-type keys. A Finalize-level test with N-type hash keys in both generations would directly reproduce the original round 9 bug.
Summary
| Finding | Severity |
|---|---|
| Codex P1 round 12: dot-segment item key rejected via `refuseDotSegmentFilename` | ✅ Correctly fixed |
| B-type keys cannot produce dot-segments (guard can't fire, not needed) | ✅ Verified safe |
| N-type `"."` / `".."` also caught via `canonicalDDBNumber` fallthrough + guard | ✅ Verified |
| Guard placement: fires before any `os.MkdirAll` or `writeFileAtomic` | ✅ Verified correct |
| Table name segment in `flushTable` not guarded for dot-segments | Low — corrupted-snapshot scenario only; real DynamoDB naming rules prevent it |
| Three-generation scenario (stale + source + active) untested | Low — carried from round 3 |
| No Finalize-level N-key migration conflict test | Low — carried from round 4 |
The core fix is solid. The single new Low finding (table name segment) is a defense-in-depth improvement for malformed snapshots; it does not affect backup safety with real DynamoDB data.
Summary
Stacked on top of #714. Adds the DynamoDB encoder for the Phase 0 logical-backup decoder.
Snapshot prefixes handled:
- `!ddb|meta|table|<base64url(table)>` → `dynamodb/<table>/_schema.json` (DescribeTable-shaped JSON)
- `!ddb|item|<base64url(table)>|<gen>|<rest>` → per-item JSON under `dynamodb/<table>/items/`:
  - `items/<pk>.json`
  - `items/<pk>/<sk>.json`
  - `items/b64.<base64url>[/...].json` (no collision with hex-shaped string keys)
- `!ddb|gsi|*` ignored (derivable; replaying would conflict with destination index maintenance)
- `!ddb|meta|gen|*` ignored (operational counter)

Why buffer + emit at Finalize

Lex order is `'i' < 'm'`, so items arrive before the schema. The encoder buffers per encoded-table segment and emits once the schema is known. Item filename derivation reads the hash/range key NAMES from the schema, then looks them up in the item's attributes map.

All 10 attribute kinds covered

S / N / B / BOOL / NULL / SS / NS / BS / L / M are translated to their AWS-DynamoDB-JSON shapes. An empty oneof (malformed proto value) surfaces as `NULL=true` so the dump remains deserialisable. `attributeValueToPublic` is split into scalar/set/composite helpers to keep cyclomatic complexity under the package cap.

Bundle mode is stubbed

`WithBundleJSONL(true)` makes Finalize return a clear "not implemented in this PR" error. The per-item layout (the design's documented default) is what this PR delivers.

Test plan

- `go test -race ./internal/backup/...` — pass.
- `golangci-lint run ./internal/backup/...` — clean.

Stacking

Base: `feat/backup-phase0a-sqs` (PR #714).