
backup: DynamoDB encoder for tables and items (Phase 0a) #716

Open

bootjp wants to merge 25 commits into feat/backup-phase0a-sqs from feat/backup-phase0a-dynamodb

Conversation

@bootjp
Owner

@bootjp bootjp commented Apr 30, 2026

Summary

Stacked on top of #714. Adds the DynamoDB encoder for the Phase 0 logical-backup decoder.

Snapshot prefixes handled:

  • !ddb|meta|table|<base64url(table)> → dynamodb/<table>/_schema.json (DescribeTable-shaped JSON)
  • !ddb|item|<base64url(table)>|<gen>|<rest> → per-item JSON under dynamodb/<table>/items/
    • Hash-only: items/<pk>.json
    • Composite: items/<pk>/<sk>.json
    • Binary key: items/b64.<base64url>[/...].json (no collision with hex-shaped string keys)
  • !ddb|gsi|* ignored (derivable; replaying would conflict with destination index maintenance)
  • !ddb|meta|gen|* ignored (operational counter)

Why buffer + emit at Finalize

Lex order is 'i' < 'm', so items arrive before the schema. Encoder buffers per encoded-table-segment and emits once the schema is known. Item filename derivation reads the hash/range key NAMES from the schema, then looks them up in the item's attributes map.
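
A minimal runnable sketch of that buffer-until-schema flow, using stand-in types (tableState, encoder, and the string-valued items here are illustrative; the real encoder buffers *pb.DynamoItem values and a proto schema):

package main

import "fmt"

// Stand-ins for the PR's proto-backed state.
type tableState struct {
    schema *string  // nil until the later '!ddb|meta|table|...' row arrives
    items  []string // rows buffered while the schema is still unknown
}

type encoder struct{ tables map[string]*tableState }

func (e *encoder) state(encTable string) *tableState {
    st, ok := e.tables[encTable]
    if !ok {
        st = &tableState{}
        e.tables[encTable] = st
    }
    return st
}

// Item rows sort before 'meta' rows ('i' < 'm'), so they can only be buffered.
func (e *encoder) handleItem(encTable, item string) {
    st := e.state(encTable)
    st.items = append(st.items, item)
}

// Finalize: every table whose schema arrived can now derive item filenames.
func (e *encoder) finalize() {
    for enc, st := range e.tables {
        if st.schema == nil {
            fmt.Println("warn: orphan items for table", enc)
            continue
        }
        fmt.Println("flush", enc, st.items)
    }
}

func main() {
    e := &encoder{tables: map[string]*tableState{}}
    e.handleItem("dGFibGU", `{"pk":{"S":"a"}}`)
    schema := "describe-table-json"
    e.state("dGFibGU").schema = &schema // the 'm'-prefixed schema row lands last
    e.finalize()
}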

All 10 attribute kinds covered

S / N / B / BOOL / NULL / SS / NS / BS / L / M translated to their AWS-DynamoDB-JSON shapes. Empty oneof (malformed proto value) surfaces as NULL=true so the dump remains deserialisable. attributeValueToPublic is split into scalar/set/composite helpers to keep cyclomatic complexity under the package cap.
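
For reference, a small runnable sketch of the ten shapes as plain Go maps (illustrative data; encoding/json base64-encodes []byte, matching the B/BS wire form):

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    shapes := []map[string]any{
        {"S": "hello"},
        {"N": "1.5"},
        {"B": []byte{0x01, 0x02}},
        {"BOOL": true},
        {"NULL": true}, // also the fallback emitted for an empty oneof
        {"SS": []string{"a", "b"}},
        {"NS": []string{"1", "2"}},
        {"BS": [][]byte{{0x01}}},
        {"L": []any{map[string]any{"S": "x"}}},
        {"M": map[string]any{"k": map[string]any{"N": "2"}}},
    }
    for _, s := range shapes {
        b, _ := json.Marshal(s)
        fmt.Println(string(b)) // e.g. {"B":"AQI="}, {"SS":["a","b"]}
    }
}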

Bundle mode is stubbed

WithBundleJSONL(true) makes Finalize return a clear "not implemented in this PR" error. Per-item layout (the design's documented default) is what this PR delivers.

Test plan

  • go test -race ./internal/backup/... — pass.
  • golangci-lint run ./internal/backup/... — clean.
  • 11 tests covering hash-only / composite round-trip, binary primary key, orphan items, magic-prefix rejection on schema and item paths, missing-key-attribute rejection, GSI noop, all 10 attribute kinds round-trip, empty-oneof behavior, bundle-mode stub, malformed-key rejection.

Stacking

Base: feat/backup-phase0a-sqs (PR #714).

Builds on PR #714. Adds the DynamoDB encoder for the Phase 0
logical-backup decoder.

Snapshot prefixes handled:
- !ddb|meta|table|<base64url(table)> -> dynamodb/<table>/_schema.json
  (DynamoTableSchema proto -> DescribeTable-shaped JSON; cluster-
  internal fields like key_encoding_version and migrating_from_generation
  stripped from the public projection because they're not user-visible).
- !ddb|item|<base64url(table)>|<gen>|<rest> -> per-item JSON files
  under dynamodb/<table>/items/. Hash-only tables emit
  items/<pk>.json; composite-key tables emit items/<pk>/<sk>.json.
- !ddb|gsi|... ignored (derivable from items + schema; replaying GSI
  rows on restore would conflict with the destination's own index
  maintenance).
- !ddb|meta|gen|... ignored (operational counter, not user state).

Implementation choices:
- Lex order ('i' < 'm') means items arrive before the table schema.
  Encoder buffers per encoded-table-segment and emits at Finalize
  once the schema is known, parallel to the SQS encoder's strategy.
- Table-segment parsing is unambiguous: the base64url alphabet contains
  no '|', so the first '|' after the prefix is the table/gen
  separator. No heuristic boundary detection is needed (see the
  sketch after this list).
- Item filename derivation reads the hash and range key NAMES from
  the schema, then looks them up in the item's attributes map. A
  missing required-key attribute on an item is structural error
  (it could never have been GetItem-able) and surfaces as
  ErrDDBInvalidItem on Finalize.
- B-attribute (binary) primary keys take EncodeBinarySegment so they
  cannot collide with hex-shaped string keys; matches the design's
  "binary keys take b64.<base64url>" rule.
- All 10 documented attribute kinds (S, N, B, BOOL, NULL, SS, NS,
  BS, L, M) are translated to their AWS-DynamoDB-JSON shapes.
  Empty oneof (a malformed proto value) surfaces as NULL=true so
  the dump remains deserialisable.
- attributeValueToPublic is split into scalar / set / composite
  helpers so the cyclomatic complexity stays under the package cap.
- Bundle mode (--dynamodb-bundle-mode jsonl) is left as a stub:
  WithBundleJSONL(true) makes Finalize return a clear "not
  implemented in this PR" error so the master pipeline can
  surface the capability gap. Per-item layout is the documented
  default.
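
As referenced above, a minimal runnable sketch of that boundary rule; encodedTableSegment and errMalformed are stand-in names, not identifiers from the diff:

package main

import (
    "errors"
    "fmt"
    "strings"
)

var errMalformed = errors.New("malformed ddb key") // stand-in for ErrDDBMalformedKey

// encodedTableSegment: the base64url alphabet contains no '|', so the first
// '|' after the prefix is always the table/gen separator.
func encodedTableSegment(key, prefix string) (string, error) {
    if !strings.HasPrefix(key, prefix) {
        return "", errMalformed
    }
    rest := key[len(prefix):]
    idx := strings.IndexByte(rest, '|')
    if idx <= 0 { // no separator, or empty table segment
        return "", errMalformed
    }
    return rest[:idx], nil
}

func main() {
    enc, err := encodedTableSegment("!ddb|item|dGFibGU|7|pk", "!ddb|item|")
    fmt.Println(enc, err) // dGFibGU <nil>
}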

Tests cover hash-only and composite-key round-trips, binary primary
key rendering, orphan items without schema, value-magic rejection
on schema and item paths, missing-key-attribute rejection at
Finalize, GSI rows ignored, all 10 attribute kinds round-trip
through JSON, empty-oneof -> NULL, bundle-mode stub, malformed
table-segment key rejection.
@coderabbitai

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 85e99c10-1942-4b26-877b-5b0086dbc1b2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a DynamoDB encoder designed to export table schemas and items into a structured filesystem layout for logical backups. The implementation handles the conversion of internal protobuf representations into AWS-compatible JSON formats, supporting various DynamoDB attribute types and secondary indexes. Feedback suggests adopting a fail-fast error handling strategy during the finalization phase when flushing tables and refining the serialization of set attributes to ensure consistent JSON output for empty or nil collections.

Comment thread internal/backup/dynamodb.go Outdated
Comment on lines +170 to +182
var firstErr error
for _, st := range d.tables {
    if st.schema == nil {
        d.emitWarn("ddb_orphan_items",
            "encoded_table", st.encoded,
            "buffered_items", len(st.items))
        continue
    }
    if err := d.flushTable(st); err != nil && firstErr == nil {
        firstErr = err
    }
}
return firstErr
Contributor


medium

The current error handling in Finalize correctly implements a lenient path for incomplete entries (orphan items) by emitting a warning instead of failing or dropping them. This aligns with the repository rule to avoid silently dropping entries during serialization to ensure data preservation on disk. For actual flush errors, failing fast is appropriate.

Suggested change
var firstErr error
for _, st := range d.tables {
    if st.schema == nil {
        d.emitWarn("ddb_orphan_items",
            "encoded_table", st.encoded,
            "buffered_items", len(st.items))
        continue
    }
    if err := d.flushTable(st); err != nil && firstErr == nil {
        firstErr = err
    }
}
return firstErr

for _, st := range d.tables {
    if st.schema == nil {
        d.emitWarn("ddb_orphan_items",
            "encoded_table", st.encoded,
            "buffered_items", len(st.items))
        continue
    }
    if err := d.flushTable(st); err != nil {
        return err
    }
}
return nil
References
  1. Avoid silently dropping entries during serialization or normalization. If placeholder or incomplete entries are valid in some contexts, provide a specific, more lenient handling path for them to ensure they are preserved on disk and avoid state inconsistencies.

Comment on lines +432 to +442
func setAttributeValueToPublic(av *pb.DynamoAttributeValue) map[string]any {
    switch v := av.GetValue().(type) {
    case *pb.DynamoAttributeValue_Ss:
        return map[string]any{"SS": append([]string{}, v.Ss.GetValues()...)}
    case *pb.DynamoAttributeValue_Ns:
        return map[string]any{"NS": append([]string{}, v.Ns.GetValues()...)}
    case *pb.DynamoAttributeValue_Bs:
        return map[string]any{"BS": append([][]byte{}, v.Bs.GetValues()...)}
    }
    return nil
}
Contributor


medium

The current implementation of setAttributeValueToPublic can produce null in the JSON output for nil set slices. To avoid state inconsistencies during normalization and ensure entries are correctly preserved, handle nil set slices to produce an empty JSON array [] instead of null.

Suggested change
func setAttributeValueToPublic(av *pb.DynamoAttributeValue) map[string]any {
    switch v := av.GetValue().(type) {
    case *pb.DynamoAttributeValue_Ss:
        return map[string]any{"SS": append([]string{}, v.Ss.GetValues()...)}
    case *pb.DynamoAttributeValue_Ns:
        return map[string]any{"NS": append([]string{}, v.Ns.GetValues()...)}
    case *pb.DynamoAttributeValue_Bs:
        return map[string]any{"BS": append([][]byte{}, v.Bs.GetValues()...)}
    }
    return nil
}

func setAttributeValueToPublic(av *pb.DynamoAttributeValue) map[string]any {
    switch v := av.GetValue().(type) {
    case *pb.DynamoAttributeValue_Ss:
        return map[string]any{"SS": append([]string(nil), v.Ss.GetValues()...)}
    case *pb.DynamoAttributeValue_Ns:
        return map[string]any{"NS": append([]string(nil), v.Ns.GetValues()...)}
    case *pb.DynamoAttributeValue_Bs:
        return map[string]any{"BS": append([][]byte(nil), v.Bs.GetValues()...)}
    }
    return nil
}
References
  1. Avoid silently dropping entries during serialization or normalization to ensure they are preserved on disk and avoid state inconsistencies.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63bfe8db9b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/dynamodb.go Outdated
Comment on lines +233 to +237
enc := rest[:idx]
if _, err := base64.RawURLEncoding.DecodeString(enc); err != nil {
    return "", errors.Wrap(ErrDDBMalformedKey, err.Error())
}
return enc, nil


P1 — Preserve item generation when grouping DynamoDB rows

Item keys encode !ddb|item|<table>|<generation>|..., but this parser drops everything after the first separator and the encoder buffers rows only by table name. In real clusters, old-generation rows can coexist temporarily (e.g., delete/recreate before async cleanup finishes), and those stale rows will be emitted under the current schema; this can silently restore deleted data or make Finalize fail when primary-key names changed across generations. The backup path should retain generation and filter rows against the active schema generation (and any explicitly supported migration generation) instead of merging all generations.


bootjp added 2 commits April 30, 2026 19:33
Three issues, all valid.

#237 Codex P1 -- preserve item generation when grouping rows.
Item keys carry !ddb|item|<table>|<generation>|... but the prior
parser dropped the generation and Finalize merged all generations
under the active schema. In real clusters, in-flight delete/recreate
cleanup leaves stale-gen rows visible in the snapshot for a window;
those rows would silently restore as items under the new schema
(potentially resurrecting deleted data, or failing Finalize when
PK names changed across generations).

Fix:
- parseDDBItemKey now returns (encodedTable, generation).
- ddbTableState.items -> itemsByGen map[uint64][]*pb.DynamoItem.
- flushTable filters by st.schema.GetGeneration() and emits a
  ddb_stale_generation_items warning carrying the count and active
  gen so the operator can correlate the orphan window.

#182 Gemini -- fail-fast on flush errors.
Finalize previously deferred the first error and continued through
remaining tables. Real flush errors (out of disk, bad permissions)
should surface immediately, not be misattributed to a later table
whose flush also fails. Orphan-table warnings still continue (those
are informational, not errors).

#442 Gemini -- empty set serializes as [] not null.
setAttributeValueToPublic now uses make+append so nil/empty SS/NS/BS
become "{\"SS\":[]}" rather than "{\"SS\":null}". Downstream tools
that distinguish "present-but-empty" from "missing" no longer see
a misleading null.

Tests: TestDDB_StaleGenerationItemsExcludedAndWarned,
TestDDB_EmptyStringSetSerializesAsEmptyArrayNotNull,
TestDDB_ParseItemKeyExtractsGeneration. Existing fixtures updated
to set Generation=1 on schemas (matching the item key gen).
@bootjp
Owner Author

bootjp commented Apr 30, 2026

3 issues addressed in fdc9b36b:

Codex P1 #237 — Preserve item generation when grouping rows
The most important one. parseDDBItemKey now returns (encodedTable, generation); ddbTableState.items → itemsByGen map[uint64][]*pb.DynamoItem. flushTable filters by st.schema.GetGeneration() and emits a ddb_stale_generation_items warning for stale rows. Stale-gen rows from in-flight delete/recreate cleanup no longer silently restore under the new schema.

Gemini MEDIUM #182 — Fail fast on flush errors
Finalize previously deferred the first error. Now real flush errors (out of disk, bad permissions) surface immediately, not misattributed to a later table. Orphan-table warnings still continue (informational, not errors).

Gemini MEDIUM #442 — Empty set → [] not null
setAttributeValueToPublic uses make + append so nil/empty SS/NS/BS render as {"SS":[]} rather than {"SS":null}. Downstream tools that distinguish "present-but-empty" from "missing" no longer see a misleading null.

Tests: TestDDB_StaleGenerationItemsExcludedAndWarned, TestDDB_EmptyStringSetSerializesAsEmptyArrayNotNull, TestDDB_ParseItemKeyExtractsGeneration.
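
The distinction the #442 fix rests on is encoding/json's nil-versus-allocated slice behavior, shown in this self-contained snippet:

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    var nilSet []string           // a nil slice marshals as null
    emptySet := make([]string, 0) // an allocated empty slice marshals as []
    a, _ := json.Marshal(map[string]any{"SS": nilSet})
    b, _ := json.Marshal(map[string]any{"SS": emptySet})
    fmt.Println(string(a)) // {"SS":null}
    fmt.Println(string(b)) // {"SS":[]}
}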

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 30s —— View job


Review: fdc9b36 — backup: address review on DynamoDB encoder (PR #716)

  • Read PR context and understand request
  • Read internal/backup/dynamodb.go
  • Read internal/backup/dynamodb_test.go
  • Diff against base branch (origin/feat/backup-phase0a-sqs)
  • Review: Data loss lens
  • Review: Concurrency/distributed failures lens
  • Review: Performance lens
  • Review: Data consistency lens
  • Review: Test coverage lens

All three review comments are correctly addressed. Details per lens below.


Lens 1 — Data loss

All file writes go through writeFileAtomic (tmp+rename), so no partial-file risk. os.MkdirAll errors propagate correctly. The active := st.itemsByGen[activeGen] map lookup returns nil for a table with no items at activeGen, so range nil is a no-op — a schema-only table emits _schema.json with zero item files, which is correct.

No data loss issues.


Lens 2 — Concurrency / distributed failures

DDBEncoder has no locking. The Handle-then-Finalize lifecycle is documented as sequential, which is the right contract for an offline backup pipeline. No issues.


Lens 3 — Performance

totalItemsAcrossGens and totalStaleItems are O(items) and only called at Finalize time. ensureItemsByGen (line 73) is called on every HandleItem invocation, returning the existing map unchanged after the first call. This is a redundant nil-check on the hot path for high-item-count tables — an inline if st.itemsByGen == nil { st.itemsByGen = make(...) } in HandleItem is slightly cleaner, but this is a nit with no practical impact at the item counts a backup pipeline would see.

No performance issues.


Lens 4 — Data consistency

Codex P1 fix is correct. parseDDBItemKey extracts (encodedTable, generation) unambiguously — base64url doesn't contain |, so the two |-splits after the prefix are reliable separators. flushTable correctly restricts to st.itemsByGen[activeGen] and warns on any gen != activeGen.

One unaddressed edge case worth documenting: MigratingFromGeneration.

DynamoTableSchema has a migrating_from_generation field (proto field 6). During a live key-encoding migration — not a delete/recreate — items may legitimately coexist at migrating_from_generation while the migration converges. A backup taken mid-migration would count those items as stale and emit a ddb_stale_generation_items warning but exclude them from the dump. This is conservative (no resurrection risk), but an operator restoring from such a backup would silently lose the in-flight items.

Phase 0a is an offline snapshot decoder and the risk is low (migrations are presumably rare and the operator can re-snapshot after convergence). However, the current comment in flushTable only describes the delete/recreate scenario. It would be worth either:

  • Adding a note that items at migratingFromGeneration are also excluded and will appear in the warning count, or
  • Emitting a separate ddb_migration_in_progress warning when schema.GetMigratingFromGeneration() != 0 so the operator knows to re-snapshot.


Non-deterministic _schema.json output. schemaToPublic iterates s.GetAttributeDefinitions() and s.GetGlobalSecondaryIndexes() (both Go maps) to build the attribute_definitions and global_secondary_indexes slices. The resulting JSON arrays have non-deterministic element order across runs. This doesn't affect correctness (JSON arrays of objects are compared by content, not order), but it makes diff-based backup comparison noisy. A sort.Slice by Name would make output stable and reproducible with zero cost.



Lens 5 — Test coverage

Three new tests look good:

  • TestDDB_StaleGenerationItemsExcludedAndWarned — covers the primary Codex fix path, confirms stale item not written and warning fires.
  • TestDDB_EmptyStringSetSerializesAsEmptyArrayNotNull — directly exercises the Gemini #442 fix; uses json.Marshal to verify wire output.
  • TestDDB_ParseItemKeyExtractsGeneration — unit tests the new return value of parseDDBItemKey.

One gap: TestDDB_StaleGenerationItemsExcludedAndWarned has one stale generation (gen=4) and one active (gen=5). A follow-up test with multiple stale generations (e.g., gen=3 and gen=4 both present, active gen=5) would confirm totalStaleItems aggregates across all non-active gens and that stale_count in the warning reflects the combined count.


Summary

| Finding | Severity |
| --- | --- |
| Codex P1: generation filtering | ✅ Correctly fixed |
| Gemini #182: fail-fast flush errors | ✅ Correctly fixed |
| Gemini #442: empty set → [] | ✅ Correctly fixed |
| MigratingFromGeneration excluded without distinct warning | Low — conservative but silent |
| Non-deterministic _schema.json | Low — readability / diffability |
| Single-stale-gen test only | Low — coverage gap |

The core correctness fixes are solid. The two "Low" findings would improve operator experience but don't affect backup safety. Happy to address any of them if you want them in this PR.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fdc9b36b6b


Comment thread internal/backup/dynamodb.go Outdated
"stale_count", stale,
"hint", "stale-gen rows are excluded from the dump; restore would otherwise emit them under the new schema")
}
active := st.itemsByGen[activeGen]


P1 — Include migration source generation when emitting DynamoDB items

flushTable now emits only st.itemsByGen[activeGen], but DynamoDB tables can be in a live migration state where schema.generation is new and schema.migrating_from_generation still contains readable data; the read path explicitly falls back to that source generation during migration (see readLogicalItemAt in adapter/dynamodb.go). In that state, this filter silently drops unmigrated rows from backups, so restoring the dump loses logically visible items if a snapshot is taken mid-migration. Fresh evidence in this revision is the hard filter at this line with no migrating_from_generation inclusion path.


bootjp added 2 commits April 30, 2026 19:59
… round 2)

Codex P1 #227: during a generation migration, schema.Generation
is the new gen and schema.MigratingFromGeneration carries the
source. The live read path (adapter/dynamodb.go readLogicalItemAt)
falls back to the source for items not yet copied. The previous
flushTable filtered to only schema.GetGeneration() and silently
dropped unmigrated rows from the dump.

flushTable now emits items from BOTH generations. Order matters:
the source is emitted FIRST, then the active gen, so when both
generations carry the same primary key the active-gen content is
the one that survives writeFileAtomic's tmp+rename — matching the
live code's "new gen wins on conflict" rule.

The stale-generation warning was generalised to take an emit-set
(totalStaleItemsExcluding); rows in any other generation are
counted into the warning so the operator still sees the orphan
window.

Tests: TestDDB_MigrationSourceGenerationItemsAreEmitted,
TestDDB_NewGenerationWinsOverMigrationSourceForSameKey.
@bootjp
Owner Author

bootjp commented Apr 30, 2026

Codex P1 #227 addressed in ce5b7dac. During a generation migration, schema.Generation is the new gen and schema.MigratingFromGeneration carries the source. The live read path (adapter/dynamodb.go readLogicalItemAt) falls back to the source for items not yet copied; my previous filter to activeGen only would silently drop unmigrated rows.

flushTable now emits from both generations. The source is emitted first, then the active gen, so when both carry the same primary key the active-gen content is the one that survives writeFileAtomic's tmp+rename — matching the live code's "new gen wins on conflict" rule.

The stale-generation warning was generalised (totalStaleItemsExcluding) so rows in any OTHER generation are still counted and surfaced.

Tests: TestDDB_MigrationSourceGenerationItemsAreEmitted (both gens emitted), TestDDB_NewGenerationWinsOverMigrationSourceForSameKey (collision resolves to new-gen content).
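
A runnable sketch of the ordering rule; emitOrder is the behavior described above, while tableState and the string items are stand-ins for the PR's proto-backed types:

package main

import "fmt"

type tableState struct {
    activeGen, migratingFrom uint64
    itemsByGen               map[uint64][]string
}

// The migration source generation (if any) goes first, so on a path
// collision the active generation's write is the one that survives
// writeFileAtomic's tmp+rename.
func emitOrder(st tableState) []uint64 {
    if st.migratingFrom != 0 && st.migratingFrom != st.activeGen {
        return []uint64{st.migratingFrom, st.activeGen}
    }
    return []uint64{st.activeGen}
}

func main() {
    st := tableState{
        activeGen:     7,
        migratingFrom: 6,
        itemsByGen:    map[uint64][]string{6: {"unmigrated"}, 7: {"migrated"}},
    }
    for _, gen := range emitOrder(st) {
        fmt.Println(gen, st.itemsByGen[gen]) // 6 [unmigrated], then 7 [migrated]
    }
}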

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 4m 21s —— View job


Review: ce5b7da — backup: emit migrating-from-generation items in DynamoDB dump (PR #716, round 2)

  • Read PR context and understand request
  • Read internal/backup/dynamodb.go
  • Read internal/backup/dynamodb_test.go
  • Read adapter/dynamodb.go (readLogicalItemAt) for alignment check
  • Diff against base branch (origin/feat/backup-phase0a-sqs)
  • Review: Data loss lens
  • Review: Concurrency/distributed failures lens
  • Review: Performance lens
  • Review: Data consistency lens
  • Review: Test coverage lens

Codex P1 #227 is correctly addressed. Details per lens below.


Lens 1 — Data loss

writeFileAtomic is tmp+rename, so two writes to the same path are race-free and the last writer wins on disk. The emit order (source gen first, active gen last) ensures the active gen content is the survivor on any key collision — matching the live read preference exactly.

One subtlety verified: parseDDBItemKey is correct for both legacy and ordered key encodings. The ordered encoding (dynamoItemKey) puts binary key data after the second |, but parseDDBItemKey stops at the second | — it never touches the binary suffix. The generation is decimal digits, so the second | in the key is unambiguous.

No data loss issues.


Lens 2 — Concurrency / distributed failures

Sequential Handle-then-Finalize contract, no locking needed. No issues.


Lens 3 — Performance

totalStaleItemsExcluding allocates a small map[uint64]struct{} with at most 2 entries (activeGen + migrationSourceGen) per flushTable call. Cost is O(distinct generations), not O(items). No issues.


Lens 4 — Data consistency

The emit order correctly mirrors readLogicalItemAt:

readLogicalItemAt:  try activeGen → found? return it. else try sourceGen.
flushTable:         emit sourceGen first → emit activeGen last → activeGen wins via rename.

Both paths agree: new gen wins on conflict.

migrationSourceSchema() alignment check. The live migrationSourceSchema() sets KeyEncodingVersion: 0 for the source gen, but copies PrimaryKey and AttributeDefinitions unchanged from the active schema. Since flushTable uses hashKey/rangeKey from the active schema (st.schema.GetPrimaryKey()), and both gens share the same PK attribute names, writeDDBItem correctly derives file paths for items from either gen.

emitOrder edge cases correct:

  • migrationSourceGen == 0 → no-migration state → emitOrder = [activeGen] only ✅
  • migrationSourceGen == activeGen → guarded by migrationSourceGen != activeGen → no duplicate ✅

No data consistency issues.


Lens 5 — Test coverage

Two new tests cover the core fix correctly:

  • TestDDB_MigrationSourceGenerationItemsAreEmitted — distinct PKs in each gen; verifies both files land on disk ✅
  • TestDDB_NewGenerationWinsOverMigrationSourceForSameKey — same PK in both gens; verifies v.S == "new-version" survives ✅

One coverage gap — three-generation scenario:

TestDDB_StaleGenerationItemsExcludedAndWarned uses a schema with MigratingFromGeneration == 0, so emitOrder == [5] and the stale item at gen=4 is excluded. No test exercises the case where all three are present: migrationSourceGen (included), activeGen (included), and an even-older gen (stale + warned). A table in mid-migration that also has cleanup debris from a prior delete/recreate would hit this path. totalStaleItemsExcluding is correct by inspection, but this scenario is untested.


Carried over from prior round — still unaddressed:

schemaToPublic iterates GetAttributeDefinitions() and GetGlobalSecondaryIndexes() (Go maps) to build the output slices. The _schema.json arrays have non-deterministic element order across runs, making diff-based backup comparison noisy.



Summary

| Finding | Severity |
| --- | --- |
| Codex P1 #227: include migration-source-gen items | ✅ Correctly fixed |
| Emit order matches live readLogicalItemAt preference | ✅ Verified |
| parseDDBItemKey correct for ordered encoding | ✅ Verified |
| Same-PK collision resolves to active gen via tmp+rename | ✅ Verified |
| Three-generation scenario (stale + source + active) untested | Low |
| Non-deterministic _schema.json output (carried from prior round) | Low |

Core correctness is solid. The two Low findings are improvements for operator experience/testability, not correctness gaps.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce5b7dac74


Comment on lines +116 to +117
rawName, err := base64.RawURLEncoding.DecodeString(encoded)
if err != nil {


P2 — Reject empty table segment in DynamoDB meta keys

HandleTableMeta accepts !ddb|meta|table| as valid because base64.RawURLEncoding.DecodeString("") succeeds, so a truncated/malformed key is treated as table name "" instead of returning ErrDDBMalformedKey. In corrupted snapshots this can route schema output into the wrong scope and mask key corruption that Phase 0 is supposed to surface loudly.


Comment on lines +299 to +303
genEnd := strings.IndexByte(afterTable, '|')
if genEnd <= 0 {
    return "", 0, errors.Wrapf(ErrDDBMalformedKey,
        "item key missing gen/rest separator: %q", key)
}


P2 — Reject item keys with no payload after generation

parseDDBItemKey only checks that a | exists after the generation, but it does not require bytes after that separator, so keys like !ddb|item|<table>|7| are accepted. This lets truncated item keys bypass malformed-key detection and be emitted using only value-side attributes, which can hide snapshot corruption instead of failing fast.


bootjp added 2 commits April 30, 2026 20:12
Two Codex P2 strict-validation follow-ups.

#117 -- empty table-meta segment.
HandleTableMeta accepted `!ddb|meta|table|` (no encoded segment)
because base64.RawURLEncoding.DecodeString("") returns empty bytes
without error, so the schema would route under the empty table
name. Now rejected with ErrDDBMalformedKey before the JSON decode.
TestDDB_RejectsTableMetaKeyWithEmptySegment.

#303 -- truncated item key.
parseDDBItemKey accepted `!ddb|item|<table>|7|` (gen separator
present, no primary-key payload). The gen-end check was
"genEnd > 0" which a trailing `|` satisfies. Added a follow-up
check that genEnd+1 != len(afterTable) so a payload-less key
surfaces as ErrDDBMalformedKey rather than emit under value-side
attributes only. TestDDB_RejectsItemKeyWithEmptyPrimaryKeyPayload.
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@bootjp
Owner Author

bootjp commented Apr 30, 2026

2 Codex P2 strict-validation follow-ups addressed in 9029addb:

#117 — Empty table-meta segment. HandleTableMeta accepted !ddb|meta|table| because base64.RawURLEncoding.DecodeString("") returns []byte{} without error. Now rejected with ErrDDBMalformedKey before the JSON decode. TestDDB_RejectsTableMetaKeyWithEmptySegment.

#303 — Truncated item key. parseDDBItemKey accepted !ddb|item|<table>|7| (gen separator present but no primary-key payload). Added a genEnd+1 != len(afterTable) guard so a payload-less key surfaces as ErrDDBMalformedKey. TestDDB_RejectsItemKeyWithEmptyPrimaryKeyPayload.
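
A runnable sketch of both guards on the <gen>|<payload> tail; parseGenAndPayload and errMalformedKey are stand-in names for illustration:

package main

import (
    "errors"
    "fmt"
    "strings"
)

var errMalformedKey = errors.New("malformed ddb key") // stand-in for ErrDDBMalformedKey

func parseGenAndPayload(afterTable string) (gen, payload string, err error) {
    genEnd := strings.IndexByte(afterTable, '|')
    if genEnd <= 0 { // missing separator, or empty generation segment
        return "", "", fmt.Errorf("%w: missing gen/rest separator", errMalformedKey)
    }
    if genEnd+1 == len(afterTable) { // trailing '|' with no primary-key payload
        return "", "", fmt.Errorf("%w: empty primary-key payload", errMalformedKey)
    }
    return afterTable[:genEnd], afterTable[genEnd+1:], nil
}

func main() {
    if _, _, err := parseGenAndPayload("7|"); err != nil {
        fmt.Println(err) // malformed ddb key: empty primary-key payload
    }
    gen, payload, _ := parseGenAndPayload("7|pk")
    fmt.Println(gen, payload) // 7 pk
}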

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 51s —— View job


Review: 9029add — backup: reject truncated DynamoDB keys (PR #716, round 3)

  • Read PR context and understand request
  • Read internal/backup/dynamodb.go
  • Read internal/backup/dynamodb_test.go
  • Diff against base branch (origin/feat/backup-phase0a-sqs)
  • Review: Data loss lens
  • Review: Concurrency/distributed failures lens
  • Review: Performance lens
  • Review: Data consistency lens
  • Review: Test coverage lens

Both P2 fixes are correct. Details per lens below.


Lens 1 — Data loss

Fix #1 — empty table-meta segment (line 116–122):

stripPrefixSegment("!ddb|meta|table|", "!ddb|meta|table|") returns ("", nil) — no error — because the function simply returns string(key[len(prefix):]). Without the new guard, base64.RawURLEncoding.DecodeString("") also returns ([]byte{}, nil), so the table name resolves to "". EncodeSegment([]byte("")) returns "", so the flush path becomes:

filepath.Join(root, "dynamodb", "", "_schema.json")
  → root/dynamodb/_schema.json

That file lands at the dynamodb/ root, silently overwriting any other _schema.json written there by the same path on a different table (impossible for valid tables, but the behavior is ambiguous garbage). The new guard correctly catches this before any decode or unmarshal. Fix is correct.

Fix #2 — empty primary-key payload (line 315–324):

Without the guard, !ddb|item|<t>|7| passes all prior checks, buffers a DynamoItem from the value, and defers to Finalize. At flush time, writeDDBItem looks up the hash-key attribute in the item's attribute map and derives a path from the value — never from the key payload — so the truncated key itself doesn't produce a file at a wrong path. The real risk is the diagnostic one: a corrupted key silently produces output shaped entirely by the value side, masking the key corruption. The guard surfaces it at HandleItem time with a clear ErrDDBMalformedKey. Fix is correct.

No data loss issues.


Lens 2 — Concurrency / distributed failures

Sequential Handle-then-Finalize contract, no concurrency implications. No issues.


Lens 3 — Performance

Both new checks are O(1) comparisons (encoded == "" and genEnd+1 == len(afterTable)). No impact.


Lens 4 — Data consistency

Trace through both test cases:

TestDDB_RejectsTableMetaKeyWithEmptySegment:

  • stripPrefixSegment("!ddb|meta|table|", "!ddb|meta|table|") → ""
  • encoded == "" guard fires → ErrDDBMalformedKey

TestDDB_RejectsItemKeyWithEmptyPrimaryKeyPayload (key = !ddb|item|dA|7|):

  • rest = "dA|7|" → tableEnd = 2 → enc = "dA" → valid base64url
  • afterTable = "7|" → genEnd = 1 → gen = 7
  • genEnd+1 == len(afterTable) → 2 == 2 → ErrDDBMalformedKey

One subtle point worth noting: tableEnd <= 0 (existing guard, line 297) catches the case where an item key has an empty encoded-table segment (e.g., !ddb|item||7|...), since IndexByte returns 0 for a leading |. The new guard for empty item-key payload is complementary: it catches the case where the table and gen are present but the payload after the final | is empty. Together they cover all three truncation points in the <table>|<gen>|<payload> structure.

No data consistency issues.


Lens 5 — Test coverage

Both new tests are correct and well-targeted:

  • TestDDB_RejectsTableMetaKeyWithEmptySegment — passes the bare prefix as the key, exercises the encoded == "" guard, asserts errors.Is(err, ErrDDBMalformedKey)
  • TestDDB_RejectsItemKeyWithEmptyPrimaryKeyPayload — manually constructs !ddb|item|dA|7|, exercises the genEnd+1 == len(afterTable) guard, asserts errors.Is(err, ErrDDBMalformedKey)

Carried from prior rounds — still unaddressed (both Low):

  1. Three-generation scenario untested. totalStaleItemsExcluding correctly handles multiple non-included gens by inspection, but no test exercises stale_count when more than one generation is stale (e.g., schema gen=7, source gen=6, stale item at gen=5). TestDDB_StaleGenerationItemsExcludedAndWarned covers only the one-stale-gen path.


  2. Non-deterministic _schema.json output. schemaToPublic iterates Go maps (GetAttributeDefinitions() and GetGlobalSecondaryIndexes()), so attribute_definitions and global_secondary_indexes arrays have random element order across runs. Diff-based backup verification will see spurious diffs.



Summary

| Finding | Severity |
| --- | --- |
| Codex P2 #117: empty table-meta segment rejected | ✅ Correctly fixed |
| Codex P2 #303: empty item-key payload rejected | ✅ Correctly fixed |
| Both guards trace correctly for normal + truncated inputs | ✅ Verified |
| tableEnd <= 0 + new genEnd+1 guards cover all truncation points | ✅ Verified |
| Three-generation scenario (stale + source + active) untested | Low — carried from round 2 |
| Non-deterministic _schema.json output | Low — carried from round 2 |

This PR is ready to merge from a correctness standpoint. The two Low items are quality-of-life improvements (testability and backup diff stability) that can land in a follow-up.


@chatgpt-codex-connector

💡 Codex Review

// parseSQSMessageDataKey peels !sqs|msg|data|<encQueue><gen 8B><encMsgID>
// and returns encQueue. The gen and msgID are not surfaced because the
// dump format pulls QueueGeneration / MessageID out of the value record.

P1 — Filter SQS message rows to the active queue generation

parseSQSMessageDataKey only returns the encoded queue segment and explicitly drops the generation, so HandleMessageData cannot distinguish live rows from stale rows left under older generations. In this codebase, sqsQueueMeta.Generation is bumped on DeleteQueue/PurgeQueue and old message keys are cleaned asynchronously, so snapshots taken during cleanup can contain multiple generations for one queue; exporting all of them will resurrect purged/deleted messages on restore. Parse and retain the key generation and emit only the generation that matches the queue meta (or a documented migration fallback).


name, err := base64.RawURLEncoding.DecodeString(encoded)
if err != nil {

P2 — Reject empty queue segment in SQS queue-meta keys

HandleQueueMeta accepts !sqs|queue|meta| because base64.RawURLEncoding.DecodeString("") succeeds, so a truncated key is treated as queue name "" instead of malformed input. That routes output to sqs/_queue.json at the root queue directory and masks key corruption that Phase 0 decoding is supposed to surface explicitly; add a non-empty segment check before decoding (matching the DynamoDB guard).



@chatgpt-codex-connector

💡 Codex Review

encoded, err := stripPrefixSegment(key, []byte(SQSQueueMetaPrefix))
if err != nil {
    return err
}
name, err := base64.RawURLEncoding.DecodeString(encoded)

P2 — Reject empty queue segment in SQS meta keys

HandleQueueMeta accepts a key with no encoded queue suffix (e.g. !sqs|queue|meta|) because base64.RawURLEncoding.DecodeString("") succeeds, so a truncated/corrupt snapshot key is treated as queue name "" and _queue.json is emitted under the top-level sqs/ directory instead of failing fast. This masks key corruption and can misroute metadata in recovery workflows that rely on malformed-key detection.



@chatgpt-codex-connector

💡 Codex Review

name, err := base64.RawURLEncoding.DecodeString(encoded)
if err != nil {

P2 — Reject empty encoded queue segment in SQS meta keys

HandleQueueMeta accepts the truncated key !sqs|queue|meta| because base64.RawURLEncoding.DecodeString("") succeeds, so a malformed snapshot record is treated as queue name "" and can emit _queue.json directly under sqs/ instead of failing fast. This masks key corruption in a path where the live parser (queueNameFromMetaKey) already treats empty segments as invalid, so backups can silently proceed with bad metadata.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d67baf8be5


Comment on lines +380 to +382
case *pb.DynamoAttributeValue_N:
return EncodeSegment([]byte(v.N)), nil
case *pb.DynamoAttributeValue_B:


P1 — Canonicalize numeric key segments before naming item files

The filename for numeric primary-key attributes is derived from the raw literal (EncodeSegment([]byte(v.N))), but DynamoDB key equality is numeric (e.g., "1" and "1.0" are the same logical key). In migration mode this breaks the intended “active generation wins” behavior: if source and active rows use equivalent numeric literals with different text forms, they are written to different paths and both survive in the backup, so restore can replay duplicate/stale logical rows. This appears when a table has N hash/range keys and mixed numeric string representations across generations.


Comment thread internal/backup/dynamodb.go Outdated
Comment on lines +411 to +412
for name, gsi := range s.GetGlobalSecondaryIndexes() {
g := publicGSI{


P2 — Stabilize schema array order before writing _schema.json

schemaToPublic builds global_secondary_indexes and attribute_definitions by ranging over Go maps, which have non-deterministic iteration order. That means identical snapshots can produce different _schema.json byte output across runs, undermining deterministic backup artifacts and creating noisy diffs even when data is unchanged. Sort keys before appending these arrays.


bootjp added 2 commits April 30, 2026 21:57
…ound 4)

Codex round 9 raised two issues on commit d67baf8:

1. P1: ddbKeyAttrToSegment was emitting `EncodeSegment([]byte(v.N))`
   for numeric primary-key attributes, but DynamoDB N equality is
   numeric — `"1"` and `"1.0"` (or `"100"` and `"1e2"`, or `"0.5"`
   and `"5e-1"`) name the same logical item. In migration mode
   where source and active rows used different decimal text for
   the same value, both rows survived at distinct paths and
   restore replayed duplicates. Mirror the live adapter's
   canonicalNumberString (adapter/dynamodb.go:7651) which uses
   big.Rat — same canonical form keeps backup filenames in lockstep
   with the live equality check.

2. P2: schemaToPublic ranged over Go maps for both
   global_secondary_indexes and attribute_definitions, so identical
   snapshots produced different `_schema.json` byte output across
   runs. Sort by name before append.

Tests:
  - TestDDB_CanonicalNumberKeySegment: equivalence pairs
    ("1"/"1.0", "100"/"1e2", "-0"/"0", "0.5"/"5e-1") collapse
    to the same key segment.
  - TestDDB_SchemaJSONIsDeterministic: 32 calls to schemaToPublic
    on the same schema produce identical attribute_definitions and
    GSI orders, both matching the documented sort-by-name.
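
A runnable sketch of the canonicalization, assuming canonicalDDBNumber mirrors the adapter helper as described (the function body here is a sketch, not the adapter's source):

package main

import (
    "fmt"
    "math/big"
    "strings"
)

// Equivalent DynamoDB N literals collapse to one string, so rows from both
// generations with the same numeric key derive the same filename.
func canonicalDDBNumber(s string) string {
    r := new(big.Rat)
    if _, ok := r.SetString(strings.TrimSpace(s)); !ok {
        return strings.TrimSpace(s) // not a parseable number: pass through unchanged
    }
    return r.RatString() // integers like "100"; non-integers in "a/b" form like "1/2"
}

func main() {
    pairs := [][2]string{{"1", "1.0"}, {"100", "1e2"}, {"-0", "0"}, {"0.5", "5e-1"}}
    for _, p := range pairs {
        fmt.Println(canonicalDDBNumber(p[0]) == canonicalDDBNumber(p[1])) // true, four times
    }
}
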
@bootjp
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 4 fixes for Codex round 9 on commit d67baf8 (commit cd88e632):

  1. P1 — DynamoDB N keys not canonicalized. ddbKeyAttrToSegment was emitting EncodeSegment([]byte(v.N)) directly. DynamoDB N equality is numeric: "1" and "1.0" (or "100" and "1e2", or "0.5" and "5e-1") name the same logical item. In migration mode where source and active rows used different decimal text, both rows survived at distinct paths and restore replayed duplicates. Now mirrors the live adapter's canonicalNumberString (adapter/dynamodb.go:7651) via big.Rat — same canonical form keeps backup filenames in lockstep with live equality. Test: TestDDB_CanonicalNumberKeySegment (4 equivalence pairs).

  2. P2 — _schema.json non-deterministic. schemaToPublic ranged over Go maps for both global_secondary_indexes and attribute_definitions, so identical snapshots produced different byte output across runs. Sort by name before append. Test: TestDDB_SchemaJSONIsDeterministic (32 calls, asserts identical order matching documented sort-by-name).

go test -race and golangci-lint clean. Fix merged forward into #718.
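
The determinism fix in item 2 reduces to the usual collect-sort-append pattern over Go maps, sketched here with illustrative data:

package main

import (
    "fmt"
    "sort"
)

func main() {
    // Illustrative GSI map; ranging over it directly yields random order.
    gsis := map[string]string{"by-email": "x", "by-age": "x", "by-name": "x"}
    names := make([]string, 0, len(gsis))
    for name := range gsis {
        names = append(names, name)
    }
    sort.Strings(names) // stable order, byte-identical _schema.json across runs
    fmt.Println(names)  // [by-age by-email by-name] on every run
}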

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 4m 28s —— View job


Review: cd88e63 — backup: canonicalize DynamoDB N keys + sort schema arrays (PR #716, round 4)

  • Read PR context and understand request
  • Read internal/backup/dynamodb.go (current branch)
  • Read internal/backup/dynamodb_test.go (current branch)
  • Diff against base branch (origin/feat/backup-phase0a-sqs)
  • Check adapter/dynamodb.go for canonicalNumberString reference
  • Review: Data loss lens
  • Review: Concurrency/distributed failures lens
  • Review: Performance lens
  • Review: Data consistency lens
  • Review: Test coverage lens

Both fixes are correctly implemented. Details per lens below.


Lens 1 — Data loss

canonicalDDBNumber is byte-for-byte identical to adapter/dynamodb.go:canonicalNumberString (lines 7651–7657). The implementation is a straight copy — no drift between backup filename identity and live equality. Since both functions use big.Rat.RatString() to produce the canonical form, equivalent N literals will always produce the same segment. writeFileAtomic's tmp+rename ensures the active-gen write (emitted second) survives on any collision path.

One subtle point verified safe: big.Rat.RatString() returns "a/b" form for non-integer rationals (e.g., "0.5""1/2"). EncodeSegment percent-encodes / as %2F, so the resulting filename is 1%2F2.json — a valid single path component, no accidental directory creation. ✓

schemaToPublic determinism is a readability/diffability fix; it does not affect item persistence paths. No data loss issues.


Lens 2 — Concurrency / distributed failures

Sequential Handle→Finalize contract, no locking needed. No issues.


Lens 3 — Performance

canonicalDDBNumber allocates one big.Rat per primary-key write (only for N-type keys). This is one allocation per item flush, not per attribute — no hot-path concern for backup throughput. schemaToPublic sorting is O(k log k) where k is the number of attribute definitions or GSIs, called once per table at Finalize time. No issues.


Lens 4 — Data consistency

N canonicalization correctly mirrors live equality. The live path uses canonicalNumberString for set membership and duplicate detection; the backup encoder now uses the same function for filename derivation. Equivalent numeric representations ("1" / "1.0", "100" / "1e2", "-0" / "0", "0.5" / "5e-1") all resolve to the same big.Rat.RatString() output and therefore the same filename. In migration mode: source gen written first, active gen last → active gen wins at any colliding path. ✓

Important distinction preserved: scalarAttributeValueToPublic still emits the original v.N string in the item body (not the canonical form). Only the filename uses the canonical form. This is correct — the dump should reflect the actual stored value, not a normalized one.

schemaToPublic ordering is now deterministic: GSIs sorted by name, attribute definitions sorted by name. Sort by sort.Strings over collected key slices — straightforward and correct. No consistency issues.


Lens 5 — Test coverage

Both new tests are correct:

TestDDB_CanonicalNumberKeySegment — 4 equivalence pairs exercise the key collapsing behavior at the unit level. Correctly asserts ddbKeyAttrToSegment(nAttr(a)) == ddbKeyAttrToSegment(nAttr(b)) for all four cases. ✓

TestDDB_SchemaJSONIsDeterministic — 32 iterations guard against Go map-ordering non-determinism. Also asserts the documented sort order (alphabetical by name for both GSIs and attribute definitions). ✓

One carried gap (Low — round 2/3, still open):

TestDDB_StaleGenerationItemsExcludedAndWarned exercises schema.Generation=5, migrationSource=0, one stale item at gen=4. No test covers the three-generation mixture: activeGen=7, migrationSourceGen=6, and a debris item at gen=5 (stale). totalStaleItemsExcluding is correct by inspection but this path is untested.


One new gap (Low):

TestDDB_NewGenerationWinsOverMigrationSourceForSameKey uses S-type keys. The P1 concern is specifically about N-type keys with different text representations ("1" vs "1.0"). An end-to-end test with N-type primary keys in both generations would directly reproduce the original bug scenario (before the fix, these would produce different filenames).



Summary

| Finding | Severity |
| --- | --- |
| Codex P1 (round 9): N key canonicalization via big.Rat | ✅ Correctly fixed |
| Codex P2 (round 9): non-deterministic _schema.json output | ✅ Correctly fixed |
| canonicalDDBNumber matches live canonicalNumberString byte-for-byte | ✅ Verified |
| Rational form "1/2" for non-integers percent-encoded safely by EncodeSegment | ✅ Verified safe |
| N value in item body still uses original string (not canonical) | ✅ Correct design |
| Three-generation stale scenario (stale + source + active) untested | Low — carried from round 3 |
| No end-to-end N-key migration-conflict test for P1 fix | Low — unit test present; S-key integration test present |

Core correctness is solid. The two Low findings are additional test coverage improvements; the unit-level proof for N canonicalization is already in TestDDB_CanonicalNumberKeySegment.

@chatgpt-codex-connector

💡 Codex Review

name, err := base64.RawURLEncoding.DecodeString(encoded)
if err != nil {

P2 — Reject empty queue segment in queue-meta keys

A truncated key like !sqs|queue|meta| is currently accepted because DecodeString("") succeeds, so HandleQueueMeta creates a queue with an empty name instead of surfacing ErrSQSMalformedKey. In corrupted snapshots this silently routes output to sqs/_queue.json and masks key corruption that Phase 0 decoding should fail on.


encoded, err := stripPrefixSegment(key, []byte(SQSQueueGenPrefix))
if err != nil {
    return err
}
gen, err := strconv.ParseUint(strings.TrimSpace(string(value)), 10, 64) //nolint:mnd // 10 == decimal radix
if err != nil {
    return errors.Wrap(ErrSQSMalformedKey, err.Error())
}
s.queueState(encoded).activeGen = gen
return nil

P2 — Validate queue segment before applying queue-gen record

HandleQueueGen accepts any suffix after !sqs|queue|gen| without base64/empty-segment validation, so malformed or truncated gen keys are treated as valid queue IDs. If this happens, the real queue never gets activeGen and filterStaleGenMessages falls back to no filtering, which can re-emit stale generation messages instead of failing fast on malformed input.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 34ea5bad26


if err != nil {
    return err
}
dir := filepath.Join(itemsDir, hashFilename)


P1 — Reject dot segments before joining composite-key paths

writeDDBItem uses filepath.Join(itemsDir, hashFilename) for composite keys, but hashFilename comes from EncodeSegment, which can legally return "." or ".." for string/number key values. In that case Go path-cleaning escapes or collapses the items/ directory, so an item like hash=".." and range="_schema" is written as <table>/_schema.json and can overwrite the schema file (or other table-level files). This affects any backup containing such key values and can silently corrupt output.


Codex P1 round 12 (commit 34ea5ba):

`writeDDBItem` joined `itemsDir` with the encoded hash/range key
filename. EncodeSegment treats `.` and `..` as RFC3986-unreserved
and preserves them, so a DynamoDB S/N key value of "." or ".."
flows through as a single-character filename. filepath.Join then
either collapses `<itemsDir>/.` back to itemsDir or resolves
`<itemsDir>/..` to the parent — letting an item like hash=".."
range="_schema" overwrite the table-level _schema.json.

`refuseDotSegmentFilename` rejects sole-dot encoded segments for
both hash and range keys; surfaces ErrDDBInvalidItem so existing
callers branching on errors.Is keep their semantics.

Test: TestDDB_DotSegmentKeyRejected covers all four shapes
(hash="." / hash=".." / range="." / range="..").

@bootjp
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 5 fix for Codex P1 round 12 on commit 34ea5ba (commit 577898ae):

writeDDBItem joined itemsDir with the encoded hash/range key filename. EncodeSegment treats . and .. as RFC3986-unreserved and preserves them, so a DynamoDB S/N key value of "." or ".." flows through as a single-character filename. filepath.Join then either collapses <itemsDir>/. back to itemsDir or resolves <itemsDir>/.. to the parent — letting an item like hash=".." range="_schema" overwrite the table-level _schema.json.

Added refuseDotSegmentFilename which rejects sole-dot encoded segments for both hash and range keys (surfaces ErrDDBInvalidItem so existing callers branching on errors.Is keep their semantics). Test: TestDDB_DotSegmentKeyRejected covers all four shapes (hash=".", hash="..", range=".", range="..").

go test -race and golangci-lint clean. Fix merged forward into #718.
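
A runnable sketch of the guard; refuseDotSegmentFilename is the name used in the PR, but the body and error wrapping here are illustrative:

package main

import (
    "errors"
    "fmt"
)

var errInvalidItem = errors.New("invalid ddb item") // stand-in for ErrDDBInvalidItem

// Encoded key segments that are exactly "." or ".." would let filepath.Join
// collapse the items/ directory or escape to its parent, so refuse them
// before any directory creation or write.
func refuseDotSegmentFilename(encoded, role string) error {
    if encoded == "." || encoded == ".." {
        return fmt.Errorf("%w: %s key encodes to dot segment %q", errInvalidItem, role, encoded)
    }
    return nil
}

func main() {
    fmt.Println(refuseDotSegmentFilename("..", "hash"))     // rejected
    fmt.Println(refuseDotSegmentFilename("b64.Lg", "hash")) // <nil>
}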

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 37s —— View job


Review: 577898a — backup: refuse dot-segment DynamoDB key filenames (PR #716, round 5)

  • Read PR context and understand request
  • Read internal/backup/dynamodb.go
  • Read internal/backup/dynamodb_test.go
  • Diff against base branch (origin/feat/backup-phase0a-sqs)
  • Review: Data loss lens
  • Review: Concurrency/distributed failures lens
  • Review: Performance lens
  • Review: Data consistency lens
  • Review: Test coverage lens

Codex P1 round 12 is correctly addressed. Details per lens below.


Lens 1 — Data loss

EncodeSegment confirmed to pass through . and .. as-is. isUnreserved (filename.go:290) includes c == '.' in the unreserved set, so EncodeSegment([]byte(".")) = "." and EncodeSegment([]byte("..")) = "..". The guard at writeDDBItem:355 and :375 fires on exactly these values.

Guard placement is correct. For hash-only schema: the hash guard fires before writeFileAtomic(filepath.Join(itemsDir, hashFilename+".json"), body). For composite schema: the range guard fires before os.MkdirAll(filepath.Join(itemsDir, hashFilename), ...) — no partial directory creation occurs. No file is touched before the error surfaces.

B-type keys cannot produce dot-segments. EncodeBinarySegment always emits "b64.<base64url>" (the binaryPrefix is prepended unconditionally). []byte(".") encodes to "b64.Lg" — never "." or "..". The guard is harmless for B keys.

N-type keys with value "." or ".." are also caught. canonicalDDBNumber(".") calls big.Rat.SetString(".") which fails (. is not a valid rational literal), so the function falls through to strings.TrimSpace(".") = ".". EncodeSegment([]byte(".")) = ".". Guard fires. ✓

No data loss issues.


Lens 2 — Concurrency / distributed failures

Sequential Handle→Finalize contract, no locking needed. No issues.


Lens 3 — Performance

refuseDotSegmentFilename is two string comparisons. Called at most twice per item at flush time. No impact.


Lens 4 — Data consistency

Error propagation trace. writeDDBItem returns ErrDDBInvalidItemflushTable returns it immediately (fail-fast from Gemini #182 fix, round 2) → Finalize returns it to the caller. Test verifies errors.Is(err, ErrDDBInvalidItem). ✓

Compound attack case implicitly covered. The Codex issue described hash=".." range="_schema" as the dangerous scenario (writes <table>/_schema.json). Since the hash guard fires before range computation, any dotted hash key — regardless of what the range key is — is caught. The compound scenario is more dangerous than the four tested shapes but strictly narrower, so it's covered by hash_dotdot.

One new Low finding — table name segment not guarded:

flushTable builds the output directory as:

dir := filepath.Join(d.outRoot, "dynamodb", EncodeSegment([]byte(st.name)))

st.name comes from base64.RawURLEncoding.DecodeString(encoded) in HandleTableMeta. If a snapshot carried a table whose base64-decoded name were "." or ".." (corrupted snapshot — real DynamoDB forbids single-dot table names by naming rules), filepath.Join would collapse the path to dynamodb/ or outRoot/. _schema.json would then land at dynamodb/_schema.json, misrouting schema for every table processed after it.

A refuseDotSegmentFilename-style check on the table name segment in flushTable, or a refuseDotSegmentFilename(EncodeSegment([]byte(st.name)), "table") call before os.MkdirAll, would close the same class of issue at the directory level. In practice, this requires deliberately malformed snapshots — real DynamoDB enforces table naming constraints (must start with letter/digit, min 3 chars) so "." and ".." can never appear as real table names.



Lens 5 — Test coverage

TestDDB_DotSegmentKeyRejected covers all four shapes correctly:

| Case | Schema type | hash | range | guard fired |
| --- | --- | --- | --- | --- |
| hash_dot | hash-only | "." |  | hash guard ✓ |
| hash_dotdot | hash-only | ".." |  | hash guard ✓ |
| range_dot | composite | "ok" | "." | range guard ✓ |
| range_dotdot | composite | "ok" | ".." | range guard ✓ |

All four assert errors.Is(err, ErrDDBInvalidItem). ✓

Carried from prior rounds — still unaddressed (both Low):

  1. Three-generation scenario untested. totalStaleItemsExcluding correctly handles multiple non-included gens, but no test exercises stale_count with more than one stale generation (e.g., schema gen=7, migrationSource=6, stale debris at gen=5).


  2. No end-to-end N-key migration conflict test. TestDDB_CanonicalNumberKeySegment unit-tests the canonicalization itself, and TestDDB_NewGenerationWinsOverMigrationSourceForSameKey uses S-type keys. A Finalize-level test with N-type hash keys in both generations would directly reproduce the original round 9 bug.


Summary

| Finding | Severity |
| --- | --- |
| Codex P1 round 12: dot-segment item key rejected via refuseDotSegmentFilename | ✅ Correctly fixed |
| B-type keys cannot produce dot-segments (guard can't fire, not needed) | ✅ Verified safe |
| N-type "." / ".." also caught via canonicalDDBNumber fallthrough + guard | ✅ Verified |
| Guard placement: fires before any os.MkdirAll or writeFileAtomic | ✅ Verified correct |
| Table name segment in flushTable not guarded for dot-segments | Low — corrupted-snapshot scenario only; real DynamoDB naming rules prevent it |
| Three-generation scenario (stale + source + active) untested | Low — carried from round 3 |
| No Finalize-level N-key migration conflict test | Low — carried from round 4 |

The core fix is solid. The single new Low finding (table name segment) is a defense-in-depth improvement for malformed snapshots; it does not affect backup safety with real DynamoDB data.
