Skip to content

backup: SQS encoder for queue meta and messages.jsonl (Phase 0a)#714

Open
bootjp wants to merge 15 commits intofeat/backup-phase0a-redis-simplefrom
feat/backup-phase0a-sqs
Open

backup: SQS encoder for queue meta and messages.jsonl (Phase 0a)#714
bootjp wants to merge 15 commits intofeat/backup-phase0a-redis-simplefrom
feat/backup-phase0a-sqs

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 30, 2026

Summary

Stacked on top of #713. Adds the SQS encoder for the Phase 0 logical-backup decoder.

Snapshot prefixes handled:

  • !sqs|queue|meta|<base64url(queue)>sqs/<queue>/_queue.json (dump-format projection of sqsQueueMeta with AWS-style field names)
  • !sqs|msg|data|<base64url(queue)><gen 8B BE><base64url(msgID)>sqs/<queue>/messages.jsonl (one record per line, sorted by (SendTimestampMillis, SequenceNumber, MessageID))
  • !sqs|msg|{vis,byage,dedup,group}, !sqs|queue|tombstone: excluded by default; --include-sqs-side-records routes them to sqs/<queue>/_internals/side_records.jsonl
  • !sqs|queue|{gen,seq}: ignored (operational counters, not user state)

Why buffer + sort at Finalize

Lex-order in the snapshot is m < q, so msg|data arrives BEFORE queue|meta. The encoder buffers per encoded-queue-prefix and resolves the human-readable queue name at Finalize via the meta records that arrive later. Per-queue memory is O(messages-per-queue); documented as a known limit for >100M-message queues.

Boundary detection

The msg|data key shape is <base64url(queue)><gen 8B BE><base64url(msgID)>. base64url alphabet is [A-Za-z0-9-_]; the first byte of an 8-byte BE gen is 0x00 for any production gen (< 2^56), so the first non-alphabet byte cleanly separates queue from gen. The msgID segment is additionally validated by attempting a base64url decode — a failed decode surfaces as ErrSQSMalformedKey rather than routing to the wrong queue.

Defaults

  • Visibility-state fields (visible_at_millis, current_receipt_token, receive_count, first_receive_millis) are zeroed on output. WithPreserveVisibility(true) passes them through.
  • Orphan msg|data records (no matching queue meta) emit a structured warning at Finalize and are dropped from the dump. Restoring orphans without a queue config would silently create a default-attribute queue.

Test plan

  • go test -race ./internal/backup/... — pass.
  • golangci-lint run ./internal/backup/... — clean.
  • 11 tests covering queue-meta round-trip, message ordering with ties, visibility-state zeroing/preservation, orphan warning, magic-prefix rejection, malformed-JSON rejection, wrong-prefix rejection, key-component peek, side-records include/exclude.

Stacking

Base: feat/backup-phase0a-redis-simple (PR #713).

Builds on PR #713. Adds the SQS encoder for the Phase 0 logical-
backup decoder.

Snapshot prefixes handled:
- !sqs|queue|meta|<base64url(queue)> -> sqs/<queue>/_queue.json
  (dump-format projection of the live sqsQueueMeta with AWS-style
  field names; FormatVersion stamped, throttle / partition / dedup-
  scope fields elided -- cluster-internal state, not user-visible
  config)
- !sqs|msg|data|<base64url(queue)><gen 8B BE><base64url(msgID)> ->
  sqs/<queue>/messages.jsonl (one record per line, sorted at
  Finalize-time by (SendTimestampMillis, SequenceNumber, MessageID)).
- !sqs|msg|vis | byage | dedup | group, !sqs|queue|tombstone:
  excluded by default; --include-sqs-side-records routes them to
  sqs/<queue>/_internals/side_records.jsonl as a structured bag.
- !sqs|queue|gen, !sqs|queue|seq: not handled by Phase 0 (operational
  counters, not user-visible state).

Implementation choices:
- Lex-order for the snapshot is m < q < ... so msg|data records
  arrive BEFORE queue|meta. Encoder buffers per encoded-queue-prefix
  and resolves the human-readable queue name at Finalize via the
  queue|meta records that arrive later.
- Boundary detection: the encoded queue segment is base64url-no-
  padding (alphabet [A-Za-z0-9-_]); the first byte of the 8-byte
  BE gen is 0x00 for any production gen value (< 2^56), so the
  first non-alphabet byte is the queue/gen boundary. The msgID
  segment is also validated by attempting a base64url decode -- a
  failed decode is surfaced as ErrSQSMalformedKey rather than
  routed to the wrong queue.
- Visibility-state on emitted messages is zeroed by default;
  --preserve-visibility passes the live values through.
- Orphan messages (data records with no matching queue meta) emit
  a structured warning at Finalize and are dropped from the dump.
  Restoring orphans without a queue config would silently create a
  default-attribute queue, which is rarely what the operator wants.
- Memory: O(messages-per-queue) buffer at Finalize for the sort
  pass. Queues with hundreds of millions of messages will need a
  future stream-and-merge variant; documented as a known limit.

Tests: queue meta round-trip, message ordering by (ts, seq, id) with
ties resolved by message_id, default visibility-state zeroing,
--preserve-visibility round-trip, orphan-message warning,
magic-prefix rejection, JSON-decode rejection, wrong-prefix
rejection, peekMsgDataKey component round-trip, side-records
include/exclude.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 61db5a49-68a1-472d-87c8-46099aa740a8

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/backup-phase0a-sqs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the SQSEncoder, which processes SQS snapshot records into a structured per-queue layout consisting of metadata and ordered message logs. The implementation includes logic for parsing SQS keys, validating magic prefixes in stored values, and sorting messages by timestamp and sequence number. Review feedback identified critical boundary check issues in the key parsing functions parseSQSMessageDataKey and parseSQSGenericKey, where missing or empty segments could lead to malformed data or incorrect routing.

Comment thread internal/backup/sqs.go Outdated
Comment on lines +369 to +372
if idx == 0 || idx+genBytes > len(rest) {
return "", errors.Wrapf(ErrSQSMalformedKey,
"queue segment boundary not found in %q", key)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The boundary check should ensure that there is at least some data remaining for the message ID segment. If idx + genBytes == len(rest), the encMsgID will be empty, which is invalid for an SQS message data key. Using >= instead of > ensures that a non-empty message ID segment is present.

Suggested change
if idx == 0 || idx+genBytes > len(rest) {
return "", errors.Wrapf(ErrSQSMalformedKey,
"queue segment boundary not found in %q", key)
}
if idx == 0 || idx+genBytes >= len(rest) {
return "", errors.Wrapf(ErrSQSMalformedKey,
"queue segment boundary or message ID not found in %q", key)
}

Comment thread internal/backup/sqs.go
Comment on lines +397 to +401
idx := scanBase64URLBoundary(rest)
if idx == 0 {
return "", errors.Wrapf(ErrSQSMalformedKey,
"queue segment not found after prefix %q", prefix)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The parseSQSGenericKey function lacks a check to ensure that a boundary was actually found. Since all SQS side-record keys (visibility, by-age, deduplication, group, and tombstone) are expected to have a binary suffix (at least the 8-byte generation), scanBase64URLBoundary should return an index strictly less than the length of the string. If idx == len(rest), it means no non-base64url byte was found, and the function will incorrectly return the entire remaining key as the queue name, leading to incorrect routing and directory structure in the backup.

Suggested change
idx := scanBase64URLBoundary(rest)
if idx == 0 {
return "", errors.Wrapf(ErrSQSMalformedKey,
"queue segment not found after prefix %q", prefix)
}
idx := scanBase64URLBoundary(rest)
if idx == 0 || idx == len(rest) {
return "", errors.Wrapf(ErrSQSMalformedKey,
"queue segment boundary not found after prefix %q", prefix)
}

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 796a42f5d3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/sqs.go Outdated
Comment on lines +284 to +285
if len(st.messages) == 0 {
return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Emit side records even when a queue has no messages

When WithIncludeSideRecords(true) is enabled, flushQueue returns immediately for queues with zero messages, so _internals/side_records.jsonl is never written even if internalBuf contains tombstone/vis/dedup/group records. This silently drops opted-in SQS side-record data for queues that currently have no message rows (a realistic case after purge/reap or for metadata-only snapshots), which makes --include-sqs-side-records incomplete.

Useful? React with 👍 / 👎.

bootjp added 2 commits April 30, 2026 19:30
Three issues, all valid.

#372 Gemini -- empty msg-id rejected.
parseSQSMessageDataKey changed `idx+genBytes > len(rest)` to
`idx+genBytes >= len(rest)`. AWS SQS message IDs are non-empty
by construction, so an empty trailer cannot be a legitimate
snapshot record. Test
TestSQS_ParseMessageDataKey_RejectsEmptyMsgIDSegment locks it in.

#401 Gemini -- trailerless side-record key rejected.
parseSQSGenericKey now also rejects `idx == len(rest)`. All
side-record key shapes (vis / byage / dedup / group / tombstone)
terminate the encoded queue segment with at least one binary
trailer (the gen u64); a key whose entire suffix is base64url-clean
is malformed. Test
TestSQS_ParseGenericKey_RejectsTrailerlessKey covers it.

#285 Codex P2 -- side records flushed even with zero messages.
flushQueue used to `return` after writing _queue.json when the
queue had no messages, which silently dropped any internalBuf
content (purged or metadata-only queues with --include-sqs-side-
records). Now the message-flush block is gated on len > 0 but
the side-record flush runs unconditionally. Test
TestSQS_SideRecordsFlushedEvenWhenZeroMessages.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

3 issues addressed in 3455ba7f:

Gemini #372parseSQSMessageDataKey now rejects keys with empty msg-id segment (>= instead of >). AWS message IDs are non-empty by construction.

Gemini #401parseSQSGenericKey now also rejects idx == len(rest). Side-record keys always terminate the queue segment with a binary trailer (gen u64); a key whose entire suffix is base64url-clean is malformed.

Codex P2 #285flushQueue no longer returns early when the queue has zero messages; the side-record flush runs unconditionally. Purged or metadata-only queues now correctly emit _internals/side_records.jsonl under --include-sqs-side-records.

New tests: TestSQS_ParseMessageDataKey_RejectsEmptyMsgIDSegment, TestSQS_ParseGenericKey_RejectsTrailerlessKey, TestSQS_SideRecordsFlushedEvenWhenZeroMessages.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 48s —— View job


Review of commit 3455ba7

  • Read current code in internal/backup/sqs.go
  • Read current tests in internal/backup/sqs_test.go
  • Review the 3 fixes
  • Review broader code for remaining issues
  • Post review feedback

All three fixes are correct and the new tests properly lock them down. Two minor issues remain.

Fix 1 — parseSQSMessageDataKey boundary (>=) ✅

// sqs.go:378
if idx == 0 || idx+genBytes >= len(rest) {

>= correctly rejects the case where idx+genBytes == len(rest) (empty msg-id segment). The inline comment at lines 375–377 explains the AWS invariant. The test TestSQS_ParseMessageDataKey_RejectsEmptyMsgIDSegment constructs exactly that key shape (prefix + cQ + 8-byte gen, no msg-id) and asserts ErrSQSMalformedKey. Solid.

Fix 2 — parseSQSGenericKey trailerless rejection ✅

// sqs.go:412
if idx == 0 || idx == len(rest) {

idx == len(rest) correctly rejects a key whose entire suffix after the prefix is base64url-clean. Side-record keys always have a binary gen trailer starting with 0x00, so a legitimate key will always have idx < len(rest). TestSQS_ParseGenericKey_RejectsTrailerlessKey uses a pure base64url suffix (cQQQ) to trigger this path. Correct.

Fix 3 — flushQueue side records flush independently ✅

// sqs.go:284–310
if len(st.messages) > 0 {
    // ... write messages.jsonl ...
}
if len(st.internalBuf) > 0 {
    if err := s.flushInternals(dir, st.internalBuf); err != nil { ... }
}

The side-records flush is now unconditionally outside the messages block. TestSQS_SideRecordsFlushedEvenWhenZeroMessages registers a meta record + one vis side-record (no message-data records) and asserts the _internals/side_records.jsonl file exists after Finalize. Covers the exact purged-queue scenario. Correct.


Two minor nits (non-blocking)

1. peekMsgDataKey uses > not >= (sqs.go:582)

// debug helper — sqs.go:582
if idx == 0 || idx+genBytes > len(rest) {

The production parseSQSMessageDataKey uses >= (rejects empty msg-id), but this debug helper still uses >, which accepts an empty msg-id. Since peekMsgDataKey is only called from tests and is explicitly documented as a non-production path, it's not a safety issue — but a caller probing the edge case through both functions would get different results. Fix this →

2. Stale comment in HandleSideRecord (sqs.go:241–243)

if err != nil {
    // Tombstones include a fixed-width gen but no msg ID; the
    // generic parser tolerates the empty trailer.
    return err
}

After Fix 2, parseSQSGenericKey no longer tolerates an empty trailer — it now returns ErrSQSMalformedKey for idx == len(rest). The comment was presumably written to explain why tombstone keys don't fail the generic parser (they don't because the gen bytes start with 0x00, a non-base64url byte that terminates the scan before len(rest)). The comment now says the opposite of the truth. Fix this →


Overall: The three fixes are correct and well-tested. The nits above are cosmetic and don't affect correctness. Ready to merge once the stacking dependency (#713) lands.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f14fefc32

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/sqs.go Outdated
Comment on lines +262 to +266
if st.meta == nil {
s.emitWarn("sqs_orphan_messages",
"encoded_queue", st.encoded,
"buffered_messages", len(st.messages),
"hint", "no !sqs|queue|meta record matched this encoded prefix; messages dropped from the dump")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve side records when queue metadata is missing

When Finalize encounters a queue state without meta, it warns and skips flushing the queue entirely, which also drops any buffered side records (vis/byage/dedup/group/tombstone) for that queue. This breaks --include-sqs-side-records for a common case where DeleteQueue leaves tombstones but no !sqs|queue|meta|... row, so operators requesting side records lose exactly the deletion-era records they opted in to export.

Useful? React with 👍 / 👎.

Codex P2 round 8 (commit 86fbf3a):

When `Finalize` encountered a queue state without a !sqs|queue|meta
record (most commonly post-DeleteQueue, where the meta row was
removed but tombstones remain), it warned and skipped the queue
entirely — including any buffered side records. With
--include-sqs-side-records on, that silently dropped exactly the
deletion-era state operators opted in to capture, contradicting
the opt-in contract.

Now the orphan branch flushes side records to a `<encoded>.orphan`
sub-directory while still warning that messages were dropped. The
`.orphan` suffix prevents a restore tool from mistaking it for a
real queue dir produced by a successful meta flush; the encoded
prefix is the only stable identifier available without meta.
Orphan messages still drop because restoring them without queue
config would silently create a default-settings queue.

The branch is gated on `s.includeSideRecords` so the
default-off contract is preserved: without the flag, no `.orphan`
directory is ever created.

Tests:
  - TestSQS_OrphanQueueSideRecordsPreserved — drives a vis-record
    without a meta row, asserts the .orphan/_internals path is
    populated and the orphan-messages warning fires.
  - TestSQS_OrphanQueueSideRecordsSuppressedWhenOptOut — same
    scenario without --include-sqs-side-records; asserts no
    .orphan dir is created.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 2 fix for Codex P2 round 8 on commit 86fbf3a (commit 1621ede4):

When Finalize saw a queue state without a !sqs|queue|meta row (most commonly post-DeleteQueue, where the meta row was removed but tombstones remain), it warned and skipped the queue entirely — including any buffered side records. With --include-sqs-side-records on that silently dropped exactly the deletion-era state operators opted in to capture, contradicting the opt-in contract.

Now the orphan branch flushes side records to a <encoded>.orphan sub-directory while still warning that messages were dropped. The .orphan suffix prevents restore tools from mistaking it for a real queue dir; the encoded prefix is the only stable identifier without meta. Orphan messages still drop (restoring them without queue config would silently create a default-settings queue). The branch is gated on includeSideRecords so the default-off contract is preserved.

Tests: TestSQS_OrphanQueueSideRecordsPreserved and TestSQS_OrphanQueueSideRecordsSuppressedWhenOptOut. go test -race and golangci-lint clean. Fix merged forward into #716 / #718.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 11s —— View job


Review of commit 1621ede4

  • Read current internal/backup/sqs.go
  • Read current internal/backup/sqs_test.go
  • Review the orphan-queue side-records fix
  • Check the two previous nits from last round
  • Broader review pass
  • Post feedback

The orphan-queue fix is correct and the two new tests are well-constructed. Two nits from last round remain unaddressed.

Fix: flushOrphanQueueSideRecords

// sqs.go:288-303
func (s *SQSEncoder) flushOrphanQueueSideRecords(st *sqsQueueState) error {
    s.emitWarn("sqs_orphan_messages", ...)
    if !s.includeSideRecords || len(st.internalBuf) == 0 {
        return nil
    }
    dir := filepath.Join(s.outRoot, "sqs", st.encoded+".orphan")
    return s.flushInternals(dir, st.internalBuf)
}

The warning unconditionally fires for all orphan queues (regardless of opt-in status) so operators always know about the missing meta. The flush is gated on includeSideRecords && len(internalBuf) > 0, which is correct. The .orphan directory suffix, the flushInternals delegation, and os.MkdirAll handling are all consistent with the normal flush path.

One observation: the !s.includeSideRecords guard at line 294 is technically redundant — HandleSideRecord already drops records at intake when the flag is off (line 236–237), so internalBuf will always be empty when includeSideRecords == false. The double-guard is harmless and could be read as defence-in-depth, but it slightly obscures the real invariant. Non-blocking.

Finalize routing ✅

// sqs.go:265-279
for _, st := range s.queues {
    if st.meta == nil {
        if err := s.flushOrphanQueueSideRecords(st); err != nil && firstErr == nil {
            firstErr = err
        }
        continue
    }
    ...
}

Clean separation. All queues are iterated even on error (first-error-wins accumulation is the pre-existing pattern).

TestSQS_OrphanQueueSideRecordsPreserved

Creates a vis side-record with no matching meta, asserts sqs_orphan_messages warning fires and sqs/<encoded>.orphan/_internals/side_records.jsonl exists. Covers the exact scenario in the bug report.

TestSQS_OrphanQueueSideRecordsSuppressedWhenOptOut

With includeSideRecords off, HandleSideRecord drops the record at intake so no sqsQueueState is ever created. Finalize has nothing to iterate over, no .orphan dir is written. The test comment correctly explains this ("HandleSideRecord drops the record at intake") and the os.IsNotExist assertion is correct.


Two nits still open from last round (non-blocking)

1. peekMsgDataKey boundary inconsistency (sqs.go:611)

// still uses > instead of >= — inconsistent with parseSQSMessageDataKey
if idx == 0 || idx+genBytes > len(rest) {

The production parseSQSMessageDataKey correctly uses >= to reject empty msg-id segments (Fix 1 from last round), but this debug helper still uses >. It's test-only and documented as non-production, so it's not a safety issue — but a caller testing both functions against the same edge-case key gets different results. Fix this →

2. Stale comment in HandleSideRecord (sqs.go:241–243)

if err != nil {
    // Tombstones include a fixed-width gen but no msg ID; the
    // generic parser tolerates the empty trailer.
    return err
}

After Fix 2 (last round), parseSQSGenericKey now rejects idx == len(rest) — the comment says the opposite of the truth. Tombstone keys work because their gen starts with 0x00 (a non-base64url byte), which terminates the scan before len(rest) — not because the parser tolerates an empty trailer. Remove or replace the comment. Fix this →


Overall: The orphan-queue side-records fix is correct and complete. The .orphan directory convention is well-documented in the Finalize docstring, the two tests cleanly lock down the opt-in and opt-out paths, and the flushInternals delegation keeps the flush logic DRY. The two lingering nits are cosmetic. Ready to merge once the stacking dependency (#713) lands.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18a8ae3f74

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/sqs.go
if err != nil {
return "", err
}
idx := scanBase64URLBoundary(rest)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Accept partitioned SQS msg-data keys

The parser only handles legacy !sqs|msg|data|<queue><gen><msgID> layout and infers queue/gen boundary from the first non-base64 byte, so keys in the partitioned format (!sqs|msg|data|p|..., defined in adapter/sqs_keys.go as SqsPartitionedMsgDataPrefix) are treated as malformed (the queue segment becomes "p" and base64 decode fails). In environments where HT-FIFO partitioned keys are present, HandleMessageData will return an error and abort backup decoding instead of exporting messages.

Useful? React with 👍 / 👎.

Comment thread internal/backup/sqs.go
if err != nil {
return "", err
}
idx := scanBase64URLBoundary(rest)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse partitioned side-record keys before routing

parseSQSGenericKey also assumes the legacy key shape and returns bytes up to the first non-base64 character as the queue segment. For partitioned side-record keys like !sqs|msg|vis|p|<queue>|..., this extracts "p" as the queue key, so side records from different queues collapse into one state bucket and flush under the wrong/orphan path when --include-sqs-side-records is enabled.

Useful? React with 👍 / 👎.

bootjp added 2 commits April 30, 2026 21:35
Codex round 9 (commit 8d08827):

`parseSQSMessageDataKey` and `parseSQSGenericKey` only handled the
legacy SQS key shape:

	!sqs|msg|<family>|<encQueue><gen 8B><rest>

HT-FIFO partitioned queues use the discriminator-prefixed shape
(adapter/sqs_keys.go SqsPartitionedMsg*Prefix):

	!sqs|msg|<family>|p|<encQueue>|<part 4B><gen 8B><rest>

The legacy heuristic read "p" as the queue segment and aborted with
ErrSQSMalformedKey. In environments running partitioned FIFO queues
HandleMessageData would refuse every record and the dump pass
would abort entirely (P1); side records would similarly collapse
into one mis-routed bucket under --include-sqs-side-records (P2).

Recognise the partitioned shape via the literal "p|" prefix on the
post-family suffix and route into a new
parseSQSPartitionedQueueAndTrailer helper that:

  - peels the "p|" discriminator
  - reads the queue segment up to the literal '|' terminator
  - validates that the trailer carries at least partition u32 +
    gen u64 bytes
  - if the caller is parsing a msg-data key, validates the
    msg-id segment decodes as base64url

Tests:
  - TestSQS_ParsePartitionedMessageDataKey
  - TestSQS_ParsePartitionedSideRecordKey (vis/byage/dedup/group)
  - TestSQS_ParsePartitionedMessageDataKey_RejectsTruncatedTrailer

The legacy parser path is unchanged for non-partitioned keys.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

Round 3 fixes for Codex round 9 on commit 8d08827 (commit f923d99f):

The legacy SQS key parsers handled only !sqs|msg|<family>|<encQueue><gen 8B><rest>. HT-FIFO partitioned queues use a different shape (adapter/sqs_keys.go SqsPartitionedMsg*Prefix):

!sqs|msg|<family>|p|<encQueue>|<partition u32, 4B><gen u64, 8B><rest>

The legacy heuristic read "p" as the queue segment and surfaced ErrSQSMalformedKey. In any cluster running partitioned FIFO queues, HandleMessageData would refuse every record and the dump pass would abort (P1); side records would similarly collapse into one mis-routed bucket under --include-sqs-side-records (P2).

Both parsers now branch on the literal p| discriminator and route into a new parseSQSPartitionedQueueAndTrailer helper that peels the discriminator, reads the queue segment up to the | terminator, validates the partition+gen trailer, and (for msg-data keys) validates the trailing msg-id segment.

Tests: TestSQS_ParsePartitionedMessageDataKey, TestSQS_ParsePartitionedSideRecordKey (vis/byage/dedup/group), TestSQS_ParsePartitionedMessageDataKey_RejectsTruncatedTrailer. Legacy parser path is unchanged for non-partitioned keys.

go test -race and golangci-lint clean. Fix merged forward into #716 / #718.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Review of commit f923d99f (Round 3)

  • Read internal/backup/sqs.go — partitioned key parsing changes
  • Read adapter/sqs_keys.go — verify key shape assumptions
  • Read internal/backup/sqs_test.go — new tests
  • Review the partitioned parsing logic
  • Check the two previous nits from last round
  • Post review feedback

View job run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant