backup: Redis set encoder (Phase 0a) #758

Open
bootjp wants to merge 1 commit into main from feat/backup-phase0a-redis-set

Conversation

@bootjp (Owner) commented on May 15, 2026

Summary

  • Adds the !st|meta| + !st|mem| → sets/<key>.json encoder per the Phase 0 design doc (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md).
  • Members are emitted as a sorted byte-order array (not a JSON object) so binary-safe member bytes round-trip via the typed {"base64":"..."} envelope without colliding under percent-encoded JSON object keying — same shape as the hash encoder's fields array.
  • Duplicate HandleSetMember calls collapse via a map[string]struct{} buffer; Redis sets are mathematical sets, so a snapshot iterator that re-emits an !st|mem| key is harmless.
  • TTL records on !redis|ttl| for a set user key fold into the set JSON's expire_at_ms field via the HandleTTL switch — same policy as hash + list encoders, no separate sidecar.
  • The uint64 → int64 overflow guard applied symmetrically to the hash + list encoders (see round 2 of PR #755, backup: Redis list encoder (Phase 0a)) is also enforced here on the set's declared-length field.
  • !st|meta|d| (LenDelta) records are silently skipped: !st|mem| keys are the source of truth at backup time.
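
The duplicate-collapse and sorted-emit behavior described above can be sketched as follows. This is illustrative Go, not the encoder's actual code; setBuffer and its methods are invented names:

```go
package main

import (
	"fmt"
	"sort"
)

// setBuffer collapses duplicate member records into a set, mirroring
// the map[string]struct{} buffer described above. All identifiers
// here are illustrative, not the repository's actual names.
type setBuffer struct {
	members map[string]struct{}
}

func newSetBuffer() *setBuffer {
	return &setBuffer{members: make(map[string]struct{})}
}

// add is idempotent: a snapshot iterator that re-emits the same
// member key is harmless.
func (b *setBuffer) add(member []byte) {
	b.members[string(member)] = struct{}{}
}

// sortedMembers flushes the buffer in deterministic byte order, the
// order used for the JSON "members" array.
func (b *setBuffer) sortedMembers() []string {
	out := make([]string, 0, len(b.members))
	for m := range b.members {
		out = append(out, m)
	}
	sort.Strings(out) // Go string comparison is byte-wise
	return out
}

func main() {
	b := newSetBuffer()
	b.add([]byte("beta"))
	b.add([]byte("alpha"))
	b.add([]byte("alpha")) // duplicate collapses
	fmt.Println(b.sortedMembers()) // [alpha beta]
}
```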

Test plan

  • go test -race ./internal/backup/... — all 11 new set tests + existing hash/list/string suites pass (1.4s)
  • golangci-lint run ./internal/backup/... — 0 issues
  • Coverage: sorted byte-order round-trip; empty set still emits a file (SCARD==0 observable); TTL inlined from scan index; length-mismatch warning shape; binary member base64 envelope; meta-delta key silently skipped; malformed meta length rejected; overflow guard rejected; members-without-meta path; duplicate members collapse; parser-level delta-key rejection; math.MaxInt64 boundary accepted.

Self-review (5 lenses)

  1. Data loss: encoder writes via writeFileAtomic (tmp+rename). Only meta-deltas are skipped, which is consistent with the hash/list policy and the live read path's source-of-truth invariant.
  2. Concurrency / distributed: RedisDB.Handle* methods documented not goroutine-safe; decoder pipeline is sequential per scope. No shared mutable state.
  3. Performance: members buffered in map[string]struct{}; single allocation per member. Sort at flush is O(n log n); set member count bounded by maxWideColumnItems on the live side.
  4. Data consistency: redis_set_length_mismatch warning fires on declared-vs-observed drift; sorted byte order is deterministic across runs.
  5. Test coverage: 11 named tests cover every public handler plus the parse-error paths, the new overflow guard, and the duplicate-members idempotency contract.

Phase 0a remaining after this PR

  • redis_zset.go (!zs|...) encoder — sorted-set with (member, score) records, JSON shape {"format_version": 1, "members": [{"member":..., "score":...}], "expire_at_ms": null}
  • redis_stream.go (!stream|...) encoder — JSONL output (streams/<key>.jsonl with _meta trailer)
  • cmd/elastickv-snapshot-decode/ CLI binary
  • cmd/elastickv-snap-token helper
  • docs/operations/snapshot_restore.md runbook

Adds the !st|meta| + !st|mem| → sets/<key>.json encoder per the Phase
0 design doc (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md).
Wire format mirrors store/set_helpers.go:

  - !st|meta|<userKeyLen(4)><userKey>          → 8-byte BE Len
  - !st|mem|<userKeyLen(4)><userKey><member>   → empty value (member
                                                bytes live in the
                                                key, binary-safe)
  - !st|meta|d|<userKeyLen(4)><userKey>...     → skipped silently
                                                (same policy as hash
                                                and list deltas)

Output JSON shape matches the design's other wide-column types:

  {"format_version": 1,
   "members": [..., {"base64":"..."}, ...],
   "expire_at_ms": null | <ms>}

Members are emitted as a sorted array (not a JSON object) for the
same binary-safety reason the hash encoder uses an array for fields:
distinct binary member names can collide under JSON's percent-encoded
object-key path, and base64-envelope encoding for non-UTF-8 members
keeps each record byte-faithful. Duplicate HandleSetMember calls
collapse via a map[string]struct{} buffer so a snapshot iterator that
re-emits the same !st|mem| key is harmless (Redis sets are
mathematical sets, not multisets).
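
The typed envelope can be sketched like this, assuming the policy described above: a plain JSON string for valid UTF-8 members, the {"base64":"..."} envelope otherwise. memberJSON is an invented name, not the encoder's actual function:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// memberJSON renders one member either as a plain JSON string (valid
// UTF-8) or as the typed {"base64":"..."} envelope, so binary member
// bytes round-trip byte-faithfully through the JSON output.
func memberJSON(member []byte) (json.RawMessage, error) {
	if utf8.Valid(member) {
		return json.Marshal(string(member))
	}
	return json.Marshal(map[string]string{
		"base64": base64.StdEncoding.EncodeToString(member),
	})
}

func main() {
	a, _ := memberJSON([]byte("plain"))
	b, _ := memberJSON([]byte{0xff, 0xfe}) // not valid UTF-8
	fmt.Println(string(a))                 // "plain"
	fmt.Println(string(b))                 // {"base64":"//4="}
}
```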

TTL records on !redis|ttl| route into the set JSON's expire_at_ms
field via the HandleTTL switch — same fold-into-record policy as
hash + list encoders; no separate sidecar.

The uint64 → int64 overflow guard for the declared-length field is
applied symmetrically (matching the hash + list encoders) so a
corrupted store with the high bit set fails closed at meta-record
ingest rather than silently wrapping to negative declaredLen and
firing spurious redis_set_length_mismatch warnings.

Refactor: leverages the existing flushWideColumnDir generic helper
introduced for the list encoder (PR #755) — no further changes to
redis_string.go's shared infra.

Tests cover: sorted byte-order round-trip; empty set still emits a
file (SCARD==0 observable); TTL inlined from scan index; length-
mismatch warning shape; binary member base64 envelope; meta-delta
key silently skipped; malformed meta length rejected; overflow guard
rejected; members-without-meta still emits without false-positive
mismatch warning; duplicate members collapse idempotently; parser-
level delta-key rejection; math.MaxInt64 boundary accepted.

Phase 0a remaining after this PR: redis_zset / redis_stream
encoders, cmd/elastickv-snapshot-decode CLI, cmd/elastickv-snap-token
helper, docs/operations/snapshot_restore.md runbook.
@coderabbitai Bot commented on May 15, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 21 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dbe9855a-1da3-4ed6-a23b-16014bbd1655

📥 Commits

Reviewing files that changed from the base of the PR and between b59e587 and 0552136.

📒 Files selected for processing (3)
  • internal/backup/redis_set.go
  • internal/backup/redis_set_test.go
  • internal/backup/redis_string.go

@gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request implements a Redis set encoder that translates snapshot records into a per-set JSON format, including support for TTL inlining and deterministic member sorting. The review feedback identifies the use of a hash-specific constant in the set encoder and suggests using a local value instead. Additionally, it emphasizes the importance of correctly handling all wide-column types in the HandleTTL and Finalize logic to prevent data loss or state inconsistencies.

  if !ok {
      return nil, nil, false
  }
  member := rest[hashUserKeyLenSize+len(userKey):]

Severity: medium

The constant hashUserKeyLenSize appears to be a leftover from the hash encoder implementation. While it likely holds the correct value (4 bytes), using a hash-specific constant in the set encoder is misleading and creates an unnecessary dependency on the hash implementation details. Please define a local constant for the set encoder or use a more generic shared constant if one exists in the backup package.

Suggested change:

  - member := rest[hashUserKeyLenSize+len(userKey):]
  + member := rest[4+len(userKey):]
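
One way to address this review point is a set-local constant rather than the bare literal 4. This is a sketch; setUserKeyLenSize is a suggested name, not an existing identifier in the repository:

```go
package main

import "fmt"

// setUserKeyLenSize is a set-encoder-local constant for the 4-byte
// big-endian user-key length prefix, avoiding a misleading dependency
// on the hash encoder's hashUserKeyLenSize. The name is a suggestion.
const setUserKeyLenSize = 4

func main() {
	userKey := []byte("foo")
	rest := append([]byte{0, 0, 0, 3}, userKey...) // length prefix + key
	rest = append(rest, []byte("member-bytes")...)
	member := rest[setUserKeyLenSize+len(userKey):]
	fmt.Println(string(member)) // member-bytes
}
```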

    r.warn("redis_orphan_ttl",
        "count", r.orphanTTLCount,
  -     "hint", "remaining wide-column encoders (set/zset/stream) have not landed yet")
  +     "hint", "remaining wide-column encoders (zset/stream) have not landed yet")

Severity: medium

The hint message for redis_orphan_ttl has been updated to remove set, which is correct as the set encoder is now implemented. However, ensure that the corresponding logic in HandleTTL and Finalize correctly handles all wide-column types. This is crucial to avoid silently dropping entries or creating state inconsistencies during normalization, ensuring all data is preserved on disk.

References
  1. Avoid silently dropping entries during serialization or normalization. Provide specific handling for all valid contexts to ensure they are preserved on disk and avoid state inconsistencies.
