
backup: filename encoding (Phase 0a foundation)#711

Open
bootjp wants to merge 3 commits into main from
feat/backup-phase0a-filename

Conversation

@bootjp
Owner

@bootjp bootjp commented Apr 29, 2026

Summary

First piece of the Phase 0a logical-backup decoder described in docs/design/2026_04_29_proposed_snapshot_logical_decoder.md. Adds internal/backup/filename.go + tests — the filename encoding/decoding primitive every per-adapter encoder will depend on.

Subsequent PRs will add: KEYMAP.jsonl writer, per-adapter encoders (DynamoDB / S3 / Redis / SQS), the main decoder pipeline, and the cmd/elastickv-snapshot-decode CLI.

What this PR does

  • Encode: bytes in [A-Za-z0-9._-] pass through; every other byte becomes %HH. Long segments (>240 bytes after expansion) take a SHA-256-prefix + truncated-original fallback.
  • Decode: reverses both percent-encoded and b64.-prefixed segments; SHA-fallback inputs return ErrShaFallbackNeedsKeymap so callers cannot fabricate the original bytes from the filename alone.
  • Binary path: DynamoDB B-attribute keys take a separate b64.<base64url> form so binary keys never collide with hex-shaped string keys.
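The documented rules can be sketched in a few lines of standalone Go (an illustration of the stated rules only, not the PR's internal/backup API; the 240-byte SHA-fallback ceiling is omitted here):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// isUnreserved reports whether b is in the documented pass-through set
// [A-Za-z0-9._-].
func isUnreserved(b byte) bool {
	switch {
	case b >= 'A' && b <= 'Z', b >= 'a' && b <= 'z', b >= '0' && b <= '9':
		return true
	}
	return b == '.' || b == '_' || b == '-'
}

// encodeSegment applies the string-key rule: unreserved bytes pass
// through; every other byte becomes %HH with uppercase hex.
func encodeSegment(raw []byte) string {
	out := make([]byte, 0, len(raw))
	for _, c := range raw {
		if isUnreserved(c) {
			out = append(out, c)
		} else {
			out = append(out, fmt.Sprintf("%%%02X", c)...)
		}
	}
	return string(out)
}

// encodeBinarySegment applies the separate binary-key rule:
// "b64." + unpadded base64url.
func encodeBinarySegment(raw []byte) string {
	return "b64." + base64.RawURLEncoding.EncodeToString(raw)
}

func main() {
	fmt.Println(encodeSegment([]byte("user.1")))         // user.1
	fmt.Println(encodeSegment([]byte("a b/c")))          // a%20b%2Fc
	fmt.Println(encodeBinarySegment([]byte{0xDE, 0xAD})) // b64.3q0
}
```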

Test plan

  • go test -race ./internal/backup/... — all tests pass.
  • golangci-lint run ./internal/backup/... — clean.
  • Property tests via pgregory.net/rapid covering round-trip on both encoding paths and SHA-fallback post-condition.
  • Negative tests: truncated %HH, non-hex digits, raw reserved bytes, malformed b64. segments.
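The round-trip property the rapid tests check can be approximated with a plain stdlib loop over random inputs (a sketch against the same encoding rules, not the actual package under test; the real suite uses pgregory.net/rapid):

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

func isUnreserved(b byte) bool {
	switch {
	case b >= 'A' && b <= 'Z', b >= 'a' && b <= 'z', b >= '0' && b <= '9':
		return true
	}
	return b == '.' || b == '_' || b == '-'
}

func encodeSegment(raw []byte) string {
	out := make([]byte, 0, len(raw))
	for _, c := range raw {
		if isUnreserved(c) {
			out = append(out, c)
		} else {
			out = append(out, fmt.Sprintf("%%%02X", c)...)
		}
	}
	return string(out)
}

// unhex accepts the uppercase hex the encoder emits.
func unhex(c byte) (byte, bool) {
	switch {
	case c >= '0' && c <= '9':
		return c - '0', true
	case c >= 'A' && c <= 'F':
		return c - 'A' + 10, true
	}
	return 0, false
}

// decodeSegment inverts encodeSegment, rejecting truncated escapes,
// non-hex digits, and raw reserved bytes.
func decodeSegment(seg string) ([]byte, error) {
	out := make([]byte, 0, len(seg))
	for i := 0; i < len(seg); i++ {
		switch c := seg[i]; {
		case c == '%':
			if i+2 >= len(seg) {
				return nil, fmt.Errorf("truncated %%HH escape")
			}
			hi, ok1 := unhex(seg[i+1])
			lo, ok2 := unhex(seg[i+2])
			if !ok1 || !ok2 {
				return nil, fmt.Errorf("non-hex digit in escape")
			}
			out = append(out, hi<<4|lo)
			i += 2
		case isUnreserved(c):
			out = append(out, c)
		default:
			return nil, fmt.Errorf("raw reserved byte %q", c)
		}
	}
	return out, nil
}

func main() {
	rng := rand.New(rand.NewSource(1))
	for n := 0; n < 1000; n++ {
		raw := make([]byte, rng.Intn(64))
		rng.Read(raw)
		got, err := decodeSegment(encodeSegment(raw))
		if err != nil || !bytes.Equal(got, raw) {
			panic(fmt.Sprintf("round-trip failed for %x", raw))
		}
	}
	fmt.Println("round-trip holds")
}
```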

Self-review

  • Data loss — N/A; pure encoding/decoding library. SHA-fallback explicitly returns a typed error rather than synthesizing a wrong key.
  • Concurrency — All functions are pure; no shared state. -race clean.
  • Performance — Single-pass encoders; pre-grown builder; no allocations beyond the output. SHA-256 only on the fallback path.
  • Data consistency — Encoder is deterministic given the same input. Decode is the inverse on non-fallback inputs (verified by rapid). SHA-fallback is documented as requiring KEYMAP.jsonl for full reverse.
  • Test coverage — Table-driven cases for the documented rules + rapid property tests for round-trip + targeted negative tests. New non-test code lines added: 222.

…ndation)

Implements internal/backup/{filename.go,filename_test.go} per the Phase 0
design doc (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md).

Encoding rules (deterministic, reversible from the filename alone except
for the SHA fallback):
- RFC3986 unreserved set [A-Za-z0-9._-] passes through.
- Other bytes -> %HH (uppercase hex), like form-urlencoded but applied
  to every non-allowlisted byte.
- Segments exceeding 240 bytes after percent-encoding render as
  <sha256-hex-prefix-32>__<truncated-original> with the full original
  recorded in KEYMAP.jsonl by the caller. DecodeSegment refuses these
  with ErrShaFallbackNeedsKeymap so callers cannot fabricate the
  original bytes from the filename alone.
- DynamoDB B-attribute (binary) keys take a separate "b64.<base64url>"
  path so a binary key never collides with a hex-shaped string key.

Tests cover passthrough, percent escaping, hex-uppercase invariant,
SHA-fallback firing on long inputs (raw or post-expansion), binary
round-trip, malformed-input rejection, output-length bound under
adversarial inputs, and rapid-driven property tests for round-trip on
both percent and binary paths.
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 10 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between 7ac9c04 and 1e28238.

📒 Files selected for processing (2)
  • internal/backup/filename.go
  • internal/backup/filename_test.go

@bootjp
Owner Author

bootjp commented Apr 29, 2026

@claude review

@claude

claude Bot commented Apr 29, 2026

Claude finished @bootjp's task in 4m 11s


Code Review: internal/backup/filename.go

  • Gather context
  • Read internal/backup/filename.go
  • Read internal/backup/filename_test.go
  • Analyze encoding/decoding logic
  • Review test coverage
  • Post final review

The implementation is clean and well-structured overall, but there is one correctness bug and a project-convention violation that should be fixed before merging.


Bug (High) — SHA fallback detection collides with valid user keys

isShaFallback (filename.go:198–208) is not injection-proof.

The detector checks: "first 32 chars are hex-like, then __". The problem: all of those characters (0–9, a–f, A–F, _) are in the unreserved set and pass through percentEncode unchanged. A user key like:

0123456789abcdef0123456789abcdef__mykey

is entirely unreserved and well under the 240-byte ceiling, so EncodeSegment emits it verbatim. Then DecodeSegment's isShaFallback check reports true → it returns ErrShaFallbackNeedsKeymap for what is actually a well-formed, directly decodable key.

The existing test TestEncodeSegment_ShaFallbackPrefixCannotCollideWithEncodedHex misses this case: it only tests inputs that are one char too short or that have a wrong separator. A test case that demonstrates the current breakage:

key := "0123456789abcdef0123456789abcdef__mykey"
enc := EncodeSegment([]byte(key))
// enc == key (all unreserved, length 38)
_, err := DecodeSegment(enc)
// currently: ErrShaFallbackNeedsKeymap  ← WRONG
// should be: nil, []byte(key)

The property test doesn't reliably catch this because it draws inputs up to 80 bytes randomly — the probability of hitting exactly 32 hex chars + __ is negligible.

Root cause: _ is unreserved, so __ can naturally appear in encoded user-key output. Any separator composed entirely of unreserved bytes can collide.
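A minimal stand-in for the detector makes the collision concrete (a simplified sketch of the structural check described above, not the PR's actual isShaFallback):

```go
package main

import "fmt"

const shaFallbackHexPrefixLen = 32

func isHexDigit(b byte) bool {
	return b >= '0' && b <= '9' || b >= 'a' && b <= 'f' || b >= 'A' && b <= 'F'
}

// isShaFallback mirrors the structural check: 32 hex-like characters
// followed by "__".
func isShaFallback(seg string) bool {
	if len(seg) < shaFallbackHexPrefixLen+len("__") {
		return false
	}
	for i := 0; i < shaFallbackHexPrefixLen; i++ {
		if !isHexDigit(seg[i]) {
			return false
		}
	}
	return seg[shaFallbackHexPrefixLen:shaFallbackHexPrefixLen+2] == "__"
}

func main() {
	// Every byte of this user key is unreserved, so percent-encoding
	// would emit it verbatim -- and the detector then misfires on it.
	key := "0123456789abcdef0123456789abcdef__mykey"
	fmt.Println(isShaFallback(key)) // true: misclassified as a fallback
}
```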

Fix: use a separator byte that can only appear in encoder-generated output. Since % is never a literal character in percent-encoded output (a user byte % would be encoded as %25), embedding % in the SHA-fallback separator makes it injection-proof. For example:

shaFallbackSeparator = "%__"
// isShaFallback then checks seg[32:35] == "%__"

A user key producing % at position 32 of its encoded form would mean the user byte at that position is non-unreserved, which would already break the 32-char hex check. So this separator cannot collide with any percent-encoded user-key output.

This is a wire-format change and is cheap to make now in Phase 0a before any data is written.


Convention violation (Medium) — //nolint:mnd suppressions

CLAUDE.md states: "Avoid //nolint — refactor instead."

There are 9 //nolint:mnd suppressions across filename.go. Each can be removed by extracting a named constant:

  • percentEncode, line 139: 3 (len("%HH")) → percentEncodedMaxExpansion = 3
  • percentEncode, lines 146–147: 4 (nibble width), 0x0F (low-nibble mask) → nibbleWidth = 4, lowNibbleMask byte = 0x0F
  • percentDecode, line 164: 2 (hex digits after %) → hexEscapeDigits = 2
  • percentDecode, line 182: 4 (nibble width) → same nibbleWidth
  • hexUpper, lines 228, 231: 10 (decimal/hex boundary) → hexAlphaBase = 10
  • unhex, lines 239, 241: 10 → same hexAlphaBase



Minor — shaFallbackHexPrefixBytes misleading name

shaFallbackHexPrefixBytes = 32 is 32 hex characters, which encode 16 bytes. The name suffix Bytes implies raw bytes, but the constant is used as a character count. Consider shaFallbackHexPrefixLen or shaFallbackHexChars to avoid confusion at the call sites.


What's well done

  • Encoding logic is correct for the non-colliding cases: percentEncode / percentDecode are proper inverses; the boundary check at i+2 >= len(seg) is correct; hexUpper / unhex are symmetric.
  • EncodeBinarySegment separation cleanly prevents the b64. form from colliding with hex-shaped string keys.
  • Negative tests (TestDecodeSegment_Rejects*) cover truncated escapes, non-hex digits, raw reserved bytes, and malformed base64 — exactly the right failure cases.
  • Property tests via rapid cover the round-trip invariant and the SHA-fallback post-condition for randomly generated inputs.
  • Length bound guarantee (TestEncodeSegment_OutputLengthBoundedByMax) is a good correctness invariant test.
  • SHA-fallback human-suffix test confirms operator greppability.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a filename encoding and decoding system for logical backups, supporting percent-encoding, base64-encoded binary segments, and a SHA-256 fallback for segments exceeding 240 bytes. The review feedback highlights critical improvements for robustness, specifically addressing potential OOM risks by enforcing length checks before large memory allocations in encoding and decoding functions. Furthermore, a logic fix was suggested for EncodeSegment to prevent user-supplied keys from colliding with the SHA-fallback pattern, ensuring deterministic decoding.

Comment on lines +84 to +90
func EncodeSegment(raw []byte) string {
encoded := percentEncode(raw)
if len(encoded) <= maxSegmentBytes {
return encoded
}
return shaFallback(raw)
}
Contributor


high

The current implementation of EncodeSegment has two issues:

  1. OOM Risk: It calls percentEncode(raw) before checking the length. If raw is extremely large, percentEncode will attempt to allocate 3 * len(raw) bytes (via strings.Builder.Grow), which can lead to an OOM. Since any raw longer than maxSegmentBytes is guaranteed to exceed the limit after encoding, we should short-circuit early.
  2. Collision Bug: A user-supplied key that happens to match the SHA-fallback pattern (32 hex characters followed by __) will be returned as-is by EncodeSegment but will be misidentified as a fallback by DecodeSegment, making it undecodable without a keymap. EncodeSegment should detect this collision and force the SHA-fallback path to ensure the invariant that DecodeSegment only returns ErrShaFallbackNeedsKeymap for segments that actually require the keymap.
Suggested change
func EncodeSegment(raw []byte) string {
encoded := percentEncode(raw)
if len(encoded) <= maxSegmentBytes {
return encoded
}
return shaFallback(raw)
}
func EncodeSegment(raw []byte) string {
if len(raw) > maxSegmentBytes {
return shaFallback(raw)
}
encoded := percentEncode(raw)
if len(encoded) <= maxSegmentBytes && !isShaFallback(encoded) {
return encoded
}
return shaFallback(raw)
}
References
  1. To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to inputs that can trigger large allocations.
  2. When deserializing or processing data, pre-allocating a buffer is acceptable if a strictly enforced maximum size limit is checked before allocation.

Comment on lines +98 to +104
func EncodeBinarySegment(raw []byte) string {
enc := binaryPrefix + base64.RawURLEncoding.EncodeToString(raw)
if len(enc) <= maxSegmentBytes {
return enc
}
return shaFallback(raw)
}
Contributor


high

Similar to EncodeSegment, EncodeBinarySegment should guard against large inputs to prevent excessive memory allocation in base64.EncodeToString before the length check is performed.

Suggested change
func EncodeBinarySegment(raw []byte) string {
enc := binaryPrefix + base64.RawURLEncoding.EncodeToString(raw)
if len(enc) <= maxSegmentBytes {
return enc
}
return shaFallback(raw)
}
func EncodeBinarySegment(raw []byte) string {
if len(raw) > maxSegmentBytes {
return shaFallback(raw)
}
enc := binaryPrefix + base64.RawURLEncoding.EncodeToString(raw)
if len(enc) <= maxSegmentBytes {
return enc
}
return shaFallback(raw)
}
References
  1. To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to inputs that can trigger large allocations.
  2. When deserializing or processing data, pre-allocating a buffer is acceptable if a strictly enforced maximum size limit is checked before allocation.

Comment on lines +110 to +111
func DecodeSegment(seg string) ([]byte, error) {
if isShaFallback(seg) {
Contributor


medium

As a defensive measure, DecodeSegment should validate that the input segment length does not exceed maxSegmentBytes. Since all valid encoded segments (including fallbacks) are guaranteed to be within this limit, rejecting oversized inputs prevents unnecessary allocations in percentDecode or base64.DecodeString.

Suggested change
func DecodeSegment(seg string) ([]byte, error) {
if isShaFallback(seg) {
func DecodeSegment(seg string) ([]byte, error) {
if len(seg) > maxSegmentBytes {
return nil, errors.Wrap(ErrInvalidEncodedSegment, "segment length exceeds maximum")
}
if isShaFallback(seg) {
References
  1. To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections or inputs that can grow from external requests.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25f5da368d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +203 to +207
if _, ok := unhex(seg[i]); !ok {
return false
}
}
return seg[shaFallbackHexPrefixBytes:shaFallbackHexPrefixBytes+len(shaFallbackSeparator)] == shaFallbackSeparator


P1 Badge Disambiguate SHA fallback from plain encoded segments

Any segment that begins with 32 hex characters followed by __ is treated as SHA fallback, even when it is a normal output of EncodeSegment (e.g. key 0123456789abcdef0123456789abcdef__x, which is unreserved and <=240 bytes). In that case EncodeSegment returns the raw string but DecodeSegment returns ErrShaFallbackNeedsKeymap, so round-trip decode fails for valid keys and restore logic can incorrectly require a KEYMAP entry that was never written.


Two reviewer-flagged correctness issues in EncodeSegment:

1. SHA-fallback collision (Gemini HIGH, Codex P1):
   A user key whose percent-encoded form happens to begin with 32 hex
   characters + "__" (e.g. "0123456789abcdef0123456789abcdef__x")
   passed through EncodeSegment unchanged. DecodeSegment's structural
   detection then misclassified the legitimate key as a SHA fallback
   and returned ErrShaFallbackNeedsKeymap, breaking round-trip for
   keys that should not require KEYMAP.jsonl. EncodeSegment now
   promotes such inputs to a real SHA fallback so KEYMAP.jsonl
   carries the original bytes for exact recovery.

2. Huge-input transient allocation (Gemini HIGH on EncodeSegment and
   EncodeBinarySegment, MEDIUM on DecodeSegment):
   percentEncode pre-grew its builder to 3*len(raw); a 1 GiB raw
   input would allocate 3 GiB before the length check fired. The
   encoders now short-circuit to shaFallback when the worst-case
   expansion (3*len for percent-encoded, 4/3*len + 4 for b64) would
   always overflow the ceiling. DecodeSegment refuses inputs longer
   than maxSegmentBytes outright as a defensive measure.

Tests added:
- TestEncodeSegment_KeyMatchingShaFallbackShapeIsPromotedToFallback
- TestEncodeSegment_HugeInputDoesNotMaterialiseFullExpansion
- TestDecodeSegment_RejectsOversizedInput
- TestEncodeBinarySegment_HugeInputTakesShaFallbackWithoutEncoding
@bootjp
Owner Author

bootjp commented Apr 30, 2026

Review feedback addressed in e7a84eba:

HIGH (Gemini line 90, Codex P1 line 207) — SHA-fallback collision
Confirmed bug. A user key like 0123456789abcdef0123456789abcdef__suffix (entirely unreserved) passed through EncodeSegment unchanged, then DecodeSegment's structural detector misclassified it as a SHA fallback and returned ErrShaFallbackNeedsKeymap. Round-trip broke for legitimate keys that should not have required KEYMAP.jsonl. EncodeSegment now promotes such inputs to a real SHA fallback so KEYMAP.jsonl carries the original bytes.

HIGH (Gemini lines 90, 104) — huge-input transient allocation
percentEncode pre-grew its builder to 3 * len(raw); a 1 GiB raw input would allocate 3 GiB just to be discarded by the length check. Encoders now short-circuit to shaFallback when the worst-case expansion would always overflow the ceiling — 3*len for percent, 4/3*len + 4 for b64.<base64url>. No allocation of the full expansion happens for adversarially large inputs.

MEDIUM (Gemini line 111) — DecodeSegment defensive length check
DecodeSegment now rejects inputs longer than maxSegmentBytes outright with ErrInvalidEncodedSegment. EncodeSegment never produces such inputs; any caller passing one is reading a corrupted dump or has a bug.

New tests:

  • TestEncodeSegment_KeyMatchingShaFallbackShapeIsPromotedToFallback — the collision-shaped input
  • TestEncodeSegment_HugeInputDoesNotMaterialiseFullExpansion — 1 MiB raw, asserts SHA fallback + bounded output
  • TestDecodeSegment_RejectsOversizedInput
  • TestEncodeBinarySegment_HugeInputTakesShaFallbackWithoutEncoding

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 1s


Code Review: internal/backup/filename.go (round 2)

  • Read current state of filename.go and filename_test.go
  • Verify SHA-fallback collision fix
  • Verify huge-input OOM fix
  • Verify DecodeSegment defensive length check
  • Verify new tests are correct and sufficient
  • Look for any remaining issues

All three high/medium issues from round 1 are correctly fixed. Two convention items from round 1 remain unaddressed.


Fixes verified ✓

SHA-fallback collision (EncodeSegment, lines 98–107)

The logic is correct. After percentEncode(raw), the result is checked with isShaFallback(encoded): if the encoded form happens to match the 32-hex-chars + "__" pattern (possible only when the first 32 raw bytes are all from {0-9, a-f, A-F} and bytes 32–33 are __, all of which are unreserved and pass through unchanged), the encoder promotes to a real SHA fallback so DecodeSegment's structural check remains correct. The new test TestEncodeSegment_KeyMatchingShaFallbackShapeIsPromotedToFallback (line 312) directly exercises the broken case.

OOM short-circuit in EncodeSegment (line 99)

len(raw)*percentEncodeMaxExpansion > maxSegmentBytes is exactly right. At len(raw) = 80, 80*3 = 240 = maxSegmentBytes (not >), so percentEncode is still called and b.Grow allocates at most 240 bytes. At len(raw) = 81, we short-circuit. The bound is tight.

OOM short-circuit in EncodeBinarySegment (line 118)

Using base64.RawURLEncoding.EncodedLen(len(raw)) + len(binaryPrefix) > maxSegmentBytes is correct: EncodedLen is pure integer arithmetic with no allocation, so the check is free.
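The shape of that pre-check can be sketched as follows (constant names follow the review discussion and are assumptions, not necessarily the package's exact identifiers):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

const (
	maxSegmentBytes = 240    // ceiling from the design doc
	binaryPrefix    = "b64." // binary-path marker
)

// wouldOverflow performs the allocation-free pre-check: EncodedLen is
// pure integer arithmetic, so no base64 output is ever materialised
// for inputs that cannot fit under the ceiling.
func wouldOverflow(raw []byte) bool {
	return base64.RawURLEncoding.EncodedLen(len(raw))+len(binaryPrefix) > maxSegmentBytes
}

func main() {
	fmt.Println(wouldOverflow(make([]byte, 100)))   // false: fits under 240
	fmt.Println(wouldOverflow(make([]byte, 1<<30))) // true: rejected before any encoding
}
```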

DecodeSegment defensive length guard (lines 138–141)

Clean. Any EncodeSegment output is ≤ maxSegmentBytes, so anything longer is either a corrupted dump or a caller bug — rejecting it outright is the right call.


Remaining issues (both carried from round 1)

Convention violation (Medium) — //nolint:mnd suppressions (9 sites)

CLAUDE.md: "Avoid //nolint — refactor instead." All nine suppressions in the file can be replaced with named constants:

  • line 174: 3 in b.Grow(len(raw) * 3) → use the existing percentEncodeMaxExpansion
  • line 181: 4 (nibble shift) → nibbleWidth = 4
  • line 182: 0x0F (low-nibble mask) → lowNibbleMask byte = 0x0F
  • line 199: 2 (hex digits after %) → hexEscapeDigits = 2
  • line 217: 4 (nibble shift) → same nibbleWidth
  • lines 263, 266, 274, 276: 10 (decimal/hex boundary) → hexAlphaBase = 10

Note that percentEncodeMaxExpansion was added in this PR specifically to name the 3, but line 174 inside percentEncode still uses the literal with //nolint. The constant should be used there too.


Minor — shaFallbackHexPrefixBytes misleading name

Still names a character count, not a byte count (32 hex chars = 16 bytes of SHA-256). Consider shaFallbackHexPrefixLen or shaFallbackHexChars. Affects lines 44, 225, 234, 237, 242.


What's well done in this revision

  • The promoted-fallback path in EncodeSegment is the right design choice: it preserves the invariant that isShaFallback is authoritative on decoder input, with no separator change required.
  • base64.RawURLEncoding.EncodedLen used for the pre-check in EncodeBinarySegment is precise and allocation-free.
  • TestEncodeSegment_HugeInputDoesNotMaterialiseFullExpansion (1 MiB) and TestEncodeBinarySegment_HugeInputTakesShaFallbackWithoutEncoding are the right regression anchors.
  • TestDecodeSegment_RejectsOversizedInput correctly validates the new guard.
  • Commit message is excellent — it maps each fix back to the specific reviewer and comment.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7a84eba67


Comment thread: internal/backup/filename.go (outdated)
Comment on lines +99 to +100
if len(raw)*percentEncodeMaxExpansion > maxSegmentBytes {
return shaFallback(raw)


P1 Badge Remove overbroad fast path in EncodeSegment

The early-return condition len(raw)*percentEncodeMaxExpansion > maxSegmentBytes is not a safe “always overflow” check: it sends every input longer than 80 bytes to SHA fallback, including plain unreserved keys whose encoded form would still be <=240 bytes (e.g., 200 ASCII letters). That changes many reversible keys into ErrShaFallbackNeedsKeymap cases, creating unnecessary KEYMAP dependence and breaking the documented non-fallback round-trip behavior for 81–240 byte segments.


Comment on lines +145 to +146
if strings.HasPrefix(seg, binaryPrefix) {
raw, err := base64.RawURLEncoding.DecodeString(seg[len(binaryPrefix):])


P1 Badge Prevent b64. prefix collisions with string segments

Any segment beginning with b64. is decoded as binary, but EncodeSegment leaves unreserved string keys unchanged, so string keys like b64.Zm9v decode to different bytes (foo) and keys like b64.abc can fail decode entirely if the suffix is not valid base64url. This makes string-key round-tripping non-invertible for a realistic key prefix unless callers carry an external type channel for every segment.


Two P1 follow-ups from Codex.

#100 Codex P1 -- overbroad fast path.
The previous len(raw)*3 > 240 fast path tripped for any input
longer than 80 bytes, including pure-ASCII keys whose 1:1 encoded
form would have fit fine. That converted reversible keys into
ErrShaFallbackNeedsKeymap unnecessarily. Replaced with:
  1. len(raw) > maxSegmentBytes -> SHA fallback (1:1 lower bound;
     no encoding can fit).
  2. percentEncodeBounded: stream-encodes with an in-loop overflow
     check so the partial allocation is bounded to maxSegmentBytes
     even on adversarial inputs that DO need escaping. Returns
     ("", false) on overflow so the caller takes the SHA path
     without seeing the partial output.
TestEncodeSegment_LongUnreservedASCIIEncodesAsIs locks the
correct 200-byte-ASCII round-trip.

#146 Codex P1 -- b64. prefix collision.
A user STRING key like "b64.foo" was returned as-is by
EncodeSegment (all unreserved) and then misclassified by
DecodeSegment as a binary segment, decoding the base64 to "foo"
instead of round-tripping. EncodeSegment now promotes any input
whose percent-encoded form starts with the binary prefix to a
real SHA fallback, parallel to the existing SHA-shape collision
check, so KEYMAP.jsonl carries the original bytes.
TestEncodeSegment_KeyStartingWithBinaryPrefixIsPromotedToFallback
covers it.

The previous huge-input OOM-guard property (no all-at-once
3*len(raw) allocation) is preserved by percentEncodeBounded; the
existing TestEncodeSegment_HugeInputDoesNotMaterialiseFullExpansion
still passes.
@bootjp
Owner Author

bootjp commented Apr 30, 2026

2 Codex P1 follow-ups addressed in 1e282382:

#100 — Overbroad fast path. len(raw)*3 > 240 falsely tripped for any input longer than 80 bytes, including pure-ASCII unreserved keys whose 1:1 encoding would have fit. Replaced with:

  1. len(raw) > maxSegmentBytes → SHA fallback (1:1 lower bound, can't fit anyway).
  2. percentEncodeBounded streams the encode and bails on overflow, capping the partial allocation at maxSegmentBytes. Restores the documented round-trip for 81–240 byte unreserved keys while preserving the OOM guard for adversarial inputs that DO need escaping.

TestEncodeSegment_LongUnreservedASCIIEncodesAsIs (200-byte ASCII round-trip) locks it in. The existing TestEncodeSegment_HugeInputDoesNotMaterialiseFullExpansion still passes because percentEncodeBounded never allocates more than maxSegmentBytes + 3.

#146 — b64. prefix collision. A user STRING key like b64.foo was returned as-is (all unreserved) and then misclassified by DecodeSegment as binary, decoding the base64 to foo instead of round-tripping. EncodeSegment now promotes any input whose encoded form starts with b64. to a real SHA fallback (parallel to the existing SHA-shape collision check). KEYMAP.jsonl carries the original bytes.

TestEncodeSegment_KeyStartingWithBinaryPrefixIsPromotedToFallback.

@claude review

@claude

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 54s


Code Review: internal/backup/filename.go (round 3)

  • Read current state of filename.go and filename_test.go
  • Verify overbroad fast-path fix (percentEncodeBounded)
  • Verify b64. prefix collision fix
  • Verify new tests are correct and sufficient
  • Check remaining issues from round 2
  • Post final review

Both Codex P1 fixes are correctly implemented. One new low-priority observation.


Fixes verified ✓

Overbroad fast path (#100) — percentEncodeBounded (lines 121–145)

The streaming encoder is correct. It bails as soon as adding a character (+1) or a 3-byte escape (+escapeBytes) would exceed maxLen, so it never over-runs the ceiling. The pre-grow cap is bounded to min(3*len(raw), maxLen+3) = 243 bytes regardless of input size — never 3*len(raw) for adversarial inputs.

The early guard len(raw) > maxSegmentBytes (line 105) handles the 1:1 lower bound before percentEncodeBounded is called, which also keeps the cap multiplication within 3*240 = 720 (never overflows). The bound is tight: len(raw) = 80 bytes → 3*80 = 240 = maxSegmentBytes, so percentEncodeBounded is still called; at len(raw) = 81 the caller short-circuits first.

TestEncodeSegment_LongUnreservedASCIIEncodesAsIs (line 312) correctly anchors the previously-broken 200-byte ASCII case.

b64. prefix collision (#146) — EncodeSegment line 110

encoded, ok := percentEncodeBounded(raw, maxSegmentBytes)
if !ok || isShaFallback(encoded) || strings.HasPrefix(encoded, binaryPrefix) {
    return shaFallback(raw)
}

The check is correct and complete. It covers both fully-unreserved keys (b64.foo encodes verbatim to b64.foo, promoted) and partially-reserved keys (b64.!!! encodes to b64.%21%21%21, which still carries the b64. prefix, promoted). TestEncodeSegment_KeyStartingWithBinaryPrefixIsPromotedToFallback (line 331) is the right regression anchor.


Remaining issues (carried across all three review rounds)

Convention violation (Medium) — //nolint:mnd suppressions (now 10 sites)

Round 3 added 2 more at lines 141–142 inside percentEncodeBounded. All 10 can be replaced with named constants:

  • lines 141, 215: c >> 4 (nibble shift) → nibbleWidth = 4
  • lines 142, 216: c & 0x0F → lowNibbleMask byte = 0x0F
  • line 208: len(raw) * 3 → use the existing escapeBytes local
  • line 251: hi << 4 → same nibbleWidth
  • lines 297, 300, 308, 310: 10 → hexAlphaBase = 10

CLAUDE.md: "Avoid //nolint — refactor instead."


Minor — shaFallbackHexPrefixBytes misleading name

The constant counts hex characters, not raw bytes (32 hex chars = 16 bytes of SHA-256). Still present at lines 44, 259, 268, 271, 276. Consider shaFallbackHexPrefixLen or shaFallbackHexChars.


New observation (Low, non-blocking)

percentEncode (lines 205–219) is now unreachable from the main encoding path — percentEncodeBounded superseded it in EncodeSegment. Its only remaining caller is shaFallback (line 264), where it encodes the truncated suffix (at most 64 bytes, so the 3*64 = 192-byte pre-grow is safe). The two functions share an identical inner loop, which is minor duplication. Not a bug and not blocking, but percentEncode could be refactored away by having shaFallback call percentEncodeBounded with maxLen = shaFallbackTruncatedSuffixBytes*3.


What's well done in this revision

  • percentEncodeBounded's streaming design is exactly right for an OOM guard: bail early, cap pre-growth, no large intermediate allocation.
  • base64.RawURLEncoding.EncodedLen(len(raw)) pre-check in EncodeBinarySegment is allocation-free and precise.
  • The three promotion invariants (len > max, SHA-shape, b64. prefix) are all in a single readable ternary (line 110) with a clear doc comment above.
  • New tests TestEncodeSegment_LongUnreservedASCIIEncodesAsIs and TestEncodeSegment_KeyStartingWithBinaryPrefixIsPromotedToFallback are the right regression anchors for the P1 bugs.
