Thread ctx through v3 envelope, add CtxCopy helper, and detach pebble finalize#893
Thread ctx through v3 envelope, add CtxCopy helper, and detach pebble finalize#893arreyder wants to merge 3 commits into
Conversation
81d1b03 to
73a98cf
Compare
ad6a638 to
1567f72
Compare
… dangle it WriteEnvelope previously held no context: a caller cancellation during the payload tar/zstd walk had to wait for filepath.Walk to drain the entire Pebble checkpoint before the function returned. Big-tenant checkpoints are GBs; that wait is the gap during which the io.Pipe upload pattern can end up with clean-EOF semantics instead of CloseWithError. WriteEnvelope now takes ctx as its first parameter. The walk callbacks check ctx.Err() per entry, and each regular-file copy is routed through new CtxCopy: chunked io.CopyN (4 MiB) with a ctx.Done() check between chunks. Bounded worst-case cancel latency stays well under a second even on slow disks while keeping the per-chunk select overhead invisible against typical c1z throughput. registeredStore.save (the Pebble equivalent of C1File.finalize) now runs the Checkpoint + WriteEnvelope + rename sequence on a context detached from the caller via dotc1z.FinalizeTimeout(), inside a uotel.StartWithLink new-root span. This protects the Pebble write path even when the caller is a direct C1ZStore consumer that didn't already detach (the syncer.Close path from PR 1 still detaches once at the outer layer; double-detach is harmless and gives defense in depth). Tests cover CtxCopy's four behaviors (happy path, cancel-before-start, cancel-mid-stream after one chunk, source-error propagation) plus WriteEnvelope refusing to walk on a cancelled context. The existing envelope round-trip tests are updated to pass t.Context() through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…k cancel test, single-alloc copy, error context
Multi-angle review surfaced 10 fix-now items. Nine applied here; one
deferred (encoder-pool reuse for writeZstdTar) because pkg/dotc1z's
pool.go can't be imported from pkg/dotc1z/format/v3 without a
restructure — filed as follow-up.
1. CtxCopy relocated. pkg/dotc1z/format/v3 was the wrong home for a
generic ctx-aware copy helper; sibling subpackages
(synccompactor, manager/s3, c1's c1age) would have ended up
importing format/v3 for an unrelated dep. New home:
pkg/dotc1z/iox — a leaf subpackage of dotc1z importable by
anything in the tree without circular risk.
2. Single allocation per CtxCopy call. Old impl called io.CopyN
per chunk, which means io.Copy allocated a fresh 32 KiB buffer
every iteration. New impl holds one buffer and uses
io.CopyBuffer + io.LimitReader. At Lilly's 72 GiB checkpoint
that's the difference between ~18k 32 KiB allocations vs one,
per file walked.
3. Error context on cancellation. CtxCopy now wraps ctx.Err()
with "iox.CtxCopy: copied %d bytes: %w". writeZstdTar/writeTar
wrap their per-entry ctx.Err with "c1z v3: tar(_zstd) walk
canceled at %q: %w" so operators triaging "context canceled"
in logs can see exactly which Pebble checkpoint file the walk
was visiting.
4. ctxCopyChunkSize unexported. Nothing external needed it and
exporting locked it into the public API surface.
5. CtxCopy doc fixed: "checks ctx.Done()" → "checks ctx.Err()"
(matches the implementation).
6. WriteEnvelope godoc tightened. Now explicit that magic, length,
and manifest bytes are written BEFORE the first ctx check
inside the walk, so callers using WriteEnvelope with an io.Pipe
must propagate ctx.Err() via CloseWithError to make downstream
consumers (e.g. s3manager) abort instead of finalizing a
truncated upload.
7. TestRegisteredStoreSaveSurvivesCanceledCtx — Pebble symmetry
with PR 1's TestC1FileCloseSurvivesCanceledCtx. Verifies a
pre-cancelled or pre-deadlined ctx through Close still produces
a durable, readable v3 envelope on disk.
8. TestWriteEnvelopeCanceledMidWalk — covers the mid-walk cancel
case the prior test couldn't (it only handled pre-call cancel).
Uses a teeWriter + cancelOnWriteOnce to fire the cancel after
the very first header bytes hit the writer, then asserts the
wrapped error names the in-flight tar entry.
9. CtxCopy edge cases — TestCtxCopyEmptySource (degenerate
zero-byte file in payloadDir), TestCtxCopyPropagatesDstWriteError,
TestCtxCopyPropagatesDstShortWrite. Plus
TestWriteEnvelopeEmptyPayloadDir for the "Pebble checkpoint
with no SSTs" round-trip.
10. Two pre-existing unused //nolint:gosec directives in the tar
walkers were now flagged by nolintlint because the lines moved;
removed them.
Deferred follow-up:
- writeZstdTar still allocates a fresh zstd.Encoder per save.
pkg/dotc1z/pool.go has getEncoder/putEncoder for exactly this
case; using it from format/v3 needs either a shared subpackage
for the pool or an injection-based API. Out of scope for this
review pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…filenames
PR 893 CI failures, fixed:
- gosec G122 fires under CI's pinned golangci-lint v2.12.2 on the
os.Open(path) inside both writeTar and writeZstdTar walk callbacks.
Local v2.8.0 doesn't have G122 yet, so nolintlint there reports
the directive as unused. CI is authoritative.
- TestWriteEnvelopeCanceledMidWalk built filenames with
string(rune('a'+i)) which generates Windows-invalid characters
('|', '{', '}', '~') at i >= 27. Switch to fmt.Sprintf("entry-%02d",
i) so go-test on windows-latest can create the inputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1567f72 to
3bcd4cd
Compare
| if errors.Is(err, io.EOF) { | ||
| return total, nil | ||
| } |
There was a problem hiding this comment.
🟡 Suggestion: io.CopyBuffer never returns io.EOF from a source-side EOF — it swallows it and returns (n, nil). The only way this check triggers is if dst.Write() returns io.EOF as a write error, in which case this would incorrectly treat it as successful completion. The check is effectively dead code today (the n < ctxCopyChunkSize branch at line 59 handles normal source exhaustion), but if you want to keep a defensive guard, consider checking err == io.EOF (identity, not wrapped) to avoid swallowing a wrapped writer error.
General PR Review: Thread ctx through v3 envelope, add CtxCopy helper, and detach pebble finalizeBlocking Issues: 0 | Suggestions: 1 | Threads Resolved: 0 Review SummaryThis PR threads Security IssuesNone found. Correctness IssuesNone found. Suggestions
Prompt for AI agents |
…ally bounds the loop (#899) * synccompactor: pass runCtx into the doOneCompaction loop Compact() built runCtx = context.WithTimeout(ctx, c.runDuration) but the loop body called doOneCompaction(ctx, ...). The deadline only gated the pre-flight select; individual compactions ran under the unbounded parent ctx and could blow past c.runDuration. This is the sibling of the bug fixed in #893 (WriteEnvelope walking under an unbounded context): a deadline-bearing context exists but the inner work uses a different one, so the deadline dangles instead of bounding the work. Pass runCtx into doOneCompaction so c.runDuration actually limits the loop, and when runCtx fires due to its deadline (rather than parent cancellation), surface the same clean "compaction run duration has expired" message the pre-flight check already uses — instead of letting a bare context.DeadlineExceeded bubble out of inner SQLite operations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(synccompactor): cover runDuration-expired and parent-cancel disambiguation TestCompactorRunDurationExpired locks in the user-visible contract that when c.runDuration expires (with parent ctx still alive), Compact returns the clean "compaction run duration has expired" error wrapping context.DeadlineExceeded. TestCompactorParentCtxCancelDoesNotMaskAsRunDuration locks in the disambiguation between runDuration expiry and parent ctx cancellation — a parent-cancelled runCtx must surface as the raw cause (context.Canceled) and must NOT be misreported as runDuration expiry. Without the disambiguation either branch could silently regress and mislabel cancellation as a deadline. Both tests reuse the existing makeEmptySync helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Stacked on #891. Merge that one first.
WriteEnvelopepreviously held no context: a caller cancellation during the payload tar/zstd walk had to wait forfilepath.Walkto drain the entire Pebble checkpoint before the function returned. Big-tenant checkpoints are GBs; that wait is the gap during which the io.Pipe upload pattern can end up with clean-EOF semantics instead ofCloseWithError.WriteEnvelopenow takes ctx as its first parameter. The walk callbacks checkctx.Err()per entry, and each regular-file copy is routed through a newiox.CtxCopy: chunkedio.CopyBuffer(4 MiB poll granularity, 32 KiB working buffer held across iterations — one allocation per call, not one per chunk).registeredStore.save(the Pebble equivalent ofC1File.finalize) now runs the Checkpoint + WriteEnvelope + rename sequence on a context detached from the caller viadotc1z.FinalizeTimeout(), inside auotel.StartWithLinknew-root span — same pattern as PR #891'sC1File.finalize. This protects the Pebble write path even when the caller is a direct C1ZStore consumer that didn't already detach.Changes
pkg/dotc1z/iox/copy.go(new, new package) —CtxCopyhelper. Lives in a leaf subpackage ofdotc1zso sibling subpackages (format/v3,engine/pebble, futuremanager/s3/synccompactorcallers) can import it without circular risk and without pulling in the rest of dotc1z.pkg/dotc1z/format/v3/envelope.go—WriteEnvelope(ctx, w, m, payloadDir)signature. Walk callbacks checkctx.Err()per entry and wrap with the in-flight tar entry name (c1z v3: tar_zstd walk canceled at "entry-X": ...) so operators triaging "context canceled" in logs can see exactly which Pebble checkpoint file was being visited. Godoc tightened to say magic+length+manifest bytes are written before the first ctx check, so io.Pipe callers must propagate ctx.Err() via CloseWithError on cancellation.pkg/dotc1z/engine/pebble/register.go—registeredStore.saveadopts the detach + new-root pattern. Honors the propagated parent ctx-err label from PR Detach ctx for c1z finalize so caller cancel cannot truncate the artifact #891 so the inner finalize span reports the real caller cancel state.pkg/dotc1z/engine/pebble/adapter_clone_sync.go— passes ctx through toWriteEnvelope.pkg/dotc1z/iox/copy_test.go(new) — 7 unit tests: happy, empty source, cancel-before-start, cancel-mid-stream (with read counter to prove the source is not over-consumed), source-error, dst-error, short-write.pkg/dotc1z/format/v3/envelope_test.go— addsTestWriteEnvelopeCanceledMidWalk(tee-writer fires cancel after the first header bytes hit the writer; asserts wrapped error names the in-flight entry) andTestWriteEnvelopeEmptyPayloadDir.pkg/dotc1z/engine/pebble/registered_store_save_cancel_test.go(new) — pebble symmetry with PR Detach ctx for c1z finalize so caller cancel cannot truncate the artifact #891'sTestC1FileCloseSurvivesCanceledCtx. A regression that replacesfinalizeCtxwith the caller ctx insave()would fail this test.Deferred to follow-up
writeZstdTarstill allocates a freshzstd.Encoderper save instead of usingpkg/dotc1z/pool.go's existing encoder pool. Using the pool fromformat/v3requires either moving the pool to a shared subpackage or threading an encoder via a new option — out of scope for this PR.Test plan
go test ./...cleanmake lintclean on the touched files🤖 Generated with Claude Code