Conversation
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
Adds an experimental snapshotDirectory API for containers backed by Docker volumes, including snapshot create, restore (with custom mount points), and stale-volume cleanup.
Findings (highest severity first):
- [MEDIUM] Dangling
kj::StringPtrin deferreddeleteDockerVolumecleanup —container-client.c++:1421 - [MEDIUM]
latestTypeScript types expose experimental-only snapshot fields and types —types/generated-snapshot/latest/index.d.ts:3785 - [LOW] Global mutable bool
staleSnapshotVolumeCheckScheduledwithout thread-safety annotation —container-client.c++:48
These comments were generated by an AI code review assistant and may contain inaccuracies.
|
Review posted successfully on PR #6376. Here's a summary of the findings: 3 issues found, ranked by severity:
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6376 +/- ##
==========================================
- Coverage 70.69% 70.52% -0.17%
==========================================
Files 424 424
Lines 115450 115732 +282
Branches 18770 18824 +54
==========================================
+ Hits 81619 81624 +5
- Misses 22599 22882 +283
+ Partials 11232 11226 -6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Merging this PR will not alter performance
Comparing Footnotes
|
|
CC @anonrig for review |
|
/bonk review |
There was a problem hiding this comment.
3 findings ranked by severity:
- [MEDIUM]
normalizePathperforms O(n) allocations for n trailing slashes —container-client.c++:51-56 - [LOW] Snapshot ID validation allows all-hyphens, which produces a Docker volume named
workerd-snap----—container-client.c++:1258-1263 - [LOW]
normalizePathordering withvalidateAbsolutePathinsnapshotDirectory—container-client.c++:1388-1389
Previous review findings from PR #6337 and the prior review on this PR have been addressed:
- The dangling
kj::StringPtrinKJ_DEFER(finding #1 from ask-bonk) is already fixed — line 1421 passeskj::str(volumeName)(owned copy). - The
staleSnapshotVolumeCheckScheduledglobal is alreadystd::atomic_bool(line 48). - The
latesttypes issue is a pre-existing pipeline problem (author confirmed), not a regression.
Overall: the design is sound. The capnp schema additions, compat-flag gating, path validation, snapshot ID validation during restore, and test coverage (round-trip, named, multi-directory, custom mount point, root restore, cross-DO, non-existent directory) are thorough. The KJ_DEFER + volumeCommitted cleanup pattern correctly handles partial failures.
These comments were generated by an AI code review assistant and may contain inaccuracies.
| JSG_REQUIRE( | ||
| snapshotId.size() > 0 && snapshotId.size() <= 64, Error, "Invalid snapshot ID length"); | ||
| for (auto c: snapshotId) { | ||
| JSG_REQUIRE((c >= 'a' && c <= 'f') || (c >= '0' && c <= '9') || c == '-', Error, | ||
| "Invalid snapshot ID: must contain only hex digits and hyphens"); | ||
| } |
There was a problem hiding this comment.
[LOW] Snapshot ID validation allows degenerate IDs.
The character-class check permits IDs consisting entirely of hyphens (e.g., "----") or a single hyphen, which would create Docker volume names like workerd-snap----. This is unlikely to cause breakage since Docker accepts these names, but a leading/trailing hyphen check or requiring at least one hex digit would be a cheap defensive measure:
JSG_REQUIRE(snapshotId.size() > 0 && snapshotId.size() <= 64, Error, "Invalid snapshot ID length");
bool hasHex = false;
for (auto c: snapshotId) {
if ((c >= 'a' && c <= 'f') || (c >= '0' && c <= '9')) {
hasHex = true;
} else {
JSG_REQUIRE(c == '-', Error,
"Invalid snapshot ID: must contain only hex digits and hyphens");
}
}
JSG_REQUIRE(hasHex, Error, "Invalid snapshot ID: must contain at least one hex digit");This is minor — the IDs are generated by randomUUID() during creation and this validation only runs during restore from deserialized data.
| validateAbsolutePath(dir); | ||
| dir = normalizePath(kj::mv(dir)); |
There was a problem hiding this comment.
[LOW] validateAbsolutePath runs after normalizePath, which is fine, but the validation and normalization are also done in the restore path (lines 1266-1268) with normalizePath before validateAbsolutePath. The ordering is consistent and correct — normalizePath strips trailing slashes which doesn't change the absolute-path or path-traversal properties, so validation after normalization is safe.
Noting this for clarity: if normalizePath were ever extended to do more transformations (e.g., collapsing // or resolving .), the validation-after-normalization order could become important. The current implementation is fine.
|
Review posted on PR #6376. Here's a summary: Review body (top-level comment, review ID 3983426108):
3 inline comments:
Key verification results:
|
|
The generated output of |
971c76e to
7c84356
Compare
Expose a new experimental `snapshotDirectory` API for containers and include an implementation using Docker volumes (used by Wrangler for local dev). This is not yet implemented in production for Cloudflare Workers, but lays the groundwork for that feature.
Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
7c84356 to
6fad9a1
Compare
|
Rebased on main and resolved conflicts. |
* Use .exchange() to update atomic boolean * Use consistent ordering for normalize -> validate code paths * Add tests for snapshotting a stopped container and restoring a container with a non-existent snapshot ID
|
/bigbonk roast this pr |
Co-authored-by: anonrig <anonrig@users.noreply.github.com>
|
Here's a summary of what was done: Addressed anonrig's review feedback on
|
Expose a new experimental
snapshotDirectoryAPI for containers and include an implementation using Docker volumes (used by Wrangler for local dev).This is not yet implemented in production for Cloudflare Workers, but lays the groundwork for that feature.
This PR is (as of now) identical in content to #6337 to make reviewing easier for those who reviewed the last PR. I'll address any review feedback in separate commits to keep things separated.
Implementation notes
The local dev implementation of snapshots uses a combination of Docker volumes and the Docker archive API. We use Docker volumes to manage and store the snapshots on disk (this is mostly a UX convenience for the user, so that workerd created snapshots are visible in
docker volume lsand can be easily removed withdocker volume prune), but we do not bind-mount the volumes when restoring snapshots, since Docker volumes do not have the same properties as snapshots (namely, snapshots should be immutable).Instead when creating or restoring a snapshot we create a temporary container (using the same image as the "main" container) with the volume attached and then copy between the containers using the archive API.
NOTE: This is known to perform pretty badly on macOS since each of the archive API calls sends the entire snapshot tar contents over the Docker UNIX socket, which traverses the VM<->host boundary on macOS, so has a lot of overhead. There may be some ways to improve this but for now the plan is to defer those performance improvements to a follow up PR so that we can at least land the functional implementation sooner.