fix(box): boxes.json cross-process lock, atomic image index, split signing.rs #8
Merged
added 4 commits
May 31, 2026 10:27
Split oci/signing.rs (1458 lines) into signing/{mod,crypto,sign}.rs per the CLAUDE.md >1000-line split rule. Behavior-preserving:
- signing/crypto.rs: PEM/SPKI/ECDSA/base64 primitives + Fulcio X.509 helpers (pub(super))
- signing/sign.rs: SignResult + sign_image + private key parsing
- signing/mod.rs: policy/result types, cosign payloads, verify_image_signature orchestration, and the test module (kept here). It re-imports the submodules (use crypto::*; pub use sign::{sign_image, SignResult};) so all call sites and the public API (oci::signing::{sign_image, SignResult, SignaturePolicy, VerifyResult}) are unchanged.
Verified: cargo clippy -D warnings clean; 34 signing unit tests pass.
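A minimal sketch of the re-export layout described above, with the three files collapsed into inline modules so it compiles as a single file. The helper `b64_len`, the `sig_b64_len` field, and all function bodies are placeholders, not the real signing code; only the `use`/`pub use` shape mirrors the commit (written with a `self::` prefix so the paths also resolve on the 2015 edition).

```rust
// Inline stand-ins for signing/{mod,crypto,sign}.rs; bodies are placeholders.
mod signing {
    // crypto.rs: primitives visible only inside `signing` (pub(super)).
    mod crypto {
        /// Padded base64 length for `raw` input bytes.
        pub(super) fn b64_len(raw: usize) -> usize {
            (raw + 2) / 3 * 4
        }
    }

    // sign.rs: reaches the shared primitives through its parent module.
    mod sign {
        use super::crypto::*;

        pub struct SignResult {
            pub sig_b64_len: usize,
        }

        pub fn sign_image(payload: &[u8]) -> SignResult {
            SignResult { sig_b64_len: b64_len(payload.len()) }
        }
    }

    // mod.rs: glob-import the helpers, re-export the public signing API.
    use self::crypto::*;
    pub use self::sign::{sign_image, SignResult};

    pub struct VerifyResult {
        pub ok: bool,
    }

    pub fn verify_image_signature(sig_b64_len: usize, payload: &[u8]) -> VerifyResult {
        VerifyResult { ok: sig_b64_len == b64_len(payload.len()) }
    }
}

// Call sites import from `signing` exactly as before the split.
use self::signing::{sign_image, verify_image_signature, SignResult};

fn main() {
    let result: SignResult = sign_image(b"hello");
    let verdict = verify_image_signature(result.sig_b64_len, b"hello");
    println!("sig len {}, ok = {}", result.sig_b64_len, verdict.ok);
}
```

Because the parent module re-exports `sign_image` and `SignResult`, code outside `signing` never names `sign` or `crypto`, which is what keeps the split invisible to existing call sites.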
boxes.json had no inter-process lock: every writer did load -> mutate -> save(), and save() rewrites the whole record vector, so the monitor daemon, compose, per-box health checkers, and concurrent CLI commands clobbered each other (lost updates, resurrected records, dropped records).

Add a flock(LOCK_EX)-based StateLock (state/lock.rs) and a transactional StateFile::modify() (load -> mutate -> save under the lock; never .await inside), plus atomic add_record/remove_record helpers. save() now takes the lock too. Migrate the long-window async daemon writers (monitor poll_once + run_due_health_checks, per-box health loop) to reload-before-save modify(), and the common add/remove paths (create, rm) to the atomic helpers.

Verified on Linux: 12 parallel `create`s now persist all 12 records (was lossy); state/network unit tests and core_smoke 14/14 still pass.
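A minimal single-file sketch of the locking pattern, assuming a Unix target and a sidecar lock file next to the state file. The lock-file path, the one-id-per-line format, and `StateFile::new` are illustrative assumptions, not the PR's state/lock.rs. `flock(2)` is declared directly so no crate is needed. Taking a plain (non-async) closure in `modify()` is one way to honor "never .await inside": `.await` is not allowed in such a closure, so the lock cannot be held across a suspension point by accident.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;
use std::path::{Path, PathBuf};

// flock(2) from the C library; LOCK_EX and LOCK_UN have these values on Linux and macOS.
unsafe extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}
const LOCK_EX: c_int = 2;
const LOCK_UN: c_int = 8;

/// Hypothetical StateLock: an exclusive flock on a sidecar file, released on drop.
struct StateLock {
    file: File,
}

impl StateLock {
    fn acquire(lock_path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).write(true).open(lock_path)?;
        // Blocks until no other open file description (in any process) holds the lock.
        if unsafe { flock(file.as_raw_fd(), LOCK_EX) } != 0 {
            return Err(io::Error::last_os_error());
        }
        Ok(StateLock { file })
    }
}

impl Drop for StateLock {
    fn drop(&mut self) {
        // Closing the fd would release it too; unlocking first makes the intent explicit.
        unsafe { flock(self.file.as_raw_fd(), LOCK_UN) };
    }
}

/// Hypothetical StateFile holding one record id per line (boxes.json holds JSON records).
struct StateFile {
    path: PathBuf,
    lock_path: PathBuf,
}

impl StateFile {
    fn new(path: PathBuf) -> Self {
        let lock_path = path.with_extension("lock");
        StateFile { path, lock_path }
    }

    fn load(&self) -> io::Result<Vec<String>> {
        match fs::read_to_string(&self.path) {
            Ok(text) => Ok(text.lines().map(str::to_owned).collect()),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Vec::new()),
            Err(e) => Err(e),
        }
    }

    /// load -> mutate -> save while holding the lock for the whole transaction.
    fn modify<T>(&self, f: impl FnOnce(&mut Vec<String>) -> T) -> io::Result<T> {
        let _lock = StateLock::acquire(&self.lock_path)?;
        let mut records = self.load()?;
        let out = f(&mut records);
        fs::write(&self.path, records.join("\n"))?;
        Ok(out)
    }

    fn add_record(&self, id: &str) -> io::Result<()> {
        self.modify(|r| r.push(id.to_owned()))
    }

    /// Returns whether a record was actually removed.
    fn remove_record(&self, id: &str) -> io::Result<bool> {
        self.modify(|r| {
            let before = r.len();
            r.retain(|x| x != id);
            r.len() != before
        })
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("boxes-sketch-{}.txt", std::process::id()));
    let _ = fs::remove_file(&path);

    // 12 concurrent writers, each with its own StateFile and lock-file descriptor.
    let handles: Vec<_> = (0..12)
        .map(|i| {
            let path = path.clone();
            std::thread::spawn(move || StateFile::new(path).add_record(&format!("box-{}", i)))
        })
        .collect();
    for h in handles {
        h.join().expect("writer thread panicked")?;
    }

    let state = StateFile::new(path.clone());
    assert_eq!(state.load()?.len(), 12); // no lost updates
    assert!(state.remove_record("box-3")?);
    assert_eq!(state.load()?.len(), 11);

    fs::remove_file(&path)?;
    fs::remove_file(path.with_extension("lock"))?;
    Ok(())
}
```

flock locks belong to the open file description, and each thread opens the lock file separately, so the 12 threads contend the same way 12 separate `create` processes would.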
save_index_inner used a non-atomic tokio::fs::write, so a concurrent reader (another process running create/run) could observe a truncated/empty file, surfacing as "Failed to parse image store index: EOF". Write to a temp file and rename into place so readers always see the old or new index, never a partial one. Verified: 12 parallel creates no longer hit the EOF race.
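A synchronous sketch of the temp-file-plus-rename pattern. The PR's save_index_inner is async and uses tokio; `write_atomic`, the temp-file naming, and the JSON payloads here are assumptions for illustration. The temp file is created in the same directory as the target because rename(2) is only atomic within one filesystem.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;
use std::sync::atomic::{AtomicUsize, Ordering};

// Distinguishes temp files from concurrent writers within one process.
static SEQ: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical helper: readers of `path` see either the old or the new bytes, never a prefix.
fn write_atomic(path: &Path, data: &[u8]) -> io::Result<()> {
    let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("index");
    let seq = SEQ.fetch_add(1, Ordering::Relaxed);
    // Same directory as the target, so the rename below stays on one filesystem.
    let tmp = path.with_file_name(format!(".{}.{}.{}.tmp", name, std::process::id(), seq));
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(data)?;
        f.sync_all()?;
    }
    // On POSIX, rename(2) atomically replaces any existing file at `path`.
    fs::rename(&tmp, path)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("index-sketch-{}.json", std::process::id()));
    let old = format!("{{\"images\":[{}]}}", ["\"a\""; 2000].join(","));
    let new = format!("{{\"images\":[{}]}}", ["\"b\""; 4000].join(","));
    write_atomic(&path, old.as_bytes())?;

    let reader = {
        let (path, old, new) = (path.clone(), old.clone(), new.clone());
        std::thread::spawn(move || {
            for _ in 0..200 {
                let seen = fs::read_to_string(&path).expect("index must always exist");
                // Always one complete version: never empty, never cut off mid-write.
                assert!(seen == old || seen == new);
            }
        })
    };
    for i in 0..50 {
        let body = if i % 2 == 0 { &new } else { &old };
        write_atomic(&path, body.as_bytes())?;
    }
    reader.join().expect("reader saw a partial index");
    fs::remove_file(&path)?;
    Ok(())
}
```

A reader that already opened the old file keeps reading the old inode to the end, and any later open sees the new one, so a partially written index is never observable. `sync_all` is not needed for that visibility guarantee; it only narrows the crash window, and a fully crash-durable version would also fsync the parent directory.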
libc::openpty takes a *const winsize; pass &winsize (not &mut) so `cargo clippy --all-targets -- -D warnings` (the CI gate) stays green instead of failing on clippy::unnecessary_mut_passed.
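A small illustration of why clippy flags the `&mut` form: a shared reference coerces to `*const T`, so `&ws` is all a read-only pointer parameter needs. `Winsize` and `cells` are local stand-ins with the same `*const winsize` parameter shape as openpty, not libc items.

```rust
// Same field layout as C's `struct winsize`; a local stand-in, not libc::winsize.
#[repr(C)]
#[allow(dead_code)]
struct Winsize {
    ws_row: u16,
    ws_col: u16,
    ws_xpixel: u16,
    ws_ypixel: u16,
}

/// Hypothetical read-only consumer with openpty's `*const winsize` parameter shape.
fn cells(winp: *const Winsize) -> u32 {
    // Safety: every caller passes a pointer made from a live reference.
    let ws = unsafe { &*winp };
    u32::from(ws.ws_row) * u32::from(ws.ws_col)
}

fn main() {
    let ws = Winsize { ws_row: 24, ws_col: 80, ws_xpixel: 0, ws_ypixel: 0 };
    // `&ws` coerces to `*const Winsize`. Writing `&mut ws` (with `let mut ws`) also
    // compiles, but clippy::unnecessary_mut_passed flags it since nothing is written.
    println!("{} cells", cells(&ws));
}
```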
ZhiXiao-Lin pushed a commit that referenced this pull request May 31, 2026
The #8 state refactor switched rm_one to the atomic, lock-safe static StateFile::remove_record (which operates on disk), but left test_rm_force_removes_paused_stale_record asserting on the in-memory handle, which rm_one no longer mutated. This was a latent failure that went uncaught because release-branch PRs don't run ci.yml (it runs on main only). Add a no-save StateFile::forget(id) and call it after remove_record so the in-memory handle stays consistent without a second, clobbering save. Full workspace lib tests green (cli 535/0).
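A minimal sketch of why `forget` must not save, assuming a one-id-per-line file and a handle that caches the records it loaded. Locking is left out to keep the interleaving explicit; the method names follow the commit, but the bodies are assumptions. Between opening the handle and removing the box, another writer adds "c": `forget` keeps the handle in sync without touching disk, while saving the stale snapshot would erase "c".

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

fn read_ids(path: &Path) -> io::Result<Vec<String>> {
    Ok(fs::read_to_string(path)?.lines().map(str::to_owned).collect())
}

/// Hypothetical handle caching the records it loaded (one id per line).
struct StateFile {
    path: PathBuf,
    records: Vec<String>,
}

impl StateFile {
    fn open(path: PathBuf) -> io::Result<Self> {
        let records = read_ids(&path)?;
        Ok(StateFile { path, records })
    }

    /// Static, disk-only removal on a fresh load (the PR does this under the flock).
    fn remove_record(path: &Path, id: &str) -> io::Result<()> {
        let mut ids = read_ids(path)?;
        ids.retain(|x| x != id);
        fs::write(path, ids.join("\n"))
    }

    /// In-memory only: keeps the handle consistent without writing anything.
    fn forget(&mut self, id: &str) {
        self.records.retain(|x| x != id);
    }

    /// Writes the cached snapshot back wholesale, however stale it is.
    fn save(&self) -> io::Result<()> {
        fs::write(&self.path, self.records.join("\n"))
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("forget-sketch-{}.txt", std::process::id()));
    fs::write(&path, "a\nb")?;
    let mut handle = StateFile::open(path.clone())?; // cached: [a, b]

    fs::write(&path, "a\nb\nc")?; // another writer adds "c" meanwhile
    StateFile::remove_record(&path, "a")?; // disk: [b, c]
    handle.forget("a"); // cached: [b], disk untouched
    assert_eq!(handle.records, vec!["b"]);
    assert_eq!(read_ids(&path)?, vec!["b", "c"]);

    // The second save that forget() avoids: the stale snapshot erases "c".
    handle.save()?;
    assert_eq!(read_ids(&path)?, vec!["b"]);

    fs::remove_file(&path)?;
    Ok(())
}
```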
Summary
Two remaining hardening items, on top of #7 (stacked on `fix/box-net-hardening`). All verified on a real Linux x86_64 + KVM host; macOS arm64 + `clippy --all-targets -D warnings` clean.
Changes
- `boxes.json` writes (cross-process lock). `boxes.json` had no inter-process lock; `save()` rewrites the whole record vector, so the monitor daemon, compose, per-box health checkers, and concurrent CLI commands clobbered each other. Added a `flock(LOCK_EX)` `StateLock`, a transactional `StateFile::modify()` (load → mutate → save under the lock, never `.await` inside), and atomic `add_record`/`remove_record`. Migrated the long-window async daemon writers (monitor `poll_once` + `run_due_health_checks`, per-box health loop) to reload-before-save `modify()`, and the common add/remove paths (`create`, `rm`) to the atomic helpers; `save()` now also locks. Verified: 12 parallel `create`s now persist all 12 records (was lossy).
- `index.json`. `save_index_inner` used a non-atomic write, so a concurrent reader saw a truncated file ("Failed to parse image store index: EOF"). Now it writes a temp file and renames it into place. Verified: the parallel-create EOF race is gone.
- Split `oci/signing.rs` (1458 lines) into `signing/{mod,crypto,sign}.rs` per the >1000-line rule. Behavior-preserving: crypto/Fulcio primitives → `crypto.rs`, signing → `sign.rs`, orchestration + tests stay in `mod.rs`, with glob re-imports so all call sites and the public API are unchanged. Verified: 34 signing unit tests pass.
- Fix `clippy::unnecessary_mut_passed` in the `core_smoke` pty test so `--all-targets -D warnings` (the CI gate) stays green.
Verification (real KVM host)
- `core_smoke` regression: 14/14
- 12 parallel `create`s → 12 records (concurrency)
- `cargo clippy --all-targets -D warnings` clean; macOS arm64 `cargo check` clean
Scope note
The remaining synchronous CLI mutators (pause/unpause/rename/snapshot/network/container-update/start/restart/stop/compose) still use the now-locked `save()`, so their writes are serialized; migrating each to `modify()`/`add_record`/`remove_record` for full per-command atomicity is a mechanical follow-up. The daemon (the long-await primary offender) and the common create/rm paths are done.
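Why a locked `save()` alone is not per-command atomicity: the lock covers only the write, so two commands that both load before either saves still overwrite each other. A deterministic sketch with sequential steps standing in for two processes; the file format, `rename_box`, and this free-function `modify` are illustrative assumptions (the real `modify()` also holds the flock).

```rust
use std::fs;
use std::io;
use std::path::Path;

// One box id per line. The flock is left out: the steps run sequentially, and what
// matters here is when each command reads the file, not mutual exclusion.
fn load(path: &Path) -> io::Result<Vec<String>> {
    Ok(fs::read_to_string(path)?.lines().map(str::to_owned).collect())
}

fn save(path: &Path, ids: &[String]) -> io::Result<()> {
    fs::write(path, ids.join("\n"))
}

/// Hypothetical free-function modify(): the mutation always runs on a fresh load.
fn modify(path: &Path, f: impl FnOnce(&mut Vec<String>)) -> io::Result<()> {
    let mut ids = load(path)?;
    f(&mut ids);
    save(path, &ids)
}

/// Hypothetical `rename` mutator.
fn rename_box(ids: &mut [String], from: &str, to: &str) {
    for id in ids.iter_mut() {
        if *id == from {
            *id = to.to_owned();
        }
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("modify-sketch-{}.txt", std::process::id()));

    // Old pattern: `create` and `rename` both load, then each does a (locked) save().
    save(&path, &["box-1".to_owned()])?;
    let mut create_view = load(&path)?; // [box-1]
    let mut rename_view = load(&path)?; // [box-1]
    create_view.push("box-2".to_owned());
    save(&path, &create_view)?; // disk: [box-1, box-2]
    rename_box(&mut rename_view, "box-1", "web");
    save(&path, &rename_view)?; // disk: [web]; box-2 is lost
    assert_eq!(load(&path)?, vec!["web"]);

    // modify(): each command mutates what is on disk at that moment.
    save(&path, &["box-1".to_owned()])?;
    modify(&path, |ids| ids.push("box-2".to_owned()))?;
    modify(&path, |ids| rename_box(ids, "box-1", "web"))?;
    assert_eq!(load(&path)?, vec!["web", "box-2"]);

    fs::remove_file(&path)?;
    Ok(())
}
```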