v2.3.0
What's Changed
- ci: trigger on tags + self-hosted KVM real-microVM integration gate by @ZhiXiao-Lin in #56
- fix(state): quarantine corrupt boxes.json instead of silently wiping the fleet by @ZhiXiao-Lin in #57
- fix(tee): pin SNP chain to genuine AMD ARK roots + RSA-PSS (close attestation fail-open) by @ZhiXiao-Lin in #58
- feat(monitor): install the monitor as a supervised per-user service by @ZhiXiao-Lin in #59
- feat(monitor): Prometheus metrics + /healthz endpoint (--metrics-addr) by @ZhiXiao-Lin in #60
- fix(release): unblock + de-silence crates.io publish; honest Helm image note by @ZhiXiao-Lin in #61
- test(bench): reproducible perf + leak harness for the documented claims by @ZhiXiao-Lin in #62
- fix(restart): on-failure restarts only on non-zero exit (stop clean-exit loop) by @ZhiXiao-Lin in #63
- fix(cp): pipefail so a tar failure in dir copy isn't silently masked by @ZhiXiao-Lin in #64
- fix(pool): destroy source VM when snapshot-fork template build fails (leak) by @ZhiXiao-Lin in #65
- fix(seccomp): wrong-arch jumps to KILL_PROCESS not EPERM (BPF off-by-one) by @ZhiXiao-Lin in #66
- fix(state,monitor): two regressions introduced this session (#60 load_readonly, #59 plist) by @ZhiXiao-Lin in #67
- fix(run): route stop/cleanup writes through StateFile::modify (lost-update race) by @ZhiXiao-Lin in #68
- fix(update): persist via StateFile::modify across the live-apply awaits (lost-update) by @ZhiXiao-Lin in #69
- fix(network): cross-process lock for IP allocation (duplicate-IP / lost-endpoint race) by @ZhiXiao-Lin in #70
- fix(oci): cross-process lock + reload for the image index (lost-pull race) by @ZhiXiao-Lin in #71
- fix(cri): stop workload instead of orphaning it when StartContainer loses the Created CAS by @ZhiXiao-Lin in #72
- ci(kvm): use the runner's existing toolchain (dtolnay re-download blew the timeout) by @ZhiXiao-Lin in #73
- fix(tee): empty SNP cert chain fails closed (close latent fail-open) by @ZhiXiao-Lin in #74
- chore(shim): bump libkrun-sys pin 2.1.0 → 2.2.0 (version drift) by @ZhiXiao-Lin in #75
- feat(monitor): /healthz reflects poll-loop liveness (not a lying static 200) by @ZhiXiao-Lin in #77
- test(cri): regression for StartContainer lost-CAS orphan fix (#72) by @ZhiXiao-Lin in #78
- ci(kvm): gate leak-freeness under churn (bench/bench.sh leak) on real VMs by @ZhiXiao-Lin in #76
- fix(volume): cross-process lock on volumes.json (lost-update data loss) by @ZhiXiao-Lin in #79
- fix(oci): cross-process lock on credentials.json (lost-update on concurrent login) by @ZhiXiao-Lin in #81
- feat(bench+ci): cross-process race gate — lost-update detection on boxes.json by @ZhiXiao-Lin in #80
- fix(network): route run/compose/cleanup connect through with_write_lock (dup-IP race) by @ZhiXiao-Lin in #82
- fix(stop): a failed VM stop must not leak the box's resources by @ZhiXiao-Lin in #83
- fix(log): bound the raw console.log/console.err.log (unbounded disk growth) by @ZhiXiao-Lin in #84
- fix(cache): atomic, idempotent layer cache put (concurrent-pull corruption) by @ZhiXiao-Lin in #85
- fix(log): bound console.log for the `none` driver too (disk-fill follow-up to #84) by @ZhiXiao-Lin in #87
- fix(monitor): tear down the orphaned VM when a box is rm'd mid-restart by @ZhiXiao-Lin in #89
- fix(logs): follow-by-name so `logs -f` survives log rotation by @ZhiXiao-Lin in #90
- fix(cache): atomic, idempotent rootfs cache put (concurrent-build corruption) by @ZhiXiao-Lin in #88
- fix(cri): cancel the guest exec command when the client disconnects by @ZhiXiao-Lin in #91
- fix(cri): decode mountinfo octal escapes before unmount (CRITICAL host data-loss) by @ZhiXiao-Lin in #93
- fix(security): 3 fail-closed / secret-redaction fixes (audit #12/#18/#19) by @ZhiXiao-Lin in #94
- fix(build): contain COPY/ADD source + destination paths (traversal escapes, audit #3/#4/#27) by @ZhiXiao-Lin in #95
- fix: 4 correctness/atomicity fixes (audit #25/#17/#22/#30) by @ZhiXiao-Lin in #96
- fix: host cgroup leak + CRI corrupt-state data loss (audit #13/#28) by @ZhiXiao-Lin in #97
- fix: 3 security/correctness fixes (audit #34/#23/#26) by @ZhiXiao-Lin in #98
- fix: CRI CreateContainer TOCTOU orphan + kill -9 paused-box hang (audit #20/#24) by @ZhiXiao-Lin in #99
- fix(pool): destroy freshly-booted VMs if shutdown lands mid-replenish (audit #9) by @ZhiXiao-Lin in #100
- fix(state): reconcile-on-load must persist + tear down under the state lock (CRITICAL) by @ZhiXiao-Lin in #92
- fix(exec): host-side read timeouts so a wedged guest can't hang the host (audit #5/#6) by @ZhiXiao-Lin in #101
- fix(pool): cross-process lock the snapshot-fork template build (audit #31) by @ZhiXiao-Lin in #102
- fix(pool): bounded retry for snapshot-fork template builds (audit #29) by @ZhiXiao-Lin in #103
- fix(build): key COPY --from on the source stage's file content (audit #7, stale binary) by @ZhiXiao-Lin in #104
- fix(cri): CAS Created->Running before spawning workload (StartContainer double-spawn race) by @ZhiXiao-Lin in #105
- fix(tee): enforce RA-TLS proof-of-possession + bind sealed version into AEAD (audit #14, #15) by @ZhiXiao-Lin in #106
- fix(resize): fail instead of writing the root cgroup when the per-container slice is ambiguous (audit #32) by @ZhiXiao-Lin in #109
- fix(cli): guard CLI numeric parsing/conversion against integer overflow by @ZhiXiao-Lin in #110
- fix(guest): deferred-main (warm/IDLE boot) joins its per-container cgroup (audit #33) by @ZhiXiao-Lin in #108
- fix(snapshot): tolerate missing metadata fields + log instead of silent skip by @ZhiXiao-Lin in #111
- fix(run): plumb CPU cgroup limits to guest-init (--cpu-quota/period/shares silently dropped) by @ZhiXiao-Lin in #107
- fix(cli): clean up the box dir when create/restore fails before registration by @ZhiXiao-Lin in #112
- fix(shim): remove the dead host-side cgroup path (audit #35) by @ZhiXiao-Lin in #113
- fix(cgroup): enforce --memory-reservation/--memory-swap in-guest (audit #35 Part B) by @ZhiXiao-Lin in #114
- fix(guest): apply seccomp/caps/no_new_privs confinement on the PTY path (audit #11) by @ZhiXiao-Lin in #115
- fix(cri): tear down the sandbox VM if RunPodSandbox is cancelled before storing it (audit #21) by @ZhiXiao-Lin in #116
- fix(guest): give TTY containers the exec path's cgroup + path-restriction setup (#11 follow-up) by @ZhiXiao-Lin in #117
- chore(release): v2.3.0 — audit closure (security + hardening) by @ZhiXiao-Lin in #118
Full Changelog: v2.2.0...v2.3.0