v0.14.0 — Defense-in-Depth & Cold-Start Performance

@clemlesne clemlesne released this 23 Feb 11:14
· 259 commits to main since this release

Security

  • 17 kernel sysctls (Firecracker/Lambda alignment) — eBPF restricted, user namespaces blocked, kptr_restrict=2, dmesg_restrict=1, perf_event_paranoid=3, userfaultfd disabled, ptrace_scope=2, filesystem link protections, modules_disabled=1 (irreversible). Re-enabled mitigations=auto + KASLR (previously mitigations=off nokaslr).
  • Read-only rootfs — Ext4 mounted ro,nosuid,nodev by default. /bin, /sbin RO bind-mounted with nosuid (Alpine 3.23 usrmerge). /home/user tmpfs for writable scratch space.
  • REPL child hardening (5-stage) — no_new_privs, drop to UID 1000 (setgid, then setuid), PR_SET_DUMPABLE=0, RLIMIT_NPROC=1024 via raw prlimit64 from parent. Combined with kernel.threads-max=1200 for fork bomb defense-in-depth.
  • /proc hardening — hidepid=2 hides PID 1 from UID 1000. /proc/sys and /proc/sysrq-trigger RO bind-remounted.
  • /dev RO bind-remount — Blocks mknod. /dev/vda and /dev/block/ removed after mount.
  • /etc/hosts, /etc/resolv.conf RO bind-remounted — Prevents DNS spoofing/hijacking.
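
The sysctl values listed above lend themselves to a quick in-guest audit. The following is a minimal sketch, not the project's own tooling: the helper names (`sysctl_path`, `read_sysctl`, `audit_sysctls`) are hypothetical, and the fully qualified keys (for example `kernel.yama.ptrace_scope`) are assumed from the standard Linux sysctl layout, since the notes give only the short names. Only a subset of the 17 sysctls is covered.

```python
from pathlib import Path
from typing import Callable, Dict

# Values quoted in the release notes. The dotted key names are assumed
# from the standard Linux sysctl tree; the notes list only short names.
EXPECTED_SYSCTLS: Dict[str, str] = {
    "kernel.kptr_restrict": "2",
    "kernel.dmesg_restrict": "1",
    "kernel.perf_event_paranoid": "3",
    "kernel.yama.ptrace_scope": "2",
    "kernel.modules_disabled": "1",
    "kernel.threads-max": "1200",
}


def sysctl_path(key: str) -> Path:
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path("/proc/sys") / key.replace(".", "/")


def read_sysctl(key: str) -> str:
    """Read the current value of one sysctl from procfs."""
    return sysctl_path(key).read_text().strip()


def audit_sysctls(read: Callable[[str], str] = read_sysctl) -> Dict[str, str]:
    """Return {key: actual} for every sysctl that differs from its expected value."""
    mismatches: Dict[str, str] = {}
    for key, expected in EXPECTED_SYSCTLS.items():
        try:
            actual = read(key)
        except OSError as exc:
            # A missing or unreadable entry is reported rather than ignored.
            actual = f"unreadable ({exc.__class__.__name__})"
        if actual != expected:
            mismatches[key] = actual
    return mismatches
```

Calling `audit_sysctls()` inside the guest returns an empty dict when every listed value matches. The `read` parameter exists so the check can be exercised against a plain dict outside a VM.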

Performance

  • Python cold-start ~35-43% faster — jemalloc LD_PRELOAD eliminates musl malloc lock contention, .pyc precompilation with unchecked-hash mode, post-sentinel drain 50ms → 5ms.
  • JavaScript swap-thrashing eliminated — Bun --smol shrinks JSC heap 343MB → 54MB. REPL warm-up moved before balloon inflate (start at 256MB, idle at 42MB). 2.2x speedup at 160MB.
  • PVH direct boot (x86_64) — Uncompressed vmlinux skips gzip decompression. ~50ms boot savings.
  • io_uring re-enabled — Removed io_uring_disabled=2; VM isolation is the security boundary.
  • File-transfer pipelining — Guest spawn_blocking worker + host look-ahead reads. Up to 846 MiB/s download throughput.
  • QSD event-driven job completion — Replaced 10ms polling with event buffer + 50ms fallback. CPU during stress 80% → 11%.
  • Kernel cmdline tuning — loglevel=1, numa_balancing=0, page_alloc.shuffle=0. ~200-300ms boot reduction.
  • Deferred cloudpickle — Lazy-load on first Process.start(). Saves 100-150ms for non-multiprocessing scripts.
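
As an illustration of the .pyc precompilation item above, here is a minimal sketch of producing an unchecked hash-based .pyc with the standard library. It assumes a single-file flow through `py_compile`; the actual image build may use a different tool or walk whole directories. The function names are hypothetical.

```python
import py_compile
import tempfile
from pathlib import Path


def precompile_unchecked(source: Path, target: Path) -> Path:
    """Compile `source` into `target` as an unchecked hash-based .pyc."""
    py_compile.compile(
        str(source),
        cfile=str(target),
        doraise=True,  # raise PyCompileError instead of printing to stderr
        invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
    )
    return target


def pyc_flags(pyc: Path) -> int:
    """Read the 32-bit flags field that follows the 4-byte magic number (PEP 552)."""
    header = pyc.read_bytes()[:8]
    return int.from_bytes(header[4:8], "little")


def demo() -> int:
    """Compile a throwaway module and return its .pyc flags."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hello.py"
        src.write_text("print('hi')\n")
        pyc = precompile_unchecked(src, Path(tmp) / "hello.pyc")
        # Bit 0 set means hash-based; bit 1 (check_source) stays clear,
        # so the importer does not re-validate the source at load time.
        return pyc_flags(pyc)
```

With the check_source bit clear, the importer trusts the cached bytecode without reading or hashing the source file, which suits an image whose files do not change after build.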

Features

  • Alpine 3.21 → 3.23, kernel 6.12 → 6.18 LTS — Updated module paths, adapted for usrmerge, zram defrag_mode=1.
  • Warm pool for all languages — Removed hardcoded Python/JavaScript list; auto-discovers all Language variants including RAW.
  • Warm REPL pre-spawn — New warm_repl protocol command eagerly starts REPL before balloon inflate.
  • Orphan VM protection — exit-with-parent=on (QEMU 10.2+) kills QEMU when parent dies.
  • ESM import() in Bun — Dynamic module loading via __import wrapper in JS REPL.
  • WASM Web API exposure — Missing Web APIs added to JS REPL sandbox.
  • Guestfish fast-path — Guest-agent-only changes patched in ~41s vs full qcow2 rebuild (~3:52).
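
The warm-pool change above replaces a hardcoded language list with discovery over the `Language` variants. A minimal sketch of that idea follows, assuming `Language` is an enum; the member values and the helpers `warm_pool_targets` and `allocate_pools` are hypothetical and not taken from the project.

```python
from enum import Enum
from typing import Dict, List


class Language(Enum):
    # Hypothetical members: the notes name only Python, JavaScript and RAW.
    PYTHON = "python"
    JAVASCRIPT = "javascript"
    RAW = "raw"


def warm_pool_targets() -> List[Language]:
    """Iterate the enum itself so new variants are picked up without edits here."""
    return list(Language)


def allocate_pools(size_per_language: int) -> Dict[Language, int]:
    """Assign the same warm-pool size to every discovered language."""
    return {lang: size_per_language for lang in warm_pool_targets()}
```

Because the pool is derived from the enum, adding a variant is enough to get it warmed; there is no second list to keep in sync.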

Bug Fixes

  • UTF-8 BOM stripping — Strip U+FEFF before code execution (all languages).
  • Null byte rejection — Code input validated before runtime with clear error.
  • CLI tilde expansion — ~ in --upload/--download paths now expanded.
  • Overlay pool cross-FS — aiofiles.os.rename() → shutil.move() with partial-target cleanup.
  • Balloon inflate floor 96MB → 128MB — 96MB caused freezes on kernel 6.18. QMP event filtering, watermark boosting disabled.
  • Guest-agent file ownership — Written files chown'd to sandbox user.
  • Benchmark p95 bug — Fixed nearest-rank percentile calculation.
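
Two of the fixes above are easy to state in code: input sanitization (BOM stripping and null byte rejection) and the nearest-rank percentile. The sketch below shows one plausible form of each. The function names and the error message are hypothetical, and it assumes only a leading U+FEFF is stripped.

```python
import math
from typing import Sequence

BOM = "\ufeff"


def prepare_code(code: str) -> str:
    """Strip a leading BOM, then reject null bytes before the code reaches a runtime."""
    if code.startswith(BOM):
        code = code[len(BOM):]
    if "\x00" in code:
        raise ValueError("code contains a null byte (\\x00)")
    return code


def nearest_rank_percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the value at 1-based rank ceil(pct * n / 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number percentiles stay exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

For 20 sorted samples, p95 is the 19th value; for 10 samples it is the 10th, since ceil(9.5) = 10. A common mistake in this calculation is truncating the rank instead of taking the ceiling, which selects a value one position too low.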

Refactors

  • Guest-agent decomposed — 3,777-line monolith → 8 modules with 124 Rust unit tests. Removed tracing, once_cell, regex deps.
  • Tiny-init decomposed — Split into sys, device, zram modules with DRY helpers.
  • Protocol types unified — Type-safe dispatch, reduced boilerplate.
  • Build unified — build and build-images merged. Guest-agent cargo runs in Docker.

Tests

  • Signal isolation (30 standard + RT signals blocked to PID 1, bypass attempts via tgkill/pidfd/rt_sigqueueinfo).
  • WASM compatibility (27 tests: MVP through Wasm 3.0 SIMD + exceptions).
  • psutil regression canary (CPU, memory, process, disk, network).
  • Dask regression canary (schedulers, POSIX semaphores).
  • Session resilience (SIGKILL, OOM, repeated REPL death).
  • Kernel attack surface (eBPF, nf_tables, AF_PACKET, AF_VSOCK, user namespaces).
  • QEMU attack surface (virtio-only device set).
  • Loop device + cgroupfs absence.