Release v0.18.0 · dualeai/exec-sandbox

Higher concurrency, lower latency

Default overcommit ratios were retuned based on a 500-VM sweep on production hardware. At the optimal config (CPU=2×, MEM=8×), exec latency is 57ms p50 / 71ms p95 with 39 VMs/s throughput while using only 7% of host RAM. Beyond 2× CPU overcommit, latency degrades sharply with no throughput gain.

The new make bench-optimizer target lets you find the optimal ratios for your own hardware.
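To make the ratios concrete, here is a minimal sketch of how overcommit ratios could translate into an admission ceiling. The release notes do not state the scheduler's actual formula, so the `HostBudget` class, its field names, and the "min of CPU slots and memory slots" rule are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostBudget:
    """Hypothetical capacity model; not the project's actual API."""

    host_cpus: int
    host_mem_mb: int
    cpu_overcommit: float = 2.0  # CPU=2x, the optimal ratio quoted above
    mem_overcommit: float = 8.0  # MEM=8x, the optimal ratio quoted above

    def max_vms(self, vm_cpus: int, vm_mem_mb: int) -> int:
        # Each resource is multiplied by its overcommit ratio to get a
        # "virtual" budget, then divided by the per-VM allocation.
        cpu_slots = int(self.host_cpus * self.cpu_overcommit) // vm_cpus
        mem_slots = int(self.host_mem_mb * self.mem_overcommit) // vm_mem_mb
        # Assumption: the tighter of the two budgets bounds concurrency.
        return min(cpu_slots, mem_slots)


budget = HostBudget(host_cpus=16, host_mem_mb=32768)
print(budget.max_vms(vm_cpus=1, vm_mem_mb=256))  # CPU-bound: 32 slots vs 1024
```

Under this model a 16-core host with 1-vCPU VMs is bounded by CPU long before memory, which is consistent with the note that pushing CPU past 2× costs latency without adding throughput.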

Reliable at high VM counts

VMs no longer crash with SIGABRT ("Resource temporarily unavailable") under burst startup — root causes were a per-UID process limit applied per-VM and low default fd/thread limits
Concurrent asset downloads no longer corrupt cached files — a cross-process file lock serializes the full download+decompress pipeline
QMP connection race during snapshot restore eliminated via socket activation (fd inheritance instead of filesystem polling)
Admission queue wakeups reduced from O(N²) to O(N) under burst load, preventing unnecessary contention
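The asset-download fix can be sketched as follows. This is an illustration of the general technique (one exclusive `flock` held across the whole download+decompress pipeline), not the project's code: `ensure_asset`, `exclusive_file_lock`, the `.lock` suffix, and the use of gzip are all assumptions.

```python
import fcntl
import gzip
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def exclusive_file_lock(lock_path: Path) -> Iterator[None]:
    """Block until this process holds an exclusive flock on lock_path."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def ensure_asset(dest: Path, fetch_compressed: Callable[[], bytes]) -> Path:
    """Download and decompress into dest at most once across processes."""
    lock_path = dest.with_name(dest.name + ".lock")  # hypothetical naming
    with exclusive_file_lock(lock_path):
        # Re-check under the lock: another process may have finished first.
        if dest.exists():
            return dest
        data = gzip.decompress(fetch_compressed())
        # Write to a temp file in the same directory, then rename atomically
        # so readers never observe a partially written cache entry.
        fd, tmp = tempfile.mkstemp(dir=dest.parent)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, dest)
    return dest
```

Holding the lock over both steps matters: locking only the download would still let two processes decompress into the same target concurrently.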

Startup health checks

The scheduler now probes system limits at startup and auto-raises safe ones (fd limit, nproc) or warns about unsafe ones (overcommit mode, max_map_count, threads-max, cgroup pids.max). No more silent failures under load — operators get actionable guidance before the first VM boots.
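A rough sketch of the "auto-raise safe limits, warn about the rest" idea, using only the standard library. The function names and return shapes are hypothetical, and the scheduler's real checks and thresholds are not described in these notes.

```python
import resource
from typing import Optional, Tuple


def raise_soft_limit(which: int, wanted: int) -> Tuple[int, bool]:
    """Raise a soft rlimit toward `wanted`, capped by the hard limit.

    Returns (effective_soft_limit, changed). Raising the soft limit up to
    the hard limit needs no privileges, which is why it is "safe" to do
    automatically.
    """
    soft, hard = resource.getrlimit(which)
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, False
    try:
        resource.setrlimit(which, (target, hard))
    except (ValueError, OSError):
        # Some platforms reject values the hard limit nominally allows.
        return soft, False
    return target, True


def read_sysctl_int(path: str) -> Optional[int]:
    """Read an integer kernel setting, or None if it is unavailable.

    Settings such as /proc/sys/vm/max_map_count are system-wide, so a
    startup check can only report them, not change them safely.
    """
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


nofile, changed = raise_soft_limit(resource.RLIMIT_NOFILE, 65536)
print(f"fd soft limit: {nofile} (raised: {changed})")
print("max_map_count:", read_sysctl_int("/proc/sys/vm/max_map_count"))
```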

Observability

New admission_ms timing field separates queue wait from infra setup — previously lumped together in setup_ms, making it impossible to distinguish capacity contention from actual overhead
README now includes a throughput/latency benchmark table with 5 configs for capacity planning
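The split between queue wait and setup can be illustrated with a small asyncio sketch. Only the field names `admission_ms` and `setup_ms` come from the notes; the semaphore-based admission queue, `run_with_timings`, and `demo` are stand-ins chosen for illustration.

```python
import asyncio
import time
from typing import Dict, List


async def run_with_timings(
    slots: asyncio.Semaphore, setup_s: float
) -> Dict[str, float]:
    t_enqueued = time.perf_counter()
    async with slots:  # waiting here is capacity contention
        t_admitted = time.perf_counter()
        await asyncio.sleep(setup_s)  # stand-in for real infra setup
        t_ready = time.perf_counter()
    return {
        "admission_ms": (t_admitted - t_enqueued) * 1000,
        "setup_ms": (t_ready - t_admitted) * 1000,
    }


async def demo() -> List[Dict[str, float]]:
    slots = asyncio.Semaphore(1)  # a single slot forces the second task to queue
    return await asyncio.gather(
        run_with_timings(slots, 0.05),
        run_with_timings(slots, 0.05),
    )


for timings in asyncio.run(demo()):
    print({k: round(v, 1) for k, v in timings.items()})
```

With one slot, both tasks report a similar `setup_ms`, but only the second shows a large `admission_ms`. Before the split, that wait would have inflated its `setup_ms` and looked like slow infrastructure.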

CI & testing

4 package-install tests are now skipped under TCG emulation, preventing 940s timeout flakes
CI runners hardened with matching system limit tuning
ci-diagnose rewritten as failure-centric Python tool (replaces bash script)
Flaky test_peak_ram_per_vm stabilized on free-threaded Python 3.14t

Internal

Subprocess spawn boilerplate consolidated into start_managed_process() (6 call sites across QEMU, gvproxy, QSD, qemu-img)
Two-level async+flock locking primitive (lock_utils.py) prevents thread-pool starvation deadlock
Boot path parallelized (overlay + cgroup run concurrently)
Overlay pool pre-creates 100% of max slots (was 50%)
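One plausible shape for a two-level async+flock lock is sketched below. The contents of lock_utils.py are not shown in these notes, so `two_level_lock` and the reasoning in the comments are assumptions: an in-process `asyncio.Lock` gates access so that at most one thread-pool worker per lock path ever blocks inside `flock`.

```python
import asyncio
import fcntl
import os
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict

# Level 1: one asyncio.Lock per path, shared by coroutines in this process.
# Sketch limitation: these locks assume a single event loop per process.
_local_locks: Dict[str, asyncio.Lock] = {}


@asynccontextmanager
async def two_level_lock(path: str) -> AsyncIterator[None]:
    local = _local_locks.setdefault(path, asyncio.Lock())
    # Coroutines queue here without consuming a thread. Without this level,
    # every waiter would park a pool thread in a blocking flock() call and
    # could starve the work that the current holder needs to finish.
    async with local:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # Level 2: cross-process exclusion via flock, run off-loop
            # because it blocks.
            await asyncio.to_thread(fcntl.flock, fd, fcntl.LOCK_EX)
            yield
        finally:
            os.close(fd)  # closing the descriptor releases the flock


async def demo(lock_path: str, workers: int = 5) -> int:
    """Return the maximum number of coroutines seen inside the lock at once."""
    inside = 0
    max_inside = 0

    async def worker() -> None:
        nonlocal inside, max_inside
        async with two_level_lock(lock_path):
            inside += 1
            max_inside = max(max_inside, inside)
            await asyncio.sleep(0.01)
            inside -= 1

    await asyncio.gather(*(worker() for _ in range(workers)))
    return max_inside
```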
