v0.17.0

@clemlesne clemlesne released this 01 Mar 22:09
· 68 commits to main since this release

Highlights

Output limits — The guest agent now enforces output size limits and raises OutputLimitError when exceeded, preventing runaway output from consuming host memory.

VM resilience — Stale connection detection after guest agent idle timeout, zram memory limits to prevent OOM thrashing, and proper distinction between system failures and user code results in execute().

Security hardening — Sensitive /proc files are masked to block guest reconnaissance, /dev/urandom and /dev/random are restricted to mode 0644, and REPL processes now run with HOME=/home/user instead of inherited /root.

CI stability — 52 test fixes eliminate flakes under TCG emulation on ARM64 runners, plus a new resource monitor for crash diagnosis.

What's changed

Output & streaming

  • Enforce output limits in guest agent and raise OutputLimitError when exceeded
  • Prevent JS stdout truncation at 64KB pipe boundary
  • Flush stdout before sentinel to prevent truncation on large outputs
  • Retry drain_stdout on timeout to prevent stdout truncation
  • Defensively wrap on_stdout/on_stderr callbacks to prevent stream loop breakage
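
The callback-wrapping item can be pictured with a minimal sketch. It is not the project's implementation: `safe_callback` and `consume` are hypothetical names, and the only assumption is that `on_stdout`/`on_stderr` are plain callables receiving text chunks. The point is that an exception raised by user-supplied code is logged and swallowed, so the loop that drains the stream keeps running.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)


def safe_callback(cb: Optional[Callable[[str], None]]) -> Callable[[str], None]:
    """Wrap a user callback so its exceptions are logged, not propagated.

    Hypothetical helper for illustration only.
    """
    def wrapper(chunk: str) -> None:
        if cb is None:
            return
        try:
            cb(chunk)
        except Exception:
            # A faulty callback must not break the stream loop.
            logger.exception("stream callback raised; continuing")

    return wrapper


def consume(chunks: Iterable[str], on_stdout: Optional[Callable[[str], None]] = None) -> int:
    """Deliver every chunk to the wrapped callback and return how many were seen."""
    handler = safe_callback(on_stdout)
    delivered = 0
    for chunk in chunks:
        handler(chunk)
        delivered += 1
    return delivered
```

With this shape, a callback that fails on one chunk still receives the chunks that follow it.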

Security & input validation

  • Mask sensitive /proc files to block guest reconnaissance
  • Restrict /dev/urandom and /dev/random to mode 0644
  • Set HOME=/home/user for REPL processes instead of inherited /root
  • Prevent sparse file evasion of tmpfs disk limits
  • Raise CodeValidationError for empty/null-byte code instead of exit=-1
  • Reject oversized code with VmConfigError before VM boot
  • Document bun:ffi attack surface accessible from user JS code
  • Document accepted risk of env var blocklist visibility
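
As a rough illustration of the two input-validation items, the sketch below rejects bad code before any VM work happens. The exception classes are local stand-ins that only borrow the names mentioned in this release, and `MAX_CODE_BYTES` and `validate_code` are hypothetical; the real limit and call site are not stated in these notes.

```python
class CodeValidationError(ValueError):
    """Local stand-in for the error named in the release notes."""


class VmConfigError(ValueError):
    """Local stand-in for the error named in the release notes."""


# Hypothetical limit chosen for the example; the real value is not given here.
MAX_CODE_BYTES = 1024 * 1024


def validate_code(code: str, max_bytes: int = MAX_CODE_BYTES) -> None:
    """Raise a typed error for empty, null-byte, or oversized code."""
    if not code:
        raise CodeValidationError("code is empty")
    if "\x00" in code:
        raise CodeValidationError("code contains a null byte")
    size = len(code.encode("utf-8"))
    if size > max_bytes:
        raise VmConfigError(f"code is {size} bytes, limit is {max_bytes}")
```

Failing early with a typed exception gives callers something to catch, which is easier to handle than a generic exit=-1 result.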

Memory & resource management

  • Set zram mem_limit to prevent OOM thrashing on memory exhaustion
  • Prevent OOM when multiple Schedulers run in parallel
  • Enforce 384MB memory floor for snapshot VMs to prevent TCG zram thrashing
  • Page-align tmpfs mount sizes to prevent statvfs mismatch
  • Prevent QEMU thread exhaustion on CI ARM64 runners
  • Centralize resource limits and harmonize CPU/memory admission
  • Align SchedulerConfig.default_memory_mb with constants.DEFAULT_MEMORY_MB (192)
  • Destroy all tracked VMs on VmManager.stop() to prevent resource leaks
  • Await cancelled idle timer tasks to prevent orphaned coroutines
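
The page-alignment item can be sketched as follows. This assumes sizes are rounded up to the next page boundary, which the notes do not confirm; `page_align` is a hypothetical helper. The motivation is that a filesystem sized in whole pages reports a total that matches the requested size, whereas an unaligned request can differ from what statvfs later reports.

```python
import mmap


def page_align(size_bytes: int, page_size: int = mmap.PAGESIZE) -> int:
    """Round size_bytes up to the next multiple of page_size.

    Hypothetical helper; rounding up (rather than down) is an assumption.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Ceiling division, then scale back to bytes.
    return -(-size_bytes // page_size) * page_size
```

For example, with a 4096-byte page, a request of 4097 bytes becomes 8192 bytes, while 4096 bytes stays unchanged.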

VM lifecycle & error handling

  • Detect stale connections after guest agent idle timeout
  • Distinguish system failures from user code results in execute()
  • Catch all transport/protocol errors in execute() for proper retry
  • Eliminate double admission wait on L1 restore → cold boot fallback
  • Centralize retry profiles, ephemeral VM lifecycle, and stream consumption
  • Move lifecycle methods from VmManager to QemuVM
  • Handle ConnectionResetError in dispatch loop
  • Split validation_error into domain-specific guest error types
  • Map guest package_error to PackageNotAllowedError
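
One way to picture the split between system failures and user code results is the sketch below. All names (`ExecResult`, `VmTransportError`, `execute_with_retry`) are hypothetical and the retry count is arbitrary. A non-zero exit code from user code is returned as a normal result, while transport-level errors such as `ConnectionResetError` trigger a retry and, if they persist, surface as a distinct exception.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class ExecResult:
    """Hypothetical result type: user code ran, whatever its exit code."""
    exit_code: int
    stdout: str


class VmTransportError(Exception):
    """Hypothetical error type: the VM or its connection failed."""


async def execute_with_retry(
    run_once: Callable[[], Awaitable[ExecResult]], attempts: int = 2
) -> ExecResult:
    """Retry transport failures; never retry a completed user-code run."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Exception | None = None
    for _ in range(attempts):
        try:
            # Any returned result, including exit_code != 0, belongs to the user.
            return await run_once()
        except (ConnectionError, asyncio.IncompleteReadError) as exc:
            # ConnectionResetError and BrokenPipeError are ConnectionError subclasses.
            last_error = exc
    raise VmTransportError(f"transport failed after {attempts} attempts") from last_error
```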

Snapshots & port forwarding

  • Restore port forwarding on L1 memory snapshot restore
  • Hold sockets open during batch port allocation to prevent duplicates
  • Bound ENOSPC recursion to one retry in _create_snapshot
  • Capture QEMU diagnostics on VM death during snapshot creation
  • Retry transient network errors during package installation
  • Offload vmstate SHA-256 hashing to thread pool
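
The hashing item follows a common asyncio pattern, sketched here with the standard library only. The function names are hypothetical and the chunk size is arbitrary. Hashing a large file is blocking work, so running it through `asyncio.to_thread` keeps the event loop free to service other VMs in the meantime.

```python
import asyncio
import hashlib
from pathlib import Path
from typing import Union


def _sha256_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Blocking helper: hash a file in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


async def sha256_file_async(path: Union[str, Path]) -> str:
    """Run the blocking hash in a worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(_sha256_file, path)
```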

QEMU & emulation

  • Use microvm machine type for TCG x86_64 instead of pc (i440FX)
  • Use max CPU model for ARM64 TCG, capture diagnostics on exit-0
  • Use correct isa-fdc property names to disable floppy drives
  • Eliminate TOCTOU race in QMP socket connection
  • Expose performance Web API in JS VM sandbox context

Observability & logging

  • Expand setup_ms to cover full pre-boot phase and add teardown_ms metric
  • Use stdlib QueueHandler to prevent BlockingIOError under concurrent load
  • Move crash diagnostics from extra={} into log messages for CI visibility
  • Suppress noisy log on event loop closure during shutdown
  • Demote L1 cache miss traceback from WARNING to DEBUG
  • Catch RuntimeError in _write_worker with unit tests
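
For readers unfamiliar with the QueueHandler approach, here is a minimal standard-library sketch. `setup_queue_logging` is a hypothetical helper and does not reflect how the project wires its loggers. Callers only enqueue records, and a single listener thread performs the potentially blocking write to the real handler.

```python
import logging
import logging.handlers
import queue


def setup_queue_logging(
    logger: logging.Logger, target: logging.Handler
) -> logging.handlers.QueueListener:
    """Route records through an in-memory queue to a single writer thread.

    Hypothetical helper for illustration only.
    """
    log_queue: queue.Queue = queue.Queue()  # unbounded
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(
        log_queue, target, respect_handler_level=True
    )
    listener.start()
    return listener
```

Calling `listener.stop()` at shutdown drains the queue before the worker thread exits, so records that were already enqueued are not lost.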

CI & build

  • Add CI resource monitor for runner crash diagnosis
  • Build QEMU from source with io_uring, seccomp, and ARM64 TCG version guard
  • Upgrade pytest-asyncio to 1.x to fix orphaned task warnings
  • Consolidate memory optimization tests from 17 to 8

Docs

  • Fix snapshot_cache_dir → disk_snapshot_cache_dir and memory_snapshot_cache_dir
  • Add missing language parameter to port forwarding examples
  • Pin package versions in CLI examples to match validation requirement
  • Correct TCG emulation slowdown from 10-50x to ~5-8x

Test stability

52 test fixes targeting CI flakes under TCG emulation (ARM64 runners without KVM), including timeout increases, TCG-aware skips, retry wrappers, proportional thresholds, warmup runs, tracemalloc-based measurement, and marker reclassification.


Full changelog: v0.16.0...v0.17.0