2026.07.03 Release v0.5.0
CubeSandbox 0.5.0 introduces AutoPause/AutoResume, a platform-level sandbox lifecycle automation that transparently suspends idle sandboxes and resumes them on-demand on the next dataplane request. This release also delivers ARM64 (aarch64) native support across the entire stack — from hypervisor to CI/CD — a TencentCloud Terraform cluster deployer for production-grade one-click deployment, and network security hardening with per-sandbox traffic access tokens, CubeEgress fail-closed bootstrap, and policy-routing egress. Additional highlights include a pure-Go native rootfs export pipeline that bypasses Docker, skopeo, and umoci entirely, a snapshot runtime locking refactor to eliminate high-concurrency deadlocks, image uid/gid preservation fixes for non-root container images, E2B SDK alignment with complete filesystem and PTY APIs, and a one-click upgrade mode with three-way config merge. 116 commits from 26 contributors.
🎯 Major Features
AutoPause / AutoResume: Sandbox Lifecycle Automation
Sandboxes in agent workflows spend most of their time idle — waiting for user input, callbacks, or the next RL rollout cycle. AutoPause/AutoResume lets the platform automatically suspend idle sandboxes and instantly wake them on the next incoming request, releasing physical host resources during idle periods. This is implemented as a platform-side, per-sandbox capability with semantics aligned to the E2B lifecycle parameter.
- AutoPause mechanism: A sweeper in the new cube-proxy-sidecar (under `CubeProxy/sidecar/`) tracks sandbox activity via `last_active` timestamps reported by CubeProxy's `log_phase.lua`. When `idle >= timeout_seconds`, the sidecar triggers a pause through CubeMaster → Cubelet, which snapshots the full VM state (memory + filesystem) to `/data/cubelet/root/pausevm/<sandbox>`, then shuts down the MicroVM. A configurable `BootstrapWarmup` window prevents premature pausing during sidecar startup.
- AutoResume mechanism: When a dataplane request arrives for a paused sandbox, CubeProxy's `sandbox_state.lua` gate intercepts it and fires an internal sub-request to the sidecar's `/internal/resume`. The sidecar drives a resume RPC through CubeMaster → Cubelet → containerd, which restores the VM from the pause snapshot. The dataplane request blocks until resume completes (bound by nginx `proxy_read_timeout`). Concurrent resumes for the same sandbox are coalesced in-process (singleflight pattern); cross-replica coordination uses Redis SETNX locks.
- Configurable resource release ratio (#553): A node-level configuration `host.quota.paused_resource_release_ratio` (float `[0, 1]`, default `0`) controls how much CPU/memory quota paused sandboxes release back to the scheduler. At ratio `1.0`, all quota is released for maximum node density; at ratio `0`, paused sandboxes retain full quota (guaranteed resume). Before resuming, a local admission check verifies the node has capacity; if not, the resume is rejected with HTTP 409 and a precise capacity diagnostic.
- Traffic access token gating (#639): Sandboxes created with `network.allow_public_traffic=false` receive a per-sandbox `traffic_access_token` (UUID v4). CubeProxy enforces this token on every inbound request (both cold-path and cache-hit), returning HTTP 403 for missing or mismatched tokens. Token values are redacted from all logs. Accepts both `e2b-traffic-access-token` and `cube-traffic-access-token` headers.
- Kill-path lifecycle: Sandboxes with `on_timeout="kill"` go through an idle timeout kill path with `task.Kill`. New `POST /cube/sandbox/timeout` and `POST /cube/sandbox/refresh` APIs expose `end_at` for deterministic lifecycle management. The CubeProxy gate returns `410 Gone` for killing/killed sandboxes.
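The AutoPause rule described above can be sketched as a pure decision function. This is only an illustration of the `idle >= timeout_seconds` check combined with a warmup window; the names `SandboxActivity`, `should_pause`, `sidecar_started_at`, and `warmup_seconds` are hypothetical and do not reflect the sidecar's actual Go code.

```python
from dataclasses import dataclass


@dataclass
class SandboxActivity:
    sandbox_id: str
    last_active: float      # Unix seconds of the last observed dataplane request
    timeout_seconds: float  # per-sandbox idle timeout


def should_pause(sb: SandboxActivity, now: float,
                 sidecar_started_at: float, warmup_seconds: float) -> bool:
    """Return True when a sweeper pass would request a pause for this sandbox."""
    # Inside the warmup window the activity data may still be incomplete,
    # so nothing is paused yet (assumed meaning of the warmup window).
    if now - sidecar_started_at < warmup_seconds:
        return False
    idle = now - sb.last_active
    return idle >= sb.timeout_seconds


sb = SandboxActivity("sb-1", last_active=1000.0, timeout_seconds=300.0)
decisions = [
    # 200 s idle: below the timeout
    should_pause(sb, now=1200.0, sidecar_started_at=0.0, warmup_seconds=60.0),
    # 300 s idle: reaches the timeout
    should_pause(sb, now=1300.0, sidecar_started_at=0.0, warmup_seconds=60.0),
    # 300 s idle, but the sidecar started 20 s ago: still warming up
    should_pause(sb, now=1300.0, sidecar_started_at=1280.0, warmup_seconds=60.0),
]
print(decisions)  # [False, True, False]
```

Keeping the decision free of I/O makes the timeout and warmup interaction easy to unit-test separately from the pause RPC itself.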
New files: CubeProxy/sidecar/ (Go binary — sweeper, resumer, registry, stream consumer, last-active poller, Redis coordination); CubeProxy Lua modules (sandbox_state.lua, admin_phase.lua); CubeMaster lifecycle endpoints; Cubelet pause/resume RPCs.
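The in-process coalescing of concurrent resumes follows the singleflight pattern. The sidecar itself is a Go binary; the sketch below is a Python `asyncio` analogue written for illustration only. The `SingleFlight` class and `resume_vm` coroutine are hypothetical, and the cross-replica Redis SETNX lock is not shown.

```python
import asyncio


class SingleFlight:
    """Coalesce concurrent calls that share a key into a single execution."""

    def __init__(self):
        self._inflight = {}  # key -> asyncio.Task shared by all callers

    async def do(self, key, coro_fn):
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the work; later callers reuse it.
            task = asyncio.create_task(coro_fn())
            self._inflight[key] = task
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        return await task


resume_calls = []


async def resume_vm():
    # Stand-in for the resume RPC; records how often it actually runs.
    resume_calls.append("sb-1")
    await asyncio.sleep(0.01)
    return "running"


async def main():
    sf = SingleFlight()
    # Five dataplane requests hit the same paused sandbox at once.
    return await asyncio.gather(*(sf.do("sb-1", resume_vm) for _ in range(5)))


results = asyncio.run(main())
print(len(resume_calls), results)
```

All five callers receive the same result while the resume body runs once, which is the property the release notes describe for same-sandbox resumes.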
ARM64 (aarch64) Native Support
CubeSandbox now runs natively on ARM64 hosts, spanning the hypervisor, guest agent, shim, networking (BPF), build system, CI/CD, and deployment tooling. The work was a deep collaboration between Arm engineering and the Cube project team, progressing from feasibility to formal enablement.
- Hypervisor port (4dc7275): The SysCtrl device (guest-to-host signaling for shutdown, reboot, vsock-ready) was rewritten from PIO (x86-only) to MMIO for ARM64, registered on the `mmio_bus` at `LEGACY_SYS_CTRL_MAPPED_IO_START`. KVM register access (`get_one_reg`) was updated for a changed upstream API signature. Seccomp rules were aligned for ARM64 syscall number differences (`SYS_lstat` vs `SYS_fstatat`/`SYS_newfstatat`). Live migration support remains x86_64-gated.
- Guest agent (cb5706a): The RPC readiness signal changed from x86 `ioperm()` + PIO port write (port `0x680`) to ARM64 `/dev/mem` mmap at physical address `0x0903_0000` (SysCtrl MMIO region) with `ptr::write_volatile`. Build target auto-detected from host arch.
- CubeShim (fe3044c, 4feab88): Kernel command line adapted per architecture: `console=ttyAMA0,115200` (ARM PL011 UART) vs `console=hvc0` (virtio-console). x86-only mitigations (`no_timer_check`, `noreplace-smp`) gated to `#[cfg(target_arch = "x86_64")]`. Seccomp allow-lists aligned: `SYS_mkdir` → `SYS_mkdirat` on ARM64; added missing `SYS_faccessat2` for glibc path resolution on ARM64.
- BPF / CubeNet (cb263f2): Hardcoded `-target amd64` in BPF `//go:generate` directives replaced with `-target $GOARCH`. Per-architecture `vmlinux.h` headers (amd64 + arm64 BTF dumps). BPF object files regenerated at build time; prebuilt `.o` files removed from git. Requires clang ≥ 14 (added to builder image via `apt.llvm.org`).
- Multi-arch builds (c6bb376, 475b488): `Dockerfile.builder` parameterized with `TARGETARCH` for Go, protoc, and Rust toolchain downloads. CubeEgress, CubeAPI, and envd images support multi-arch manifest lists. A containerized `make guest-kernel` target supports both native and cross builds with architecture-specific kernel configs (`kernel-oc9.x86_64.config`, `kernel-oc9.aarch64.config`).
- CI/CD (#720): Builder image, VMLinux, and one-click release workflows all produce per-architecture artifacts. Release workflow split into `release_amd64` and `release_arm64` jobs running on native runners; both upload to the same GitHub Release.
- Deployment (cb52bc7): Dev environment (`run_vm.sh`) auto-detects architecture: ARM64 uses `machine virt` with UEFI firmware (`qemu-efi-aarch64`). Release bundle script packages per-architecture `mkcert` binaries.
- Known limitations (documented): PVM (nested KVM) is x86_64-only; ARM64 requires bare-metal with native KVM. Live migration is x86_64-only.
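The per-architecture kernel command line handling in CubeShim can be pictured with a small lookup. CubeShim does this with compile-time `cfg` gates; the Python below is only a sketch and assumes the notes mean ARM64 uses `ttyAMA0` while x86_64 uses `hvc0`. The `kernel_cmdline` helper and the `panic=1` extra argument are illustrative, not taken from the project.

```python
# Console device per architecture, as listed in the CubeShim item above
# (assumed mapping: aarch64 -> PL011 UART, x86_64 -> virtio-console).
CONSOLE_ARGS = {
    "aarch64": "console=ttyAMA0,115200",
    "x86_64": "console=hvc0",
}

# Parameters the notes describe as x86-only.
X86_ONLY_ARGS = ["no_timer_check", "noreplace-smp"]


def kernel_cmdline(arch: str, extra: list[str]) -> str:
    """Build a kernel command line for the given architecture."""
    if arch not in CONSOLE_ARGS:
        raise ValueError(f"unsupported architecture: {arch}")
    args = [CONSOLE_ARGS[arch]]
    if arch == "x86_64":
        args += X86_ONLY_ARGS
    return " ".join(args + extra)


arm_cmdline = kernel_cmdline("aarch64", ["panic=1"])
x86_cmdline = kernel_cmdline("x86_64", ["panic=1"])
print(arm_cmdline)  # console=ttyAMA0,115200 panic=1
print(x86_cmdline)  # console=hvc0 no_timer_check noreplace-smp panic=1
```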
TencentCloud Terraform Cluster Deployer
A production-grade, one-click cluster deployment for TencentCloud driven entirely by Terraform IaC. From a single release bundle and create.sh entry point, the deployer provisions a full CubeSandbox cluster with managed control plane, HA middleware, and elastic compute nodes.
- Infrastructure provisioning (#629): Terraform provisions a private VPC (`10.0.0.0/16`) with per-zone subnets, NAT Gateway with EIP, security groups, and a bastion jumpserver. Managed cloud services are automatically created:
  - MySQL (TencentDB 8.0): Multi-AZ with semi-sync replication, 4 GB / 200 GB, application account `cube` with database-level privileges.
  - Redis (TencentDB 7.0): Standard master/replica architecture, configurable memory (default 1 GB), password-protected.
  - CFS (Cloud File Storage): NFS share for cube-master's shared persistent storage (`/data/CubeMaster/storage`), mounted ReadWriteMany across replicas. Optional; enables multi-replica HA mode.
  - TCR (Tencent Container Registry): Private registry with VPC peering, namespace per deployment, long-lived access token.
- TKE control plane (#629): Managed Kubernetes cluster (v1.34.1, GlobalRouter, containerd) with intranet-only apiserver. Four control-plane components deployed as Deployments with CLB Services:
  - cube-master: Shared CFS NFS volume for template/snapshot/runtime state. CubeEgress MITM CA (ECDSA P-256, Terraform-generated). Internal CLB on port 8089.
  - cube-api: Public CLB on port 3000, proxies to cube-master via cluster DNS.
  - cube-proxy: Public CLB on ports 80/443, with Redis and TLS Secrets.
  - cube-webui: Public CLB on port 80, nginx reverse-proxying to cube-api and cube-proxy.
- Configurable replicas (#658): `cubemaster_replicas`, `cube_api_replicas`, `cube_proxy_replicas`, `cube_webui_replicas` variables (default 1 for POC mode). When `TENCENTCLOUD_USE_CFS=true`, cube-master multi-replica HA is enabled with `cubemaster_replicas` driving both `spec.replicas` and scheduler concurrency apportionment.
- Elastic compute nodes: Configurable CVM instances (PVM or bare-metal) in private VPC, auto-scaling via TKE node pool. Default 1 node, configurable count and instance types.
- One-click upgrade mode (#538): `install.sh --mode=upgrade` detects existing installations, performs three-way `.env` config merge (new defaults + old customizations + explicit overrides), runs fail-fast preflight checks (disk space, semver compatibility, CIDR conflict), and backs up configuration before any destructive change. User customizations are preserved; secrets are redacted in the diff report but retained in the actual merged file.
- External MySQL/Redis (#514): One-click installer supports pointing at pre-existing external MySQL and Redis instances via `CUBE_EXTERNAL_MYSQL_*` / `CUBE_EXTERNAL_REDIS_*` variables. Local Docker containers are masked, and all components (CubeMaster, CubeAPI, CubeProxy) are configured to use the external endpoints.
- Other installer improvements: `CUBE_PROXY_HOST_PORT` deprecated, split into `CUBE_PROXY_HTTP_PORT` / `CUBE_PROXY_HTTPS_PORT` (#588). Manual SQL seed removed in favor of CubeMaster embedded migrations (#628). CubeEgress integrated into compute node startup (#707). Improved external dependency compatibility with shell-safe env persistence and redis-cli timeout detection (#673). Cubelet reporting interval and scheduler scoring configured for multi-node TencentCloud deployments (b8f2242). Image defaults updated to v0.5.0 tags (#719).
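The three-way `.env` merge used by the upgrade mode can be sketched as layered dictionaries. The precedence shown here (explicit overrides over old customizations over new defaults) is an assumption, as are the helper names and the `MYSQL_PASSWORD` key; the real installer is a shell script and may differ in detail.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def merge_env(new_defaults, old_custom, overrides):
    """Layer the three sources; later layers win (assumed precedence)."""
    merged = dict(new_defaults)
    merged.update(old_custom)
    merged.update(overrides)
    return merged


SECRET_MARKERS = ("PASSWORD", "SECRET", "TOKEN")  # hypothetical heuristic


def redacted_diff(before, after):
    """List changed keys, hiding values of secret-looking keys in the report only."""
    lines = []
    for key in sorted(after):
        if before.get(key) == after[key]:
            continue
        is_secret = any(marker in key for marker in SECRET_MARKERS)
        lines.append(f"{key}={'***' if is_secret else after[key]}")
    return lines


new_defaults = parse_env(
    "CUBE_PROXY_HTTP_PORT=80\nCUBE_PROXY_HTTPS_PORT=443\nMYSQL_PASSWORD=changeme\n"
)
old_custom = {"CUBE_PROXY_HTTPS_PORT": "8443", "MYSQL_PASSWORD": "s3cret"}
overrides = {"CUBE_PROXY_HTTP_PORT": "8080"}

merged = merge_env(new_defaults, old_custom, overrides)
report = redacted_diff(new_defaults, merged)
print(merged)
print(report)
```

The merged dictionary keeps the real secret value while the report shows `***`, mirroring the "redacted in the diff report but retained in the merged file" behavior.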
New files: deploy/one-click/terraform/tencentcloud/ (create.sh, destroy.sh, main.tf, variables.tf, tke-addons.tf, outputs.tf, lib-state-sync.sh, env.example); docs/guide/tencentcloud-terraform-deploy.md (EN + ZH).
Network Security Hardening
Three critical patches for sandbox network security: inbound access control, outbound fail-closed, and policy-routing egress.
- Per-sandbox traffic access token (#639): Described in AutoPause/AutoResume above. CubeProxy enforces `traffic_access_token` on every inbound request when `AllowPublicTraffic=false`, returning 403 for missing/bad tokens. Token values are never logged.
- CubeEgress fail-closed bootstrap (c5f811d): During CubeEgress startup, before L7 policies are loaded from CubeMaster (bootstrap_status ≠ "ready"), the proxy now returns 403 for all non-audit traffic instead of the previous fail-open behavior. This eliminates the security gap where sandbox outbound traffic could bypass all host/SNI/method/path controls during a restart window.
- Route-aware egress / cube-router (7c514e9): An optional `cube-router` kernel dummy device allows sandbox outbound traffic to enter the Linux host routing stack instead of being hard-redirected to the primary NIC. Traffic can leave through any routable device (eth0, eth1, GRE tunnels, VXLAN, WireGuard), enabling seamless integration with existing multi-NIC and VPN network infrastructure. Existing CubeEgress L7 policy, DNS allow-list, and port mapping are all preserved.
- BPF TCP checksum fix (7c2dd1f): Fixed invalid TCP checksums on cross-node port-mapped sandbox replies. The `snat_tcp()` function was incorrectly using `BPF_F_PSEUDO_HDR` when only the TCP port had changed, causing checksum corruption on multi-node deployments.
- from_world cleanup (5b91946): Removed ineffective `from_world` TC filter attachment on the loopback interface, reducing unnecessary eBPF hook points.
- Host service access (48d080e): Sandbox outbound traffic to the host's own IP is now redirected via BPF shortcut, enabling sandboxed code to reach host-local services.
- Network hardening guide (#663): New bilingual documentation covering default control-plane attack surface, binding strategies (private NIC vs firewall whitelisting), CubeAPI auth callback with path+method validation, and TLS/credential rotation guidance.
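The inbound token gate from the list above can be sketched as a single check. CubeProxy implements this in its own request path; the Python below is only an illustration. The function name `check_traffic_token` is hypothetical, and the constant-time comparison is a design choice of this sketch rather than something the release notes state.

```python
import hmac
import uuid

# Header names accepted according to the release notes.
TOKEN_HEADERS = ("e2b-traffic-access-token", "cube-traffic-access-token")


def check_traffic_token(headers: dict[str, str], allow_public_traffic: bool,
                        expected_token: str) -> int:
    """Return the HTTP status the gate would yield: 200 to proceed, 403 to reject."""
    if allow_public_traffic:
        return 200
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in TOKEN_HEADERS:
        presented = lowered.get(name)
        if presented is not None and hmac.compare_digest(
            presented.encode(), expected_token.encode()
        ):
            return 200
    return 403


token = str(uuid.uuid4())
statuses = [
    check_traffic_token({"E2B-Traffic-Access-Token": token}, False, token),
    check_traffic_token({"cube-traffic-access-token": token}, False, token),
    check_traffic_token({}, False, token),                                    # missing
    check_traffic_token({"cube-traffic-access-token": "wrong"}, False, token),  # mismatched
    check_traffic_token({}, True, token),                                     # public sandbox
]
print(statuses)  # [200, 200, 403, 403, 200]
```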
✨ Enhancements
SDK
- E2B filesystem API alignment (#678): Go and Python SDKs now implement the full E2B filesystem API surface: `list`, `stat`, `exists`, `remove`, `rename`, `mkdir`, and `watch`. Comprehensive integration tests included.
- Python SDK PTY APIs (250d248): Complete PTY (pseudo-terminal) interface added to the Python SDK (`create`, `connect`, `kill`, `send_stdin`, `resize`) with streaming output via a `PtyHandle` iterator. Speaks envd's Connect-JSON RPC directly; no dependency on e2b Python packages.
- Python SDK E2B network.rules transforms (#568): Compatibility with E2B's per-host `{transform: {headers: {...}}}` credential injection shape, translated into CubeEgress L7 `action.inject` rules. Drop-in replacement for codebases using E2B's per-host credential injection.
- Python SDK double-encoding fix (#572): Execution logs and errors are no longer double-encoded, fixing corrupted output display.
Performance
- Pure-Go native rootfs export (#558): A daemonless, pure-Go rootfs export pipeline that bypasses Docker, skopeo, and umoci entirely. Features concurrent prefetch, loop-mount streaming directly into ext4 block devices, and a "decompress-and-delete" strategy that significantly reduces peak memory and build time compared to the previous skopeo/umoci pipeline. Enabled by default.
- VirtIO block performance (#575): Enables `VIRTIO_BLK_F_SEG_MAX` (multi-segment requests) and `VIRTIO_RING_F_INDIRECT_DESC` (indirect descriptors) in the hypervisor's virtio-blk device, improving sequential write throughput from 2888 MiB/s to 3293 MiB/s (fio benchmark).
Template Management
- Image pull progress TUI (#580): Real-time pull progress tracking with Redis-backed persistence. New `template watch` and `template build-watch` CLI commands provide an interactive bubbletea TUI showing live metrics (download speed, per-layer completion, step-by-step checklist), with plain-text fallback on non-TTY terminals.
- Latest job ID in template list (#546): Template list and detail APIs now expose each template's latest create/rebuild job ID. The Web UI automatically opens build logs when viewing a template with an active (running/pending/building) job.
- Artifact resource leak fixes (#631): Introduces `t_cube_artifact_node_placement` table for node-level artifact tracking independent of replica lifecycle. Periodic artifact GC with MySQL `GET_LOCK` for HA coordination. Hardened cascade cleanup across CubeMaster, Cubelet, and CubeAPI. AgentHub snapshot cascade cleanup with path traversal protection.
Web UI
- Template creation overhaul (#675): New multi-step template creation form with image source selection, instance type configuration, network settings, and advanced options. Improved validation and field organization.
AgentHub
- LLM env-var fallback removal (#602): All AgentHub LLM secrets and settings now live exclusively in the database, encrypted with a per-installation CSPRNG-generated master key. Environment variable fallback paths are eliminated. The one-click upgrade script actively deletes obsolete env keys. `decrypt_or_passthrough` now fails closed for undecryptable payloads.
- Assistant state persistence (#582): Authentication, OpenClaw runtime state, template inheritance, and snapshot recovery behavior are now persisted in the database. Includes bcrypt-based auth, WeCom secret decryption, rate limiting on auth entrypoints. Enables backup/restore and safe cloning of digital assistants.
Deployment & Installer
- One-click upgrade mode (#538): Detailed in Terraform section above. `--mode=upgrade` with three-way env merge, pre-upgrade backup, and fail-fast preflights.
- External MySQL/Redis (#514): Detailed above. Support for pointing one-click installer at pre-existing external database instances.
- CUBE_PROXY port split (#588): `CUBE_PROXY_HOST_PORT` deprecated; `CUBE_PROXY_HTTP_PORT` (default 80) and `CUBE_PROXY_HTTPS_PORT` (default 443) provide separate HTTP/HTTPS control.
- Static builds (#583): `cube-api`, `cubemaster`, and `cubemastercli` are now fully static binaries (`CGO_ENABLED=0` / musl target), eliminating host glibc dependency and preventing version-skew failures.
- Embedded migrations (#628): Manual SQL seed removed; CubeMaster embedded goose migrations own single-node seed rows. Eliminates mysql client dependency on control nodes.
- CubeEgress compute integration (#707): CubeEgress integrated into compute node startup/down scripts, extending egress policy enforcement to all nodes.
- resolvectl compatibility (#703): Tolerates missing `resolvectl default-route` on older systemd (pre-v240), preventing install failures on RHEL 8.3 and similar distributions.
Infrastructure
- Redis key unification (#609): All Redis keys centralized in `pkg/base/rediskey` with consistent naming. Read/write pool separation dropped in favor of a single pool, simplifying multi-node deployment configuration.
- Migration identity hardening (#620): Three-layer defense against silent migration skipping: 14-digit UTC timestamp prefixes for new migrations, out-of-order application support, and SHA-256 content fingerprinting with startup verification. CI enforces immutability of already-merged migration files.
- Node label management API (#633): `POST /nodes/{id}/labels` and `DELETE /nodes/{id}/labels` endpoints for admin-managed labels. Kubernetes-compatible naming (DNS1123 subdomain prefix), SELECT FOR UPDATE race protection, 64-label limit, system-reserved namespace protection (`kubernetes.io`, `beta.kubernetes.io`, `cube.cloud.tencentcloud.com`).
- envd version reporting (#650): Collected envd versions propagated as sandbox annotations, enabling E2B SDK feature gating against real runtime versions instead of a hardcoded constant.
- Configurable HTTP bind (#662): `CUBEMASTER_HTTP_BIND` makes the HTTP listen address configurable, supporting private-NIC binding for network security hardening.
- TencentCloud scheduler scoring (b8f2242): Multi-node scheduler scoring with `real_time_weighted_average` plugin balancing mvm_num, local_create_num, cpu_usage, and quota_mem_usage. `priority_select_num` dynamically capped at `min(compute_node_count, 3)`.
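The migration identity checks listed above (14-digit timestamp prefix plus SHA-256 content fingerprint) can be illustrated with a short sketch. The filename pattern, the helper names, and the in-memory `applied` mapping are assumptions for this example; CubeMaster's actual implementation lives in its embedded Go migrations.

```python
import hashlib
import re

# Assumed filename shape: <14-digit UTC timestamp>_<name>.sql
VERSION_RE = re.compile(r"^(\d{14})_[a-z0-9_]+\.sql$")


def migration_version(filename: str) -> str:
    """Extract the 14-digit timestamp version from a migration filename."""
    match = VERSION_RE.match(filename)
    if not match:
        raise ValueError(f"not a timestamped migration file: {filename}")
    return match.group(1)


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of the migration file content."""
    return hashlib.sha256(content).hexdigest()


def verify(applied: dict[str, str], files: dict[str, bytes]) -> list[str]:
    """Return versions whose current content differs from the recorded fingerprint."""
    mismatched = []
    for name, content in files.items():
        version = migration_version(name)
        recorded = applied.get(version)
        if recorded is not None and recorded != fingerprint(content):
            mismatched.append(version)
    return sorted(mismatched)


name = "20260701120000_add_labels.sql"
original = b"CREATE TABLE t (id INT);\n"
applied = {migration_version(name): fingerprint(original)}

clean = verify(applied, {name: original})
tampered = verify(applied, {name: original + b"-- edited after merge\n"})
print(clean, tampered)  # [] ['20260701120000']
```

A startup check along these lines turns a silently edited, already-applied migration into an explicit failure instead of a skipped change.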
🐛 Bug Fixes
These fixes address issues present in v0.4.0:
- High-concurrency rollback deadlock (#693): Snapshot runtime active binding refactored into a dedicated `t_cube_snapshot_runtime_active` table with sandbox-level and resource-level distributed locks. Eliminates MySQL 1213 deadlock errors.
- Image uid/gid squashing (#671, #608): Two complementary fixes for image ownership preservation:
  - Native export path (#671): Removed `WithNoSameOwner()` option, which was squashing all file uid/gid to the extracting user, breaking image ownership.
  - Template center (#608): `umoci unpack --rootless` now only passed when euid ≠ 0 (not when running as root). Docker-export fallback uses `--same-owner --numeric-owner`. Fixes Chromium profile write failures and CDP unreachability in browser sandboxes, and envd exec EACCES on `/home/user` for Python images.
- Hung hostdir mounts (#691): Hostdir bind and remount operations now run through a bounded 3-second timeout subprocess. Previously, stale NFS mounts or other hung filesystems would block sandbox creation indefinitely.
- CubeProxy envd streaming buffering (#647): Nginx response buffering disabled for envd server-streaming endpoints. Previously, nginx buffered early stream frames, breaking immediate-return semantics for background commands and watch streams.
- MSI-X table/PBA hardening (#619): Guest-triggered panics in the VMM's MSI-X table/PBA read/write paths replaced with graceful error handling. A malicious or buggy guest can no longer crash the VMM process via invalid MSI-X accesses.
- CubeProxy implementation detail leakage (#653): Removed `X-Cube-Retcode` response header and `$cube_retcode` access-log field that exposed internal failure-mode codes. Errors now return opaque HTTP status codes with uniform JSON bodies, preventing sandbox ID enumeration and infrastructure probing.
- Auth callback method forwarding (#315): `X-Request-Method` now forwarded to auth callbacks alongside `X-Request-Path`, enabling fine-grained (path + method) authorization. Previously, a read-only credential could access destructive endpoints on the same path.
- envd command env propagation (#566): Fixed create-time environment variables being dropped when starting envd via `commands.run`.
- Sandbox preview route (#570): Fixed missing `/cube/sandbox/preview` route handler that caused `cubemastercli tpl render` to silently fail.
- One-click upgrade bundle integrity (#597): Fixed missing `scripts/common/` directory in upgrade bundles that broke preflight validation scripts for the upgrade feature (#538).
- Template creation Cube CA forwarding (#652): Fixed `with_cube_ca` parameter not being forwarded when creating templates, ensuring clients can control whether the CubeEgress root CA is baked into template rootfs.
- CubeEgress compute CA refresh (#614): CubeEgress on compute nodes now refreshes its MITM CA from CubeMaster, fixing CA mismatch after master rotation.
- Template info backfill (#594): Backfilled missing `created_at` and `image_info` fields in template detail responses.
- Snapshot delete job cleanup (#559): Cleaned orphaned job rows after snapshot deletion, preventing stale build job references.
- Cross-replica node sync (#542): Periodic reload goroutine started for nodemeta cross-replica synchronization, fixing stale node state in multi-replica CubeMaster deployments.
- BPF inner map BTF key/value (#595): CubeVS inner maps now created with BTF key/value, fixing BPF map compatibility on newer kernels.
- One-click install root enforce (#649): Configurable install root paths removed; install prefix safety assertion hardened to prevent accidental system directory wipes.
- One-click MIRROR persistence (#622): MIRROR env now persisted to `.one-click.env` for consistent image registry selection across restarts.
- One-click same-CIDR reinstall (#586): Distinguishes same-CIDR reinstall from CIDR change, uses systemd to stop services cleanly.
- One-click compute quickcheck race (#637): Post-start compute node checks now tolerant of transient startup races.
- One-click DATABASE_URL persistence (#611): `DATABASE_URL` now persisted for local MySQL in install and systemd start paths.
- Python SDK streaming body (377a99d): Request bodies buffered before copying in `IPOverrideTransport`, fixing multipart upload failures.
- CubeProxy log cleanup (#593): Dead log fields and faulty-backend stubs removed from CubeProxy.
- TencentCloud deployer semver helpers (#587): `semver_compare` and `version_lt` functions added for correct upgrade decision logic.
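To show why dedicated semver helpers matter for upgrade decisions, here is a minimal Python sketch of the comparison logic. The deployer's own `semver_compare` and `version_lt` are shell functions; this version only borrows their names, ignores pre-release suffixes, and is an assumption about the intended behavior rather than a port.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'v0.5.0' or '0.5.0' into (major, minor, patch)."""
    # Drop an optional leading 'v' and any pre-release suffix such as '-rc1'.
    core = version.removeprefix("v").split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def semver_compare(a: str, b: str) -> int:
    """Return -1, 0, or 1 when a is lower than, equal to, or higher than b."""
    pa, pb = parse_version(a), parse_version(b)
    return (pa > pb) - (pa < pb)


def version_lt(a: str, b: str) -> bool:
    return semver_compare(a, b) < 0


print(version_lt("v0.4.0", "v0.5.0"))      # True: an upgrade is needed
print(version_lt("v0.10.0", "v0.9.0"))     # False: numeric, not lexical, ordering
print(semver_compare("0.5.0", "v0.5.0"))   # 0: the 'v' prefix is ignored
```

The second case is the one a plain string comparison gets wrong, since `"0.10.0"` sorts before `"0.9.0"` lexically.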
📚 Documentation
- ARM64 deployment guides (64931ff): All deployment guides (quickstart, bare-metal, dev environment, multi-node, self-build) updated in EN + ZH with architecture-specific instructions. ARM64 PVM limitation documented.
- Network hardening guide (#663): Bilingual operational security guide covering control-plane attack surface, binding strategies, auth callback configuration, and TLS/credential guidance.
- TencentCloud Terraform deploy guide (ae81121): Full deployment guide (EN + ZH) for the Terraform cluster deployer.
- Snapshot/clone/rollback deep-dive (#680): Technical deep-dive blog post on snapshot, clone, and rollback mechanisms (EN + ZH).
- Sandbox logs guide (#692): New `cubecli logs` usage guide with examples.
- Multi-node scheduler scoring (#672): Guidance on configuring multi-node scheduler scoring for balanced workload distribution.
- Host mount permission fixes (#560): Documentation explaining host mount permission handling.
- v0.4.0 release blog posts (#585): Release announcement and agent-friendly-service posts (EN + ZH).
- Network deep-dive blog (#627): Technical deep-dive on CubeSandbox networking (EN + ZH).
- README improvements (#640, #666, #646): Product highlights, v0.4 showcase, benchmark report links, architecture diagram update, homepage tagline refinement.
- Documentation optimization (#665): Cross-documentation link fixes, sidebar navigation for snapshot-rollback-clone, localized docs flow preservation (#668, #664).
- Install guide links (#475, #466): Installation guide callouts in benchmark posts, troubleshooting links in install error messages.
⚙️ Engineering Improvements
- Multi-arch CI (#720): Builder image, VMLinux, and one-click release workflows all support amd64 + arm64 with per-architecture artifacts and multi-arch manifests.
- Python SDK publish workflow (#700): GitHub Actions workflow for PyPI publishing triggered by `python-sdk-v*` tags. Version cross-validation (tag vs `pyproject.toml` vs `__init__.py`), smart change detection to skip unchanged publishes, and `twine check` validation.
- Cubelet reporting default (#722): Reporting interval default changed to 1s for more responsive metrics.
- CI release workflow hardening (#724): `--repo` flag added to `gh release` commands for correct repository targeting.
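The version cross-validation step in the Python SDK publish workflow can be sketched as follows. This is an illustration only: the helper names, the regex-based parsing, and the sample package name `cube-sdk` are hypothetical, and the real workflow may extract versions differently.

```python
import re

TAG_PREFIX = "python-sdk-v"  # tag pattern named in the release notes


def tag_version(tag: str) -> str:
    """Strip the tag prefix, e.g. 'python-sdk-v0.5.0' -> '0.5.0'."""
    if not tag.startswith(TAG_PREFIX):
        raise ValueError(f"unexpected tag: {tag}")
    return tag[len(TAG_PREFIX):]


def pyproject_version(text: str) -> str:
    """Read the first top-level version = "..." line from pyproject.toml text."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', text, re.MULTILINE)
    if not match:
        raise ValueError("no version field in pyproject.toml")
    return match.group(1)


def init_version(text: str) -> str:
    """Read __version__ = "..." from the package __init__.py text."""
    match = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', text, re.MULTILINE)
    if not match:
        raise ValueError("no __version__ in __init__.py")
    return match.group(1)


def check_versions(tag: str, pyproject_text: str, init_text: str) -> str:
    """Return the shared version, or raise if the three sources disagree."""
    versions = {
        "tag": tag_version(tag),
        "pyproject.toml": pyproject_version(pyproject_text),
        "__init__.py": init_version(init_text),
    }
    if len(set(versions.values())) != 1:
        raise ValueError(f"version mismatch: {versions}")
    return versions["tag"]


pyproject = '[project]\nname = "cube-sdk"\nversion = "0.5.0"\n'
init_py = '__version__ = "0.5.0"\n'
agreed = check_versions("python-sdk-v0.5.0", pyproject, init_py)
print(agreed)  # 0.5.0
```

Failing the job on a mismatch prevents publishing a package whose tag and metadata disagree.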