v0.5.0 #737
fslongjin announced in Announcements
2026.07.03 Release v0.5.0
CubeSandbox 0.5.0 introduces AutoPause/AutoResume, a platform-level sandbox lifecycle automation that transparently suspends idle sandboxes and resumes them on demand on the next dataplane request. This release also delivers:

- native ARM64 (aarch64) support across the entire stack, from hypervisor to CI/CD;
- a TencentCloud Terraform cluster deployer for production-grade one-click deployment;
- network security hardening with per-sandbox traffic access tokens, CubeEgress fail-closed bootstrap, and policy-routing egress.

Additional highlights include:

- a pure-Go native rootfs export pipeline that bypasses Docker, skopeo, and umoci entirely;
- a snapshot runtime locking refactor that eliminates high-concurrency deadlocks;
- image uid/gid preservation fixes for non-root container images;
- E2B SDK alignment with complete filesystem and PTY APIs;
- a one-click upgrade mode with three-way config merge.

116 commits from 26 contributors.
🎯 Major Features
AutoPause / AutoResume: Sandbox Lifecycle Automation
Sandboxes in agent workflows spend most of their time idle, waiting for user input, callbacks, or the next RL rollout cycle. AutoPause/AutoResume lets the platform automatically suspend idle sandboxes and wake them on the next incoming request, releasing physical host resources during idle periods. It is implemented as a platform-side, per-sandbox capability whose semantics align with the E2B lifecycle parameter.
- The sidecar (CubeProxy/sidecar/) tracks sandbox activity via last_active timestamps reported by CubeProxy's log_phase.lua. When idle >= timeout_seconds, the sidecar triggers a pause through CubeMaster → Cubelet. Cubelet snapshots the full VM state (memory + filesystem) to /data/cubelet/root/pausevm/<sandbox> and then shuts down the MicroVM. A configurable BootstrapWarmup window prevents premature pausing during sidecar startup.
- When a request arrives for a paused sandbox, the sandbox_state.lua gate intercepts it and fires an internal sub-request to the sidecar's /internal/resume endpoint. The sidecar drives a resume RPC through CubeMaster → Cubelet → containerd, which restores the VM from the pause snapshot. The dataplane request blocks until the resume completes, bounded by nginx proxy_read_timeout. Concurrent resumes for the same sandbox are coalesced in-process (singleflight pattern). Cross-replica coordination uses Redis SETNX locks.
- host.quota.paused_resource_release_ratio (float in [0, 1], default 0) controls how much CPU/memory quota paused sandboxes release back to the scheduler. At 1.0, all quota is released for maximum node density. At 0, paused sandboxes keep their full quota, which guarantees they can resume. Before resuming, a local admission check verifies that the node has capacity. If it does not, the resume is rejected with HTTP 409 and a capacity diagnostic.
- Sandboxes created with network.allow_public_traffic=false receive a per-sandbox traffic_access_token (UUID v4). CubeProxy enforces this token on every inbound request, on both the cold path and cache hits, and returns HTTP 403 for a missing or mismatched token. Token values are redacted from all logs. Both the e2b-traffic-access-token and cube-traffic-access-token headers are accepted.
- Sandboxes with on_timeout="kill" go through an idle-timeout kill path using task.Kill. New POST /cube/sandbox/timeout and POST /cube/sandbox/refresh APIs expose end_at for deterministic lifecycle management. The CubeProxy gate returns 410 Gone for killing/killed sandboxes.
New files:
CubeProxy/sidecar/ (Go binary: sweeper, resumer, registry, stream consumer, last-active poller, Redis coordination); CubeProxy Lua modules (sandbox_state.lua, admin_phase.lua); CubeMaster lifecycle endpoints; Cubelet pause/resume RPCs.
ARM64 (aarch64) Native Support
CubeSandbox now runs natively on ARM64 hosts, spanning the hypervisor, guest agent, shim, networking (BPF), build system, CI/CD, and deployment tooling. The work was a deep collaboration between Arm engineering and the Cube project team, progressing from feasibility to formal enablement.
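The list that follows covers several architecture splits. One of them is the guest kernel console: ttyAMA0 (PL011 UART) on ARM64 and hvc0 on x86_64, with the x86-only mitigations applied only on x86_64. The minimal Go sketch below illustrates that split.

It is not the project's code. The notes indicate the x86-only flags are gated at compile time with #[cfg(target_arch = "x86_64")] in Rust; this sketch makes the same choice at run time. The helper name guestCmdline and the base arguments root=/dev/vda rw are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// guestCmdline is a hypothetical helper that appends architecture-specific
// boot arguments to a caller-supplied base command line.
func guestCmdline(goarch string, base []string) (string, error) {
	// Copy base so the caller's slice is never mutated.
	args := append([]string(nil), base...)
	switch goarch {
	case "amd64":
		// virtio-console plus the x86-only mitigations.
		args = append(args, "console=hvc0", "no_timer_check", "noreplace-smp")
	case "arm64":
		// ARM PL011 UART; the x86-only mitigations are never emitted here.
		args = append(args, "console=ttyAMA0,115200")
	default:
		return "", fmt.Errorf("unsupported architecture %q", goarch)
	}
	return strings.Join(args, " "), nil
}

func main() {
	base := []string{"root=/dev/vda", "rw"} // assumed base arguments
	for _, arch := range []string{runtime.GOARCH, "arm64"} {
		line, err := guestCmdline(arch, base)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %s\n", arch, line)
	}
}
```

Returning an error for an unknown architecture keeps an unsupported host from silently booting with the wrong console.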
- … mmio_bus at LEGACY_SYS_CTRL_MAPPED_IO_START. KVM register access (get_one_reg) was updated for a changed upstream API signature. Seccomp rules were aligned for ARM64 syscall number differences (SYS_lstat vs SYS_fstatat/SYS_newfstatat). Live migration support remains gated to x86_64.
- … ioperm() + PIO port write (port 0x680) to ARM64 /dev/mem mmap at physical address 0x0903_0000 (SysCtrl MMIO region) with ptr::write_volatile. The build target is auto-detected from the host architecture.
- … console=ttyAMA0,115200 (ARM PL011 UART) vs console=hvc0 (virtio-console). x86-only mitigations (no_timer_check, noreplace-smp) are gated behind #[cfg(target_arch = "x86_64")]. Seccomp allow-lists were aligned: SYS_mkdir → SYS_mkdirat on ARM64. The missing SYS_faccessat2 was added for glibc path resolution on ARM64.
- The -target amd64 flag in BPF //go:generate directives was replaced with -target $GOARCH. Per-architecture vmlinux.h headers were added (amd64 + arm64 BTF dumps). BPF object files are regenerated at build time, and prebuilt .o files were removed from git. Requires clang ≥ 14 (added to the builder image via apt.llvm.org).
- Dockerfile.builder is parameterized with TARGETARCH for Go, protoc, and Rust toolchain downloads. CubeEgress, CubeAPI, and envd images support multi-arch manifest lists. A containerized make guest-kernel target supports both native and cross builds. It uses architecture-specific kernel configs (kernel-oc9.x86_64.config, kernel-oc9.aarch64.config).
- … release_amd64 and release_arm64 jobs running on native runners; both upload to the same GitHub Release.
- … (run_vm.sh) auto-detects the architecture: ARM64 uses machine virt with UEFI firmware (qemu-efi-aarch64). The release bundle script packages per-architecture mkcert binaries.
TencentCloud Terraform Cluster Deployer
A production-grade, one-click cluster deployment for TencentCloud, driven entirely by Terraform IaC. From a single release bundle and a create.sh entry point, the deployer provisions a full CubeSandbox cluster with a managed control plane, HA middleware, and elastic compute nodes.
- … (10.0.0.0/16) with per-zone subnets, a NAT Gateway with EIP, security groups, and a bastion jumpserver.
- Managed cloud services are automatically created: … cube with database-level privileges.
- … (/data/CubeMaster/storage), mounted ReadWriteMany across replicas. Optional; enables multi-replica HA mode.
- … cubemaster_replicas, cube_api_replicas, cube_proxy_replicas, and cube_webui_replicas variables (default 1 for POC mode). When TENCENTCLOUD_USE_CFS=true, cube-master multi-replica HA is enabled. In that mode, cubemaster_replicas drives both spec.replicas and scheduler concurrency apportionment.
- install.sh --mode=upgrade detects existing installations and performs a three-way .env config merge (new defaults + old customizations + explicit overrides). It runs fail-fast preflight checks (disk space, semver compatibility, CIDR conflict) and backs up configuration before any destructive change. User customizations are preserved. Secrets are redacted in the diff report but retained in the actual merged file.
- … CUBE_EXTERNAL_MYSQL_* / CUBE_EXTERNAL_REDIS_* variables. Local Docker containers are masked, and all components (CubeMaster, CubeAPI, CubeProxy) are configured to use the external endpoints.
- CUBE_PROXY_HOST_PORT is deprecated and split into CUBE_PROXY_HTTP_PORT / CUBE_PROXY_HTTPS_PORT (feat(one-click): deprecate CUBE_PROXY_HOST_PORT, split into HTTP/HTTPS ports #588).
- The manual SQL seed was removed in favor of CubeMaster embedded migrations (refactor(one-click): remove manual SQL seed in favor of CubeMaster embedded migrations #628).
- CubeEgress is integrated into compute node startup (feat(deploy): integrate cube-egress into one-click runtime scripts #707).
- External dependency compatibility was improved with shell-safe env persistence and redis-cli timeout detection (fix(one-click): improve external MySQL/Redis deployment compatibility #673).
- The Cubelet reporting interval and scheduler scoring are configured for multi-node TencentCloud deployments (b8f2242).
- Image defaults were updated to v0.5.0 tags (Update TencentCloud deploy image defaults and provider mirror docs #719).
New files:
deploy/one-click/terraform/tencentcloud/ (create.sh, destroy.sh, main.tf, variables.tf, tke-addons.tf, outputs.tf, lib-state-sync.sh, env.example); docs/guide/tencentcloud-terraform-deploy.md (EN + ZH).
Network Security Hardening
Three critical patches for sandbox network security: inbound access control, outbound fail-closed, and policy-routing egress.
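The first patch, inbound token enforcement, can be pictured as a small HTTP middleware. It accepts the token from either the e2b-traffic-access-token or the cube-traffic-access-token header and rejects a missing or mismatched token with 403, without echoing the value.

This is a minimal Go sketch, not CubeProxy's implementation; CubeProxy is built on nginx Lua modules. The names requireTrafficToken, okHandler, and status, the JSON error body, and the example token are hypothetical. The constant-time comparison is our choice; the notes do not specify one.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tokenHeaders lists the header names the release notes say are accepted.
var tokenHeaders = []string{"e2b-traffic-access-token", "cube-traffic-access-token"}

// requireTrafficToken is a hypothetical middleware that forwards a request
// only if one of tokenHeaders carries the expected per-sandbox token.
func requireTrafficToken(expected string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for _, h := range tokenHeaders {
			got := r.Header.Get(h)
			// Constant-time comparison avoids leaking, through timing,
			// how many leading bytes of the token matched.
			if got != "" && subtle.ConstantTimeCompare([]byte(got), []byte(expected)) == 1 {
				next.ServeHTTP(w, r)
				return
			}
		}
		// Missing or mismatched token: reject without echoing or logging the value.
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusForbidden)
		_, _ = w.Write([]byte(`{"error":"forbidden"}`))
	})
}

// okHandler stands in for the upstream sandbox service.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// status sends one GET through h, optionally setting a header, and returns the HTTP status.
func status(h http.Handler, header, value string) int {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if header != "" {
		req.Header.Set(header, value)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	const token = "9b2f7c1e-4d3a-4f6b-8a21-0c5e7d9f1a2b" // example UUID v4 value
	h := requireTrafficToken(token, okHandler)
	fmt.Println(status(h, "", ""))                              // 403: no token
	fmt.Println(status(h, "cube-traffic-access-token", token))  // 200: valid token
	fmt.Println(status(h, "e2b-traffic-access-token", "wrong")) // 403: mismatched token
}
```

Accepting both header names lets clients written against the E2B SDK work unchanged alongside Cube-native clients.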
- CubeProxy enforces the per-sandbox traffic_access_token on every inbound request when AllowPublicTraffic=false, returning 403 for missing or invalid tokens. Token values are never logged.
- A cube-router kernel dummy device lets sandbox outbound traffic enter the Linux host routing stack instead of being hard-redirected to the primary NIC. Traffic can leave through any routable device (eth0, eth1, GRE tunnels, VXLAN, WireGuard), enabling integration with existing multi-NIC and VPN network infrastructure. Existing CubeEgress L7 policy, DNS allow-list, and port mapping are all preserved.
- The snat_tcp() function was incorrectly using BPF_F_PSEUDO_HDR when only the TCP port had changed, causing checksum corruption on multi-node deployments.
- … from_world TC filter attachment on the loopback interface, reducing unnecessary eBPF hook points.
✨ Enhancements
SDK
- … list, stat, exists, remove, rename, mkdir, and watch. Comprehensive integration tests are included.
- … create, connect, kill, send_stdin, and resize, with streaming output via a PtyHandle iterator. It speaks envd's Connect-JSON RPC directly, with no dependency on e2b Python packages.
- … {transform: {headers: {...}}} credential injection shape, translated into CubeEgress L7 action.inject rules. It is a drop-in replacement for codebases using E2B's per-host credential injection.
Performance
- … VIRTIO_BLK_F_SEG_MAX (multi-segment requests) and VIRTIO_RING_F_INDIRECT_DESC (indirect descriptors) in the hypervisor's virtio-blk device. Sequential write throughput improved from 2888 MiB/s to 3293 MiB/s, roughly 14% (fio benchmark).
Template Management
- The template watch and template build-watch CLI commands provide an interactive bubbletea TUI showing live metrics (download speed, per-layer completion, step-by-step checklist). They fall back to plain text on non-TTY terminals.
- … t_cube_artifact_node_placement table for node-level artifact tracking independent of replica lifecycle. Periodic artifact GC uses MySQL GET_LOCK for HA coordination. Cascade cleanup is hardened across CubeMaster, Cubelet, and CubeAPI. AgentHub snapshot cascade cleanup includes path traversal protection.
Web UI
AgentHub
- … decrypt_or_passthrough now fails closed for undecryptable payloads.
Deployment & Installer
- install.sh --mode=upgrade with three-way env merge, pre-upgrade backup, and fail-fast preflights.
- CUBE_PROXY_HOST_PORT is deprecated. CUBE_PROXY_HTTP_PORT (default 80) and CUBE_PROXY_HTTPS_PORT (default 443) provide separate HTTP/HTTPS control.
- cube-api, cubemaster, and cubemastercli are now fully static binaries (CGO_ENABLED=0 / musl target). This eliminates the host glibc dependency and prevents version-skew failures.
- … resolvectl default-route on older systemd (pre-v240), preventing install failures on RHEL 8.3 and similar distributions.
Infrastructure
- … pkg/base/rediskey with consistent naming. Read/write pool separation was dropped in favor of a single pool, simplifying multi-node deployment configuration.
- … POST /nodes/{id}/labels and DELETE /nodes/{id}/labels endpoints for admin-managed labels. These provide:
  - Kubernetes-compatible naming (DNS1123 subdomain prefix);
  - SELECT FOR UPDATE race protection;
  - a 64-label limit;
  - protection for system-reserved namespaces (kubernetes.io, beta.kubernetes.io, cube.cloud.tencentcloud.com).
- CUBEMASTER_HTTP_BIND makes the HTTP listen address configurable, supporting private-NIC binding for network security hardening.
- … real_time_weighted_average plugin balancing mvm_num, local_create_num, cpu_usage, and quota_mem_usage.
- … priority_select_num dynamically capped at min(compute_node_count, 3).
🐛 Bug Fixes
These fixes address issues present in v0.4.0:
- … t_cube_snapshot_runtime_active table with sandbox-level and resource-level distributed locks. Eliminates MySQL 1213 deadlock errors.
- … WithNoSameOwner() option, which was squashing all file uid/gid to the extracting user, breaking image ownership.
- umoci unpack --rootless is now passed only when euid ≠ 0 (not when running as root). The Docker-export fallback uses --same-owner --numeric-owner. This fixes Chromium profile write failures and CDP unreachability in browser sandboxes. It also fixes envd exec EACCES on /home/user for Python images.
- … X-Cube-Retcode response header and $cube_retcode access-log field that exposed internal failure-mode codes. Errors now return opaque HTTP status codes with uniform JSON bodies, preventing sandbox ID enumeration and infrastructure probing.
- X-Request-Method is now forwarded to auth callbacks alongside X-Request-Path, enabling fine-grained (path + method) authorization. Previously, a read-only credential could access destructive endpoints on the same path.
- … commands.run.
- … /cube/sandbox/preview route handler that caused cubemastercli tpl render to fail silently.
- … scripts/common/ directory in upgrade bundles that broke preflight validation scripts for the upgrade feature (feat(one-click): add --mode upgrade with three-way env merge and pre-upgrade backup #538).
- … with_cube_ca parameter not being forwarded when creating templates. Clients can now control whether the CubeEgress root CA is baked into template rootfs.
- … created_at and image_info fields in template detail responses.
- … .one-click.env for consistent image registry selection across restarts.
- DATABASE_URL is now persisted for local MySQL in install and systemd start paths.
- … IPOverrideTransport, fixing multipart upload failures.
- semver_compare and version_lt functions added for correct upgrade decision logic.
📚 Documentation
- … cubecli logs usage guide with examples.
⚙️ Engineering Improvements
- … python-sdk-v* tags. Publishing now includes:
  - version cross-validation (tag vs pyproject.toml vs __init__.py);
  - smart change detection to skip unchanged publishes;
  - twine check validation.
- A --repo flag was added to gh release commands for correct repository targeting.

This discussion was created from the release v0.5.0.