2026.06.14 Release v0.4.0
CubeSandbox 0.4.0 introduces CubeEgress, an OpenResty-based security proxy that brings credential injection, domain filtering, and access auditing to sandbox egress traffic. This release also delivers container log forwarding with a new cubecli cubebox logs command, a node component version matrix with cluster-wide visibility, template replica compatibility checking, a daemonless template image build pipeline, and significant network performance improvements (35% faster network P50). The builder base image has been downgraded to ubuntu:20.04, lowering the minimum glibc requirement from 2.34 to 2.31 for broader distribution compatibility. This release includes 58 commits from 15 contributors.
🎯 Major Features
CubeEgress: Security Proxy
CubeEgress is a new OpenResty-based egress gateway that sits in the sandbox outbound traffic path via TPROXY, enforcing L7 policy before requests leave the cluster. It consists of ~2,200 lines of Lua across 9 modules running on OpenResty/nginx, plus Go-side integration in CubeMaster (CA provisioning, policy push), network-agent (TPROXY iptables rules), and Cubelet (per-sandbox routing, protobuf egress rule model).
- Credential injection (#518): Per-sandbox secrets are attached to outbound requests at the proxy layer via EgressRule.inject, so user code inside the sandbox never handles raw credentials. The CubeNetworkConfig protobuf message (formerly CubeVSContext) now carries L7 egress rules with match conditions (SNI, host, method, path, scheme) and actions (allow/deny, audit, inject). Credential material is redacted as ***REDACTED*** in CubeMaster safe-log output (#520).
- Domain filtering (#518): Policy-driven allow/deny lists gate which destinations a sandbox may reach, evaluated first-match-wins against the L7 request. DNS queries are permitted even when domain-based allow-out rules are set (38fe997).
- Access auditing (#518): Structured JSON logs of every egress request, with optional body redaction via a redactor Lua module, enable downstream compliance review.
- Kernel 5.4 compatibility (38fe997): The security proxy runs on kernel v5.4+, expanding deployment coverage.
- CubeVS fast-path hardening (#527): SYN-only packets are now rejected in the port-mapping BPF fast path, preventing guest-initiated connection attempts from bypassing egress policy.
- TAP TX offload (#505): TX checksum/TSO offload and tx-tcp-mangleid-segmentation are enabled on TAP devices so redirected packets skip GSO before reaching the guest.
- CubeEgress version reporting (9d76195): CubeEgress participates in the node component version matrix with build-time version metadata injection, a /admin/v1/health endpoint extension, release manifest entries, and cubelet-side file-based collection.
New files: CubeEgress/ (20 files — Lua modules, nginx config, Dockerfile, iptables scripts, systemd units, CA generation); CubeMaster/pkg/service/httpservice/cube/ca_download.go; CubeMaster/pkg/templatecenter/cube_egress_ca/; CubeMaster/pkg/templatecenter/cube_egress_ca_bake.go; DB migration 0005_cube_egress.sql.
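The first-match-wins evaluation used for domain filtering can be sketched as follows. This is a Go illustration, not CubeEgress itself, since the proxy is written in Lua on OpenResty. The Rule and Request types, the convention that an empty field is a wildcard, the "*." host wildcard, and the defaultAction parameter are all assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Action is a hypothetical stand-in for an egress rule's allow/deny decision.
type Action string

const (
	Allow Action = "allow"
	Deny  Action = "deny"
)

// Rule mirrors the match conditions listed for EgressRule. Empty fields act
// as wildcards; that convention is an assumption of this sketch.
type Rule struct {
	SNI, Host, Method, PathPrefix, Scheme string
	Action                                Action
	Audit                                 bool
}

// Request carries the L7 attributes the proxy sees for one outbound request.
type Request struct {
	SNI, Host, Method, Path, Scheme string
}

// hostMatches accepts an exact name or a leading "*." wildcard (assumed syntax).
func hostMatches(pattern, host string) bool {
	if pattern == "" {
		return true
	}
	pattern, host = strings.ToLower(pattern), strings.ToLower(host)
	if strings.HasPrefix(pattern, "*.") {
		return strings.HasSuffix(host, pattern[1:])
	}
	return pattern == host
}

func (r Rule) matches(req Request) bool {
	return hostMatches(r.SNI, req.SNI) &&
		hostMatches(r.Host, req.Host) &&
		(r.Method == "" || strings.EqualFold(r.Method, req.Method)) &&
		strings.HasPrefix(req.Path, r.PathPrefix) &&
		(r.Scheme == "" || r.Scheme == req.Scheme)
}

// Evaluate returns the action and audit flag of the first matching rule, or
// defaultAction when nothing matches. Rule order therefore matters.
func Evaluate(rules []Rule, req Request, defaultAction Action) (Action, bool) {
	for _, r := range rules {
		if r.matches(req) {
			return r.Action, r.Audit
		}
	}
	return defaultAction, false
}

func main() {
	rules := []Rule{
		{Host: "*.internal.example", Action: Deny},
		{Host: "api.github.com", Method: "GET", Action: Allow, Audit: true},
	}
	req := Request{Host: "api.github.com", Method: "GET", Path: "/repos", Scheme: "https"}
	action, audit := Evaluate(rules, req, Deny)
	fmt.Println(action, audit) // allow true
}
```

Evaluation stops at the first match. A broad deny placed before a narrower allow therefore shadows the allow, so rules are best ordered from most to least specific.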
Container Log Forwarding
Container init-process stdout/stderr is now streamed from the agent to the shim via a dedicated vsock connection and appended to log files on the host. A new cubecli cubebox logs subcommand lets operators read these logs from outside the sandbox.
- Log streaming (#535): The shim injects a cube.container.log_forwarding=true annotation into the OCI spec, causing the agent to create stdout/stderr pipes (1 MiB buffer, O_NONBLOCK) for the init process. A dedicated vsock channel carries the log stream to the shim. The shim appends to /data/log/template/<id>/stdout|stderr during template builds and to ./stdout and ./stderr in the bundle directory for normal sandboxes. Log forwarding is cleanly cancelled before pause/snapshot/teardown, and pipe write fds are closed on process exit so readers receive EOF (#541). Exec I/O relay (FIFO-based) is kept separate from init log forwarding.
- cubecli cubebox logs (#528): New subcommand to read container stdout/stderr from /data/cubelet/state/io.containerd.runtime.v2.task/default/<id>/stdout|stderr. Supports the --tail N, --head N, --all, and --stderr flags. Because the log files live inside the cubelet mount namespace, the command re-execs itself via the existing C constructor in pkg/cubemnt/nsenter.c, which safely enters the namespace before any Go code runs. Includes openNoFollow() path validation hardened against symlink-following attacks.
Node Component Version Matrix
A new version tracking infrastructure gives operators cluster-wide visibility of component versions across all nodes, with a dedicated Web UI page.
- Version collection and matrix (#500): Cubelet collects component versions (guest-image, cube-agent, kernel, plus control-plane components from the release manifest) and reports them to CubeMaster. CubeMaster maintains a version matrix in the node_component_version table (DB migration 0004). The matrix groups nodes by reported version for each component, surfaces version skew, and exposes summary and detail APIs through CubeAPI.
- Standardized version injection (#493): All Go and Rust binaries now receive version, commit, and build-time metadata via ldflags / build.rs. A machine-readable release-manifest.json is generated in one-click release bundles so every artifact is traceable to the same release. The cubecli version and cubemastercli version output formats are unified across components.
- Web UI Versions page (#500, #481): A new Versions.tsx page (762 lines) with i18n support (en/zh) shows per-component version distribution across nodes. The sidebar and Settings About section now display the actual release tag (injected at build time as __APP_VERSION__) instead of hardcoded versions.
New files: CubeMaster/pkg/nodemeta/versionmatrix.go; web/src/pages/Versions.tsx; web/src/locales/en/versions.json, zh/versions.json; DB migration 0004_node_component_version.sql.
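Grouping nodes by reported version and flagging skew amounts to inverting the per-node reports. In this sketch, the input shape (node → component → version), the Matrix type, and Skewed are hypothetical. versionmatrix.go and the node_component_version schema may organize the data differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Matrix maps component -> version -> sorted node names (hypothetical shape).
type Matrix map[string]map[string][]string

// BuildMatrix inverts per-node reports (node -> component -> version) into a
// per-component view grouped by reported version.
func BuildMatrix(reports map[string]map[string]string) Matrix {
	m := Matrix{}
	for node, comps := range reports {
		for comp, ver := range comps {
			if m[comp] == nil {
				m[comp] = map[string][]string{}
			}
			m[comp][ver] = append(m[comp][ver], node)
		}
	}
	// Map iteration order is random, so sort node lists for stable output.
	for _, byVer := range m {
		for _, nodes := range byVer {
			sort.Strings(nodes)
		}
	}
	return m
}

// Skewed lists, in sorted order, the components reported at more than one version.
func (m Matrix) Skewed() []string {
	var out []string
	for comp, byVer := range m {
		if len(byVer) > 1 {
			out = append(out, comp)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	reports := map[string]map[string]string{
		"node-a": {"cube-agent": "v0.4.0", "kernel": "6.6.119"},
		"node-b": {"cube-agent": "v0.3.1", "kernel": "6.6.119"},
	}
	m := BuildMatrix(reports)
	fmt.Println(m.Skewed())                // [cube-agent]
	fmt.Println(m["cube-agent"]["v0.4.0"]) // [node-a]
}
```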
Template Replica Compatibility
Template replicas are now checked against node component versions, with stale/missing replicas surfaced in both the API and Web UI.
- Compatibility matrix and version binding (#510): The template compatibility system compares each template's bound component versions (guest-image, cube-agent, kernel) against what each node currently reports. Results are stored in template_versions (DB migration 0006) and exposed via /templates/compat (summary) and /templates/compat/{id} (per-template detail). Version binding management lets operators pin a template to specific component versions at creation time.
- Web UI (#545): The template detail page now shows per-replica compatibility badges, the version delta between bound and current component versions, and a stale-replica warning banner with a rebuild trigger. New components: CompatBadge, CompatSection, CompatWarning, CompatNodeCard, VersionDeltaList.
New files: CubeMaster/pkg/templatecenter/compat.go; CubeMaster/pkg/service/httpservice/cube/template_compat.go; DB migration 0006_template_replica_compat.sql.
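The bound-versus-reported comparison behind the compatibility badges and version deltas can be sketched as a map diff. CompatDeltas, Compatible, and Delta are hypothetical names. Treating a component that the node does not report as a mismatch is an assumption of this sketch, not documented behavior of compat.go.

```go
package main

import (
	"fmt"
	"sort"
)

// Delta is one component whose bound version differs from the node's report.
type Delta struct {
	Component, Bound, Current string
}

// CompatDeltas compares a template's bound versions with one node's reported
// versions. A component missing on the node yields an empty Current
// (assumed behavior). Hypothetical helper, not CubeMaster's compat.go.
func CompatDeltas(bound, reported map[string]string) []Delta {
	var out []Delta
	for comp, want := range bound {
		if got := reported[comp]; got != want {
			out = append(out, Delta{Component: comp, Bound: want, Current: got})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Component < out[j].Component })
	return out
}

// Compatible is true when the node matches every bound component version.
// Extra components reported by the node are ignored.
func Compatible(bound, reported map[string]string) bool {
	return len(CompatDeltas(bound, reported)) == 0
}

func main() {
	bound := map[string]string{"guest-image": "g1", "cube-agent": "v0.4.0", "kernel": "6.6.119"}
	node := map[string]string{"guest-image": "g1", "cube-agent": "v0.4.1", "kernel": "6.6.119"}
	for _, d := range CompatDeltas(bound, node) {
		fmt.Printf("%s: %s -> %s\n", d.Component, d.Bound, d.Current) // cube-agent: v0.4.0 -> v0.4.1
	}
}
```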
Template Image Build Pipeline Overhaul
The template image build pipeline has been rearchitected to support daemonless operation via skopeo/umoci, with a 72% reduction in peak disk usage and file-level content deduplication.
- Daemonless export path (#492, #506): When skopeo and umoci are available on the CubeMaster node, template images are pulled via skopeo copy into a local OCI layout and unpacked with umoci unpack --rootless, eliminating the Docker daemon requirement. Falls back to Docker for backward compatibility. The export strategy is chosen once at image resolution time so preparation and export stay consistent.
- Artifact management (#506): A new job runner orchestrates the full pipeline (image export → rootfs artifact build → distribution), with redo support that can resume from the last completed phase. File-level content fingerprints (SHA256) enable artifact deduplication across builds, and artifact cleanup is managed through a structured lifecycle. Redo operations now carry the correct template ID through working requests (#544).
- Disk usage optimization (#472): Peak disk usage during the image-to-ext4 build is reduced from ~4.2× to ~1.2× image size through five complementary optimizations:
  - Pipe-streamed export: Docker export stdout is connected directly to the stdin of tar -xf via a 1 MiB pipe (F_SETPIPE_SZ), eliminating the intermediate rootfs.tar file.
  - Early workDir cleanup: The scratch workDir is removed immediately after the rootfs reaches the store directory, before ext4 creation begins.
  - Precise ext4 sizing: Power-of-2 alignment is replaced with a three-term overhead model (fixed 256 MiB + 10% of data + 1 KiB per file), aligned to 256 MiB boundaries.
  - Direct-to-storeDir export: On local fast filesystems (detected via statfs magic), the rootfs is exported directly into the store directory, skipping the workDir→storeDir relocate step. NFS/CIFS fall back to the relocate path to avoid cross-device copies.
  - Disk-space pre-check: A fail-fast statfs check on the store directory's parent ensures sufficient space before the build starts, with a configurable safety margin (CUBEMASTER_DISK_SPACE_SAFETY_MARGIN, default 1.5×).

  SHA256 computation uses a 4 MiB buffer to reduce read syscalls. A loop-mount streaming ext4 build phase (gated behind CUBEMASTER_LOOP_MOUNT_EXT4_ENABLED, default false) is also implemented with CAP_SYS_ADMIN detection.
- SDK alignment (#485): CubeAPI POST /templates and the Python/Go SDKs now expose DNS, egress CIDRs, registry auth, command/args, network type, and node scope options, matching the full cubemastercli template create-from-image option set.
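The precise ext4 sizing and the disk-space pre-check are both small arithmetic rules. This sketch makes two assumptions: the three overhead terms are added to the data size before rounding up to 256 MiB, and the 1.5× safety margin multiplies the estimated requirement. Ext4SizeBytes, HasSpace, and AvailableBytes are hypothetical names, and AvailableBytes uses the Unix statfs call.

```go
package main

import (
	"fmt"
	"syscall"
)

const (
	KiB       = int64(1) << 10
	MiB       = int64(1) << 20
	alignment = 256 * MiB
)

// Ext4SizeBytes sketches the three-term overhead model: fixed 256 MiB,
// plus 10% of the data, plus 1 KiB per file. Adding the overhead to the data
// size and rounding the total up to 256 MiB is an assumption.
func Ext4SizeBytes(dataBytes, fileCount int64) int64 {
	overhead := 256*MiB + dataBytes/10 + fileCount*KiB
	total := dataBytes + overhead
	return (total + alignment - 1) / alignment * alignment
}

// HasSpace reports whether availBytes covers requiredBytes x margin
// (1.5 is the documented default margin; applying it this way is assumed).
func HasSpace(availBytes, requiredBytes int64, margin float64) bool {
	return float64(availBytes) >= float64(requiredBytes)*margin
}

// AvailableBytes returns the space available to unprivileged users on the
// filesystem that contains dir, using statfs.
func AvailableBytes(dir string) (int64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(dir, &st); err != nil {
		return 0, err
	}
	return int64(st.Bavail) * int64(st.Bsize), nil
}

func main() {
	size := Ext4SizeBytes(1<<30, 10000)
	fmt.Println(size / MiB) // 1536
	avail, err := AvailableBytes("/")
	if err != nil {
		fmt.Println("statfs:", err)
		return
	}
	fmt.Println("enough space:", HasSpace(avail, size, 1.5))
}
```

Under these assumptions, a 1 GiB rootfs containing 10,000 files sizes to 1,536 MiB. That is 1,024 MiB of data plus about 368 MiB of overhead, rounded up to the next 256 MiB boundary.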
New files: CubeMaster/pkg/templatecenter/image/ (export, ext4, disk, command, ref, source, types, paths, util); CubeMaster/pkg/templatecenter/artifact_build.go, artifact_cleanup.go, distribution.go, fingerprint.go, image_job_runner.go, job_constants.go, job_dto.go.
Network Performance
- TAP fd acquisition optimization (#487): A three-tier GetTapFile strategy replaces the old single-path approach:
  - Fast path: When state.tap.File is already cached, return it immediately (0 syscalls).
  - Hot path: For pooled taps with a closed fd, reopen with just 2 syscalls (open + TUNSETIFF), skipping the expensive restoreTap flow (netlink lookup, LinkSetUp, SetMTU, TC filter attach, ARP entry).
  - Recovery path: Fall back to the full restoreTap only when there is no in-memory state or the tap is held externally.

  The fdserver JSON response now includes the ifindex, so cubelet can skip its own netlink.LinkByName call. This eliminates a serialization point during concurrent sandbox creation. Cubelet falls back to LinkByName only when ifindex is 0 (backward-compatible with older agents).

  A TOCTOU race between EnsureNetwork and ReleaseNetwork is fixed. Singleflight-style dedup is replaced with a per-sandbox creating guard channel registered in the same critical section as the state check. The change also adds a pprof debug server (--pprof-listen flag) and 390 lines of concurrency tests (6 functions; the 64-goroutine stress test runs clean under -race).

  Benchmarks (BMI5, Xeon Platinum 8255C, kernel 6.6.119): Network P50 35.3→23.1 ms (35% faster), Network P99 86.6→51.2 ms (41% faster), Total P50 106.1→92.0 ms (13% faster), throughput 194.8→209.8 sandboxes/s (8% higher).
- BPF checksum optimization (#469): bpf_csum_diff() is replaced with the bpf_l3_csum_replace and bpf_l4_csum_replace helpers in both the from_world and from_cube BPF programs. Combined with the TAP TX offload work (#505), this allows TSO/UFO/CSUM offloads to be re-enabled on virtio-net TAPs (reverting #110), and the disableGRO() requirement on host NICs is dropped.
✨ Enhancements
Scheduling
- Configurable overcommit and Redis allocation bypass (#525): Adds two scheduler configuration knobs. overcommit_ratio (default CPU=3, Mem=2) supports optional per-instance-type overrides via overcommit_ratio_conf. ignore_redis_allocation (default false) treats Redis-recorded allocations as zero. Both are applied consistently across filter and score plugins, with non-positive ratios clamped back to defaults. Physical load guards (CPU utilization ceiling, real-time free memory) are intentionally preserved.
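How the two knobs might combine in a capacity filter can be sketched as follows. SchedConfig, ratios, and Fits are hypothetical names, and the {CPU, Mem} pair is an assumed encoding of overcommit_ratio_conf. The physical load guards mentioned above are deliberately left out.

```go
package main

import "fmt"

const defaultCPURatio, defaultMemRatio = 3.0, 2.0

// SchedConfig mirrors the two knobs; field names and the override encoding
// are assumptions of this sketch.
type SchedConfig struct {
	CPURatio, MemRatio    float64
	PerInstanceType       map[string][2]float64 // instance type -> {CPU, Mem} override
	IgnoreRedisAllocation bool
}

// ratios returns the effective CPU/Mem overcommit ratios for an instance
// type, clamping non-positive values back to the defaults.
func (c SchedConfig) ratios(instanceType string) (cpu, mem float64) {
	cpu, mem = c.CPURatio, c.MemRatio
	if o, ok := c.PerInstanceType[instanceType]; ok {
		cpu, mem = o[0], o[1]
	}
	if cpu <= 0 {
		cpu = defaultCPURatio
	}
	if mem <= 0 {
		mem = defaultMemRatio
	}
	return cpu, mem
}

// Fits checks a request against overcommitted capacity. allocCPU/allocMem are
// the Redis-recorded allocations, treated as zero when IgnoreRedisAllocation is set.
func (c SchedConfig) Fits(instanceType string, capCPU, capMem, allocCPU, allocMem, reqCPU, reqMem float64) bool {
	if c.IgnoreRedisAllocation {
		allocCPU, allocMem = 0, 0
	}
	cpuR, memR := c.ratios(instanceType)
	return allocCPU+reqCPU <= capCPU*cpuR && allocMem+reqMem <= capMem*memR
}

func main() {
	cfg := SchedConfig{} // zero ratios fall back to the defaults (CPU=3, Mem=2)
	// A 16-core, 64 GiB node with 40 vCPU / 100 GiB already allocated.
	fmt.Println(cfg.Fits("S5", 16, 64, 40, 100, 8, 16)) // true: 48 <= 48 and 116 <= 128
	fmt.Println(cfg.Fits("S5", 16, 64, 40, 100, 9, 16)) // false: 49 > 48
}
```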
Affinity
- Custom node affinity selector (#504, #467): The com.nodeaffinity.selector annotation now accepts arbitrary NodeSelectorRequirements (In, NotIn, Exists, DoesNotExist, Gt, Lt) as a JSON array of {key, operator, values}. Node labels from registration are carried through Node.NodeLabels and merged into Labels() with an atomic.Pointer cache and InvalidateLabelsCache() for mutation safety. DoS hardening: max annotation size 4 KB, 10 selectors per request, 50 values per In/NotIn. Configurable allowed keys default to zone, cluster-id, cpu-type, memory-size, cpu-cores, instance-type. 872 lines of tests cover 47 cases.
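Evaluating the selector against node labels can be sketched with Kubernetes NodeSelectorRequirement semantics, which the operator set mirrors. In requires the key with a listed value, NotIn also matches when the key is absent, and Gt/Lt compare a single integer value. Whether CubeMaster applies exactly these semantics is an assumption. ParseSelector, Matches, and the error messages are hypothetical; the size limits and default allowed keys come from the notes above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"slices"
	"strconv"
)

// Requirement follows the {key, operator, values} shape described above.
type Requirement struct {
	Key      string   `json:"key"`
	Operator string   `json:"operator"`
	Values   []string `json:"values"`
}

// Limits and default allowed keys quoted in the release notes.
const maxAnnotationBytes, maxSelectors, maxValues = 4 * 1024, 10, 50

var allowedKeys = map[string]bool{"zone": true, "cluster-id": true, "cpu-type": true,
	"memory-size": true, "cpu-cores": true, "instance-type": true}

// ParseSelector decodes the annotation and enforces the DoS limits (hypothetical helper).
func ParseSelector(annotation string) ([]Requirement, error) {
	if len(annotation) > maxAnnotationBytes {
		return nil, errors.New("selector annotation too large")
	}
	var reqs []Requirement
	if err := json.Unmarshal([]byte(annotation), &reqs); err != nil {
		return nil, err
	}
	if len(reqs) > maxSelectors {
		return nil, errors.New("too many selectors")
	}
	for _, r := range reqs {
		if !allowedKeys[r.Key] {
			return nil, fmt.Errorf("selector key %q is not allowed", r.Key)
		}
		if (r.Operator == "In" || r.Operator == "NotIn") && len(r.Values) > maxValues {
			return nil, errors.New("too many values for In/NotIn")
		}
	}
	return reqs, nil
}

// Matches reports whether labels satisfy every requirement (assumed K8s semantics).
func Matches(labels map[string]string, reqs []Requirement) bool {
	for _, r := range reqs {
		v, has := labels[r.Key]
		switch r.Operator {
		case "In":
			if !has || !slices.Contains(r.Values, v) {
				return false
			}
		case "NotIn":
			if has && slices.Contains(r.Values, v) {
				return false
			}
		case "Exists":
			if !has {
				return false
			}
		case "DoesNotExist":
			if has {
				return false
			}
		case "Gt", "Lt":
			if !has || len(r.Values) != 1 {
				return false
			}
			lv, err1 := strconv.ParseInt(v, 10, 64)
			rv, err2 := strconv.ParseInt(r.Values[0], 10, 64)
			if err1 != nil || err2 != nil || (r.Operator == "Gt" && lv <= rv) || (r.Operator == "Lt" && lv >= rv) {
				return false
			}
		default:
			return false // unknown operators never match
		}
	}
	return true
}

func main() {
	reqs, err := ParseSelector(`[{"key":"zone","operator":"In","values":["zone-a"]},` +
		`{"key":"cpu-cores","operator":"Gt","values":["32"]}]`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(Matches(map[string]string{"zone": "zone-a", "cpu-cores": "64"}, reqs)) // true
	fmt.Println(Matches(map[string]string{"zone": "zone-a", "cpu-cores": "16"}, reqs)) // false
}
```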
Template Management
- tpl- prefix enforcement (#474): Template IDs are now always auto-generated with a tpl- prefix across all creation paths (API, CLI, Web UI, sandbox commit). User-specified IDs are accepted for backward compatibility but silently ignored. The server always returns an auto-generated ID with the tpl- prefix as the authoritative template identifier. Validation rejects bare tpl-/snap- prefixes and non-conforming annotation prefixes.
- Builder image downgrade to ubuntu:20.04 (#468): The builder base image is changed from ubuntu:22.04 to ubuntu:20.04, lowering the minimum glibc requirement from 2.34 to 2.31. This affects Dockerfile.builder, one-click installer preflight checks, CI workflows, and documentation.
Web UI
- Template policy display (#486): The template detail page now shows environment variables, network type, internet access, DNS servers, allow-out rules, and deny-out rules parsed from createRequest. A dedicated "Network Policy" section includes per-rule copy buttons. A BoolBadge component is extracted as a shared UI primitive.
- CubeAPI container image (#513): A container build for the cube-api service produces a self-contained runtime image suitable for one-click and orchestrated deployments, with a lean build context.
SDK
- Python SDK v0.3.0 (#521): Bump to 0.3.0 with new APIs for security proxy configuration.
PVM
- Kernel LOCALVERSION rename (#511, #534): The PVM host and guest kernel LOCALVERSION is renamed to a clean descriptive scheme so the distribution base and host/guest role are obvious from uname -r. Deployment configs, user-facing guides, and blog references are updated to match.
🐛 Bug Fixes
These fixes address issues present in v0.3.1:
- Virtiofs config skipped when shareDirs is empty (#533): Cubelet no longer generates virtiofs configuration or annotations when no shared directories are specified, preventing broken config generation.
- DNS server IP automatically added to AllowOut (#526): When any DNS rule is configured, the DNS server IP is now added to AllowOut so that DNS resolution works through the egress policy. Includes regression test coverage.
- Cubelog nil trace panic (#512): Background workers and detached job contexts that run without a request trace no longer panic on a nil dereference; trace handling now tolerates a missing trace.
- Storage symlink resolution in host-dir cleanup (#530): cleanupHostDirVolumes now resolves base-path symlinks when walking sandbox directories. Bind mounts under paths like /data → /mnt/ssd/data are therefore correctly identified and unmounted instead of leaking or having their backing directories wiped.
- Network plugin bootstrap warnings (#491): Cubelet startup no longer logs valid network configuration keys as "unknown TOML fields"; the existing config struct is now reused when reading bootstrap overrides.
- DNS not auto-allowed when internet is disabled (#490): When AllowInternetAccess=false, resolved DNS servers are no longer appended to allow_out, so the deny-all outbound policy consistently blocks DNS resolution. Fixes #408.
- Ripgrep dependency removed from one-click runtime (#496): The one-click install and startup path no longer requires or auto-installs ripgrep. Shell checks now use grep-based helpers.
- Virtiofs migration_on_error set to GuestError (#482): The native virtiofs server now uses MigrationOnError::GuestError instead of Abort. Per-inode failures during snapshot restore surface as guest FS errors (ENOENT/EIO) on the affected paths rather than tearing down the entire live migration.
- VMM virtio-fs queue fault tolerance (#464): process_queue_serial() no longer panics on malformed descriptors. Failures are recovered by writing an EIO FUSE error reply to the guest and continuing to serve the queue. A new device_memory view is added for device-backed memory regions (virtio-pmem, virtio-fs DAX, ivshmem/zshm BARs).
- Cgroup v2 manager creation (#488): The agent now uses the cgroup v2 creation path from cgroups-rs and attaches container processes through cgroup.procs, avoiding v1 controller name failures in unified cgroup mode. Process ID collection for cleanup and signals also reads from cgroup.procs.
- Node health expiry on stale heartbeat (#455): Node health is now derived from heartbeat freshness. Stale heartbeats are correctly reported as unhealthy in nodemeta reads, localcache-backed reads, and the scheduler prefilter. A shared helper centralizes the timeout rule across all three paths.
- SELinux context restore after one-click install (#471): File contexts under the install prefix are now restored before starting systemd services, fixing one-click installs on SELinux Enforcing hosts. Fixes #465.
- Glibc preflight pipefail race (#473): The ldd --version output is now fully captured before parsing, preventing strict-mode preflight checks from exiting on an expected SIGPIPE.
- Python SDK streaming request body read (377a99d): Request bodies in IPOverrideTransport are now buffered before copying, so multipart uploads no longer fail with RequestNotRead.
- CLI help text corrections (#478): Fixed incorrect command names (e.g., cuebcli → cubecli), spelling mistakes, outdated deprecation hints, and truncated descriptions in both cubecli and cubemastercli.
📚 Documentation
- DEB install instructions (#532): Added apt (DEB) install instructions alongside existing yum (RPM) steps for Python SDK setup in the Quick Start guide.
- Benchmark blog env var fixes (#497): Fixed benchmark setup examples that mixed environment variables from different client stacks. E2B variables are now used for e2b_code_interpreter examples, and CUBE_API_URL plus CubeProxy settings for CubeSandbox SDK examples.
- CNCF Landscape badge (#477): Added a CNCF Landscape badge and footer note to the README in both English and Chinese.
- Template ID documentation cleanup (#476): Removed all --template-id flags from create-from-image documentation and examples, since template IDs are now auto-generated with the tpl- prefix.
- Install guide links in benchmark posts (#475): Added installation guide callouts to the §2.1 Hardware section of all four benchmark blog posts (EN + ZH, bare-metal + PVM).
- Troubleshooting links (#466): Added the GitHub issue #311 troubleshooting URL to XFS filesystem check error messages in install.sh, online-install.sh, and check-deps.sh. Updated install docs to link directly to the Releases page.
- CODEOWNERS (#522): Added a CubeEgress maintainer entry.
⚙️ Engineering Improvements
- Build system reorganization (#529): Per-target .PHONY declarations replace the single bulk list. A new clean-rust-target-dirs target removes target/ under each top-level Rust workspace. The all target is driven from a shared BINARIES list.
- Format check CI (#524): fmt targets are added to all component Makefiles (Go and Rust), with a new .github/workflows/fmt-check.yml CI workflow that runs format checking on PRs. The agent's fmt target automatically generates required files (version.rs, protocol.rs) before formatting.
- CI review-comment via stdin (#494): PR review comments are now passed via stdin (--body-file -) instead of temp files, keeping review content out of the checkout directory.
- CI auto-review comment reuse (#489): Automated review comments now update the bot's existing marked comment on repeated PR synchronizations instead of creating a new top-level comment each time.
- Metric report jitter (#479): The Cubelet CLS metric report loop now adds random jitter (uniformly distributed over [t, 1.5t]) to prevent thundering-herd issues when multiple agents start concurrently.