Is it necessary to support both gVisor and MicroVM simultaneously? #387

jamesxia20181001 · 2026-05-28T08:55:17Z

CubeSandbox adopts MicroVM for sandbox virtualization and isolation. Recently, Google open-sourced Agent-Substrate, which leverages gVisor. Combined with idle suspend (persisting memory snapshots to object storage) and runtime resume (cross-node restoration supported), this solution claims to deliver 30x+ oversubscription by scheduling a large number of stateful workloads onto a small pool of shared physical pods. Google further built the distributed agent runtime Agent Executor (AX) on top of Agent-Substrate.

According to the benchmark report agent-infra/runtime-benchmark, gVisor achieves a significantly higher deployment density compared with MicroVM (represented by Kata Containers).

Memory Overhead per Sandbox Pod

Runtime	Container RSS (MiB)	Runtime Overhead (MiB)	Total (MiB)
runc	85	~0	85
gVisor (runsc)	85	35–50	120–135
Kata Containers	85	120–160	205–245

CPU Overhead per Idle Sandbox Pod (millicores)

Runtime	App CPU	Runtime CPU	Total
runc	5 m	< 1 m	~5 m
gVisor (runsc)	5 m	8–15 m	13–20 m
Kata Containers	5 m	40–80 m	45–85 m

Node Packing Density (n2-standard-8, 32 GiB)

Runtime	Max Concurrent Sandboxes	Resource Utilization at Full Load
runc	~300	97%
gVisor (runsc)	~200	90%
Kata Containers	~100	85%

Agents are evolving toward an architecture of stateless harness + session-based sandbox. The agent harness becomes stateless, standardized and lightweight, and is only responsible for routing, scheduling, session management and access control. Since this component requires very few system calls and simple dependencies, gVisor is fully capable of supporting its operation.

For tool sandboxes running common lightweight workloads such as simple scripts, basic commands and general SDKs, gVisor provides better cost advantages over MicroVM. MicroVM is only required as a safety fallback for high-risk untrusted code, complex toolchains, workloads relying on full Linux system calls/kernel features, or applications running on heterogeneous operating systems.

If the above assessment holds true, supporting gVisor and MicroVM in the long term and enabling them to share the same compute node pool would be a reasonable architecture decision.

kinwin-ustc · 2026-05-28T09:04:04Z
Maintainer

We have made a lot of performance optimizations to the cloud hypervisor, so we should not fall behind gVisor on any of the metrics mentioned above. Moreover, many of our capabilities are closely tied to the customized MicroVM hypervisor, so I don't think the points you raised fundamentally depend on whether the runtime is gVisor.

jamesxia20181001 · 2026-05-28T09:20:11Z
Author

Thank you for your prompt reply!
As stated in the CubeSandbox README, the memory footprint below 5 MB only accounts for the overhead of the Hypervisor itself, excluding resource consumption of the MicroVM Guest OS. Based on technical principles and previous benchmark results, the memory overhead of the MicroVM Guest OS is significantly higher than that of the gVisor user-space kernel.
We would like to clarify whether the performance optimizations you mentioned also cover memory usage improvements for the Guest OS running inside MicroVM.
Thank you for your clarification.

kinwin-ustc · 2026-05-28T09:37:09Z
Maintainer

Our built-in guest OS and guest rootfs have been heavily trimmed. In addition, because of how our snapshot-based startup works, all sandbox memory comes from a copy-on-write (CoW) mmap of the snapshot on the host machine. The vast majority of the guest kernel's memory is read-only, which means that the kernel code segments of all sandboxes created from the same template on the same host are shared globally. So I believe that, as far as the sandbox kernel is concerned, we will not be worse than gVisor.

jamesxia20181001 · 2026-05-29T02:38:08Z
Author

I ran a simple benchmark using the e2b-code-interpreter workload. The results show that gVisor consumes far less memory than CubeSandbox: with 100 instances, its used memory is only 18% of CubeSandbox's. Could you help analyze the root cause? Thank you!

Test Environment

Workload: e2b-code-interpreter
Configuration: 2 vCPUs, 1 GB memory

Versions

Name	Version
CubeSandbox	V0.1.2
gvisor	release-20260316.0

We measured the incremental used memory with the free -m command (values in MiB). The results are listed below:

Isolation Method	1 Instance	10 Instances	100 Instances
gvisor	105	654	6131
CubeSandbox	335	3341	34458
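
For reference, the 18% figure follows directly from the table above. A minimal Python sketch that uses only the measured values from this comment (the dictionary and function names are made up for illustration):

```python
# Incremental used memory in MiB, as reported by `free -m` (values from the table above).
measured_mib = {
    "gvisor": {1: 105, 10: 654, 100: 6131},
    "CubeSandbox": {1: 335, 10: 3341, 100: 34458},
}


def per_instance_mib(runtime: str, instances: int) -> float:
    """Average incremental memory per instance at a given instance count."""
    return measured_mib[runtime][instances] / instances


def gvisor_to_cube_ratio(instances: int) -> float:
    """gVisor memory as a fraction of CubeSandbox memory at the same instance count."""
    return measured_mib["gvisor"][instances] / measured_mib["CubeSandbox"][instances]


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(
            f"{n:>3} instances: gVisor/CubeSandbox = {gvisor_to_cube_ratio(n):.0%}, "
            f"gVisor {per_instance_mib('gvisor', n):.1f} MiB/instance, "
            f"CubeSandbox {per_instance_mib('CubeSandbox', n):.1f} MiB/instance"
        )
```

The per-instance averages stay roughly constant as the instance count grows, which suggests the gap scales with the number of sandboxes rather than coming from a fixed one-time cost.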

kinwin-ustc · 2026-05-29T05:17:36Z
Maintainer

Our single-machine one-click deployment installs many components, including nginx, redis, and mysql. These control-plane components also consume additional memory each time a sandbox is created. In a real deployment, they should run on separate machines. A more accurate way to measure sandbox memory consumption is to check /proc/<sandbox pid>/numa_maps, as described in issue #272.
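
A rough sketch of how numa_maps output could be summarized, in Python. It assumes the standard Linux numa_maps field layout (`N<node>=<pages>`, `anon=<pages>`, `kernelpagesize_kB=<size>`). Whether this matches the exact procedure in issue #272 is an assumption, and the function name is hypothetical.

```python
import sys


def summarize_numa_maps(text: str) -> dict:
    """Sum resident and anonymous (privately copied) memory, in KiB, over all mappings."""
    resident_kib = 0
    anon_kib = 0
    for line in text.splitlines():
        fields = line.split()[2:]  # skip the start address and the memory policy
        page_kib = 4  # fallback if kernelpagesize_kB is not reported
        pages = 0
        anon = 0
        for field in fields:
            key, _, value = field.partition("=")
            if not value.isdigit():
                continue  # e.g. "heap", "stack", or "file=/path"
            if key == "kernelpagesize_kB":
                page_kib = int(value)
            elif key == "anon":
                anon = int(value)
            elif key.startswith("N") and key[1:].isdigit():
                pages += int(value)  # pages resident on NUMA node <n>
        resident_kib += pages * page_kib
        anon_kib += anon * page_kib
    return {"resident_kib": resident_kib, "anon_kib": anon_kib}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python numa_summary.py <sandbox pid>
    with open(f"/proc/{sys.argv[1]}/numa_maps") as f:
        print(summarize_numa_maps(f.read()))
```

For a sandbox backed by a CoW snapshot mapping, the anonymous total approximates the pages that sandbox has privately copied, while the remaining resident pages may be shared with other sandboxes using the same snapshot.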

jamesxia20181001 · 2026-05-30T01:43:48Z
Author

We measure memory consumption by tracking the incremental memory usage as the number of instances increases. We added comparative tests for the work node only scenario. The results show that the overhead on the control plane is negligible for deployments with up to 100 instances. Overall, there remains a notable performance gap compared to gVisor.

Isolation Method	1 Instance	10 Instances	100 Instances
gvisor	105	654	6131
CubeSandbox Work & Management node	335	3341	34458
CubeSandbox Work node only	338	3400	33760

kinwin-ustc · 2026-05-30T02:41:22Z
Maintainer

Could you help me check where the additional memory is going? In our internal PVM tests, 100 sandboxes consume about 2G of memory, and in an earlier test by an external user, a single machine running 5000 sandboxes consumed 154G. Let's see whether this is related to the kernel version or the environment. From a technical perspective, the kernel of each sandbox is shared globally and should not add memory overhead.

jamesxia20181001 · 2026-05-30T10:10:11Z
Author

When creating a template using the sandbox-code image, the following parameters were applied:

cubemastercli tpl create-from-image \
  --image cube-sandbox-cn.tencentcloudcr.com/cube-sandbox/sandbox-code:latest \
  --writable-layer-size 1G

The expose-port and probe parameters were not specified, and the test results are shown below:

Isolation Mode	1 Instance	10 Instances	100 Instances
gvisor	105	654	6131
cubesandbox	335	3341	34458

After rebuilding the template with the additional parameters below:

cubemastercli tpl create-from-image \
  --image cube-sandbox-cn.tencentcloudcr.com/cube-sandbox/sandbox-code:latest \
  --writable-layer-size 1G \
  --expose-port 49999 \
  --expose-port 49983 \
  --probe 49999

The updated test results are as follows:

Isolation Mode	1 Instance	10 Instances	100 Instances
gvisor	105	654	6131
cubesandbox	44	429	4312

The memory footprint is now significantly optimized. As you mentioned, it even consumes less memory than gVisor.

There is one remaining question: why do the expose-port and probe parameters bring such substantial performance improvements? Could you please explain this? Thanks a lot!

kinwin-ustc · 2026-05-30T11:37:56Z
Maintainer

The probe determines when the memory snapshot is taken. Without a probe, the snapshot is taken immediately after the kernel starts. With a probe, the snapshot is taken only after the corresponding port becomes reachable over the network. Without a probe, the application therefore still has to run its initialization in every sandbox restored from the snapshot, and that initialization generates additional memory pages through Copy-on-Write (CoW).
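
The CoW effect can be illustrated in user space. The sketch below is not CubeSandbox code. It is an analogy that uses Python's `mmap` with `ACCESS_COPY` to map a stand-in "snapshot" file privately. Every page written after mapping becomes a private copy, while the file itself (and any other mapping of it) stays untouched. The function name and the four-page snapshot are made up for illustration.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def dirty_private_pages(snapshot: bytes, pages_to_dirty: int):
    """Map a snapshot file copy-on-write, write to some pages, return (private view, file)."""
    with tempfile.TemporaryFile() as f:
        f.write(snapshot)
        f.flush()
        view = mmap.mmap(f.fileno(), len(snapshot), access=mmap.ACCESS_COPY)
        try:
            for i in range(pages_to_dirty):
                view[i * PAGE] = 0xFF  # the write affects only this mapping's private copy
            private_view = bytes(view[:])
        finally:
            view.close()
        f.seek(0)
        on_disk = f.read()
    return private_view, on_disk


def count_changed_pages(before: bytes, after: bytes) -> int:
    """Number of pages whose contents differ between two equally sized buffers."""
    return sum(
        before[off:off + PAGE] != after[off:off + PAGE]
        for off in range(0, len(before), PAGE)
    )


if __name__ == "__main__":
    snapshot = bytes(4 * PAGE)  # stand-in for a template snapshot
    private_view, on_disk = dirty_private_pages(snapshot, pages_to_dirty=2)
    print("pages copied privately:", count_changed_pages(snapshot, private_view))
    print("snapshot file modified:", on_disk != snapshot)
```

In this analogy, a snapshot taken after application initialization corresponds to a file that already contains the initialized state, so fewer pages need to be written (and therefore copied) per sandbox.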

jamesxia20181001 · 2026-06-01T12:13:35Z
Author

Thanks a lot for the detailed explanation on the earlier memory Copy-on-Write (CoW) issue.

As shown in the table below, I further compared the I/O performance of several virtualization runtimes. The results indicate that CubeSandbox delivers noticeably lower performance on network I/O and mixed network & storage I/O compared with gVisor and kata-fc. Could you help analyze the root causes?

Test Methodology

The tests were conducted on two physical machines with identical configurations, with only a single sandbox running on each, so performance degradation caused by over-provisioning can be ruled out.
Network Performance Test: Inside the sandbox, run an iperf3 client against an iperf3 server deployed on another virtual machine within the same network.
Mixed Network & Storage I/O Test: Upload and download 1 GB files to/from MinIO object storage via mc cp.

Test Results

Test Item	Test Method	Baremetal 1: HostOS	Baremetal 1: CubeSandbox MVM hostdir volume	Baremetal 1: CubeSandbox MVM WR layer	Baremetal 2: HostOS	Baremetal 2: gVisor	Baremetal 2: kata-fc
Upload	iperf3 -c ip	1.06GB/s	266MB/s	269MB/s	1.1GB/s	925MB/s	1.04GB/s
Upload	mc cp file minio/test	266MB/s	187MB/s	159MB/s	300MB/s	298MB/s	276MB/s
Download	iperf3 -c ip -R	1.09GB/s	1.56MB/s	1.63MB/s	1.1GB/s	783MB/s	1.06GB/s
Download	mc cp minio/test/file .	443MB/s	6.61MB/s	9.6MB/s	614MB/s	537MB/s	439MB/s

kinwin-ustc · 2026-06-01T12:46:17Z
Maintainer

Could you please paste the offload feature settings of the network interfaces inside the sandbox and on the host machine (ethtool -k <dev>)? You are probably hitting a bug. From a technical perspective, the performance of our I/O virtualization implementation should not be worse than FC's.

chenhengqi · 2026-06-01T14:30:21Z
Maintainer

The results indicate that CubeSandbox delivers noticeably lower performance on network I/O and mixed network & storage I/O compared with gVisor and kata-fc. Could you help analyze the root causes?

@jamesxia20181001 Thanks for your report. Please provide the output of the following commands to help us diagnose the issue.

uname -r
ethtool -i iface
ethtool -k iface

jamesxia20181001 · 2026-06-02T02:19:39Z
Author

Thanks for your prompt reply. Below is the collected information; please help analyze it, thank you!

CubeSandbox MicroVM's guestos network information

ethtool -i eth0

version: 1.0.0
firmware-version:
expansion-rom-version:
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

ethtool -k eth0

rx-checksumming: on [fixed]
tx-checksumming: off
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: off
tx-scatter-gather: off [fixed]
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

CubeSandbox hostos kernel and network information

The network is configured with VLAN and bonding: bond1 aggregates ens6f0 and ens6f1, and the VLAN interface is bond1.2.

uname -a

Linux baremetal-145 6.8.0-86-generic #87~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Sep 29 09:48:07 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

ethtool -i bond1.2

driver: 802.1Q VLAN Support
version: 1.8
firmware-version: N/A
expansion-rom-version:
bus-info:
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

ethtool -i bond1

driver: bonding
version: 6.8.0-86-generic
firmware-version: 2
expansion-rom-version:
bus-info:
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

ethtool -i ens6f0

driver: i40e
version: 6.8.0-86-generic
firmware-version: 6.01 0x8000354e 1.1747.0
expansion-rom-version:
bus-info: 0000:5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

ethtool -i ens6f1

driver: i40e
version: 6.8.0-86-generic
firmware-version: 6.01 0x8000354e 1.1747.0
expansion-rom-version:
bus-info: 0000:5e:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

ethtool -k bond1.2

Features for bond1.2:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [requested on]
tx-checksum-sctp: off [requested on]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [requested on]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: on
tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [requested on]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: on
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: on
fcoe-mtu: off [requested on]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

ethtool -k bond1

Features for bond1:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [requested on]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: on
tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: on [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [requested on]
tx-esp-segmentation: off
tx-udp-segmentation: on
tx-gso-list: off [requested on]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on [fixed]
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: on
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off
esp-tx-csum-hw-offload: off
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

ethtool -k ens6f0

Features for ens6f0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

ethtool -k ens6f1

Features for ens6f1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

2 replies

chenhengqi Jun 2, 2026
Maintainer

I suspect that GRO being enabled is the culprit.

The installation script determines which primary NIC to use:

CubeSandbox/deploy/one-click/install.sh

Lines 437 to 443 in f1378a3

    
    CUBE_SANDBOX_ETH_NAME="${CUBE_SANDBOX_ETH_NAME:-$(detect_primary_interface || true)}"
    if [[ -n "${CUBE_SANDBOX_ETH_NAME}" ]]; then
      export CUBE_SANDBOX_ETH_NAME
      log "using primary network interface: ${CUBE_SANDBOX_ETH_NAME}"
    else
      log "primary network interface not detected; keeping packaged Cubelet eth_name"
    fi

Network Agent disables GRO for it:

CubeSandbox/network-agent/internal/service/local_service.go

Lines 80 to 83 in f1378a3

    
    err = disableGRO(cfg.EthName)
    if err != nil {
    	CubeLog.WithContext(context.Background()).Warnf("network-agent failed to disable GRO on %s: %v", cfg.EthName, err)
    }

It seems that one of these steps is failing to do its job.

chenhengqi Jun 2, 2026
Maintainer

Please check which NIC is used by CubeSandbox and whether GRO is on.

jamesxia20181001 · 2026-06-02T06:44:38Z
Author

The network is deployed with VLAN and Bonding: Bond1 bundles ens6f0 and ens6f1, with its VLAN sub-interface bond1.2 serving as the primary network adapter for CubeSandbox.

Per installation logs and network port details, GRO has been successfully disabled on bond1.2, confirmed by zero relevant error entries in both installation logs and network-agent logs:

bond1.2: generic-receive-offload: off

However, GRO remains enabled on all underlying physical/bonded ports relied upon by bond1.2:

bond1: generic-receive-offload: on
ens6f0: generic-receive-offload: on
ens6f1: generic-receive-offload: on
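
A small Python sketch of how this per-interface check could be scripted. It parses `ethtool -k` text in the format shown earlier in this thread. The helper names are hypothetical, and the sample below contains only the GRO line of each interface.

```python
import subprocess


def parse_ethtool_features(output: str) -> dict:
    """Parse `ethtool -k <iface>` text into {feature: (enabled, fixed)}."""
    features = {}
    for line in output.splitlines():
        name, sep, rest = line.partition(":")
        rest = rest.strip()
        if not sep or not rest:
            continue  # blank lines and the "Features for <iface>:" header
        state = rest.split()[0]
        if state in ("on", "off"):
            features[name.strip()] = (state == "on", "[fixed]" in rest)
    return features


def interfaces_with_gro(outputs: dict) -> list:
    """Return the interfaces whose generic-receive-offload feature is on."""
    return [
        iface
        for iface, text in outputs.items()
        if parse_ethtool_features(text).get("generic-receive-offload", (False, False))[0]
    ]


def read_features(iface: str) -> str:
    """Run `ethtool -k` for one interface (requires ethtool on the host)."""
    return subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    ).stdout


sample_outputs = {
    "bond1.2": "Features for bond1.2:\ngeneric-receive-offload: off\n",
    "bond1": "Features for bond1:\ngeneric-receive-offload: on\n",
    "ens6f0": "Features for ens6f0:\ngeneric-receive-offload: on\n",
    "ens6f1": "Features for ens6f1:\ngeneric-receive-offload: on\n",
}

if __name__ == "__main__":
    print(interfaces_with_gro(sample_outputs))
```

On a live host, the same check would build the dictionary with `read_features` for the VLAN interface and each underlying bond and physical port.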

After manually turning off GRO across all these ports, we ran another round of benchmark tests. The dataset marked CubeSandbox GRO Disabled represents metrics collected with GRO fully disabled on all ports, showing notable network performance improvement against the original state.

Even with this optimization applied, CubeSandbox still delivers remarkably lower network throughput compared with gVisor and Kata-FC. Could you please help investigate the root cause? Thanks.

Test Data

Test Item	Test Method	baremetal1: HostOS	baremetal1: CubeSandbox GRO Disabled	baremetal1: CubeSandbox	baremetal2: HostOS	baremetal2: gvisor	baremetal2: kata-fc
Upload	iperf3 -c ip	1.06GB/s	269MB/s	269MB/s	1.1GB/s	925MB/s	1.04GB/s
Upload	mc cp file minio/test	266MB/s	220MB/s	159MB/s	300MB/s	298MB/s	276MB/s
Download	iperf3 -c ip -R	1.09GB/s	600MB/s	1.63MB/s	1.1GB/s	783MB/s	1.06GB/s
Download	mc cp minio/test/file .	443MB/s	250MB/s	9.6MB/s	614MB/s	537MB/s	439MB/s

1 reply

chenhengqi Jun 2, 2026
Maintainer

Thanks. How did you set up gVisor and kata-fc? I need to reproduce this locally.

jamesxia20181001 · 2026-06-03T08:37:19Z
Author

To eliminate performance bias introduced by heterogeneous hardware environments, CubeSandbox was deployed on the same bare-metal server hosting gVisor and Kata-fc, and upgraded to the latest v0.3.0 release for validation.

Environment Details

OS Information

Name	Version
OS	OpenEuler 23.09
Architecture	x86
Kernel	6.4.0-10.1.0.20.oe2309.x86_64

Component Version Information

Component	Version
gVisor	release-20260316.0
Kata	3.28.0
Kubernetes	v1.28.2
containerd	v2.0.5
CNI	v1.3.0
Firecracker	1.12.1
CubeSandbox	0.3.0

Network Information

gVisor and Kata-fc are integrated into the Kubernetes cluster with out-of-box default configurations. Flannel is used as the default CNI plugin. The uplink consists of VLAN subinterfaces created on two physical NICs aggregated via bond, and GRO is enabled by default.

Prior to CubeSandbox benchmarking, the Kubernetes cluster was shut down, followed by manual disabling of GRO across all network interfaces before running CubeSandbox performance measurements.

Test Results

The aggregated benchmark metrics are listed in the table below. Test data indicates notable performance improvements for the upgraded CubeSandbox; nevertheless, it still falls behind gVisor and Kata-fc on certain workloads such as the iperf3 -c ip test. We appreciate your support in troubleshooting the root causes.

All columns were measured on the same OpenEuler 23.09 host.

Test Item	Test Method	HOSTOS	CubeSandbox	gVisor	kata-fc
Upload	iperf3 -c ip	1.1GB/s	476MB/s	925MB/s	1.04GB/s
Upload	mc cp file minio/test	300MB/s	237MB/s	298MB/s	276MB/s
Download	iperf3 -c ip -R	1.1GB/s	775MB/s	783MB/s	1.06GB/s
Download	mc cp minio/test/file .	614MB/s	512MB/s	537MB/s	439MB/s
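
To make the remaining gap easier to read, throughput can be expressed relative to the host baseline. A minimal Python sketch using the values from the table above. It assumes 1 GB/s = 1000 MB/s, and the names are made up for illustration.

```python
import re

# Assumption: decimal units, i.e. 1 GB/s = 1000 MB/s.
_UNIT_TO_MB = {"MB/s": 1.0, "GB/s": 1000.0}

results = {
    "iperf3 -c ip": {"HostOS": "1.1GB/s", "CubeSandbox": "476MB/s", "gVisor": "925MB/s", "kata-fc": "1.04GB/s"},
    "mc cp file minio/test": {"HostOS": "300MB/s", "CubeSandbox": "237MB/s", "gVisor": "298MB/s", "kata-fc": "276MB/s"},
    "iperf3 -c ip -R": {"HostOS": "1.1GB/s", "CubeSandbox": "775MB/s", "gVisor": "783MB/s", "kata-fc": "1.06GB/s"},
    "mc cp minio/test/file .": {"HostOS": "614MB/s", "CubeSandbox": "512MB/s", "gVisor": "537MB/s", "kata-fc": "439MB/s"},
}


def to_mb_per_s(text: str) -> float:
    """Convert a string such as '476MB/s' or '1.1GB/s' to MB/s."""
    match = re.fullmatch(r"([\d.]+)\s*(MB/s|GB/s)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized throughput value: {text!r}")
    return float(match.group(1)) * _UNIT_TO_MB[match.group(2)]


def relative_to_host(row: dict) -> dict:
    """Throughput of each runtime as a fraction of the HostOS value in the same row."""
    host = to_mb_per_s(row["HostOS"])
    return {name: to_mb_per_s(value) / host for name, value in row.items() if name != "HostOS"}


if __name__ == "__main__":
    for method, row in results.items():
        shares = ", ".join(f"{name} {share:.0%}" for name, share in relative_to_host(row).items())
        print(f"{method}: {shares}")
```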

1 reply

chenhengqi Jun 5, 2026
Maintainer

Hi, I just opened #469, could you please give it a try? Thanks.

kinwin-ustc · 2026-06-03T10:44:23Z
Maintainer

Enabling TSO on the network interface inside the sandbox should help a lot in the iperf upload case. Additionally, we have fixed some checksum-related issues, and GRO can now be enabled on the host network interface, which should also help a lot in the iperf download case.

hualing15 · 2026-06-04T07:15:15Z

We tried enabling TSO and GRO, but neither worked.

1. TSO cannot be enabled within the sandbox; relevant details are provided below:

ethtool -K eth0 tso on
Actual changes:
tx-tcp-segmentation: off [requested on]
tx-tcp-ecn-segmentation: off [requested on]
tx-tcp-mangleid-segmentation: off [requested on]
tx-tcp6-segmentation: off [requested on]
Could not change any device features

2. We also tried enabling GRO, yet the download speed remains capped at 10 MB/s. Partial packet captures with GRO enabled and disabled are listed below.

Enabled GRO

11:39:33.540233 IP 169.254.68.6.38126 > 192.168.2.38.cslistener: Flags [.], ack 13247873, win 763, options [nop,nop,TS val 684967114 ecr 370948812], length 0
11:39:33.540275 IP 192.168.2.38.cslistener > 169.254.68.6.38126: Flags [.], seq 13247873:13249121, ack 146, win 16652, options [nop,nop,TS val 370948812 ecr 684967114], length 1248
11:39:33.540331 IP 192.168.2.38.cslistener > 169.254.68.6.38126: Flags [P.], seq 13249121:13250369, ack 146, win 16652, options [nop,nop,TS val 370948812 ecr 684967114], length 1248
11:39:33.540355 IP 169.254.68.6.38126 > 192.168.2.38.cslistener: Flags [.], ack 13250369, win 763, options [nop,nop,TS val 684967114 ecr 370948812], length 0
11:39:33.540401 IP 192.168.2.38.cslistener > 169.254.68.6.38126: Flags [.], seq 13250369:13251617, ack 146, win 16652, options [nop,nop,TS val 370948813 ecr 684967114], length 1248
11:39:33.540461 IP 192.168.2.38.cslistener > 169.254.68.6.38126: Flags [.], seq 13251617:13252865, ack 146, win 16652, options [nop,nop,TS val 370948813 ecr 684967114], length 1248

Disabled GRO

11:51:14.872406 IP 192.168.2.38.cslistener > 169.254.68.6.50320: Flags [P.], seq 1760033:1761281, ack 146, win 16652, options [nop,nop,TS val 371650146 ecr 1521095002], length 1248
11:51:14.872411 IP 192.168.2.38.cslistener > 169.254.68.6.50320: Flags [.], seq 1761281:1762529, ack 146, win 16652, options [nop,nop,TS val 371650146 ecr 1521095002], length 1248
11:51:14.872412 IP 192.168.2.38.cslistener > 169.254.68.6.50320: Flags [P.], seq 1762529:1763777, ack 146, win 16652, options [nop,nop,TS val 371650146 ecr 1521095002], length 1248
11:51:14.872478 IP 169.254.68.6.50320 > 192.168.2.38.cslistener: Flags [.], ack 1700129, win 17202, options [nop,nop,TS val 1521095002 ecr 371650146], length 0
11:51:14.872536 IP 192.168.2.38.cslistener > 169.254.68.6.50320: Flags [.], seq 1763777:1765025, ack 146, win 16652, options [nop,nop,TS val 371650147 ecr 1521095002], length 1248
11:51:14.872549 IP 169.254.68.6.50320 > 192.168.2.38.cslistener: Flags [.], ack 1761281, win 17226, options [nop,nop,TS val 1521095003 ecr 371650146], length 0
11:51:14.872596 IP 192.168.2.38.cslistener > 169.254.68.6.50320: Flags [.], seq 1765025:1766273, ack 146, win 16652, options [nop,nop,TS val 371650147 ecr 1521095002], length 1248

1 reply

kinwin-ustc Jun 4, 2026
Maintainer

We have not yet merged the work that enables TSO/GRO. Once it is merged, you can try again.

hualing15 · 2026-06-04T08:41:29Z

So when do you plan to merge this feature, or in which release version?

1 reply

chenhengqi Jun 5, 2026
Maintainer

Hi, I just opened #469, could you please give it a try? Thanks.
