Common flakiness caused by Failed to QGA guest-exec-status #40

Closed
nrxus opened this issue Nov 16, 2023 · 13 comments · Fixed by #59

nrxus commented Nov 16, 2023

We use vmtest to run tests in CI to check kernel compatibility with some eBPF programs. About 2 weeks ago we noticed that our CI was failing with:

Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Resource temporarily unavailable (os error 11)
    3: Resource temporarily unavailable (os error 11)

At first I thought that maybe a naive "just retry" in QgaWrapper::guest_exec_status would fix it (shown in this branch that I made). But the error still appeared when running the vmtest compiled from that branch.

I am more than happy to try to contribute a fix, but I am pretty much at a loss as to what is going on. I am assuming it is related to this recent PR: #27. That PR is from 6 weeks ago and we only noticed the error two weeks ago, but we hadn't been touching our eBPF programs much lately, so that CI hadn't been running and we wouldn't have noticed earlier.

For context, this error occurs on both newer and older kernels on x86_64. (We test a few kernels starting from 4.14.)
I don't think we do anything special in our tests when starting vmtest, but I do see that we run ip link set dev lo up prior to running our tests. I am not sure if that is related.

danobi (Owner) commented Nov 16, 2023

Hi, thanks for the report. That sounds fairly mysterious - I haven't seen that error before.

Two things that would be useful to look at:

  1. The patch you provided only retries once. Perhaps retry ~30 times (or some other large number) and see if it changes anything, just to rule out transient errors (a rough sketch of such a retry loop is shown below).
  2. Try to get qemu-ga to output logs to kmsg. There might be some hints there (not sure though). Right now it looks like it logs to stderr inside the VM -- effectively oblivion. It would be useful in general to log to kmsg so it makes it out to the vmtest output. See:
    qemu-ga --method=virtio-serial --path="$vport"
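
For illustration, a naive retry loop along these lines might look like the following (hypothetical sketch; the closure stands in for the real `QgaWrapper::guest_exec_status` call, and the backoff interval is arbitrary):

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical sketch: retry a flaky QGA call a number of times to rule out
/// transient errors. `call` stands in for the real QgaWrapper method, e.g.
/// `retry_qga_call(|| qga.guest_exec_status(handle), 30)`.
fn retry_qga_call<T, E, F>(mut call: F, attempts: u32) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut last_err = None;
    for _ in 0..attempts {
        match call() {
            Ok(v) => return Ok(v),
            Err(e) => {
                last_err = Some(e);
                // Back off briefly so the guest has a chance to get scheduled.
                sleep(Duration::from_millis(500));
            }
        }
    }
    Err(last_err.expect("attempts must be greater than zero"))
}
```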

nrxus (Author) commented Nov 17, 2023

Retrying 30 times didn't do much, but it looks like it gave the system more time to spew out extra information. I am not yet sure how to do your second suggestion there.

Booting the kernel.
Booting the kernel.
[    0.000000] Linux version 5.4.243-kernel-test (root@6bc7a2b45a6d) (gcc version 11.2.1 20220219 (Alpine 11.2.1_git20220219)) #1 SMP Tue Jun 20 17:31:20 UTC 2
[    0.000000] Linux version 5.4.243-kernel-test (root@6bc7a2b45a6d) (gcc version 11.2.1 20220219 (Alpine 11.2.1_git20220219)) #1 SMP Tue Jun 20 17:31:20 UTC 2023
[    0.000000] Command line: rootfstype=9p rootflags=trans=virtio,cache=loose,msize=1048576 rw earlyprintk=serial,0,115200 printk.devkmsg=on console=0,115200 l
[    0.205446] PM: Registered nosave memory: [mem 0xbffe0000-0xbfffffff]
[    0.205657] PM: Registered nosave memory: [mem 0xc0000000-0xfffbffff]
[    0.205865] PM: Registered nosave memory: [mem 0xfffc0000-0xffffffff]
[    0.206363] [mem 0xc0000000-0xfffbffff] available for PCI devices
[    0.207360] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    1.214217] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:2 nr_node_ids:1
[    1.222660] percpu: Embedded 56 pages/cpu s190552 r8192 d30632 u1048576
[    1.225527] Built 1 zonelists, mobility grouping on.  Total pages: 1032041
[    1.225678] Policy zone: Normal
[    1.225932] Kernel command line: rootfstype=9p rootflags=trans=virtio,cache=loose,msize=1048576 rw earlyprintk=serial,0,115200 printk.devkmsg=on console=0,1
===> Setting up VM
===> Running command
[    8.831513] ip (82) used greatest stack depth: 12416 bytes left
[   16.453919] process 'ebpf-programs-t' launched '/bin/test' with NULL argv: empty string added
[   54.440202] rcu: INFO: rcu_sched self-detected stall on CPU
[   54.440202] rcu:     1-...!: (20246 ticks this GP) idle=172/1/0x4000000000000002 softirq=5430/5430 fqs=0
[   54.440202]  (t=21002 jiffies g=11069 q=25)
[   54.440202] rcu: rcu_sched kthread starved for 21002 jiffies! g11069 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[   54.440202] rcu: RCU grace-period kthread stack dump:
[   54.440202] rcu_sched       R  running task    14904    10      2 0x90004000
[   54.440202] Call Trace:
[   54.440202]  __schedule+0x237/0x610
[   54.440202]  ? __mod_timer+0x19d/0x3c0
[   54.440202]  schedule+0x39/0xa0
[   54.440202]  schedule_timeout+0x89/0x150
[   54.440202]  ? __next_timer_interrupt+0xd0/0xd0
[   54.440202]  rcu_gp_kthread+0x4f9/0xd50
[   54.440202]  ? kfree_call_rcu+0x10/0x10
[   54.440202]  kthread+0x117/0x130
[   54.440202]  ? __kthread_bind_mask+0x60/0x60
[   54.440202]  ret_from_fork+0x35/0x40
[   54.440202] NMI backtrace for cpu 1
[   54.440202] CPU: 1 PID: 88 Comm: ebpf-programs-t Not tainted 5.4.243-kernel-test #1
[   54.440202] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   54.440202] Call Trace:
[   54.440202]  <IRQ>
[   54.440202]  dump_stack+0x50/0x63
[   54.440202]  nmi_cpu_backtrace.cold+0x14/0x53
[   54.440202]  ? lapic_can_unplug_cpu+0x70/0x70
[   54.440202]  nmi_trigger_cpumask_backtrace+0x7c/0x90
[   54.440202]  rcu_dump_cpu_stacks+0x81/0xaf
[   54.440202]  rcu_sched_clock_irq.cold+0x1b3/0x39e
[   54.440202]  update_process_times+0x5b/0x90
[   54.440202]  tick_sched_handle+0x34/0x50
[   54.440202]  tick_sched_timer+0x51/0xa0
[   54.440202]  ? can_stop_idle_tick+0x90/0x90
[   54.440202]  __hrtimer_run_queues+0x128/0x2a0
[   54.440202]  hrtimer_interrupt+0x110/0x2c0
[   54.440202]  smp_apic_timer_interrupt+0x66/0x130
[   54.440202]  apic_timer_interrupt+0xf/0x20
[   54.440202]  </IRQ>
[   54.440202]  ? syscall_slow_exit_work+0x57/0x140
[   54.440202]  ? do_syscall_64+0xe4/0x110
[   54.440202]  ? entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[  117.443201] rcu: INFO: rcu_sched self-detected stall on CPU
[  117.443201] rcu:     1-...!: (82231 ticks this GP) idle=172/1/0x4000000000000002 softirq=5430/5430 fqs=0
[  117.443201]  (t=84004 jiffies g=11069 q=25)
[  117.443201] rcu: rcu_sched kthread starved for 84005 jiffies! g11069 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[  117.443201] rcu: RCU grace-period kthread stack dump:
[  117.443201] rcu_sched       R  running task    14904    10      2 0x90004000
[  117.443201] Call Trace:
[  117.443201]  __schedule+0x237/0x610
[  117.443201]  ? __mod_timer+0x19d/0x3c0
[  117.443201]  schedule+0x39/0xa0
[  117.443201]  schedule_timeout+0x89/0x150
[  117.443201]  ? __next_timer_interrupt+0xd0/0xd0
[  117.443201]  rcu_gp_kthread+0x4f9/0xd50
[  117.443201]  ? kfree_call_rcu+0x10/0x10
[  117.443201]  kthread+0x117/0x130
[  117.443201]  ? __kthread_bind_mask+0x60/0x60
[  117.443201]  ret_from_fork+0x35/0x40
[  117.443201] NMI backtrace for cpu 1
[  117.443201] CPU: 1 PID: 88 Comm: ebpf-programs-t Not tainted 5.4.243-kernel-test #1
[  117.443201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[  117.443201] Call Trace:
[  117.443201]  <IRQ>
[  117.443201]  dump_stack+0x50/0x63
[  117.443201]  nmi_cpu_backtrace.cold+0x14/0x53
[  117.443201]  ? lapic_can_unplug_cpu+0x70/0x70
[  117.443201]  nmi_trigger_cpumask_backtrace+0x7c/0x90
[  117.443201]  rcu_dump_cpu_stacks+0x81/0xaf
[  117.443201]  rcu_sched_clock_irq.cold+0x1b3/0x39e
[  117.443201]  update_process_times+0x5b/0x90
[  117.443201]  tick_sched_handle+0x34/0x50
[  117.443201]  tick_sched_timer+0x51/0xa0
[  117.443201]  ? can_stop_idle_tick+0x90/0x90
[  117.443201]  __hrtimer_run_queues+0x128/0x2a0
[  117.443201]  hrtimer_interrupt+0x110/0x2c0
[  117.443201]  smp_apic_timer_interrupt+0x66/0x130
[  117.443201]  apic_timer_interrupt+0xf/0x20
[  117.443201]  </IRQ>
[  117.443201]  ? syscall_slow_exit_work+0x57/0x140
[  117.443201]  ? do_syscall_64+0xe4/0x110
[  117.443201]  ? entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[  180.446797] rcu: INFO: rcu_sched self-detected stall on CPU
[  180.446797] rcu:     1-...!: (144267 ticks this GP) idle=172/1/0x4000000000000002 softirq=5430/5430 fqs=0
[  180.446797]  (t=147015 jiffies g=11069 q=25)
[  180.446797] rcu: rcu_sched kthread starved for 147015 jiffies! g11069 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[  180.446797] rcu: RCU grace-period kthread stack dump:
[  180.446797] rcu_sched       R  running task    14904    10      2 0x90004000
[  180.446797] Call Trace:
[  180.446797]  __schedule+0x237/0x610
[  180.446797]  ? __mod_timer+0x19d/0x3c0
[  180.446797]  schedule+0x39/0xa0
[  180.446797]  schedule_timeout+0x89/0x150
[  180.446797]  ? __next_timer_interrupt+0xd0/0xd0
[  180.446797]  rcu_gp_kthread+0x4f9/0xd50
[  180.446797]  ? kfree_call_rcu+0x10/0x10
[  180.446797]  kthread+0x117/0x130
[  180.446797]  ? __kthread_bind_mask+0x60/0x60
[  180.446797]  ret_from_fork+0x35/0x40
[  180.446797] NMI backtrace for cpu 1
[  180.446797] CPU: 1 PID: 88 Comm: ebpf-programs-t Not tainted 5.4.243-kernel-test #1
[  180.446797] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[  180.446797] Call Trace:
[  180.446797]  <IRQ>
[  180.446797]  dump_stack+0x50/0x63
[  180.446797]  nmi_cpu_backtrace.cold+0x14/0x53
[  180.446797]  ? lapic_can_unplug_cpu+0x70/0x70
[  180.446797]  nmi_trigger_cpumask_backtrace+0x7c/0x90
[  180.446797]  rcu_dump_cpu_stacks+0x81/0xaf
[  180.446797]  rcu_sched_clock_irq.cold+0x1b3/0x39e
[  180.446797]  update_process_times+0x5b/0x90
[  180.446797]  tick_sched_handle+0x34/0x50
[  180.446797]  tick_sched_timer+0x51/0xa0
[  180.446797]  ? can_stop_idle_tick+0x90/0x90
[  180.446797]  __hrtimer_run_queues+0x128/0x2a0
[  180.446797]  hrtimer_interrupt+0x110/0x2c0
[  180.446797]  smp_apic_timer_interrupt+0x66/0x130
[  180.446797]  apic_timer_interrupt+0xf/0x20
[  180.446797]  </IRQ>
[  180.446797]  ? syscall_slow_exit_work+0x57/0x140
[  180.446797]  ? do_syscall_64+0xe4/0x110
[  180.446797]  ? entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Resource temporarily unavailable (os error 11)
    3: Resource temporarily unavailable (os error 11)
FAILED

Based on this error I now wonder if the issue is actually in qemu (I don't think it's the kernel, since it fails sporadically on different kernels). I am also wondering if I am just starving it of memory or CPU resources. I still find it suspicious that it only began to fail for us on newer releases of vmtest, however. We have a separate CI workflow for testing on aarch64 that uses an older, forked version of vmtest, and that one never hits any issues, but that might be a coincidence and might only work because of aarch64 vs x86 rather than because of the older vmtest 🤷🏾‍♂️

danobi (Owner) commented Nov 17, 2023

Looking at:

[  117.443201] CPU: 1 PID: 88 Comm: ebpf-programs-t Not tainted 5.4.243-kernel-test #1

It looks like your tests may be triggering some kind of kernel issue, or a qemu issue. It looks like there are some hints about RCU stall detection here: https://docs.kernel.org/RCU/stallwarn.html . Either way, it doesn't look much like a vmtest issue.

> I am not yet sure how to do your second suggestion there.

I think qemu-ga has this flag we can use:

-l, --logfile     set logfile path, logs to stderr by default

So it could be as simple as passing an additional -l /dev/kmsg depending on whether /dev/kmsg exists.
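
A rough sketch of what that check could look like on the vmtest side (hypothetical helper; the real code that assembles the qemu-ga invocation may be structured differently):

```rust
use std::path::Path;

/// Hypothetical sketch: build the qemu-ga argument list, adding `-l /dev/kmsg`
/// when the guest exposes /dev/kmsg so agent logs show up in the vmtest output
/// instead of being lost on the guest's stderr.
fn qemu_ga_args(vport: &str) -> Vec<String> {
    let mut args = vec![
        "--method=virtio-serial".to_string(),
        format!("--path={}", vport),
    ];
    if Path::new("/dev/kmsg").exists() {
        args.push("-l".to_string());
        args.push("/dev/kmsg".to_string());
    }
    args
}
```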

nrxus (Author) commented Nov 17, 2023

Fwiw, I downgraded to vmtest 0.8.1 in my CI tests and everything passed without any issues (normally I see at least one kernel fail out of the 6 I test on). So while it is possible that the issue is in my tests, qemu, or the specific kernels I am testing on, it is something that wasn't being hit on older vmtest versions.

danobi (Owner) commented Nov 17, 2023

That's pretty odd. If you could bisect it down to the commit that broke it that would help a lot.

danobi (Owner) commented Dec 11, 2023

Fwiw I merged the --logfile bit in cffd94a

danobi pushed a commit that referenced this issue Jan 9, 2024
Now that we can run `vmtest` within a given chroot, we could host the chroot
for a foreign architecture and run a vmtest in that chroot using a kernel matching
that architecture.

```
$ uname -a && cat /etc/lsb-release && RUST_LOG=debug cargo run -- -a aarch64 -k arm64/kbuild-output/arch/arm64/boot/Image.gz -r ../rootfs/ubuntu-lunar-arm64/ 'uname -a && cat /etc/lsb-release'
Linux surya 6.5.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:59:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.10
DISTRIB_CODENAME=mantic
DISTRIB_DESCRIPTION="Ubuntu 23.10"
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/vmtest -a aarch64 -k arm64/kbuild-output/arch/arm64/boot/Image.gz -r ../rootfs/ubuntu-lunar-arm64/ 'uname -a && cat /etc/lsb-release'`
=> Image.gz
===> Booting
===> Setting up VM
===> Running command
Linux (none) 6.6.0-rc5-ga4a0c99f10ca-dirty #40 SMP Thu Oct 26 18:28:11 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.04
DISTRIB_CODENAME=lunar
DISTRIB_DESCRIPTION="Ubuntu 23.04"
```

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
chantra (Collaborator) commented Jan 12, 2024

@nrxus do you have a repro you could share by any chance?

> Fwiw, I downgraded to vmtest 0.8.1 in my CI tests and everything passed without any issues (normally I see at least one kernel fail out of 6 I test on).

It is unclear which versions it reproduces on. Your fork is on top of v0.9.0; are v0.8.2 and v0.8.3 also affected?

I have seen this a few times during some of my tests. I believe this is likely due to resource exhaustion in the VM preventing qemu-ga from responding in time.

For instance, I could repro with:

 $ vmtest -k kernels/bzImage-x86_64 "kill -STOP \$(pgrep qemu-ga) && while [ 1 ]; do echo hello; sleep 10; done"
=> bzImage-x86_64
===> Booting
===> Setting up VM
===> Running command
Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Resource temporarily unavailable (os error 11)
    3: Resource temporarily unavailable (os error 11)

other variants:

$ vmtest -k kernels/bzImage-x86_64 "s=1; while [ 1 ]; do echo sleeping \$s seconds; kill -STOP \$(pgrep qemu-ga); sleep \$s; kill -CONT \$(pgrep qemu-ga); s=\$((s+s)); done"
=> bzImage-x86_64
===> Booting
===> Setting up VM
===> Running command
sleeping 1 seconds
sleeping 2 seconds
sleeping 4 seconds
sleeping 8 seconds
Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Resource temporarily unavailable (os error 11)
    3: Resource temporarily unavailable (os error 11)
sleeping 16 seconds

Maybe ionice-ing would help?

nrxus (Author) commented Jan 15, 2024

Unfortunately I do not have a repro example I can share, as I use it at my job and that code is not open source. We use vmtest during CI at work to run tests that exercise eBPF programs on different kernels. The issue being caused by running out of resources sounds plausible, although I haven't been able to find where that may be happening, as it only happens some of the time, on different kernels, and never when running the tests locally 😞

chantra (Collaborator) commented Jan 16, 2024

I looked into nice-ing qemu-ga, but as a side effect it would also run the command launched by qemu-ga with high priority, so some fiddling needs to happen.

Assuming coreutils are installed (which I am not sure we can assume in the case of image builds...), this is what I am thinking of:
chantra@3bc8079

I will try to repro, but @nrxus if you could also try this patch and confirm whether or not it helps, it would be appreciated.

chantra (Collaborator) commented Jan 16, 2024

> At first I thought that maybe a naive "just retry" in QgaWrapper::guest_exec_status would fix it (shown in this branch that I made). But the error still appeared when running the vmtest compiled by that branch.

When I tried that change with the example scripts I mentioned in #40 (comment), it did help a bit, though.

chantra (Collaborator) commented Jan 17, 2024

One way I have found to reproduce the issue is to saturate both the host (4 cores) and the VM (2 CPUs) using the command:
stress --cpu 256 --io 256 --vm 4 --vm-bytes 1024M --timeout 100s

Saturating only one of the host/VM does not reproduce the issue.

Now, if I run the host's stress command with a niceness of 19 (instead of the default 0), the issue does not manifest, even though the VM is sluggish, as seen when running something like:

"stress --cpu 512 --io 512 --vm 4 --vm-bytes 1024M --timeout 100s & while true; do date; sleep 2; done"

So basically, this seems to happen when both the host and the VM are overloaded.

guest_exec_status is returning the error after 5 seconds, which matches:

qga_stream.set_read_timeout(Some(Duration::from_secs(5)))?;
With a combo of retry and a longer timeout, the VM would also show the same stack trace as in #40 (comment).

Bumping the read timeout like crazy in the QgaWrapper is likely not a good approach, as it would impact every command run in the VM, even the ones used to set up the VM, for which we probably want to fail fast.
But the timeout can be updated specifically when running the command, by doing something like:

qga.set_read_timeout(Some(Duration::from_secs(30)))?;

in Qemu::run_command

Or the retry approach, though the same caveat applies: we may not want to do a lot of retries for every run_in_vm call.

On the other hand, prior to #27, read_timeout was not set, so essentially it was blocking forever. To get back to that state, it would be as simple as setting the stream to block indefinitely after all the commands used to set up the VM are done.

Something similar to https://gist.github.com/chantra/c09534363519d429018b095dcbb18cdd . In more detail, other commands are run in the VM later, so we would probably want to set the stream to no-timeout only during that call.
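
For illustration, a minimal sketch of that approach, assuming the QGA connection is a plain UnixStream and glossing over how it is threaded through Qemu::run_command (hypothetical helper, not the actual vmtest code):

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Hypothetical sketch: block indefinitely on QGA reads while the user command
/// runs, then restore the short timeout used for the VM setup commands.
fn with_blocking_reads<T>(
    qga_stream: &UnixStream,
    run: impl FnOnce() -> io::Result<T>,
) -> io::Result<T> {
    qga_stream.set_read_timeout(None)?; // no read timeout: wait as long as needed
    let result = run();
    // Restore the 5s timeout so later setup/teardown commands still fail fast.
    qga_stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    result
}
```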

@danobi any thoughts?

chantra (Collaborator) commented Jan 24, 2024

@danobi WDYT of making the run_command call blocking, like it was prior to #27?

chantra (Collaborator) commented Jan 25, 2024

Proposed change in #59 .

chantra added a commit that referenced this issue Jan 26, 2024
Fixes #40

When both the host and the VM are saturated, the VM may not be scheduled for a
while, causing the connection to qga to time out.
Prior to #27 the unix socket would block indefinitely.

This change brings back this behaviour while we run a command, and sets the
timeout back to what it was before running the command.
This way, we give the host/VM a chance to recover.

Tested by running:
```
stress --cpu 256 --io 256 --vm 4 --vm-bytes 1024M --timeout 1000s
```
in the host, and
```
stress --cpu 512 --io 512 --vm 4 --vm-bytes 1024M --timeout 100s & while true; do date; sleep 2; done
```
in the guest.

Output shows that the VM was struggling to run its loop and print the date until
the stress tool was done:

```
===> Setting up VM
[    1.840931] 9pnet: Limiting 'msize' to 512000 as this is the maximum supported by transport virtio
===> Running command
Thu Jan 25 01:43:12 PM PST 2024
stress: info: [85] dispatching hogs: 512 cpu, 512 io, 4 vm, 0 hdd
[    5.493242] hrtimer: interrupt took 3387058 ns
[   89.331664] clocksource: timekeeping watchdog on CPU1: hpet wd-wd read-back delay of 31697570ns
[   89.465647] clocksource: wd-tsc-wd read-back delay of 2998850ns, clock-skew test skipped!
Thu Jan 25 01:45:00 PM PST 2024
Thu Jan 25 01:45:09 PM PST 2024
Thu Jan 25 01:45:12 PM PST 2024
stress: info: [85] successful run completed in 124s
Thu Jan 25 01:45:17 PM PST 2024
Thu Jan 25 01:45:19 PM PST 2024
Thu Jan 25 01:45:21 PM PST 2024
Thu Jan 25 01:45:23 PM PST 2024
Thu Jan 25 01:45:26 PM PST 2024
Thu Jan 25 01:45:28 PM PST 2024
```
Signed-off-by: Manu Bretelle <chantr4@gmail.com>