Common flakiness caused by Failed to QGA guest-exec-status #40

Closed
nrxus opened this issue Nov 16, 2023 · 13 comments · Fixed by #59

nrxus commented Nov 16, 2023

We use vmtest to run tests in CI to check kernel compatibility with some eBPF programs. About 2 weeks ago we noticed that our CI was failing with:

Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Resource temporarily unavailable (os error 11)
    3: Resource temporarily unavailable (os error 11)

At first I thought that maybe a naive "just retry" in QgaWrapper::guest_exec_status would fix it (shown in this branch that I made). But the error still appeared when running the vmtest compiled from that branch.

I am more than happy to try to contribute a fix, but I am pretty much at a loss as to what is going on. I am assuming it is related to this recent PR: #27. That PR is from 6 weeks ago and we only noticed the error two weeks ago, but we hadn't been touching our eBPF programs much lately, so that CI hadn't been running and we wouldn't have noticed earlier.

For context, this error occurs on both newer and older kernels on x86_64. (We test a few kernels starting from 4.14.)
I don't think we do anything special in our tests when starting vmtest, but I do see that we run ip link set dev lo up prior to running our tests. I am not sure if that is related.

danobi (Owner) commented Nov 16, 2023

Hi, thanks for the report. That sounds fairly mysterious - I haven't seen that error before.

Two things that would be useful to look at:

  1. The patch you provided only retries once. Perhaps retry ~30 times (or some other large number) and see if it changes anything, just to rule out transient errors (a rough sketch of such a retry loop is shown below).
  2. Try to get qemu-ga to output logs to kmsg. There might be some hints there (not sure though). Right now it looks like it logs to stderr inside the VM -- effectively oblivion. It would be useful in general to log to kmsg so it makes it out to the vmtest output. See:
    qemu-ga --method=virtio-serial --path="$vport"
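
For illustration, a naive retry loop along these lines might look like the following (hypothetical sketch; the closure stands in for the real `QgaWrapper::guest_exec_status` call, and the backoff interval is arbitrary):

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical sketch: retry a flaky QGA call a number of times to rule out
/// transient errors. `call` stands in for the real QgaWrapper method, e.g.
/// `retry_qga_call(|| qga.guest_exec_status(handle), 30)`.
fn retry_qga_call<T, E, F>(mut call: F, attempts: u32) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut last_err = None;
    for _ in 0..attempts {
        match call() {
            Ok(v) => return Ok(v),
            Err(e) => {
                last_err = Some(e);
                // Back off briefly so the guest has a chance to get scheduled.
                sleep(Duration::from_millis(500));
            }
        }
    }
    Err(last_err.expect("attempts must be greater than zero"))
}
```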

nrxus (Author) commented Nov 17, 2023

Retrying 30 times didn't do much, but it looks like it gave the system more time to spew out extra information. I am not yet sure how to do your second suggestion there.

Booting the kernel.
Booting the kernel.
[    0.000000] Linux version 5.4.243-kernel-test (root@6bc7a2b45a6d) (gcc version 11.2.1 20220219 (Alpine 11.2.1_git20220219)) #1 SMP Tue Jun 20 17:31:20 UTC 2
[    0.000000] Linux version 5.4.243-kernel-test (root@6bc7a2b45a6d) (gcc version 11.2.1 20220219 (Alpine 11.2.1_git20220219)) #1 SMP Tue Jun 20 17:31:20 UTC 2023
[    0.000000] Command line: rootfstype=9p rootflags=trans=virtio,cache=loose,msize=1048576 rw earlyprintk=serial,0,115200 printk.devkmsg=on console=0,115200 l
[    0.205446] PM: Registered nosave memory: [mem 0xbffe0000-0xbfffffff]
[    0.205657] PM: Registered nosave memory: [mem 0xc0000000-0xfffbffff]
[    0.205865] PM: Registered nosave memory: [mem 0xfffc0000-0xffffffff]
[    0.206363] [mem 0xc0000000-0xfffbffff] available for PCI devices
[    0.207360] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    1.214217] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:2 nr_node_ids:1
[    1.222660] percpu: Embedded 56 pages/cpu s190552 r8192 d30632 u1048576
[    1.225527] Built 1 zonelists, mobility grouping on.  Total pages: 1032041
[    1.225678] Policy zone: Normal
[    1.225932] Kernel command line: rootfstype=9p rootflags=trans=virtio,cache=loose,msize=1048576 rw earlyprintk=serial,0,115200 printk.devkmsg=on console=0,1
===> Setting up VM
===> Running command
[    8.831513] ip (82) used greatest stack depth: 12416 bytes left
[   16.453919] process 'ebpf-programs-t' launched '/bin/test' with NULL argv: empty string added
[   54.440202] rcu: INFO: rcu_sched self-detected stall on CPU
[   54.440202] rcu:     1-...!: (20246 ticks this GP) idle=172/1/0x4000000000000002 softirq=5430/5430 fqs=0
[   54.440202]  (t=21002 jiffies g=11069 q=25)
[   54.440202] rcu: rcu_sched kthread starved for 21002 jiffies! g11069 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[   54.440202] rcu: RCU grace-period kthread stack dump:
[   54.440202] rcu_sched       R  running task    14904    10      2 0x90004000
[   54.440202] Call Trace:
[   54.440202]  __schedule+0x237/0x610
[   54.440202]  ? __mod_timer+0x19d/0x3c0
[   54.440202]  schedule+0x39/0xa0
[   54.440202]  schedule_timeout+0x89/0x150
[   54.440202]  ? __next_timer_interrupt+0xd0/0xd0
[   54.440202]  rcu_gp_kthread+0x4f9/0xd50
[   54.440202]  ? kfree_call_rcu+0x10/0x10
[   54.440202]  kthread+0x117/0x130
[   54.440202]  ? __kthread_bind_mask+0x60/0x60
[   54.440202]  ret_from_fork+0x35/0x40
[   54.440202] NMI backtrace for cpu 1
[   54.440202] CPU: 1 PID: 88 Comm: ebpf-programs-t Not tainted 5.4.243-kernel-test #1
[   54.440202] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   54.440202] Call Trace:
[   54.440202]  <IRQ>
[   54.440202]  dump_stack+0x50/0x63
[   54.440202]  nmi_cpu_backtrace.cold+0x14/0x53
[   54.440202]  ? lapic_can_unplug_cpu+0x70/0x70
[   54.440202]  nmi_trigger_cpumask_backtrace+0x7c/0x90
[   54.440202]  rcu_dump_cpu_stacks+0x81/0xaf
[   54.440202]  rcu_sched_clock_irq.cold+0x1b3/0x39e
[   54.440202]  update_process_times+0x5b/0x90
[   54.440202]  tick_sched_handle+0x34/0x50
[   54.440202]  tick_sched_timer+0x51/0xa0
[   54.440202]  ? can_stop_idle_tick+0x90/0x90
[   54.440202]  __hrtimer_run_queues+0x128/0x2a0
[   54.440202]  hrtimer_interrupt+0x110/0x2c0
[   54.440202]  smp_apic_timer_interrupt+0x66/0x130
[   54.440202]  apic_timer_interrupt+0xf/0x20
[   54.440202]  </IRQ>
[   54.440202]  ? syscall_slow_exit_work+0x57/0x140
[   54.440202]  ? do_syscall_64+0xe4/0x110
[   54.440202]  ? entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[  117.443201] rcu: INFO: rcu_sched self-detected stall on CPU
[  117.443201] rcu:     1-...!: (82231 ticks this GP) idle=172/1/0x4000000000000002 softirq=5430/5430 fqs=0
[  117.443201]  (t=84004 jiffies g=11069 q=25)
[  117.443201] rcu: rcu_sched kthread starved for 84005 jiffies! g11069 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[  117.443201] rcu: RCU grace-period kthread stack dump:
[  117.443201] rcu_sched       R  running task    14904    10      2 0x90004000
[  117.443201] Call Trace:
[  117.443201]  __schedule+0x237/0x610
[  117.443201]  ? __mod_timer+0x19d/0x3c0
[  117.443201]  schedule+0x39/0xa0
[  117.443201]  schedule_timeout+0x89/0x150
[  117.443201]  ? __next_timer_interrupt+0xd0/0xd0
[  117.443201]  rcu_gp_kthread+0x4f9/0xd50
[  117.443201]  ? kfree_call_rcu+0x10/0x10
[  117.443201]  kthread+0x117/0x130
[  117.443201]  ? __kthread_bind_mask+0x60/0x60
[  117.443201]  ret_from_fork+0x35/0x40
[  117.443201] NMI backtrace for cpu 1
[  117.443201] CPU: 1 PID: 88 Comm: ebpf-programs-t Not tainted 5.4.243-kernel-test #1
[  117.443201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[  117.443201] Call Trace:
[  117.443201]  <IRQ>
[  117.443201]  dump_stack+0x50/0x63
[  117.443201]  nmi_cpu_backtrace.cold+0x14/0x53
[  117.443201]  ? lapic_can_unplug_cpu+0x70/0x70
[  117.443201]  nmi_trigger_cpumask_backtrace+0x7c/0x90
[  117.443201]  rcu_dump_cpu_stacks+0x81/0xaf
[  117.443201]  rcu_sched_clock_irq.cold+0x1b3/0x39e
[  117.443201]  update_process_times+0x5b/0x90
[  117.443201]  tick_sched_handle+0x34/0x50
[  117.443201]  tick_sched_timer+0x51/0xa0
[  117.443201]  ? can_stop_idle_tick+0x90/0x90
[  117.443201]  __hrtimer_run_queues+0x128/0x2a0
[  117.443201]  hrtimer_interrupt+0x110/0x2c0
[  117.443201]  smp_apic_timer_interrupt+0x66/0x130
[  117.443201]  apic_timer_interrupt+0xf/0x20
[  117.443201]  </IRQ>
[  117.443201]  ? syscall_slow_exit_work+0x57/0x140
[  117.443201]  ? do_syscall_64+0xe4/0x110
[  117.443201]  ? entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[  180.446797] rcu: INFO: rcu_sched self-detected stall on CPU
[  180.446797] rcu:     1-...!: (144267 ticks this GP) idle=172/1/0x4000000000000002 softirq=5430/5430 fqs=0
[  180.446797]  (t=147015 jiffies g=11069 q=25)
[  180.446797] rcu: rcu_sched kthread starved for 147015 jiffies! g11069 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[  180.446797] rcu: RCU grace-period kthread stack dump:
[  180.446797] rcu_sched       R  running task    14904    10      2 0x90004000
[  180.446797] Call Trace:
[  180.446797]  __schedule+0x237/0x610
[  180.446797]  ? __mod_timer+0x19d/0x3c0
[  180.446797]  schedule+0x39/0xa0
[  180.446797]  schedule_timeout+0x89/0x150
[  180.446797]  ? __next_timer_interrupt+0xd0/0xd0
[  180.446797]  rcu_gp_kthread+0x4f9/0xd50
[  180.446797]  ? kfree_call_rcu+0x10/0x10
[  180.446797]  kthread+0x117/0x130
[  180.446797]  ? __kthread_bind_mask+0x60/0x60
[  180.446797]  ret_from_fork+0x35/0x40
[  180.446797] NMI backtrace for cpu 1
[  180.446797] CPU: 1 PID: 88 Comm: ebpf-programs-t Not tainted 5.4.243-kernel-test #1
[  180.446797] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[  180.446797] Call Trace:
[  180.446797]  <IRQ>
[  180.446797]  dump_stack+0x50/0x63
[  180.446797]  nmi_cpu_backtrace.cold+0x14/0x53
[  180.446797]  ? lapic_can_unplug_cpu+0x70/0x70
[  180.446797]  nmi_trigger_cpumask_backtrace+0x7c/0x90
[  180.446797]  rcu_dump_cpu_stacks+0x81/0xaf
[  180.446797]  rcu_sched_clock_irq.cold+0x1b3/0x39e
[  180.446797]  update_process_times+0x5b/0x90
[  180.446797]  tick_sched_handle+0x34/0x50
[  180.446797]  tick_sched_timer+0x51/0xa0
[  180.446797]  ? can_stop_idle_tick+0x90/0x90
[  180.446797]  __hrtimer_run_queues+0x128/0x2a0
[  180.446797]  hrtimer_interrupt+0x110/0x2c0
[  180.446797]  smp_apic_timer_interrupt+0x66/0x130
[  180.446797]  apic_timer_interrupt+0xf/0x20
[  180.446797]  </IRQ>
[  180.446797]  ? syscall_slow_exit_work+0x57/0x140
[  180.446797]  ? do_syscall_64+0xe4/0x110
[  180.446797]  ? entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Resource temporarily unavailable (os error 11)
    3: Resource temporarily unavailable (os error 11)
FAILED

Based on this error I now wonder if the issue is actually in qemu (I don't think it's the kernel, since it fails sporadically on different kernels). I am also wondering if I am just starving it of memory or CPU resources. I still find it suspicious that it only began to fail for us on newer releases of vmtest, however. We have a separate CI workflow for testing on aarch64 that uses an older, forked version of vmtest, and that one never hits any issues, but that might be a coincidence and might only work because of aarch64 vs x86 rather than because of the older vmtest 🤷🏾‍♂️

danobi (Owner) commented Nov 17, 2023

Looking at:

[  117.443201] CPU: 1 PID: 88 Comm: ebpf-programs-t Not tainted 5.4.243-kernel-test #1

It looks like your tests may be triggering some kind of kernel issue, or a qemu issue. It looks like there are some hints about RCU stall detection here: https://docs.kernel.org/RCU/stallwarn.html . Either way, it doesn't look much like a vmtest issue.

> I am not yet sure how to do your second suggestion there.

I think qemu-ga has this flag we can use:

-l, --logfile     set logfile path, logs to stderr by default

So it could be as simple as passing an additional -l /dev/kmsg depending on whether /dev/kmsg exists.
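
A rough sketch of what that check could look like on the vmtest side (hypothetical helper; the real code that assembles the qemu-ga invocation may be structured differently):

```rust
use std::path::Path;

/// Hypothetical sketch: build the qemu-ga argument list, adding `-l /dev/kmsg`
/// when the guest exposes /dev/kmsg so agent logs show up in the vmtest output
/// instead of being lost on the guest's stderr.
fn qemu_ga_args(vport: &str) -> Vec<String> {
    let mut args = vec![
        "--method=virtio-serial".to_string(),
        format!("--path={}", vport),
    ];
    if Path::new("/dev/kmsg").exists() {
        args.push("-l".to_string());
        args.push("/dev/kmsg".to_string());
    }
    args
}
```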

nrxus (Author) commented Nov 17, 2023

Fwiw, I downgraded to vmtest 0.8.1 in my CI tests and everything passed without any issues (normally I see at least one kernel fail out of the 6 I test on). So while it is possible that the issue is in my tests, qemu, or the specific kernels I am testing on, it is something that wasn't being hit on older vmtest versions.

danobi (Owner) commented Nov 17, 2023

That's pretty odd. If you could bisect it down to the commit that broke it that would help a lot.

danobi (Owner) commented Dec 11, 2023

Fwiw I merged the --logfile bit in cffd94a

danobi pushed a commit that referenced this issue Jan 9, 2024
Now that we can run `vmtest` within a given chroot, we could host the chroot
for a foreign architecture and run a vmtest in that chroot using a kernel matching
that architecture.

```
$ uname -a && cat /etc/lsb-release && RUST_LOG=debug cargo run -- -a aarch64 -k arm64/kbuild-output/arch/arm64/boot/Image.gz -r ../rootfs/ubuntu-lunar-arm64/ 'uname -a && cat /etc/lsb-release'
Linux surya 6.5.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:59:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.10
DISTRIB_CODENAME=mantic
DISTRIB_DESCRIPTION="Ubuntu 23.10"
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/vmtest -a aarch64 -k arm64/kbuild-output/arch/arm64/boot/Image.gz -r ../rootfs/ubuntu-lunar-arm64/ 'uname -a && cat /etc/lsb-release'`
=> Image.gz
===> Booting
===> Setting up VM
===> Running command
Linux (none) 6.6.0-rc5-ga4a0c99f10ca-dirty #40 SMP Thu Oct 26 18:28:11 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.04
DISTRIB_CODENAME=lunar
DISTRIB_DESCRIPTION="Ubuntu 23.04"
```

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
chantra (Collaborator) commented Jan 12, 2024

@nrxus do you have a repro you could share by any chance?

> Fwiw, I downgraded to vmtest 0.8.1 in my CI tests and everything passed without any issues (normally I see at least one kernel fail out of 6 I test on).

It is unclear which versions it reproduces on. Your fork is on top of v0.9.0; are v0.8.2 and v0.8.3 also affected?

I have seen this a few times during some of my tests. I believe this is likely due to resource exhaustion in the VM preventing qemu-ga from responding in time.

For instance, I could repro with:

 $ vmtest -k kernels/bzImage-x86_64 "kill -STOP \$(pgrep qemu-ga) && while [ 1 ]; do echo hello; sleep 10; done"
=> bzImage-x86_64
===> Booting
===> Setting up VM
===> Running command
Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Resource temporarily unavailable (os error 11)
    3: Resource temporarily unavailable (os error 11)

other variants:

$ vmtest -k kernels/bzImage-x86_64 "s=1; while [ 1 ]; do echo sleeping \$s seconds; kill -STOP \$(pgrep qemu-ga); sleep \$s; kill -CONT \$(pgrep qemu-ga); s=\$((s+s)); done"
=> bzImage-x86_64
===> Booting
===> Setting up VM
===> Running command
sleeping 1 seconds
sleeping 2 seconds
sleeping 4 seconds
sleeping 8 seconds
Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Resource temporarily unavailable (os error 11)
    3: Resource temporarily unavailable (os error 11)
sleeping 16 seconds

Maybe ionice-ing would help?

nrxus (Author) commented Jan 15, 2024

Unfortunately I do not have a repro example I can share, as I use it at my job and that code is not open source. We use vmtest during CI at work to run tests that exercise eBPF programs on different kernels. The issue being caused by running out of resources sounds plausible, although I haven't been able to find where that may be happening, as it only happens some of the time, on different kernels, and never when running the tests locally 😞

chantra (Collaborator) commented Jan 16, 2024

I looked into nice-ing qemu-ga, but as a side effect it would also run the command launched by qemu-ga with high priority, so some fiddling needs to happen.

Assuming coreutils are installed (which I am not sure we can assume in the case of image builds...), this is what I am thinking of:
chantra@3bc8079

I will try to repro, but @nrxus if you could also try this patch and confirm whether or not it helps, it would be appreciated.

chantra (Collaborator) commented Jan 16, 2024

> At first I thought that maybe a naive "just retry" in QgaWrapper::guest_exec_status would fix it (shown in this branch that I made). But the error still appeared when running the vmtest compiled by that branch.

When I tried that change with the example scripts I mentioned in #40 (comment), it did help a bit, though.

chantra (Collaborator) commented Jan 17, 2024

One way I have found to reproduce the issue is to saturate both the host (4 cores) and the VM (2 CPUs) using the command:
stress --cpu 256 --io 256 --vm 4 --vm-bytes 1024M --timeout 100s

Saturating only one of the host/VM does not reproduce the issue.

Now, if I run the host's stress command with a niceness of 19 (instead of the default 0), the issue does not manifest, even though the VM is sluggish, as seen when running something like:

"stress --cpu 512 --io 512 --vm 4 --vm-bytes 1024M --timeout 100s & while true; do date; sleep 2; done"

So basically, this seems to happen when both the host and the VM are overloaded.

guest_exec_status is returning the error after 5 seconds, which matches:

qga_stream.set_read_timeout(Some(Duration::from_secs(5)))?;
With a combo of retry and a longer timeout, the VM would also show the same stack trace as in #40 (comment).

Bumping the read timeout like crazy in the QgaWrapper is likely not a good approach, as it would impact every command run in the VM, even the ones used to set up the VM, for which we probably want to fail fast.
But the timeout can be updated specifically when running the command, by doing something like:

qga.set_read_timeout(Some(Duration::from_secs(30)))?;

in Qemu::run_command

Or the retry approach, though the same caveat applies: we may not want to do a lot of retries for every run_in_vm call.

On the other hand, prior to #27, read_timeout was not set, so essentially it was blocking forever. To get back to that state, it would be as simple as setting the stream to block indefinitely after all the commands used to set up the VM are done.

Something similar to https://gist.github.com/chantra/c09534363519d429018b095dcbb18cdd . In more detail, other commands are run in the VM later, so we would probably want to set the stream to no-timeout only during that call.
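
For illustration, a minimal sketch of that approach, assuming the QGA connection is a plain UnixStream and glossing over how it is threaded through Qemu::run_command (hypothetical helper, not the actual vmtest code):

```rust
use std::io;
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Hypothetical sketch: block indefinitely on QGA reads while the user command
/// runs, then restore the short timeout used for the VM setup commands.
fn with_blocking_reads<T>(
    qga_stream: &UnixStream,
    run: impl FnOnce() -> io::Result<T>,
) -> io::Result<T> {
    qga_stream.set_read_timeout(None)?; // no read timeout: wait as long as needed
    let result = run();
    // Restore the 5s timeout so later setup/teardown commands still fail fast.
    qga_stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    result
}
```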

@danobi any thoughts?

chantra (Collaborator) commented Jan 24, 2024

@danobi WDYT of making the run_command call blocking, like it was prior to #27?

chantra (Collaborator) commented Jan 25, 2024

Proposed change in #59 .

chantra added a commit that referenced this issue Jan 26, 2024
Fixes #40

When both the host and the VM are saturated, the VM may not be scheduled for a
while, causing the connection to qga to time out.
Prior to #27 the unix socket would block indefinitely.

This change brings back this behaviour while we run a command, and sets the
timeout back to what it was before running the command.
This way, we give the host/VM a chance to recover.

Tested by running:
```
stress --cpu 256 --io 256 --vm 4 --vm-bytes 1024M --timeout 1000s
```
in the host, and
```
stress --cpu 512 --io 512 --vm 4 --vm-bytes 1024M --timeout 100s & while true; do date; sleep 2; done
```
in the guest.

Output shows that the VM was struggling to run its loop and print the date until
the stress tool was done:

```
===> Setting up VM
[    1.840931] 9pnet: Limiting 'msize' to 512000 as this is the maximum supported by transport virtio
===> Running command
Thu Jan 25 01:43:12 PM PST 2024
stress: info: [85] dispatching hogs: 512 cpu, 512 io, 4 vm, 0 hdd
[    5.493242] hrtimer: interrupt took 3387058 ns
[   89.331664] clocksource: timekeeping watchdog on CPU1: hpet wd-wd read-back delay of 31697570ns
[   89.465647] clocksource: wd-tsc-wd read-back delay of 2998850ns, clock-skew test skipped!
Thu Jan 25 01:45:00 PM PST 2024
Thu Jan 25 01:45:09 PM PST 2024
Thu Jan 25 01:45:12 PM PST 2024
stress: info: [85] successful run completed in 124s
Thu Jan 25 01:45:17 PM PST 2024
Thu Jan 25 01:45:19 PM PST 2024
Thu Jan 25 01:45:21 PM PST 2024
Thu Jan 25 01:45:23 PM PST 2024
Thu Jan 25 01:45:26 PM PST 2024
Thu Jan 25 01:45:28 PM PST 2024
```
Signed-off-by: Manu Bretelle <chantr4@gmail.com>